CPSC 221H: Data Structures & Algorithms
Programming Assignment #4
Spring 2009
General Guidelines for Programs
-
Your code should be well designed and documented. You WILL be graded
on programming style.
-
If the question is not fully specified, you may need to make
some assumptions. In this case, you must state any assumptions you
make, and justify why they are reasonable.
-
Everyone must turn in their own copy of the assignment. You may consult
outside material or work with others (this is encouraged), but you
must reference your sources (people, books, webpages, etc.).
These must be listed on the cover sheet mentioned below.
-
Be clear and precise. Write neatly and legibly.
Justify all answers, even if not specified in the question.
Use good judgement concerning how detailed to make your answer.
-
Code will be turned in electronically using the
turnin program on CSNet.
-
The report portion of the assignment, if any, follows conventions
for other written assignments in this course.
In particular, all assignments in this course must include
a completed and signed
coverpage
available on the course homepage.
Any assignment turned in without a fully completed and signed
coverpage will receive ZERO POINTS.
Assignment
Due: Friday April 10, 2009 by Midnight.
The code and the report should be turned in both electronically
(using the mechanism described above) and in hardcopy.
The hardcopy report should also include a copy of the code.
The hardcopy code and report should be turned in during lab on Tuesday.
The objective of this assignment is to study how the theoretical
analysis of a variety of sorting algorithms compares with their actual
experimentally measured performance.
The sorting algorithms you will study are:
- (1) Insertion sort
- (2) Heap sort
- (3) Merge sort
- Quick sort with different pivot choices:
- (4) The last element in the sequence
- (5) The median of the first, middle, and last elements in the sequence
- (6) A random element in the sequence
The major emphasis of this assignment will be to analyze the performance
of the algorithms (not on coding the algorithms). You will be given C++
templates which implement most of the above algorithms. You will need
to make a few modifications to test out different strategies. Most of
your time should be spent on designing careful test cases and analyzing
your results in order to draw conclusions regarding the performance of
the various algorithms and to study how the experimental performance
is correllated, or not, with the theoretical performance.
Coding Portion (50 points).
-
The C++ templates of the sorting algorithms are given in
Sort.h.
Most of the above algorithms were implemented excluding quick
sort with median and random pivot choices. You should implement the
two missing versions of quick sort, (5) and (6), in Sort.h.
-
For each of the first four algorithms, (1)-(4), determine
the best, average, and worst-case inputs. Write a program Sort.cpp
to time these four algorithms given their best, average and
worst-case inputs.
Time the quick sort (5) and (6) using the test
cases for quick sort (4).
You should also determine the input sizes and the number
of times you will repeat each experiment.
NOTES:
-
To generate a random number, include cstdlib in your program
and use functions rand() and srand() provided in cstdlib.
-
The four sorting algorithms in Sort.h all take a sequence as input.
QuickSort takes a VectorSequence while the other sorting algorithms
take NodeSequence as input. NodeSequence inherits members and functions
from NodeList, which is a list implemented with doubly linked list.
VectorSequence inherits from ArrayVector, which is a vector implemented
with a growable array. You should be able to reason why the sorts should
use different types of sequences. Both classes NodeSequence and
VectorSequence have constructors to initialize a sequence object
with an array. You should input an array and the array size to the
constructor. The constructor will copy each element in that array
to the sequence.
-
You can determine the minimum input size in your experiment by
observing the value n0 for each algorithm. For example, if the predicted
time of your algorithm is O(n^2), you will divide the running times for
different input sizes by n^2. Typically you can expect a horizontal line
after some input size n0.
Report (50 points).
You will write a brief report that includes theoretical analysis,
a description of your experiments, and discussion of your results.
Minimum requirements for your report include:
-
Introduction.
Describe the objective of this assignment.
-
Theoretical analysis.
Give the best, average and the worst-case inputs of the sorting
algorithms (1)-(4) and analyze their complexity.
-
Description of your experiment setup, which includes but is not limited to
- Machine specification.
-
What is your timing mechanism?
-
How did you generate the best, average and the worst-case inputs
for the algorithms?
-
What input sizes did you test? Why?
-
How many times did you repeat each experiment?
-
Experimental results.
-
Graph the average case, best case and worst case running time
versus the input size for all the six sorting algorithms. Use the cases
you determined for quick sort (4) to run all three versions of quick
sorts (4)-(6).
-
Discussion of your results, which includes but is not limited to:
-
Which of the three versions of quick sort seems to perform the best?
-
Which of these four sorts seems to perform the best (consider
the best version of quick sort)?
-
To what extent do the best, average and worst case analyses
(from class/textbook) of each sort agree with the experimental results?
NOTE:
One way to estimate asymptotic complexity by your experimental results
is to graph the running time with a doubly logarithmic (log-log) plot;
that is, to plot log T(n) versus log n. Then use linear regression (the
method of least squares) to fit the doubly logarithmic curve with a line
and find the slope k and the intercept c of the line. The running time
can be approximated by c n^k; thus the estimated asymptotic complexity
is O(n^k). Input sizes that are doubled each time are recommended, for
example, 100, 200, 400, 800, 1600 and etc. They give equal intervals to
every two adjacent points of the plot.
Here is an archive file including all the files for this assignment,
including some matlab templates to help you prepare plots:
prog4.zip (for UNIX and for MS Windows).