C++ - Merging K Sorted Arrays/Vectors Complexity
While looking at the problem of merging k sorted contiguous arrays/vectors, and at how it differs in implementation from merging k sorted linked lists, I found two relatively easy naive solutions for merging k contiguous arrays, plus a nicer optimized method based on pairwise merging that simulates how mergesort() works. The two naive solutions I implemented seem to have the same complexity, but in a big randomized test I ran, one turns out to be far more inefficient than the other.
Naive Merging
My naive merging method works as follows. I create an output vector<int> and set it equal to the first of the k vectors I am given. I then merge in the second vector, then the third, and so on. Since a typical merge() method takes two vectors and returns one, and is asymptotically linear in both space and time in the total number of elements of the two vectors, the total complexity is O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since we are adding 1n + 2n + 3n + ... + kn, and 1 + 2 + ... + k = k(k+1)/2, I believe the total complexity is O(n*k^2). Consider the following code:
vector<int> mergeinefficient(const vector<vector<int> >& multilist) {
    vector<int> finallist = multilist[0];
    for (int j = 1; j < multilist.size(); ++j) {
        finallist = mergelists(multilist[j], finallist);
    }
    return finallist;
}
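The mergelists() helper referred to above is just the standard linear two-way merge; its body is not shown in the post, so the following is only a minimal sketch of what such a helper could look like:

#include <cstddef>
#include <vector>
using std::vector;

// Linear two-way merge: time and space proportional to a.size() + b.size().
vector<int> mergelists(const vector<int>& a, const vector<int>& b) {
    vector<int> merged;
    merged.reserve(a.size() + b.size());
    size_t i = 0, j = 0;
    // Repeatedly take the smaller front element until one input is exhausted.
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j]) merged.push_back(a[i++]);
        else              merged.push_back(b[j++]);
    }
    // Copy whatever remains of the other input.
    while (i < a.size()) merged.push_back(a[i++]);
    while (j < b.size()) merged.push_back(b[j++]);
    return merged;
}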
Naive Selection
My second naive solution works as follows:
/**
 * The logic behind this algorithm is simple and inefficient. We want to start
 * with the first value of each of the k vectors, pick the smallest one, and
 * push it onto our finallist vector. We then need to look at the next value of
 * the vector we took that value from, so we don't keep taking the same value.
 * A vector of vector iterators is used to hold our position in each vector.
 * While the iterators are not all at the .end() of their corresponding vector,
 * we maintain a minvalue variable initialized to INT_MAX and a minvalueindex
 * variable, and iterate over each of the k vector iterators; if the current
 * iterator is not at its end position, we check whether it is smaller than our
 * minvalue. If it is, we update minvalue and set minvalueindex (so that we
 * later know which iterator to increment after iterating through all of them).
 * After the iteration we check whether minvalue still equals INT_MAX. If it
 * does, all iterators are at their .end() position, we have exhausted every
 * vector, and we can stop iterating over the k of them. Regarding the
 * complexity of this method, we iterate over the k vectors as long as at least
 * one value has not been accounted for. Since there are nk values, where n is
 * the average number of elements in each list, the time complexity is O(nk^2),
 * the same as our other naive method.
 */
vector<int> mergeinefficientv2(const vector<vector<int> >& multilist) {
    vector<int> finallist;
    vector<vector<int>::const_iterator> iterators(multilist.size());

    // Set all iterators to the beginning of their corresponding vectors in multilist
    for (int i = 0; i < multilist.size(); ++i) iterators[i] = multilist[i].begin();

    int k = 0, minvalue, minvalueindex;

    while (1) {
        minvalue = INT_MAX;
        for (int i = 0; i < iterators.size(); ++i) {
            if (iterators[i] == multilist[i].end()) continue;
            if (*iterators[i] < minvalue) {
                minvalue = *iterators[i];
                minvalueindex = i;
            }
        }
        // Break before touching iterators[minvalueindex]: if every vector is
        // exhausted, minvalueindex is stale and incrementing it would be undefined.
        if (minvalue == INT_MAX) break;
        iterators[minvalueindex]++;
        finallist.push_back(minvalue);
    }
    return finallist;
}
Random Simulation
Long story short, I built a simple randomized simulation that builds a multidimensional vector<vector<int>>. The multidimensional vector starts as 2 vectors, each of size 2, and ends as 600 vectors, each of size 600. Each vector is sorted, and the sizes of the outer container and of each child vector increase by two elements every iteration. I time how long it takes each algorithm to perform the merge like this:
clock_t clock_a_start = clock();
finallist = mergeinefficient(multilist);
clock_t clock_a_stop = clock();

clock_t clock_b_start = clock();
finallist = mergeinefficientv2(multilist);
clock_t clock_b_stop = clock();
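The code that generates the test data is not included in the post. A sketch of one possible generator, assuming plain rand() values sorted with std::sort (the name buildmultilist is hypothetical), could look like this:

#include <algorithm>
#include <cstdlib>
#include <vector>
using std::vector;

// Build k sorted vectors of n random ints each. The simulation described above
// would call this for k = n = 2, 4, ..., 600.
vector<vector<int> > buildmultilist(int k, int n) {
    vector<vector<int> > multilist(k, vector<int>(n));
    for (int i = 0; i < k; ++i) {
        for (int j = 0; j < n; ++j) multilist[i][j] = rand();
        std::sort(multilist[i].begin(), multilist[i].end());
    }
    return multilist;
}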
From those timings I built the following plot:
My calculations say the two naive solutions (merging and selecting) should have the same time complexity, but the plot above shows them as very different. At first I rationalized this by saying there may be more overhead in one than in the other, but then I realized that overhead should be a constant factor and should not produce a plot like this. What is the explanation? I assume my complexity analysis is wrong?
Even if two algorithms have the same complexity (O(nk^2) in your case), they may end up having enormously different running times depending upon the size of the input and the 'constant' factors involved.
For example, if one algorithm runs in n/1000 time and another runs in 1000n time, both have the same asymptotic complexity but they will have very different running times for 'reasonable' choices of n.
Moreover, there are effects caused by caching, compiler optimizations, etc. that may change the running time significantly.
For your case, although your calculation of the complexities appears correct, the actual running time in the first case is (nk^2 + nk)/2, whereas in the second case it is nk^2. Notice that the division by 2 can be significant, because as k increases the nk term becomes negligible.
For a third algorithm, you can modify the naive selection by maintaining a heap of k elements containing the first elements of all the k vectors. The selection step then takes O(log k) time, so the overall complexity reduces to O(nk log k).
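For illustration, a sketch of such a heap-based k-way merge using std::priority_queue (the name mergewithheap and its internals are mine, not code from the post):

#include <cstddef>
#include <queue>
#include <utility>
#include <vector>
using std::vector;

// Heap-based k-way merge in O(nk log k). Each heap entry is a
// (value, index of the source vector) pair; pos[] tracks how far we are in each vector.
vector<int> mergewithheap(const vector<vector<int> >& multilist) {
    typedef std::pair<int, int> heapentry; // (value, source vector index)
    std::priority_queue<heapentry, vector<heapentry>,
                        std::greater<heapentry> > minheap;
    vector<size_t> pos(multilist.size(), 0);
    vector<int> finallist;

    // Seed the heap with the first element of every non-empty vector.
    for (int i = 0; i < (int)multilist.size(); ++i)
        if (!multilist[i].empty())
            minheap.push(heapentry(multilist[i][0], i));

    while (!minheap.empty()) {
        heapentry top = minheap.top();
        minheap.pop();
        finallist.push_back(top.first);
        int src = top.second;
        // Advance within the source vector and push its next element, if any.
        if (++pos[src] < multilist[src].size())
            minheap.push(heapentry(multilist[src][pos[src]], src));
    }
    return finallist;
}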