C++ - Merging K Sorted Arrays/Vectors Complexity
While looking at the problem of merging k sorted contiguous arrays/vectors, and at how it differs in implementation from merging k sorted linked lists, I found two relatively easy naive solutions for merging k contiguous arrays, plus a nicer optimized method based on pairwise merging that simulates how mergesort() works. The two naive solutions I implemented seem to have the same complexity, but in a big randomized test I ran, one turns out to be far more inefficient than the other.
Naive Merging
My naive merging method works as follows. I create an output vector<int> and set it equal to the first of the k vectors I am given. I then merge in the second vector, then the third, and so on. Since a typical merge() method takes two vectors and returns one, and is asymptotically linear in both space and time in the total number of elements of the two vectors, the total complexity is O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since we are adding 1n + 2n + 3n + ... + kn, and 1 + 2 + ... + k = k(k+1)/2, I believe the total complexity is O(n*k^2). Consider the following code:
vector<int> mergeinefficient(const vector<vector<int> >& multilist) {
    vector<int> finallist = multilist[0];
    for (int j = 1; j < multilist.size(); ++j) {
        finallist = mergelists(multilist[j], finallist);
    }
    return finallist;
}
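The mergelists() helper referred to above is just the standard linear two-way merge; its body is not shown in the post, so the following is only a minimal sketch of what such a helper could look like:

#include <cstddef>
#include <vector>
using std::vector;

// Linear two-way merge: time and space proportional to a.size() + b.size().
vector<int> mergelists(const vector<int>& a, const vector<int>& b) {
    vector<int> merged;
    merged.reserve(a.size() + b.size());
    size_t i = 0, j = 0;
    // Repeatedly take the smaller front element until one input is exhausted.
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j]) merged.push_back(a[i++]);
        else              merged.push_back(b[j++]);
    }
    // Copy whatever remains of the other input.
    while (i < a.size()) merged.push_back(a[i++]);
    while (j < b.size()) merged.push_back(b[j++]);
    return merged;
}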
Naive Selection
My second naive solution works as follows:
/**
 * The logic behind this algorithm is simple and inefficient. We want to start
 * with the first value of each of the k vectors, pick the smallest one, and
 * push it onto our finallist vector. We then need to look at the next value of
 * the vector we took that value from, so we don't keep taking the same value.
 * A vector of vector iterators is used to hold our position in each vector.
 * While the iterators are not all at the .end() of their corresponding vector,
 * we maintain a minvalue variable initialized to INT_MAX and a minvalueindex
 * variable, and iterate over each of the k vector iterators; if the current
 * iterator is not at its end position, we check whether it is smaller than our
 * minvalue. If it is, we update minvalue and set minvalueindex (so that we
 * later know which iterator to increment after iterating through all of them).
 * After the iteration we check whether minvalue still equals INT_MAX. If it
 * does, all iterators are at their .end() position, we have exhausted every
 * vector, and we can stop iterating over the k of them. Regarding the
 * complexity of this method, we iterate over the k vectors as long as at least
 * one value has not been accounted for. Since there are nk values, where n is
 * the average number of elements in each list, the time complexity is O(nk^2),
 * the same as our other naive method.
 */
vector<int> mergeinefficientv2(const vector<vector<int> >& multilist) {
    vector<int> finallist;
    vector<vector<int>::const_iterator> iterators(multilist.size());

    // Set all iterators to the beginning of their corresponding vectors in multilist
    for (int i = 0; i < multilist.size(); ++i) iterators[i] = multilist[i].begin();

    int k = 0, minvalue, minvalueindex;

    while (1) {
        minvalue = INT_MAX;
        for (int i = 0; i < iterators.size(); ++i) {
            if (iterators[i] == multilist[i].end()) continue;
            if (*iterators[i] < minvalue) {
                minvalue = *iterators[i];
                minvalueindex = i;
            }
        }
        // Break before touching iterators[minvalueindex]: if every vector is
        // exhausted, minvalueindex is stale and incrementing it would be undefined.
        if (minvalue == INT_MAX) break;
        iterators[minvalueindex]++;
        finallist.push_back(minvalue);
    }
    return finallist;
}
Random Simulation
Long story short, I built a simple randomized simulation that builds a multidimensional vector<vector<int>>. The multidimensional vector starts as 2 vectors, each of size 2, and ends as 600 vectors, each of size 600. Each vector is sorted, and the sizes of the outer container and of each child vector increase by two elements every iteration. I time how long it takes each algorithm to perform the merge like this:
clock_t clock_a_start = clock();
finallist = mergeinefficient(multilist);
clock_t clock_a_stop = clock();

clock_t clock_b_start = clock();
finallist = mergeinefficientv2(multilist);
clock_t clock_b_stop = clock();
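The code that generates the test data is not included in the post. A sketch of one possible generator, assuming plain rand() values sorted with std::sort (the name buildmultilist is hypothetical), could look like this:

#include <algorithm>
#include <cstdlib>
#include <vector>
using std::vector;

// Build k sorted vectors of n random ints each. The simulation described above
// would call this for k = n = 2, 4, ..., 600.
vector<vector<int> > buildmultilist(int k, int n) {
    vector<vector<int> > multilist(k, vector<int>(n));
    for (int i = 0; i < k; ++i) {
        for (int j = 0; j < n; ++j) multilist[i][j] = rand();
        std::sort(multilist[i].begin(), multilist[i].end());
    }
    return multilist;
}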
From those timings I built the following plot:
My calculations say the two naive solutions (merging and selecting) should have the same time complexity, but the plot above shows them as very different. At first I rationalized this by saying there may be more overhead in one than in the other, but then I realized that overhead should be a constant factor and should not produce a plot like this. What is the explanation? I assume my complexity analysis is wrong?
Even if two algorithms have the same complexity (O(nk^2) in your case), they may end up having enormously different running times depending upon the size of the input and the 'constant' factors involved.
For example, if one algorithm runs in n/1000 time and another runs in 1000n time, both have the same asymptotic complexity but they will have very different running times for 'reasonable' choices of n.
Moreover, there are effects caused by caching, compiler optimizations, etc. that may change the running time significantly.
For your case, although your calculation of the complexities appears correct, the actual running time in the first case is (nk^2 + nk)/2, whereas in the second case it is nk^2. Notice that the division by 2 can be significant, because as k increases the nk term becomes negligible.
For a third algorithm, you can modify the naive selection by maintaining a heap of k elements containing the first elements of all the k vectors. The selection step then takes O(log k) time, so the overall complexity reduces to O(nk log k).
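For illustration, a sketch of such a heap-based k-way merge using std::priority_queue (the name mergewithheap and its internals are mine, not code from the post):

#include <cstddef>
#include <queue>
#include <utility>
#include <vector>
using std::vector;

// Heap-based k-way merge in O(nk log k). Each heap entry is a
// (value, index of the source vector) pair; pos[] tracks how far we are in each vector.
vector<int> mergewithheap(const vector<vector<int> >& multilist) {
    typedef std::pair<int, int> heapentry; // (value, source vector index)
    std::priority_queue<heapentry, vector<heapentry>,
                        std::greater<heapentry> > minheap;
    vector<size_t> pos(multilist.size(), 0);
    vector<int> finallist;

    // Seed the heap with the first element of every non-empty vector.
    for (int i = 0; i < (int)multilist.size(); ++i)
        if (!multilist[i].empty())
            minheap.push(heapentry(multilist[i][0], i));

    while (!minheap.empty()) {
        heapentry top = minheap.top();
        minheap.pop();
        finallist.push_back(top.first);
        int src = top.second;
        // Advance within the source vector and push its next element, if any.
        if (++pos[src] < multilist[src].size())
            minheap.push(heapentry(multilist[src][pos[src]], src));
    }
    return finallist;
}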