algorithm - What is the fastest way to get the k smallest (or largest) elements of an array in Java?
I have an array of elements (in the example below, integers) that are compared using a custom comparator. In the example, I simulate the comparator by defining that i is smaller than j if, and only if, scores[i] <= scores[j].
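For illustration, such an index comparator could be written roughly like this (a minimal sketch of my own; the class name and sample values are made up and are not part of the question):

import java.util.Comparator;

public class ScoreComparatorSketch {
    public static void main(String[] args) {
        // hypothetical scores; index i is "smaller" than index j iff scores[i] <= scores[j]
        final double[] scores = {0.7, 0.1, 0.4};
        Comparator<Integer> byScore = (i, j) -> Double.compare(scores[i], scores[j]);
        System.out.println(byScore.compare(1, 0)); // negative: index 1 has the smaller score
    }
}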
I have two approaches:
- using a heap of the current k candidates
- using an array of the current k candidates
I update the two structures above in the following way:
- heap: with the methods PriorityQueue.poll and PriorityQueue.offer (a minimal standalone sketch of this variant follows the list)
- array: the index top of the worst element among the top k candidates stored in the candidates array is kept. If a newly seen example is better than the element at index top, the latter is replaced by the former, and top is updated by iterating through the k elements of the array.
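Stripped of the benchmarking harness shown below, the heap variant boils down to something like the following sketch (the method name kSmallest and the sample data are mine, not from the question; it assumes k <= scores.length):

import java.util.PriorityQueue;

public class TopKHeapSketch {
    // Max-heap (by score) of at most k indices: the worst of the current
    // candidates sits at the head and can be peeked/polled cheaply.
    public static int[] kSmallest(double[] scores, int k) {
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(k, (a, b) -> -Double.compare(scores[a], scores[b]));
        for (int i = 0; i < scores.length; i++) {
            if (heap.size() < k) {
                heap.offer(i);
            } else if (scores[i] < scores[heap.peek()]) {
                heap.poll();   // drop the current worst candidate
                heap.offer(i); // and replace it with the better one
            }
        }
        return heap.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.2, 0.5, 0.1, 0.8};
        // prints the indices of the two smallest scores (1 and 3), in no particular order
        System.out.println(java.util.Arrays.toString(kSmallest(scores, 2)));
    }
}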
However, when I tested which of the approaches is faster, I found out it is the second one. My questions are:
- Is my use of PriorityQueue suboptimal?
- What is the fastest way to compute the k smallest elements?
I am interested in the case when the number of examples can be large and the number of neighbours is relatively small (between 10 and 20).
Here is the code:
public static void main(String[] args) {
    long kopica, navadno, sortiranje;
    int numTries = 10000;
    int numExamples = 1000;
    int numNeighbours = 10;
    navadno = testSimple(numExamples, numNeighbours, numTries);
    kopica = testHeap(numExamples, numNeighbours, numTries);
    sortiranje = testSort(numExamples, numNeighbours, numTries, false); // testSort is not shown in this excerpt
    System.out.println(String.format("tries: %d examples: %d neighbours: %d\n time heap[ms]: %d\n time simple[ms]: %d",
            numTries, numExamples, numNeighbours, kopica, navadno));
}

public static long testHeap(int numberExamples, int numberNeighbours, int numberTries) {
    Random rnd = new Random(123);
    long startTime = System.currentTimeMillis();
    for (int iteration = 0; iteration < numberTries; iteration++) {
        final double[] scores = new double[numberExamples];
        for (int i = 0; i < numberExamples; i++) {
            scores[i] = rnd.nextDouble();
        }
        PriorityQueue<Integer> myHeap = new PriorityQueue<>(numberNeighbours, new Comparator<Integer>() {
            @Override
            public int compare(Integer o1, Integer o2) {
                return -Double.compare(scores[o1], scores[o2]);
            }
        });
        int top;
        for (int i = 0; i < numberExamples; i++) {
            if (i < numberNeighbours) {
                myHeap.offer(i);
            } else {
                top = myHeap.peek();
                if (scores[top] > scores[i]) {
                    myHeap.poll();
                    myHeap.offer(i);
                }
            }
        }
    }
    long endTime = System.currentTimeMillis();
    return endTime - startTime;
}

public static long testSimple(int numberExamples, int numberNeighbours, int numberTries) {
    Random rnd = new Random(123);
    long startTime = System.currentTimeMillis();
    for (int iteration = 0; iteration < numberTries; iteration++) {
        final double[] scores = new double[numberExamples];
        for (int i = 0; i < numberExamples; i++) {
            scores[i] = rnd.nextDouble();
        }
        int[] candidates = new int[numberNeighbours];
        int top = 0;
        for (int i = 0; i < numberExamples; i++) {
            if (i < numberNeighbours) {
                candidates[i] = i;
                if (scores[candidates[top]] < scores[candidates[i]]) top = i;
            } else {
                if (scores[candidates[top]] > scores[i]) {
                    candidates[top] = i;
                    top = 0;
                    for (int j = 1; j < numberNeighbours; j++) {
                        if (scores[candidates[top]] < scores[candidates[j]]) top = j;
                    }
                }
            }
        }
    }
    long endTime = System.currentTimeMillis();
    return endTime - startTime;
}
This produces the following result:
tries: 10000 examples: 1000 neighbours: 10
 time heap[ms]: 393
 time simple[ms]: 388
Answer:
First of all, your benchmarking method is incorrect: you are measuring input data creation along with the algorithm performance, and you aren't warming up the JVM before measuring. Here are the results for your code when tested through JMH:
Benchmark                     Mode  Cnt      Score   Error  Units
CounterBenchmark.testHeap    thrpt    2  18103,296          ops/s
CounterBenchmark.testSimple  thrpt    2  59490,384          ops/s
The modified benchmark is available on Pastebin.
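For context, a JMH harness along these lines typically looks roughly like the sketch below. The class and field names (CounterBenchmark, sampleData, numberExamples, numberNeighbours) are taken from the benchmark output and the methods shown further down, while the warmup, measurement and fork settings are placeholders rather than the exact ones used:

import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 2)
@Fork(1)
public class CounterBenchmark {

    final int numberExamples = 1000;
    final int numberNeighbours = 10;
    double[] sampleData;

    @Setup(Level.Trial)
    public void setup() {
        // input data is created once here, outside of the measured code
        Random rnd = new Random(123);
        sampleData = new double[numberExamples];
        for (int i = 0; i < numberExamples; i++) {
            sampleData[i] = rnd.nextDouble();
        }
    }

    @Benchmark
    public void testSimple(Blackhole bh) {
        // the question's array-based approach, measured in isolation
        final double[] scores = sampleData;
        int[] candidates = new int[numberNeighbours];
        int top = 0;
        for (int i = 0; i < numberExamples; i++) {
            if (i < numberNeighbours) {
                candidates[i] = i;
                if (scores[candidates[top]] < scores[candidates[i]]) top = i;
            } else if (scores[candidates[top]] > scores[i]) {
                candidates[top] = i;
                top = 0;
                for (int j = 1; j < numberNeighbours; j++) {
                    if (scores[candidates[top]] < scores[candidates[j]]) top = j;
                }
            }
        }
        bh.consume(candidates);
    }
}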
Regarding the roughly 3x difference between the two provided solutions: in terms of big-O notation the first algorithm may seem better, but in fact big-O notation only tells you how an algorithm behaves in terms of scaling, never how fast it actually performs (see this question as well). And in your case scaling is not an issue, since numberNeighbours is limited to 20. In other words, big-O notation describes how many ticks an algorithm needs to complete, but it doesn't bound the duration of a tick; it only says that the tick duration doesn't change when the inputs change. And in terms of tick complexity the second algorithm surely wins.
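To make that concrete with a rough back-of-the-envelope estimate of my own (assuming uniformly random scores): with 1000 examples and 10 neighbours, both approaches perform about 1000 cheap "is this better than the current worst?" checks, and the current worst actually changes only around k·ln(n/k) ≈ 10·ln(100) ≈ 46 times. The heap pays for each of those updates with a poll and an offer on boxed Integers through a comparator (two sift operations of about log k steps each, plus object overhead), while the array pays with roughly k = 10 primitive double comparisons in a tight loop. So it is the constant factors, not the asymptotics, that decide the outcome here.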
What is the fastest way to compute the k smallest elements?
I've come up with the following solution, which I believe lets branch prediction do its job:
@Benchmark
public void testModified(Blackhole bh) {
    final double[] scores = sampleData;
    int[] candidates = new int[numberNeighbours];
    for (int i = 0; i < numberNeighbours; i++) {
        candidates[i] = i;
    }
    // sort the candidates so that scores[candidates[0]] is the largest
    for (int i = 0; i < numberNeighbours; i++) {
        for (int j = i + 1; j < numberNeighbours; j++) {
            if (scores[candidates[i]] < scores[candidates[j]]) {
                int temp = candidates[i];
                candidates[i] = candidates[j];
                candidates[j] = temp;
            }
        }
    }
    // process the other scores, while keeping the candidates array sorted in descending order
    for (int i = numberNeighbours; i < numberExamples; i++) {
        if (scores[i] > scores[candidates[0]]) {
            continue;
        }
        // move larger candidates one slot to the left (dropping the old worst at index 0) to keep the array sorted
        int j;
        // here branch prediction should kick in
        for (j = 1; j < numberNeighbours && scores[i] < scores[candidates[j]]; j++) {
            candidates[j - 1] = candidates[j];
        }
        // insert the new item
        candidates[j - 1] = i;
    }
    bh.consume(candidates);
}
Benchmark results (2x faster than your current solution):
(10 neighbours) CounterBenchmark.testModified  thrpt  2  136492,151  ops/s
(20 neighbours) CounterBenchmark.testModified  thrpt  2  118395,598  ops/s
Others mentioned quickselect, but, as one might expect, the complexity of that algorithm neglects its strong sides in this case:
@Benchmark
public void testQuickSelect(Blackhole bh) {
    final int[] candidates = new int[sampleData.length];
    for (int i = 0; i < candidates.length; i++) {
        candidates[i] = i;
    }
    final int[] resultIndices = new int[numberNeighbours];
    int neighboursToAdd = numberNeighbours;
    int left = 0;
    int right = candidates.length - 1;
    while (neighboursToAdd > 0) {
        int partitionIndex = partition(candidates, left, right);
        int smallerItemsPartitioned = partitionIndex - left;
        if (smallerItemsPartitioned <= neighboursToAdd) {
            while (left < partitionIndex) {
                resultIndices[numberNeighbours - neighboursToAdd--] = candidates[left++];
            }
        } else {
            right = partitionIndex - 1;
        }
    }
    bh.consume(resultIndices);
}

private int partition(int[] locations, int left, int right) {
    final int pivotIndex = ThreadLocalRandom.current().nextInt(left, right + 1);
    final double pivotValue = sampleData[locations[pivotIndex]];
    int storeIndex = left;
    for (int i = left; i <= right; i++) {
        if (sampleData[locations[i]] <= pivotValue) {
            final int temp = locations[storeIndex];
            locations[storeIndex] = locations[i];
            locations[i] = temp;
            storeIndex++;
        }
    }
    return storeIndex;
}
The benchmark results are pretty upsetting in this case:
CounterBenchmark.testQuickSelect  thrpt  2  11586,761  ops/s