C# - LINQ performance: should I first use `Where` or `Select`?


I have a large list in memory, of a class that has about 20 properties.

I'd like to filter this list based on just one property, and for a particular task I only need a list of that property. So my query is something like:

    data.Select(x => x.Field).Where(x => x == "desired value").ToList()

Which one gives me better performance: using Select first, or using Where first?

    data.Where(x => x.Field == "desired value").Select(x => x.Field).ToList()

Please let me know if this is related to the data type I'm keeping the data in memory in, or to the field's type. Please note that I need these objects for other tasks too, so I can't filter them in the first place, before loading them into memory.

"Which one gives me better performance: using Select first, or using Where first?"

The Where-first approach is more performant, since it filters the collection first and then executes Select for the filtered values only.

Mathematically speaking, the Where-first approach takes N + N' operations, where N is the number of items in the collection and N' is the number of items that satisfy the Where condition.
So it takes N + 0 = N operations at minimum (if no items pass the Where condition) and N + N = 2 * N operations at maximum (if all items pass the condition).

At the same time, the Select-first approach always takes 2 * N operations, since it invokes the selector on all N objects to acquire the property and then invokes the predicate on all N projected values to filter them.
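The counting argument above can be checked on a tiny, deterministic input. The following is only a minimal sketch: the class name `InvocationCount`, its two helper methods, and the sample data are hypothetical names chosen for illustration, and the identity selector stands in for reading a property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration: counts how many times
// the predicate and the selector delegates are invoked.
static class InvocationCount
{
    // Filter first, then project: the selector only runs for matching items.
    public static int WhereThenSelect(IEnumerable<int> source, int limit)
    {
        int calls = 0;
        source.Where(v => { calls++; return v < limit; })
              .Select(v => { calls++; return v; })
              .ToList(); // forces the lazy query to execute
        return calls;
    }

    // Project first, then filter: both delegates run for every item.
    public static int SelectThenWhere(IEnumerable<int> source, int limit)
    {
        int calls = 0;
        source.Select(v => { calls++; return v; })
              .Where(v => { calls++; return v < limit; })
              .ToList(); // forces the lazy query to execute
        return calls;
    }
}

class Program
{
    static void Main()
    {
        // N = 10 items (0..9); N' = 3 of them are below the limit of 3.
        int[] data = Enumerable.Range(0, 10).ToArray();

        Console.WriteLine(InvocationCount.WhereThenSelect(data, 3)); // N + N' = 13
        Console.WriteLine(InvocationCount.SelectThenWhere(data, 3)); // 2 * N = 20
    }
}
```

Because LINQ to Objects is lazily evaluated, both queries still walk the source only once; the difference lies in how many delegate invocations happen per element.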

Benchmark proof

I have written a benchmark to prove my answer.

Results:

    Condition value: 50
    Where -> Select: 88 ms, 10500319 hits
    Select -> Where: 137 ms, 20000000 hits

    Condition value: 500
    Where -> Select: 187 ms, 14999212 hits
    Select -> Where: 238 ms, 20000000 hits

    Condition value: 950
    Where -> Select: 186 ms, 19500126 hits
    Select -> Where: 402 ms, 20000000 hits

If you run the benchmark many times, you will see that the hit count of the Where -> Select approach changes from run to run, while the Select -> Where approach always takes 2N operations.

Ideone demonstration:

https://ideone.com/jwzjlt

Code:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    class Program
    {
        static void Main()
        {
            // Build 10 million points with random coordinates in [0, 1000).
            var random = new Random();
            List<Point> points = Enumerable.Range(0, 10000000).Select(x => new Point { X = random.Next(1000), Y = random.Next(1000) }).ToList();

            int conditionValue = 250;
            Console.WriteLine($"Condition value: {conditionValue}");

            Stopwatch sw = new Stopwatch();
            sw.Start();

            // Where first: hitCount1 counts every predicate and selector call.
            int hitCount1 = 0;
            var points1 = points.Where(x =>
            {
                hitCount1++;
                return x.X < conditionValue;
            }).Select(x =>
            {
                hitCount1++;
                return x.Y;
            }).ToArray();

            sw.Stop();
            Console.WriteLine($"Where -> Select: {sw.ElapsedMilliseconds} ms, {hitCount1} hits");

            sw.Restart();

            // Select first: hitCount2 counts every selector and predicate call.
            int hitCount2 = 0;
            var points2 = points.Select(x =>
            {
                hitCount2++;
                return x.Y;
            }).Where(x =>
            {
                hitCount2++;
                return x < conditionValue;
            }).ToArray();

            sw.Stop();
            Console.WriteLine($"Select -> Where: {sw.ElapsedMilliseconds} ms, {hitCount2} hits");

            Console.ReadLine();
        }
    }

Related questions

These questions may also be interesting to you. They are not related to Select and Where specifically, but they are about LINQ ordering and performance:

Does the order of LINQ functions matter?
Order of LINQ extension methods does not affect performance?

