Wednesday, March 5, 2008

Linq Performance Test

LINQ vs Loop - A performance test

The future holds for C#. The addition of LINQ has brought a variety of query keywords to the language. "Anything" can be queried; Collections, DataTables, SQL databases, XML documents, and Arrays. Custom queryable objects can also be created by implementing IQueryable. But all this costs you. The question is how much?

I decided to create a simple test to see how much of a performance hit LINQ is. The simple test I deviced finds the numbers in an array that are less than 10. The code is quoted below.


 


static
void Main(string[] args)

{

     const
int SIZE = 100000, RUNS = 10000;

     int[] intData = new
int[SIZE];

     for (int i = 0; i < SIZE; i++)

     intData[i] = i;

    
 

     DateTime dtStartTime = DateTime.Now;

     for (int t = 0; t < RUNS; ++t)

     {

     int[] less = (from xData in intData

     where xData < 10


select xData).ToArray();

     }

     TimeSpan dtCompletTime = DateTime.Now - dtStartTime;

     Console.WriteLine(string.Format("LINQ: {0}, avg. {1}",

     dtCompletTime, new
TimeSpan(dtCompletTime.Ticks / RUNS)));

    
 

     DateTime dtStartTimeFor2 = DateTime.Now;

     for (int t = 0; t < RUNS; ++t)

     {

     var l = new
List<int>();

     foreach (var i in intData)

     if (i < 10)

     l.Add(i);

     int[] less2 = l.ToArray();

     }

    
 

     TimeSpan dtCompletTimeFor2 = DateTime.Now - dtStartTimeFor2;

     Console.WriteLine(string.Format("Loop: {0}, avg. {1}",

     dtCompletTimeFor2, new
TimeSpan(dtCompletTimeFor2.Ticks / RUNS)));


 


Console.ReadLine();

}

Initially, I assumed the performance impact would not be too large, since its equivalent is the straightforward imperative loop, which should not be too hard for a compiler to deduce given static typing and a single collection to iterate across. Or?

  1. LINQ: 00:00:23.4539536, avg. 00:00:00.0023453
  2. Loop: 00:00:10.5456736, avg. 00:00:00.0010545

As you can see, the performance impact is huge. LINQ performs 50 times worse than the traditional loop! This seems rather wild at first glance, but the explanation is this: The keywords introduced by LINQ are syntactic sugar for method invocations to a set of generic routines for iterating across collections and filtering through lambda expressions. Naturally, this will not perform as good as a traditional imperative loop, and less optimization is possible.


 

Using Linq With Array List


static
void Main(string[] args)

{


Console.WriteLine("Using Linq on Array List");


 


ArrayList arrList = new
ArrayList();

arrList.Add(


new
Student

{

FirstName = "Mehul",

LastName = "Talajia",

Scores = new
int[] { 98, 92, 81, 60 }

});

arrList.Add(


new
Student

{

FirstName = "Anil",

LastName = "Patel",

Scores = new
int[] { 75, 84, 91, 39 }

});

arrList.Add(


new
Student

{

FirstName = "Pranav",

LastName = "Vyas",

Scores = new
int[] { 88, 94, 65, 91 }

});

arrList.Add(


new
Student

{

FirstName = "Piyush",

LastName = "Gohel",

Scores = new
int[] { 97, 89, 85, 82 }

});


 


DateTime dtStartArrayLINQ = DateTime.Now;


var query = from
Student student in arrList


where student.Scores[0] > 95


select student;


 


 


foreach (Student s in query)


Console.WriteLine(s.LastName + ": " + s.Scores[0]);


 


TimeSpan dtEndArrayLinq = DateTime.Now - dtStartArrayLINQ;


 


int intTmpTicks = 4;


 


DateTime dtStartArrayLoop = DateTime.Now;


foreach (Student a in arrList)

{


if (a.Scores[0] > 95)

{


Console.WriteLine(a.LastName + ": " + a.Scores[0]);

}

}


TimeSpan dtEndArrayLoop = DateTime.Now - dtStartArrayLoop;


 


 


Console.WriteLine(string.Format("LINQ: {0}, avg. {1}",

dtEndArrayLinq , new
TimeSpan(dtEndArrayLinq.Ticks / intTmpTicks)));

    
 


Console.WriteLine(string.Format("Loop: {0}, avg. {1}",

dtEndArrayLoop , new
TimeSpan(dtEndArrayLoop.Ticks / intTmpTicks)));


 


Console.Read();


 

}

  1. LINQ: 00:00:00.0467946, avg. 00:00:00.0116986
  2. Loop: 00:00:00, avg. 00:00:00


 

Having seen the performance impact, I am still of the view that LINQ is a great step towards a more declarative world for developers. Instead of saying "take these numbers, iterate over all of them, and insert them into this list if they are less than ten", which is an informal description of a classical imperative loop, you can now say "from these numbers, give me those that are less than ten". The difference may be subtle, but the latter is in my opinion far more declarative and easy to read.

This may very well be the next big thing, but it comes at a cost. So far, my advice is to create simple performance tests for the cases where you consider adopting LINQ, to spot possible pitfalls as early as possible.

No comments: