Friday, June 10, 2011

Some things don’t change… much

I recently did an impromptu C# code review on a large project for which I am acting as Development Manager. Though the code was fairly well written, I noticed that the developer had used the foreach statement exclusively to iterate over arrays and collections, regardless of the contained type or the number of contained items. When I asked him whether he understood the performance impact of always using foreach, he was not able to tell me.

In June 2003, when I worked on the .Net Common Language Runtime (CLR) Team at Microsoft, I wrote a paper in which I recommended using the for statement instead of foreach in performance-sensitive code paths. In early versions of C#, foreach was not optimized for simple cases such as iterating over arrays of built-in types. This was primarily due to unnecessary type instantiations, virtual function calls, and boxing and unboxing. Later versions of C# improved this substantially, to the degree that iterating over an array of integers with for and with foreach now has about the same performance characteristics, though the IL that is generated is subtly different; the foreach IL has a couple of extra instructions.

Note: There are a number of good tools that will show the IL that is generated by the C# compiler for a given function, but the tool I like the most is LINQPad. Though it is marketed as a tool for querying relational databases and web data services using LINQ, it is also an awesome .Net prototyping tool. It also supports F#, which I currently have a major crush on! I consider LINQPad a “must have” tool for every serious .Net developer.

As the use of foreach deviates from the scenarios that have been optimized for, its performance characteristics diverge significantly from those of for. That is not to say that foreach’s performance is always worse; there are cases where using foreach results in better performance. The bottom line is that the IL generated in each case can be significantly different, which will more than likely result in significantly different performance characteristics.
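To make the divergence concrete, here is a minimal sketch of roughly what a foreach over a non-array sequence turns into. The class and method names are hypothetical, chosen only for illustration, and the "expanded" method is an approximation of the compiler's output written by hand, not actual generated code.

```csharp
using System;
using System.Collections.Generic;

static class ForEachExpansion
{
    // foreach over a sequence known only through the IEnumerable<int> interface.
    public static long SumWithForEach(IEnumerable<int> values)
    {
        long total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    // Approximately what the loop above expands to: an enumerator object is
    // obtained, and MoveNext/Current are called through the interface.
    public static long SumExpanded(IEnumerable<int> values)
    {
        long total = 0;
        using (IEnumerator<int> enumerator = values.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                total += enumerator.Current;
            }
        }
        return total;
    }

    // The for version goes through the list's indexer instead of an enumerator.
    public static long SumWithFor(List<int> values)
    {
        long total = 0;
        for (int i = 0; i < values.Count; i++)
        {
            total += values[i];
        }
        return total;
    }

    static void Main()
    {
        List<int> values = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(SumWithForEach(values));
        Console.WriteLine(SumExpanded(values));
        Console.WriteLine(SumWithFor(values));
    }
}
```

All three methods return the same total; what differs is the work done per item, which is why the cost depends on how a given collection implements its enumerator and its indexer.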

So what is my point exactly?

I want to make it clear that I am not picking on foreach; I use it and other potentially expensive language constructs all the time in my own code, primarily because they result in more aesthetically pleasing, elegant code (hence my current love affair with F#). The point that I am making again is that, if you care about the quality of your code, you need to have some idea of what your code is doing under the covers.

I will admit that as the language constructs have become more sophisticated, the addition of LINQ being a good example, it has become more and more time-consuming to grok those constructs all the way down to the hardware. It remains super easy, though, to surround a number of implementations of a function with a high-resolution timer and measure the difference in performance.
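As a rough sketch of that kind of measurement, the program below times two implementations of the same function with System.Diagnostics.Stopwatch. The names LoopTiming, SumFor, SumForEach and Measure, the array size, and the iteration count are all assumptions picked for illustration; a serious benchmark would need more care (release build, repeated runs, and so on).

```csharp
using System;
using System.Diagnostics;

static class LoopTiming
{
    // Sums the array using an indexed for loop.
    public static long SumFor(int[] values)
    {
        long total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }

    // Sums the array using foreach.
    public static long SumForEach(int[] values)
    {
        long total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    // Runs one implementation repeatedly and returns the elapsed time.
    public static TimeSpan Measure(Func<int[], long> implementation, int[] values, int iterations)
    {
        implementation(values); // warm-up call so JIT compilation is not timed

        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            implementation(values);
        }
        timer.Stop();
        return timer.Elapsed;
    }

    static void Main()
    {
        int[] values = new int[1000000];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i;
        }

        // Reports whether Stopwatch is backed by a high-resolution counter.
        Console.WriteLine("High-resolution timer: " + Stopwatch.IsHighResolution);
        Console.WriteLine("for:     " + Measure(SumFor, values, 100));
        Console.WriteLine("foreach: " + Measure(SumForEach, values, 100));
    }
}
```

The numbers will vary by machine, runtime version, and build configuration, so they are only meaningful as a comparison within a single run.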

Some things don’t change… much.

2 comments:

  1. remembering this discussion I changed some of my code today -- a loop that applied to 100,000+ records. and ... surprise, surprise... 'foreach' actually worked faster than 'for'. :) the only difference, it was vb.net.

  2. I am not familiar with how for and foreach are implemented in VB.Net, but I imagine that they are very similar to the C# implementations. When you say "records" I assume that implies that you are using an ADO.Net DataSet. I would imagine that the differences in performance that you are observing are a function of the differences between the indexer and enumerator implementations for the DataRowCollection.
