Although this blog is going to primarily focus on ALM [Application Lifecycle Management] topics, the reality is that without an application that is well designed and implemented, all of the “management” in the world is not going to yield a successful outcome.
This series will focus on specific techniques that address recurring issues found in certain classes of applications. Many of these will be performance related, so examining the adage “premature optimization is the root of all evil” is very relevant.
The actual statement written by Donald Knuth was:
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” – Computing Surveys, Vol. 6, No. 4, December 1974
I had started writing software just over two years before this was written (I did not actually encounter the article until nearly three years after it was published), on a DEC PDP-8 with 16 K (that is 0.0000152587890625 Gig for modernists) of real (ferromagnetic) core memory. Even with this limited amount of memory, that machine handled the administrative tasks of a 6,000-student school district. Combine that with a clock speed of less than 1 MHz, and writing the smallest, fastest possible code was a requirement. These days, most of my actual coding work is on projects that are extremely demanding in some fashion, often in terms of raw performance, and I consistently find myself dealing with “our opportunities in that critical 3%”.
The first step in looking at performance is to have a specification. Without a documented specification, it is impossible to say that something is “too slow” and therefore a candidate for performance optimization. The second step is to perform detailed measurements of the items that fail (or come dangerously close to failing) the specification. In many cases, focusing on a very small part of the operation under consideration will yield the desired benefits. In other cases, a redesign may be necessary in order to have a robust, maintainable codebase that meets the specification.
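That second step can be sketched very simply. The budget value, the method names, and the stand-in workload below are all illustrative assumptions, not details from any real project:

```csharp
using System;
using System.Diagnostics;

static class PerfCheck
{
    // Measure one run of an operation in wall-clock milliseconds.
    // This tells you WHETHER you have a problem relative to the
    // specification; a profiler then tells you WHERE.
    public static long Measure(Action operation)
    {
        var sw = Stopwatch.StartNew();
        operation();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const long specMs = 250; // documented performance budget (illustrative value)

        // Stand-in for the real operation being measured.
        long elapsed = Measure(() => System.Threading.Thread.Sleep(10));

        Console.WriteLine($"Elapsed: {elapsed} ms (budget: {specMs} ms)");
        if (elapsed > specMs)
            Console.WriteLine("Exceeds specification: a candidate for optimization.");
    }
}
```

Note that comparing one wall-clock number against the specification is only a go/no-go check; deciding where to optimize requires breaking the time down by component, as the story below illustrates.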
As an example of the former case: last year my firm was contracted by a major financial company that was having problems with application performance. Historically the team had programmed in C++, but had recently moved to C#. The codebase was well designed with SOLID principles applied throughout, and the algorithm implementation was straightforward; although there were some general issues with the code, the “killer” problem was the performance of this one algorithm.
When I first arrived, the team was nearly evenly divided into three camps, each with its own view of the cause and the cure.
I asked them if they had profiled the code, and they answered yes. I then asked to see how they had profiled it, and they showed me the following:
Stopwatch sw = new Stopwatch();
sw.Start();
// ... the entire operation, end to end ...
sw.Stop();
long measurement = sw.ElapsedMilliseconds;
They had not done ANY actual profiling of where the time was being spent within the various parts of the code. I fired up my favorite profiler (Red Gate ANTS 6.0) and within a few minutes identified that replacing elements in a generic Dictionary&lt;K,V&gt; collection was responsible for a large amount of the time. I then did some research (aka I asked questions) and found that the collection was indeed highly volatile, and that elements were frequently referenced by their key (although there were actually more Adds/Removes than keyed retrievals); most importantly, the design constrained the collection to never hold more than about 20 entries!
Because of the small size of the collection, and the overhead of maintaining an indexed collection, I made one simple change: I swapped the Dictionary&lt;K,V&gt; for a plain old generic List&lt;T&gt;, with a simple iteration to find the proper element when “keyed access” was required. Not surprisingly, the time required to add or replace an element went down dramatically, the time to remove an element went down slightly, and the keyed access was just about the same.
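The shape of that change can be sketched as follows. This is a minimal illustration, not the client's code; the element type, key type, and class name here are all hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; the real project's type differed.
record Entry(string Key, int Value);

class SmallKeyedCollection
{
    // For a collection capped at roughly 20 elements, a linear scan
    // over a List<T> avoids the hashing and bucket-maintenance
    // overhead that Dictionary<K,V> pays on every add and remove.
    private readonly List<Entry> _items = new();

    public void AddOrReplace(string key, int value)
    {
        for (int i = 0; i < _items.Count; i++)
        {
            if (_items[i].Key == key)
            {
                _items[i] = new Entry(key, value); // replace in place
                return;
            }
        }
        _items.Add(new Entry(key, value));
    }

    public bool Remove(string key)
    {
        for (int i = 0; i < _items.Count; i++)
        {
            if (_items[i].Key == key)
            {
                _items.RemoveAt(i);
                return true;
            }
        }
        return false;
    }

    public bool TryGet(string key, out int value)
    {
        foreach (var e in _items)
        {
            if (e.Key == key) { value = e.Value; return true; }
        }
        value = 0;
        return false;
    }
}
```

The design point is that asymptotic complexity (O(1) hashed lookup versus O(n) scan) is irrelevant when n is bounded at about 20; constant factors dominate, and measurement, not notation, has to decide.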
This one simple change was sufficient to reduce the overall time so that it met the performance specification with a good safety margin. [I only wish some of the other issues on that project had been so easily resolved.]
The key takeaway is that it really was a small part of the code that was critical, and that the team's original views on the subject were significantly off the mark [I am being kind]; had they adopted any of them, they would have been in rather poor shape.
If you have done your due diligence, and determined that there are real performance issues and that specific code is a leading contributor to the problem, then hopefully the techniques I will be posting here will be of help. It is unlikely that any of them will be directly adoptable in their presented form, but my goal is to provide the background so that they can be evaluated and even used as the basis for power programming techniques of your own.
As always, I am looking forward to comments on these items, and suggestions for ways to address specific issues that you are encountering.