[Originally Published Apr 2004 – Updated October 2009]
Everyone “understands” that Microsoft’s .NET and the CLR are a “garbage collector” based environment; but is it really?
First we must establish what is meant by “garbage” in this context. When an object is created there is (typically) one reference by which it can be accessed (the return value of “new”). While the program executes, other references to the same item may be established, and established references may terminate. When an object can no longer be referenced, it is deemed to be “Garbage”. [note: This is a bit of a simplification but will satisfy our needs]
Next we must look at the definition of “collection”. Webster’s dictionary offers the following:
collection: the act or process of collecting.
collect: to bring together into one body or place.
Now let’s look at what happens when a “GC.Collect” occurs. (For simplicity we will look at generation 0, and ignore the impact of “pinned” objects.) The object graph is “walked” starting at the rooted references, and any reachable item that is in Generation 0 is marked. When the walk is complete, the live objects are moved to the Gen1 heap, and the Gen0 heap is reset back to the beginning. The result is that the memory occupied by all of the previous Gen0 residents is now available.
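The walk described above can be sketched as a toy model. This is not how the CLR is implemented: the names (Obj, collect_gen0, gen0, gen1) are invented for illustration, the "heaps" are plain Python lists, and the walk visits every reachable object rather than using the shortcuts a real generational collector relies on.

```python
class Obj:
    """A toy heap object: a name plus outgoing references (hypothetical type)."""
    def __init__(self, name):
        self.name = name
        self.refs = []


def collect_gen0(roots, gen0, gen1):
    """Mark the Gen0 objects reachable from the roots, promote them, reset Gen0."""
    in_gen0 = {id(o) for o in gen0}
    marked = {}          # id -> object; dicts keep discovery order
    seen = set()
    stack = list(roots)
    while stack:         # walk the graph starting at the rooted references
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if id(obj) in in_gen0:
            marked[id(obj)] = obj
        stack.extend(obj.refs)
    gen1.extend(marked.values())   # live objects are moved to Gen1
    gen0.clear()                   # Gen0 is reset; the garbage is never examined
    return len(marked)


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)                   # a -> b; c and d are unreachable
gen0, gen1 = [a, b, c, d], []
survivors = collect_gen0([a], gen0, gen1)
print(survivors, [o.name for o in gen1], gen0)
```

Note that c and d, the actual garbage, are never visited by the walk; they simply vanish when Gen0 is reset.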
This reveals the fundamental problem with calling this process “garbage collection”. Absolutely NOTHING is done with the garbage. Specifically there are no operations which involve moving the garbage so it is “brought together in one place”.
To see what a “real” garbage collection is, consider an analogy. In one’s house, there are likely to be multiple wastebaskets; one in the kitchen, one in the bathroom, and others scattered throughout the residence. On trash day (or earlier if the Wife has anything to say about the matter), one goes through the residence and collects all of the garbage from multiple locations, places it in one bag, and brings it outside to the rubbish container. The amount of work is dependent on the number of original locations of garbage, and the amount of garbage in each location. The amount of “precious” (non-garbage) items in the house has absolutely no bearing on the process or the effort it will involve.
But when we look at the .NET situation, the exact opposite is true. It is the number of LIVE objects that impacts the performance, as these are what must be scanned and moved. It does not matter if there is a single small “garbage” object on the heap, or if there are tens of thousands (of varying sizes). Once the live (precious) objects have been moved out of harm’s way, it is a single, constant-time operation to reset the heap so that it is ready to receive new objects.
This shows that .NET implements a Live Object Preservation pattern, and NOT a garbage collection pattern.
While this entire post may seem like a “semantic quibble”, it has serious ramifications when dealing with .NET architecture/design and implementation. In other environments there is NO overhead (aside from the actual memory) to keeping references to heap-based objects which will be needed (or even just possibly needed) later. In many cases, the cost of allocating [always higher in a conventional heap than in a CLR heap] and deleting (updating the freelist) far outweighs the memory utilization issue, and so references are kept for an extended period of time.
When this approach is taken in a .NET application, these live objects represent a performance hit every time (neglecting some optimizations) that the GC runs, simply because the GC deals with processing live objects. On the other hand, allocating a (non-large) object in .NET is typically just a pointer increment, and abandoning it (assuming no finalizer) costs nothing at all.
Over the past few years, I have been involved with a number of projects where clients were complaining that “.NET was slow” and could not meet their performance demands. In the vast majority of cases, this was directly traced to the implementation not having proper (for .NET) object lifetime management.
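A small counting model makes the point concrete. It is only a sketch under stated assumptions: build_heap and copying_collect are hypothetical helpers, the heap is a Python dict, and "touched" counts the objects the collector has to visit. The call to clear() merely stands in for resetting the allocation pointer, which on a real heap is a single assignment.

```python
def build_heap(n_live, n_garbage):
    """Return a toy heap plus the ids of the objects that are still reachable."""
    heap = {i: f"obj{i}" for i in range(n_live + n_garbage)}
    live_ids = list(range(n_live))
    return heap, live_ids


def copying_collect(heap, live_ids):
    """Copy the survivors out and reset the heap; return (survivors, touched)."""
    touched = 0
    survivors = []
    for obj_id in live_ids:        # only reachable objects are ever visited
        survivors.append(heap[obj_id])
        touched += 1
    heap.clear()                   # stands in for "reset the allocation pointer"
    return survivors, touched


few_garbage = copying_collect(*build_heap(10, 10))[1]
much_garbage = copying_collect(*build_heap(10, 10_000))[1]
many_live = copying_collect(*build_heap(5_000, 10))[1]
print(few_garbage, much_garbage, many_live)
```

With ten live objects the collector touches ten objects whether there are ten pieces of garbage or ten thousand; raising the live count to five thousand is what raises the work.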
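The “pointer increment” allocation can be modelled in a few lines. This is a sketch, not the CLR allocator: BumpHeap and its members are names made up for the illustration, and addresses are plain integers.

```python
class BumpHeap:
    """Toy bump-pointer heap: allocation just advances an offset."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0         # offset of the first unused byte

    def allocate(self, nbytes):
        if self.next_free + nbytes > self.size:
            # In a managed runtime this is where a collection would be triggered.
            raise MemoryError("heap full")
        address = self.next_free
        self.next_free += nbytes   # the whole cost of an allocation
        return address

    def reset(self):
        """After the survivors are moved out, reuse the heap from the start."""
        self.next_free = 0


heap = BumpHeap(1024)
first = heap.allocate(24)
second = heap.allocate(40)
used_before_reset = heap.next_free
heap.reset()
print(first, second, used_before_reset, heap.next_free)
```

There is no “free” operation at all: an abandoned object costs nothing until the heap is reset, and then it costs nothing either.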
When one looks at environments such as C/C++, the conventional/standard implementations (pre C++0x) do not include “garbage collection”. The heap is (typically, and simplified) implemented as a structure containing the “free blocks” of items that were previously deleted. This means that (a pointer to) memory that is no longer in use [i.e. garbage] IS actually MOVED. Each time there is a call to “delete” or “free(…)” there is a synchronous [i.e. it completes before delete/free() returns] collection of information about the garbage.
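For contrast, here is a toy free-list heap in the spirit of the description above. It is a simplified sketch, not any particular C runtime: FreeListHeap is an invented name, placement is first-fit, freed blocks go to the front of the list, and adjacent free blocks are never merged.

```python
class FreeListHeap:
    """Toy conventional heap: free() records the garbage block for reuse."""

    def __init__(self, size):
        self.free_blocks = [(0, size)]   # (address, length) of each free block
        self.allocated = {}              # address -> length of live blocks

    def malloc(self, nbytes):
        for i, (addr, length) in enumerate(self.free_blocks):   # first fit
            if length >= nbytes:
                if length == nbytes:
                    del self.free_blocks[i]
                else:
                    self.free_blocks[i] = (addr + nbytes, length - nbytes)
                self.allocated[addr] = nbytes
                return addr
        raise MemoryError("no free block is large enough")

    def free(self, addr):
        # The synchronous step: information about the garbage is collected
        # into the free list before free() returns.
        length = self.allocated.pop(addr)
        self.free_blocks.insert(0, (addr, length))


heap = FreeListHeap(100)
a = heap.malloc(30)
b = heap.malloc(30)
heap.free(a)
after_free = list(heap.free_blocks)
c = heap.malloc(30)                      # reuses the block that a gave back
print(a, b, c, after_free, heap.free_blocks)
```

Here every piece of garbage costs work at the moment it is released, while the live block b is never looked at.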
In .NET the large object heap [LOH] is used for items which meet or exceed a threshold size [85,000 bytes]. This particular heap IS operated in a manner nearly identical to a C/C++ heap, in that the “live” objects are NOT moved, and it is a set of references to the available memory (garbage) that is manipulated.
To see comments associated with the original posting, please visit: http://geekswithblogs.net/TheCPUWizard/archive/2009/09/24/sorry-johnny—there-is-no-garbage-collection-in-.net.aspx