Home > Rant > C# strikes back

C# strikes back

Disclaimer: The usual. A post about programming in C#. This entry has some harsh criticism which might offend people. No, it is almost only criticism though it applies to (MS) libraries in general not just .NET.

I’m currently trying to make a class for holding data in a non-binary tree structure.  Although I started this as a continuation of my last uploaded project to see how fast searching can get if I keep the look-up tree in memory (similarly how it is done in zkanji), but creating such a tree is pretty trivial. As I do all this to see what C# and “managed” code can give that C++ couldn’t, I thought I’ll do this class “the right way”. That means I have to make a generic class with generic types for keys and values, but also that many of the tree’s properties have to be accessed through built in interfaces so it uses the full potential of the language.

My first decision was to create a tree class (I call it ZTree, hehe), which was a node itself, and each node of the tree were really different objects of the same class. But then again I guessed this is not the C# way, so the second step was to create a separate ZTree and ZTreeNode. This decision wasn’t simply based on that though. If there is a separate tree class, I can hold some values that influence how the whole tree works without “bloating memory” (my second “favorite” phrase from forums after “strongly typed”.)

I have tweeted yesterday that I wasted time with a blog that only stated the obvious. The writer of that blog not only made a separate class for the nodes, he also made a class for the list of nodes to hold sub-nodes inside the nodes. .NET (and many other libraries) do this. They have a class A, a class B holding many objects of class A, a class C listing class B type objects, a class D that needs a class C for listing class A inside class B etc. (To be fair the iostream and many other IO classes used in C++ work in a similar fashion.) I don’t want to get into a fight with anyone about this. I know this makes sense in a purely philosophical way. In object oriented programming the aim is to make code which has few errors and is easy to change. If someone needs to replace a tiny part of some structure without “breaking code”, they just inherit a class to write their own handling for the specific problem. This all is well, but I often have the feeling library creators overdo it for their own perverse satisfaction. Actually I only wanted to write that my tree class won’t be doing this even if it’s not as nice and elegant, because I prefer practical over elegant.

The ZTreeNode class in the current implementation is derived from the generic SortedList<TKey,TValue> because I wanted to have keys for nodes not just indexes, and the keys must be unique and sorted. (Maybe I should call the main class ZSortedTree with ZSortedTreeNodes?) Each node has a value and holds its own sub-nodes directly without an additional layer of sugar. The reason why someone might want to create a separate class for a node list is probably because they want that list to be enumerable through interfaces and the node should do something else through other interfaces. Of course my code wouldn’t resemble anything in C# if I didn’t have my own enumerable stuff, so I created a few properties (namely to enumerate the values of a node, the nodes of a node by index (via IList) and once again the nodes of a node by key (via IDictionary). This latter part is probably unnecessary but once you start creating properties it’s hard to stop.)

This was a really good experience with C# and I have learned a lot, but the work is far from done. I’ll have to replace the SortedList ancestor with maybe a normal list, or an entirely new type derived from object that holds (gasp) arrays. because though a SortedList might be fast, it wasn’t created with inheritance in mind. (Which is strange when the .NET library is intentionally complex so people can customize everything.)

I made a function called “Add” to add new nodes to a parent node. The function should check whether the key for the new node already exists and if it does, stop right there. The generic SortedList has convenient functions to check for the existence of keys, and it also throws an exception if we try to add something with an existing key. For the purpose of my own “Add” function this could work, but maybe as I mostly code in C++, I feel uncomfortable for having to do things twice or wasting memory. Either looking up things twice or creating an object just to throw it away.

If I check for the existence of a key first, SortedList will have to find the place for the key in order to see whether it exists or not. If the key doesn’t exist and I try to insert a new object, SortedList will have to find the position of the key again for the insertion. On the other hand if I simply add a new object and catch the exception, I have created an object for nothing. This might not be such a serious sin in C# where the GC automatically cleans up spilled stuff, and probably nobody would use this class to try to add thousands of existing keys, I still feel bad about creating and throwing away memory for nothing. It occurred to me just now that if I add a null value there won’t be anything wasted, but I see that the “Add” function of SortedList doesn’t return the index for the added null value. Too bad, I could have used that. This is the feeling I have with .NET a lot. It could be good, but…

My other bad feeling with using a SortedList is its incredible number of functions that operate on keys and values. It takes a lot of work to hide all those so people won’t break the structure of the tree. For example placing a node as a sub-node of itself or a child node would create a closed circle and trees are usually famous for not having those. I’ll have to check, maybe all those functions are harmless and decide after that whether I really want to create my own sorted list because I hate reinventing the wheel. (Why do I make a tree then, right?) UPDATE: turns out I’m just not observant enough. Many functions are provided by the interfaces that SortedList implements without really having an effect on the tree structure.

As the closing words for this long and boring rant, here is the reason why I need a sorted tree: I want to hold words in it for the dictionary! The key of each node will work like the labels in my original tree in zkanji which I explained in this old post, and the values for the nodes will each be lists to indexes of words in the SQL database. For that I first need to finish my generic tree, then inherit a tree from that which can manage the words, insert them in the right position etc. I’m curious whether this will result in less memory consumption compared to holding not just the look-up tree but the whole database in memory like in my C++ code. I’m prejudiced that C# eats memory for breakfast. I’ll see.

Categories: Rant Tags: ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: