Disclaimer: Only programmers will be able to appreciate this entry. And even that’s not certain.
I wrote a much longer entry yesterday, but after some thought I realized that nobody would read it and there was no point in sharing all that, because pure C# programmers wouldn’t learn anything new and pure C++ programmers wouldn’t get it.
Last year, just before Christmas, I installed the free Visual Studio 2010 Express on my machine, namely its C# and C++ compilers. I was curious about VS, as I hadn’t seen it for some time and had never tried C# before. I can confirm that it’s very easy to start with that language (at least for someone with 12 years of C++ experience), because I finished my very first Windows Forms application with it, a simple Minesweeper clone with dynamically created cells and all, an hour and a half after first starting up VS, without knowing anything about C# beforehand. (This time includes reading documentation and hunting for information online; the actual writing took half an hour at most.)
Of course this didn’t make me an experienced C# programmer, and I didn’t spend enough time getting used to the VS IDE either. Still, VS has some glitches that became apparent even from a little use. For example, compiling in the C# IDE is invoked with the F6 key, while in the C++ IDE it’s F7. I would expect the same keys to trigger the same basic functionality in both interfaces. It is a single company after all. The other glitch was with the compiler paths. I can’t remember whether I changed the install path for the compilers, but the C++ IDE couldn’t find the compiler’s “Cl.exe”. I found nothing about it when searching online, but then I checked the macros used for the program paths and noticed an error in one of them. There was no way to change that path from the program itself, but fortunately the paths are not encoded, and a little work with regedit helped me find and correct the error. (Yep, MS still hasn’t learned not to hard-code paths in its programs or installers.)
Every program has bugs, and I haven’t spent enough time with VS to discover more, so I’ll leave it at that. The only thing I could do in such a short time was to compare some features of VS with the ones in Rad Studio, which I’m using to write zkanji. The form designer of VS is an obvious rip-off of Rad Studio’s (or rather of the original in Delphi from 1995), which is not a problem in itself, though I found the designer in VS a bit unresponsive and slow when moving controls around a form. I also ran into some inconveniences with things that work nicely in Rad Studio but not in Visual Studio for some reason. The most obvious is that some controls can only be moved in VS by grabbing a small icon at the top border of the control. I might be a bit unfair here as I’m too used to RS, but its solutions feel much better to use.
On the other hand, the text editor part of VS is light years ahead of the one in RS. It’s faster, and IntelliSense, which helps with writing the code, works much better than the CodeInsight of RS. They both suggest possible names for variables and their functions, the functions’ arguments etc., but VS makes it look easy, while RS is clearly struggling each time. (This may be due to the huge code base of zkanji, but I can’t be sure until I have written something just as big in VS.) The way VS completes code and formats the lines, so that brackets always start on a new line, and all those little things are done much better than in RS as well.
So much for my impressions of Visual Studio; let me write about C# and .NET as well. I have 12 years of C++ experience, so I can’t help but compare the two languages. It’s clear that C# was derived from C++, because they share many keywords, and the way loops work, for example, is very similar as well. Of course C# has foreach and an SQL-like query expression to collect items from lists and arrays, while C++ only gained some of this functionality with C++11 (and we’ll have to wait until every compiler supports it). If someone knows how to use C++, C# won’t be difficult at all. Starting with it is a matter of minutes, not even hours. People usually say that this is one of the strong points of the C# language (I heard something similar about Java a decade ago). In my opinion this is not necessarily a good thing, but not bad in itself either. After spending days reading the C# language reference, I think that it is a language that looks simple, and it is easy to start with some simple programs, but using something that has simple rules can get very complicated with time.
I never liked garbage collection so I’m prejudiced, and because of that I’d rather not get into this part of the language, and just mention what I think about languages with no pointers in general. When we start hiding pointers, the language loses features. Even in C++ you can write and use classes that hide the pointers, and it is probably better for safety as well, but at least C++ doesn’t take away the tools. This doesn’t have much to do with C#, so I’ll stop here. I’d rather write about my first impressions, if that was the title of this entry.
As I wrote, it’s true that C# seems much easier to learn than C++. The biggest headache I had in C# was with class inheritance, mainly with virtual functions and properties. I find it odd, and bad design, that you can’t change the visibility of an inherited C# class member when overriding it. Why is it that when a virtual property was declared protected in a base class, it can’t be made public in a derived class? Or what is even more problematic, if it was public, it can’t be made protected? I found an answer on a site where someone asked the same question. One of the first people to answer said that it would be against the principles of Object Oriented Programming, though I have never heard of such a principle. If you know about it, please tell me.
Imagine that you create a new listview control that can only show its items as rows in a detailed list. (Listview is the control that lists files in the right part of Explorer. It can show the files as icons, or it can show details etc.) In C# it is impossible to change the visibility of the public “View” property to protected in the derived control. With a little research I found the solution: hiding the property with a new one of the same name that only has a public getter, while the setter is either private or doesn’t exist. It is strange though that such a thing is not directly supported by the language, and that you have to use tricks. This seems to be a much worse alternative.
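For comparison, here is a minimal C++ sketch of the kind of visibility change I was looking for. The class names ListControl and DetailsList and all their members are made up for this illustration; they are not part of any real toolkit. A using-declaration makes a protected member public in the derived class, and an override can be declared under a narrower access level than the original:

```cpp
#include <string>

// Hypothetical stand-in for a list control; not a real toolkit class.
class ListControl {
public:
    virtual ~ListControl() = default;
    // Public in the base: anyone may switch the view mode.
    virtual void set_view(const std::string& mode) { view_ = mode; }
    const std::string& view() const { return view_; }
protected:
    // Protected in the base: only the class family may ask for the count.
    int item_count() const { return 3; }
private:
    std::string view_ = "icons";
};

// A list that is meant to show details only.
class DetailsList : public ListControl {
public:
    DetailsList() { set_view("details"); }
    // protected -> public: re-export the inherited member.
    using ListControl::item_count;
protected:
    // public -> protected: the override sits under a narrower access level.
    void set_view(const std::string& mode) override { ListControl::set_view(mode); }
};
```

With this, calling set_view on a DetailsList object from outside the class does not compile, while item_count() does. The restriction only applies when the object is used through the derived type, though: through a ListControl& reference set_view is still public, which may be what the people citing OOP principles had in mind.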
In the short time I tried C#, this was the only real problem I had with it. It’s a true programming language. I don’t know if it is good or bad because I really haven’t used it enough, but against all my prejudice, it seems to be quite usable. I think there are some design flaws in .NET, and it would be difficult to separate C# from .NET, but until I have used it much more, I’d rather not comment on that aspect of the language. MS also seems to be in love with interfaces, which clearly shows in C#, and that in my opinion is an unnecessary solution to a problem that doesn’t exist, but it has little to do with this blog post.
I would like to create a “solution” (as projects are called in VS) that requires me to dig deeper into both the language and .NET, but I can’t think of anything right now that would be worth it. Suggestions are welcome 😉
Disclaimer: This entry is about programming problems when handling Unicode. One part mentions people who call themselves programmers and try to answer questions on Q&A sites, so it might not be suitable for non-programmers, while many programmers might find it offensive. (Though I hope they won’t.)
The conversion of zkanji to Unicode is almost complete, but as a consequence a completely new family of problems has arisen. This is my first time trying to make a program that works on many systems with different language settings, and although zkanji did work until now as well, its users were not able to share their data with each other, at least not if they were using different languages. Because of that I never had to think about what would happen if someone got the idea to distribute a custom-made dictionary in a language that is not the one supported by every operating system. (Which would be English, but I’m not even sure about that.)
The problem: As I have written in a previous entry, zkanji uses a special dictionary tree to look up words. Each node in the tree has a label corresponding to the words under the node and the branches starting from the node. To be able to walk the tree, these nodes must be in alphabetical order of their labels, and the labels must be in lowercase. When someone searches for a word, that word can be of mixed case, so the first step is to convert it to lowercase for comparison with the node labels. The problem arises when different languages convert a given uppercase letter to different lowercase letters. The first consequence is that when the user searches for a word in the English dictionary, the entered text, after being converted to lowercase, might not match anything in English. (This could happen with the letter I in Turkish locales, as it will apparently be converted to an ı character. This might not be true, I’m just repeating what I have read on a Q&A site.) The second problem is the ordering of entered words in newly created user dictionaries. The nodes will probably end up in a different order on different systems if their languages differ.
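The ordering half of the problem can be illustrated with a short sketch. It rests on assumptions that may not match how zkanji really stores its labels: the labels are held as UTF-32 strings, and a linguistically correct order is not required, only the same order on every system. Comparing std::u32string values looks only at code point values and never at the locale. The names sort_labels and has_label are hypothetical helpers:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts node labels by raw code point values. The comparison used by
// std::u32string works element by element and does not consult the locale.
void sort_labels(std::vector<std::u32string>& labels) {
    std::sort(labels.begin(), labels.end());
}

// Looks up a label in a vector previously ordered with sort_labels.
// The same code point comparison is used, so the result is the same everywhere.
bool has_label(const std::vector<std::u32string>& labels,
               const std::u32string& label) {
    return std::binary_search(labels.begin(), labels.end(), label);
}
```

The resulting order is not alphabetical in any language (for example “zebra” sorts before “ähnlich”, because z has a smaller code point than ä), but it is identical on every system, which is all a lookup needs.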
The only solution that seems viable at the moment is to use a conversion function that converts a given uppercase character to the same lowercase one on every single system, without ever looking at the system’s own language. This should be possible, as there is supposed to be a default conversion table for Unicode characters hidden somewhere in the system. Unfortunately the documentation and even the C++ language itself are in turmoil when it comes to Unicode. There are several functions for Unicode character conversion, but their documentation does not always mention whether they use the system’s locale or not. Even when it does, there are contradictory remarks about those functions, and when looking for help online, it turns out that their behavior might differ between implementations of the same C++ library.
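What such a locale-independent conversion could look like is sketched below. This is only an illustration under assumptions: the text is held as UTF-32 code points, and the fixed ranges cover just ASCII, Latin-1, and the basic Greek and Cyrillic capitals. A real table would have to be generated from the case mappings published in the Unicode Character Database. The names lower_char and lower_string are hypothetical:

```cpp
#include <string>

// Lowercases one code point using fixed ranges only; the locale is never
// consulted. Illustrative excerpt, not a complete Unicode case table.
char32_t lower_char(char32_t c) {
    if (c >= U'A' && c <= U'Z') return c + 32;                     // ASCII
    if (c >= 0x00C0 && c <= 0x00DE && c != 0x00D7) return c + 32;  // Latin-1, skipping the multiplication sign
    if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) return c + 32;  // Greek, skipping the unassigned U+03A2
    if (c >= 0x0410 && c <= 0x042F) return c + 32;                 // Cyrillic U+0410..U+042F
    if (c >= 0x0400 && c <= 0x040F) return c + 80;                 // Cyrillic U+0400..U+040F
    return c;                                                      // everything else is left unchanged
}

// Lowercases a whole string code point by code point.
std::u32string lower_string(const std::u32string& s) {
    std::u32string out;
    out.reserve(s.size());
    for (char32_t c : s) out.push_back(lower_char(c));
    return out;
}
```

Because nothing here asks the system for its locale, an uppercase I becomes a plain i even on a Turkish system, so a search typed there would still match the English labels.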
The only thing I can do in such cases is to use an online search engine to look for a solution that works.
Many years ago search engines were not as “smart” as they are today. They only returned results that contained the exact words one was looking for, and they couldn’t find forum entries at all, only relatively static sites. In recent years the makers of these search engines realized that people are not interested in sites like those. People don’t want to find anything about what they entered in the search field; rather, they need everything else. So search engines were developed further to give us sites that have the search terms inflected differently, split up or written as a single word, or that only have similar words and not the ones entered, even when those were put between quotation marks. The other great innovation of search engines is the inclusion of social activity in the search results. This means it is almost guaranteed that when one searches for a technical term, the first 1000 results will be forum messages, tweets or personal pages from social sites.
Thanks to these innovations in search technology, it once again became a challenge to find something useful. This is a good thing, because we programmers love a challenge, or we wouldn’t be programming in the first place, right?
Q&A sites (question and answer sites, where anyone can ask a question on a given topic and get answers from people all over the world) are among the results that today’s search engines return while trying to pamper us. Of course I have nothing against sites like those. It’s good that so many experts try to be helpful for free. Or at least that’s what I thought at first. Unfortunately, as it turned out, most of these “experts” don’t know what they are talking about, and don’t want to admit it either. There have been several questions regarding the conversion of Unicode strings to lowercase, all getting the same answers regardless of the needs of the one asking the question.
General Answer #1: converting to lowercase the same way on every system is impossible, because there are languages where the upper/lowercase versions of some characters are different from those in other languages.
General Answer #2: why do you even want to do that? We all speak English!
General Answer #3: use the case conversion of [insert any library or function name here]! It’s using the current locale! You don’t want that? Do it anyway!
General Answer #4: use [insert any library]! It does what you need, converts from anything to anything else, with or without using the locale, it’s perfect in every way! Though I have only heard of it. And it uses [some license not compatible with most others]. And you will have to link another 1MB to your exe just because you needed a single function.
Of course this is not the first time I’ve had to face such helpful answers after a day’s search online, but I had to rant about it. If one is persistent enough, there are really good, helpful answers out there as well; they just have to be found. But it seems that whenever I need an answer for something, it turns out to be one of the rarest problems on earth… Or it’s so simple that everyone knows the solution but me.