Archive

Posts Tagged ‘kanji recognition’

zkanji under-the-hood – handwriting recognition 2.

March 21, 2011

DISCLAIMER: This entry is about the inner workings of zkanji. It might not be suitable for non-programmers, while programmers might find it too trivial. Don’t expect a mathematically elegant explanation either. My primary aim is to make the explanation easy to understand.

I have been working hard for the past two weeks to improve the handwriting recognition in zkanji, which I ruined a bit in the latest version. (It wasn’t completely broken, but it didn’t work as well as before.) In a previous post I explained how the algorithm works for single strokes. That explanation was simplified compared to the real algorithm at the time, though it is fairly close to how the original form of the algorithm worked when I first wrote it.

The reason behind the change was the inclusion of hiragana symbols in the recognition data. Kanji are mainly composed of strokes made up of short straight lines with sharp corners. Because of this, the original approach of comparing the angles of specific parts of strokes was straightforward, but when hiragana were added to the mix, this kind of comparison failed half the time. At least it seemed to fail at first. After a little experimenting I found that I only needed a little trick when measuring the angles of hiragana strokes and everything would work as usual. Because most hiragana strokes are round, all I had to do was decide on artificial corner points inside the strokes and make the long sections between these corner points give a unified angle. That is, I made the round shapes less round.

After using it for some time I realized that it just doesn’t work as well as I would like it to. The algorithm started confusing kanji more often than before, so I decided to dive into it once more to figure out what went wrong, and while I was at it, I could just as well make it recognize a specific difference between kanji more easily too.

When writing in the handwriting recognition window I tend to be on the lazy side, leaving out some parts of the strokes, especially the short line endings that look like a hook: 亅
Unfortunately, when I did that with kanji like 捕, I often got 埔 in first place on the candidate list, with the real one nowhere to be found. The stroke count is the same for both kanji, and even the stroke models are the same, especially if I leave out the hook at the end of the second stroke. The only difference is that the second stroke is longer in 捕 and crosses the third one instead of ending just above it. At first I tried to fine-tune the penalty for drawing strokes at an unwanted position or length, but playing a bit with the numbers made me realize that it just wouldn’t be that easy. If the penalty was too high, even kanji that had worked perfectly fine before became unrecognizable.

The solution to this problem was VERY complicated. At least figuring out how to do it was, but I got it eventually. When computing the difference between strokes, hooks are simply removed first. It’s easy to say “simply”, but I also had to find a good way to detect the hooks. Currently a part is considered a hook if it is short in both absolute and relative terms: not longer than 2000 units (each kanji fits in a 10000*10000 square) and taking up less than a given percentage of the stroke itself.
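To give a rough idea of the hook check, it could look something like this (a simplified sketch rather than the actual zkanji code; the 2000 unit limit is the one mentioned above, while the 15% relative limit is only a placeholder for that “given percent”):

#include <cmath>
#include <vector>

struct Point { double x, y; };

// Limits for what counts as a hook. 2000 units is the absolute limit
// mentioned above; the relative limit here is only a placeholder value.
const double HOOK_MAX_LEN = 2000.0;
const double HOOK_MAX_RATIO = 0.15;

double segmentLength(const Point &a, const Point &b)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

double strokeLength(const std::vector<Point> &pts)
{
    double len = 0;
    for (size_t i = 1; i < pts.size(); ++i)
        len += segmentLength(pts[i - 1], pts[i]);
    return len;
}

// Drop the final segment of a stroke if it is short enough to be a hook,
// both in absolute units and relative to the whole stroke.
void removeTrailingHook(std::vector<Point> &pts)
{
    if (pts.size() < 3)
        return;
    double last = segmentLength(pts[pts.size() - 2], pts.back());
    if (last <= HOOK_MAX_LEN && last < strokeLength(pts) * HOOK_MAX_RATIO)
        pts.pop_back();
}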

I can imagine that even what I wrote up to this point sounds a “bit” complex. I don’t want to complicate it even more, so I won’t describe my algorithm in greater detail, and I will leave out all the mistakes I made while finding the solution. Let’s just say that there were a lot.

By the way, there was a bug in the handwriting recognizer that made recognizing hiragana more difficult at first and messed with the recognition of normal kanji too. It had nothing to do with my trick for recognizing hiragana more easily. That was fixed as well, of course.

 

So there will be a new version very soon that will include:

  • A better handwriting recognition engine (the word “engine” sounds so professional and mysterious that I just couldn’t leave it out).
  • Kanji reading test at the end of long-term study sessions. (With settings to control which readings we want to test and under what conditions.)
  • Clicking the ON reading of a kanji in the kanji information window filters the kanji list by that reading.
  • Kanji stroke order diagram and recognition data for all 6355 kanji in the JIS X 0208-1990 character set.
  • Smaller fixes that are required for every worthy update.

Won’t be included:

  • Undo in the long-term study test. I was thinking of implementing it, but after trying to understand my own code I realized that it wouldn’t be as easy as I first thought.
  • Kanji writing test. (Actually this is something I really want to do, but not for the next version yet.)
  • A mascot character for zkanji, even though I really want it! But at the moment I can’t think of one.

v0.571 – and an extra experimental feature.

February 12, 2011

Another update. This time I’m fixing some problems the previous update caused for users who create their own dictionaries. I would recommend updating even if you haven’t started a custom dictionary (the majority of users), but if you already have v0.57, you only need to download the “executable only” archive of v0.571 (at the bottom of the zkanji download page).

I will now have to think about my next step. The decision is always difficult. I could start working on something new that everyone benefits from, add some small feature that a few users have been waiting on for ages now, or start creating a help file with pretty images, because a picture is worth a thousand words – or something like that.

Before I forget, if you like experimental stuff there is something for you! Close zkanji. I know that it’s always running. Open the zkanji.ini file in a plain text editor. Please don’t use SomeOffice or other monster programs because they can ruin stuff! Plain old notepad will do. Add the following two lines at the top of the ini file:

[Debug]
debug=1

Save the ini file. (This is why I prefer ini files over any kind of registry.)

Now start up zkanji and select a single kanji you like. Press Ctrl-B or click “BETA Test BETA” in the Tools menu. Voila, you can test your knowledge of the kanji’s strokes and stroke order!

As I said, this is just experimental at the moment, so the kanji testing window is plain and ugly, and there are controls whose purpose you won’t know. You can uncheck “Show debug data” if you have the courage, and select “Next stroke hint” if you need to know where to place the next stroke. There is nothing else to it. This test doesn’t record anything, doesn’t tell you what you did wrong, and in general doesn’t do anything except show a red X or a green circle, depending on how your stroke turned out. If the stroke wasn’t correct, you have to draw it again.

The algorithm used in this experimental test is, as the name suggests, experimental too, so if you feel that it needs fine-tuning, it’s because it needs fine-tuning. Unless I receive feedback, I won’t be able to fix it or do anything with it. It might not even get released in a nice, clean, user-friendly window in that case. So please, send your suggestions!

Some nice people I have already shown this to told me that the algorithm is too strict: when they drew strokes that they felt should be accepted but were still a bit “non-standard”, the algorithm didn’t accept them. BUT. (There is always one, right?) I never intended this test to accept kanji drawn as loosely as the regular handwriting recognition window would. Why is that? Because this thing is made to see how accurately a student can reproduce a kanji. Please only complain if you tried your best and still can’t draw kanji. That best can be anything. I know it is difficult to write with a mouse, so don’t be shy!

zkanji under-the-hood – handwriting recognition

January 9, 2011 21 comments

DISCLAIMER: This entry is about the inner workings of zkanji. It might not be suitable for non-programmers, while programmers might find it too trivial. Don’t expect a mathematically elegant explanation either. My primary aim is to make the explanation easy to understand.

The IME (Input Method Editor) pad in Windows allows its users to write Japanese characters with the mouse, and as they add stroke after stroke, it offers a list of character candidates that resemble the drawn image. Any text editor or input box that’s prepared to use the IME can accept these characters. Zkanji is not one of them, for several reasons. When I started writing it, I didn’t have any idea how the IME works, but most of all I didn’t like the idea that I, as a user, would have to manually switch between English and Japanese when searching in the dictionary. At first I only wrote an input box where anything typed in turns into kana and which accepts Japanese characters from the clipboard. Because the only requirement for this is to have one or two Japanese fonts installed, zkanji can run on most systems, even those that have no IME installed. (For some reason many people don’t have it.) I didn’t want to add handwriting recognition, because I thought it would be very difficult to do and a lot of work that wasn’t worth it, but as zkanji evolved I wanted a new challenge that was different from what I had been doing.

When I made up my mind, I started looking for a character recognition algorithm for Japanese kanji. You would think these algorithms are easy to find, because on the internet everything is free and everyone is eager to help others. After days of fruitless searching I finally found an entry in a blog that gave me the solution. I could not fully understand it, but it led me to the Levenshtein distance, which I’m grateful for. Understanding that, and why it works, was the first step towards writing the handwriting recognition. I won’t explain the Levenshtein distance; the wiki article on it is quite straightforward.

My algorithm is very similar to Levenshtein’s, with the changes proposed by the above-mentioned blog (which also just quotes some other algorithm that I was too lazy to look up). Its principle is that we build a matrix from two sequences. The columns and rows represent the elements of the two sequences, and every element of the matrix is the numerical distance between the corresponding two elements of the sequences, plus the smallest neighboring element from the previous row and/or column. The very last element of the matrix gives us a number that we can use as the shortest relative distance between the two sequences. After this is done the algorithm is pretty easy. We just go through every stroke of every kanji, sort their distances from the mouse-drawn strokes, and we have a list of kanji in order of their distance from our writing. This is what the blog suggested, and that’s exactly what I tried to implement.

The modified algorithm for the Levenshtein distance is simple. Let me show it in C++ (using plain point coordinates as the sequence elements, as the blog suggested):

#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Distance of two sequence elements. Plain point distance here; the final
// version of the algorithm compares relative angles instead (see below).
double distance(const Point &a, const Point &b)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

double compare(const std::vector<Point> &a, const std::vector<Point> &b)
{
    const size_t n = a.size();
    const size_t m = b.size();
    if (n == 0 || m == 0)
        return 0.0;

    // Create an n*m matrix of floating point numbers.
    std::vector<std::vector<double>> matrix(n, std::vector<double>(m));

    for (size_t x = 0; x < n; ++x) {
        for (size_t y = 0; y < m; ++y) {
            // Set the cell of the matrix at x,y to the distance of sequence points.
            matrix[x][y] = distance(a[x], b[y]);

            // Add the smallest neighboring cell's value to the distance.
            if (x > 0 && y > 0)
                matrix[x][y] += std::min({ matrix[x - 1][y - 1],
                                           matrix[x - 1][y],
                                           matrix[x][y - 1] });
            else if (y == 0 && x > 0)
                matrix[x][y] += matrix[x - 1][y];
            else if (x == 0 && y > 0)
                matrix[x][y] += matrix[x][y - 1];
        }
    }

    // The last element of the matrix is the shortest relative distance.
    return matrix[n - 1][m - 1];
}

Basically, this is all I got from the sources I mentioned; I had to invent everything else. The most difficult part was building the two sequences for comparison from the stroke order database on one side and the stroke the user draws with the mouse on the other. All strokes are first smoothed and the points of little importance are removed. This is done relatively simply, so I won’t explain the method.
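(I won’t describe my own method, but for the curious: a common way to do this kind of simplification is the Ramer-Douglas-Peucker algorithm. The sketch below is that textbook algorithm, not the code used in zkanji.)

#include <cmath>
#include <vector>

struct Point { double x, y; };

// Distance of point p from the straight line through a and b.
static double lineDist(Point p, Point a, Point b)
{
    double dx = b.x - a.x, dy = b.y - a.y;
    double len = std::hypot(dx, dy);
    if (len == 0)
        return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Keep only the points that deviate from the surrounding line by more
// than epsilon; everything else is considered unimportant.
static void rdp(const std::vector<Point> &pts, size_t first, size_t last,
                double epsilon, std::vector<Point> &out)
{
    double maxd = 0;
    size_t index = first;
    for (size_t i = first + 1; i < last; ++i) {
        double d = lineDist(pts[i], pts[first], pts[last]);
        if (d > maxd) { maxd = d; index = i; }
    }
    if (maxd > epsilon) {
        // Keep the farthest point and simplify both halves.
        rdp(pts, first, index, epsilon, out);
        rdp(pts, index, last, epsilon, out);
    } else {
        // Everything between first and last is close enough to a straight line.
        out.push_back(pts[last]);
    }
}

std::vector<Point> simplifyStroke(const std::vector<Point> &pts, double epsilon)
{
    std::vector<Point> out;
    if (pts.empty())
        return out;
    out.push_back(pts.front());
    if (pts.size() > 1)
        rdp(pts, 0, pts.size() - 1, epsilon, out);
    return out;
}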

The author of the blog suggested using the coordinates of the points in the strokes to make the sequences, since the distance between two points is very easy to calculate. I tried this approach at first, but soon found that it’s not flexible enough.

Figure 1.: A stroke from the database. Figure 2.: Mouse-drawn stroke.

After cleaning them up, the data left from the two strokes are the points alone (the big red dots). Getting these points is a lot trickier, because I have to take into account curves that don’t have end points in between, but let’s say we have it all done correctly. Because drawing with the mouse can result in quite ugly strokes, I have to recognize these two as similar. The distance would be pretty big if I just took the points and calculated it directly, so I resized both strokes into 10000*10000 unit squares:

Figure 3.: The original strokes resized and placed on each other.
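The resizing itself is nothing special; something like this would do (a minimal sketch, not the exact code in zkanji, and the handling of degenerate strokes like a perfectly horizontal line is just a guess here):

#include <algorithm>
#include <vector>

struct Point { double x, y; };

// Scale and translate a stroke so its bounding box fills a 10000*10000 square.
void normalizeStroke(std::vector<Point> &pts)
{
    if (pts.empty())
        return;
    double minx = pts[0].x, maxx = pts[0].x;
    double miny = pts[0].y, maxy = pts[0].y;
    for (const Point &p : pts) {
        minx = std::min(minx, p.x); maxx = std::max(maxx, p.x);
        miny = std::min(miny, p.y); maxy = std::max(maxy, p.y);
    }
    // Avoid division by zero for strokes that are a straight horizontal
    // or vertical line.
    double w = std::max(maxx - minx, 1.0);
    double h = std::max(maxy - miny, 1.0);
    for (Point &p : pts) {
        p.x = (p.x - minx) * 10000.0 / w;
        p.y = (p.y - miny) * 10000.0 / h;
    }
}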

This worked relatively well, but you can see that if the user’s own stroke was flatter than the original, the calculated distance would still be pretty big. So big that in many cases the algorithm found a smaller distance to strokes that were very different rather than similar, just because their size fit the drawn stroke better. After this I experimented quite a lot with many things, until I ended up using “model strokes”:

Figure 4.: Model strokes in zkanji used for handwriting recognition.

I made these in a text editor as comma-separated lists of coordinate sequences. I experimented a lot until I found the shapes that were recognized best. Of course this in itself did not solve the problem of different sizes, so I made a single segment of each model stroke “elastic”. When I resized these strokes to the exact size of the user-drawn stroke, the elastic segment didn’t hold its shape. It became shorter or longer if it didn’t fit the user-drawn stroke exactly:

Figure 5.: Resized model stroke with an elastic segment and the user drawn stroke.

This finally worked satisfactorily, but it was very complicated and still not as precise as I wanted. So I finally had enough, scrapped everything I had done so far (I kept the stroke models but removed the elastic segment), and rewrote the whole recognition engine in a single day. That rewrite is still used in the latest version of zkanji; I only had to fine-tune it a bit to work better with the new hiragana stroke models. Yes, most of what I wrote until now was just a little introduction to the final algorithm.

After all these experiments I realized that the less precise the comparison is, the better. All this time I had wanted it to be precise instead of the direct opposite, so I tried everything to move the stroke model points as close to the user-drawn ones as possible. In the final version, instead of calculating the distance of every point in the two sequences, I found that calculating their relative angle works much better. The relative angle is ambiguous in many cases, so I just take the smallest angle between segments (pairs of neighboring points of a stroke), taking their direction into account as well. This way the actual size of the strokes and their parts does not matter much, so even badly drawn mouse strokes can be matched with greater accuracy, yet less precision. The modified Levenshtein distance is still used, but I had deleted all the code, so I had to write it again. (Not that it was difficult or anything.)
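To illustrate what I mean by the angle between segments, the “distance” of two segments can be computed something like this (only a sketch of the idea, not the actual code in zkanji):

#include <cmath>

struct Point { double x, y; };

const double PI = 3.14159265358979323846;

// Smallest angle (0..PI radians) between two directed segments. The
// direction matters: the same line drawn backwards is half a turn away.
double segmentAngle(Point a1, Point a2, Point b1, Point b2)
{
    double angA = std::atan2(a2.y - a1.y, a2.x - a1.x);
    double angB = std::atan2(b2.y - b1.y, b2.x - b1.x);
    double diff = std::fabs(angA - angB);
    if (diff > PI)
        diff = 2 * PI - diff;
    return diff;
}

A value like this can take the place of the point distance in compare() above, so only the shape of the drawn stroke matters and not its size.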

A number representing the distance between two strokes is not much in itself, because there can be many kanji with the same strokes in a different order or a different position. But since the most difficult part of the algorithm (at least the theory) was finding out how to calculate the distance between two strokes, everything else was “just” work. To make the algorithm fast, I calculated and stored beforehand which model strokes make up each kanji in the stroke order database.

  1. When the user draws a stroke with the mouse, it is checked against the model strokes, and its distance from each of them is stored.
  2. Then the program loops through each kanji and checks which model stroke is its first (or second, or whatever) stroke.
  3. The distance between the user-drawn stroke and the model stroke used in the kanji is then added to a value. This value is the distance between the user's handwriting and the kanji; each kanji has one.
  4. The difference between the relative position of this stroke to the other strokes in the kanji, and the relative position of the user-drawn stroke to the other user-drawn strokes, also increases the value. This does not require any special algorithm; I just add a multiplied constant, depending on which “grid position” each stroke is in.
  5. Finally, adding these distances together for each stroke, it's easy to calculate the distance between any kanji and the user-drawn one, and thus order every kanji in the database. (See the sketch below.)
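Put together, the whole ranking looks roughly like the sketch below. (The data structures are made up for the example and are not the ones zkanji actually uses, and the grid position check of step 4 is reduced to a single constant penalty.)

#include <algorithm>
#include <vector>

// Made-up structures for illustration only.
struct KanjiModel {
    char32_t ch;                // the kanji character
    std::vector<int> strokes;   // index of the model stroke for each stroke
    std::vector<int> gridpos;   // rough grid position of each stroke
};

struct Candidate { char32_t ch; double dist; };

// modelDist[s][m] is the distance of the s-th user-drawn stroke from model
// stroke m (computed once per drawn stroke with compare() above).
// userPos[s] is the grid position of the s-th user-drawn stroke.
std::vector<Candidate> rankKanji(const std::vector<KanjiModel> &kanji,
                                 const std::vector<std::vector<double>> &modelDist,
                                 const std::vector<int> &userPos,
                                 double posPenalty)
{
    std::vector<Candidate> list;
    size_t drawn = modelDist.size();
    for (const KanjiModel &k : kanji) {
        if (k.strokes.size() < drawn)
            continue; // can't match a kanji with fewer strokes than drawn
        double d = 0;
        for (size_t s = 0; s < drawn; ++s) {
            d += modelDist[s][k.strokes[s]];          // step 3
            if (k.gridpos[s] != userPos[s])
                d += posPenalty;                      // step 4, simplified
        }
        list.push_back({k.ch, d});
    }
    // Step 5: order every candidate by its total distance.
    std::sort(list.begin(), list.end(),
              [](const Candidate &a, const Candidate &b) { return a.dist < b.dist; });
    return list;
}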

Write a comment if you want to implement your own algorithm based on mine and need help.

What takes so long? :)

December 16, 2010 4 comments

I guess this kind of thought often appears in the minds of many users who are not programmers themselves and are waiting for a new version to pop out of nowhere. In order to release the final v0.57 of the program, I have set some difficult and some not-so-difficult goals for myself. A single new feature sometimes requires solving a hideous but hidden problem that wasn't obvious at first. Let's take, for example, a seemingly simple addition to the long-term study list.

I have filled a few notebooks (made of paper, not plastic) while studying kanji, because I decided that I don't just want to know how to read them, but also how to write them. Because environmentally friendly (also called green, no idea why) things are so popular these days, I thought that letting students practice inside the program, without wasting valuable paper, ink and money, would be a good idea. I wouldn't have thought about this in the first place if I didn't already have a good “hand”writing recognition engine built in (also made by me, of course). Adding a single button next to the input box and popping up the recognizer window was cake. The problems surfaced soon afterward.

Take for example the word 日々. Everyone studying Japanese should know what 々 is. It is a repetition symbol for kanji, but it is not a kanji itself. Thus, the handwriting recognition engine can't recognize it, as it was made solely for kanji. Why it was made like that and why it couldn't handle anything else is too complicated to explain here, so don't ask. I didn't want to force anyone to go hunting for the 々 character elsewhere to copy and paste it into the input box, so I either had to add a fake kanji, which would make everything more painful later, or modify the whole recognition engine to work with other characters too. I decided to do the latter. And if I'm doing it anyway, I will have to include all the kana characters too, because typing a word with the keyboard and drawing with the mouse at the same time is very difficult when you have to type and draw 60 of them one after the other.

I can tell you that modifying the engine wasn't easy. It wasn't too difficult either; it just took a lot of time.

I wouldn't have posted about all this if the problems stopped there. Unfortunately, what comes after this will be even more tiresome. I will have to add all the hiragana and katakana characters, and also all the kurikaeshi (kanji/kana repetition) symbols, one by one to the stroke order database. And if my plans become reality and zkanji ends up handling words that have all kinds of symbols and Latin characters in their written form, I will probably have to create the stroke order drawings for all the full-width Latin characters and numbers too.

I only want to finish the kana and kurikaeshi symbols by the next release, but it will still take some time.

The compulsory screenshot:

Most users will probably never find this window in any of their menus, nor will they open it accidentally. Unless you add a [Debug] section to the zkanji ini file and a debug=1 value under it, the stroke order diagram editor won't show up. There is currently no way to share single diagrams among users, so even if someone wanted to help me by creating the remaining 1200 S.O.D. (5100 are already done!) plus all the kana, they couldn't send them to me. (Well, they could, and I could modify the program to read the required data from the received file, but that's more work than it would save. (Probably not, but I don't need the remaining 1200 S.O.D. right now, and most people don't either.))