Data handling in zkanji mini-series, Part IV.

February 1, 2013

I’ve mentioned most of the following before, so I’ll just summarize the considerations that made it necessary to change the file format, which will be used in the next release of zkanji, (hopefully) coming soon:

  1. Safer dictionary updates: Because the underlying JMDict dictionary changes with every release, words could sometimes disappear from word and study groups without notice. This happened only when the word’s kanji form or kana writing (pronunciation) changed in the dictionary, because, as I mentioned before, those two parts were the only way to identify words when loading the dictionary and groups. For example, in the current JMDict the word バケツ (baketsu – bucket) has no kanji form, but the dictionary at the time of the last program release contained the ateji (kanji selected by pronunciation only) 馬穴 for this word. If I don’t change how the program handles such cases, users could end up with words disappearing from their groups. Also, if a word’s meaning has been added to a word group and even just the order of the meanings changes, the word group cannot be fixed automatically to reflect that.
  2. English dictionary user changes: In the currently released program it is not possible to change the English definition of words, to add new words, or to remove existing ones. Many users requested this, but without the changes I made for the previous point, handling dictionary updates would have been very difficult. Fortunately, the additional work needed to allow dictionary changes was nothing compared to that.
  3. Multiple meanings for single word entries in word groups: In the current release, when a word is added to a word group, a meaning has to be selected, and only that meaning is added to the group. This doesn’t seem to have any direct connection to dictionary updates, but if the word’s meanings (their order or number) changed compared to an old dictionary, this would have created an even greater problem that is more difficult to handle.
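
To make the meaning-order problem concrete, here is a minimal Python sketch. The data model (a group entry that stores a word key plus the position of one meaning) and every name in it are assumptions for illustration only, not zkanji’s actual file format, and the word and glosses are placeholders.

```python
# Hypothetical model: a group entry remembers a word by (kanji, kana)
# and one of its meanings by position in the dictionary entry.

def stale_entries(group, old_dict, new_dict):
    """Return group entries whose stored meaning index no longer
    points at the same definition text after a dictionary update."""
    stale = []
    for key, index in group:
        old_text = old_dict[key][index]
        new_meanings = new_dict.get(key, [])
        if index >= len(new_meanings) or new_meanings[index] != old_text:
            stale.append((key, index))
    return stale

key = ("kanji-form", "kana-form")            # placeholder word key
old_dict = {key: ["gloss A", "gloss B"]}     # placeholder glosses
new_dict = {key: ["gloss B", "gloss A"]}     # same meanings, new order
group = [(key, 0)]                           # the user picked "gloss A"

print(stale_entries(group, old_dict, new_dict))
```

In this sketch the entry is reported as stale even though the word itself still exists: the stored index 0 now points at a different gloss, which the program cannot repair without knowing which new meaning the user intended.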

This is all I can think of right now, but from time to time I remember other features that I could have implemented long ago if zkanji handled dictionary updates better.

In the next post I’ll write about what changes had to be made to the file formats. Without knowing that, this and the previous entries might seem a bit mysterious. 🙂


When the dictionary is updated…

April 27, 2012

The words you might have in groups and tests are often changed in the JMDict project, so there should be a way to control how the English dictionary is updated. Another reason is that I want to add new features to zkanji that are more sensitive to such changes. I will soon release a beta-tester version of the program which starts with a new dialog asking the user to check dictionary changes, in the hope that somebody will look at it and comment. (If nobody does, you will get it unchanged. This is a warning :D)

This is the dialog shown on startup if the program detects changes that might affect your groups or tests. The items shown in the window are changes made in the JMDict project since January. As you can see, the word ちゃんと was considerably modified. If you had it in a group and updated with a previous version of zkanji, you would be in for a surprise: the “perfectly, properly, exactly” definition would have been replaced by “diligently, seriously, earnestly, …”, which is not exactly a matching meaning.

From the next zkanji you will be able to do the following:

  1. Use copy – This copies the old word definitions untouched, overwriting the entry in the updated dictionary, so the word will keep its old definitions.
  2. Remove word data – If you decide that it isn’t worth the trouble, you can simply throw out anything related to this word from your groups and tests. The new dictionary will keep the updated entry though.
  3. [Meanings that were in groups or tests and need change] and
  4. [Meanings of the same word in the updated dictionary] – You can go through all the meanings in 3. that need to change, and for each one select the corresponding meaning from the updated data in 4.
  5. Once you have made your choice, click “Next word >>” and your choices are registered.
  6. There is also an “Abort” button (unnumbered on the picture). If you want, you can skip this update and use the old data. But be aware that this means you will keep using the old English dictionary, and this dialog will be shown again the next time you start the program.
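
The choices in the dialog can be sketched roughly as follows. This is only an illustration under assumed data structures: the `Choice` enum, the `apply_choice` function and the dictionary layouts are hypothetical names, not zkanji’s real code, and the “Abort” button is not modelled.

```python
from enum import Enum

class Choice(Enum):
    USE_COPY = 1   # keep the old definitions
    REMOVE = 2     # drop the word from groups and tests
    REMAP = 3      # pair each old meaning with a new one

def apply_choice(choice, key, old_meanings, new_dict, groups, mapping=None):
    """Apply one user decision for one changed word, in place.

    new_dict: word key -> list of meanings in the updated dictionary
    groups:   word key -> meaning indexes the user had stored
    mapping:  old meaning index -> new meaning index (REMAP only)
    """
    if choice is Choice.USE_COPY:
        # Overwrite the updated entry with the old definitions.
        new_dict[key] = list(old_meanings)
    elif choice is Choice.REMOVE:
        # The dictionary keeps its updated entry; only user data goes.
        groups.pop(key, None)
    elif choice is Choice.REMAP:
        groups[key] = [mapping[i] for i in groups[key]]

# Placeholder word and glosses, for illustration only.
key = ("kanji-form", "kana-form")
new_dict = {key: ["new gloss 1", "new gloss 2"]}
groups = {key: [0]}
apply_choice(Choice.REMAP, key, ["old gloss"], new_dict, groups, mapping={0: 1})
print(groups[key])
```

After the call, the group entry points at the second meaning of the updated word, mirroring what selecting a pair from lists 3. and 4. would do.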

This is fine for words that can be found in the updated dictionary, but in some cases a word is changed in such a way that the program cannot find the corresponding entry.

For example, the word “bucket” was written as 馬穴 in the original English data. The new dictionary doesn’t have the word with that kanji, only with the written form バケツ (the same as its kana pronunciation). Because zkanji recognizes words by [written form]+[kana pronunciation], it will think that this word is not in the new dictionary, and an older version would simply have removed all traces of the word from any groups and tests the user added it to. In the next version you will be able to find another word in the dictionary that you think matches closely enough, and then press the “Select” button. Once you do that, you will be presented with the previous page of meanings to select their corresponding definitions.
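
The lookup failure described here can be sketched in a few lines of Python. The dictionary layout and function names are assumptions for illustration. In particular, offering entries with the same kana as candidates is my own addition to show one possible aid; the post only says the user finds a matching word themselves.

```python
# Hypothetical layout: the dictionary maps (written form, kana) -> meanings.

def find_orphans(group_keys, new_dict):
    """Words the user has in groups that the updated dictionary lacks."""
    return [key for key in group_keys if key not in new_dict]

def same_kana_candidates(orphan, new_dict):
    """Entries in the updated dictionary that share the orphan's kana."""
    _, kana = orphan
    return [key for key in new_dict if key[1] == kana]

new_dict = {("バケツ", "バケツ"): ["bucket"]}
group_keys = [("馬穴", "バケツ")]

orphans = find_orphans(group_keys, new_dict)
print(orphans)
print(same_kana_candidates(orphans[0], new_dict))
```

Because the key is the pair of written form and kana, 馬穴/バケツ is not found even though バケツ/バケツ is clearly the same word, which is why the user has to confirm the match.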

Only words that need user interaction will be listed here, so hopefully there won’t be more than 2-3 words needing an update. There are currently 13 in this beta, which piled up in 3 months, and I had all N3-marked words in groups, so it is not that much.

I believe this update is so important for future development that, once it is released, I recommend that everyone using zkanji download it. Not this version, but the one coming after it won’t run with your old user data! There is a lot of junk code that was kept only for compatibility reasons, and I want to get rid of all of it.