Data handling in zkanji mini-series, Part VI.

I have been back to coding zkanji since yesterday, and I now understand much better the difficulties that made me put the project on hold for almost a year. I should read my blog more often, because I had completely forgotten about my attempt to make the example sentences data independent of the main dictionary. I stopped because the code of the whole program had become so complicated that everything depends on everything else, and even the smallest change requires, if not rewriting, then at least rereading and understanding everything.

For example, I got the example sentences data handling to the point where a new dictionary doesn’t necessarily make old example sentences data unusable. It works without complaining, but it is still a lie to say that it does not depend on the original dictionary. When the sentences file is generated, it only admits sentences that contain at least one word from the dictionary, in the exact form that word had in the main English dictionary loaded in the program at the time. If the example sentences contain a word that is not in the dictionary yet and a later update adds it, the sentences panel on the interface won’t reflect that change. This is not a big deal, but any perfectionist would say that it won’t do.

I might only be half a perfectionist (is that an oxymoron?), but as it turns out, the JLPT data uses the same structure for linking to words in the dictionary. The new “zdict.zkj” file contains not only data about the kanji but also a big list of words that have a JLPT level, along with the level itself. That list is tied to the dictionary version at the time the database was compiled. Anything I might add in the future that is in theory independent of the dictionary (that is, something more languages could use) would work the same way, so I have to fix this. The fix adds another half megabyte of memory usage. I’ll have to get used to it: adding features takes memory. So goodbye to my dreams of making a program that won’t grow exponentially. Still, in my opinion it is better to make something that works correctly than to save that extra megabyte and break the code altogether.
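To make the dependency problem concrete, here is a minimal sketch of one possible way to decouple such data from a dictionary version. It is not zkanji's actual code or file format. It assumes the JLPT list stores each word's written form and reading instead of a position in the dictionary, and resolves those keys against whichever dictionary is loaded. The names `WordKey`, `build_index` and `resolve_levels`, the tuple layout of the dictionary, and the sample levels are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordKey:
    """Identifies a word by written form and reading, not by position."""
    kanji: str
    kana: str


def build_index(dictionary):
    """Map each word's key to its current index in the given dictionary."""
    return {
        WordKey(kanji, kana): i
        for i, (kanji, kana, _meaning) in enumerate(dictionary)
    }


def resolve_levels(jlpt_levels, dictionary):
    """Attach JLPT levels to whatever entries the loaded dictionary has.

    Returns (levels keyed by dictionary index, keys with no matching entry).
    """
    index = build_index(dictionary)
    resolved, unresolved = {}, []
    for key, level in jlpt_levels.items():
        if key in index:
            resolved[index[key]] = level
        else:
            unresolved.append(key)
    return resolved, unresolved


# Hypothetical sample data: (kanji, kana, meaning) tuples, illustrative levels.
old_dictionary = [("猫", "ねこ", "cat")]
new_dictionary = [("犬", "いぬ", "dog"), ("猫", "ねこ", "cat")]
jlpt_levels = {WordKey("猫", "ねこ"): 5, WordKey("犬", "いぬ"): 5}

old_resolved, old_missing = resolve_levels(jlpt_levels, old_dictionary)
new_resolved, new_missing = resolve_levels(jlpt_levels, new_dictionary)
```

With the old dictionary, the entry for 犬 simply stays unresolved. Once an update adds the word, the same JLPT data picks it up without being regenerated. Storing written forms instead of bare indexes takes more memory, which is the kind of trade-off mentioned above.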

I want to finish fixing this problem today or tomorrow, and if I succeed, I will have to face the other great obstacle: export/import, which the new dictionary format has broken. (Once that’s out of the way, there are a few things I have to tune, but nothing serious, and then you will get a test version.)

(UPDATE: fix done without running it once, testing comes tomorrow!)
(UPDATE2: testing revealed a few bugs that I fixed. There is a small performance problem with loading the example sentences though. I’m not sure whether the new loading time is acceptable on low-end machines.)

Writing the code for export and import is not difficult. I could even say it is pretty easy compared to some other stuff in zkanji. The problem is not the writing but designing a good file format and an easy-to-use interface. Around last April I was getting ready to do it, but after racking my brains for two whole days I couldn’t come up with anything usable, and gave up for the time being (telling myself I would do it in a week). Why? Because I have no idea what my users need. I would have no use for the feature myself (except when I have to share the Hungarian-Japanese dictionary I’m working on), so I don’t know what makes sense. I quickly realized that exporting and importing everything in every combination is not possible, or at least can easily break the data. I imagined a window where you can check what you want to export: groups (kanji and words separately), study group data, long-term study data, words from the dictionary (as the dictionary is editable, people would obviously want to share their changes), etc. I also don’t know yet how to handle the case where people want to share data between different dictionary versions. Should the exported group data contain the whole word with all its meanings and word types, or only the indexes? Or should this even be selectable, creating another heap of problems to solve?
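The "whole word or only the indexes" question can be illustrated with a small sketch. This is only one hypothetical option, not a format zkanji uses. It assumes JSON as the container and a dictionary held as a list of records with `kanji`, `kana` and `meanings` fields. The function names `export_group` and `import_group` are invented for the example.

```python
import json


def export_group(name, word_indexes, dictionary):
    """Serialize a word group with full word data instead of bare indexes."""
    words = [
        {
            "kanji": dictionary[i]["kanji"],
            "kana": dictionary[i]["kana"],
            "meanings": dictionary[i]["meanings"],
        }
        for i in word_indexes
    ]
    return json.dumps({"group": name, "words": words}, ensure_ascii=False)


def import_group(text, dictionary):
    """Match exported words against the importer's own dictionary.

    Returns (indexes of matched words, exported records with no match).
    """
    data = json.loads(text)
    lookup = {(w["kanji"], w["kana"]): i for i, w in enumerate(dictionary)}
    matched, missing = [], []
    for word in data["words"]:
        key = (word["kanji"], word["kana"])
        if key in lookup:
            matched.append(lookup[key])
        else:
            missing.append(word)
    return matched, missing


# Hypothetical dictionaries of two users on different versions.
sender_dict = [
    {"kanji": "猫", "kana": "ねこ", "meanings": ["cat"]},
    {"kanji": "辞書", "kana": "じしょ", "meanings": ["dictionary"]},
]
receiver_dict = [
    {"kanji": "辞書", "kana": "じしょ", "meanings": ["dictionary"]},
]

exported = export_group("my group", [0, 1], sender_dict)
matched, missing = import_group(exported, receiver_dict)
```

Bare indexes would have pointed at the wrong entry in the receiver's dictionary. With whole words, the importer can at least tell which entries it does not have and decide what to do with them (skip them, or offer to add them). The price is a larger file and the need to decide what counts as "the same word".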

So I had to realize that I need help (and I don’t mean introducing me to a good psychologist). What real-world uses can you think of for export/import? I want to design this feature the way I usually design other features: by asking what people would use it for, and not from the programmer’s point of view, that is, by asking what the largest set of functionality is that is easy to write (and probably hard to use).

With this I close this mini-series, which didn’t have that much to do with data handling, but helped me start working on zkanji again. I’ll have to make up a new post title for future writings…
