Data handling in zkanji mini-series, Part III.

In the past two entries I described what is in the data files included with the program. In this part let me write a bit about data that is generated by the user, which must be saved and restored.

Since Windows Vista, running programs cannot create files in certain folders unless they are given administrator privileges or are run by an administrator. The Program Files folder is one such location. Unless zkanji is “installed” in such a folder, it keeps user data files in the “data” folder next to the executable. Otherwise user data files are saved in the user’s document folder. There can be 2 user files for each user dictionary. If you only use the English dictionary, though, there is just a single file, as there is no dictionary data to be saved.

As I wrote in the previous entry, user-made dictionaries are saved in exactly the same format as the main dictionary data file, except that the kanji data that stays the same for all languages (everything apart from the meaning of a kanji) is not written. There is no JLPT data stored in these files either. The other file saved is the group / study progress file with the .zkd extension. A user dictionary is unnecessary for English, but the group file is created even in that case. The group file stores which kanji have been added to kanji groups and which words have been added to word groups. It must also contain study progress, which is mainly a list of words and their standing in some study group or the long-term study list.

It is less obvious how a word, or its identifier, is saved and loaded in groups and the study list. One possibility would be to store a unique index number for each word (and probably for each meaning where it makes sense) that gives the word’s position in the dictionary, but unfortunately this approach wouldn’t work. With each update of the program the English dictionary is updated too. I have no control over how that is done, and when the JMDict data changes, it is very common for the indexes of words to change as well. If I just saved the user data with an index number, then once the program and its dictionary were updated, most words in word groups and study groups would not be the ones that should be there.

The best workaround I could find was saving both the kanji and the kana form of every single word that was added to a group or to a study list. This can increase the size of user files considerably, but more importantly it affects loading times: huge user data files load much more slowly than would otherwise be necessary. Even if you keep zkanji on an SSD, the slow part is not reading the file into memory but looking up every word of the user data in the dictionary to find its current index. This is all the more a pity since updates are rare nowadays (sorry) and indexes only change when the dictionary changes.

The situation is even worse, though, because the meanings of words often change in the dictionary as well. What was the first meaning could be the third in the next release, or a meaning might be split into several meanings. Unfortunately I couldn’t find any solution to this problem (until now).
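To make the kanji-and-kana workaround more concrete, here is a minimal Python sketch of the idea. It is not zkanji's actual code: the `Entry` type, the function names and the use of a hash map for the lookup are all assumptions made for illustration. It only shows why a saved (kanji, kana) pair survives a dictionary update that shifts indexes, and why every saved word costs one lookup at load time.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical dictionary entry; the real file format is different."""
    kanji: str      # written form
    kana: str       # reading
    meanings: tuple


def build_lookup(dictionary):
    """Map each (kanji, kana) pair to the entry's current index."""
    return {(e.kanji, e.kana): i for i, e in enumerate(dictionary)}


def resolve_saved_words(saved, dictionary):
    """Turn saved (kanji, kana) pairs into current dictionary indexes.

    Returns the resolved indexes and the pairs that no longer exist.
    """
    lookup = build_lookup(dictionary)
    resolved, missing = [], []
    for key in saved:
        index = lookup.get(key)
        if index is None:
            missing.append(key)
        else:
            resolved.append(index)
    return resolved, missing


old_dictionary = [
    Entry("食べる", "たべる", ("to eat",)),
    Entry("飲む", "のむ", ("to drink",)),
]
# An update inserts a new entry, which shifts the index of every later word.
new_dictionary = [
    Entry("食べる", "たべる", ("to eat",)),
    Entry("走る", "はしる", ("to run",)),
    Entry("飲む", "のむ", ("to drink",)),
]

# The user's word group stores written form and reading, not an index.
saved_group = [("飲む", "のむ")]
resolved, missing = resolve_saved_words(saved_group, new_dictionary)
```

In this toy example the saved word had index 1 in the old dictionary and resolves to index 2 in the new one. A stored index of 1 would now point at a different word. The sketch does nothing about the second problem, meanings that are reordered or split between releases.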

Once I release the next zkanji, what I wrote in this entry will be out of date. The next version, which has been mostly done for more than 6 months now, does things differently and more reliably, at the price of taking up a lot more disk space. This “lot more” is in the tens of megabytes, an amount which in my opinion is nothing to fret about. In the next part I’ll try to explain the changes. I have already written about what the user will see when the dictionary is updated, but this time I will explain how it concerns the data files.

Categories: Under-the-hood
  1. himselfv
    February 5, 2013 at 8:37 am

    > Unless zkanji is “installed” in such a folder, it keeps user data files in the “data” folder which is next to the executable.

    I wonder whether you keep that preference in some configuration file or just check whether the current folder is protected (or maybe just whether it’s Program Files). If the latter, how do you handle the border cases? Running from Program Files while still having (non-writeable) user files in the same dir, or running from a writeable non-Program Files folder with no user files there but with user files in the user data folder…
    If you just check for Program Files, what about installing the app standalone into some other folder? “C:\My Programs\zkanji” – nope? Only portable version?

    • February 5, 2013 at 11:31 am

      zkanji is only portable if you downloaded its zip package and uncompressed it to some user-writable folder. The installer (zkanji_?_???_Setup.exe) creates registry entries for the uninstaller. Also, if you enable the setting to run the program at system startup, it writes to the system registry, which is non-portable.

      When zkanji starts, it first checks whether the data folder is writable (by trying to write a dummy file there). If it is, the program looks no further; otherwise it tries to find the current user’s “application data” folder in their “user folder”, and writes there.

      In the first case the program does not look further for user data. It’s either in the “data” folder where the executable is, or it gets created there. In the second case the program tries to locate the user files, first in the “roaming” user folder (which can be on another computer) and, if unsuccessful, in the “local” user folder. If none are found, the user data is written to one of them, depending on which one the program can write to (checking the “roaming” user folder first here too). If neither is possible, zkanji won’t run.
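The startup check described in this reply could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not zkanji's code: the function names are invented, the three folders are passed in as parameters instead of being queried from Windows, and user files are recognised only by the .zkd extension mentioned in the post.

```python
import os
import tempfile
from pathlib import Path


def is_writable(folder: Path) -> bool:
    """Probe a folder by creating and deleting a dummy file in it.

    Assumption: the folder is created first if it does not exist yet.
    """
    try:
        folder.mkdir(parents=True, exist_ok=True)
        fd, name = tempfile.mkstemp(prefix="probe_", dir=folder)
        os.close(fd)
        os.remove(name)
        return True
    except OSError:
        return False


def has_user_files(folder: Path, pattern: str) -> bool:
    """True if the folder exists and holds at least one matching file."""
    return folder.is_dir() and any(folder.glob(pattern))


def choose_user_data_dir(exe_data: Path, roaming: Path, local: Path,
                         pattern: str = "*.zkd") -> Path:
    # Case 1: the "data" folder next to the executable is writable,
    # so the search stops here.
    if is_writable(exe_data):
        return exe_data
    # Case 2: look for existing user files, "roaming" first, then "local".
    for folder in (roaming, local):
        if has_user_files(folder, pattern):
            return folder
    # Nothing found: use the first folder that can be written to.
    for folder in (roaming, local):
        if is_writable(folder):
            return folder
    raise RuntimeError("no writable location for user data")
```

Probing with a real write, rather than inspecting permissions or comparing the path against Program Files, means the check also works for any other protected folder.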

      So to answer your question, the installer makes the program non-portable, and the program can make itself non-portable if you put it in some folder with read-only access or if you make it run at system startup.

  2. himselfv
    February 6, 2013 at 2:56 pm

    Interesting, thank you for the explanation.

    I considered your method (I’m solving the same task), but ultimately went with this: if there’s an ini file in the same directory, do as it says (standalone/portable) and don’t try to be smart. This allows shipping pre-configured standalone or portable packages. Also, no matter where you copy your portable version, it’ll stay portable and just die if the folder is not writeable. (I think that’s better than silently switching to AppData, because the user will be confused: “I copied my portable app to Program Files and now my vocabulary appears empty! Where did my words go?”)

    If there’s no ini file, then pop up a dialog asking the user to choose the mode, and create one.
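A minimal Python sketch of the ini-file approach from this comment might look like this. The file name `app.ini`, the `[storage]` section, the `mode` key and the `ask_user` callback (standing in for the dialog) are all hypothetical; the comment does not say what the real file contains.

```python
import configparser
from pathlib import Path


def load_mode(app_dir: Path, ask_user) -> str:
    """Return the stored mode, asking the user once if no ini file exists."""
    ini_path = app_dir / "app.ini"      # hypothetical file name
    config = configparser.ConfigParser()
    if ini_path.is_file():
        # An ini file is present: do as it says, no guessing.
        config.read(ini_path, encoding="utf-8")
        return config["storage"]["mode"]
    # No ini file yet: ask the user (a dialog in a real program) and save it.
    mode = ask_user()                   # e.g. "portable" or "standalone"
    config["storage"] = {"mode": mode}
    # If the folder is not writable, this raises and the program stops,
    # instead of silently switching to another location.
    with ini_path.open("w", encoding="utf-8") as ini_file:
        config.write(ini_file)
    return mode
```

Called as `load_mode(Path("."), lambda: "portable")`, the first run writes the ini file and later runs read the mode back without asking again.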

    • February 6, 2013 at 4:49 pm

      There is no point in uploading a pre-configured version of zkanji, so there can’t be a problem with such packages. I chose to make it work this way when I was making the setup program, because the installer makes the program non-portable anyway. If someone is comfortable enough with computers to handle zip files (many people are afraid to use things that are not automatic), they will be sensible enough to put the program in some accessible location. If not, they can still come to complain.
