User data backup problems and solution(?)

May 5, 2013

Creating a usable and safe backup system is my last goal for the next release, before I go over the user-reported bugs and complaints. Like most things that seem simple at first glance, this is not as easy as it looks.

In past releases, zkanji created a copy of each successfully loaded file in the user data folder, with the TEMP extension. (Thinking about it, isn’t the TEMP extension a bit misleading?) The user had a single safe(?) copy of the data files that had loaded correctly, or at least hadn’t generated an immediate error. Each new backup overwrote the previous one. This solution worked fine in the utopian world in my mind, that is, as long as every problem showed up as an error on load (and such errors are not very likely to begin with). Unfortunately, there has been at least one case where a user only noticed a few days later that something was wrong with his/her data. By then the faulty files had presumably loaded without error and overwritten the good copy, so our simple backup obviously doesn’t solve this situation.

The obvious solution would be to keep a backup of all user data files for the past few days, or even weeks. I started working on this, but a few things have bugged me about it all along. In the new data handling system, users will be able to change their main English dictionary, so a safe copy of it must be made too. The dictionary file is nearly 25 megabytes, and even without the few additional kilobytes of user data, making several backups of this size is not acceptable. As I’m also working on a dictionary in a different language, the total size for me would be nearly 35 megabytes. In my experience, at least 2 weeks of daily backups are necessary to be on the safe side, which comes to 350 megabytes normally, and in my case nearly half a gigabyte! We can probably do better than that.
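The arithmetic behind those numbers, as a quick sketch. The sizes are the rounded figures from the paragraph above, and one full copy per day is my reading of "2 weeks of backup"; the names are made up for the example.

```python
# Rounded figures from the post; one full copy per day is an assumption.
DICTIONARY_MB = 25   # main English dictionary
MY_DATA_MB = 35      # with a second dictionary in another language
DAYS_KEPT = 14       # two weeks of backups


def backup_total_mb(size_mb: int, copies: int) -> int:
    """Total disk space used when every backup is a full copy."""
    return size_mb * copies


print(backup_total_mb(DICTIONARY_MB, DAYS_KEPT))  # 350
print(backup_total_mb(MY_DATA_MB, DAYS_KEPT))     # 490
```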

If someone never changes his or her main dictionary, and the only files to save are groups or study data, skipping unchanged files can keep the size of the backups to a minimum. This looks like a good solution, but unfortunately it takes the complications to a whole new level. How can we know that the main data file has not been changed? We could read it and compare it to the unchanged dictionary data (there is a data file which is never touched, but is required for the update system to work). Comparing files is slow, though, and nobody would want to wait the additional seconds every time zkanji creates new backups. I also thought of comparing file times, but if a user unintentionally changed the main dictionary and reverted the changes later, the file times would differ while the data is the same. Not to mention the case when the user data is on some central server, and the files have to be read and written several times over a network. (I know of at least one such case.)
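To make the comparison idea concrete, here is a minimal sketch in Python (zkanji itself is not written in Python, and the function and file names are hypothetical). It checks the file sizes first, which is cheap, and only reads the contents when the sizes match. File times are deliberately ignored, for the reason given above.

```python
import filecmp
import os


def main_dictionary_changed(user_file: str, pristine_file: str) -> bool:
    """Return True if the user's dictionary differs from the untouched copy."""
    # Different sizes mean different contents; neither file has to be read.
    if os.path.getsize(user_file) != os.path.getsize(pristine_file):
        return True
    # Same size: compare the actual bytes. shallow=False makes filecmp read
    # the contents instead of trusting the size and modification time.
    return not filecmp.cmp(user_file, pristine_file, shallow=False)
```

The slow case is still there: for two files of equal size, both have to be read in full, which is exactly the cost that hurts with a 25 megabyte file on a network drive.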

As I have decided not to do any kind of complicated magic that could also be slow, a compromise is forming in my head. (This is just the current idea, which can be rejected in the next second.) Keeping 2-3 backups of each file doesn’t seem to be that much of a burden. If the files are backed up at longer intervals, for example every 4-5 days, and are not kept for too long, the user can enjoy relative safety at a relatively low cost. Data loss happens, but this way only a few days’ worth of data would be lost. If the user only notices a problem a week later, this is still better than losing everything. (In case you have terabytes of space for backups, you will be able to tweak the interval of days and the number of backups in the settings.) Safe copies of the data would be created once on startup. For the kind of person who doesn’t power off their computer for months, I’m considering checking the running time of the program as well, and creating a copy when the time comes.
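A rough sketch of how such a rotation could work, again in Python and not taken from zkanji's code. The function name, the timestamped file naming and the defaults (every 4 days, keep 3 copies) are assumptions picked from the ranges mentioned above.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

STAMP = "%Y%m%d-%H%M%S"  # hypothetical suffix; sorts chronologically


def backup_if_due(data_file: Path, backup_dir: Path, now: datetime,
                  interval: timedelta = timedelta(days=4),
                  keep: int = 3) -> bool:
    """Copy data_file into backup_dir if the newest backup is older than
    interval, then delete all but the newest `keep` backups (keep >= 1).
    Returns True if a new backup was made."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    prefix = data_file.name + "."
    existing = sorted(p for p in backup_dir.iterdir()
                      if p.name.startswith(prefix))
    if existing:
        newest = datetime.strptime(existing[-1].name[len(prefix):], STAMP)
        if now - newest < interval:
            return False  # the latest backup is still fresh enough
    target = backup_dir / (prefix + now.strftime(STAMP))
    shutil.copy2(data_file, target)
    existing.append(target)
    for old in existing[:-keep]:
        old.unlink()  # drop the oldest copies beyond the limit
    return True
```

Calling this once on startup covers the normal case. Calling it again from a timer while the program runs would cover the computer that is never switched off, since the function does nothing until the interval has passed.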