New students might not know this, but before 2010 there were official lists containing approximately 80% of the words that might come up in the JLPT. These lists were published in books, and if you learned all of them (and knew the necessary grammar, had decent listening comprehension, etc.), it wasn’t difficult to achieve a passing score. I’m not saying that learning thousands of words is easy, but at least you had an idea of what was expected of you. In 2010 the official lists were abandoned, and with the introduction of a new level (N3, inserted in the middle of the previous 4-level system) it became even less obvious what must be studied. Several sites have popped up claiming to have a reworked list of the old JLPT vocabulary for the new system, so students still have some crutches to help them.
This is probably not news for most people coming here, but every post must have some kind of introduction, right? :p
I was thinking about what could be the reason behind the decision not to publish official vocabulary lists, and I think the answer is that the test makers don’t have such lists anymore, not that they want to hide them from test takers. Most proficiency tests, whatever the language, don’t have lists either, so this is not surprising, but in that case, how do you decide which level you are at? I don’t know the answer, but test makers and teachers discuss it, so there must be some guidelines at least.
The next obvious question is whether we can use the old JLPT lists for studying, and whether sites with the original or updated lists are useful. I think the answer is yes: even the originally published vocabulary didn’t cover 100% of the words that needed to be learned, but it helped. All languages evolve with time, but not by that much. In conclusion, I believe that learning from the available lists still helps, though I wouldn’t go as far as to say that they give as much confidence as they did before the change.
After a 3-month break I’m working on the JLPT list again. I have to check the definitions of the remaining 1600 items, which can hopefully be done in a few weeks. The definitions will mainly come from the dictionary, but I want to shorten the longer ones. Once I’m done, I plan to release the list, though I don’t know what format would be best.
After some work on the N3/N2 word placements, I found that it makes no sense to keep the progress report on the sidebar. I went through a book written for N3 and marked most of the words I found in it. I don’t agree with some of its choices, but for the most part it’s OK. I’ll remove the mark from words where I disagree, and when that’s done, a little programming work will set the marked words (plus some similar words) as N3, and the final list will be done. I can hopefully post my N3 kanji list very soon, and then I’ll upload a new zkanji version that will not only have N3 kanji but will also indicate the N level of words. (The automatic word selection for the long-term study list will only come after that.)
I haven’t yet decided whether I want to release my version of the JLPT vocabulary list into the public domain, but unless someone asks to use it in their program, it doesn’t matter anyway. (Even less so before the list is done.)
UPDATE: The N3 kanji list is now final, but it seems that some important words are missing from the current JLPT vocabulary list, words that have most probably been part of the JLPT since the 2010 update. These are mainly words that don’t have any kanji; インターネット, for example, was not part of the old list. The first missing word I noticed while checking the example test on the official JLPT site was ホテル, which for some reason was not included in the old vocabulary lists, even though it could well be in N5.
In case you haven’t noticed, I have added a little report on the progress of the JLPT word list on the right side of this blag. It won’t be worth coming back to check the numbers every day, though. If you make it a weekly visit, you might see some progress.
I want to finalize a meaning for every word, and decide for most words whether they belong in the N2 or the N3 word list. Because I automatically used the dictionary definition of a word whenever it was shorter than 45 characters, it looks like there has been great progress already, but the truth is that I can hardly check 100 words a day (no time and no patience). So it will still take quite some time.
I’m making progress on an N3 list that can point students in the right direction, but be aware that it won’t be a definitive list, just an “opinion”. I have more or less finished writing the algorithm that creates an initial list from the more frequent words, and I’m about to explain how it works. I have seen how others generated their lists and, as I wrote in a previous post, I wasn’t convinced.
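The 45-character rule above amounts to splitting the dictionary definitions into two groups. A minimal Python sketch, assuming each word maps to a single definition string; the names `MAX_AUTO_LEN` and `split_definitions` are hypothetical and not taken from zkanji:

```python
# Hypothetical threshold from the post: definitions shorter than this are accepted as-is.
MAX_AUTO_LEN = 45

def split_definitions(entries):
    """Split a {word: definition} dict into auto-accepted definitions and words to review."""
    auto, review = {}, []
    for word, definition in entries.items():
        if len(definition) < MAX_AUTO_LEN:
            auto[word] = definition   # short enough: use the dictionary text directly
        else:
            review.append(word)       # too long: needs a manually shortened definition
    return auto, review
```

Only the words in `review` need manual work, which is why the progress counter can jump ahead of the actual checking.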
For example, www.jlptstudy.com included in the N3 list the kanji from Jouyou grades 1 to 4 that were not in N4 or N5 (3 and 4kyuu). This way it got a believable kanji count, but JLPT kanji had no direct connection to the Jouyou grades in the old system, so why would they now? (The Jouyou kanji are the kanji Japanese children learn in school, and the Jouyou grades correspond to the school years.)
I asked the author of www.tanos.co.uk about his list of N3 words. He told me that it was generated from the old 2kyuu word list, and the decisions were based on the Tanaka Corpus, the example sentence data zkanji uses too. (The Tanaka Corpus is a collection of sentences made with help from many people, and it is still being revised by enthusiasts. It was meant to help students see how the words in the dictionary are actually used, not to serve as the basis of a study plan.) I don’t know whether kanji were taken into consideration when picking the words, but the words on the page contain 1073 kanji (or 1305 if higher levels are included), which is impressive considering that it’s close to the number required for N2. (N2 words contain 1633 kanji, though probably only around 1200 are really required for the test.)
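The grade-based selection described above is essentially a set difference. A short Python sketch of that idea as described here, not jlptstudy’s actual code; `jouyou_grade` (a mapping from kanji to school grade) and `n45_kanji` are assumed inputs:

```python
def guess_n3_kanji(jouyou_grade, n45_kanji):
    """Kanji taught in Jouyou grades 1 to 4 that are not already N4/N5 (3 and 4kyuu) kanji."""
    return {
        kanji
        for kanji, grade in jouyou_grade.items()
        if 1 <= grade <= 4 and kanji not in n45_kanji
    }
```

The result depends only on school grades, which is exactly the weakness pointed out above: nothing in it looks at how JLPT levels were actually assigned.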
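Counts like “1073 kanji” come from collecting the distinct kanji in every word of a list. A minimal Python sketch, assuming kanji are detected by the CJK Unified Ideographs block (U+4E00 to U+9FFF), which ignores rare kanji in the extension blocks; `is_kanji` and `distinct_kanji` are hypothetical helper names:

```python
def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block only; rare kanji in extension blocks are not counted.
    return "\u4e00" <= ch <= "\u9fff"

def distinct_kanji(words):
    """Return the set of distinct kanji used across a list of words."""
    return {ch for word in words for ch in word if is_kanji(ch)}
```

For a whole word list, `len(distinct_kanji(word_list))` gives the kind of number quoted above.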
Now, I don’t want to say that my method is better or more reliable, but it’s only fair to tell you how it works so you know what to expect, without getting into technical details nobody really cares about.
The first step is to create an order of all kanji. The order is based on many things: kanji frequency from KANJIDAT, the number of words the kanji appears in, the frequency of those words, the number of example sentences for those words, etc. These are all weighted; for example, I don’t consider the example sentence count very important. I change these weights until I get an order I like. BUT this is not the N3 kanji list, as it contains kanji from all levels.
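One way to picture the weighted kanji order is a linear score over normalized features. This is a Python sketch under stated assumptions, not the actual zkanji algorithm: the frequency is assumed to be a rank (1 = most frequent), each feature is scaled to 0..1 by its maximum, and the `WEIGHTS` values are invented placeholders, with example sentences deliberately given little weight as the post describes:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; in practice these are tuned until the order "looks right".
WEIGHTS = {"freq": 4.0, "words": 2.0, "word_freq": 3.0, "examples": 0.5}

@dataclass
class KanjiStats:
    kanji: str
    freq_rank: Optional[int]  # frequency rank, 1 = most frequent; None if unranked
    word_count: int           # number of dictionary words containing the kanji
    word_freq_sum: float      # summed frequency of those words
    example_count: int        # example sentences for those words

def normalize(values):
    """Scale non-negative values so the largest becomes 1.0."""
    top = max(values)
    return [v / top if top else 0.0 for v in values]

def order_kanji(stats, weights=WEIGHTS):
    """Return all kanji sorted by weighted score, most important first."""
    freq = normalize([1.0 / s.freq_rank if s.freq_rank else 0.0 for s in stats])
    words = normalize([s.word_count for s in stats])
    wfreq = normalize([s.word_freq_sum for s in stats])
    examples = normalize([s.example_count for s in stats])
    scores = {
        s.kanji: weights["freq"] * f + weights["words"] * w
        + weights["word_freq"] * wf + weights["examples"] * e
        for s, f, w, wf, e in zip(stats, freq, words, wfreq, examples)
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Changing the weights and re-running is the “change these weights until I get an order I like” loop; the output still covers kanji of every level.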
In the second step I create an order of all words, but this time I only include those that were in the old 2kyuu list (new N2), because that’s the only official data I have. The order is again based on weighted parameters: the average position of the word’s kanji in the previously generated kanji order, the word frequency, the average old JLPT level of the kanji, and finally the example sentence count (once again not given much weight). This is still NOT the N3 word list.
In the third and final step, the program goes over the generated word list in order, collecting the kanji that were old 2kyuu (or N2; everyone has learned these numbers by now) until it reaches a set amount (currently 365, which together with the 3 and 4kyuu kanji gives 649). I consider the collected kanji and words N3, but the algorithm doesn’t stop there: it keeps collecting words, but only those with kanji already in the N3 kanji list decided so far. This way I can generate N3 kanji and N3 word lists that are connected to each other.
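The word order can be sketched the same way. In this Python sketch, which is an illustration and not the author’s implementation, each of the four parameters is mapped to 0..1 with “better” closer to 1: an earlier average kanji position, higher frequency, easier average old level (4kyuu is easiest), and more example sentences. The `Word` fields, the `WEIGHTS` values and the neutral 0.5 for kana-only words are all assumptions:

```python
from dataclasses import dataclass

# Hypothetical weights; example sentences again get little weight.
WEIGHTS = {"kanji_pos": 3.0, "freq": 3.0, "kanji_lvl": 2.0, "examples": 0.5}

@dataclass
class Word:
    text: str
    old_level: int      # old JLPT level of the word (4 = easiest ... 1 = hardest)
    frequency: float    # word frequency, higher = more common
    example_count: int  # number of example sentences
    kanji: str          # kanji in the word ("" for kana-only words)

def order_words(words, kanji_order, kanji_old_level, weights=WEIGHTS):
    """Order old 2kyuu words by weighted score, best N3 candidates first."""
    position = {k: i for i, k in enumerate(kanji_order)}
    candidates = [w for w in words if w.old_level == 2]
    last = max(len(kanji_order) - 1, 1)
    max_freq = max((w.frequency for w in candidates), default=0) or 1
    max_ex = max((w.example_count for w in candidates), default=0) or 1

    def score(w):
        ks = [k for k in w.kanji if k in position]
        if ks:
            # Average position in the kanji order, flipped so earlier kanji score closer to 1.
            kanji_pos = 1 - sum(position[k] for k in ks) / len(ks) / last
            # Average old level mapped so 4 (easy) -> 1.0 and 1 (hard) -> 0.0; unknown counts as 1.
            kanji_lvl = (sum(kanji_old_level.get(k, 1) for k in ks) / len(ks) - 1) / 3
        else:
            kanji_pos = kanji_lvl = 0.5  # assumed neutral value for kana-only words
        return (weights["kanji_pos"] * kanji_pos
                + weights["freq"] * w.frequency / max_freq
                + weights["kanji_lvl"] * kanji_lvl
                + weights["examples"] * w.example_count / max_ex)

    return sorted(candidates, key=score, reverse=True)
```

The `kanji_order` argument is the output of the first step, which is how the two orders are chained together.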
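The collection step can be written as a single pass with two phases. A Python sketch, assuming the ordered word list from the previous step, a hypothetical `word_kanji` mapping from word to its kanji, and a set of old 2kyuu kanji; the default `target` of 365 is the number from the post. Because a word is accepted whole, the last phase-one word can push the kanji count slightly past the target:

```python
def collect_n3(word_order, word_kanji, n2_kanji, target=365):
    """Return (n3_kanji, n3_words) collected from an ordered word list."""
    n3_kanji, n3_words = set(), []
    for word in word_order:
        # Only old 2kyuu kanji count; 3/4kyuu kanji are assumed to be known already.
        needed = {k for k in word_kanji.get(word, "") if k in n2_kanji}
        if len(n3_kanji) < target:
            # Phase 1: still collecting kanji, so take the word and its 2kyuu kanji.
            n3_kanji |= needed
            n3_words.append(word)
        elif needed <= n3_kanji:
            # Phase 2: quota reached; take only words whose 2kyuu kanji are already N3.
            n3_words.append(word)
    return n3_kanji, n3_words
```

Kana-only words have no 2kyuu kanji, so in phase 2 they always pass the subset check.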
But this is as far as automatic algorithms can go. The fourth and really final step is manually going over all N3 and N2 words and changing their levels if I decide they were put in the wrong place. I mainly base my decisions on intuition, but the sample N3 test concentrates on everyday topics like study and work, so I can pay attention to words that might come up in such topics. After the final manual decisions are made, I’ll compute a new N3 kanji list based on those words. Then I can mark words that are in the N3 vocabulary list but contain kanji not in that kanji list as “don’t test kanji for this word on this level”, and the work is done.
…No, that’s when the real work begins, because after that I’ll have to check each and every word and give it a different definition if I don’t like the one it already has. (Around 7000 words; all the others are either duplicates or the same word in “no kanji” / “kanji” versions.)
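The marking rule from the last step can be expressed as a per-word check. A minimal Python sketch, assuming `allowed_kanji` holds the recomputed N3 kanji plus all N4/N5 kanji, and detecting kanji by the CJK Unified Ideographs block; the function names are hypothetical:

```python
def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block only; rare kanji in extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"

def mark_kanji_testing(n3_words, allowed_kanji):
    """Map each N3 word to True if its kanji may be tested at this level, else False."""
    return {
        word: all(ch in allowed_kanji for ch in word if is_kanji(ch))
        for word in n3_words
    }
```

Words mapped to `False` would get the “don’t test kanji for this word on this level” flag; kana-only words are trivially `True`.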
I have reached the decision that I’ll either not have an N3 list of words and kanji at all (there are no reliable sources, and the free ones are made up of guesses too wild for my taste), or I’ll make my own list based on kanji/word frequency data and my own wild guesses (= experience with the language, though only through the internet, TV and novels).
There is a slight problem with the frequency data: it is based on the frequency of words in newspapers, ignoring general usage (which is probably way too difficult to measure). That might actually be an advantage when it comes to the JLPT, though.
UPDATE: I don’t believe the results would change drastically, so this poll is closed! If you missed the vote but would still like to share your thoughts on the question, please write a comment.
Don’t throw away the JLPT 1 vocabulary lists you got from enthusiasts’ Japanese sites just yet. Let me explain why. Although I have expanded the list I got from the previously mentioned site with around 300 new words from other lists I found online, the number of “unused” JLPT 1 kanji (kanji that don’t appear in any word on the list) hasn’t decreased much.
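Finding the “unused” kanji is another set difference: level kanji minus every character that appears in the vocabulary. A short Python sketch with hypothetical names, keeping the level list’s original order:

```python
def unused_kanji(level_kanji, words):
    """Kanji from level_kanji (in their original order) that appear in none of the words."""
    used = {ch for word in words for ch in word}
    return [k for k in level_kanji if k not in used]
```

Adding the ~300 extra words only shrinks this result when they contain one of the missing kanji.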
Here is the complete list of those kanji:
It might not be obvious at first, but most of these kanji are not ones you would often see in actual use. So why were they listed for JLPT 1? My guess is that these kanji are mainly used in names. I don’t know how many names, and which ones, are required at N1 level (in the 2kyuu test I passed, there were no questions about names at all), but if I’m right, there must be a list somewhere of the names that were part of the old 1kyuu requirements.
Unfortunately zkanji has no name dictionary (yet!), but if it is true that “only” the names are missing, I can continue working on the JLPT list I got. There is still a lot of work to do. (I have to check the meaning of every single word…)