New students might not know this, but before 2010 there were official lists covering approximately 80% of the words that might come up in the JLPT. These lists were published in books, and if you learned everything on them (and knew the necessary grammar, had decent listening comprehension, etc.), it wasn’t difficult to achieve a passing score. I’m not trying to say that learning thousands of words is easy, but at least you had an idea of what was expected of you. In 2010 the official lists were abandoned, and with the introduction of a new level (N3, inserted in the middle of the previous four-level system) it became even less obvious what must be studied. Several sites popped up claiming to have reworked the old JLPT vocabulary lists for the new system, so students still have some crutches to help them.
This is probably not news for most people coming here, but every post must have some kind of introduction, right? :p
I was thinking about what could be the reason behind the decision not to publish official vocabulary lists, and I think the answer is that the test makers simply don’t have such lists anymore, not that they want to hide them from test takers. Most proficiency tests for other languages don’t have lists either, so this is not surprising, but in that case how do you decide which level you are on? I don’t know the answer, but there is discussion between test makers and teachers, so there must be some guidelines at least.
The next obvious question is whether we can use the old JLPT lists for studying and whether sites with the original or updated lists are useful. I think the answer is yes, because even the originally published vocabulary didn’t cover 100% of the words you needed to learn, and it still helped. All languages evolve with time, but not that much. In conclusion, I believe that learning from the available lists still helps, though I wouldn’t go as far as to say that they give as much confidence as they did before the change.
I’m making progress in creating an N3 list that can point students in the right direction, but be aware that it won’t be a definitive list, just an “opinion”. I have more or less finished writing the algorithm to create an initial list from the more frequent words, and I’m about to explain how it works. I have seen how others did the list generation and as I wrote in a previous post, I wasn’t convinced.
For example www.jlptstudy.com put into its N3 list the kanji that are in Jouyou grades 1 to 4 but not in N4 or N5 (the old 3 and 4kyuu). This gave it a believable kanji count, but JLPT kanji had no direct connection with the Jouyou grades in the old system, so why would they have one now? (Jouyou kanji are the kanji Japanese children learn in school, and the Jouyou grades correspond to the school years.)
I asked the author of www.tanos.co.uk about his list of N3 words, and he told me that it was generated from the old 2kyuu word list, with the decisions based on the Tanaka Corpus, the same example sentence data zkanji uses. (The Tanaka Corpus is a collection of sentences made with help from many people, still being revised by enthusiasts. It was meant to help students see how the words in the dictionary are actually used, not to serve as the basis of a study plan.) I don’t know whether kanji were taken into consideration when picking the word list, but the words on the page contain 1073 kanji (or 1305 if higher levels are included), which is impressive if we consider that it’s near the number required for N2. (N2 words have 1633 kanji, though probably only around 1200 are really required at the JLPT test.)
Now I don’t want to say that my method is better or more reliable, but it’s only fair to tell you about how it works so you know what to expect, without getting into technical details nobody really cares for.
The first step is to create an ordering of all kanji. The order is based on many things: kanji frequency from KANJIDAT, the number of words the kanji is in, the frequency of those words, the number of example sentences for those words, etc. These are all weighted; for example, I don’t consider the example sentence count too important. I change the weights until I get an order I like. BUT this is not the N3 kanji list, as it contains kanji from all levels.
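To make the first step a bit more concrete, here is a minimal sketch in Python of what a weighted kanji ordering could look like. It is not the actual zkanji code: the field names, the weight values, the min-max scaling and the sample numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KanjiStats:
    kanji: str
    frequency: float       # how common the kanji itself is (higher = more common)
    word_count: int        # number of dictionary words containing the kanji
    word_frequency: float  # summed frequency of those words
    example_count: int     # example sentences available for those words


# Hypothetical weights; example sentences deliberately count for little.
WEIGHTS = {
    "frequency": 0.4,
    "word_count": 0.25,
    "word_frequency": 0.3,
    "example_count": 0.05,
}


def scaled(values):
    """Min-max scale a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def order_kanji(stats, weights=WEIGHTS):
    """Return the kanji sorted from highest to lowest weighted score."""
    if not stats:
        return []
    columns = {name: scaled([getattr(s, name) for s in stats]) for name in weights}
    scores = [
        sum(weights[name] * columns[name][i] for name in weights)
        for i in range(len(stats))
    ]
    ranked = sorted(zip(stats, scores), key=lambda pair: pair[1], reverse=True)
    return [s.kanji for s, _ in ranked]


# Invented sample numbers, only to show the mechanics.
sample = [
    KanjiStats("速", 800, 40, 9000, 300),
    KanjiStats("報", 900, 55, 12000, 250),
    KanjiStats("鬱", 20, 5, 300, 400),
]
print(order_kanji(sample))  # ['報', '速', '鬱']
```

Tuning the order then simply means changing the numbers in WEIGHTS and looking at the result again.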
In the second step I create an ordering of all words, but this time I only include those that were in the old 2kyuu list (new N2), because that’s the only official data I have. The order is based on weighted parameters again: the average position of the word’s kanji in the previously generated list, the word frequency, the average old JLPT level of the kanji, and finally the example sentence count (once again not given too much weight). This is still NOT the N3 word list.
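The second step could be sketched along the same lines. Again this is only an illustration in Python under my own assumptions: the weights, the way each parameter is squeezed into the 0..1 range, the treatment of kanji missing from either table, and all the sample data (including the kyuu values) are invented.

```python
# Hypothetical weights; the example sentence count barely matters.
WEIGHTS = {"position": 0.4, "frequency": 0.35, "level": 0.2, "examples": 0.05}


def is_kanji(ch):
    """Rough check: is the character in the main CJK ideograph block?"""
    return "\u4e00" <= ch <= "\u9fff"


def mean(values, default):
    values = list(values)
    return sum(values) / len(values) if values else default


def order_words(words, kanji_order, old_kyuu, allowed, weights=WEIGHTS):
    """Order the words in `allowed` from most to least suitable.

    words:       dict word -> (frequency, example sentence count)
    kanji_order: kanji list from the first step, best first
    old_kyuu:    dict kanji -> old level (4 = easiest, 1 = hardest)
    allowed:     set of words on the old 2kyuu list
    """
    position = {k: i for i, k in enumerate(kanji_order)}
    size = max(len(kanji_order), 1)
    candidates = [w for w in words if w in allowed]
    if not candidates:
        return []
    max_freq = max(words[w][0] for w in candidates) or 1
    max_examples = max(words[w][1] for w in candidates) or 1

    def score(word):
        kanji = [c for c in word if is_kanji(c)]
        # Assumption: kanji missing from the order count as worst position,
        # kanji without an old level count as level 0.
        pos = mean((position.get(k, size) / size for k in kanji), 0.0)
        level = mean((old_kyuu.get(k, 0) / 4 for k in kanji), 1.0)
        freq, examples = words[word]
        return (
            weights["position"] * (1 - pos)
            + weights["frequency"] * freq / max_freq
            + weights["level"] * level
            + weights["examples"] * examples / max_examples
        )

    return sorted(candidates, key=score, reverse=True)


# Invented sample data.
kanji_order = ["時", "報", "速", "憂", "鬱"]
old_kyuu = {"時": 4, "速": 2, "報": 2, "憂": 1, "鬱": 1}
words = {"時速": (500, 20), "速報": (900, 10), "憂鬱": (300, 40), "時計": (2000, 50)}
allowed = {"時速", "速報", "憂鬱"}  # 時計 is left out on purpose
print(order_words(words, kanji_order, old_kyuu, allowed))  # ['速報', '時速', '憂鬱']
```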
In the third and final step, the program goes over the generated word list in order, collecting the kanji that were old 2kyuu (or N2, everyone has learned these numbers by now) until it reaches a set amount. (Currently 365, which together with the 3 and 4kyuu kanji adds up to 649.) I consider the collected kanji and words N3, but the algorithm doesn’t stop there: it keeps collecting words, but only those whose kanji are already in my decidedly N3 kanji list. This way I can generate N3 kanji and N3 word lists that are connected to each other.
But this is as far as automatic algorithms can go. The fourth and really final step is manually going over all N3 and N2 words and changing their levels if I decide that they were put in the wrong place. I mainly base my opinion on intuition, but the sample N3 test concentrates on everyday topics like study and work, so I can pay attention to words that might come up in such topics. After the final manual decision is made, I’ll compute a new N3 kanji list based on those words. Then I can mark the words that are in the N3 vocab list but contain kanji outside that list as “don’t test kanji for this word on this level”, and the work is done.
…No, it will only begin, because after that I’ll have to check each and every word and give it a different definition if I don’t like the one it already has. (That is around 7000 words; all the others are either duplicates or the same word in “no kanji” / “kanji” versions.)
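Here is a rough sketch of the third step in Python. It folds both phases into one loop by giving the new kanji a budget: once the budget is used up, only words without new kanji can still get in. That folding, the rule that a word is skipped when its new kanji don't fit the remaining budget, and the rule that words containing kanji from outside the old 2kyuu and lower lists are skipped are my simplifying assumptions for the example, and the sample data is invented.

```python
def is_kanji(ch):
    """Rough check: is the character in the main CJK ideograph block?"""
    return "\u4e00" <= ch <= "\u9fff"


def collect_n3(ordered_words, lower_kanji, old_2kyuu_kanji, limit):
    """Walk the ordered word list and build connected N3 kanji and word lists.

    ordered_words:   words from the second step, best first
    lower_kanji:     set of old 3kyuu and 4kyuu kanji (already known)
    old_2kyuu_kanji: set of kanji that were old 2kyuu
    limit:           how many new kanji N3 may contain (365 in the post)
    """
    n3_kanji = []
    n3_words = []
    for word in ordered_words:
        new = []
        acceptable = True
        for ch in word:
            if not is_kanji(ch) or ch in lower_kanji or ch in n3_kanji or ch in new:
                continue
            if ch in old_2kyuu_kanji:
                new.append(ch)
            else:
                # Assumption: kanji from outside these lists disqualify the word.
                acceptable = False
                break
        if not acceptable or len(n3_kanji) + len(new) > limit:
            continue
        n3_kanji.extend(new)
        n3_words.append(word)
    return n3_kanji, n3_words


# Invented sample data with a tiny limit so the effect is visible.
ordered = ["速報", "時速", "情報", "憂鬱", "急速"]
kanji, vocab = collect_n3(ordered, {"時", "急"}, {"速", "報", "情", "憂"}, limit=2)
print(kanji)  # ['速', '報']
print(vocab)  # ['速報', '時速', '急速']
```

In the sample, 情報 is skipped because 情 would be a third new kanji, while 急速 still gets in after the limit is reached because it needs no new kanji.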
I have reached the decision that I’ll either not have an N3 list of words and kanji (there are no reliable sources, those that are free are made up of guesses that are too wild for my taste), or I’ll make my own list based on kanji/word frequency data and my own wild guesses (=experience with the language, though only through the internet, TV and novels).
There is a slight problem with frequency data. It was based on frequency of words in newspapers, ignoring general usage (which is probably way too difficult to measure). Though that might be an advantage regarding the JLPT.
UPDATE: I don’t believe that the results would change drastically, so this poll is closed! If you missed the voting but would still like to tell your thoughts about the question, please write a comment.
Don’t throw away your JLPT 1 vocabulary lists you got from enthusiasts’ Japanese sites just yet. Let me explain why. Though I have expanded the list I got from the previously mentioned site with around 300 new words from other lists I found online, the number of “unused” JLPT 1 kanji hasn’t decreased much.
Here is the complete list of those kanji:
It might not be obvious at first, but most of these kanji are not ones you would often see in actual use. So why were they listed for JLPT 1? My guess is that these kanji are mainly used in names. I don’t know how many names and which ones are required at N1 level (in the 2kyuu I passed, there was no question about names at all), but if I’m right, there must be a list somewhere with the names that were part of the requirements of the old 1kyuu.
Unfortunately zkanji has no name dictionary (yet!), but if it is true that “only” the names are missing, I can continue working on the JLPT list I got. There is still a lot of work to do. (I have to check the meaning of every single word…)
Although the program is progressing well, I have run into yet another problem with the available data. While trying to create my own N3 list of words (as I’ve decided not to trust the available naive attempts blindly), I identified all the kanji that appear in the words of each JLPT level. The result: total chaos, mainly in 1kyuu/N1.
http://www.tanos.co.uk/, the site from which I borrowed the list of words, used the lists (with all the mistakes in them) available at http://www.jlptstudy.com. The latter only has word lists up to 2kyuu/N2, but those were taken from official JLPT material, so they must be relatively good. (I passed JLPT 2kyuu (now N2) with them.)
But how trustworthy is the JLPT 1kyuu/N1 word list? I have never tried to study for N1 and I can only guess. So let’s just look at the facts. (You can skip the following few paragraphs if you are only interested in the final result.)
There are ~3450 words in the N1 list (not including the other levels, together the number would be around 9000).
In these 3450 words, 564 kanji are N1 kanji, 607 N2 kanji, 161 N4 kanji, 93 N5 kanji, and 210 kanji are not in any JLPT level (from old official lists, so the newly introduced N3 is not counted). The sum is 1425 JLPT kanji + 210 non JLPT kanji. That is 1635 kanji used altogether. Officially there were 2230 JLPT kanji from all the levels (the real number was less, but the official JLPT kanji were changed during the years, and this 2230 includes them all.) So there are around 800 kanji missing, not used in words of the N1 list. This is an interesting result, but we might be able to find the missing ones.
There are 480 kanji in words of lower levels, not used in words at N1, which leaves us with 320 kanji missing! We are talking about JLPT kanji, and yet they were not used in any JLPT word?
I have also counted that although only 564 N1 kanji were used in N1 words, there are 199 N1 kanji that were only used in words at lower levels. So 763 N1 kanji are used in all the words of the supposed JLPT words. But there should be 1207 N1 kanji. That makes it 444 missing N1 kanji.
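For anyone who wants to double-check the arithmetic of the last few paragraphs, it can be redone in a few lines of Python. The numbers are the ones quoted above; only the variable names are mine.

```python
# Kanji found in the ~3450 N1 words, grouped by their old JLPT level.
kanji_in_n1_words = {"N1": 564, "N2": 607, "N4": 161, "N5": 93}
non_jlpt_in_n1_words = 210

jlpt_used = sum(kanji_in_n1_words.values())      # 1425
all_used = jlpt_used + non_jlpt_in_n1_words      # 1635

official_jlpt_kanji = 2230
absent_from_n1_words = official_jlpt_kanji - jlpt_used  # 805, "around 800"

only_in_lower_level_words = 480
in_no_jlpt_word = absent_from_n1_words - only_in_lower_level_words  # 325

n1_kanji_used_anywhere = kanji_in_n1_words["N1"] + 199  # 763
official_n1_kanji = 1207
n1_kanji_unused = official_n1_kanji - n1_kanji_used_anywhere  # 444

print(jlpt_used, all_used, absent_from_n1_words, in_no_jlpt_word)
print(n1_kanji_used_anywhere, n1_kanji_unused)
```

The exact difference is 805 rather than 800, so the "320 missing" figure comes out as 325 when nothing is rounded.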
If you compare the numbers, 480 JLPT kanji (from all levels) are not used in any JLPT word, while 444 N1 kanji are not used in any JLPT word. Which means that almost all the missing kanji are from N1, and that’s not a small number! If you also consider that there were 210 non-JLPT kanji in the list of N1 words, that’s enough to make anyone uncertain. I would rather not doubt the validity of the official 1kyuu/N1 kanji list, but there is no assurance about the validity of the unofficial N1 word list.
So once again, I have to find another site with a different N1 word list (or rather more sites) just to make sure. Unfortunately this will slow down my progress quite a bit…
I’ve contacted Jonathan Waller of http://www.tanos.co.uk/ who was kind enough to let me use his JLPT vocabulary lists in zkanji. (Actually anyone can use them freely if they comply with the CC-BY license.) The hard work comes only after this, because I will have to create my own data from them, and this cannot be done automatically. I could probably write a little script that does the hard work, but then I would have to go through the lists checking for missing words and differences, so doing everything by hand seems to be a safer method. But the work won’t stop there.
Have you ever wondered what the best order to study vocabulary is? I think this is a difficult question, and there is no definite answer to it. Programs like Anki or the current zkanji just throw the words at you in random order or, in the case of Anki, in the order you want. This probably allows the student to come up with a study plan or to use some textbook’s vocabulary and learn it in order. But is this really the best approach when you want to get ready for the JLPT? Especially if kanji are added to the mix and you have limited knowledge of them.
When I was getting ready for the JLPT’s level 2 in 2009, I came up with my own order of study. I based that order on kanji, and it worked pretty well. The main idea was to study 3-6 words with each kanji at most, and only study words that have a single kanji that I haven’t seen before. For example I picked 速い. After learning that, I went for 時速, 急速 and 速報. I have seen 報 now so I could go for words like 電報 or 情報. I also had to take kanji readings into consideration, as I wanted to be able to read unknown words I see for the first time as well, to be able to look them up in the dictionary easily. So I picked several words where the same kanji had the same reading, and repeated this with most common readings of that kanji. With time I acquired all of JLPT2’s vocabulary this way.
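The core rule of that study order, "only pick a word if at most one of its kanji is new", is easy to sketch in Python. This is just an illustration of the idea, not something I actually used back then: the function name and the cap of 6 words per kanji are assumptions, and readings are not modelled at all.

```python
from collections import Counter


def is_kanji(ch):
    """Rough check: is the character in the main CJK ideograph block?"""
    return "\u4e00" <= ch <= "\u9fff"


def study_order(words, max_per_kanji=6):
    """Greedy order: the next word may contain at most one unseen kanji,
    and no kanji may appear in more than `max_per_kanji` chosen words."""
    known = set()
    uses = Counter()
    order = []
    remaining = list(words)
    while True:
        picked = None
        for word in remaining:
            kanji = {c for c in word if is_kanji(c)}
            if len(kanji - known) <= 1 and all(uses[k] < max_per_kanji for k in kanji):
                picked = word
                break
        if picked is None:
            # Nothing left that fits the rule; the rest stays unscheduled.
            return order
        remaining.remove(picked)
        order.append(picked)
        for k in {c for c in picked if is_kanji(c)}:
            known.add(k)
            uses[k] += 1


# The words from the example above, deliberately shuffled.
words = ["情報", "電報", "時速", "急速", "速報", "速い"]
print(study_order(words))
# ['速い', '時速', '急速', '速報', '情報', '電報']
```

Starting from nothing, 速い is the only word with a single new kanji, and every later word becomes available once one of its kanji has been seen.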
Unfortunately the example order I have just shown wouldn’t work for beginners. Newcomers to the kanji world have trouble remembering simple kanji with stroke order and they equate the number of strokes in a kanji with its difficulty. With my current knowledge this way of thinking seems a bit naive, but I did the same years ago. Thus here comes the problem.
If I wanted students not to be overwhelmed by the 2-3000 common kanji, but still wanted to teach only relevant words, what order should I choose? Should I prioritize words with simple kanji having few strokes, or should I not care and put the more frequent words on top of the list? This might depend on the level of the student as well. Some will want to only study the words (how to say them and what they mean) without even touching kanji. This seems counterproductive to me, but should I deny students this possibility? (Actually this is more of a technical issue than a question of study methods, and a difficult one on top of that.)
I think the best approach would be to first teach words that are frequent and have simple kanji, and postpone those that are infrequent, but this has a disadvantage as well. If I can’t include slightly more complex kanji at the beginning of study, or even after the student has acquired a hundred words, some frequent kanji combinations can only come pretty late.
What do you think?