New students might not know, but before 2010 there were official lists with approximately 80% of words that might come up in the JLPT test. These lists were published in books, and if you learned all of them (and knew the necessary grammar and had decent listening comprehension etc.), it wasn’t difficult to achieve a passing score. I’m not trying to say that learning thousands of words is easy, but at least you had an idea what was expected of you. In 2010 the official lists were abandoned, and with the introduction of a new level (N3, which came in the middle of the previous 4-level system) it became even less obvious what must be studied. Several sites popped up claiming to have a reworked list of the old JLPT vocabulary for the new system, so students still have some crutches to help them.
This is probably not news for most people coming here, but every post must have some kind of introduction, right? :p
I was thinking about what could be the reason behind the decision not to publish official vocabulary lists, and I think the answer is that the test makers don’t have such lists anymore. And not because they want to hide these lists from test takers. Most proficiency tests for any language don’t have lists either, so this is not surprising, but in that case how do you decide which level you are on? I don’t know the answer, but there is discussion between test makers and teachers, so there must be some guidelines at least.
The next obvious question is whether we can use old JLPT lists for studying and whether sites with the original or updated lists are useful. I think the answer is yes, because even the originally published vocabulary didn’t cover 100% of the words needed to be learned, but it helped. All languages evolve with time, but not that much. In conclusion I believe that learning from the available lists still helps, though I wouldn’t go as far as to say that they give as much confidence as they did before the change.
After some work on N3/N2 word placements I found that it makes no sense to keep the progress on the sidebar. I went through a book that was written for N3 and marked most words that I found in it. I don’t agree with some choices but for the most part it’s OK. I’ll remove the mark from words where I can’t agree and when that’s done, it’ll be a little programming work to set the marked words as N3 (plus some similar words) and the final list will be done. I can hopefully post my N3 kanji list very soon, and then I’ll upload a new zkanji version as well that will not only have N3 kanji but will also indicate the N level for words. (The automatic word selection for the long-term study list will only come after that.)
I haven’t yet decided whether I want to share my version of the JLPT vocabulary list as public domain, but unless someone asks me to use it in their program, it has no importance anyway. (Even less before the list is done.)
UPDATE: The N3 kanji list is now final, but it seems that there are some important words missing from the current JLPT vocabulary list, which are most probably included in the updated JLPT since 2010. These are mainly words that don’t have any kanji, for example インターネット, just to name one that was not part of the old list. The first word missing that I noticed while checking the example test on the official JLPT site was ホテル, but for some reason it was not included in previous vocabulary. Yet it can be in N5.
I’m making progress in creating an N3 list that can point students in the right direction, but be aware that it won’t be a definitive list, just an “opinion”. I have more or less finished writing the algorithm to create an initial list from the more frequent words, and I’m about to explain how it works. I have seen how others did the list generation and as I wrote in a previous post, I wasn’t convinced.
For example www.jlptstudy.com included kanji in the N3 list that were from Jouyou grades 1 to 4, but not in N4 or N5 (3 and 4kyuu). This way it got a believable kanji count, but kanji from the JLPT didn’t have a direct connection with kanji from the Jouyou grades in the old system, so why would they have one now? (The Jouyou kanji are the ones Japanese children learn in school, and the Jouyou grades correspond to the school years.)
I asked the author of www.tanos.co.uk about his list of N3 words, and he told me that it was generated from the old 2kyuu word list, and the decisions were based on the Tanaka Corpus, the example sentence data zkanji uses too. (The Tanaka Corpus is a collection of sentences made with help from many people, still being revised by enthusiasts. It was meant to help students see how the words in the dictionary are actually used, not to serve as the basis of a study plan.) I don’t know whether kanji were taken into consideration when picking the word list, but the words on the page contain 1073 kanji (or 1305 if higher levels are included), which is impressive if we consider that it’s near the number required for N2. (N2 words have 1633 kanji, though probably only around 1200 are really required at the JLPT test.)
Now I don’t want to say that my method is better or more reliable, but it’s only fair to tell you about how it works so you know what to expect, without getting into technical details nobody really cares for.
The first step is to create an order of all kanji. The order is based on many things: kanji frequency from KANJIDAT, the number of words the kanji is in, the frequency of those words, the number of example sentences of those words etc. These are all weighted; for example, I don’t consider the example sentences count too important. I change these weights until I get an order I like. BUT, this is not the N3 kanji list as it contains kanji from all levels.
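To make the weighting idea a bit more concrete, here is a minimal Python sketch. The feature names, the weight values and the sample numbers are all made up for illustration; the actual weights and data format used in zkanji are not described in this post. The only thing taken from the text is that the example sentence count gets the smallest weight.

```python
# Hypothetical weights: the only grounded detail is that the example
# sentence count should matter least.
WEIGHTS = {
    "kanji_freq": 0.4,      # frequency of the kanji itself
    "word_count": 0.3,      # number of words the kanji appears in
    "word_freq": 0.25,      # frequency of those words
    "example_count": 0.05,  # example sentences of those words
}

def normalize(values):
    """Scale raw values into the 0..1 range by dividing by the maximum."""
    top = max(values.values())
    return {k: (v / top if top else 0.0) for k, v in values.items()}

def order_kanji(features):
    """features: {feature_name: {kanji: raw_value}}; returns kanji, best first."""
    normalized = {name: normalize(vals) for name, vals in features.items()}
    all_kanji = set().union(*(vals.keys() for vals in features.values()))

    def score(kanji):
        return sum(
            weight * normalized.get(name, {}).get(kanji, 0.0)
            for name, weight in WEIGHTS.items()
        )

    return sorted(all_kanji, key=score, reverse=True)

# Invented sample data (higher raw value = more frequent).
sample = {
    "kanji_freq": {"日": 100, "曜": 20, "鬱": 1},
    "word_count": {"日": 50, "曜": 8, "鬱": 2},
    "word_freq": {"日": 90, "曜": 30, "鬱": 3},
    "example_count": {"日": 500, "曜": 60, "鬱": 1},
}
print(order_kanji(sample))
```

Tuning then means changing the numbers in `WEIGHTS` and looking at the resulting order again.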
In the second step I create an order of all words, but this time only include those that were in the old 2kyuu list (new N2), because that’s the only official data I have. The order is set by weighted parameters again. These are the average position of the word’s kanji in the previously generated order, the word frequency, the average old JLPT level of the kanji, and finally the example sentences count (once again not given too much weight). This is still NOT the N3 word list.
In the third and final step, the program goes over the generated word list in order, collecting the kanji that were old 2kyuu (or N2, everyone learned these numbers by now) until it reaches a set amount. (365 currently, as with 3 and 4kyuu kanji, the sum is 649.) I consider the collected kanji and words N3, but the algorithm won’t stop there, it keeps collecting words but only with kanji already in my decidedly N3 kanji list. This way I can generate both N3 kanji and N3 words lists that are connected to each other.
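The collection step could be sketched roughly like this in Python. It is not the zkanji code: the function names, the way kanji are detected, and the sample data are assumptions for the example. In particular, the sketch assumes that a word with a kanji outside the old 2kyuu set is skipped, and that a word which would push the kanji count over the limit is skipped as well, which folds the "collect until the limit" and "keep collecting words with known kanji" phases into a single pass.

```python
def is_kanji(ch):
    # Simplification: only the basic CJK Unified Ideographs block is checked.
    return "\u4e00" <= ch <= "\u9fff"

def collect_n3(ordered_words, old_2kyuu_kanji, lower_kanji, limit=365):
    """ordered_words must already be sorted, best candidates first."""
    n3_kanji, n3_words = set(), []
    for word in ordered_words:
        kanji = {ch for ch in word if is_kanji(ch)}
        # Kanji that are neither 3/4kyuu nor already picked for N3.
        new_kanji = kanji - lower_kanji - n3_kanji
        # Assumption: kanji outside the old 2kyuu set disqualify the word.
        if not new_kanji <= old_2kyuu_kanji:
            continue
        # While there is room, new kanji are accepted. Once the limit is
        # reached, only words that bring no new kanji pass this check.
        if len(n3_kanji) + len(new_kanji) > limit:
            continue
        n3_kanji |= new_kanji
        n3_words.append(word)
    return n3_kanji, n3_words

# Invented toy data; the level sets below are NOT the real JLPT sets.
words = ["経験", "実験", "経済", "試験"]
kanji, picked = collect_n3(
    words,
    old_2kyuu_kanji=set("経済実"),
    lower_kanji=set("験試"),
    limit=2,
)
print(sorted(kanji), picked)
```

With the limit set to 2 in the toy data, 経済 is left out because 済 would be a third new kanji, while 試験 is still accepted because it needs no new kanji at all.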
But this is as far as automatic algorithms can go. The fourth and really final step is manually going over all N3 and N2 words and changing their levels if I decide that they were put in the wrong place. I mainly base my opinion on intuition, but the sample N3 test concentrates on everyday topics like study and work, so I can pay attention to words that might come up in such topics. After the final manual decision is made, I’ll compute a new N3 kanji list based on those words. Then I can mark words not having such kanji but still in the N3 vocab list as “don’t test kanji for this word on this level” and the work is done.
…No, it will only begin, because after that I’ll have to check each and every word and give them a different definition if I don’t like what they already have. (around 7000 words – all the others are either duplicates or same word with “no kanji” / “kanji” versions)
I have reached the decision that I’ll either not have an N3 list of words and kanji (there are no reliable sources, those that are free are made up of guesses that are too wild for my taste), or I’ll make my own list based on kanji/word frequency data and my own wild guesses (=experience with the language, though only through the internet, TV and novels).
There is a slight problem with frequency data. It was based on frequency of words in newspapers, ignoring general usage (which is probably way too difficult to measure). Though that might be an advantage regarding the JLPT.
UPDATE: I don’t believe that the results would change drastically, so this poll is closed! If you missed the voting but would still like to tell your thoughts about the question, please write a comment.
Don’t throw away your JLPT 1 vocabulary lists you got from enthusiasts’ Japanese sites just yet. Let me explain why. Though I have expanded the list I got from the previously mentioned site with around 300 new words from other lists I found online, the number of “unused” JLPT 1 kanji hasn’t decreased much.
Here is the complete list of those kanji:
It might not be obvious at first, but most of these kanji are not ones you would often see in actual use. So why were they listed for the JLPT 1? My guess is that these kanji are mainly used in names. I don’t know how many names and which ones are required at N1 level (in the 2kyuu I passed, there was no question about names at all), but if I’m right, there must be a list somewhere with the names that were part of the requirements of the old 1kyuu.
Unfortunately zkanji has no name dictionary (yet!), but if it is true that “only” the names are missing, I can continue working on the JLPT list I got. There is still a lot of work to do. (I have to check the meaning of every single word…)
Although the program is progressing well, I have run into yet another problem with the available data. While trying to create my own N3 list of words (as I’ve decided not to trust the available naive attempts blindly), I have identified all the kanji that were in the words of specified JLPT levels. The result: total chaos (mainly) in 1kyuu/N1.
http://www.tanos.co.uk/, the site from where I borrowed the list of words, used the lists (with all the mistakes in them) available at http://www.jlptstudy.com. The latter only has word lists till 2kyuu/N2, but those were taken from official JLPT material so they must be relatively good. (I have passed JLPT 2kyuu (now N2) with them)
But how trustworthy is the JLPT 1kyuu/N1 word list? I have never tried to study for N1 and I can only guess. So let’s just look at the facts. (You can skip the following few paragraphs if you are only interested in the final result.)
There are ~3450 words in the N1 list (not including the other levels, together the number would be around 9000).
In these 3450 words, 564 kanji are N1 kanji, 607 N2 kanji, 161 N4 kanji, 93 N5 kanji, and 210 kanji are not in any JLPT level (from old official lists, so the newly introduced N3 is not counted). The sum is 1425 JLPT kanji + 210 non JLPT kanji. That is 1635 kanji used altogether. Officially there were 2230 JLPT kanji from all the levels (the real number was less, but the official JLPT kanji were changed during the years, and this 2230 includes them all.) So there are around 800 kanji missing, not used in words of the N1 list. This is an interesting result, but we might be able to find the missing ones.
There are 480 kanji in words of lower levels, not used in words at N1, which leaves us with 320 kanji missing! We are talking about JLPT kanji, and yet they were not used in any JLPT word?
I have also counted that although only 564 N1 kanji were used in N1 words, there are 199 N1 kanji that were only used in words at lower levels. So 763 N1 kanji are used in all the words of the supposed JLPT words. But there should be 1207 N1 kanji. That makes it 444 missing N1 kanji.
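The arithmetic in the last three paragraphs can be rechecked with a few lines of Python. All input numbers are the ones quoted above and the variable names are just labels chosen for this example. Note that the exact differences come out as 805 and 325, which the text rounds to 800 and 320.

```python
# Counts quoted in the text (kanji found in words of the N1 list).
used_in_n1_words = {"N1": 564, "N2": 607, "N4": 161, "N5": 93}
non_jlpt = 210
official_jlpt_total = 2230
only_in_lower_words = 480      # JLPT kanji used only in words of lower levels
n1_only_in_lower_words = 199   # N1 kanji used only in words of lower levels
official_n1_total = 1207

jlpt_used = sum(used_in_n1_words.values())
all_used = jlpt_used + non_jlpt
missing_from_n1_words = official_jlpt_total - jlpt_used
missing_from_all_words = missing_from_n1_words - only_in_lower_words
n1_used_anywhere = used_in_n1_words["N1"] + n1_only_in_lower_words
n1_missing = official_n1_total - n1_used_anywhere

print("JLPT kanji used in N1 words:", jlpt_used)
print("all kanji used in N1 words:", all_used)
print("JLPT kanji missing from N1 words:", missing_from_n1_words)
print("JLPT kanji missing from all words:", missing_from_all_words)
print("N1 kanji used in any word:", n1_used_anywhere)
print("N1 kanji missing:", n1_missing)
```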
If you compare the numbers, 480 JLPT kanji (from all levels) are not used in any JLPT word, while 444 N1 kanji are not used in any JLPT word. Which means that almost all the missing kanji are from N1, and that’s not a small number! If you also consider that there were 210 non-JLPT kanji in the list of N1 words, that’s enough to make anyone uncertain. I would rather not doubt the validity of the official 1kyuu/N1 kanji list, but there is no assurance about the validity of the unofficial N1 word list.
So once again, I have to find another site with a different N1 word list (or rather more sites) just to make sure. Unfortunately this will slow down my progress quite a bit…
This is just a progress report so you won’t think that I have disappeared. I have converted all levels of the JLPT word lists (from http://www.tanos.co.uk) to be importable by my program. The original lists contained many words that were not found in the dictionary, either because the words’ written and kana forms contained mistakes, or because the words were simply not in the dictionary in the given form (eg. inflected). The only thing I have done so far was that I went through the lists and made the words “findable” in the dictionary, fixing mistakes and uninflecting those words where it made sense.
This was already a lot of work and the next one will be even more time consuming. I’ll have to check the word definitions (meanings) for around 9000 words and change them to be appropriate for study. Some words have a huge definition in the dictionary that’s not very suitable for memorizing. For example 上げる means:
- to raise, to elevate
- to do up (one’s hair)
- to fly (a kite, etc.), to launch (fireworks, etc.), to surface (a submarine, etc.)
- to land (a boat)
- to show someone (into a room)
- to send someone (away)
- to enrol (one’s child in school) (enroll)
- to increase (price, quality, status, etc.)
- to make (a loud sound), to raise (one’s voice)
- to earn (something desirable)
- to praise
- to give (an example, etc.), to cite
- to summon up (all of one’s energy, etc.)
- to give
- to offer up (incense, a prayer, etc.) to the gods (or Buddha, etc.)
- to bear (a child)
- to conduct (a ceremony, esp. a wedding)
- (of the tide) to come in
- to vomit
- (after the -te form of a verb) to do for (the sake of someone else)
- (after the -masu stem of a verb) to complete
- used after the -masu stem of a humble verb to increase the level of humility
To make things even more interesting, there is a different kind of mistake in the JLPT lists I have borrowed which will be more difficult to correct. (It probably originated from mistakes by the companies that released official JLPT vocabulary lists.) The word 上げる appears in the N5 list as 上げる (あげる), then again in the N4 list as あげる, this time without kanji! Then once again in the N2 list with a different definition. N5 and N4 list it as “to give”, while N2 lists it as “to do for”. Both meanings are correct, N2 just extends what was required on lower levels.
N4 and N5 have the same word with the same definition, which must be a mistake. I could understand this if the word without kanji were in N5 and the same word with kanji in N4, but it’s the other way around, and 上 is marked as 4kyuu (that is N5 since 2010), so it makes sense to learn the word with kanji for N5.
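Repeats like this can be found automatically before fixing them by hand. The following Python sketch is only an illustration: the tuple format is invented and is not the format of the borrowed lists. The three あげる rows follow the description above (the written form of the N2 entry is an assumption), and the last row is filler.

```python
from collections import defaultdict

# (level, written form, kana reading, definition); hypothetical format.
entries = [
    ("N5", "上げる", "あげる", "to give"),
    ("N4", "あげる", "あげる", "to give"),
    ("N2", "上げる", "あげる", "to do for"),  # written form assumed
    ("N5", "たべる", "たべる", "to eat"),      # filler row
]

def find_repeats(entries):
    """Group entries by kana reading and keep readings listed more than once."""
    by_reading = defaultdict(list)
    for level, written, kana, meaning in entries:
        by_reading[kana].append((level, written, meaning))
    return {kana: items for kana, items in by_reading.items() if len(items) > 1}

def same_definition(repeats):
    """Report repeats that carry an identical definition on two levels."""
    flagged = []
    for kana, items in repeats.items():
        first_level = {}
        for level, _written, meaning in items:
            if meaning in first_level:
                flagged.append((kana, first_level[meaning], level, meaning))
            else:
                first_level[meaning] = level
    return flagged

repeats = find_repeats(entries)
print(same_definition(repeats))
```

Grouping by reading alone will also catch unrelated homophones, so the output is a list of candidates to check by hand, not a list of certain mistakes.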
I don’t know yet how I will solve the inconsistency in the meanings either, because in zkanji only a single “card” is available for each word item. I’ll probably have to include all important meanings at N5, but that doesn’t mean all meanings from 1 through 22. The definition will probably contain “give”, “do for” and maybe “raise”, but the meanings in the long-term study test are changeable, so you will be able to change this definition if you want to.
Another problem I’ll face is with kanji. It was “common knowledge” in tests before 2010 that kanji had a JLPT level as well. I have no idea what that means though, because in the JLPT vocabulary lists I have seen so far there were many words with kanji that should only appear on a more difficult level. I will have to decide whether to ask the kana/meaning of a word on the given level and leave the kanji test item for later, or ask all of them right away.
So the next step: make a JLPT vocabulary list importer/editor. (Programmers often have to create their own tools)