Text To Speech – zkanji might talk?

Not yet! (I admit the title might be a bit misleading.) But I could make zkanji talk with a great open-source project called Open JTalk. You can test the output it creates at its demo page. Just enter Japanese text in the field next to 合成テキスト and press the button below it. (Only enter short sentences, because it can’t handle long ones.) You don’t have to install any add-ons (I don’t have what it asks for), and the generated sound can be saved in wav format as well.

I think the speech it generates is great. It’s much better than other TTS engines I have heard before. (Google’s translator can speak too, and its sound quality is amazing, but somehow it can’t put words together that well.) It is free to use and has an open-source license. So why don’t I want to use it in zkanji? Actually I’m not yet sure about this, but at the moment I think that it is not appropriate for use by students. While it can say short sentences really well, it doesn’t work as well with single words or longer text. When it says single words, the sound is a bit shaky, and even when we find a long sentence that it can say without problems, it doesn’t sound as natural as a human speaker.

I have found another problem with its integration into zkanji. You probably know that some words in Japanese, though written with the same kanji, can sound very different in different contexts. The simplest example is 何, which can be read as なに (nani) and also as なん (nan). There is no way at the moment to tell Open JTalk to say one or the other. This little problem would prevent zkanji from saying the selected word in the dictionary when its reading differs from the most common one, and it could also cause problems with example sentences, when the engine doesn’t recognize the correct form of a word.
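
To make the problem concrete, here is a minimal Python sketch. The entry list and the function names are made up for illustration and are not zkanji’s actual data format, and the call into a speech engine is left out entirely. It only shows that once the written form alone is passed on, nothing is left to tell the readings apart.

```python
# Hypothetical dictionary entries as (written form, reading) pairs.
# This is an illustration only, not zkanji's real data structure.
ENTRIES = [
    ("何", "なに"),
    ("何", "なん"),
    ("私", "わたし"),
    ("私", "わたくし"),
    ("桜", "さくら"),
]


def text_for_engine(written, reading):
    """What a text-only engine would receive: the reading is dropped."""
    return written


def ambiguous_forms(entries):
    """Return the written forms that have more than one reading."""
    readings = {}
    for written, reading in entries:
        readings.setdefault(written, set()).add(reading)
    return {w: sorted(r) for w, r in readings.items() if len(r) > 1}


if __name__ == "__main__":
    # Both entries for 何 produce the same engine input.
    print(text_for_engine("何", "なに") == text_for_engine("何", "なん"))
    print(ambiguous_forms(ENTRIES))
```

Any word that shows up in the result of `ambiguous_forms` is one where the engine has to guess, and the selected dictionary entry may not be the one it picks.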

If you have listened to the speech that Open JTalk generated, you will probably agree that it is doing a great job. In the future it might become almost as good as a human speaker, or at least good enough to be used for studying. I’m sure the authors would appreciate any help they can get, so if you’re a genius, head over there and make it even better!

  1. Ramon.
    February 9, 2011 at 12:43 pm

    What about this?

    It speaks Japanese quite well.

  2. February 12, 2011 at 2:48 am

    Web sites that can read text are not very useful to me, because I can’t integrate them into zkanji. Also, because computers can’t really decide how a sentence should be read, even the best ones would have trouble reading single dictionary entries. I don’t want to confuse beginners, and that’s why I have decided not to include TTS with zkanji.

  3. Dmitry
    March 9, 2011 at 4:50 pm

    That engine sounds quite good, and it seems to simulate intonation as well. Some sort of reduction is also taken into consideration. I tried “私が散歩する庭には桜がたくさんいます”

    • Dmitry
      March 9, 2011 at 4:54 pm

      And if you type a comma after the main subject, it reads that sentence even better.

  4. Dmitry
    March 9, 2011 at 6:00 pm

    Yes, you’re right about words being read differently. For the moment it really doesn’t differentiate between the forms of 何, and a phrase like “これは、何ですか。” sounds a bit strange. But that is a technical problem which they can solve easily. When we meet something like “私” (read わたくし, わたし, あたし, あし), not even a human can always tell which is correct without a deep understanding of the context. As for telling Open JTalk which form to say, you can replace problematic words with the desired kana, as long as your tool knows which is correct. It seems they also have a kana-based dictionary, and it says 私 with the same stress as わたし. I know it may be extremely hard for example sentences, because there you meet the same problems, but for single words it may work even if it confuses the stresses. Anyway, it’s more comfortable than compiling an audio database for the entire dictionary all by oneself. The better solution is to give enthusiasts some tool to build an audio vocabulary DB and spread the work among them. That’s for the dictionary. Something similar could be used to collect examples with audio. And the speech synthesizer could exist alongside it as an alternative.
    Please don’t pay too much attention if I said something weird.

  5. March 9, 2011 at 6:22 pm

    As I wrote, there’s potential in this engine, but writing the words with kana doesn’t solve the problem, because I would have to decide somehow which words should be spelled out and which shouldn’t. For example, the 3 ways to say はし would make it very impractical to spell those words with kana, and even if I could decide in such simple cases when they should be spelled out and when not, there are many others where the decision would be more difficult. The only real solution in my opinion would be to input both kana and kanji to the engine, but at the moment it is not capable of handling that.

    All the good solutions would require work, but that’s the job of the creators of such systems. Developing zkanji is more than enough for me at the moment 🙂

