lemonad said:

Added Google translation to jaikungfu and noticed that it can usually figure out the source language given enough comments on a page.

1 week ago.

21 comments so far

  • BUGabundo

    Great news... but ain't it a bit heavier?

    1 week ago by BUGabundo.

  • lemonad

    Chinese it can figure out right away but unfortunately it's not the same with Finnish.

    1 week ago by lemonad.

  • lemonad

    @BUGabundo: I meant that I've just added a link to translate jaiku pages :)

    1 week ago by lemonad.

  • BUGabundo

    i know

    1 week ago by BUGabundo.

  • jkniiv

    Yay for our zuper-zecreto FI language! ;)

    1 week ago by jkniiv.

  • lemonad

    @jkniiv: You just wait, Google has got @jyri now ;)

    1 week ago by lemonad.

  • lemonad

    It's actually quite weird because Google only seems to need about four Swedish comments to figure out the language but with Finnish, it needs a lot more... This one works as it has 59 comments.

    1 week ago by lemonad.

  • cybette

    Google needs more info to make sure it isn't Japanese?

    1 week ago by cybette.

  • Nikke

    I think it's because of all those damn Finnish noun cases. It must be really difficult to find enough words in their base form in just a few comments.

    1 week ago by Nikke.

  • lemonad

    @cybette: Ah, so Google and I are not so different after all, we both understand Finnish very poorly :)

    1 week ago by lemonad.

  • lemonad

    @Nikke: I think you're right. When Google does figure out that it's Finnish, the translation leaves a lot of words untranslated. I assume that means they're basing their language detection not on grammar or patterns but on the actual words they have in their dictionaries.

    1 week ago by lemonad.

  • lemonad

    Still trying to find threads written in languages other than English. So far I've found Finnish, Swedish, Chinese, and Danish that Google was able to translate.

    1 week ago by lemonad.

  • lemonad

    Hot damn, Google only needed one comment to figure out that this was Danish.

    1 week ago by lemonad.

  • lemonad

    Russian (?) was way easy.

    1 week ago by lemonad.

  • Nikke

    Compare the Danish post with the long one in Finnish and look at the nouns. I'm quite certain that we are on to something here about Google's language capabilities. They do use stemming as well, but don't seem to be able to strip more than a few chars from the end or beginning of the word.

    1 week ago by Nikke.

  • Aaqil

    @lemonad hello tell me is linux OS good? and which version?

    1 week ago by Aaqil.

  • lemonad

    @Nikke: Seems like there's more information in today's DN (in Swedish). It says the system is not rule- or grammar-based and depends on a large dataset of words and their respective translations.

    6 days, 20 hours ago by lemonad.

  • moonhouse

    The article was really from Ny Teknik.

    6 days, 20 hours ago by moonhouse.

  • lemonad

    @moonhouse: Ah, of course. They've initiated a collaboration.

    6 days, 20 hours ago by lemonad.
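
The hypothesis from the thread (detection by dictionary lookup, with a stemmer that can only strip a few characters) can be illustrated with a toy sketch. This is not Google's actual method: the word lists, the `MAX_STRIP` limit, the `min_hits` threshold, and the function names `known` and `detect` are all made up for illustration.

```python
# Toy illustration only: tiny hand-picked word lists, not real dictionaries.
DICTIONARIES = {
    "sv": {"och", "det", "är", "inte", "jag", "en"},
    "fi": {"talo", "auto", "ja", "on", "ei"},
}

# Assumption: the stemmer can drop at most this many trailing characters.
MAX_STRIP = 2


def known(word, vocab, max_strip=MAX_STRIP):
    """True if word, or word minus up to max_strip trailing chars, is in vocab."""
    for n in range(max_strip + 1):
        stem = word[: len(word) - n]
        if stem and stem in vocab:
            return True
    return False


def detect(text, min_hits=3):
    """Return the language with the most dictionary hits, or None if too few."""
    words = text.lower().split()
    scores = {
        lang: sum(known(w, vocab) for w in words)
        for lang, vocab in DICTIONARIES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


print(detect("jag är inte en bil"))          # short Swedish text, mostly base forms
print(detect("talossa autossani taloissa"))  # Finnish nouns with long case endings
```

Under these assumptions, a short Swedish sentence gets enough dictionary hits right away, while inflected Finnish forms such as "talossa" miss the dictionary entry "talo" because the ending is longer than the stemmer can strip. A Finnish thread would then need many more comments before enough base-form words show up.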
