It's interesting to watch natural language processing evolve when my own native language, Finnish, is such an adversarial case that it breaks practically every system. I haven't seen even a properly functioning Finnish spell checker. Here are some reasons why. #NLP #AI

First, every word has dozens of inflected forms. For example, nouns are inflected for case (15 cases) and number (2), so even in basic usage each noun can appear in about 15 × 2 = 30 forms. Because a word's occurrences are split across all of these forms, any particular word form is rare in text.
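To make the arithmetic concrete, here is a minimal Python sketch. It counts only the two dimensions mentioned above (15 cases, 2 numbers) and assumes, unrealistically, that a noun's occurrences are spread evenly over its forms; the names `NOUN_DIMENSIONS`, `paradigm_size` and `mean_occurrences_per_form` are made up for illustration.

```python
from __future__ import annotations

from math import prod

# Only the two dimensions named in the thread: 15 cases x 2 numbers.
NOUN_DIMENSIONS = {"case": 15, "number": 2}


def paradigm_size(dims: dict[str, int]) -> int:
    """Number of inflected forms if every combination of dimension values exists."""
    return prod(dims.values())


def mean_occurrences_per_form(lemma_count: int, dims: dict[str, int]) -> float:
    """Average count per form, ASSUMING a noun's occurrences are spread evenly."""
    return lemma_count / paradigm_size(dims)


print(paradigm_size(NOUN_DIMENSIONS))                   # 30
print(mean_occurrences_per_form(300, NOUN_DIMENSIONS))  # 10.0
```

So a noun seen 300 times in a corpus would, under that even-spread assumption, give each individual form only about 10 occurrences. Real usage is skewed, so this is only a rough illustration.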


New words can be formed by compounding, just as in German. For example:
tietokone = knowledge machine (literally) = computer
kämmentietokone = palm knowledge machine (literally) = tablet
As a result, the Finnish equivalents of many words that are common in other languages are very rare.
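A common baseline for handling compounds is dictionary-based splitting. The sketch below assumes a hypothetical three-entry lexicon (`LEXICON`) and a greedy longest-prefix-first search with backtracking; `split_compound` is an illustrative name. Real Finnish compounds also involve inflected first parts and ambiguous splits, which this toy ignores.

```python
from __future__ import annotations

# Hypothetical toy lexicon, for illustration only.
LEXICON = {"kämmen", "tieto", "kone"}


def split_compound(word: str, lexicon: set[str]) -> list[str] | None:
    """Split `word` into lexicon entries, trying the longest prefix first.

    Backtracks if a prefix leaves a remainder that cannot be split.
    Returns None when no full segmentation exists.
    """
    if word == "":
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


print(split_compound("tietokone", LEXICON))        # ['tieto', 'kone']
print(split_compound("kämmentietokone", LEXICON))  # ['kämmen', 'tieto', 'kone']
```

Neither full compound is in the lexicon, yet both can be recovered from their parts. Splitting only recovers the literal parts, though, not what the compound actually means.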


Even non-compound words can take many modifying suffixes. For example:
syödä = to eat
syömättäkinköhän = Does he/she mean even without eating, I wonder.
These suffixes can be added independently, which causes a combinatorial explosion that makes the forms of many everyday words extremely rare.
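A rough way to see the explosion: if each of k suffixes can independently be present or absent, a stem gets 2^k variants. This toy sketch treats syömättä as the stem and the clitics -kin, -kö and -hän as optional, keeps them in the order of the example, and does not check which results are grammatical; `optional_suffix_forms` is a made-up name.

```python
from __future__ import annotations

from itertools import combinations

STEM = "syömättä"               # treated as the stem for this toy example
CLITICS = ["kin", "kö", "hän"]  # in the order used in "syömättäkinköhän"


def optional_suffix_forms(stem: str, suffixes: list[str]) -> list[str]:
    """Attach every subset of `suffixes` to `stem`, keeping the suffixes' order."""
    forms = []
    for r in range(len(suffixes) + 1):
        for subset in combinations(suffixes, r):
            forms.append(stem + "".join(subset))
    return forms


forms = optional_suffix_forms(STEM, CLITICS)
print(len(forms))  # 8, i.e. 2 ** 3
```

Each further independent suffix doubles the count again.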


To make the combinatorial explosion even worse, many of these suffixes can be reordered within the word. The following mean roughly the same:
syömättäkinköhän
syömättäköhänkin
syömättähänkökin
syömättäkökinhän
etc.
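Reordering multiplies the count again. This sketch assumes the three clitics can appear in any order; it does not check which orders a native speaker would accept, so the numbers are an upper bound for these three clitics only.

```python
from itertools import permutations

STEM = "syömättä"
CLITICS = ("kin", "kö", "hän")

# Every ordering of all three clitics (3! = 6), with no grammaticality check.
orderings = [STEM + "".join(p) for p in permutations(CLITICS)]
print(len(orderings))  # 6
for form in orderings:
    print(form)

# Every ordering of every subset of the clitics: 1 + 3 + 6 + 6 = 16 forms.
all_forms = {
    STEM + "".join(p)
    for r in range(len(CLITICS) + 1)
    for p in permutations(CLITICS, r)
}
print(len(all_forms))  # 16
```

All four orderings listed in the post are among the six printed.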


Compounding sometimes changes the meaning from literal to figurative:
luotaantyöntävä = unappealing
luotaan työntävä = something that literally thrusts you away
For the latter, Google Translate gives the nonsense translation "I trust the pushing".


These features of Finnish, and many others, make it poorly suited to #NLP systems. In fact, Finnish fits poorly with IT systems in general. You can see this in how Finnish speakers have to adapt the language when using hashtags.


All in all, I won't believe deep learning is the way to #AGI until I see at least a well-functioning Finnish spell checker. Machine translation to and from Finnish is also usually just garbage (which, luckily, protects us from some foreign disinformation).


