Google’s AI translation tool seems to have invented its own secret internal language

All right, don’t panic, but computers have created their own secret language and are probably talking about us right now. Well, that’s kind of an oversimplification, and the last part is just plain untrue. But there is a fascinating and existentially challenging development that Google’s AI researchers recently happened across.

You may remember that back in September, Google announced that its Neural Machine Translation system had gone live. It uses deep learning to produce better, more natural translations between languages. Cool!

Following on this success, GNMT’s creators were curious about something. If you teach the translation system to translate English to Korean and vice versa, and also English to Japanese and vice versa… could it translate Korean to Japanese, without resorting to English as a bridge between them? They made this helpful gif to illustrate the idea of what they call “zero-shot translation” (it’s the orange one):

Image Credits: Google

As it turns out — yes! It produces “reasonable” translations between two languages that it has not explicitly linked in any way. Remember, no English allowed.

But this raised a second question. If the computer is able to make connections between concepts and words that have not been formally linked… does that mean it has formed a concept of shared meaning for those words, one that goes deeper than simply knowing that one word or phrase is the equivalent of another?

In other words, has the computer developed its own internal language to represent the concepts it uses to translate between other languages? Based on how various sentences are related to one another in the memory space of the neural network, Google’s language and AI boffins think that it has.

A visualization of the translation system’s memory when translating a single sentence in multiple directions. Image Credits: Google

This “interlingua” seems to exist as a deeper level of representation that captures the similarities between a sentence or word in all three languages. Beyond that, it’s hard to say, since the inner processes of complex neural networks are notoriously difficult to describe.
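For readers who want something more concrete than “related in memory space”: one common way to measure how close two internal representations are is cosine similarity. The Python sketch below is purely illustrative. The vectors are invented, the sentence labels are hypothetical, and it is an assumption that this particular measure resembles what Google’s researchers actually examined.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# HYPOTHETICAL toy vectors standing in for a network's internal states.
# Real translation models use hundreds of dimensions; these numbers are invented
# only to show how "closeness" between representations can be measured.
states = {
    "sentence A (English)": [0.90, 0.10, 0.20],
    "sentence A (Japanese)": [0.85, 0.15, 0.25],
    "sentence A (Korean)": [0.88, 0.12, 0.18],
    "sentence B (English)": [0.10, 0.90, 0.30],
}

anchor = states["sentence A (English)"]
for label, vec in states.items():
    # Same-meaning sentences should score near 1.0; an unrelated one scores lower.
    print(f"{label}: {cosine_similarity(anchor, vec):.3f}")
```

With these made-up numbers, the three versions of sentence A land close to 1.0 against each other while sentence B scores much lower. That kind of clustering by meaning rather than by language is roughly the pattern the researchers read as a sign of shared representation.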

It could be something sophisticated, or it could be something simple. But the fact that it exists at all, as an original creation of the system that helps it handle concepts it was never explicitly trained to link, is, philosophically speaking, pretty powerful stuff.

The paper describing the researchers’ work (primarily on efficient multi-language translation, but touching on the mysterious interlingua) can be read on arXiv. No doubt the question of deeper concepts being created and employed by the system will warrant further investigation. Until then, let’s assume the worst.