

Machine Translation: a utopia that has become reality

Monday December 20, 2021

“Величина угла определяется отношением длины дуги к радиусу” – “Magnitude of angle is determined by the relation of length of arc to radius.”

 

The amplitude of the angle is determined by the relationship between the length of the arc and the radius.

It is from this sentence that the modern history of machine translation or, to use our own term, of automatic translation systems began in 1954. The modern history, that is: history books contain references to studies and projects aimed at developing systems for automatically translating texts or speech from one natural language to another, dating back to the 9th century, the 17th century and so on up to the early 1930s. But it was in the 1950s that these projects became real.

In 1954 the experiment known as Georgetown-IBM produced a fully automatic translation of about sixty Russian sentences into English. Its success was convincing enough to justify further studies and investment, not only in the United States but all over the world. This work continued with varying fortunes in the following decades. The big hurdle was to demonstrate that machine translation could be faster than a human translator and just as effective and accurate. It is no coincidence that convincing results only began to appear at the end of the 1980s and have continued from then to the present day. The great obstacle was computational power: until the last decade of the last century there was not enough of it to achieve the desired results.

By Machine Translation, or Automatic Translation, we mean the translation of a written or spoken text (in the latter case often in combination with an automatic transcription system) from one language to another by software capable of processing large volumes of text at a speed far beyond human capacity, across a large number of source and target languages. Machine translation works on training data, both generic and customised for the needs of a specific sector, domain or context. Three types of methodology are generally distinguished. The first is rule-based machine translation, which uses grammars, linguistic rules developed by language experts, and dictionaries that can be customised for a specific topic or sector. The second is statistical. Here the starting point is not linguistic rules and words but the analysis of a large body of existing human translations, which constitutes a “knowledge base” of information and recurring patterns.
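To make the statistical idea concrete, here is a minimal sketch in Python. It assumes a tiny, made-up Italian-English parallel corpus and scores word pairs with the Dice coefficient over sentence-level co-occurrence counts. This is a deliberately simple stand-in: production statistical systems rely on far richer alignment and language models, and the names `PARALLEL_CORPUS`, `train` and `best_translation` are illustrative only.

```python
from collections import Counter

# Made-up parallel corpus: (source sentence, human translation).
PARALLEL_CORPUS = [
    ("la casa", "the house"),
    ("la casa rossa", "the red house"),
    ("la macchina", "the car"),
    ("la macchina rossa", "the red car"),
    ("casa", "house"),
]


def train(corpus):
    """Count in how many sentence pairs each word and each word pair occurs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in corpus:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update((s, t) for s in src_words for t in tgt_words)
    return src_count, tgt_count, pair_count


def best_translation(word, model):
    """Return the target word with the highest Dice score for `word`."""
    src_count, tgt_count, pair_count = model
    if word not in src_count:
        return word  # unknown words are passed through unchanged

    def dice(target):
        return 2 * pair_count[(word, target)] / (src_count[word] + tgt_count[target])

    return max(tgt_count, key=dice)


MODEL = train(PARALLEL_CORPUS)
WORD_BY_WORD = [best_translation(w, MODEL) for w in "la macchina rossa".split()]
```

Note that the word-by-word result keeps the source word order ("the car red"): the sketch learns which words correspond from the recurrences in the corpus, but it has no model of target-language word order.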

Until five years ago, the vast majority of machine translation solutions on the market were based on statistical algorithms and methods. In Statistical Machine Translation, an advanced statistical analysis is carried out to evaluate the best possible translations for a word in relation to the other terms in the sentence. The third methodology, which has been gaining ground in recent years, relies instead on neural networks. This is Neural Machine Translation (NMT), an approach based on deep neural networks and on two key components: an encoder that reads the input sentence and generates a representation suitable for translation, and a decoder that generates the actual translation. Words and phrases are represented as vectors of real numbers.

This is the methodology on which most effort is now focused, since its results tend to be better: more grammatically accurate and more fluent. The reason is that neural networks are better at grasping the context of a complete sentence before translating it.
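The encoder/decoder split and the idea of words as vectors of real numbers can be illustrated with a toy sketch in Python. It is not a neural network: the vectors below are written by hand (a real NMT system learns them from data, and its decoder takes the whole sentence into account), and the names `SOURCE_VECTORS`, `TARGET_VECTORS`, `encode` and `decode` are assumptions made for this example.

```python
import math

# Hand-written 3-dimensional vectors; a trained system would learn these.
SOURCE_VECTORS = {
    "gatto": [1.0, 0.0, 0.0],
    "cane": [0.0, 1.0, 0.0],
    "dorme": [0.0, 0.0, 1.0],
}
TARGET_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "sleeps": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def encode(sentence):
    """Encoder: turn the input sentence into a list of vectors."""
    # A word missing from SOURCE_VECTORS raises KeyError in this sketch.
    return [SOURCE_VECTORS[word] for word in sentence.split()]


def decode(representation):
    """Decoder: for each vector, emit the most similar target word."""
    output = []
    for vector in representation:
        best = max(TARGET_VECTORS, key=lambda w: cosine(vector, TARGET_VECTORS[w]))
        output.append(best)
    return " ".join(output)


TRANSLATION = decode(encode("gatto dorme"))
```

The point of the sketch is only the division of labour: `encode` produces a numeric representation of the input, and `decode` generates target-language words from that representation alone, without looking at the source text again.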

Sometimes transcription and translation techniques are closely linked, so that the output of a good automatic transcription feeds a Machine Translation process in an automatic, continuous flow. This is, for example, the process adopted by the European Parliament for real-time transcription and translation of its plenary sessions in 24 languages.
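The continuous flow from transcription to translation can be sketched as two chained Python generators. Everything here is hypothetical: `transcribe` is a stand-in that receives text instead of audio, `translate` is a glossary lookup rather than a real translation engine, and nothing in it describes the system actually used by the European Parliament.

```python
from typing import Dict, Iterable, Iterator

# Illustrative glossary standing in for a translation engine.
GLOSSARY = {"buongiorno": "good morning", "grazie": "thank you"}


def transcribe(audio_chunks: Iterable[str]) -> Iterator[str]:
    """Stand-in for speech recognition: each 'chunk' is already text."""
    for chunk in audio_chunks:
        yield chunk.strip().lower()


def translate(segments: Iterable[str], glossary: Dict[str, str]) -> Iterator[str]:
    """Translate each transcribed segment as soon as it becomes available."""
    for segment in segments:
        yield " ".join(glossary.get(word, word) for word in segment.split())


def pipeline(audio_chunks: Iterable[str]) -> Iterator[str]:
    """Chain the two stages so that segments flow through one at a time."""
    return translate(transcribe(audio_chunks), GLOSSARY)


RESULT = list(pipeline([" Buongiorno ", "Grazie "]))
```

Because both stages are generators, each segment is translated as soon as it has been transcribed, without waiting for the whole session to end.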