
Nate Robinson: Automatic translation is a useful application of Artificial Intelligence, and as a language enthusiast, I find it one of the most interesting areas of AI. In recent years, as neural architectures have become more capable, Neural Machine Translation (NMT) has been replacing traditional statistical translation methods as the standard approach.

Since it was proposed in 2017, the neural Transformer architecture has become widely popular. Transformer networks use encoders and decoders to map inputs of arbitrary length to outputs of arbitrary length. They employ attention mechanisms that allow the network to learn relationships between different parts of the input and output sequences, and they use positional encodings that help the network retain information about sequence order (i.e., word order for text applications). These traits make the Transformer well suited to translation. Many Transformer-based translation models have shown impressive results in recent years. (See http://jalammar.github.io/illustrated-transformer/ for a digestible overview of the Transformer architecture.) Like all neural networks, Transformer-based translation networks require large volumes of training data, in the form of bitext corpora (that is, corpora in which sentences in one language are mapped to corresponding translations in another language).
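To make those pieces concrete, below is a minimal PyTorch sketch of a translation-style Transformer: token embeddings, sinusoidal positional encodings, and an encoder-decoder with a causal mask on the target side. The vocabulary sizes, model dimensions, and toy batch are illustrative placeholders, not the settings used in the project described later in this post.

```python
# Minimal sketch (assumed settings) of a Transformer for translation in PyTorch.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Add sinusoidal position information so the model can use word order."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):                      # x: (seq_len, batch, d_model)
        return x + self.pe[: x.size(0)]

class TranslationTransformer(nn.Module):
    """Encoder-decoder Transformer mapping source token IDs to target logits."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4, layers=3):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(d_model, nhead, layers, layers)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # Causal mask: the decoder may only attend to earlier target positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(0))
        dec = self.transformer(self.pos(self.src_emb(src)),
                               self.pos(self.tgt_emb(tgt)),
                               tgt_mask=tgt_mask)
        return self.out(dec)                   # (tgt_len, batch, tgt_vocab)

# Toy forward pass: a 7-token source sentence and a 5-token target prefix.
model = TranslationTransformer(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (7, 1))           # (src_len, batch)
tgt = torch.randint(0, 8000, (5, 1))           # (tgt_len, batch)
logits = model(src, tgt)                       # -> torch.Size([5, 1, 8000])
```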

One unique thing about BYU is the wide range of languages studied here. BYU offers courses in 62 of the many different languages spoken by its students. A large portion of these are low-resource languages, meaning there is limited availability of corpora, documentation, translated texts, and the other resources required for Natural Language Processing (NLP) tasks, such as automatic translation.

One low-resource language studied at BYU is Haitian. Haitian is the first official language of the Republic of Haiti. (The second is French.) Written and digital materials in Haitian are scarce for a number of reasons. The language had no developed writing system until the 1940s, and the current writing system was not standardized until 1979. (See https://en.wikipedia.org/wiki/Haitian_Creole.) Even today, Haitian writers often feel pressure to write in French, for social reasons and to make their work more accessible to other francophone countries. 

Though resources in Haitian are scarce, they do exist. One of the earliest attempts to build an automatic translator between Haitian and English was the DIPLOMAT project headed by Carnegie Mellon University and funded by DARPA. In this project, researchers collected a bitext of Haitian phrases with English translations and employed statistical methods to develop an automatic translator. (See http://www.speech.cs.cmu.edu/haitian/.)

Automatic translation in Haitian is a topic of particular interest because the Haitian diaspora is large: approximately 2 million people, including an estimated 1 million in the United States. (See https://en.wikipedia.org/wiki/Haitian_diaspora.) A large proportion of these Haitian immigrants do not speak the primary language of the countries where they live. Thus quality automatic translation tools for Haitian, while hard to come by, are widely needed.

As part of a class project in a Natural Language Processing course with Dr. Deryle Lonsdale last year, I gathered a sizable Haitian-English bitext by web-scraping the website of the Church of Jesus Christ of Latter-day Saints, where years of General Conference addresses, along with their translations into multiple languages, are available. I collected the text of the English and Haitian versions of every address that was available in both languages. I used the Python libraries BeautifulSoup and Selenium for web-scraping and then used Okapi’s Rainbow and Oliphant tools to align the corpora into a usable bitext format (i.e., small chunks of English text mapped to small chunks of Haitian text).
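For illustration, here is a rough sketch of the scraping step. It assumes a simple requests-plus-BeautifulSoup approach; the URL pattern and the CSS class used to locate the talk body are stand-ins rather than the site's actual structure, and in practice Selenium was needed where pages are rendered with JavaScript.

```python
# Rough sketch of collecting parallel talk text; the URL pattern and the
# "body-block" selector are illustrative assumptions, not the site's real markup.
import requests
from bs4 import BeautifulSoup

def scrape_talk(url):
    """Return the paragraph texts of one General Conference address."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="body-block")      # assumed container element
    if body is None:
        return []
    return [p.get_text(" ", strip=True) for p in body.find_all("p")]

# Hypothetical English and Haitian URLs for the same address.
base = "https://www.churchofjesuschrist.org/study/general-conference/2019/10/some-talk"
english = scrape_talk(base + "?lang=eng")
haitian = scrape_talk(base + "?lang=hat")

# Paired paragraphs become the raw material for sentence alignment in the Okapi tools.
pairs = list(zip(english, haitian))
```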

I combined this home-brewed corpus with a large English-Haitian bitext from OPUS to complete my training data set. For the model, I used a Transformer I had built myself in PyTorch and Google’s Colab editor during a Deep Learning course with Dr. David Wingate last year, and I trained it on the translation task. Examples of translated text from the test data are provided below.
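Before those examples, here is a rough sketch of a single teacher-forced training step, reusing the TranslationTransformer class sketched above. It is not my exact training code; the special-token IDs, optimizer settings, and toy batch are placeholders.

```python
# One teacher-forced training step on the combined bitext (illustrative values).
import torch
import torch.nn as nn

PAD, BOS, EOS = 0, 1, 2                        # assumed special-token IDs

def training_step(model, optimizer, src, tgt):
    """src, tgt: (seq_len, batch) LongTensors of token IDs; returns the loss."""
    optimizer.zero_grad()
    tgt_in, tgt_out = tgt[:-1], tgt[1:]        # shift targets for teacher forcing
    logits = model(src, tgt_in)                # (tgt_len - 1, batch, vocab)
    loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       tgt_out.reshape(-1), ignore_index=PAD)
    loss.backward()
    optimizer.step()
    return loss.item()

model = TranslationTransformer(src_vocab=8000, tgt_vocab=8000)  # class from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
src = torch.randint(3, 8000, (7, 2))                            # two toy source sentences
tgt = torch.cat([torch.full((1, 2), BOS, dtype=torch.long),
                 torch.randint(3, 8000, (6, 2)),
                 torch.full((1, 2), EOS, dtype=torch.long)])    # BOS ... EOS targets
loss = training_step(model, optimizer, src, tgt)
```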

Haitian: Sa pa enpòtan menmsi yo te konnen ke yo te pèdi .

Translation: It does n’t matter even if they were lost to Him .

Target: It does n’t matter even if they were aware they were lost .

Haitian: Yon gepa se yon predatè ki tou natirèlman atake lòt animal .

Translation: A cheetah is a predator that naturally preys on other animals .

Target: A cheetah is a predator that naturally preys on other animals .

Haitian: Li kanpe solid kòm sèl jènfi nan branch li nan peyi Islande .

Translation: She stands strong as the only young woman in her branch in Iceland .

Target: She stands strong as the only young woman in her branch in Iceland .

Haitian: M te vin konnen pa pouvwa Sentespri a ke Liv Mòmon an te verite .

Translation: I came to know by the power of the Holy Ghost that the Holy Ghost was true .

Target: I came to know by the power of the Holy Ghost that the Book of Mormon was true .

The overall quality of these translation examples was high, but quite domain-specific: the model performed best on text close to its General Conference training domain. To improve performance and topic generalization in the future, I would like to gather a larger training bitext and experiment more with model hyperparameters during training. If you are interested in Haitian-English machine translation or a related topic, please feel free to contact me.