Transformer Language Modeling for Akuapem and Asante Twi

Fig. 1: We named our main model ABENA — A BERT Now in Akan

Introduction

In our previous blog post, we introduced a preliminary Twi embedding model based on fastText and visualized it using the TensorFlow Embedding Projector. As a reminder, text embeddings convert text into numbers, or vectors, on which a computer can perform arithmetic operations. This enables it to reason about human language, i.e., to carry out natural language processing (NLP). A screenshot of our fastText Twi embeddings from that exercise is shown in Fig. 2.
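To make "arithmetic on vectors" concrete, the most common operation on embeddings is cosine similarity, which scores how closely two word vectors point in the same direction. The sketch below is illustrative only. The three Twi words and their 3-dimensional vectors are hypothetical values chosen by hand, not taken from the actual fastText Twi model, which uses many more dimensions.

```python
import math

# Hypothetical, hand-picked 3-d vectors for illustration only.
# They are NOT values from the real fastText Twi model.
toy_embeddings = {
    "ɔbarima": [0.9, 0.1, 0.3],   # "man"
    "ɔbaa": [0.8, 0.2, 0.35],     # "woman"
    "dua": [0.1, 0.9, 0.0],       # "tree"
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings=toy_embeddings):
    """Return the other word whose vector has the highest cosine similarity."""
    query = embeddings[word]
    scored = (
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    )
    return max(scored, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    print(most_similar("ɔbarima"))
```

With these toy values, "ɔbarima" ends up closer to "ɔbaa" than to "dua". This mirrors the kind of neighborhood structure you can explore visually in the Embedding Projector.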

Fig. 2: Our fastText (subword word2vec) Twi embedding model screenshot from a previous article

This model, which we have shared in our Kasa Library repo, enables a computer to begin reasoning about Twi computationally. However, it is "static" in the sense that each word's vector stays the same regardless of the context in which the word appears. State-of-the-art NLP in high-resource languages such as English has largely moved away from static embeddings toward more sophisticated "dynamic" embeddings, whose vectors adapt to the surrounding context. The most prominent example of such a dynamic embedding architecture is BERT (Bidirectional Encoder Representations from Transformers). …
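The static/dynamic distinction can be illustrated with a deliberately crude sketch. In this sketch, a static model is just a lookup table, so a word gets the same vector in every sentence. A context-dependent variant then mixes each word's vector with the average of the other words in its sentence. This mixing step is a hypothetical toy (loosely reminiscent of uniform attention) and is not how BERT works, since BERT uses stacked Transformer self-attention layers. The 2-d vectors and the English example words are likewise made up for illustration.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-d "static" vectors, made up for illustration.
STATIC: Dict[str, Vector] = {
    "river": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "loan": [0.0, 1.0],
}


def static_embed(sentence: List[str]) -> List[Vector]:
    """Static embedding: each token maps to one fixed vector, ignoring context."""
    return [STATIC[tok] for tok in sentence]


def toy_contextual_embed(sentence: List[str]) -> List[Vector]:
    """Toy context mixing (NOT BERT): average each token's static vector
    with the mean of the other tokens' static vectors in the sentence."""
    vecs = static_embed(sentence)
    out: List[Vector] = []
    for i, v in enumerate(vecs):
        others = [w for j, w in enumerate(vecs) if j != i]
        if not others:  # a one-word sentence has no context to mix in
            out.append(list(v))
            continue
        context = [sum(col) / len(others) for col in zip(*others)]
        out.append([(a + b) / 2 for a, b in zip(v, context)])
    return out


if __name__ == "__main__":
    # "bank" gets the same static vector in both sentences...
    print(static_embed(["river", "bank"])[1], static_embed(["bank", "loan"])[0])
    # ...but different context-dependent vectors.
    print(toy_contextual_embed(["river", "bank"])[1],
          toy_contextual_embed(["bank", "loan"])[0])
```

The point is only that a context-dependent representation lets the same surface word land in different places depending on its neighbors. A static table cannot do this.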



Introduction

Natural language processing (NLP) is the subfield of Machine Learning and Artificial Intelligence (AI) concerned with teaching computers to read, understand and act on human language. A major component of this is converting text into a meaningful set of numbers that the computer can then analyze and manipulate to extract meaning and context. For the purposes of this article, we restrict the discussion of NLP to the analysis of text.
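One of the simplest ways to turn text into numbers is a bag-of-words vector. Each distinct word in a vocabulary gets an integer index, and a document becomes a list of word counts. The sketch below assumes naive lowercase whitespace tokenization and a tiny made-up corpus. Modern systems instead use subword tokenizers and learned embeddings, but the basic idea of mapping text to vectors is the same.

```python
from typing import Dict, List


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Assign each distinct lowercase whitespace token an integer id,
    in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in corpus:
        for tok in doc.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def bag_of_words(doc: str, vocab: Dict[str, int]) -> List[int]:
    """Count how often each vocabulary word occurs in `doc`;
    tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in doc.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


if __name__ == "__main__":
    vocab = build_vocab(["the cat sat", "the dog sat down"])
    print(vocab)
    print(bag_of_words("The dog sat on the mat", vocab))
```

Here the vocabulary is `the, cat, sat, dog, down`. The sentence "The dog sat on the mat" becomes a count for each of those words, while "on" and "mat" are dropped because they are unknown. Count vectors like this ignore word order and meaning, which is part of why embeddings were developed.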

Natural Language Processing (NLP) is key for human interaction with computers [image source: thinkpalm.com]

More formally, natural language processing can be described as the tools and methods for analyzing and studying, through computer manipulation, the languages humans use for everyday communication, whether spoken or written. …

About

NLP Ghana

NLP Ghana is an Open Source Initiative focused on Natural Language Processing (NLP) of Ghanaian Languages, & its Applications to Local Problems.
