Researchers have developed an AI model that helps computers work more efficiently with a wider variety of languages, extending natural language processing (NLP) capabilities to African languages that are heavily under-represented in AI. African languages have received little attention from computer scientists, so few NLP capabilities have been available for large swaths of the continent. A new language model, developed by researchers at the University of Waterloo in Canada, fills this gap by allowing computers to parse text in African languages for many useful tasks.

The new neural network model, which the researchers dubbed AfriBERTa, uses deep learning techniques to achieve state-of-the-art results for low-resource languages, according to the team. It works with 11 African languages, including Amharic, Hausa and Swahili, which are collectively spoken by more than 400 million people. It achieves output quality comparable to the best existing models despite learning from just one gigabyte of text, while other models require thousands of times more data, the researchers said.

"Pre-trained language models have transformed the way computers process and analyze text data for tasks ranging from machine translation to question answering," said Kelechi Ogueji, a master's student in computer science at Waterloo. "Unfortunately, African languages have received little attention from the research community.

"One of the challenges is that neural networks are incredibly text- and data-intensive. And unlike English, which has enormous amounts of text available, most of the approximately 7,000 languages spoken in the world can be described as low-resource, in the sense that there is a lack of data available to feed data-hungry neural networks."

According to the researchers, models like these rely on a technique known as pre-training. The researchers presented the model with text in which some words had been covered or hidden, and the model had to guess those hidden words. By repeating this process billions of times, the model learns the statistical associations between words, which mimic human knowledge of language. (A minimal sketch of this masking procedure appears below.)

"Being able to pre-train models that are equally accurate on certain downstream tasks, but using much smaller amounts of data, has many advantages," said Jimmy Lin, the Cheriton Chair in Computer Science at Waterloo. "Needing less data to train the language model means less computation is required, and therefore lower carbon emissions associated with running massive data centers. Smaller datasets also make data curation more practical, which is one approach to reducing the bias present in models."

Lin believes the research and model take a "small but important step" toward bringing natural language processing capabilities to the more than 1.3 billion people on the African continent.
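To make the pre-training idea concrete, here is a minimal sketch of BERT-style masked-language-model data preparation. It illustrates the general technique the researchers describe, not AfriBERTa's actual training code: the whitespace tokenization, the 15% masking rate (the conventional BERT default) and the `[MASK]` placeholder token are all assumptions for illustration.

```python
import random

MASK = "[MASK]"
MASK_RATE = 0.15  # conventional BERT-style rate; assumed, not stated in the article

def mask_tokens(tokens, mask_rate=MASK_RATE, seed=None):
    """Hide a random subset of tokens; the model must predict the originals.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the hidden word the model is trained to recover.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[i] = tok  # training label for this position
        else:
            masked.append(tok)
    return masked, targets

# One training instance built from a (hypothetical) Swahili sentence;
# which positions get masked depends on the random seed.
sentence = "watoto wanacheza mpira uwanjani".split()
masked, targets = mask_tokens(sentence, seed=0)
print(masked)
print(targets)
```

In real pre-training, this masking is applied across the entire corpus and the network is optimized to assign high probability to the hidden words; repeating that prediction task billions of times is what builds the statistical associations between words described above.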