Wow! Amazon AI Can Talk Like a Human!

What did the Amazon AGI research team say?

A research team from Amazon AGI (Artificial General Intelligence) claims that its AI model has demonstrated language abilities it was never explicitly taught. In an unpublished academic paper, the team says its large language model exhibits “high-level natural fluency” in conversational text.

According to the examples shared, the model looks sophisticated. It can generate a variety of sentences that show leaps in language ability similar to those seen in human language learners. Judged against criteria created with the help of linguists, this is hard for an AI system to achieve.

Researchers trained the model, “Big Adaptive Streamable TTS with Emergent abilities” (BASE TTS), on 100,000 hours of public domain speech data, 90% of which was in English. The goal was to better learn how Americans speak. BASE TTS is expected to produce voices that sound more natural and are easier for English speakers to understand.

The Amazon AGI team wanted to test the “emergent abilities” of their language model. To do this, they also trained two smaller models, one on 1,000 hours of speech data and one on 10,000 hours. The goal was to see which of the models showed the language fluency they were looking for.

By comparing the performance of these models, the Amazon AGI team hoped to determine the optimal model size for demonstrating “emergent abilities.”

The 10,000-hour model performed best on the emergent ability criteria set by the Amazon researchers. Its handling of punctuation, non-English words, and emotions made it superior to the other models.

How does the model work?

The model can generate speech that sounds natural to a human listener. This is evident when it handles non-words, as in Tom’s whisper, “Shhh, Lucy, shhh, don’t wake your brother,” as they tiptoe past the baby’s room.

It is further evident when the model handles the internet shorthand common in text messaging, as in the SMS example, “Emergency @ home; call ASAP! Mom & Dad worried…#familyproblems.”

Eighteen AI experts from different countries wrote a paper emphasizing that BASE TTS was never instructed to produce surprising output. The paper explains that BASE TTS simply follows the instructions it is given, with no intention of creating provocative output.

The AI experts emphasize the importance of understanding how AI works to avoid misunderstandings and misinterpretations. The paper explains that the test sentences are designed to challenge text-to-speech (TTS) models with a variety of tasks.

The model can parse garden-path (“trap”) sentences, place stress correctly on long compound words, generate emotional or whispered voices, and produce the correct phonemes for foreign words and punctuation. This allows the model to generate more accurate and natural speech.
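As a rough illustration of how a challenge set like this might be organized, the sketch below groups test sentences by task and averages listener scores per task. It is not the researchers' evaluation code: the category labels, the function names, and the scores in the usage lines are all assumptions made for illustration, and only the two example sentences come from this article.

```python
from collections import defaultdict

# Hypothetical test set. The category labels are illustrative assumptions;
# the two sentences are the ones quoted in this article.
TEST_SENTENCES = [
    ("paralinguistics", "Shhh, Lucy, shhh, don't wake your brother."),
    ("punctuation", "Emergency @ home; call ASAP! Mom & Dad worried…#familyproblems."),
]


def group_by_category(sentences):
    """Collect test sentences under their category label."""
    grouped = defaultdict(list)
    for category, text in sentences:
        grouped[category].append(text)
    return dict(grouped)


def mean_score_per_category(ratings):
    """Average listener scores per category.

    `ratings` is an iterable of (category, score) pairs. The scoring
    scale is left open because the article does not specify one.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, score in ratings:
        totals[category] += score
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


if __name__ == "__main__":
    print(group_by_category(TEST_SENTENCES))
    # Made-up scores, only to show the call shape.
    print(mean_score_per_category([("punctuation", 1), ("punctuation", 3)]))
```

Keeping scores separate per category makes it possible to compare models task by task instead of relying on a single overall number.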

In short, the model is able to understand and generate language better, resulting in more natural and easy-to-understand speech. Because traditional TTS models are not explicitly trained to perform these tasks, test sentences like these can help improve their capabilities.

These findings, while not AGI (Artificial General Intelligence), may offer a glimpse of the path toward it. This is most apparent in how little training data was needed to achieve significant results.

