TECHNOLOGY, INTERNET TRENDS, GAMING

AI21 Labs: A new artificial intelligence model capable of handling more context than most

By auroraoddi

The artificial intelligence industry is increasingly moving toward generative models with wider contexts. However, models with wide context windows tend to be computationally intensive. Or Dagan, product manager at artificial intelligence startup AI21 Labs, argues that this need not be the case, and his company is releasing a generative model to prove it.

Contexts, or context windows, refer to input data (e.g., text) that a model considers before generating output (more text). Models with small context windows tend to forget the content of even very recent conversations, while models with larger contexts avoid this problem and, in addition, better understand the flow of data they process.
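To make the idea concrete, here is a minimal, purely illustrative Python sketch (not code from AI21 Labs, and using whitespace-split words as a crude stand-in for real tokens): whatever does not fit in the context window is simply never shown to the model, which is why small-window models “forget” earlier turns.

```python
# Illustrative sketch only: why a small context window "forgets" earlier turns.
# Real models use subword tokenizers; here, whitespace-split words stand in for tokens.

def build_prompt(conversation_turns, context_window_tokens):
    """Concatenate the conversation, then keep only the most recent tokens that fit."""
    tokens = []
    for turn in conversation_turns:
        tokens.extend(turn.split())                    # crude stand-in for tokenization
    return " ".join(tokens[-context_window_tokens:])   # everything older is dropped

turns = [
    "User: My name is Alice and I live in Rome.",
    "Assistant: Nice to meet you, Alice!",
    "User: What is my name?",
]

print(build_prompt(turns, context_window_tokens=5))    # tiny window: only the last question survives
print(build_prompt(turns, context_window_tokens=200))  # large window: the whole history fits
```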

AI21 Labs’ new text-generation and analysis model, called Jamba, can perform many of the same tasks as models such as OpenAI’s ChatGPT and Google’s Gemini. Trained on a combination of public and proprietary data, Jamba can write text in English, French, Spanish, and Portuguese.

A unique feature of Jamba is its ability to handle up to 140,000 tokens on a single GPU with at least 80 GB of memory, such as a high-end Nvidia A100. That corresponds to about 105,000 words, or 210 pages, roughly the length of a good-sized novel.

By comparison, Meta’s Llama 2 has a context window of 32,000 tokens, a smaller size by today’s standards, but requires only a GPU with about 12 GB of memory to run. (Context windows are typically measured in tokens, which are fragments of raw text and other data.)
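For readers who want to experiment with running Jamba on a single GPU, a hedged sketch of what loading it might look like with the Hugging Face transformers library is shown below; the checkpoint name ai21labs/Jamba-v0.1 and the memory-saving options are assumptions to verify against AI21 Labs’ own model card.

```python
# Hedged sketch, not an official AI21 Labs example. Assumes the checkpoint is
# published on Hugging Face as "ai21labs/Jamba-v0.1" (verify the real name) and
# that a recent transformers release supports the architecture; fitting very long
# contexts on a single 80 GB GPU may additionally require quantization.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed repository identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # place the weights on the available GPU
)

prompt = "Summarize the following report:\n..."  # long documents are where the big window helps
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```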

At first glance, Jamba might seem like an ordinary model. There are many freely available and downloadable generative artificial intelligence models, such as Databricks’ recently released DBRX and the aforementioned Llama 2.

What makes Jamba unique is what lies under the hood. It uses a combination of two model architectures: transformers and state-space models (SSMs).

Transformers are the preferred architecture for complex reasoning tasks and power models such as GPT-4 and the aforementioned Google Gemini. They have several unique features, but the distinguishing characteristic of transformers is undoubtedly their “attention mechanism.” For each piece of input data (such as a sentence), transformers “weight” the relevance of every other input (other sentences) and draw from them to generate the output (a new sentence).
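As a rough illustration of that attention mechanism, the short numpy sketch below computes scaled dot-product attention on toy vectors; it shows the weighing-and-mixing idea in miniature, not the exact computation inside GPT-4 or Gemini.

```python
# Toy illustration of scaled dot-product attention: each input "weighs" the
# relevance of every other input, then mixes their values by those weights.
# This shows the core idea only, not any production model's implementation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])  # relevance of every input to every other
    weights = softmax(scores, axis=-1)                   # normalized attention weights per input
    return weights @ values                              # each output draws from all inputs

rng = np.random.default_rng(0)
n_inputs, dim = 4, 8                   # four toy inputs (think: sentences), eight features each
X = rng.normal(size=(n_inputs, dim))   # stand-in embeddings
print(attention(X, X, X).shape)        # self-attention: (4, 8), one context-aware vector per input
```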

SSMs, on the other hand, combine different qualities of older types of artificial intelligence models, such as recurrent neural networks and convolutional neural networks, to create a more computationally efficient architecture capable of handling long sequences of data.
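To see why that pays off on long sequences, the sketch below runs a plain linear state-space recurrence: the model carries a fixed-size state through the sequence instead of attending to every earlier position, so cost grows linearly with length. It illustrates the general SSM idea, not Mamba’s actual selective, hardware-optimized design.

```python
# Conceptual sketch of a linear state-space model (SSM): a fixed-size hidden
# state h is updated step by step, RNN-style, so memory does not grow with the
# sequence length. General idea only, not Mamba's parameterization.

import numpy as np

def ssm_scan(A, B, C, inputs):
    """h_t = A @ h_{t-1} + B @ u_t ;  y_t = C @ h_t."""
    h = np.zeros(A.shape[0])
    outputs = []
    for u_t in inputs:            # one linear pass over the sequence
        h = A @ h + B @ u_t       # update the fixed-size state
        outputs.append(C @ h)     # read out the current output
    return np.stack(outputs)

rng = np.random.default_rng(0)
state_dim, in_dim, out_dim, length = 16, 4, 4, 1000
A = 0.1 * rng.normal(size=(state_dim, state_dim))   # kept small so the recurrence stays stable
B = rng.normal(size=(state_dim, in_dim))
C = rng.normal(size=(out_dim, state_dim))
sequence = rng.normal(size=(length, in_dim))        # a long input sequence

print(ssm_scan(A, B, C, sequence).shape)            # (1000, 4): an output per step, constant-size state
```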

SSMs have their own limitations. However, some early incarnations, including an open source model called Mamba developed by researchers at Princeton and Carnegie Mellon, can handle larger inputs than their transformer-based equivalents and outperform them in language generation tasks.

Jamba actually uses Mamba as part of the underlying model, and Dagan says it offers three times the throughput on long contexts compared to transformer-based models of comparable size.

“Although there are some early academic examples of SSM models, this is the first commercial-grade production model,” Dagan said in an interview with TechCrunch. “This architecture, in addition to being innovative and interesting for further research by the community, opens up great possibilities for efficiency and throughput.”

Although Jamba has been released under the Apache 2.0 license, an open source license with few restrictions on use, Dagan stresses that this is a release for research purposes and is not intended for commercial use. The model does not have safeguards to prevent the generation of toxic text or mitigations to address possible biases; a refined and supposedly “safer” version will be made available in the coming weeks.

However, Dagan says Jamba already demonstrates the potential of the SSM architecture even at this early stage.

“What is unique about this model, both in terms of its size and its innovative architecture, is that it can easily fit on a single GPU,” he said. “We believe performance will improve further with additional optimizations of Mamba.”

Benefits of artificial intelligence models with broad context

As the artificial intelligence industry continues to evolve, it is increasingly moving toward adopting generative models with broader contexts. These models, such as AI21 Labs’ Jamba, allow more information to be taken into account before generating the desired output.

Models with broad contexts have several advantages over those with smaller context windows. First, they are better able to retain and use important information from earlier in a conversation, so the model can produce more consistent and accurate output, avoiding repetition or errors caused by missing context.

In addition, models with broad contexts can follow the flow of the data they process more closely, which helps them grasp the overall context and produce output that better fits the specific situation in which it is used.

A practical example of the usefulness of models with broad contexts is the field of chatbots. Chatbots based on models with broad contexts can better understand previous conversations and respond more accurately and consistently to user questions. This leads to a better and more satisfying user experience.

In addition, models with broad contexts can be used in a variety of industries and applications. For example, they can be used to automatically generate text, translate from one language to another, create realistic dialogues for video games or movies, and much more.

Jamba: the potential of SSM models

An interesting aspect of AI21 Labs’ Jamba model is the use of a combination of two model architectures: transformers and state-space models (SSMs). Transformers are known to be very effective in complex reasoning tasks, while SSMs can handle longer sequences of data.

The combined use of these two architectures allows Jamba to get the best of both worlds. Transformers provide the complex reasoning capabilities, such as context analysis and coherent text generation, while SSMs allow longer data sequences to be handled without sacrificing performance.
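One way to picture such a hybrid is as a stack in which occasional attention layers are interleaved among SSM layers. The toy sketch below is a conceptual illustration only; the layer names and interleaving ratio are hypothetical, not AI21 Labs’ published layout.

```python
# Conceptual illustration only: interleaving attention (transformer) blocks with
# SSM blocks in one stack. The ratio and names here are hypothetical, not the
# published Jamba architecture.

def build_hybrid_stack(n_layers: int, attention_every: int = 4) -> list[str]:
    """Mostly SSM layers, with an attention layer inserted every few blocks."""
    return [
        "attention" if i % attention_every == 0 else "ssm"
        for i in range(n_layers)
    ]

print(build_hybrid_stack(8))  # ['attention', 'ssm', 'ssm', 'ssm', 'attention', 'ssm', 'ssm', 'ssm']
```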

According to AI21 Labs, this hybrid approach offers three times the throughput on long contexts compared with transformer-only models of comparable size. In practice, that means Jamba can generate consistent, high-quality text over long data sequences, a significant advantage over other models available on the market.

Article source here.
