Databricks opens DBRX large language model, which outperforms GPT-3.5 in tests

Databricks has announced the open release of DBRX, a large language model that can be used to build chatbots that answer questions in natural language, solve mathematical problems, generate content on a given topic, and write code in various programming languages. The model was developed by MosaicML, which Databricks acquired for $1.3 billion, and was trained on a cluster of 3072 NVIDIA H100 Tensor Core GPUs. 320 GB of memory is recommended for running the finished model.
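
As a rough sanity check on the memory recommendation, the sketch below estimates how much memory the weights alone would occupy, using the 132 billion parameter count reported for the model. The bytes-per-parameter values are assumptions about the storage format, not figures from the announcement, and the estimate ignores activations and the attention cache.

```python
# Rough weight-memory estimate. The parameter count comes from the article;
# the bytes-per-parameter values are assumptions about the storage format.
TOTAL_PARAMS = 132e9  # 132 billion parameters

BYTES_PER_PARAM = {
    "float32": 4,   # assumed full-precision storage
    "bfloat16": 2,  # assumed 16-bit storage
    "int8": 1,      # assumed 8-bit quantized storage
}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, size in BYTES_PER_PARAM.items():
        print(f"{dtype:>8}: {weight_memory_gb(TOTAL_PARAMS, size):.0f} GB")
```

Under the 16-bit assumption the weights come to about 264 GB, which would fit within the recommended 320 GB with some headroom.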

The model was trained using a Mixture of Experts (MoE) architecture, in which each token is handled by only a subset of specialized expert sub-networks, on a collection of text and code totaling 12 trillion tokens. The DBRX context window is 32 thousand tokens (the number of tokens the model can process and keep in view when generating text). For comparison, the context window of the Google Gemini and OpenAI GPT-4 models is 32 thousand tokens, Google Gemma's is 8 thousand, and GPT-4 Turbo's is 128 thousand.

The model has 132 billion parameters and is divided into 16 expert networks, of which no more than 4 are used when processing a token (so at most 36 billion parameters are active for each token). For comparison, GPT-4 is believed to have 1.76 trillion parameters, the recently opened Grok model (X/Twitter) has 314 billion, GPT-3.5 has 175 billion, YaLM (Yandex) has 100 billion, LLaMA (Meta) has 65 billion, GigaChat (Sber) has 29 billion, and Gemma (Google) has 7 billion.
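
The routing idea described above (16 experts, at most 4 active per token) can be illustrated with a minimal top-k gating sketch. This is a generic illustration, not Databricks' implementation: the softmax gate, the renormalization of the selected weights, and the toy scalar "experts" are all assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 16  # from the article
TOP_K = 4         # from the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_output(token_value, router_logits, experts):
    """Weighted sum of the chosen experts' outputs; unchosen experts never run."""
    gates = route_token(router_logits)
    return sum(w * experts[i](token_value) for i, w in gates.items())


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy experts: each one just scales its input by a fixed factor.
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    gates = route_token(logits)
    print("chosen experts:", sorted(gates))
    print("gate weights sum to:", round(sum(gates.values()), 6))
    print("layer output:", moe_output(1.0, logits, experts))
    print("possible expert subsets:", math.comb(NUM_EXPERTS, TOP_K))
```

Because only the selected experts are evaluated, the compute per token tracks the active parameter count (36 billion, roughly 27% of the 132 billion total) rather than the full model size.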

The model and related components are distributed under the Databricks Open Model License, which permits using, reproducing, copying, modifying, and creating derivative works, but with some restrictions. For example, the license prohibits using DBRX, derivative models, or any output from them to improve language models other than DBRX. The license also prohibits using the model in ways that violate laws and regulations. Derivative works must be distributed under the same license. Use in products and services with more than 700 million monthly active users requires separate permission.

According to its creators, DBRX outperforms OpenAI's GPT-3.5 and Grok-1 from X/Twitter in features and capabilities, and is competitive with Gemini 1.0 Pro in tests of language understanding, code generation, and mathematical problem solving. In some applications, such as generating SQL queries, DBRX approaches the performance of the market-leading GPT-4 Turbo. The model is also notably faster than competing services and can produce an answer almost instantly: DBRX can generate text at up to 150 tokens per second per user, which is about twice as fast as LLaMA2-70B.
