Learn through the super-clean Baeldung Pro experience:
>> Membership and Baeldung Pro.
No ads, dark-mode and 6 months free of IntelliJ Idea Ultimate to start with.
Last updated: June 11, 2024
In this tutorial, we’ll analyze the best large language models currently available. Through this systematic analysis, we’ll describe several of the most popular models, highlighting their features, strengths, and weaknesses. We’ll focus exclusively on generative systems based on LLMs, as comparing LLMs with different purposes wouldn’t be meaningful.
By the end of this article, readers should have a clearer idea of which LLM model might best suit their needs.
Large Language Models (LLMs) are advanced artificial intelligence systems that understand and generate human-like text. They are trained on vast amounts of text data to learn patterns, structures, and nuances of language.
LLMs use deep learning techniques to process and generate text, particularly variants of neural networks like Transformers. Also, these models have parameters of a few hundred million to a few hundred trillion, which is why we call them large.
LLMs have a wide range of applications across various industries and domains. Some of them include:
In this article, we’ll describe some of the most popular LLM systems currently available. We won’t explain the algorithms behind these models, as many are not open-sourced and lack extensive information about their architecture. Our focus will be on LLMs as accessible platforms primarily utilized as chatbots for various purposes.
To select the best models, we’ll use the current ranking on the LMSYS Chatbot Arena Leaderboard. It’s a platform that serves as a crowdsourced open platform for evaluating LLMs. It collects human preference votes to rank various models based on their performance using an Elo rating system.
Users can participate in the ranking process by evaluating and voting on the performance of different LLMs. After entering a prompt, the system randomly selects two models, processes the prompt, and responds anonymously to the user. Then, the user can vote on which model performed better.
Only after voting the system will reveal the names of the models:
The world of LLMs is pretty competitive. Many new models come up every month, making it even more intense. As a result, the field of LLMs remains dynamic and always changing.
Here, we’ll present some of the most powerful models currently available. However, this list may not always be up-to-date, as new models, updates, or patches may appear every week. Still, certain leading model families and platforms will likely maintain their leading positions over a longer period.
OpenAI is a leading artificial intelligence research laboratory that aims to develop and promote user-friendly AI systems. One of its notable creations is ChatGPT, a pioneering LLM model based on the GPT architecture, designed to engage in human-like conversations and assist users in various tasks. OpenAI made history with ChatGPT as the fastest-growing app upon its release, attracting over 100 million monthly users within just two months. This rapid growth surpassed some popular platforms like TikTok and Instagram.
The most powerful LLMs by OpenAI are:
Anthropic is an artificial intelligence startup founded by former members of OpenAI in 2021. Since then, it has raised funding from many VC funds and corporations, including Amazon and Google. Anthropic focuses on creating reliable AI systems, with a strong emphasis on AI safety and ethical considerations. These models are available on claude.ai and the Claude API, accessible in over 150 countries.
The most powerful LLMs by Anthropic are:
Gemini is a family of LLMs created by Google DeepMind. These LLMs are multimodal, which means that the models can process information from multiple modalities, including text, images, audio, and video. Gemini can tackle many interesting problems. One of them is reasoning, which uses different modalities, such as the whole movie. Namely, long context understanding from the whole movie is an experimental feature researcher from Google tested with Gemini 1.5 pro.
The most powerful LLMs by Google are:
Mistral AI is a French company founded in April 2023 by previous Meta and Google DeepMin employees. It produces open-source LLMs, following the importance of open-source software, and as a response to proprietary models.
The most powerful LLMs by Mistral AI are:
Llama (Large Language Model Meta AI) is a family of autoregressive LLMs released by Meta AI starting in February 2023. Meta released all models as open-sourced, with weights available online, which makes them very popular in the community. Llama models are trained on a wide variety of datasets, including web pages, open-source GitHub repositories, Wikipedia in 20 different languages, public domain books, LaTeX source code from ArXiv papers, and Stack Exchange question-answers.
The most powerful LLMs by Meta are:
For the comparison, we’ll use LLSYS ranking and some common LLM benchmarks reported by companies. LLSYS ranking is dynamic, and the numbers change daily. Because of that, we’ll use “top 5,” “top 10,” and “top 15” categories as a measurement.
Some of the common LLM benchmarks will include:
In addition to these benchmarks, there are prompting techniques used during the evaluation. Most commonly, we can differ:
The table that shows a comparison between the presented models is below:
| Input context window | Maximum output context | Release date (m-d-y) | Price per one million input tokens | Price per one million output tokens | LLSYS | MMLU (5-shot) | HellaSwag (10-shot) | MATH (4-shot) | HumanEval (0-shot) | |
| gpt-4-turbo-2024-04-09 | 128k | 4096 | 04-09-2024 | 10$ | 30$ | Top 5 | – | – | – | – |
| gpt-4-1106-preview | 128k | 4096 | 11-06-2023 | 10$ | 30$ | Top 5 | – | – | – | – |
| gpt-4-0125-preview | 128k | 4096 | 01-25-2024 | 10$ | 30$ | Top 5 | – | – | – | – |
| gpt-4-0613 | 8192 | 8192 | 01-13-2023 | 30$ | 60$ | Top 15 | – | – | – | – |
| Claude 3 Opus | 200k | 4096 | 04-03-2024 | 15$ | 75$ | Top 5 | 86.8% | 95.4% | 61.0% | 84.9% |
| Claude 3 Sonnet | 200k | 4096 | 04-03-2024 | 3$ | 15$ | Top 5 | 79.0% | 89.0% | 40.5% | 73.0% |
| Claude 3 Haiku | 200k | 4096 | 04-13-2024 | 0.25$ | 1.25$ | Top 10 | 75.2% | 85.9% | 40.9% | 75.9% |
| Gemini Ultra | 32.8k | 8192 | – | – | – | – | 83.7% | 87.8% | 53.2% | 74.4% |
| Gemini Pro 1.0 | 32.8k | 8192 | 12-13-2023 | 0.13$ | 0.38$ | Top 5 | 71.8% | 84.7% | 32.6% | 67.7% |
| Mistral Large | 32k | 4096 | 02-26-2024 | 8$ | 8$ | Top 15 | 81.2% | 89.2% | – | 45.1% |
| Mixtral 8x22B Instruct | 64k | – | 04-17-2024 | open-source | open-source | Top 15 | 77.75% | 88.5% | – | 45.1% |
| Llama 3 70b Instruct | 8k | 8k | 04-18-2024 | open-source | open-source | Top 5 | 82.0% | – | 50.4% | 81.7% |
| Llama 3 8b Instruct | 8k | 8k | 04-18-2024 | open-source | open-source | Top 15 | 68.4% | – | 30.0% | 62.2% |
Notice that in this table, GPT-4 models don’t have values for common benchmarks, but in many papers, other LLM platforms tend to compare their results with GPT-4. That is because the GPT-4 version mentioned in the original paper with common LLM benchmarks is outdated and retired by OpenAI.
In this article, we’ve presented some of the most powerful LLM models and platforms currently available. We’ve presented a comprehensive comparison using some model parameters, cost, and popular LLM benchmarks.
From what we’ve seen, there are lots of different language models out there, each made for different things. Some are really powerful, some don’t cost much, and some are free and open for anyone to use. It’s cool to see how many choices we have, depending on our needs.
As time passes, we’ll likely see even more new models popping up, giving us even more options based on what we need and can afford.