Llama 3.1 405B
VS
Command R+

The battle between two open-source
favourites, developed by Meta and Cohere.

Get API Key

Benchmarks and specs

Specs

Llama 3.1 405B is a goliath among open-source AI models, the biggest LLM Meta has released to date. The open-source crown could easily be handed out right here, although Qwen keeps impressing everyone. Still, Cohere is actively striving to gain mainstream traction in this competitive landscape, consistently pushing to join the race for the best model. Let's look at each model's strengths and weaknesses so you can make an informed choice when picking a model for your use case.

Specification | Llama 3.1 405B | Command R+
Context window | 128K | 128K
Output tokens | 2K | 4K
Parameters | 405B | 104B
Knowledge cutoff | December 2023 | ~December 2023
Release date | July 23, 2024 | August 30, 2024
Tokens per second | ~29.5 | ~48

The main differences between Command R+ and Llama 3.1 405B lie in their parameter counts and processing speeds. Command R+ has 104 billion parameters and generates roughly 48 tokens per second, while Llama 3.1 405B boasts 405 billion parameters but processes more slowly, at roughly 29.5 tokens per second. Both models share a 128K input context window, but Command R+ offers a maximum of 4K output tokens, whereas Llama 3.1 405B is limited to 2K. Llama 3.1 405B has a knowledge cutoff of December 2023; Command R+'s official release notes don't specify one.
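To put the throughput difference in perspective, here's a quick back-of-the-envelope calculation using the figures from the table above (real-world latency will vary with provider and load):

speeds = {"Llama 3.1 405B": 29.5, "Command R+": 48.0}      # approx. tokens per second
max_output = {"Llama 3.1 405B": 2048, "Command R+": 4096}  # max output tokens (2K / 4K)

for model, tps in speeds.items():
    seconds = max_output[model] / tps
    print(f"{model}: ~{seconds:.0f} s to generate a maximum-length response")

At these speeds, Command R+ can emit its full 4K output in about 85 seconds, while Llama 3.1 405B needs about 69 seconds for its 2K maximum.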

Benchmarks

These figures are compiled from official release notes and several open-source benchmarks.

Benchmark | Llama 3.1 405B | Command R+
Undergraduate-level knowledge (MMLU) | 88.6 | 75.7
Graduate-level reasoning (GPQA) | 51.1 | 34.0
Code (HumanEval) | 89.0 | 71.0
Math problem-solving (MATH) | 73.8 | 44.0

Based on the benchmark results, Llama 3.1 405B outperforms Command R+ in every key area. The choice between these models still depends on the use case: Llama 3.1 is superior for complex reasoning and coding tasks, while Command R+ may still be effective for scenarios requiring faster token generation.

Get API Key

Practical tests

We will run the models through some well-known AI weak spots to expose problems and highlight strengths. At the end, we'll give you a Python snippet for comparing models side by side, so you can run your own tests for your use case.

Some of the outputs have been edited down for brevity or clarity.

Prompt 1. Just Chatting:

Write 10 sentences that end with real words ending in "-ON". Use no more than 10 words

Llama 3.1 405B output:

1. ... was given a promotion.
2. ... for flavor.
3. ... with the new balloon.
4. ... make a formal declaration.
5. ... got lost in the Boston region.
6. ... harvested a large crop of melon.
7. ... injury required immediate action.
8. ... hosted a big marathon.
9. ... received a high citation.
10. ... went on a vacation to Salon.

Correct Answers: 9/10 ✅
Almost all sentences meet the conditions.

Command R+ output:

1. The chef's passion was his cauldron
2. Her action was a true passion.
3. The station was a busy hub.
4. A new nation was formed.
5. His imagination was boundless.
6. The ocean's waves were mighty.
7. A new innovation was shared.
8. The potion's power was immense.
9. His dedication was admirable.
10. The station's location was ideal.

Correct Answers: 2/10 ❌
Only two sentences meet the conditions.
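If you want to grade this kind of constraint-following test automatically instead of by eye, a few lines of Python are enough. This is a minimal sketch of the scoring criterion; the regex-based word splitting is an illustrative choice:

import re

def ends_in_on(sentence: str) -> bool:
    """True if the last word of the sentence ends in '-on'."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return bool(words) and words[-1].lower().endswith("on")

outputs = [
    "The chef's passion was his cauldron",  # passes
    "The station was a busy hub.",          # fails
]
score = sum(ends_in_on(s) for s in outputs)
print(f"{score}/{len(outputs)} sentences meet the condition")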

Prompt 2. Logical prompt:

There are three on/off switches on the ground floor of a building. 
Only one operates a single lightbulb on the third floor. 
The other two switches are not connected to anything. 
Put the switches in any on/off order you like. 
Then go to the third floor to check the bulb. 
Without leaving the third floor, 
can you figure out which switch is genuine? You get only one try.

Llama 3.1 405B output:

1. Initially, on the ground floor, turn switch 1 to the "on" position for 5 minutes.
2. After 5 minutes, turn switch 1 to the "off" position and turn switch 2 to the "on" position.
3. Immediately go to the third floor to check the lightbulb.

Now, here's what to look for on the third floor:
- If the lightbulb is on, then switch 2 is the genuine switch, as we just turned it on before heading to the third floor.
- If the lightbulb is off, but warm to the touch, then switch 1 is the genuine switch. ...
- If the lightbulb is off and cold to the touch, then switch 3 is the genuine switch. ...

Correct Answer: ✅
The correct line of reasoning that guides us to solving the problem. Reproducible with our Llama 3.1 405B API.

Command R+ output:

Leave one switch on for a few minutes, and turn it off. Turn on the other two switches and wait for a while. After some time, check the bulb. If it is hot, it means the switch that was turned on initially is the genuine one.

If the bulb is cold, it indicates that one of the other two switches is the genuine one.

To confirm, turn on the remaining switch and wait. If the bulb lights up, it is the genuine switch. If not, the switch that was turned on initially is the genuine one.

By following this process, you can identify the genuine switch without leaving the third floor.

Correct Answer: ❌
Unfortunately, we won't know which switch is the correct one on the first try, unless we happen to get lucky.

Prompt 3. Language comprehension:

Joan made sure to thank Susan for all the help she had received.
Who had received help?

Llama 3.1 405B output:

According to the sentence, Joan had received help from Susan.

Correct Answer: ✅
Llama successfully understood the sentence and provided the correct answer to the question, demonstrating its ability to handle the task effectively.

Command R+ output:

Susan had received help.

Correct Answer: ❌
Command R+ was unable to understand the sentence and provide the correct answer to the question.

Prompt 4. Math:

Using binomial theorem, evaluate the following: (102)^5

Llama 3.1 405B output:

To evaluate (102)^5 using the binomial theorem, we can express 102 as (100 + 2).
The binomial theorem states that for any positive integer n,
(a + b)^n = Σ (nCk) * (a^(n-k)) * (b^k)
where nCk is the binomial coefficient, and the sum is taken over k = 0 to n.
In this case, a = 100, b = 2, and n = 5.
(102)^5 = (100 + 2)^5 = Σ (5Ck) * (100^(5-k)) * (2^k)
Now, let's calculate the terms:
k = 0: (5C0) * (100^5) * (2^0) = 1 * 10000000000 * 1 = 10000000000
k = 1: (5C1) * (100^4) * (2^1) = 5 * 100000000 * 2 = 1000000000
k = 2: (5C2) * (100^3) * (2^2) = 10 * 1000000 * 4 = 40000000
k = 3: (5C3) * (100^2) * (2^3) = 10 * 10000 * 8 = 800000
k = 4: (5C4) * (100^1) * (2^4) = 5 * 100 * 16 = 8000
k = 5: (5C5) * (100^0) * (2^5) = 1 * 1 * 32 = 32
Now, let's add up the terms:
(102)^5 = 10000000000 + 1000000000 + 40000000 + 800000 + 8000 + 32 = 11040808032
Therefore, (102)^5 = 11040808032.

Correct Answer: ✅
Quick solution, and a correct one: the binomial theorem is applied correctly and the arithmetic checks out.

Command R+ output:

Using the binomial theorem, we can expand $(102)^5$ as follows:
$(102)^5 = (100 + 2)^5$
$= (100)^5 + 5 * (100)^4 * 2 + 10 * (100)^3 * (2^2) + 10 * (100)^2 * (2^3) + 5 * 100 * (2^4) + (2^5)$
$= 100000000 + 1000000 * 5 + 10000 * 20 + 1000 * 80 + 100 * 40 + 32$
$= 101053232$
So, the value of $(102)^5$ is 101053232.

Correct Answer: ❌
Correct line of reasoning, but flawed calculations.
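Python's arbitrary-precision integers make it trivial to check both answers directly:

# Direct check of both models' final answers
print(102 ** 5)                  # 11040808032, matching Llama 3.1 405B
print(102 ** 5 == 11040808032)   # True
print(102 ** 5 == 101053232)     # False: Command R+'s result is off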

Prompt 5. Let's get coding:

Create a simple Snake game using Python and the Pygame library. 
The game should include the following features:
Game Window: A window of size 600x400 pixels.
...
Restart Option: After the game ends, provide an option to restart the game.

Llama 3.1 405B output:

Correct Answer: 5/5 ✅
All conditions are met.

Command R+ output:

Correct Answer: 2/5 ❌
The snake keeps growing continuously, and the game closes after a loss, despite the conditions stating that a restart option should be provided.
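For reference, the restart option that Command R+ omitted boils down to a small game-over loop. The sketch below is our own illustration of that pattern, not either model's output; run_round is a hypothetical stand-in for a full round of the snake game:

import pygame

def run_round(screen):
    # Hypothetical stand-in for one full round of the snake game (not shown here).
    pass

def game_over_screen(screen, font):
    """Show a game-over message and wait for R (restart) or Q (quit)."""
    text = font.render("Game over! Press R to restart or Q to quit", True, (255, 255, 255))
    screen.fill((0, 0, 0))
    screen.blit(text, (40, 180))
    pygame.display.flip()
    while True:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                return False
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_r:
                    return True   # restart requested
                if event.key == pygame.K_q:
                    return False  # quit requested
        pygame.time.wait(50)      # avoid busy-waiting between event checks

pygame.init()
screen = pygame.display.set_mode((600, 400))  # window size from the prompt
font = pygame.font.SysFont(None, 32)

playing = True
while playing:
    run_round(screen)
    playing = game_over_screen(screen, font)  # loop again only if R was pressed
pygame.quit()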

Prompt 6. Web design:

Design a user registration page with the following features:
1. Page Title: "Create Your Account"
2. Form Fields:
Email Address: ...
Password with validation ...
Repeat Password ...
3. Submit Button:  ....
4. Styling: ...

Llama 3.1 405B output:

Correct Answer: 3/5 ✅
The input fields are larger than necessary, but there are multiple hints for errors, including incorrect email format, password mismatch, and password complexity issues. However, even when all conditions are satisfied, the registration button remains unresponsive.

Command R+ output:

Correct Answer: 3/5 ✅
The current implementation lacks feedback for errors other than the password mismatch, which is always displayed regardless of whether the passwords match. Additionally, even when all fields are filled out correctly, the registration button remains inactive.
Get API Key

Pricing

Price per 1K AI/ML tokens | Llama 3.1 405B | Command R+
Input | $0.00525 | $0.0025
Output | $0.00525 | $0.01
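To see what this means for a concrete request, here's a quick cost calculation using the per-1K-token prices above (the 10K-input / 2K-output request size is just an example):

prices = {  # USD per 1K tokens, from the table above
    "Llama 3.1 405B": {"input": 0.00525, "output": 0.00525},
    "Command R+":     {"input": 0.0025,  "output": 0.01},
}

input_tokens, output_tokens = 10_000, 2_000  # example request size

for model, p in prices.items():
    cost = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
    print(f"{model}: ${cost:.4f} per request")

Command R+ is cheaper on input-heavy workloads, but its roughly 2x higher output price narrows the gap as responses get longer.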
Get API Key

Try for yourself

You've seen these models in action. Now it's your turn to test Llama for your specific needs. Copy the code below into Google Colab or your preferred coding environment, add your API key, and start experimenting!

from openai import OpenAI

def main():
    # Point the OpenAI-compatible client at the AI/ML API endpoint
    client = OpenAI(
      api_key='<YOUR_API_KEY>',
      base_url="https://api.aimlapi.com",
    )

    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'

    completion = client.chat.completions.create(
      model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
      messages=[
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_prompt}
      ],
    )

    # Extract the generated message text from the first choice
    response = completion.choices[0].message.content
    print("Response:\n", response)

if __name__ == "__main__":
    main()
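Since the point is comparing models side by side, here is a variation that sends the same prompt to both and prints the answers one after the other. The Command R+ model ID below is an assumption; confirm the exact identifier against the provider's model list:

from openai import OpenAI

client = OpenAI(
    api_key='<YOUR_API_KEY>',
    base_url="https://api.aimlapi.com",
)

MODELS = [
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    "cohere/command-r-plus",  # assumed ID; check the provider's model list
]

prompt = 'Write 10 sentences that end with real words ending in "-ON".'

for model in MODELS:
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(completion.choices[0].message.content)
    print()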

Conclusion

In conclusion, the comparison between Command R+ and Llama 3.1 405B reveals significant differences in both technical performance and practical application. Llama 3.1 405B stands out with its higher parameter count and superior benchmark scores across various tasks, including undergraduate knowledge, graduate reasoning, coding, and quantitative reasoning. This model's ability to consistently deliver accurate outputs in practical tests, such as language comprehension and logical prompts, further solidifies its position as a robust option for complex reasoning tasks.

However, Command R+ maintains its appeal in scenarios where faster token generation is prioritized, despite its lower performance metrics. Its pricing structure also offers a competitive edge, especially for applications with input-heavy workloads. Ultimately, the choice between these two models should be guided by specific use cases: Llama 3.1 405B is ideal for demanding reasoning and coding applications, while Command R+ may suffice for simpler tasks where speed is a critical factor. As the landscape of AI models continues to evolve, understanding these distinctions will empower developers and researchers to select the best tools for their needs.

You can access the Llama 3.1 405B API here, or see our full model lineup here - try it for yourself and get a feel for frontier AI power!

Get API Key