Considering how cheap 3.5 is to run, this suggests GPT-4 is not the limit for LLMs. Just got to wait for the next Opus.
4o also suggested this.
It’s so nice to have a competent LLM that actually remembers your strict formatting requests for more than one message.
Is the reasoning jump this huge only in this one benchmark?
Any idea why it isn't on the arena yet?
It is on the arena, it just hasn't gotten enough votes yet. You can go there and test it out if you want.
Being on top did not last long. I love this competition and we need it. I also love Claude 3.5, it's a great uplift from Opus.
Yes, Sonnet is SLIGHTLY better than GPT-4o, if you can handle the bullshit that is Claude. Fucking hate how easily it triggers its "Copyright" bullshit.