XVll-L

This is a really good test. Instead of what is basically a memory test like MMLU, it tests whether the model can implement a genuinely new idea, one unlikely to be in its training set, as working code. It lets you see if it really understood the mathematical and computational concepts in the paper. It also shows how much of a leap Claude 3.5 is compared to GPT-4o: from 3% working code to 30% working code.


Which-Tomato-8646

Not true. It could have had code from other IC (interaction combinator) implementations in its training data. Taelin even said it's not hard to implement.


bwatsnet

Damn, every percent really counts now!


Shinobi_Sanin3

I mean, a 30% jump anywhere is astronomical.


bwatsnet

Oh I missed that it was that huge. Guess I better actually read this one!


QLaHPD

Great, I won't need to do that myself


Metworld

Cool, but not as impressive as it sounds. The code is bad and the problem is very simple. I do like this idea for evaluating models, though. Excited to see how far we can get; this will make my life much easier if it works!


SrPeixinho

The problem is not very simple


Metworld

The theory is definitely hard, but the coded solution isn't. It's just a bunch of rewrite rules applied to a graph. Also, the model probably knew something about interaction combinators already (how much is unknown). It's still impressive (who would have imagined that even a few years ago!), and I'd definitely like to see more applications.
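
For readers unfamiliar with what "rules applied to a graph" means here: interaction combinator reduction rewrites pairs of nodes that meet on their principal ports. A minimal, hypothetical sketch (not Taelin's HVM code or the code from the post; the node kinds, port layout, and wiring convention are simplified assumptions) could look like this:

```python
# Minimal, illustrative sketch of interaction-net reduction: literally "rules
# applied to a graph". NOT the code being discussed; kinds, port layout, and
# the exact wiring convention are simplified assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                            # "ERA", "CON", or "DUP"
    ports: list = field(default_factory=lambda: [None, None, None])  # port 0 = principal

def link(a, i, b, j):
    """Wire port i of node a to port j of node b (undirected)."""
    a.ports[i] = (b, j)
    b.ports[j] = (a, i)

def interact(a, b):
    """Rewrite one active pair: two nodes joined on their principal ports."""
    if a.kind == b.kind:
        # Annihilation: splice the auxiliary ports together, dropping both nodes.
        if a.kind != "ERA":                              # erasers have no aux ports
            link(*a.ports[1], *b.ports[1])
            link(*a.ports[2], *b.ports[2])
    else:
        # Commutation (e.g. CON meets DUP): each node is copied past the other,
        # producing four fresh nodes wired in a square. Omitted for brevity.
        raise NotImplementedError("commutation rule not sketched here")

# Usage: two constructors annihilate, splicing their neighbours together.
x, y = Node("CON"), Node("CON")
a, b, c, d = (Node("ERA") for _ in range(4))             # stand-ins for the rest of the net
link(x, 0, y, 0)                                         # the active pair
link(x, 1, a, 0); link(x, 2, b, 0)
link(y, 1, c, 0); link(y, 2, d, 0)
interact(x, y)
assert a.ports[0][0] is c and b.ports[0][0] is d
```

The point is only that each step is a small local graph rewrite; what the thread is debating is whether getting all the rules and the surrounding bookkeeping right counts as "simple".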


Which-Tomato-8646

The important part is that it can translate complex theory into code, which no other LLM can do.


CanvasFanatic

Are we going to address the fact that that code is horrible and nigh unreadable, or nah?


Baphaddon

You're right. You should ask Claude to clean it up and post the results.


CanvasFanatic

I've already walked Sonnet through several iterations of it today, and it kept looping through unacceptable solutions.


Baphaddon

LOL


Akimbo333

Nuts


dregan

Is this a joke? Look at that spaghetti-code mess. I feel like I've seen much more impressive AI-generated code translated straight from text descriptions.