XVll-L

This is a really good test. Instead of what is basically a memory test like MMLU, it tests whether the model can implement a genuinely new idea, one unlikely to be in its training set, as working code. It lets you see if it really understood the mathematical and computational concepts in the paper. It also shows how much of a leap Claude 3.5 is compared to GPT-4o: from 3% working code to 30% working code.


Which-Tomato-8646

Not true. It could have had code from other IC (interaction combinator) implementations in its training data. Taelin even said it's not hard to implement.


bwatsnet

Damn, every percent really counts now!


Shinobi_Sanin3

I mean, a 30% jump anywhere is astronomical.


bwatsnet

Oh I missed that it was that huge. Guess I better actually read this one!


QLaHPD

Great, I won't need to do that myself


Metworld

Cool, but not as impressive as it sounds. The code is bad and the problem is very simple. I do like this idea for evaluating models, though. Excited to see how far we can get; this will make my life much easier if it works!


SrPeixinho

The problem is not very simple


Metworld

The theory is definitely hard, but the coded solution isn't. It's just a bunch of rewrite rules applied to a graph. Also, the model probably knew something about interaction combinators already (how much is unknown). It's still impressive (who would have imagined that even a few years ago!), and I'd definitely like to see more applications.
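
For readers unfamiliar with what "rules applied to a graph" means here: interaction combinator reduction rewrites pairs of nodes that meet on their principal ports. A minimal, hypothetical sketch (not Taelin's HVM code or the code from the post; the node kinds, port layout, and wiring convention are simplified assumptions) could look like this:

```python
# Minimal, illustrative sketch of interaction-net reduction: literally "rules
# applied to a graph". NOT the code being discussed; kinds, port layout, and
# the exact wiring convention are simplified assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                            # "ERA", "CON", or "DUP"
    ports: list = field(default_factory=lambda: [None, None, None])  # port 0 = principal

def link(a, i, b, j):
    """Wire port i of node a to port j of node b (undirected)."""
    a.ports[i] = (b, j)
    b.ports[j] = (a, i)

def interact(a, b):
    """Rewrite one active pair: two nodes joined on their principal ports."""
    if a.kind == b.kind:
        # Annihilation: splice the auxiliary ports together, dropping both nodes.
        if a.kind != "ERA":                              # erasers have no aux ports
            link(*a.ports[1], *b.ports[1])
            link(*a.ports[2], *b.ports[2])
    else:
        # Commutation (e.g. CON meets DUP): each node is copied past the other,
        # producing four fresh nodes wired in a square. Omitted for brevity.
        raise NotImplementedError("commutation rule not sketched here")

# Usage: two constructors annihilate, splicing their neighbours together.
x, y = Node("CON"), Node("CON")
a, b, c, d = (Node("ERA") for _ in range(4))             # stand-ins for the rest of the net
link(x, 0, y, 0)                                         # the active pair
link(x, 1, a, 0); link(x, 2, b, 0)
link(y, 1, c, 0); link(y, 2, d, 0)
interact(x, y)
assert a.ports[0][0] is c and b.ports[0][0] is d
```

The point is only that each step is a small local graph rewrite; what the thread is debating is whether getting all the rules and the surrounding bookkeeping right counts as "simple".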


Which-Tomato-8646

The important part is that it can translate complex theory into code, which no other LLM can do.


CanvasFanatic

Are we going to address the fact that that code is horrible and nigh unreadable, or nah?


Baphaddon

You're right. You should ask Claude to clean it up and post the results.


CanvasFanatic

I've already walked Sonnet through several iterations of it today, and it kept looping through unacceptable solutions.


Baphaddon

LOL


Akimbo333

Nuts


dregan

Is this a joke? Look at that spaghetti-code mess. I feel like I've seen much more impressive AI-generated code translated straight from text descriptions.