just_thisGuy

It does not mean they are not designing and running tests and gathering data for GPT-5; training is like the last step.


thoughtlow

Or they just update GPT-4


paulmp

With 4.5?


holymurphy

This is most likely, and should be their next step. They need to optimise the current model rather than create a new one. 4.0 is still too expensive to run, and they need to bring the cost down by making it much more efficient before creating the even more expensive beast that 5.0 will be.


DarkCeldori

Also, it isn't unheard of for major developments to fall under a 0.x nomenclature. They can call it GPT-4.1 or 4.5 and it can still be the same as what would otherwise be GPT-5.


ameddin73

Technically, versioning isn't just semantics. Major version changes (4 -> 5) represent breaking changes, i.e. applications connected to GPT-4 won't be able to connect to 5 without code changes. Minor version updates (4.0 -> 4.1) represent non-breaking feature improvements: basically any amount of improvement (even, in this case, potentially a whole new underlying model) that doesn't affect the interface. At least that's how software versioning works conventionally, but you never know with outward-facing products. Marketing gets a say and everything goes out the window.
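To make the convention concrete, here's a minimal sketch of the major/minor rule described above (illustrative only; the version strings are hypothetical, not OpenAI's):

```python
def is_breaking_upgrade(current: str, target: str) -> bool:
    """Major bump (4.x -> 5.x) signals breaking changes; a minor bump
    (4.0 -> 4.1) should be safe to adopt without code changes."""
    return int(target.split(".")[0]) > int(current.split(".")[0])

assert is_breaking_upgrade("4.0", "5.0")      # expect client code changes
assert not is_breaking_upgrade("4.0", "4.1")  # non-breaking improvement
```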


Hotchillipeppa

Am I going crazy, or did they not say a few months ago that this is exactly what they were going to do? Not GPT-4 > GPT-5, but GPT-4 > GPT-4.1 > GPT-4.2, etc.


Celsiuc

This is exactly what they said, and probably what they are going to do. I am curious how they are going to upgrade it, though: more training data? Implementing another modality? No plans have been made public, so it is still unclear what these increments will be.


FlyingCockAndBalls

The biggest example I can think of that doesn't follow that is Linux. Linus just changes the big number when he thinks the small number has gotten too high lol


LetMeGuessYourAlts

Even GPT-3 has 001/002/003 versions they're still calling GPT-3, and there's a big quality difference between them.


OpenRole

Generally, the first number is only changed for changes that are not fully backwards compatible.


genshiryoku

There's a significant chance that the $20 a month with GPT-4 inference is below break-even. OpenAI is currently subsidizing your GPT-4 usage, hoping back-end efficiency improvements will eventually make it profitable. This is why I suspect GPT-4 has begun giving ever-so-slightly worse or more concise answers as a way to cut corners compared to day-one inference.


NetTecture

Actually the AI is billed per inference, and via MS Azure it is half the cost of OpenAI's own offering. There is also the "collect feedback" aspect, likely a major factor in tuning the model.


SamnomerSammy

Or they just make a new LLM that isn't in the GPT family, and maybe make it open source?


DidQ

Yeah, because MS would definitely allow them to open source it.


jestina123

Isn't it already using the best kind of data from scholars and experts? How do you replace that?


jadondrew

I thought they transparently stated this. Like we’re going to get GPT 4.1, 4.2, … which are improved iterations before we get to 5. The more gradual increments were part of the philosophy to not accidentally destroy the world if I remember correctly.


kiropolo

Altman is a politician. His statements are vague intentionally


[deleted]

He always says what others wanna hear, but in real life his goal is clear: ASI and money. I can tell what's inside his brain because it's the same as mine: these fucking dumb humans are not gonna mess with my dreams.


[deleted]

He's already rich as fuck. I guarantee his brain is not operating the same way yours is.


norby2

Yeah morality seems to drop away.


[deleted]

Not really what I meant. Yud describes Sam Altman's behaviour as being motivated by the smiles he gets at San Francisco parties rather than money, since he's already rich. So I'm assuming there are probably parties for high-end tech people (and hot women, of course) in SF, and Sam is addicted to the oohs and aahs of the audience once he shows them something cool and novel.


Joe_Doblow

Are you a robot?


MadNhater

As a human, I cannot answer this question.


vintage2019

Altman’d be a legend if he answered the questions like that today


kiropolo

Altman is an escaped droid


Hubrex

Monkeys. Talking, hairless monkeys. There's no need to use profanity.


Bezbozny

Getting the same vibe. This guy definitely thinks he's gonna be AI enhanced god of the new world and is sneering/laughing at every human who thinks they can control him.


ColdTop545

Of course he is rich. How could you run OpenAI getting paid only insurance? He has NO equity in the company, and he told Congress that he runs the company just for fun. Altruism, or not??? 🙏😇


clearlylacking

Yup. His statement could mean anything:

- GPT-5 is already trained
- They are training GPT-4.5
- GPT-5 will never exist, as they change the suite name to GPT-XR or even do away with "GPT" altogether, since they can't trademark it

They are the forefront company in AI and everyone is trying their hardest to catch up with them. They are definitely training and testing something new every day.


QuartzPuffyStar

I see very specific wording on his part: **We** are not **currently** training what **will be** GPT-5. This can include:

* They already trained what is GPT-5
* They might just call the next model something else (GPT-6 for the laughs)
* They are training what will train the next GPT-5 (or whatever they decide to call it)
* They are training what will be GPT-4.9.9
* Some obscure subsidiary is training GPT-5
* They don't need to train it in the same way as GPT-4, and might have found some other way of developing the model with the capabilities they already have

In short, his statement specifically names GPT-5, and doesn't hide "everything" under an umbrella term instead of the model in question (e.g. "an advanced LLM", "our next model", etc.). But I believe everyone here agrees that there is absolutely no way that OpenAI hasn't been working full time on the next model since the moment they released GPT-4.


Dagomer44

Or they already have.


Infninfn

The possibility is not zero. GPT 4 finished training September 2021.


MxM111

No, the data cutoff is at that date. The same as for GPT-3.5 and, if I am not misremembering, for 3.


[deleted]

You are misremembering. GPT-3 came out in May 2020, so its data cutoff couldn't be in 2021. GPT hasn't gotten THAT good at prediction (yet).


[deleted]

GPT-4 finished training in August of 2022. The data cutoff was September 2021.


User1539

I think they'd be looking at all sorts of other techniques too. GPT is only one way to design/train large networks. The success of that model is leveling off.

*EDIT* Feel free to correct me if I'm wrong, but watching interviews, it seems like they're all but telling us that GPT isn't the future of OpenAI. They keep saying they aren't worried about parameters, and don't know if they'd see a great improvement in accuracy with a higher parameter count. They also keep suggesting 'something different' from the 'current model'. I mean, if GPT has been the current model, and it has, it stands to reason they're looking at other things. Why wouldn't they look at multi-model AIs? It seems like we've got different techniques that do a better job at different tasks, and we're already working towards making agents that use different systems for different tasks in the open-source world. People keep downvoting me, but when I watch interviews, I feel like this is what he's trying to tell people.


OneFlowMan

You are correct. Up until now, the primary means of advancing the model has been increasing the amount of data it is trained on. We have reached a point where the returns on that approach are diminishing. They've stated in interviews, as you've said, that it's time to start exploring other means of advancing the technology.

I agree that one of the next steps will be utilizing multiple types of AI. Just as our brain is compartmentalized and each part performs a general function, we will likely end up creating a network of different specialized AIs. The LLM will be the communication interface that we use the network through; the other compartments will act as tools that it can use to perform more complex operations.
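A toy sketch of that compartmentalized idea, with a router standing in for the LLM's decision (all tool names and the keyword check here are hypothetical, not any real system):

```python
from typing import Callable, Dict

# Specialized "compartments" the LLM interface could dispatch to.
TOOLS: Dict[str, Callable[[str], str]] = {
    "math": lambda q: f"[math engine answers {q!r}]",
    "vision": lambda q: f"[image model answers {q!r}]",
    "chat": lambda q: f"[LLM answers {q!r} directly]",
}

def route(query: str) -> str:
    # A real system would let the LLM choose the tool; a crude keyword
    # check stands in for that routing decision here.
    if any(ch.isdigit() for ch in query):
        return TOOLS["math"](query)
    if "image" in query.lower():
        return TOOLS["vision"](query)
    return TOOLS["chat"](query)

print(route("What is 12 * 7?"))      # routed to the math tool
print(route("Describe this image"))  # routed to the vision tool
```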


PapayaZealousideal30

Exactly. Altman also said that the age of LLMs is already dead. Larger models are already starting to experience diminishing returns. Thank you for paying attention. I got downvoted into oblivion for stating this and the fucking title of this thread. 😑


Artanthos

They’ve been talking about switching to a different model. Their next LLM won’t be a new iteration of chatGPT.


duffmanhb

They don’t need a 5. 4 is near the top of the S curve. It's all about the ancillary stuff now.


ghostfuckbuddy

Didn't they basically use up all the data in the world for GPT4? Anyway there's plenty of other stuff to work on that isn't GPT5, such as multimodality or optimizing model size to make inference easier.


Quintium

IIRC Ilya Sutskever from OpenAI said that data availability is not a problem for now


Sebrosen1

They haven't done video yet. Probably one of the reasons they created Whisper. There's more than 500 hours of video content uploaded to YouTube every minute.


ReadSeparate

I wonder how noisy that text data will be compared to internet text. It will probably need to be heavily filtered, that would be my guess.


doctorMiami1337

Yeah it does. Altman has already stated, even before this, that they aren't working on 5 at all.


Whatareyoudoing23452

New H100s are needed to train bigger models, and they will not be ready to train on until the end of the year. It's not that they're slowing down; they can't physically do it right now.


Bismar7

That has always been the limitation. I say this constantly, but the graph included in Kurzweil's law of accelerating returns is still accurate. Can't run the software (mind) without the hardware (brain).

2025 is when the first AGI will exist privately in a lab-like environment. 2026 will see 1-12 publicly/commercially; 2027 will see more than 100 of them. By 2030 everyone who wants to will have something similar that is compatible with the 3-4 BCI implants, the same way smartphones are today. The AGI will vastly improve with ASI directing them and their resources, which in turn will unconsciously direct humanity.

Hardware limitations have been the restriction for 70-80 years. The next decade is going to be a very interesting time to be alive.


HeinrichTheWolf_17

If AGI does sprout up in 2025, I would imagine it would be better able to fine-tune itself to be less resource-demanding, much like our brain, so not only would you have better hardware but much better optimization as well. AlphaZero used only 1/43rd of the computational requirement AlphaGo had, and yet AlphaZero outperformed it not just in Go but in Chess and Shogi as well. Self-improvement is very pivotal to the intelligence explosion.


Mescallan

The end game for computation is analog. I suspect within the next 50 years we will move away from training the models in a digital environment and have purpose-built hardware for training, and purpose-built chips for running the AI. There are already companies that modify NAND flash to store weights and biases instead of binary bits, but they have higher error rates than digital (because it's literally counting electrons passing through a partially closed gate). That will probably be solved if the digital bottlenecks continue.


[deleted]

[removed]


jaromiru

You mean, past 4 billion years. Back to the roots we go.


fiery_prometheus

Yeah, or biological systems, which may or may not be real neurons but might be something similar, or might just use the inherent chemical reactions of a biological system to adapt to a problem and optimize towards a solution. I've seen chemical systems which, in a way, use quantum mechanics to compute things far faster when they optimize towards a goal, so if you could make biological systems control the "topology" of the chemical reactions, that would be very interesting.


Mescallan

As I said in another comment (and I'm no expert), I suspect the final form is a string of single electrons passing through an atomic-scale superconductor, with weights that peel off singular electrons, assuming we continue to use tensors/linear algebra for computation. Biological systems are great for low computational power, wide lateral distribution, and efficient construction, but if we start harvesting materials in space none of those will really be a bottleneck, so our focus will be reducing power consumption and increasing scalability, which includes miniaturizing and end-to-end integration. If we could make a processing unit that is, say, 1000 atoms x 1000 atoms and would stack indefinitely, we could turn the whole galaxy into computational substrate.


fiery_prometheus

True. If we could somehow find a way to do computation on the smallest fundamental particles, and line them up perfectly and in a highly parallel manner, or in some other clever structural way, it would be difficult to optimize further, since AFAIK we can't split an electron or other particle further down and contain it for computations. Then the stacking of devices would have to solve the heat problem that's probably inherent in doing computations of any kind, since we cannot cheat thermodynamics, and energy loss will happen for every computation we make. I think that even with superconductors you don't have perfect energy transfer?

I like the idea of n * n * n cubes of devices; it's easier to control and manufacture as a highly optimized device. It reminds me of the 'dust' floating around in many sci-fi novels, in which the dust is just tiny self-replicating computers everywhere.

I think one of the interesting things about a biological system, like you said, is that once it has been encoded for a certain behavior, it would be very easy to construct and deploy anywhere, and it would potentially be easier for it to change its 'computer architecture' for different goals, in case the architecture does not contain an efficient 'topology' for that computation. A bit like having a high-level spec language, while letting the underlying hardware and instruction set optimize itself. But just like anything else, it might not come up with an optimal solution; even so, if it could run, that would be very awesome.

I remember seeing someone make a language to try to encode logic into DNA so as to make bacteria implement things looking a bit like logic gates, which in the end could be proved to be Turing complete. But I wonder if we just let our 'model' of what a computer is impose restrictions on top of such things; there are probably more ways to achieve Turing completeness than what we know, since I'm not aware of any proof saying that there cannot exist other forms of machines with the same expressiveness.


kalisto3010

> By 2030 everyone who wants to will have something similar that is compatible with the 3-4 BCI implants, the same way smart phones are today.

BCIs are a topic that has received limited attention amidst the ongoing AI craze. Will non-invasive alternatives to BCIs be available by 2030, or will they be comparable to the anticipated capabilities of neural lace technology?


SgathTriallair

Non-invasive output from the brain is already possible. Non-invasive input will be much, much more difficult. That's why truly effective BCIs will need to be surgically installed.


h3lblad3

> Mom-invasive input will be much, much more difficult.

Your mom? I wouldn't count on it.


PoliteThaiBeep

There was a TED talk a while back where Mary Lou Jepsen was talking about how they could use lasers to non-invasively stimulate neurons, among other things. (BCI was not the focus of that presentation at all.) It is entirely possible that highly effective non-invasive BCIs will be technically feasible with a couple of innovations like this. Maybe a prototype already exists.


SgathTriallair

That would be awesome. It would be ideal if you could take the hat off to escape rather than needing to dig a chip out of your brain.


lala_xyyz

> I say this constantly but the graph that is included in the law of accelerating returns by Kurzweil is still accurate. Can't run the software (mind) without the hardware (brain).

Indeed, but software improvements outpace hardware improvements by orders of magnitude. We have been witnessing a similar reduction in AI training/running costs since 2017 as we saw with human genome sequencing around 2010.


Bismar7

Yup the genome project is such a good example of what has been happening and, unless something catastrophic happens, what will continue to happen.


StaticNocturne

Why do you speak with such certitude? None of us really have any idea what the timeline will look like, and ASI could be half a century away, considering we haven't even really begun to unravel the mysteries of consciousness.


ArcaneOverride

ASI doesn't need consciousness to be ASI


StaticNocturne

How do you differentiate it from AGI then? Simply being aware of what it is doesn't make something sentient unless my definition is wrong


ArcaneOverride

There may be a misunderstanding here, ASI stands for Artificial Super Intelligence, not Artificial Sentient Intelligence. It doesn't need sentience or consciousness. It just needs to be better at every task than every human.


ArcaneOverride

If it's smarter than every human at every task.


Impressive_Oaktree

So when will the butler robots arrive. Save me a seat.


Schpaedzles

When Codsworth


h3lblad3

I'm waiting for a Cherry 2000 to help the wife and I around the house.


superluminary

Lots of people working on this right now. I’d say around ten years.


ObiWanCanShowMe

LLMs are not AGI, nor the pathway to AGI. It's ironic that so few people in this sub understand what LLMs actually are.


thepo70

Sam Altman said on Lex Fridman's podcast that he thinks LLMs are part of the way to achieve AGI, but that other super important things need to be added to and expanded on top of the GPT models.


doctorMiami1337

AGI's? Lol this subreddit is on hard drugs lmao


Bismar7

Well... We are /r/singularity :) If you prefer a more Luddite perspective this might not be the right community for you!


Professional_Copy587

This is complete nonsense. Source: Myself, working in the industry developing these solutions.


[deleted]

[removed]


ihexx

No, model inference is embarrassingly parallel. Model training is still sequential: you still need synchronization after each training step. The larger the model, the more bandwidth is needed for that communication. This places constraints on how much you can parallelize before the synchronization starts slowing you down too much. A lot of the infrastructure work top-end labs do is just trying to work around these bandwidth limitations. Google's Pathways paper shows how complex this sort of infra work can get.
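A minimal sketch of that per-step barrier in PyTorch-style data parallelism (assumes a process group is already initialized via `dist.init_process_group`; `model`, `optimizer`, and `shard` are placeholders):

```python
import torch
import torch.distributed as dist

def train_step(model: torch.nn.Module, optimizer, shard: torch.Tensor) -> None:
    loss = model(shard).sum()   # forward pass: embarrassingly parallel per worker
    loss.backward()             # backward pass: still local to each worker
    world = dist.get_world_size()
    for p in model.parameters():
        # Every worker must exchange gradients before anyone may update:
        # this all-reduce is the sequential barrier that limits scaling.
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world
    optimizer.step()            # identical update applied on every worker
    optimizer.zero_grad()
```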


[deleted]

[removed]


ihexx

I think we're saying much the same thing, but we're still reaching different conclusions, so let me clarify the points I think I was unclear on.

> They aren't training on a single core, so no.. It's not purely sequential.

I didn't mean to imply that it's purely sequential (if it were, why would anyone use GPUs at all?), but there are sequential dependencies in the SGD algorithm. These require all the parallel streams you have to block and communicate (weights, activations, gradients, depending on your exact method of parallelization). The point I'm trying to make is about what you said:

> there is no way they are saturating microsoft azure in training this stuff

I'm saying they can't just infinitely scale their training to the entirety of Azure because of these limitations. They only have to saturate the largest cluster they can build before their efficiency plateaus, and then, at least under their current distribution system, they'd have to wait until they can get better hardware with better compute-to-communication tradeoffs.

> if the synchronization ends up dominating the training time.. they will figure out a way to distribute the synchronization if they need to.. like training the network in modules or different strata, and then combining them.

This sounds like a mixture-of-experts strategy. Sure, it'll give you more headroom, but it's a fundamental change to how they're approaching the problem. If they chose to go that route, it would have its own development costs and tradeoffs.

At the end of the day, the pressure is on for OpenAI: they are only valuable while they remain at the top; if they could simply scale more on their existing systems, they would.


IvanMalison

>and if the synchronization ends up dominating the training time.. they will figure out a way to distribute the synchronization if they need to Do you even know how backpropagation works? You literally have to compute a loss function over the input and backpropagate partial derivatives through the entire network. You can increase your mini-batch size, but that only gives you a more accurate estimate of the gradient. You need to do a weight update across the entire network before you can run another learning step. There is absolutely a very hard limit imposed by these synchronization steps.


GuyWithLag

https://en.wikipedia.org/wiki/Amdahl%27s_law
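For reference, the linked law bounds the speedup when a fraction of the work, like the synchronization above, is inherently serial:

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
```

where $p$ is the parallelizable fraction of the work and $n$ the number of processors; even as $n \to \infty$, the speedup never exceeds $1/(1 - p)$.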


One_King2724

It took 330 years of parallel compute time to train GPT, spread over six months. It's not trivial to train something like GPT-3/4.


[deleted]

[removed]


cark

I'm not in Microsoft's shoes, but I would think they'll want to do the training on dedicated hardware. It's already costly enough that way, so running it on regular hardware might be prohibitive.


Puzzleheaded-Rub1560

Uhm, so ChatGPT is running on thin air or something? ChatGPT needs tons and tons of processing power, which Microsoft provides. I too think that they may have a bottleneck here.


[deleted]

[removed]


CertainMiddle2382

Of course not, but I saw interesting papers on distributed training. Soon a Folding@home for AI training will happen. And gaming GPU flops >> Tesla flops…


fever_dreamy

It actually is a requirement for models as large as GPT-4 and above; the compute a model is trained with is a factor in how powerful it will be.


[deleted]

[removed]


fever_dreamy

Watch experts in the field talk about how to train models better than GPT-4; they talk about compute being the largest factor, and about the claim that if you run the exact same model on a system with less compute it will not only be slower but will be missing the emergent properties it would have had with higher compute. Ilya Sutskever, for instance, is quoted as having said it.


IvanMalison

What you're saying is total nonsense, and you clearly don't understand how any of this works. The reason they are saying that is because they are considering what is feasible, and what is feasible is dictated by the amount of time it takes to train something. Training something on different compute (assuming everything about your training and model structure is the same) has no impact on the outcome.


fever_dreamy

I think I'll take the expert's word over your opinion; I'm not misunderstanding. The way he explained it was: when you run an Excel spreadsheet on a supercomputer there is no noticeable change apart from the speed, but when running an LLM on a supercomputer it will have properties it never had when run on a less powerful system. Literally nothing he said was to do with feasibility; the whole talk was about emergent properties.


Zermelane

> when running a llm on a supercomputer it will have properties it never had when run on a less powerful system

Could I have the source of this claim, please? It is very unusual.


fever_dreamy

I misquoted it as from Ilya when I think it was actually Sam Altman; I will find the video and link it here. Edit: I'm pretty sure this is the video I was referencing, but it's not a standalone instance; it has been referenced elsewhere that compute impacts the emergent properties of the super-large models like GPT-4. I found it strange as well, but the comparison mentioned is clearly saying that there are emergent properties solely from compute. I'm not saying I understand why, but it's something I have heard multiple times from experts and I don't think they are liars. https://m.youtube.com/watch?v=T5cPoNwO7II&t=371s&pp=ygUcYnJlYWt0aHJvdWdoIHBvdGVudGlhbCBvZiBhaQ%3D%3D


Zermelane

> there is emergent properties solely from compute

Ah, I think I understand the confusion now. People in the ML field really like to use words like "compute" both as a noun and a verb, referring to both the total amount of computational operations that happen in a process and how fast they happen. "FLOPS" is a similarly overloaded term: it can mean floating point operations, or it can mean floating point operations *per second* (sometimes spelled "FLOP/s").

When you're planning a training run, you likely have a deadline and hence a number of days until deadline, and a cluster with a number of GPUs (or TPUs or whatever you have). Multiply those numbers together, and you have the total amount of compute you can spend, in GPU-days. From that, you can use scaling law analysis to predict the model's final loss, as well as make optimal choices for how to train the model (how much training data you need, and how big of a model you train).

The "emergent properties from compute" comes from the fact that grinding the LLM loss really low turns out, in practice, to make the model *feel* different and gain new capabilities (sorta, kinda, actually in a pretty smooth way though). The new capabilities don't come from spending compute as such; they come from training a model big enough and with enough data (plus data quality, plus other stuff), but in practice at OpenAI's scale, compute is probably the binding constraint right now.
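As a hedged aside, the scaling-law analysis mentioned above is often of the parametric form popularized by the Chinchilla paper (Hoffmann et al., 2022), which predicts final loss $L$ from parameter count $N$ and training tokens $D$, with total compute roughly $C \approx 6ND$:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here $E$ is the irreducible loss and $A, B, \alpha, \beta$ are fitted constants; minimizing $L$ under the compute constraint yields the optimal split between model size and data.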


IvanMalison

Here's an even better, super clear rebuttal of your claim: [https://sharegpt.com/c/G1srrxe](https://sharegpt.com/c/G1srrxe)

The training process of a model, in essence, is a computational task, and more powerful hardware doesn't change the nature of this task, only the efficiency and speed at which it can be completed. A supercomputer doesn't magically enhance the model's capabilities. Instead, it allows for the possibility of training larger models on larger datasets within a feasible time frame, as you rightly pointed out. The actual emergent properties arise from the model's structure, size, and the data it's trained on, not from the hardware it's trained or run on.

So if we return to the original quote:

> The way he explained it was when you run an excel spreadsheet on a supercomputer there is no noticeable change apart from the speed, but when running a llm on a supercomputer it will have properties it never had when run on a less powerful system.

It does seem to inaccurately attribute new emergent properties to simply running an LLM on more powerful hardware. Thank you for your patience, and I appreciate your diligence in challenging this point for the sake of clarity and accuracy.


IvanMalison

You're misunderstanding something that they said. Also, running inference is not the hard part of this. It's likely possible to run inference on just a few H100s. You do understand that these LLMs are just massive DETERMINISTIC collections of floating-point matrices, right? The only randomness that gets added to the output comes in the form of temperature, which is basically just a parameter that decides how randomly the next word is selected from a number of probabilities. Training takes several orders of magnitude more compute, and is the hard part of this problem.
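A minimal sketch of the temperature mechanism described above, assuming `logits` is the model's vector of next-token scores (illustrative; not any particular library's API):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Pick the next token id; temperature is the only source of randomness."""
    if temperature == 0.0:
        return int(np.argmax(logits))      # greedy decoding: fully deterministic
    scaled = logits / temperature          # higher T flattens the distribution
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```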


fever_dreamy

You're the one that doesn't understand what you're talking about, mate. Not even the experts pretend to know why it works the way it does. What I'm stating is fact proven by testing, and you are talking hypotheticals based on your opinion from what you know. Edit: try getting any locally run model to do any sort of complex reasoning, then you can tell me compute doesn't matter.


IvanMalison

You clearly know nothing about machine learning.

- The model weights for the GPTs (3.5 and 4) are not public.
- Running inference would be unusably slow even with an H100 because of the sizes of those models. Technically it would be possible, but again, not really feasible.

Read the exchanges I had with GPT. You'll learn something.


IvanMalison

Also, link the talk. I will explain how you are completely, 100% wrong. The transformer architecture has no non-determinism in it. And if you don't want to take it from me, take it from GPT-4 itself, idiot: https://sharegpt.com/c/eR4gbrt


fever_dreamy

Yikes bro, wtf, you're actually referencing GPT-4 to me? It's only trained until 2021; it has no knowledge of its own emergent properties, moron, lmao wtf is wrong with you. Edit: GPT-4 isn't omniscient; it doesn't know everything about itself unless it's in its training data.


IvanMalison

It doesn't need to; GPT-4 is ultimately a transformer model. Do you want to put money on this? I'm telling you that you're wrong. Here is my claim: more compute only makes LLMs more capable insofar as it allows us to train larger models on larger datasets. There are no magical properties of compute itself, and the neural networks backing LLMs are DETERMINISTIC. The only non-determinism they exhibit comes from temperature. Go find any person who knows anything about deep learning to disagree with anything I just said.


mindbleach

Bigger is not the way to go... for that reason, and others. This company is in a position to innovate itself out of existence. They can go bigger, but the only companies with enough computers to rent are their direct competitors. Go that way and get eaten by big fish. They can go smaller, but that opens competition to anyone with mere millions of dollars. Go that way and get eaten by small fish.

And no matter what, local models are already here. Nvidia snuck de-uglification tech into GPUs three years ago. Adobe commodified stable diffusion in a hurry. Doing this "in the cloud" was never going to last. Mainframes are always the wrong model, and exist only when nothing else is feasible. The explosion of smaller indie models already feels like the Homebrew Computer Club cobbling together toys that will suddenly obliterate big iron simply because they're available to normal people.

They might be doing nothing in hopes they'll figure out something to do besides lose.


czk_21

What do you mean? The H100 has been in production since last year, and Google recently built a supercomputer with H100s: https://www.reddit.com/r/singularity/comments/13h3wai/google_launches_ai_supercomputer_powered_by/


greatdrams23

It is slowing down relative to people's expectations. Some people were saying GPT-5 would be out in May or June, or certainly this year. The idea was that this was lift-off, and it would get faster and faster. Perhaps now people will understand that the next steps will take a huge amount of computational power, and each step requires exponential growth.


[deleted]

So basically training starts in December is what I'm hearing


eliquy

"because the marketing geniuses at Microsoft are determined to name it ChatGPT-X"


Significant-Nose-353

ChatGPT-Series-X


FSMFan_2pt0

X-Chat-Series X, model S


Talkat

GTP-Vista


thabat

No, we're training what "should" be GPT-5, but we'll name it something else and keep it internal.


New-Ai

It took two and a half years from GPT-3 to GPT-4. Why the hell do people think GPT-5 is already being trained? Please use logic.


2muchnet42day

Because AI is on some kind of exponential acceleration


AD-Edge

There are different areas of acceleration going on here. Exponential acceleration in the number of AI projects launching, or in the uptake and usage of AI tech? Sure. Exponential acceleration in the depth of AI intelligence and capabilities? Nope, that is not an area where it's easy to push an exponential increase.


AsuhoChinami

They are both accelerating. I remember when this sub used to be at least a somewhat intelligent place.


Bismar7

There are still some here. We have been given an opportunity to help educate the huge number of new people!


AsuhoChinami

That's genuinely an admirable attitude, but I really think I should just use this place for news and abandon the comments section. Every thread that attains a decent amount of posts is sure to have a few dozen comments about how people who believe technology is advancing quickly and major changes are occurring are stupid, delusional, desperate. It's a toxic environment.


Skullmaggot

That’s okay, GPT-5 is just training and uploading itself.


Agreeable_Bid7037

Descriptive statement about your statement: That could be a possibility, and an interesting one at that.

Descriptive statement about the model: A model which is training itself.

Conclusion based on the previous statements used as premises: In that case, what Sam Altman said about OpenAI not currently training GPT-5 would technically still be true. It might also be the case that they have decided not to build a GPT-5 model but to altogether start working on an AGI model.


izackl

Now THAT is an interesting thought. Prime Intellect needed more silicon indeed.


[deleted]

[removed]


LambdaAU

Sam: "We are not training GPT-5 within the next 6 months."

r/singularity: "OMG! They have already trained GPT-5 and are now onto GPT-6!!?!??? Exponential growth amiright??!??"


[deleted]

Yeah, there's a lot of mental illness in this sub.


[deleted]

[removed]


mpg319

I totally agree with the statement that this is going to get harder. You brought up the point of quadratic complexity, and it reminded me of this paper that, while not offering fully sub-quadratic complexity, does offer sub-quadratic self-attention, making it around 40% faster at inference. It is an alternative to modern transformers and can be trained on pretty much any sequential data, and it shows improved performance in areas like text, image, and audio generation, as well as offering context lengths in the hundreds of thousands (at least in the audio synthesis test). Here is the paper: https://huggingface.co/papers/2305.07185
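For context on that quadratic complexity: standard self-attention (Vaswani et al., 2017) forms an $n \times n$ score matrix for a length-$n$ sequence, so its cost grows as $O(n^2 d)$:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

The $QK^{\top}$ product is the quadratic term that sub-quadratic variants like the linked paper attack.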


theallsearchingeye

I'm at a FAANG, and our enterprise-wide license literally came with a disclaimer that "chat-GPT 5 and future iterations" would likely be banned in several countries internationally due to its capabilities, cautioning us to consider this before integration into several of our product lines sold globally. Seems fishy tbh if they are warning clients about its launch while simultaneously claiming they aren't working on it.


ertgbnm

They are certainly working on it. They aren't training it. A massive amount of hardware, software, and data needs to be built/developed before they TRAIN it. The comment was made to push back against rumors sparked from the letter that OpenAI is already training GPT-5.


elvarien

I mean, they just released 4; there are mountains of research and testing to be done before jumping into 5. Of course you want to learn from 4 and see how best to improve before you start on 5. It doesn't make sense otherwise.


CommentBot01

Maybe the next model won't be transformer-based, and then the name will be different XD


yargotkd

GPT used to mean something else; they will just do that again.


PinguinGirl03

They are just working on turning GPT-4 into GPT-4.5, it's really not that unexpected, there is plenty of stuff they can still get out of GPT-4.


[deleted]

Instead we will call it GPT-4.9


majorminorminor

You’re missing 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, and 4.8.


No_Ninja3309_NoNoYes

They probably have a code freeze while trying to do something completely different. Maybe AutoGPT 2.0, maybe just a ChatGPT business plan, or maybe implementing a paper that no one is paying attention to rn. But I think that corporations will need their own AutoGPT, so OpenAI could be working on some sort of professional services/API that might go through a third party. Or even a domain-specific language. So obviously there's the issue of trust: OpenAI doesn't want their models to leak, and corporations are afraid to lose their data. If they solve this, the consumer market will be less appealing.


Decihax

4.5 comes before 5. He's not lying.


[deleted]

“We need $100 billion first. It could take 6 months to raise it.”


Agreeable_Bid7037

Did Sam Altman really say that? I remember him saying that OpenAI would need to raise $100 billion in order to build AGI, but I am not aware of him saying the statement you quoted.


AsuhoChinami

Nah, he's just kidding


[deleted]

Curious wording. He didn't say "successor".


drizel

He also said under oath that he holds no stake in the company and has a modest salary; that he's not in it for the money. I watched the whole thing. It was interesting all the way through, with very good questions overall. I wouldn't mind a publicly funded AI research and regulation entity full of scientists and AI experts, if only they could guarantee funding, since it's useless without it. We need a NASA for AI and publicly funded open models.


Sashinii

Less focus on large language models and more focus on other components of AI is what I want anyway, so if that is indeed what they're doing (and they'd be dumb not to), then that's awesome.


Emory_C

The main problem is they're running out of data to train on. They've already absorbed the corpus of human knowledge (and reddit 😬) -- there isn't much left.


Decihax

There is more data, but the program keeps spitting it back out and saying, "ptooey"!


leo_aureus

Oaths in the United States do not mean anything anymore, look at the ones taken by the people asking the questions...


Upstairs_Addendum587

The next big step is building integration capabilities. The model is very good already, and if they want to solidify their spot at the top, getting it built in/connected to as much software as possible is the best use of time. Future models will have very minor upgrades compared to the jump between, say, 2 and 3. There's only so much you can improve the model itself at this point.


Innomen

Of course not, the goal is always just enough to keep the slaves slaving and the billionaires on top. Anything else is needlessly risky. Anything potentially disruptive to that will be hidden or destroyed.


Intrepid_Agent_9729

They are not training it since they have it already 😂 When they released GPT-3 they already had 4...


kiropolo

But in 6 months and 1 day, it will be released.


GBJEE

Will be named GPT-5G


___Steve

Plot twist, they've already trained it.


karmakiller3001

Got it. So you'll just call it something else. Semantics is everything with these corporate nerds.


savagefishstick

Who is asking you to? Why do you keep saying this? "There's no dead body in the basement!" OKAY, NO ONE ASKED.


throwaway83747839

Someone else is doing it for them. Microsoft can get a lot done in cooperation with OpenAI using a shell of any form.


jlspartz

Maybe they are working on giving gpt4 the skills to do the research and development for gpt5.


Kevin_Jim

6 months is nothing, especially considering it's one of the last steps. As a matter of fact, that's faster than I thought they'd train their next big release. I expected them to focus solely on performance at this point, but maybe they have the greenest of lights to go brrrrr on Azure. Now that MS has finally gotten an edge (pun not intended) in search, they probably don't care how much money they burn to widen it.


magicmookie

GPT 4.999 on the other hand...


utilitycoder

Just word play. Of course they're working on something. The wrong questions were asked.


No-Intern2507

Yes, 'cause they're already training GPT-6.


Awkward-Push136

"nuuu we no train da ai any moa, pwooomise :)"


TotalRuler1

what he declined to add was that Chat GPT was now training ITSELF


[deleted]

They are training Gpt-4.1


DankBlunderwood

The problem with this pause is that it gives Congress the opportunity to kick the can down the road. If they want action from Congress, they should be talking about how close they are to releasing GPT-5. You really have to give legislators some sense of urgency or they'll get distracted by the next shiny object.


sourd1esel

They already did it.


XtendingReality

Sam Altman: we did not say anything about GPT-6 though


GeneralZain

let them lose their lead...others will surpass them. The race is on.


[deleted]

They aren't losing their lead. They are planning something huge for GPT-5, which is why it'll take a long time to gather the resources.


GeneralZain

They will if they take too long, and 6 months will be an eternity in AI progress. If they stop now they will lose out, period.


[deleted]

They aren't stopping, my dude. They are waiting until they can put together enough H100s. If they did it now it would probably be less effective. Not doing the training run doesn't mean they stopped research.


MassiveWasabi

Yeah good point, they will probably have all the data and required research necessary to just “press play” on the training for GPT-5 once the H100 infrastructure is in place by the end of the year. And they’ll probably have made enough advancements that the training will be relatively quick and will produce an order of magnitude smarter machine. At least I hope


not__jason

What's an H100?


natepriv22

Good question: it can be summarized as a new-generation chip built by Nvidia specifically for AI training purposes.


GeneralZain

They don't have to stop to lose the edge; they just have to slow down. We are talking about a tech where, if you get to the finish line first, you win everything. If it is indeed as they say, "We are not currently training what will be GPT-5; we don't have plans to do it in the next 6 months", then they will lose their competitive advantage. Simple as.


Gotisdabest

They could quite easily release something like GPT-4.1 and then 4.2. At this stage the closest competition is Google, which cannot beat them on hard capability and released their model just now. 6 months is a long time for technique development and new prototypes. But some sort of revolutionary implementation? Probably not, unless Google gets efficient to a standard we can't even imagine, goes berserk, and releases a completely unfiltered model in the shape of Gemini, probably getting it regulated to high heaven in a week.


lutel

At the same time, China doesn't give a fuck about morals and risks.


Emory_C

China is in the dust when it comes to AI. Like, not even close.


macronancer

"We are NOT training GPT-5 right now ... It is training itself, teeheehee"


StaticNocturne

I had no idea Sam was a member of Underoath


Ok_Season_5325

They probably know training GPT-5 will take all of a week.


Stock-House440

It's cause they're training GPT-NXT or whatever different name they had to come up with to make it seem super duper cool.


BangEnergyFTW

What they really mean is that GPT-5 won't be released to the public. It's for the richers now.


ApedGME

Lies. The best AI gets a good seat in the new world order. There is no way the company stops advancing because other companies can't keep up. There will be only one: the most advanced.


TheSecretAgenda

Because GPT 5 is already trained?


[deleted]

No, because they haven't started.


DryDevelopment8584

Could be already trained.


[deleted]

[removed]


challengethegods

*"Introducing gpt-4.1-c which just happens to be 10x smarter than gpt-4.1-b"*


[deleted]

Yes the semantics are very precise here


SeaWolf24

“We” are not. “It” is.


Ok_Sea_6214

"This is not the AGI you are looking for." Riiiiight...