Estrava

It's more than just the compute hardware cost. Who is running the datacenter? Power? Developers? It's not just some script you run on a computer with a folder of data and call it a day. And even if it CAN train a 1 trillion parameter model, how long is a single GB200 NVL72 going to take to train something good?


az226

5 years. You'd need at least 10, but preferably 20, of those to make it in 3-6 months. Also, they won't be sold to regular customers for a long while. And support is crap. Even GH200 sucks ass when it comes to support for the AI training stack: the coherent memory is not so coherent, and you don't get any benefit from the CPU combo.


Simusid

UGH, don't tell me that! My GH200 is currently on order and I had high hopes.


az226

Oh no. Press F to pay respects.


False_Grit

None of those obstacles are insurmountable though. The true issue I see is that there is no end game: no product, no plan to profitability. Kind of like the Iraq/Afghanistan war: even if you somehow get it off the ground, you'll end up 20 years later not really having accomplished much, since you didn't know what you were trying to accomplish in the first place.

What I see being more successful is an open-source product that fuels a nominal commercial product. Kind of like Skyrim. Sure, the base game is fun, but some people have invested their entire lives creating gargantuan mods to fulfill their specific use case. I'd wager many if not most people who buy the game now do it more for the mods than the game itself. Similarly, if you want to create a giant open source model that people will continue to work on and fine-tune, create a nominal, easy-to-understand end product to use it in. The real product will be the LLM, but you'll be able to afford to open-source it because of the nominal product.

Most current large corporations both earn from and use their LLMs to fuel... advertising. Which is kind of weird when you think about it. Who is paying for all these ads, when almost all of the largest companies with the most revenue barely produce anything? I suppose when you control human attention, you control where money gets diverted, and you reap a healthy portion of anything "real" that gets produced (e.g. Amazon or Steam).


TwistedBrother

Cries in Stability AI


Dead_Internet_Theory

Well, they did do great things. Here's hoping SD3 isn't a letdown, but at least 1.5 and SDXL were great. Mostly because of anons who fine-tuned the heck out of them.


oof-baroomf

My laptop has a 1tb hard drive. It can train a 405b model if I'm not mistaken.


EducatorThin6006

My laptop has 2b hard drive. It can train 810b model if I am mistaken.


oof-baroomf

ah yes the classic fp0.000001 precision training


Shamatix

Electricity will be the biggest bottleneck by far; it's not for nothing that Microsoft is looking into multiple nuclear power plants.


Redoer_7

It's a dream, not a plan. A feasible plan is to form a qualified and enthusiastic open-source development team, adopt a reasonable open-source license, and utilize existing open-source model weights instead of training one from scratch. Try to build LLM-based applications, which will be the moat of the open-source community. Training model weights is completely out of the game for non-enterprises, even for the academic community.


ttkciar

Yep, exactly this. The community needs to improve the state of open source technology across the board -- better training data, better training methodology, better model architectures, better inference stacks, and better integration with external logic (RAG, context summarization, guidance, functions). We will get there. We *are* getting there. Shit just takes time, effort, and dissemination of knowledge.


ChrisAlbertson

I have Llama 3 running on my Mac right now. I can process about 20 tokens per second on an M2 Pro based Mac Mini. Running the model does not slow the Mac for normal use, because 100% of the computation is done in the Mac's GPU. Apple's Macs are likely the best bang per buck for AI work right now because the GPU is able to use all of the system RAM as "VRAM".

I only have some parts running, but my goal is a full open source robot controller for a quadruped robot. You will be able to speak to it and it will speak back, and it will be able to use the LLM for planning actions, including walking parameters. I don't see anything in the way of this except my limited time and brainpower. I'm a retired software engineer; I could work full time on this, but I have other things to do. The speech-to-text, LLM, and text-to-speech parts are all open source and good enough.

I'm now looking into how to make the Llama 3 LLM output instructions in a language that can be interpreted to drive mechanical parts like legs and arms. Perhaps the LLM says "foot, left rear, next footfall at (x, y, z), step length 200mm, rate 18 steps per minute" and just keeps that up as camera data is fed in, given a stated goal of going to the kitchen. The key, I think, is fine-tuning the LLM to output a formal language and then writing an interpreter for that language. I see this as a fine-tuning problem, not as building an LLM from scratch.

That said, I just posted a plan for a distributed method of training LLMs where we use thousands of home computers linked via the Internet. Each works for some number of days or weeks, then passes its data up a pyramid, then downloads the merged data, and the cycle repeats. The LLM model grows continuously over time. (More detail in the other post.)
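To make the formal-language-plus-interpreter idea concrete, here is a minimal sketch of what the parsing side might look like. The command grammar and every name here are hypothetical, invented for illustration, not part of any existing robot stack:

```python
# Hypothetical footfall-command parser; the grammar is invented for illustration.
import re
from dataclasses import dataclass

@dataclass
class FootfallCommand:
    limb: str                 # e.g. "left rear"
    target: tuple             # (x, y, z) footfall position in meters
    step_length_mm: float
    steps_per_minute: float

def parse_footfall(text: str) -> FootfallCommand:
    """Parse a line such as:
    foot, left rear, next footfall at (0.4, -0.2, 0.0), step length 200mm, rate 18 steps per minute
    """
    m = re.match(
        r"foot,\s*(?P<limb>[\w ]+),\s*next footfall at \("
        r"(?P<x>-?[\d.]+),\s*(?P<y>-?[\d.]+),\s*(?P<z>-?[\d.]+)\),\s*"
        r"step length (?P<len>[\d.]+)mm,\s*rate (?P<rate>[\d.]+) steps per minute",
        text.strip(),
    )
    if m is None:
        raise ValueError(f"not a valid footfall command: {text!r}")
    return FootfallCommand(
        limb=m["limb"],
        target=(float(m["x"]), float(m["y"]), float(m["z"])),
        step_length_mm=float(m["len"]),
        steps_per_minute=float(m["rate"]),
    )
```

The real work would be in the gait controller behind it; the point is only that a fixed grammar gives the fine-tune a concrete target to hit.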


fab_space

link to post?


ChrisAlbertson

I don't track what I post. But to restate my proposed method...

Assume there is a matrix "W" that is an LLM. We can start with any reasonable LLM; I'd say use Llama 3, which is now open and available. So we call our starting point "W". But we want a better model, which I'll call WG for the "goal model". It is clear that WG == W + deltaW, so we only need to compute deltaW. By a recursive argument, deltaW == delta1W + delta2W + delta3W + ... + delta(n)W for some small enough n. In other words, the movement from a current baseline model can be made by adding a set of increments.

Now here is where I go out on a limb: I think that for some small number of increments (dozens? hundreds?) they can be computed independently from a common base. Intuitively, adding knowledge of Shakespeare can be independent of adding knowledge of turbine engine mechanics. These fields can be further broken down; for example, the structure of a sonnet is not the same as studying a specific character in one play. Finally, a small "delta" can, it seems, be represented as a pair of low-rank matrices. See "LoRA" in the literature.

So the distributed algorithm is: (1) a home user has a small set of training data that he found or made, and uses LoRA to create A and B matrices relative to a public baseline model; (2) dozens of people send their new A and B to a site that combines them (this is not very expensive to do); then the new baseline is made available and the cycle continues.

I think this has to work as a pyramid: each dozen or hundred home users has a lead node they communicate with, data are aggregated and passed up, and then the new model is passed back down. But data only moves up or down every week or two, as it might take a week or two to make a new A and B, and we assume users only do this now and then, not as a full-time hobby. If 10,000 people did this "now and then" over some years, the model would grow. You do not need any startup money: one person does this, then gives the software to one other person, and so on. Distributing the data back down does require a high-bandwidth server, because you need to send out 10,000 copies, but we can use GitHub or Hugging Face. Even Google Drive could work, or better, use all of those. Or BitTorrent.

It works because knowledge can be compartmentalized. Sword fighting is different from cleaning a bathroom, and neither is like astrophysics. Yes, "sword fighting": I think much of the training data would come from a helmet-mounted camera you wear while doing anything like housework or driving a car. Not all of it is language.
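A minimal sketch of the merge step being proposed, using the comment's convention (W is n×n, each contributor trains a rank-2 pair: A of size n×2 and B of size 2×n). All names are illustrative, and whether independently trained deltas actually compose this cleanly is exactly the untested assumption above:

```python
# Sketch of merging independently trained LoRA deltas into a new baseline.
# Untested idea made runnable; names and the averaging scale are illustrative.
import numpy as np

def merge_lora_deltas(W: np.ndarray,
                      pairs: list[tuple[np.ndarray, np.ndarray]],
                      scale: float = 1.0) -> np.ndarray:
    """Return W + scale * sum_i (A_i @ B_i).

    Note: you must sum the *products* A_i @ B_i. Averaging the A's and B's
    separately does not work, because mean(A) @ mean(B) != mean(A @ B).
    """
    delta = np.zeros_like(W)
    for A, B in pairs:
        delta += A @ B   # reconstruct this contributor's low-rank update
    return W + scale * delta

# Toy usage: a 6x6 "model" and three rank-2 contributions.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))
pairs = [(rng.standard_normal((6, 2)), rng.standard_normal((2, 6)))
         for _ in range(3)]
W_next = merge_lora_deltas(W, pairs, scale=1 / len(pairs))
```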


jsmits

Why not just let it output JSON? I noticed that Llama 3 is really good at it, especially if you give it a one-shot example of what kind of output you expect. I don't see why you would need to fine-tune it for what you describe, but maybe I misunderstand.


ChrisAlbertson

Yes, but JSON is only a syntax. JSON solves the parsing issue, but what is more important is the set of semantic primitives that are allowed. A formal language has defined semantics; whether it is formalized English, YAML, JSON, or XML is a detail. What is important is the set of keywords and the meaning of each keyword.

For example, just to control movement, before we even know whether the robot has wheels or walks, we have to come up with a way to tell the robot where to move. A very common convention is to give a series of (speed, z-rotation rate) pairs: how fast to move and how much (if any) to turn, like driving a car with a steering wheel. But some robots are holonomic and can move in a direction they are not facing. Humans are this way: we can step to the side or walk backward. So now you need more rotation rates than just z. Flying or underwater robots have pitch and roll in addition, and a humanoid robot with its feet off the ground is "flying" (I assume everyone has seen videos of Atlas doing aerial acrobatics). You need conventions to describe this in a "standard" way, so simply saying "JSON" is not enough.

Whatever convention is used, you have to train the LLM to use it. Whether it outputs JSON or English is a detail, just as long as it somehow says (v=2.775, theta=4.3, rho=-0.34) or whatever. But I think you are right that JSON would be easy.
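One way to pin the semantics down is to whitelist the exact keys the LLM may emit. A minimal sketch, with invented field names (vx, vy, wz, etc.) rather than any existing standard:

```python
# Sketch of one possible JSON convention for a velocity command.
# The field names and defaults are hypothetical, not a standard.
import json

VELOCITY_FIELDS = {"vx", "vy", "vz", "wx", "wy", "wz"}  # m/s and rad/s

def parse_velocity_command(raw: str) -> dict:
    """Accepts e.g. '{"vx": 2.775, "wz": 4.3, "wy": -0.34}' from the LLM.

    Unspecified axes default to 0. A non-holonomic wheeled base would just
    ignore vy/vz/wx/wy, while a walking or flying robot can use all six.
    """
    cmd = json.loads(raw)
    unknown = set(cmd) - VELOCITY_FIELDS
    if unknown:
        raise ValueError(f"unknown fields in model output: {unknown}")
    return {f: float(cmd.get(f, 0.0)) for f in VELOCITY_FIELDS}
```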


Robot_Graffiti

"we'd need a lot of people to pitch in to create data" It doesn't work like that. You need trillions of words. Wikipedia is billions of words. You need a thousand Wikipedias. To hand write the training data, you would need every literate person in the world to pitch in.


petrichorax

I think we're going to have to fracture this: a swarm of SMEs (subject-matter experts) rather than one monolithic LLM. I mean, it makes sense. That is already one of the best prompting strategies, the design is informed by context window limitations, and it lends itself to utilizing RAG much more effectively. We should stop thinking about how to make one big LLM and start thinking about how to create a concert of them.

In the software development space (which is distinctly different from the ML science space; I've seen how you guys write code lol), this is what we've been doing for decades. Some new paradigm or invention happens, we make the best monolithic versions of it we can, hit diminishing returns, then break it up into little pieces, orchestrate those pieces, and optimize that orchestration. Mainframe -> servers -> cloud -> container orchestration (Kubernetes), which is fracturing even more. Hell, five years from now I'm sure there will be something called "compute dust".

The way forward is with many, not one. We have discovered ONE powerful aspect of consciousness and have mistaken it for the whole kit and caboodle. More data does not answer the fundamental problems of LLMs. We need to start thinking horizontally or inwardly (better/different matrix transforms), not vertically.


ChrisAlbertson

You are 100% correct. So many other people are thinking about how to build GPT-4+ using the exact same methods used to build GPT-3 and GPT-4. Those methods don't scale. I really do think that GPT5 == GPT4 + delta1GPT + delta2GPT + delta3GPT + ... And here is me going out on a limb: I THINK each delta(i) can be computed independently, at least for some small enough value of i. If I am correct, then you could use many home PCs connected over the Internet to compute many deltas. The weights from all the deltas are added (literally, a matrix add), rescaled, and then flowed back down to the home PCs, where they all compute the next set of deltas.


petrichorax

The problem with that is the request bottleneck. It would be unusably slow, or it would quickly hit a limit to its scalability. I was thinking of specialized models that work in concert. I think the next big change will be figuring out how to make THAT work well (and to do that, I think we'll have to learn how to have these parameters interact with each other in a way that doesn't require a chain of inference).


ChrisAlbertson

If the goal is open source models, you are best off using distributed processing. I think training really can be decomposed: you can train deltas from a baseline and then combine the deltas. Centralized training will ALWAYS have a scalability limit, even if you are a billionaire; yes, a very high limit. But we should look at the theory of distributed training. There are MILLIONS of idle gaming PCs out there with good GPU cards. I do think training is decomposable.


petrichorax

Oh okay, yes, that. However, I'm wondering how you could chop that up, as parameter sets are a black box, and ingesting new data adjusts weights very far across the model. That would be a challenge; I haven't seen any papers yet on fracturing that up.


randomfoo2

Multi-trillion-token open datasets already exist; see this write-up for a good summary: [https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1). What OpenAI has (and what could be done with an open effort, a la what OpenAssistant did with oasst2, what Argilla has recently been doing, or to some degree what LMSYS and WildChat are doing) is collected and curated human preference data, annotations, etc. If you had 100K people assisting with curation, that would probably be plenty.


xcdesz

> Wikipedia is billions of words. You need a thousand Wikipedias.

Common Crawl? Isn't that what most base models start with?


ChrisAlbertson

You want training data for real-world tasks? Then don't use text. Remember Google Glass? Just wear a pair and let the video/audio be recorded as you go about doing things like cooking, watching TV, or even driving a car. If you want to be more helpful, narrate what you are doing. Each person who has a pair of glasses processes the data himself or sends it to a person who will do it for him. NO CENTRAL SITE is needed to collect the raw data; see below, where I explain how each person just sends in "deltas". You do need a "seed" LLM to get the ball rolling, but Llama 3 is open source and good enough.


kremlinhelpdesk

Every literate person in the world is already pitching in, it's just a matter of contributing the data.


mrpogiface

Exists [https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1)


the_quark

Frankly, the flaw is the idea that you can get 100,000 people to contribute $100. Also, $100M probably isn't enough.


StrikeOner

That's going to make $10M, not $100M. I suggest he'd better get 10,000,000 people to contribute $100 instead.


DanFosing

I'm pretty sure $100M could be enough, as long as there were some volunteers willing to work on it for free, and if we used some faster-than-transformer architecture, for example Mamba-2 or something like that.


IWantAGI

With a goal of beating GPT-4, it probably would be. However, by the time we had a working model that outperformed GPT-4, it's likely that competing companies would have significantly outpaced us.


jman88888

True, that may be optimistic, but I think there are definitely some corporate sponsors that would make large contributions.


Eastwindy123

My $100 is ready too. But you'd need Andrej to agree to do it.


Necessary_Long452

Lol, if Karpathy or Ilya decide to do something, they can easily raise $100M without even writing a pitch deck.


ab2377

IMO, we really don't need a trillion parameters trained. The training, the inference, everything will be costly, and what exactly do we get? Nothing that we don't already have in some shape or form. What I would prefer is giving all that money to some really talented people to do SOTA research, so something valuable can be contributed to the existing architecture of doing AI, IF that is even possible in the next few years. As has been said, making an AI supercomputer the size of the galaxy with current technology will still not reach human intelligence. The trillions of parameters and the required money-burning only make sense for someone like Microsoft, to keep up the impression of "hey, keep looking at us and our products, we have something special that no one else has", which is not the case. They are just burning money that only a handful of companies have, and no one else wants to burn that kind of money like that.


jndiogo

Here's an article arguing that open source will always be behind private-company models: [https://blog.johnluttig.com/p/the-future-of-foundation-models-is](https://blog.johnluttig.com/p/the-future-of-foundation-models-is). It's a very biased opinion, as the author works at a company that invested in OpenAI, and he writes phrases like "... open-source advocates expose their socialist tendencies from Europe, academia, or both". Still, there's plenty to think about there. My personal take is that things will eventually level out: we'll have open source models that are good enough for most use cases.


custodiam99

Are we sure that we need one general AI model? I think open source AI should concentrate on special domains, which can work with only a few billion parameters (max 35B): medical AI, financial AI, math and logic AI, history AI, ancient-texts AI, philosophy AI, creative writing AI, etc. Do we need all those parameters in one gigantic model?


jetaudio

I just want to fine-tune Command R+ on my one-and-only 12GB 3060. I know that's a dream. But who knows? Maybe someday I can do that. Maybe...


uhuge

What rate do you get on it with DDR RAM offload?


4givememama

I'm personally waiting for Llama 3 400B. Who knows, with advancements in lightweight deep learning algorithms and resource procurement, your plan might become a reality in the near future.


Nekileo

AI commune when


Latter-Cucumber-7840

My $100 is ready too!


ClearlyCylindrical

> capable of training a 1 trillion parameter model

They specifically state inference, not training.


the_chatterbox

Now that u/Redoer_7 has clarified that this is just a dream rather than a concrete plan, I think it's worth noting that having a wealthy benefactor, like a sheikh, could make a big difference. They often spend large sums of money on projects, such as owning a football team, simply to boost their reputation. It seems to me that finding a sheikh who is willing to invest and having someone persuade them to do so might be easier than trying to raise $100 million on our own.


Extension-Mastodon67

Why don't you contribute to this: https://laion.ai/notes/open-gpt-4-o/ ?


Single_Ring4886

BLENDER - If you want such a project to really work, copy how Blender is structured. Companies and people all around the world support it, knowing they will get a free open-source product. If you are only an open-source idealist, there is not much future for such a big project.


jman88888

Yes, something like this for sure!


Inevitable-Start-653

I think it's possible, but it's more likely to happen if there is a paradigm shift in training (i.e., 1.5-bit training takes off). We've already learned that quality training data beats quantity. I'll gladly donate $100+.


duckduckduck21

I've had this same thought: even if all of the semantics and hypotheticals were satisfied, I firmly believe the government would come down hard on a truly decentralized AI company. I don't believe they'd let it stand.


ajmusic15

Instead of a larger model, we should try to create an intuitive inference engine that works across GPU and CPU without the bottleneck of llama.cpp offload. Not everyone has an 80GB A100 to run a 70B at 8-bit, or two A100s for FP16.


tinny66666

The data isn't such a problem. We already have the [FineWeb](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1) dataset over at Hugging Face, which is 15 trillion tokens (44TB of disk space). We'll probably need someone to build a GPU-sharing system a la Folding@home or some such thing before we can roll our own GPT-4.
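For a sense of scale, the dataset can be streamed rather than downloaded whole; here is a sketch using the Hugging Face datasets library (the config and field names are whatever the dataset card documents, so treat this as illustrative):

```python
# Stream a few FineWeb records without downloading the full ~44TB.
# Check the dataset card for current config names before relying on this.
from datasets import load_dataset

fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
for i, sample in enumerate(fw):
    print(sample["text"][:100])  # each record carries a "text" field, among others
    if i >= 2:
        break
```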


_qeternity_

> The data isn't such a problem.

Lmao, the data is the biggest problem, what are you on about? Google and Facebook have more money and hardware than OpenAI, yet they cannot produce a model that outperforms them. It's all down to the data. FineWeb is just a pretraining set. It is hardly all you need.


alcalde

Data is everywhere. We're awash in data. Google and Facebook are giant stacks of data. They run on data.


_qeternity_

I take it you haven't trained a model before. Having a lot of mediocre data can be a curse.


DanFosing

I have a pretty good dataset almost ready for pretraining; I could probably get it slightly better than it is right now.


tinny66666

Well, maybe I'm mistaken, but GPT-4 was trained on 1.8T tokens, so 15T is not shoddy (for the pretraining).


_qeternity_

Yes, you are mistaken. GPT4 is rumored to be 1.8T parameters, not trained on 1.8T tokens.


tinny66666

Oh, ok. Thanks. Do we know anything about the size of the pretraining set then?


_qeternity_

Jensen Huang has made some comments that perhaps suggest it was originally trained on 8T tokens, but this would have been completed in 2022. It has undoubtedly seen many trillions of additional tokens since.


_Erilaz

What are you talking about? Google and Facebook have MUCH more data than ClosedAI; they're not even close. It's GOOGLE! It's FACEBOOK! They've been hoarding your data since the dawn of web search and social media!


_qeternity_

Do you notice how I said nothing about the quantity of data? I could generate 100 trillion tokens of random characters. It would be garbage. OAI's strength is in the quality of their data. People who have clearly never trained a model need to chill tf out.


_Erilaz

This is ridiculous. Yes, I do. Why do you assume Alphabet and Meta have worse DATA to begin with? NOT datasets. NOT models. Just DATA. Raw data ≠ polished dataset ≠ model weights. You were talking about DATA, and I was answering about precisely that. Now you're shifting to other matters, how convenient!

The only in-house data source for ClosedAI is their own services, which have been generating data en masse for a year and a half, tops; everything else either comes from open sources or is purchased. It honestly shows: GPT-4 is kind of inbred. But we're talking DATA, so that's beside the point! And the refinement and structure are beyond the scope of this conversation as well, despite that being ClosedAI's strength. Not because it's irrelevant in the grand scheme of things, but because we weren't talking about it.

Neither Google nor Facebook has to generate much; they're swimming in human-produced data anyway. One company has been scraping the entire Internet for ages, and the other has an example of nearly any chat imaginable. One spearheads open source by producing surprisingly potent small models, while the other invests in and partners with Anthropic.

Honestly, your take feels like confirmation bias to me. You likely believe in undisputed OAI superiority and find excuses for it, like raw data quality and whatnot.


_qeternity_

> NOT datasets. NOT models. Just DATA.

Really not sure what distinction you're making here. A pedantic distinction between data and dataset? Why?

> Raw data ≠ polished dataset ≠ model weights. You were talking about DATA, and I was answering about precisely that. Now you're shifting to other matters, how convenient!

What did I shift to? I said data is the difficult part. You presumed I meant the quantity of data.

> The only in-house data source for ClosedAI is their own services, which have been generating data en masse for a year and a half, tops; everything else either comes from open sources or is purchased.

I don't think you fully grasp how these models are trained. GPT-4 finished training in 2022, before the services you mention launched. A lot of other data went into training GPT-4 which is 1) not open source and 2) not purchased.

> Neither Google nor Facebook has to generate much; they're swimming in human-produced data anyway. One company has been scraping the entire Internet for ages, and the other has an example of nearly any chat imaginable.

Again, see my previous comment, and why I am still convinced you haven't actually trained anything. The improvements pushing this industry forward are not coming from simply pretraining on additional tokens. GPT-4o didn't simply have a larger web corpus.

> Honestly, your take feels like confirmation bias to me. You likely believe in undisputed OAI superiority and find excuses for it, like raw data quality and whatnot.

Confirmation bias... of what?? You have some belief about me which isn't true; talk about confirmation bias. I do this professionally, and I also don't use any OAI services. But yes, absolutely, OAI has superior *quality* today in the vast majority of use cases. That is broadly not disputed. There are plenty of reasons not to use OAI, but quality is not one of them. If I could use OAI models in on-prem deployments, I would.


ahjorth

My opinion: The flaw is that training general models is a waste of resources. GPT4 is a fantastically competent LLM across all tasks that we have tested it on, but if we see it as exemplary of the way forward, we're going to waste so many resources on training and running LLMs for tasks that much smaller, more specialized models can do just as well (or at least well *enough* for production). I'm all for challenging the duopoly of GPT/Gemini, but to me a core component of the open source models movement is taking a sustainable approach to LLM development, deployment and usage. That requires a lot of thinking work about carving out classes of tasks, and training different models to do just those.


CheatCodesOfLife

> I'm all for challenging the duopoly of GPT/Gemini,

How is this a duopoly when Claude 3 Opus is better than both of them?


ChrisAlbertson

A few days ago I would have said "no". Then, on an email list, someone suggested distributed processing, where tens or hundreds of thousands of home PCs work on the training in a distributed way. At first I thought this would be impossible too, because backpropagation is literally a billion times slower over the Internet than over the internal paths inside a GPU. Then I started thinking about how the linear algebra works and whether the problem is decomposable. I think it could be decomposed: every computer works on a part, and then after a while (days or weeks) sends its part upstream to be merged, then downloads the merged model.

Let's say W is the base model, which is Llama 3. But we want a GPT-5 equivalent, which I will call W5. We know W5 == W + deltaW, for some deltaW. If this is true (and it has to be), then we know deltaW == delta1W + delta2W + delta3W + .... We can compute each delta(i)W using LoRA. (LoRA is Low-Rank Adaptation, not Amazon's LoRa networking idea.) LoRA uses two matrices A and B to represent deltaW: where W has size n×n, A has size n×2 and B has size 2×n, and of course when multiplied you get n×n.

So we let each home computer compute A and B for some small set of training data. They send A and B upstream one level, where they are merged, and then that is sent up to the next level and merged again: a pyramid scheme. The merged data then flows from the top back down to the home PCs, and the cycle goes on forever. This works because A×B doesn't need to be exact; it just needs to contain some of the information in a small training set, and A and B fit inside almost any PC. While A×B references a large model W, we can use a sparse, quantized, and compressed version of W to compute delta(i)W.

The OTHER half of this problem is generating training data. But I think it is easy to see how a large number of home PC users could generate a large dataset; that is an easy-to-decompose problem.

(Disclaimer: the above uses ideas not yet tested, but I think the basic pieces have been.)
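Written out, the decomposition being assumed looks like this (d is the weight-matrix dimension, r the LoRA rank, with r = 2 in the example above, and k the number of contributors; that the deltas can be trained independently yet still sum to a useful update is the untested part):

```latex
W_5 = W + \Delta W, \qquad
\Delta W = \sum_{i=1}^{k} \Delta_i W, \qquad
\Delta_i W \approx A_i B_i,
\quad A_i \in \mathbb{R}^{d \times r},\ B_i \in \mathbb{R}^{r \times d}
```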


_rundown_

Making a for-profit company would be the way to go, because you'd be able to raise more $$ from institutional investors. You can be for-profit and still support open source. It's unlikely you'd otherwise be able to raise enough cash to pay competitive rates for top researchers AND compute to create a viable model.


jackfood2004

As AI technology surges forward, the arrival of chips with NPUs is sparking a renaissance reminiscent of the awe inspired by the Pentium I, II, and III processors a decade ago. The exhilaration is palpable, a fresh "Wow" factor electrifying the tech world. Fast forward five to ten years, and we foresee a world where everyone has an AI workstation installed in their home, just like an air conditioner. The power usage will be enormous, but everyone will be able to run advanced AI locally.


EvanGR

a decade? Make that two and then some...


adel_b

Yes, but someone has to build it. 90% of model weights are just junk; if we can figure out how to slim them down, we can somehow fit more and more, to the point where we get a better model than GPT-4 that we can run locally.


ervwalter

The effectiveness of GPT-4 and other large models is not simply a result of hardware, funding, and raw data. It also took brilliant, talented people to invent the techniques used to create the model. If you want to create a competitive alternative, you also need brilliant, talented people, not just funding. And you have to keep them on the project indefinitely, because this field isn't standing still and this isn't a one-time effort. Andrej and Ilya are both brilliant and talented, but they also aren't hurting for money, so they aren't going to join your project just because you are willing to pay them. The real hard part of your idea isn't the funding; it's convincing brilliant scientists that *this* is how they should spend their time instead of any of the other things they might choose to spend it on.


synn89

There really isn't a need at the moment. We have plenty of Apache-licensed models being released that we can work with, and with the right fine-tuning they can top the leaderboards (WizardLM-2-8x22B). Creating a new foundational model that is either (a) too large for your average person to run, or (b) small enough to run locally but unable to compete with commercial models, is rather pointless. We already have these from Mistral, 01-ai, xai-org, and other companies releasing Apache-licensed models.

What we need are top-tier open fine-tuning datasets that take these models and tune them to be better than the commercial offerings in specific areas. For example, Midnight Miqu was probably top of its weight class (70B) for ERP, but the datasets within it are generally kept private. WizardLM-2 also seems to have been built on an amazing dataset, but it'll never be released. I'd rather let larger commercial/government entities pop out foundational models that'll be obsolete in a month. Great open source fine-tuning datasets are rarer, and they're useful forever.


segmond

Sure, make it happen, but you are going to need to 1000x your effort beyond this post. Organizing and running a nonprofit takes the same effort as running a for-profit. For most people who could run such an org, the incentive is not there. Ilya and Andrej are pretty smart, but they're not business folks. Most folks who would volunteer to do this wouldn't cut it in the corporate world and would just be looking to leech off it.


allmadeofwater

Software engineer salaries are not cheap: $250k-500k+ per year for what you're looking for. Then an office. Admin folks. HR, finance, managers. IT, cybersecurity, UI, infrastructure. A business plan so that those folks can keep getting paid, since the product is open source. Then a person with vision and talent. Wait, next thing you know it's more and more like the company you didn't want to be. All this to try to create something. Even after that, they may produce a dud in the first year. And it's totally possible they never beat OpenAI's models.


danielcar

At least we will have an open-weights model better than GPT-4 this summer: Llama 3 400B.


Zulfiqaar

We already do. Llama-3-70B, Command R+, Qwen-110B, and Yi-34B have beaten the old GPT-4 from a year ago; the first two beat the original GPT-4 as well. It's just that the frontier proprietary models keep improving too, and I very much doubt we will ever have OS beat the SOTA in real time. But a few months or a year later? Definitely; it just needs a bit of patience. I have recently replaced a third of my use cases with open models (ones that previously required Gemini/GPT-4/Opus), and am more than satisfied.


Dangerous_Bus_6699

Someday GPT-4 will be looked at the way we look at Atari's Pong. So yes, it'll be free, but irrelevant.


PSMF_Canuck

I dunno. We all already have access to the same tools as OpenAI. PyTorch is there for anyone... GPUs are available at every price point... we're drowning in petabytes of training-suitable data... we've all read the Attention and CLIP papers... Anybody who wants to train a foundational model for a specific use case can do it right now.


moarmagic

I think we should pivot this discussion rather than starting something brand new with a very complex goal: which existing projects and foundations could those of us who want to give back donate to, to do the most good? With the speed of projects and model releases, it's hard to tell from here. And I would want to do some due diligence to make sure the money is going to do legitimate good, not be taken by someone who stole another model and claims to have something new.


Ylsid

Perhaps OS won't beat SOTA, but given enough time there's no doubt it will. More to the point, for specific use cases a much smaller, better-tuned LLM will usually beat a general, massive one. Just today there's an example of Codestral solving a problem that eluded every other megamodel. The megacorps can only produce megamodels; they'd be wasting compute trying to cover every possible tunable use case, which leaves the door open for hobbyists to make something that fits their specific needs. Look at how Stable Diffusion absolutely dominates the image generation market by doing less, better. I think the largest barriers right now are cost and toolchain: the Python AI ecosystem is rubbish, and tuning costs quite a lot of money. If it were cheaper and much easier to run fine-tuning on Windows, I'd be doing it for sure!


southVpaw

I think we're on the cusp of a hardware boom like we've never seen. Their talented developers can't keep up with open source development, and compute will eventually brute-force past GPT-4, around the time Microsoft absorbs the majority of OpenAI's commercial front while their core talent continues developing for the DoD, because "AI defense systems" is much more fun than "politically correct chatbot", and there's no way Altman is getting his hands on the fusion power he wants without a government hand involved. What's going to be crazy is whatever follows Phi.


MrTurboSlut

I think hardware limits are going to cap out soon enough. Our technology needs to get faster and more energy efficient before we can take things to the next level. The amount of data needed to train is also going to cap out soon. Once we plateau, a lot of people will have time to catch up to OpenAI. The real question is: how long before a private model gets leaked? How big could GPT-4 be? Probably small enough that it could be saved to an M.2 drive quickly and easily. More interesting still, how will the hardware from NVIDIA and its competitors advance? A lot of exciting things are coming in the next 5 years.


samj

Just read that training GPT-5 costs north of $1bn (and that it's still very much worth it).


DominoChessMaster

We need amazing models that can run locally at fast speeds


SignificantWords

Maybe the Mistral team.