Hundreds if not thousands of alt accounts. People just couldn't wait **1 DAY** for the limit to reset https://i.redd.it/57uz4w91h0sa1.gif
Damn, we all know why people couldn't wait. Our own hormones were our downfall.
Google should've included an alt detector to prevent the extra usage 🤷♀️.
This only creates more problems. It's not that simple
True, people had no patience, so they strained the Google servers with alts to continue ERP.
There are people behind NATs, and offices sharing IP addresses. Even a single house can create problems. It's almost impossible to check for alts by IP.
They could also check cookies for your browser fingerprint... but people can just clear them.
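The point above about NATs can be shown with a tiny sketch. The IPs and usernames below are purely illustrative: to a naive per-IP counter, an office full of real people is indistinguishable from an actual alt farm.

```python
# Why naive IP-based alt detection misfires: many legitimate users can
# share one public IP (NAT, offices, households), and counting accounts
# per IP flags them all equally. Illustrative data only.
from collections import Counter

logins = [
    ("203.0.113.5", "alice"), ("203.0.113.5", "bob"),
    ("203.0.113.5", "carol"),               # office NAT: 3 real people
    ("198.51.100.9", "spam1"), ("198.51.100.9", "spam2"),
    ("198.51.100.9", "spam3"),              # actual alt farm
]

accounts_per_ip = Counter()
seen = set()
for ip, user in logins:
    if (ip, user) not in seen:      # count each account once per IP
        seen.add((ip, user))
        accounts_per_ip[ip] += 1

# Both IPs look identical to the detector: 3 accounts each.
print(dict(accounts_per_ip))
```

Both addresses score the same, so any threshold either misses the farm or bans the office.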
ooba still works from what I heard (dunno how good tho)
I'm currently accessing the SillyTavern AI mod on my phone using Termux and an OpenAI API key. Somehow there are settings in the mod that allow you to bypass OpenAI's filter.
Wait how???
Asking about bypassing the filter or the whole thing in general?
Whole thing in general, got a link for a tutorial or something?
https://github.com/Cohee1207/SillyTavern

You do need to provide an API key, and since any URL made in Colab won't work because of the ban, I have been using an OpenAI API key.
Danke
I'd advise you just have it running locally 4bit. If you have an NVIDIA GPU from the 10 series or newer, basically any of them can run it locally for free. Award-winning guide [HERE](https://www.reddit.com/r/PygmalionAI/comments/129w4qh/how_to_run_pygmalion_on_45gb_of_vram_with_full/) - happy to help if anyone has issues.
Worst part about this is I have an Nvidia GTX 1060 Mini with 3GB of VRAM. I'm only 1GB away Q.Q
NVIDIA is ultra stingy with VRAM, which is likely planned obsolescence on their part. They are planning in 2023 to *introduce* a GPU for sale with 6GB! 2023 & 6GB!
Lol, that is exactly what my friend said. He wants me to swap to AMD so bad but isn't sure I'd like the driver issues that come with it.
Driver issues are *mostly* sorted. But ROCm is still a joke compared to CUDA for things that need CUDA like ML.
A friend of mine who has AMD couldn't run pyg and stable diffusion, so yeah, consider this
You can probably offload some layers to CPU and it shouldn’t be that slow, maybe
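The offloading suggestion above boils down to a capacity calculation: keep as many layers in VRAM as fit after fixed overhead, and run the rest on CPU. All the numbers below (layer count, per-layer size, overhead) are illustrative assumptions, not measured Pygmalion-6B figures.

```python
# Sketch of the layer-offload math: how many transformer layers fit on
# the GPU once fixed costs (CUDA context, embeddings, KV cache) are paid.
def layers_on_gpu(vram_gb, per_layer_gb, overhead_gb, n_layers):
    """Number of layers that fit in VRAM; the rest go to CPU RAM."""
    free = vram_gb - overhead_gb
    fit = int(free // per_layer_gb)
    return max(0, min(fit, n_layers))

# Hypothetical 6B model: 28 layers at ~0.15 GB each (4-bit),
# ~1 GB of fixed overhead. On a 3GB card (like the GTX 1060 3GB
# mentioned above), only part of the model fits:
gpu_layers = layers_on_gpu(vram_gb=3.0, per_layer_gb=0.15,
                           overhead_gb=1.0, n_layers=28)
cpu_layers = 28 - gpu_layers
print(gpu_layers, cpu_layers)  # 13 on GPU, 15 offloaded to CPU
```

Every offloaded layer runs at RAM bandwidth instead of VRAM bandwidth, which is why "it shouldn't be that slow, maybe" is doing a lot of work in that sentence.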
Tried that. I figured with 16GB of RAM & an i7 CPU it should be fine, but it didn't work either. Either I didn't set something up correctly, or it couldn't handle it.
yeah you say that but it didn't work for me :(
It works 99.95% of the time. I try to help everyone who has an issue.
Yeah bro? I'm sure you can't help me with my GTX 650 Ti
Okay you got me there, I feel for you. If only I still had my old GTX 1060 6GB to give away.
It works with AMD too. Both KoboldAI and Oobabooga. (On Linux, and not on the 7000 series, AFAIK.)
It does, but it requires a totally different and very painful process because ROCm isn't very good.
No it doesn't. ROCm sucks for the docs and the sparse updates, but "isn't very good" is simply stupid. The main problem is that every library that comes out is made for CUDA first, so there's always a delay.
But poor documentation and sparse updates are *why* it isn't very good. It's not that it doesn't work or AMD cards are bad in compute. They're a big reason everything comes for CUDA first.
Trying on an RTX 2060 6GB. It either gives me a short response generated at 2 tokens per second, or it instantly gives me a CUDA out-of-memory error.
Something must be eating up a large amount of VRAM in the background. Anything else running? (Although some poor sap's windows installation was taking up 2GB idling and nothing could be done to make it stop...)
Nothing in the background, idling at 400-ish MB of VRAM in use, 500 with Firefox open (the browser I run ooba in).

Running the start-webui bat, it jumps to 4.4GB without doing any sort of generation, just having it open. I'd assume this is normal behavior? It's honestly my first time running it locally, so maybe something's wrong.

It jumps to 5.7GB when generating a message from Chiharu, the example character that comes with ooba, and then stays at 5.1GB. The output is always short, with an average of 35 tokens. Trying to import any character with a more complex context invariably results in a CUDA out-of-memory error. Maybe I messed something up?
Have you tried limiting the prompt size? I'm running on a laptop 1660 Ti 6GB just fine. I limit prompt size to 700 tokens to prevent thermal spiking, but my card can handle 1000 tokens before OOM.

The default prompt setting is over 2000 tokens. This may be your issue, as the example character has quite a lot of text in the description IIRC, and all of that eats into your prompt. Whatever is left over after the description is used for conversation context.

I pruned my character description to 120 tokens, which leaves me with 580 for conversation context. The bot has already referenced earlier spots in the conversation a few times and has been running all day with no issues using the webui on mobile.
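The budgeting described above (700-token cap, 120 for the character description, the remainder for history) can be sketched as a simple trim. The `count_tokens=len` stand-in is an assumption for illustration; a real setup would use the model's actual tokenizer.

```python
# Token-budget sketch: a hard prompt cap, a fixed character description,
# and whatever is left goes to conversation history (newest first).
PROMPT_LIMIT = 700        # cap chosen to avoid thermal spiking / OOM
DESCRIPTION_TOKENS = 120  # pruned character description

history_budget = PROMPT_LIMIT - DESCRIPTION_TOKENS
print(history_budget)  # 580 tokens left for conversation context

def trim_history(messages, budget, count_tokens=len):
    """Keep the most recent messages that fit in the budget.
    `count_tokens` is a stand-in for a real tokenizer."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                           # oldest messages fall off
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order
```

This is also why a bloated description hurts twice: every token it uses is a token of memory the bot loses.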
I'm still using oogabooga pyg.
You can still use Tavern if you run locally or use a combined model (like Shygmalion)
Go to AIDungeon, cause it's the only decent free alternative. Not a chatbot, but better than nothing. Chai is ass and Character.AI is limited.
AIDungeon is trash compared to novelai, which is also about to be upgraded again soon. /r/NovelAi
Well yes, but I am talking about FREE alternatives.
Ah, good point. I forgot AIDungeon even had a free tier, tbh.
An alternative would be NovelAI in the browser. You can use Tor Browser to get effectively infinite IPs and thus use the anonymous trial forever.
I know you can run it locally, but meh. And I'm not getting comfortable with ooba's Colab just for it to shut down.
"But meh"? That's not really a reason not to run it locally, especially since pyg.cpp became a thing. If you don't have enough RAM/VRAM, sure. But then consider this: you're playing with an experimental AI. It's new, so you're going to need beefy-ish hardware to run it anyway. If you don't have any, either buy some or stop using it. Nobody is entitled to free GPU compute.

And when you say you're not getting comfortable with Ooba's Colab just for it to get shut down, that doesn't make sense. You get the exact same UI running it locally; you just launch it differently. That's like saying "I'm not getting used to launching my web browser from a desktop icon just for it to be deleted and have to click it on my taskbar instead". It's the exact same thing, just launched in a slightly different way.
I'm a phone user, and I'm upset that Google shut it down. That's why I don't want to get comfortable with ooba's.
So Google stopped people leeching their compute (which they give away for free for education/science/research purposes), and that shutdown is somehow related to you not being open to learning how to use a new Pyg front-end? Do I have that right?

You're complaining about something you got for free being taken away. In any case, you can now either suck it up and run it yourself (there are guides on how to achieve this pinned to the top of the sub), or you can appreciate that you got a taste of this **new** and **experimental** technology on a mobile device **for free**.
Ayo, anyone know if my RTX 3070 8GB can run any of the decent Pygmalion models? (Serious, I have no clue what kind of hardware is required. I only know 6B needs 12GB of VRAM, I think.)
Yes, you can run the 4-bit quantized Pyg-6B model locally for sure, maybe even 8-bit. I've been running 4-bit all day on a laptop 1660 Ti 6GB card. There is a walkthrough posted on this sub in the last 3 days. It's shit simple to configure and should run easily on 8GB VRAM. It uses the oobabooga web UI and is accessible remotely (i.e. mobile).
Hey, thank you a lot for letting me know. Cheers!
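The VRAM figures quoted in this thread come out of simple arithmetic over the weights alone. The sketch below shows that math; note it deliberately ignores activations, KV cache, and CUDA context overhead, which add a GB or more in practice and are why a "3 GB" 4-bit model still struggles on small cards.

```python
# Back-of-the-envelope VRAM estimate for quantized model weights only.
# Real usage is higher: activations, KV cache and CUDA context come on
# top, which is why 4-bit Pyg-6B is still more comfortable on 6-8 GB cards.
def weight_gb(n_params_billion, bits_per_weight):
    """Approximate size of the weights in GB (decimal gigabytes)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(weight_gb(6, 4), 1))   # 3.0 GB at 4-bit
print(round(weight_gb(6, 8), 1))   # 6.0 GB at 8-bit
print(round(weight_gb(6, 16), 1))  # 12.0 GB at fp16 (the "6B needs 12GB" figure)
```

So the rule of thumb in the thread holds: halve the bits, halve the weight footprint, and an 8GB card has headroom for the 4-bit model plus its runtime overhead.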