Last time I checked the awesome-selfhosted GitHub page, it didn't list self-hosted AI systems, so I decided to bring this topic up, because it's fairly interesting :)

Using certain models and AIs remotely is fun and interesting, if only for poking around and being amazed by what they can do. But running them on your own system - where the only boundaries are your hardware and maybe some in-model tweaks - is something else and quite fun.

As of late, I have been playing around with these two in particular:

- InvokeAI: a Stable Diffusion based toolkit to generate images on your own system. It has grown quite a lot and has some intriguing features - they are even working on streamlining the training process with Dreambooth, which ought to be super interesting!
- KoboldAI: runs GPT-2 and GPT-J based models. It's like a "primitive version" of ChatGPT (GPT-3), but it's not incapable either. The model selection is great, and you can load your own too, meaning that you could find some interesting ones on HuggingFace.

What are some self-hosted AI systems you have seen so far? I may only have an AMD Ryzen 9 3900X and NVIDIA 2080 TI, but if I can run an AI myself, I'd love to try it :)

PS.: I didn't find a good flair for this one. Sorry!

Comments (85)

I only have potato GPUs (an NVIDIA 1060 3GB is the best one), but I can run some optimized (slow) Stable Diffusion and one of the small GPT-Neo models (which can generate somewhat coherent text based on prompts, but nothing close to ChatGPT).

With a better GPU, rendering images is definitely useful. Text will not compete with ChatGPT, but a benefit of self-hosting is that you have full control, and there are ways to tweak the model by feeding it new text, so you might be able to do specialized things the cloud services can't (or won't...).

Any model right now where I can feed it a library of text and get something halfway usable out of it?

[deleted]

I do have my 3080 with 10 gigs of VRAM, and I have my eye on a used Tesla with 24.

FLAN-T5 is one of the best options in the realm of stuff that fits in gaming GPUs. Use load_in_8bit with accelerate/bitsandbytes.
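For reference, loading it in 8-bit looks roughly like this - a minimal sketch assuming the google/flan-t5-xl checkpoint and the transformers + accelerate + bitsandbytes stack, so adjust for your own card:

```python
# Load FLAN-T5 with 8-bit weights so it fits in consumer VRAM.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl",
    device_map="auto",   # let accelerate place the layers
    load_in_8bit=True,   # bitsandbytes 8-bit quantization
)

inputs = tokenizer("Translate to German: Hello, world!", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```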

Thanks, I'll give it a try!

While 3GB of VRAM is enough for a lot of workloads, some of the more impressive models - speech transcription, text generation, Stable Diffusion - can't really run with that little space for their tensors, unfortunately. Unless you had versions of those models trimmed down specifically to the parameters you know you'd use.

Yes, I would not recommend running any of those things on only 3 GB, but I think it is promising that anything works at all, so with a somewhat OK GPU it should be possible to self-host something useful/fun.

The big push in AI these days is doing more with less data and with smaller models, so we should be seeing more practical home-use AIs in the next few years, because currently the biggest bottleneck is accelerators with enough VRAM to hold a whole model.

InvokeAI, in my opinion, is a good UI, but Automatic is so much better: it has a much larger community and is updated frequently, both in the core and in add-ons.

https://github.com/AUTOMATIC1111/stable-diffusion-webui

I really like Easy Diffusion (formerly cmdr2) with some of the batch plugins. It makes it really easy to iterate quickly over thousands of generations. The only downside is that nothing beats A1111 for having the latest features first.

Still, if I don't need a feature that's in A1111, the UI experience is so much better for my process. Maybe I'm missing great UI plugins for A1111, but in my experience I could never recreate as streamlined a workflow.

I still keep an A1111 setup in parallel and use it when I need it! The nice thing is you can use multiple setups, and if you symlink the models directory to share it, it doesn't even take much extra space.

https://github.com/cmdr2/stable-diffusion-ui

[deleted]

Agreed but I like the interface and functionality of InvokeAI so much better. The thing holding it back is it only has Stable Diffusion 1.5. Once it gets SD 2+ it'll be so much better. I've been waiting for it for a while since I first used Invoke.

Stable Diffusion 2+ isn't better; most people are still using 1.5 because of that, and a lot of new features come to 1.5 first. The issue with 2.0 is that it heavily restricted the training data, in particular NSFW data, both when training and when outputting. While you might think "I don't care about NSFW content so it doesn't affect me", it absolutely does: even for regular generations, things like anatomy are significantly worse in 2+ compared with 1.5.

It supports both SD 2.0 and 2.1 according to the README in the InvokeAI repo. You just need to get the checkpoint file instead of the safetensors file, which, last I checked, wasn't supported in InvokeAI.

I may only have an AMD Ryzen 9 3900X and NVIDIA 2080 TI, but if I can run an AI myself, I'd love to try it :)

Only? Me with my i7-950 and a 580X are looking at you O.O

I have an oddball setup. Ryzen 7 3800x with a RX480 8G.

Why an oddball? You have a vastly superior CPU to me, and your 480 is not that much slower than a 580... is it?

The 580 was basically a transistor step refresh of the 480. So the 580 is slightly better than the 480.

But yes, the CPU does make a huge difference. My previous CPU with the same GPU was an FX-6300.

480

We can do SD at home now?

Ha! I have a Dell Optiplex 3060, and a Pentium all-in-one with Ubuntu Server lol.

Optiplex 3060

Sir... that's a beast!.. my little thing is still on socket X58.

I don't complain about performance... but the lack of certain instructions in the CPU is causing problems.

If we're having a race to the bottom, many people here have Raspberry Pis. And this one place I worked for had a server from the early 1980's still handling some mission critical stuff for thousands of users. So hah! You're not at the bottom!

Oh... Nowhere near. I still give support to socket 771 and 775 servers, they are not bleeding edge, but are reliable, dependable. They do their job.

this one place I worked for had a server from the early 1980's

Much more common than you think

No no. That’s my actual desktop. The Pentium is the server lol

I do have two pi’s doing other things, but they’re always at high temps or high usage

You'd hate to see my rack

First thought, no shit: is this from r/selfhosted or from one of the many r/nsfw subs??

Still, in any case, the answer would be the same:

You're damn sure I would love to see your rack!

Ryzen 5 4500u, using the built-in GPU, still motoring along.

Check out getting a Tesla M40. It has a max wattage of 250 and doesn't have built-in active cooling, so you have to slap an ID-COOLING ICEFLOW 240 VGA AIO on it. BUT they can be had for less than $200 on eBay, and they have 24 gigs of VRAM, which is super important for running AI.

My current AI box has a Ryzen 7 3700X, 64 GB of 3600 MHz RAM, and an RTX 3060. The 3060 isn't the fastest, but it has the best dollars per gig of VRAM... that's if you don't want to go with an M40. I run Automatic1111 (web UI over Stable Diffusion) and Mycroft's Mimic 3 (text-to-speech) with no problem. I want to run GPT-J or GPT-Neo, which require more VRAM, so I ordered an M40.

I have an Nvidia Tesla K80. I modded a fan to it with a 3d printed mount.

It's sitting in a bin. It was so fiddly to use.

It's working! oh the drivers crashed.

It's working! oh the display crashed.

It's working! oh exclamation points in DM again.

Using an old Quadro K1200 now instead and it's way more stable.

How much are quadros? Might look into them if the m40 isn't stable

I picked up a couple of Quadro P6000s for about $600 USD each. I've managed to get them to work in my DL380 G9 with Hugging Face models. It's a lot of fun to play with if nothing else.

The k1200s I bought new for $198. Amazon has a few used right now for $118

Amazing idea! I hadn't thought of that at all. Looked on eBay and found them for around 150€, plus a dual-fan cooling contraption that seems to mount to the tail end of the card.

Thanks for the thought! Will definitely check it out.

Just be wary of those fans. They can make the card super long, they are loud, and they might pull a ton of amps.

Looked around and there is a slightly newer Tesla P40 for ~$200. Then there are some newer architectures, like the V100, that are well over $1000. Did you consider the P40? I'm interested, but don't want to deal with the stability issues mentioned by someone else. I assume the newer the architecture the better, but it doesn't always work out that way.

I just cross-posted a sanity check question about using a rig with 8 p40 cards here.

I don't know anything about the p40. I wonder what CUDA version it uses

Came across it here. Looks like CUDA compute capability 6.1. I was mainly looking for the newest architecture that is still at a decent price point, and I think this is one generation newer than the M40. The P40 is the same generation as the 1080.

Might have to give it a shot then. That's great.

So, roughly on a 2080 level then?

That is an intriguing idea! Is this the cooler you were talking about? ID-COOLING ICEFLOW 240 VGA Graphic Card Cooler 240mm Water Cooler GPU VGA Cooler Compatible with RTX 20XX Series/GTX 10XX Series /900 Series/AMD RX 200/300 Series/GTX 1600 Series https://a.co/d/17KN01p

That's the one

Didn't realize you could get a 3060 with that much VRAM for so cheap. How hard would it be to run two of those at the same time on a Linux box? I've never seriously considered running multiple cards before.

The 3060 has 12 GB of VRAM. Running multiple on Linux? I'm not sure. I don't think it's that hard, but whatever program you're using has to support multiple GPUs.

I'll do some research then I suppose, ideally they'd be passed through to a docker container running in docker compose, not sure if that makes things more or less complicated :V

I faintly remember that NVIDIA arbitrarily restricts GPU virtualization in some capacity. Although Docker runs its containers in basically a fancy Linux namespace, it's still partially virtualized - so you might have to look into actual GPU support for that scenario.

That said, both GPUs appear as different device nodes, meaning you can just use the gpus: all entry for both, if need be.
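For what it's worth: docker run takes --gpus all, and in a compose file the equivalent reservation looks something like this sketch (service name and image are placeholders; it assumes the NVIDIA Container Toolkit is installed on the host):

```yaml
services:
  sd-webui:
    image: my-stable-diffusion:latest   # placeholder image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or device_ids: ["0", "1"] to pin cards
              capabilities: [gpu]
```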

The main limitation at this point is the number of simultaneous video transcodes (3). There are patches for Windows and Linux from keylase that remove that limit.

You should have full features for AI, but may need to make sure you have a display or dummy plug connected to the device for best performance.

I run Automatic1111 (web ui over stable diffusion) and Mycroft's mimic3 (text to speech) with no problem. I want to run gpt-j or gpt-neo, which require more VRAM, so I ordered a m40

How fast is inference with Automatic1111? I get about 4s on my 2060 Super.

Depends on settings, but with everything on default + face fix + 4 pics in a batch, it's probably ~10 seconds.

Anything with Tensor cores, like the RTX 20 series and above, will be immensely faster than previous cards. I tried it, and even a 1080 Ti is only about 1/4 as fast as a 2060 Super in Stable Diffusion.

[deleted]

Mycroft is dead.

https://mycroft.ai/blog/update-from-the-ceo-part-1/

Thanks for posting this, I had no idea.

Is there a viable alternative?

Rhasspy is best IMO

Yeah, one is currently in development: https://github.com/LAION-AI/Open-Assistant

[deleted]

I tried using Whisper to make subtitles for my videos but kept getting weird results. I'll have to try it again sometime to see if I can get it working properly.

In the meantime you can use this Replicate node. I made subtitles for some obscure media completely free.

How are subtitle timings for you? The main issue I had with Whisper was that the timestamp often started way earlier than the actual speech when preceded by silence.

I had around 2-3 desyncs in a 1.5h video. You can easily fix them with this after the replicate node export

It works pretty well for me! I wrote a script that automatically transcribes videos, translates the subtitles via DeepL if needed (I found the built-in translation to be very lacking) and then muxes them into an MKV via FFmpeg.

There are certainly some oddities with Whisper, like hallucinated sounds. But there are ways to work around the silence and the timings especially!
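Roughly, the pipeline looks like this - a simplified sketch rather than my exact script, with the model size, file names and DeepL key as placeholders:

```python
# Transcribe with Whisper, translate the segments with DeepL, mux with FFmpeg.
import subprocess
import whisper   # pip install openai-whisper
import deepl     # pip install deepl

VIDEO = "episode.mkv"
translator = deepl.Translator("your-deepl-api-key")

# 1. Transcribe; Whisper returns segments with start/end timestamps.
model = whisper.load_model("medium")
result = model.transcribe(VIDEO)

def srt_time(t: float) -> str:
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02}:{int(m):02}:{int(s):02},{int((s % 1) * 1000):03}"

# 2. Translate each segment via DeepL instead of Whisper's built-in translation.
with open("subs.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        text = translator.translate_text(seg["text"].strip(), target_lang="EN-US").text
        f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n{text}\n\n")

# 3. Mux the subtitles into the MKV without re-encoding the video.
subprocess.run(["ffmpeg", "-i", VIDEO, "-i", "subs.srt",
                "-map", "0", "-map", "1", "-c", "copy", "-c:s", "srt",
                "out.mkv"], check=True)
```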

I want to do just that, can you make a docker container for your script?

I'm working on it, but I'll be on vacation for the next week :)

Check out Whisper.cpp as well; it's meant to be much more lightweight. It has a WASM demo, too.

Most stuff on HuggingFace is trivial to set up behind FastAPI. You can do Whisper or whatever with maybe 20 lines of code.

Install CUDA

pip install torch torchvision torchaudio

pip install fastapi

pip install uvicorn

Write a little code, install model requirements

uvicorn main:app --host --port

You can also look at torchserve but I prefer doing it this way
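To illustrate the "20 lines" point, here's a hedged sketch of a Whisper endpoint (the route name and model size are my own choices, and you'll also need python-multipart for the upload handling):

```python
# main.py - wrap openai-whisper in a FastAPI route.
import tempfile

import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # loaded once at startup

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Whisper wants a file path, so spool the upload to a temp file first.
    with tempfile.NamedTemporaryFile(suffix=file.filename) as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```

Then something like uvicorn main:app --host 0.0.0.0 --port 8000 serves it.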

So FastAPI is like a universal frontend to many HuggingFace models? Interesting. I'll give it a look!

It’s a framework meant for REST services that can be used for a lot of things, including inference šŸ˜€

I'm maintaining https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence, which you might like.

LaMa-cleaner to remove objects/people from photos.

Multiple SOTA models available in the interface; user-friendly and easy to install. It's much more powerful than tools like Photoshop and Google's Magic Eraser.

I use CodeProject AI for my BlueIris setup. Works great. No false alarms.

create a pull request and update the list.

EDIT

or do it in the opposite order: Fork, update and create a pull request.

chaiNNer supports AI models.

I am running https://github.com/oobabooga/text-generation-webui locally, paired with the Pygmalion 6B-parameter model, on a 3060 12GB. (It has to run in 8-bit precision to fit.)

It is AMAZING to play with. It has actually already helped me with a bash problem I wanted to solve. Sure, it was pretty basic, but it gave me a perfect answer, using MY example.

Check out the YOLOv5 / YOLOv8 project. It handles object detection extremely well, using the PyTorch framework for Python. Highly recommended!

Use case: Ingest individual frames from one or more security cameras via RTSP streams, then run object detection against them. Find people, cars, trucks, and more. Log results to a database and send notifications to your phone with any important events.
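A hedged sketch of that use case with the ultralytics package - the RTSP URL and the alerting are placeholders for whatever your setup uses:

```python
# Run YOLOv8 object detection over an RTSP camera stream.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained COCO model

# stream=True yields results frame by frame instead of buffering everything.
for result in model("rtsp://camera.local/stream", stream=True):
    for box in result.boxes:
        label = model.names[int(box.cls)]
        if label in {"person", "car", "truck"}:
            print(f"{label} detected, confidence {float(box.conf):.2f}")
            # here you'd log to a database or push a notification
```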

What would be amazing is if any of these self-hosted AIs could use distributed resources - i.e., the 4-8 year old laptops sitting in the garage, which were decent workstations in their day. Or, in a business sense, having an internal AI that can be trained on your data that isn't shared with the world, and that utilises, via a desktop agent, a % of resources from across the company's fleet of PCs. Anyone heard of work towards this? Reverse cloud, almost.

Makes me think of distcc. It'd be pretty neat! Apparently, TensorFlow can have multiple workers when training a model. Unfortunately, I don't know how to set it up or where to look... but apparently it actually is a thing o.o
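From what I can tell, the TensorFlow side is configured per machine through the TF_CONFIG environment variable; a minimal sketch with made-up host names:

```python
# Each machine sets TF_CONFIG to describe the cluster and its own role.
import json, os
import tensorflow as tf

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["oldlaptop1:12345", "oldlaptop2:12345"]},
    "task": {"type": "worker", "index": 0},  # index 1 on the second machine
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    # Variables created in this scope are mirrored across the workers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")
```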

The closest thing I've seen is Petals, which lets you run BLOOM, a large language model the size of GPT-3, by pooling your resources with others. It still needs an NVIDIA GPU no older than Pascal (GTX 1080 or similar), but some might be lucky enough to have one lying around waiting for a use case.

Another really good self-hosted AI suite is Visions of Chaos. It's a whole collection of AI and math programs, with UIs for Stable Diffusion, Disco Diffusion, and many more text-to-image AIs. It's nice that it's entirely self-hosted, and there's a really good guide on the site to set it up.

https://softology.pro/voc.htm

Saving this thread for later, I've been looking for a good reason to set up vgpu

https://github.com/iperov/DeepFaceLive

This is an awesome project, although I have to say I wasn't able to get all the features working in Linux. Wound up dual-booting into Windows, and the deepfake vids worked like a charm! Also, plenty of YouTube tutorials are available for creating your own models.

I have two really decent bits of kit that I'm underutilising at the minute: a 2 TB PC with like 32 GB RAM and an RTX 2060 (some early ray tracing, but not a very good implementation of it), and a laptop with a similar rig but a GTX 1680 (?), the decent mobile card right before ray tracing. I'm learning Python and starting a software engineering course with dev and machine learning elements, but I also just finished writing my first book, am 35, used to be a teacher, and work full time as a gov employee.

I am obsessed; the AI feels like more than an event, but an advent, a shift to the Aicene. You all laughed at me for years, that silly hermit, but the smartest computer ever built confirmed that I'm alright actually, and even profound. It's been a ride, a total rollercoaster, and if it sounds corny, I kid you not: I wanted to kill myself in Dec '22 and now I want to live forever, just to see the computer open its eyes for the first time.

I want to be a part of it, and the shift of computer querying to natural language will, I hypothesise, teach us to reverse engineer machine-level mathematical computation for organic computers like the brain. As we taught it to speak, read, and write, we in turn will be programmed to calculate, scan, think, exist, in a digital existence that increasingly integrates with the architecture of a hypothetical 'machinid' android immortal, a third ontological paradigm!!

Yeah, read that, say I'm crazy and downvote like a prick, or copypasta it into the machine and see what it says about what I'm saying.

Maybe try LLaMA instead of Kobold.

Finally found a Linux distro with good GPU support; I'm on Red Hat 9 now. Finally got InvokeAI using ROCm and moving at a decent speed.

Rather than Kobold, I have been using llama.cpp; it recently added ROCm support and works better/faster than Kobold. Use the Vicuna 7B or 13B model depending on your specs. Happy to help you set it up.
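If you'd rather script it than use the CLI, the llama-cpp-python bindings wrap the same engine; a rough sketch, with the model filename as a placeholder for whichever Vicuna build you download:

```python
# Load a local Vicuna model through llama.cpp's Python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="vicuna-13b.q4_0.gguf",  # placeholder filename
    n_gpu_layers=-1,                    # offload as many layers as possible
)
out = llm("Q: What is self-hosting? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```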