Comments (35)

Whether or not it ends up on par with Copilot, I always applaud projects that can be self-hosted. I currently have a Copilot subscription provided by my job, but I'll try running this locally one day.

[deleted]

Some very large companies are rolling it out. But yes, they have concerns, and there are a lot of upfront discussions.

Plenty of organizations don't exactly realize how that works, especially smaller ones that don't have an army of lawyers to nudge them. I know a handful of fairly technical startups that use it, and their reasoning was that many dozens of the larger companies they talked to were using it too. Honestly, the only ones I know for sure are not allowing it are the really big ones (other than Microsoft itself)

Probably enterprise licenses with on-prem hosting? That's what I imagine our company would need to even consider it lol.

My employer gets me what I need. New laptop, PC parts for my second PC, software licenses, Copilot license, done. They trust me to ask for what I need. No questions asked; if I didn't need it for my work, I wouldn't mention it

That sounds like a good employer

It is. I've seen much worse, but here, whether it's a $2k MacBook or 500 bucks of PC hardware, when I request it, no questions asked; they figure I must have a good reason

i get it free with college lol (.edu email)

just sign up for an online community college and 'do' a class

You do realize the one saying “lol” is GitHub/Microsoft in that transaction, right?

I showed my employer how it understands the context of the code and automatically suggests new methods for things like APIs, and programs them the way I would, with my specific logging, which saves time.

Or when using Java for Android and I need a specific thing that would normally take 80 lines of programming, one comment and it finishes the code for 90% of my needs.

Those are huge time savers, which employers love.

Wasn't hard to convince him to pay for 3 licenses.

Hi all,

I got stuck at home over the long weekend with COVID, so I decided to see if I could get a code completion model of a similar size to GitHub's Copilot running locally on CPU only. This project is heavily inspired by, and based on, fauxpilot, which does the same thing but requires relatively flashy GPUs.

This project is still very early and there are some known limitations and bugs (for example, it is quite slow and it can crash when you try to edit a long file) but the suggestions that it makes seem pretty reasonable.

Any questions, comments and feedback are welcome - I have a background (PhD) in natural language processing and software engineering, so I'm happy to try and answer any questions as best I can. I'll try to stick around and chat in the comments.

Would a Google Coral help? Those are cheap (but somewhat rare).

I'd certainly be interested to try! One of the tricks used to get the model running on a low-resource machine is to quantize (compress) the weights of the neural network down from 32 bits to 4 bits. The llama.cpp library does some fancy maths to make 4-bit weights work on the CPU, but I'm not sure if Coral hardware supports 4-bit operations (I need to research it).
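The core idea behind the quantization trick can be sketched in a few lines. This is a deliberately simplified, hypothetical scheme (symmetric, one scale per block); llama.cpp's real formats such as Q4_0 differ in rounding and bit-packing details:

```python
# Minimal sketch of 4-bit block quantization: each float32 weight is
# replaced by a signed 4-bit integer plus a shared per-block scale.

def quantize_block(block):
    """Map a block of float weights to 4-bit signed ints plus one scale."""
    # Pick the scale so the largest magnitude lands on the 4-bit range -8..7.
    scale = max(abs(w) for w in block) / 7.0
    if scale == 0.0:
        return [0] * len(block), 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in block]
    return quantized, scale

def dequantize_block(quantized, scale):
    """Recover approximate float weights from the 4-bit ints."""
    return [q * scale for q in quantized]

weights = [0.12, -0.07, 0.33, -0.29, 0.05, 0.18, -0.41, 0.02]
q, s = quantize_block(weights)
approx = dequantize_block(q, s)
# Each weight now takes 4 bits instead of 32: roughly an 8x memory saving,
# minus a small per-block overhead for storing the scale.
```

The rounding error per weight is bounded by half the block scale, which is why 4-bit models stay usable as long as weights within a block have similar magnitudes.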

Coral is limited to tensors of 3 dimensions max. LLMs use hundreds to thousands of dimensions for embeddings.

So you're saying if I have tens of thousands of Corals, I might have a chance?

300 Corals and a big USB hub. I say it's doable.

Well there goes that idea…

Can't fauxpilot run on the CPU?

I don't believe so. On their repo it lists "An NVIDIA GPU with Compute Capability >= 6.0 and enough VRAM to run the model you want." under the prerequisites.

Compute Capability 6.0 starts with the Pascal generation (GTX 10-series), so that's quite a tall order. I did manage to get it running on my GTX Titan with a bit of hackery, but I didn't try with CPU only. I imagine it would be painfully slow - even compared to my solution - because they only go down to 16-bit precision (as opposed to 4-bit).

> of a similar size to github's copilot

Copilot is based on GPT3 which is a 175 billion parameter model. Yours is a quantized 6 billion parameter model. These are not similar size.

Still though, cool project. I do have privacy reservations about using Copilot, so it's nice to see things that can be run locally, even if they won't be as good. Unfortunately, even with quantization, we won't be able to run anything comparable to Copilot until consumer GPUs have 100GB+ of RAM. From what I understand, a model that size would never be able to run on a CPU either, because generating each token requires reading the entire set of weights, and that only stays fast beyond a few billion parameters when massively parallelized on a GPU - unless you want to wait several minutes for each generated word/token, anyway. Correct me if I'm wrong though.
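The size gap is easy to make concrete with back-of-envelope arithmetic. This sketch counts weight storage only, ignoring activations, KV cache and quantization overhead such as per-block scales:

```python
def weight_storage_gb(n_params, bits_per_weight):
    """Rough memory needed just to hold the weights, in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

gpt3_fp16  = weight_storage_gb(175e9, 16)  # a GPT-3-sized model at 16-bit
gpt3_4bit  = weight_storage_gb(175e9, 4)   # the same model quantized to 4-bit
local_4bit = weight_storage_gb(6e9, 4)     # a 6B model at 4-bit
```

Even at 4 bits per weight, a 175B model needs on the order of 87.5 GB just for weights, versus about 3 GB for the 6B model, which is why the latter fits in ordinary system RAM.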

I guess even your current small model must max out most CPU cores, right? Do you find other programs becoming less responsive while you're running it?

Was thinking of a similar idea - any reason to choose fauxpilot? I was thinking of using https://huggingface.co/bigcode/starcoder instead.

I didn't even realize this would be a logical thing developers would want but it makes perfect sense. I've been wanting to run some AI stuff for a while but the hardware required presents a sizable obstacle. But you had me at docker.

I majored in computer science but alas, I do not make use of it as often as I wished beyond VBA for occasional work projects and self-hosting/Linux familiarization on my own time. But this makes me eager to give it a whirl on some of my existing Java code and see how I can learn!

Why the hell didn't MS name it Clippy?

Don't worry, next year we'll get Microsoft Turbo Clippy, powered by an autonomous, self-taught Bing engine :D

[/sarcasm off]

The new and improved version of Clippy can now gaslight you about the current year and even threaten you if the emotional manipulation fails.

I think the Office-integrated Microsoft Copilot should have been named Clippy.

THIS is why I love open source.
Someone makes a proof of concept, and everyone can play with it, learn, and share their enhancements.

amazing work

I so wish someone would make a model that does PowerShell. Also, has anyone thought of doing many small models, for example one for each programming language?

Is there a demo?

I'm starting to learn ML/AI, so please don't be hard on my basic questions.

You mentioned cpp, but wouldn't C++, with the CPU's native instruction sets, be faster than Python (I haven't taken a deep dive into Python yet)?

As I love C#, I'm wondering if there is a significant performance hit when using a C++ library running with/without unsafe code execution, compared to Python, in this kind of load?

Side question: don't misunderstand me, Python is great, but is Python the only language that scientists learn? Shouldn't they be introduced to the benefits of modern programming languages?

of course - happy to help!

You're completely right - in an apples-to-apples comparison with Python, C++ and C# come out on top.

One of the biggest tricks that Python has up its sleeve is its ability to leverage compiled C/C++ plugins and wrap them up inside Python's runtime as if they were Python code. That way you get all the nice things about Python itself being interpreted (line-by-line step-through, a REPL for quick prototyping/trial-and-error) and all the benefits of the compiled extension (super super fast).

Once you get into the Python ecosystem, you'll notice that this pattern is super common: maths and ML libraries that seem like they should be performance sensitive are actually written in C/C++ with a Python wrapper around them (Torch, TensorFlow, NumPy, scikit-learn). Torch and TF are particularly powerful because they are partially written in code that specifically targets GPU hardware as opposed to standard CPUs.
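You can see the pattern in miniature with just the standard library: the builtin `sum` is implemented in C, so comparing it to an interpreted Python loop shows the same interpreted-vs-compiled gap that NumPy and Torch exploit at a much larger scale (this is an illustrative stand-in, not one of those libraries):

```python
import timeit

data = list(range(1_000_000))

def python_sum(values):
    """Summation executed step by step by the Python interpreter."""
    total = 0
    for v in values:
        total += v
    return total

# The builtin sum() runs the same loop inside the C runtime - the same
# trick the C/C++-backed maths libraries use, just on a tiny scale.
t_interpreted = timeit.timeit(lambda: python_sum(data), number=5)
t_compiled = timeit.timeit(lambda: sum(data), number=5)
```

Both calls produce identical results, but the C-implemented version is typically several times faster, which is the whole point of the wrapper pattern.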

You ask whether data scientists should learn other languages? In short, yes, I think they should! Many data scientists come from a mathematical background and learn programming second. I came from software engineering and learned the maths second, and I actually think having a solid background in the software side of things made me a better and more capable data scientist - with a low-level understanding of the silicon underneath, you can write more efficient code, etc.

Of course that's just my opinion and I suppose the trade off is that it takes me a little longer to do the formal stats/maths side of things.

Thank you, I think I'll have to prioritize Python on my to-do list.