Fisch (@Fisch@discuss.tchncs.de)

That would actually be insane. Right now, I still need my GPU and about 8-10 gigs of VRAM to run a 7B model tho, so idk how that's supposed to work on a phone. Still, being able to run a model that's as good as a 70B model but with the speed and memory usage of a 7B model would be huge.

JackGreenEarth (@JackGreenEarth@lemm.ee)

I only need ~4 GB of RAM/VRAM for a 7B model; my GPU only has 6 GB of VRAM anyway. Either 7B models are smaller than you think, or you have a very inefficient setup.
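For reference, a low-VRAM setup like that is usually just a 4-bit quantised GGUF with the layers offloaded to the GPU, something along these lines (a sketch with llama-cpp-python; the path and settings are placeholders, not my exact config):

```python
# Load a 4-bit quantized 7B GGUF and offload all layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # placeholder path; ~4 GB file at 4-bit
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=2048,       # a modest context keeps the KV cache small
)

out = llm("Q: How much VRAM does a 4-bit 7B model need?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```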

Fisch (@Fisch@discuss.tchncs.de)

That's weird, maybe I actually am doing something wrong. Could it be because I'm using GGUF models?

Mike1576218

A Llama 2 GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8 bits need >9 GB. Anything in between is possible. There are even 1.5-bit and even 1-bit options (not GGUF AFAIK). Generally, fewer bits means worse results though.
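As a rough back-of-the-envelope check (a sketch only; the 7B parameter count and the ~1.5 GB runtime/KV-cache overhead are assumptions, not llama.cpp's exact numbers):

```python
# Rough VRAM estimate: quantized weights plus a fixed runtime/KV-cache overhead.
# The overhead figure is an assumption for illustration, not an exact value.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * (bits / 8) bytes
    return weight_gb + overhead_gb

for bits in (2, 4, 6, 8):
    print(f"7B @ {bits}-bit: ~{estimate_vram_gb(7, bits):.1f} GB")
# actual usage ends up higher with a long context or a large batch size
```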

Fisch (@Fisch@discuss.tchncs.de)

Yeah, I usually take the 6-bit quants, didn't know the difference was that big. That's probably why tho. Unfortunately, almost all Llama 3 models are either 8B or 70B, so there isn't really anything in between. I find Llama 3 models to be noticeably better than Llama 2 models, otherwise I would have tried bigger models with lower quants.

Chrobin

I have never worked on machine learning; what does the B stand for? Billion? Bytes?

Fisch (@Fisch@discuss.tchncs.de)

I think it's how many billion parameters the model has.
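For example, you can count them directly (a sketch with Hugging Face transformers; the checkpoint name is just an illustration and that particular one is gated, so use whatever model you have access to):

```python
# Count a model's parameters to see where the "7B" label comes from.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.2f}B parameters")  # roughly 6.74B, marketed as "7B"
```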

Chrobin

Thanks!

Barx

Finally. Wrong answers to questions using my phone.

kakes

It never really occurred to me before how huge a 10x savings in parameters would be on consumer hardware.

Like, obviously 10x is a lot, but with the way things are going, it wouldn't surprise me to see that kind of leap in the next year or two tbh.
