Mike1576218,

llama2 GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8-bit needs >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not GGUF, AFAIK). Generally, fewer bits means worse results though.
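
If you want to try it, here's a minimal sketch using llama-cpp-python (the model filename is just an example; the quant level is baked into whichever GGUF file you download):

```python
# a minimal sketch, assuming llama-cpp-python is installed with GPU support
# and you've downloaded a quantised GGUF file (filename below is hypothetical)
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.Q2_K.gguf",  # Q2_K = ~2-bit quant; swap in a Q8_0 file for 8-bit
    n_gpu_layers=-1,                    # offload all layers to the GPU
    n_ctx=2048,                         # context window; this also uses VRAM
)

out = llm("Q: Name one planet in our solar system. A:", max_tokens=16)
print(out["choices"][0]["text"])
```

The quantisation itself isn't a runtime flag; you pick it by downloading the GGUF file that was quantised at the bit width you want.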
