
All Questions

-1 votes · 1 answer · 2k views

I load a float32 Hugging Face model, cast it to float16, and save it. How can I load it as float16?

I load a huggingface-transformers float32 model, cast it to float16, and save it. How can I load it as float16? Example: # pip install transformers from transformers import ...
Franck Dernoncourt
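
A minimal sketch of one way to do this, assuming PyTorch and a recent transformers release; the model name and save directory are just illustrative placeholders, not from the question:

```python
# pip install transformers torch
import torch
from transformers import AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # example model, assumed for illustration

# Load in float32 (the default), cast to float16, and save.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model = model.half()  # cast all weights to float16
model.save_pretrained("./model-fp16")

# Reload; torch_dtype=torch.float16 keeps the saved half-precision
# weights instead of upcasting them back to the default float32.
model_fp16 = AutoModelForSequenceClassification.from_pretrained(
    "./model-fp16", torch_dtype=torch.float16
)
print(next(model_fp16.parameters()).dtype)  # torch.float16
```

Passing `torch_dtype="auto"` instead would let `from_pretrained` infer the dtype from the checkpoint itself.
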
1 vote · 1 answer · 948 views

Can language model inference on a CPU save memory by quantizing?

For example, according to https://cocktailpeanut.github.io/dalai/#/ the relevant figures for LLaMA-65B are: Full: The model takes up 432.64GB Quantized: 5.11GB * 8 = 40.88GB The full model won't fit ...
rwallace · 33.7k
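
As a rough back-of-the-envelope check of figures like these, a sketch under the assumption that memory is roughly parameter count times bytes per parameter, ignoring activations, KV cache, and framework overhead:

```python
# Rough memory estimate for a 65B-parameter model at various precisions.
# Assumes memory ≈ parameter_count * bytes_per_parameter; real usage
# varies with runtime overhead, context length, and checkpoint format.
params = 65e9

for label, bytes_per_param in [("float32", 4.0), ("float16", 2.0),
                               ("int8", 1.0), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{label:>8}: ~{gb:,.0f} GB")
# float32: ~260 GB, float16: ~130 GB, int8: ~65 GB, int4: ~33 GB
```

The 4-bit estimate (~33 GB) is in the same ballpark as the quantized figure quoted above (5.11 GB × 8 = 40.88 GB); the quoted "full" figure of 432.64 GB exceeds the raw float32 estimate, presumably because it counts additional on-disk or runtime overhead.
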