How to resolve Import Error when using quantization in bitsandbytes

Question

I am trying to build a Gradio chatbot in Hugging Face Spaces using the Mistral-7B-v0.1 model. Because the model is large, I have to quantize it; otherwise the free 50 GB of storage fills up. I am using bitsandbytes to do so, but I get an ImportError.

This is the HF Space URL: https://huggingface.co/spaces/AnishHF/Mistral-7B

Traceback:

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Traceback (most recent call last):
  File "/home/user/app/app.py", line 15, in <module>
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=quantization_config, device_map="auto", token=access_token)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3165, in from_pretrained
    hf_*********.validate_environment(
  File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise ImportError(
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

Note: I am using the free CPU tier with 16 GB RAM, so torch isn't compiled with GPU support.

I have added both accelerate and bitsandbytes to requirements.txt (huggingface.co/spaces/AnishHF/Mistral-7B/blob/main/requirements.txt).

I have also tried changing bitsandbytes to bitsandbytes==0.43.1 (which I think is the latest version), but it didn't solve the problem.

Below is the full code (app.py)

import os
import bitsandbytes as bnb
import torch
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

access_token = os.environ["GATED_ACCESS_TOKEN"]

quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=quantization_config, device_map="auto", token=access_token)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Function to generate text using the model
def generate_text(prompt):
    text = prompt
    inputs = tokenizer(text, return_tensors="pt")
    
    outputs = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Create the Gradio interface
iface = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.inputs.Textbox(lines=5, label="Input Prompt"),
    ],
    outputs=gr.outputs.Textbox(label="Generated Text"),
    title="MisTRALText Generation",
    description="Use this interface to generate text using the MisTRAL language model.",
)

# Launch the Gradio interface
iface.launch()

Edit: I tried running the same code locally on a Raspberry Pi, which resulted in the same error. So I don't think it is a problem with Hugging Face Spaces, but a problem with the library or my code.

Any solution would help, including another method to perform FP4 quantization without using bitsandbytes.

Pratik Zagade · Accepted Answer · 2024-08-27 12:01:05Z

Try restarting or reconnecting your GPU, then reinstall the libraries with:

pip install -q -U bitsandbytes
pip install -q -U git+https://github.com/huggingface/transformers.git
pip install -q -U git+https://github.com/huggingface/peft.git
pip install -q -U git+https://github.com/huggingface/accelerate.git
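
After reinstalling, it can help to confirm which versions actually ended up in the environment. The snippet below is a minimal sketch that uses only the standard library; `installed_versions` is a hypothetical helper name, and the package list is an assumption based on the commands above plus torch.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list: the four libraries installed above, plus torch.
    wanted = ["bitsandbytes", "transformers", "peft", "accelerate", "torch"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the Space (or locally) shows whether the pinned `bitsandbytes==0.43.1` from the question was really picked up.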

For more information, see https://huggingface.co/blog/4bit-transformers-bitsandbytes
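
One caveat worth checking first: the traceback opens with "The installed version of bitsandbytes was compiled without GPU support", and the question states the Space runs on CPU only, so reinstalling may not be enough on that hardware. The sketch below shows one way to guard the quantization settings behind a CUDA check. `cuda_available` and `pick_quantization_settings` are hypothetical helper names, and returning an empty dict to mean "load without a quantization_config" is an assumption, not something the libraries prescribe.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if torch is importable and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs without torch

    return torch.cuda.is_available()


def pick_quantization_settings(has_cuda: bool) -> dict:
    """Hypothetical helper: choose keyword arguments for BitsAndBytesConfig.

    With a CUDA device, reuse the 4-bit NF4 settings from the question.
    Without one, return an empty dict, meaning "do not pass a
    quantization_config at all" (an assumption of this sketch).
    """
    if has_cuda:
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": "float16",
        }
    return {}


if __name__ == "__main__":
    settings = pick_quantization_settings(cuda_available())
    if settings:
        print("CUDA found; BitsAndBytesConfig(**settings) can be used:", settings)
    else:
        print("No CUDA device; skip bitsandbytes quantization on this machine.")
```

If the check reports no CUDA device, the same ImportError is likely to come back regardless of the installed versions, so switching the Space to GPU hardware or using a different quantization route would be the next thing to try.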
