
I am trying to build a Gradio chatbot in Hugging Face Spaces using the Mistral-7B-v0.1 model. Since this is a large model, I have to quantize it, otherwise the free 50 GB of storage fills up. I am using bitsandbytes to do so, but I get an ImportError.

This is the HF Space url - https://huggingface.co/spaces/AnishHF/Mistral-7B

Traceback:

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Traceback (most recent call last):
  File "/home/user/app/app.py", line 15, in <module>
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=quantization_config, device_map="auto", token=access_token)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3165, in from_pretrained
    hf_*********.validate_environment(
  File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise ImportError(
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

Note - I am using the free CPU tier with 16 GB RAM, so torch isn't compiled with GPU support.
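You can confirm the CPU-only setup from a Python shell (just a sanity check, not part of the app):

```python
import torch

# On the free Spaces CPU hardware this prints False: there is no CUDA
# device, which is why bitsandbytes reports it was compiled without
# GPU support.
print(torch.cuda.is_available())
```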

I have added both accelerate and bitsandbytes to requirements.txt (huggingface.co/spaces/AnishHF/Mistral-7B/blob/main/requirements.txt).

I have also tried pinning bitsandbytes to bitsandbytes==0.43.1 (which I believe is the latest version), but it didn't solve the problem.

Below is the full code (app.py):

import os
import bitsandbytes as bnb
import torch
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

access_token = os.environ["GATED_ACCESS_TOKEN"]

quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=quantization_config, device_map="auto", token=access_token)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Function to generate text using the model
def generate_text(prompt):
    # Tokenize the prompt and generate a short continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Create the Gradio interface
iface = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(lines=5, label="Input Prompt"),  # gr.inputs.* is deprecated in current Gradio
    ],
    outputs=gr.Textbox(label="Generated Text"),
    title="Mistral Text Generation",
    description="Use this interface to generate text using the Mistral language model.",
)

# Launch the Gradio interface
iface.launch()

Edit: I tried running the same code locally on a Raspberry Pi and got the same error. So I don't think it is a problem with Hugging Face Spaces, but rather with the library or my code.

Any solution, including another method to perform FP4 quantization without using bitsandbytes would help.

1 Answer


You can try restarting or reconnecting your GPU and then reinstalling the libraries:

pip install -q -U bitsandbytes
pip install -q -U git+https://github.com/huggingface/transformers.git
pip install -q -U git+https://github.com/huggingface/peft.git
pip install -q -U git+https://github.com/huggingface/accelerate.git

For more information you can checkout at: https://huggingface.co/blog/4bit-transformers-bitsandbytes
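Note that if the Space really has no GPU, bitsandbytes 4-bit loading will keep failing regardless of reinstalls. As a bitsandbytes-free fallback on CPU, PyTorch's built-in dynamic quantization can shrink the linear layers to int8 (a different technique from FP4/NF4, so quality and speed will differ). A minimal sketch on a toy model:

```python
import torch
import torch.nn as nn

# Toy model standing in for the transformer's linear layers.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Convert nn.Linear weights to int8; activations are quantized on the
# fly at inference time. Runs entirely on CPU, no bitsandbytes needed.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 16))
print(out.shape)  # torch.Size([1, 4])
```

In practice you would pass the loaded AutoModelForCausalLM instead of the toy model; treat this as a sketch rather than a drop-in replacement for NF4 quantization.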
