All Questions
409 questions
0 votes, 0 answers, 48 views
Transformers PaliGemma evaluate and compute_loss fail with tensors/device errors
I'm loading a PaliGemma2 model google/paligemma2-3b-pt-224 and trying to fine-tune using Trainer/Seq2SeqTrainer. If I add evaluation, this fails. After doing some digging, I found that this only ...
1 vote, 1 answer, 37 views
--user-dir in Fairseq is failing
I’m trying to fine-tune the IndicTrans2 model using fairseq-train, but I keep encountering the following error:
fairseq-train: error: argument --user-dir: invalid Optional value: 'C:/Users/sasid/...
1 vote, 2 answers, 706 views
Llama-3.2-1B-Instruct generates inconsistent output
I want to use the Llama-3.2-1B-Instruct model, and although I have set "temperature": 0.0, "top_p": 0.0 and "top_k": 0, it still generates inconsistent output. This is how my ...
0 votes, 0 answers, 37 views
How to run inference with a large model on multiple GPUs efficiently?
I'm trying to run inference only with a large 70B model in a multi-GPU environment, but I'm facing some issues.
Loading the model takes a long time, about 15 minutes.
I'm not sure this works properly to shard model ...
0 votes, 0 answers, 78 views
Emotion Analysis with bhadresh-savani/bert-base-uncased-emotion
Hope I can get some help here please!
I am trying to run an emotion analysis model from the Hugging Face repository (bhadresh-savani/bert-base-uncased-emotion) and I am struggling with the model run as it's ...
3 votes, 0 answers, 153 views
How to Log Custom Metrics with Metadata in Hugging Face Trainer during Evaluation?
I'm working on a sentence regression task using Hugging Face’s Trainer. Each sample consists of:
input_ids: The tokenized sentence.
labels: A numerical scalar target (for regression).
metadata: A ...
3 votes, 2 answers, 1k views
How to Load a 4-bit Quantized VLM Model from Hugging Face with Transformers?
I’m new to quantization and working with visual language models (VLMs). I’m trying to load a 4-bit quantized version of the Ovis1.6-Gemma model from Hugging Face using the transformers library. I ...
0 votes, 0 answers, 206 views
facebook/m2m100_418M model: how to translate longer sequences of text
I have extracted the following text from Wikipedia's Wiki article (https://en.wikipedia.org/wiki/Wiki):
A wiki is a form of hypertext publication on the internet which is collaboratively edited and ...
0 votes, 0 answers, 103 views
What does SentenceTransformers provide to simplify sentence embedding?
SentenceTransformers seems to be the preferred open-source option for sentence embedding. But I can't figure out what exactly sbert provides to simplify sentence embedding compared to using ...
0 votes, 1 answer, 74 views
AutoModelForSequenceClassification loss does not decrease
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from tqdm import tqdm
def ...
0 votes, 1 answer, 57 views
Seq2Seq trainer.train() keeps giving an indexing error
I am trying to do machine translation from Hindi to Sanskrit using the NLLB model. But I keep getting the error:
IndexError: Invalid key: 39463 is out of bounds for size 0.
The error occurs when ...
0 votes, 1 answer, 112 views
Jupyter Lab kernel dies before starting trainer.train()
I'm fine-tuning phi-3.5-mini, and when I try to run trainer.train() I get the following error:
***** Running training *****
Num examples = 647
Num Epochs = 3
Instantaneous ...
1 vote, 1 answer, 531 views
Fine-tuning a Pretrained Model with Quantization and AMP: Scaler Error "Attempting to Unscale FP16 Gradients"
I am trying to fine-tune a pretrained model with limited VRAM. To achieve this, I am using quantization and automatic mixed precision (AMP). However, I am encountering an issue that I can't seem to ...
1 vote, 1 answer, 6k views
cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub'
I've been using Llama 2 for research for a few months now and I import it as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = torch.device("cuda")
tokenizer = ...
1 vote, 1 answer, 414 views
How to Visualize Cross-Attention Matrices in MarianMTModel During Output Generation
I am working on a machine translation task using the MarianMTModel from the Hugging Face transformers library. Specifically, I want to visualize the cross-attention matrices during the model's ...