All Questions

0 votes
0 answers
48 views

Transformers PaliGemma evaluate and compute_loss fail with tensors/device errors

I'm loading a PaliGemma2 model google/paligemma2-3b-pt-224 and trying to fine-tune using Trainer/Seq2SeqTrainer. If I add evaluation, this fails. After doing some digging, I found that this only ...
BlGene
1 vote
1 answer
37 views

--user-dir in Fairseq is failing

I’m trying to fine-tune the IndicTrans2 model using fairseq-train, but I keep encountering the following error: fairseq-train: error: argument --user-dir: invalid Optional value: 'C:/Users/sasid/...
Sasi Dhar
1 vote
2 answers
706 views

Llama-3.2-1B-Instruct generates inconsistent output

I want to use the Llama-3.2-1B-Instruct model, and although I have set "temperature": 0.0, "top_p": 0.0 and "top_k": 0, it still generates inconsistent output. This is how my ...
parvaneh shayegh
0 votes
0 answers
37 views

How to run inference on a large model across multiple GPUs efficiently?

I'm trying to run inference only with a large 70B model in a multi-GPU environment, but I'm facing some issues. Loading takes a long time, about 15 minutes. I'm not sure this works properly to shard model ...
James Jang
0 votes
0 answers
78 views

Emotion Analysis with bhadresh-savani/bert-base-uncased-emotion

Hope I can get some help here, please! I am trying to run an emotion analysis model from the Hugging Face repository (bhadresh-savani/bert-base-uncased-emotion) and I am struggling with the model run as it's ...
Rita Bini
3 votes
0 answers
153 views

How to Log Custom Metrics with Metadata in Hugging Face Trainer during Evaluation?

I'm working on a sentence regression task using Hugging Face’s Trainer. Each sample consists of: input_ids: The tokenized sentence. labels: A numerical scalar target (for regression). metadata: A ...
enter_thevoid
3 votes
2 answers
1k views

How to Load a 4-bit Quantized VLM Model from Hugging Face with Transformers?

I’m new to quantization and working with visual language models (VLMs). I’m trying to load a 4-bit quantized version of the Ovis1.6-Gemma model from Hugging Face using the transformers library. I ...
meysam
0 votes
0 answers
206 views

facebook/m2m100_418M model - how to translate longer sequences of text

I have extracted the following text from Wikipedia's Wiki (https://en.wikipedia.org/wiki/Wiki), A wiki is a form of hypertext publication on the internet which is collaboratively edited and ...
Naveen Reddy Marthala
0 votes
0 answers
103 views

What does SentenceTransformers provide to simplify sentence embedding?

SentenceTransformers seems to be the preferred open source option for sentence embedding. But I can't figure out what exactly sbert provides to simplify sentence embedding compared to using ...
Qiulang
0 votes
1 answer
74 views

AutoModelForSequenceClassification loss does not decrease

from datasets import load_dataset from torch.utils.data import DataLoader from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch from tqdm import tqdm def ...
naivebird
0 votes
1 answer
57 views

Seq2Seq trainer.train() keeps giving indexing error

I am trying to do machine translation from Hindi to Sanskrit using the NLLB model. But I keep getting the error: IndexError: Invalid key: 39463 is out of bounds for size 0. The error occurs when ...
user27310271
0 votes
1 answer
112 views

Jupyter Lab kernel dies before starting the trainer.train()

I am fine-tuning phi-3.5-mini, and when I try to run trainer.train() I get the following error: ***** Running training ***** Num examples = 647 Num Epochs = 3 Instantaneous ...
Ibrahim Abed-alghafer
1 vote
1 answer
531 views

Fine-tuning a Pretrained Model with Quantization and AMP: Scaler Error "Attempting to Unscale FP16 Gradients"

I am trying to fine-tune a pretrained model with limited VRAM. To achieve this, I am using quantization and automatic mixed precision (AMP). However, I am encountering an issue that I can't seem to ...
landings
1 vote
1 answer
6k views

cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub'

I've been using LLAMA 2 for research for a few months now and I import as follows: from transformers import AutoModelForCausalLM, AutoTokenizer device = torch.device("cuda") tokenizer = ...
lucasa.lisboa
1 vote
1 answer
414 views

How to Visualize Cross-Attention Matrices in MarianMTModel During Output Generation

I am working on a machine translation task using the MarianMTModel from the Hugging Face transformers library. Specifically, I want to visualize the cross-attention matrices during the model's ...
Lukas
