Newest 'nlp+python+huggingface-transformers' Questions

0 votes, 0 answers, 48 views

Transformers PaliGemma evaluate and compute_loss fail with tensors/device errors

I'm loading a PaliGemma2 model google/paligemma2-3b-pt-224 and trying to fine-tune using Trainer/Seq2SeqTrainer. If I add evaluation, this fails. After doing some digging, I found that this only ...

asked Jan 31 at 10:44 by BlGene (reputation 17)

1 vote, 1 answer, 37 views

--user-dir in Fairseq is failing

I’m trying to fine-tune the IndicTrans2 model using fairseq-train, but I keep encountering the following error: fairseq-train: error: argument --user-dir: invalid Optional value: 'C:/Users/sasid/...

asked Dec 16, 2024 at 7:03 by Sasi Dhar (reputation 33)

1 vote, 2 answers, 706 views

Llama-3.2-1B-Instruct generate inconsistent output

I want to use Llama-3.2-1B-Instruct model, and although I have set "temperature": 0.0, "top_p":0.0 and "top_k":0, it still generates inconsistent output. This is how my ...

asked Nov 28, 2024 at 13:02 by parvaneh shayegh (reputation 538)

0 votes, 0 answers, 37 views

How to run inference with a large model on multiple GPUs efficiently?

I'm trying to run inference only with a large 70B model in a multi-GPU environment, but I'm facing some issues. Loading takes very long, about 15 minutes. I'm not sure this works properly to shard model ...

asked Nov 25, 2024 at 1:30 by James Jang (reputation 21)

0 votes, 0 answers, 78 views

Emotion Analysis with bhadresh-savani/bert-base-uncased-emotion

Hope I can get some help here, please! I am trying to run an emotion analysis model from a Hugging Face repo (bhadresh-savani/bert-base-uncased-emotion) and I am struggling with the model run as it's ...

asked Nov 12, 2024 at 14:03 by Rita Bini (reputation 1)

3 votes, 0 answers, 153 views

How to Log Custom Metrics with Metadata in Hugging Face Trainer during Evaluation?

I'm working on a sentence regression task using Hugging Face’s Trainer. Each sample consists of: input_ids: The tokenized sentence. labels: A numerical scalar target (for regression). metadata: A ...

asked Oct 29, 2024 at 9:50 by enter_thevoid (reputation 163)

3 votes, 2 answers, 1k views

How to Load a 4-bit Quantized VLM Model from Hugging Face with Transformers?

I’m new to quantization and working with visual language models (VLM). I’m trying to load a 4-bit quantized version of the Ovis1.6-Gemma model from Hugging Face using the transformers library. I ...

asked Oct 27, 2024 at 9:31 by meysam (reputation 83)

0 votes, 0 answers, 206 views

facebook/m2m100_418M model - how to translate longer sequences of text

I have extracted the following text from Wikipedia's article on wikis (https://en.wikipedia.org/wiki/Wiki): A wiki is a form of hypertext publication on the internet which is collaboratively edited and ...

asked Oct 10, 2024 at 9:45 by Naveen Reddy Marthala (reputation 3,153)

0 votes, 0 answers, 103 views

What does SentenceTransformers provide to simplify sentence embedding?

I find that SentenceTransformers seems to be the preferred open source option for sentence embedding. But I can't figure out what exactly SBERT provides to simplify sentence embedding compared to using ...

asked Sep 25, 2024 at 10:24 by Qiulang (reputation 12.6k)

0 votes, 1 answer, 74 views

AutoModelForSequenceClassification loss does not decrease

from datasets import load_dataset from torch.utils.data import DataLoader from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch from tqdm import tqdm def ...

asked Sep 21, 2024 at 16:24 by naivebird (reputation 33)

0 votes, 1 answer, 57 views

Seq2Seq trainer.train() keeps giving an indexing error

I am trying to do machine translation from Hindi to Sanskrit using the NLLB model, but I keep getting the error: IndexError: Invalid key: 39463 is out of bounds for size 0. The error occurs when ...

asked Sep 20, 2024 at 8:43 by user27310271 (reputation 3)

0 votes, 1 answer, 112 views

Jupyter Lab kernel dies before starting the trainer.train()

Working on fine-tuning phi-3.5-mini, and when trying to run trainer.train() I am getting the following error: ***** Running training ***** Num examples = 647 Num Epochs = 3 Instantaneous ...

asked Sep 16, 2024 at 17:38 by Ibrahim Abed-alghafer (reputation 13)

1 vote, 1 answer, 531 views

Fine-tuning a Pretrained Model with Quantization and AMP: Scaler Error "Attempting to Unscale FP16 Gradients"

I am trying to fine-tune a pretrained model with limited VRAM. To achieve this, I am using quantization and automatic mixed precision (AMP). However, I am encountering an issue that I can't seem to ...

asked Sep 3, 2024 at 8:38 by landings (reputation 770)

1 vote, 1 answer, 6k views

cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub'

I've been using LLAMA 2 for research for a few months now and I import as follows: from transformers import AutoModelForCausalLM, AutoTokenizer device = torch.device("cuda") tokenizer = ...

asked Aug 27, 2024 at 17:20 by lucasa.lisboa (reputation 35)

1 vote, 1 answer, 414 views

How to Visualize Cross-Attention Matrices in MarianMTModel During Output Generation

I am working on a machine translation task using the MarianMTModel from the Hugging Face transformers library. Specifically, I want to visualize the cross-attention matrices during the model's ...

asked Aug 25, 2024 at 20:13 by Lukas (reputation 47)
