
All Questions

0 votes · 0 answers · 27 views

TrainingArguments: Do "packing" and "group_by_length" counteract each other?

In Hugging Face's TrainingArguments and SFTConfig (which inherits from TrainingArguments), there are two arguments for initializing SFTConfig(): group_by_length: Whether or not to group together ...
JoyfulPanda · 1,057
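A minimal sketch of how the two options meet in trl's SFTConfig (the exact flags are version-dependent, so treat the snippet as an assumption about a recent trl release):

```python
from trl import SFTConfig

# packing=True concatenates training examples into fixed-length blocks,
# so every block already has identical length; group_by_length=True sorts
# variable-length examples into similar-length batches to reduce padding.
# With packing enabled there is no padding left to save, so enabling both
# is at best redundant.
config = SFTConfig(
    output_dir="out",
    packing=True,           # defined on SFTConfig
    group_by_length=False,  # inherited from TrainingArguments
)
```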
0 votes · 1 answer · 59 views

Getting CUDA out of memory when importing microsoft/Orca-2-13b from Hugging Face

I am using Ubuntu 24.04.1 on an AWS EC2 instance g5.8xlarge. I am receiving the following error message: OutOfMemoryError: Allocation on device Code: import os os.environ["...
Wolfy · 470
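A g5.8xlarge has a single 24 GB A10G GPU, while a 13B model in fp16 needs roughly 26 GB for the weights alone, so an unquantized load cannot fit. A minimal sketch of one common workaround, 4-bit loading via bitsandbytes (assumes the bitsandbytes package is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Orca-2-13b"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit weights cut the footprint to roughly 7-8 GB, which fits on an A10G
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)
```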
-1 votes · 1 answer · 52 views

Deconstructing the Stable Diffusion 3.5 pipeline

I am trying to deconstruct the SD3.5 (specifically 3.5 medium) pipeline in order to have a controlled process over the denoising steps. I can't do callbacks because I need to modify the latent ...
Curious Scientist
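A rough sketch of a manual denoising loop for SD3.5, which exposes the latents between steps (argument names mirror the diffusers source for StableDiffusion3Pipeline at time of writing; treat them as assumptions, not a verified implementation):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to("cuda")

# encode the prompt once, keeping the CFG (unconditional) embeddings
prompt_embeds, neg_embeds, pooled, neg_pooled = pipe.encode_prompt(
    prompt="a photo of a cat", prompt_2=None, prompt_3=None,
    do_classifier_free_guidance=True,
)
prompt_embeds = torch.cat([neg_embeds, prompt_embeds])
pooled = torch.cat([neg_pooled, pooled])

num_steps, guidance_scale = 28, 7.0
pipe.scheduler.set_timesteps(num_steps, device="cuda")
# 16 latent channels, 128x128 latents for a 1024x1024 image
latents = torch.randn(1, 16, 128, 128, device="cuda", dtype=torch.bfloat16)

for t in pipe.scheduler.timesteps:
    latent_in = torch.cat([latents] * 2)  # CFG: uncond + cond
    noise_pred = pipe.transformer(
        hidden_states=latent_in,
        timestep=t.expand(latent_in.shape[0]),
        encoder_hidden_states=prompt_embeds,
        pooled_projections=pooled,
        return_dict=False,
    )[0]
    uncond, cond = noise_pred.chunk(2)
    noise_pred = uncond + guidance_scale * (cond - uncond)
    # <- the latents can be modified here, which callbacks don't allow
    latents = pipe.scheduler.step(noise_pred, t, latents, return_dict=False)[0]

latents = latents / pipe.vae.config.scaling_factor + pipe.vae.config.shift_factor
image = pipe.vae.decode(latents, return_dict=False)[0]
```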
1 vote · 1 answer · 96 views

Why does my Llama 3.1 model act differently between AutoModelForCausalLM and LlamaForCausalLM?

I have one set of weights, one tokenizer, the same prompt, and identical generation parameters. Yet somehow, when I load the model using AutoModelForCausalLM, I get one output, and when I construct it ...
han mo · 23
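For a Llama checkpoint, AutoModelForCausalLM should dispatch to LlamaForCausalLM, so a first diagnostic is to confirm the two loaders really produce the same class and generation config; the checkpoint path below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, LlamaForCausalLM

path = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint
m_auto = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)
m_llama = LlamaForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)

# same class means same forward pass; differences then come from config
print(type(m_auto), type(m_llama))
print(m_auto.generation_config)
print(m_llama.generation_config)
# also compare with do_sample=False: sampling makes outputs differ even
# when the models are bit-identical.
```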
6 votes · 2 answers · 2k views

Why does Hugging Face-provided DeepSeek code result in an 'Unknown quantization type' error?

I am using code pasted directly from the Hugging Face website's page on DeepSeek, which is supposed to be plug-and-play: from transformers import pipeline ...
Akshit Gulyan
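'Unknown quantization type' is raised when the quant_method declared in the checkpoint's config.json (fp8, for the recent DeepSeek releases) is newer than what the installed transformers recognizes; the usual fix is an upgrade. A sketch, with the model id as a placeholder since the excerpt truncates it:

```python
import transformers
print(transformers.__version__)  # fp8 checkpoints need a recent release

# after: pip install -U transformers accelerate
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1",  # placeholder model id
    trust_remote_code=True,
)
```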
0 votes · 1 answer · 74 views

Image segmentation ONNX model from Hugging Face produces very different results when used in ML.NET

I have been trying to get an image segmentation model from Hugging Face (RMBG-2.0) to work for inference using ML.NET. After a lot of trial and error, I finally got the code to compile and produce an ...
alepee · 1
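A useful isolation step is to run the exported ONNX file in Python's onnxruntime first: if that matches the Hugging Face output, the ML.NET discrepancy is almost certainly preprocessing (resize, channel order, normalization) rather than the model. A sketch; the file path and normalization constants are assumptions to check against the model card:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("rmbg-2.0.onnx")   # path assumed
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)                     # confirm layout (e.g. NCHW)

img = Image.open("test.jpg").convert("RGB").resize((1024, 1024))
x = np.asarray(img, dtype=np.float32) / 255.0
x = (x - 0.5) / 0.5                            # normalization assumed
x = x.transpose(2, 0, 1)[None]                 # HWC -> NCHW
mask = sess.run(None, {inp.name: x})[0]
print(mask.shape, mask.min(), mask.max())      # reference to compare ML.NET against
```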
0 votes · 1 answer · 393 views

How to Log Training Loss at Step Zero in Hugging Face Trainer or SFT Trainer?

I'm using the Hugging Face Trainer (or SFTTrainer) for fine-tuning, and I want to log the training loss at step 0 (before any training steps are executed). I know there's an eval_on_start option for ...
Charlie Parker
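One way to get a "step 0" number without touching Trainer internals is to evaluate a single training batch manually before calling trainer.train(); a sketch assuming a trainer and model are already built, and that the batch contains only tensors:

```python
import torch

batch = next(iter(trainer.get_train_dataloader()))   # public Trainer helper
batch = {k: v.to(model.device) for k, v in batch.items()}

model.eval()
with torch.no_grad():
    loss0 = model(**batch).loss   # requires "labels" in the batch
print(f"training loss at step 0: {loss0.item():.4f}")

model.train()
trainer.train()
```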
1 vote · 1 answer · 159 views

How to Compute Teacher-Forced Accuracy (TFA) for Hugging Face Models While Handling EOS Tokens?

I am trying to compute Teacher-Forced Accuracy (TFA) for Hugging Face models, ensuring the following: EOS Token Handling: The model should be rewarded for predicting the first EOS token. Ignoring ...
Charlie Parker
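A sketch of one possible TFA definition matching the stated constraints: positions strictly after the first EOS are ignored, while the first EOS itself is scored. This is one reasonable reading, not necessarily the asker's exact metric:

```python
import torch

def teacher_forced_accuracy(model, input_ids, attention_mask, eos_id):
    with torch.no_grad():
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    preds = logits[:, :-1].argmax(dim=-1)    # prediction for token t+1
    targets = input_ids[:, 1:]
    valid = attention_mask[:, 1:].bool()

    # keep positions up to and including the first EOS; cumsum - e is 0
    # before the first EOS and at the first EOS itself, positive after
    e = (targets == eos_id).long()
    keep = (e.cumsum(dim=1) - e) == 0
    mask = valid & keep

    correct = (preds == targets) & mask
    return correct.sum().float() / mask.sum().float()
```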
0 votes · 0 answers · 173 views

Jupyter Notebook crashes when I run Hugging Face models

I am using Jupyter Notebook to run some ML models from Hugging Face, on a Mac (M2 chip, 32 GB memory). This is my code: import torch from transformers import AutoTokenizer, ...
taga · 3,913
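A silently dying kernel on a Mac is usually the process being killed for exhausting memory; loading in half precision (and onto the MPS backend where available) roughly halves the footprint. A sketch with a placeholder model id, since the excerpt truncates the actual one:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" if torch.backends.mps.is_available() else "cpu"
name = "some/model"  # placeholder

tok = AutoTokenizer.from_pretrained(name)
# fp16 halves the default fp32 memory footprint of the weights
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16
).to(device)
```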
1 vote · 1 answer · 235 views

How to add EOS when training T5?

I'm a little puzzled about where (and whether) EOS tokens are added when using Hugging Face's trainer classes to train a T5 (LongT5, actually) model. The data set contains pairs of text like this: from to ...
gphilip · 706
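This is easy to verify directly: T5-family tokenizers append the </s> EOS token themselves when add_special_tokens is left at its default, so no manual insertion is needed before training. A quick check (checkpoint name assumed for a LongT5 setup):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
ids = tok("translate this").input_ids
print(tok.convert_ids_to_tokens(ids))   # last token should be '</s>'
print(ids[-1] == tok.eos_token_id)      # True
```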
0 votes · 1 answer · 77 views

An error occurs during the execution of UNet when the batch size is not equal to 1

I'm trying to run a Stable Diffusion model using the code provided in the DDIM Inversion tutorial. However, when the input's batch size is set to a value greater than 1 (e.g., 32), I encounter the ...
young · 11
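A common cause in DDIM-inversion code is text conditioning that was prepared for batch size 1 and never expanded; a hypothetical fix, with variable names assumed from the tutorial's loop:

```python
# make the text conditioning match the latent batch before the UNet call
bsz = latents.shape[0]
if prompt_embeds.shape[0] != bsz:
    prompt_embeds = prompt_embeds.repeat(bsz, 1, 1)   # (1, L, D) -> (bsz, L, D)

noise_pred = unet(latents, t, encoder_hidden_states=prompt_embeds).sample
```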
0 votes · 0 answers · 70 views

ValueError: If no `decoder_input_ids` or `decoder_inputs_embeds` are passed, `input_ids` cannot be `None`

I am trying to get the decoder hidden state of the florence 2 model. I was following this https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py to understand the parameters ...
user10418143
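That ValueError comes from the encoder-decoder forward having nothing to feed the decoder; seeding decoder_input_ids with the decoder start token lets the pass run and exposes the hidden states. A hypothetical sketch: the config attribute path and argument names are assumptions against modeling_florence2.py, not verified:

```python
import torch

start_id = model.config.text_config.decoder_start_token_id  # location assumed
decoder_input_ids = torch.tensor([[start_id]], device=model.device)

outputs = model(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    decoder_input_ids=decoder_input_ids,
    output_hidden_states=True,
    return_dict=True,
)
decoder_hidden = outputs.decoder_hidden_states  # tuple, one tensor per layer
```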
0 votes · 1 answer · 74 views

AutoModelForSequenceClassification loss does not decrease

from datasets import load_dataset from torch.utils.data import DataLoader from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch from tqdm import tqdm def ...
naivebird
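Since the excerpt cuts off before the training loop, here is a minimal reference loop to compare against; flat losses in this setup usually trace back to a missing optimizer.zero_grad(), a learning rate far from the fine-tuning range, or labels never reaching the model so out.loss is meaningless:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for batch in train_loader:                    # names assumed from the excerpt
    batch = {k: v.to(model.device) for k, v in batch.items()}
    out = model(**batch)                      # batch must include "labels"
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()                     # easy to forget; gradients accumulate otherwise
```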
0 votes · 0 answers · 142 views

Hugging Face accelerate device error when running evaluation

I am running some experiments on a multi-GPU cluster, and I'm using accelerate. I'm trying to calculate some metrics after every batch iteration in the training dataloader. While the training code ...
M. Koopmans
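With accelerate, per-batch metrics have to be gathered across processes (and onto consistent devices) before being computed; accelerate's gather_for_metrics exists for exactly this. A sketch assuming an accelerator, model outputs, and a labeled batch from the training loop:

```python
# inside the batch loop, after the forward pass
preds = outputs.logits.argmax(dim=-1)

# gathers from all processes and trims duplicated samples from the last batch
all_preds, all_labels = accelerator.gather_for_metrics((preds, batch["labels"]))
accuracy = (all_preds == all_labels).float().mean()
```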
0 votes · 1 answer · 132 views

How to reinitialize GPT-2 XL from scratch in Hugging Face?

I'm trying to confirm that my GPT-2 model is being trained from scratch, rather than using any pre-existing pre-trained weights. Here's my approach: Load the pre-trained GPT-2 XL model: I load a pre-...
Charlie Parker
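The standard pattern is to build the model from the config alone rather than from_pretrained, which guarantees random initialization; only the small config file is fetched, never the weights:

```python
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config.from_pretrained("gpt2-xl")  # architecture hyperparameters only
model = GPT2LMHeadModel(config)                 # randomly initialized weights

# sanity check: a from-scratch model should give near log(vocab_size) loss,
# far above what the pretrained checkpoint produces on the same batch.
```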
