All Questions
Tagged with huggingface machine-learning
72 questions
0 votes, 0 answers, 27 views
TrainingArguments: Do "packing" and "group_by_length" counteract each other?
In HuggingFace's TrainingArguments and SFTConfig (which inherits from TrainingArguments), there are two arguments for initializing SFTConfig():
group_by_length: Whether or not to group together ...
0 votes, 1 answer, 59 views
Getting CUDA out of memory when importing microsoft/Orca-2-13b from Hugging Face
I am using Ubuntu 24.04.1 on an AWS EC2 instance g5.8xlarge.
I am receiving the following error message:
OutOfMemoryError: Allocation on device
Code:
import os
os.environ["...
-1 votes, 1 answer, 52 views
Deconstructing the Stable Diffusion 3.5 pipeline
I am trying to deconstruct the SD3.5 (specifically 3.5 medium) pipeline in order to have a controlled process over the denoising steps. I can't do callbacks because I need to modify the latent ...
1 vote, 1 answer, 96 views
Why does my Llama 3.1 model act differently between AutoModelForCausalLM and LlamaForCausalLM?
I have one set of weights, one tokenizer, the same prompt, and identical generation parameters. Yet somehow, when I load the model using AutoModelForCausalLM, I get one output, and when I construct it ...
6 votes, 2 answers, 2k views
Why does HuggingFace-provided DeepSeek code result in an 'Unknown quantization type' error?
I am using this code from HuggingFace:
This code is pasted directly from the HuggingFace website's page on DeepSeek and is supposed to be plug-and-play code:
from transformers import pipeline
...
0 votes, 1 answer, 74 views
Image segmentation ONNX from Hugging Face produces very different results when used in ML.NET
I have been trying to get an image segmentation model from Hugging Face (RMBG-2.0) to work for inference using ML.NET. After a lot of trial and error, I finally got the code to compile and produce an ...
0 votes, 1 answer, 393 views
How to Log Training Loss at Step Zero in Hugging Face Trainer or SFT Trainer?
I'm using the Hugging Face Trainer (or SFTTrainer) for fine-tuning, and I want to log the training loss at step 0 (before any training steps are executed). I know there's an eval_on_start option for ...
1 vote, 1 answer, 159 views
How to Compute Teacher-Forced Accuracy (TFA) for Hugging Face Models While Handling EOS Tokens?
I am trying to compute Teacher-Forced Accuracy (TFA) for Hugging Face models, ensuring the following:
EOS Token Handling: The model should be rewarded for predicting the first EOS token.
Ignoring ...
0 votes, 0 answers, 173 views
Jupyter Notebook is crashing when I want to run HuggingFace models
I am using Jupyter Notebook for running some ML models from HuggingFace.
I am using a Mac (M2 chip, 32 GB memory).
This is my code:
import torch
from transformers import AutoTokenizer, ...
1 vote, 1 answer, 235 views
How to add EOS when training T5?
I'm a little puzzled about where (and whether) EOS tokens are being added when using Hugging Face's trainer classes to train a T5 (LongT5 actually) model.
The data set contains pairs of text like this:
from
to
...
0 votes, 1 answer, 77 views
An error occurs during the execution of UNet when the batch size is not equal to 1
I'm trying to run a Stable Diffusion model using the code provided in the DDIM Inversion tutorial. However, when the input's batch size is set to a value greater than 1 (e.g., 32), I encounter the ...
0 votes, 0 answers, 70 views
ValueError: If no `decoder_input_ids` or `decoder_inputs_embeds` are passed, `input_ids` cannot be `None`
I am trying to get the decoder hidden state of the Florence-2 model. I was following https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py to understand the parameters ...
0 votes, 1 answer, 74 views
AutoModelForSequenceClassification loss does not decrease
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from tqdm import tqdm
def ...
0 votes, 0 answers, 142 views
HuggingFace accelerate device error when running evaluation
I am running some experiments on a multi-GPU cluster, and I'm using accelerate. I'm trying to calculate some metrics after every batch iteration in the training dataloader. While the training code ...
0 votes, 1 answer, 132 views
How to reinitialize GPT2 XL from scratch in HuggingFace?
I'm trying to confirm that my GPT-2 model is being trained from scratch, rather than using any pre-existing pre-trained weights. Here's my approach:
Load the pre-trained GPT-2 XL model: I load a pre-...