
All Questions

2 votes
1 answer
93 views

Error in getting Captum text explanations for text classification

I have the following code that I am using to identify the most influential words used to correctly predict the text in the test dataset: import pandas as pd import torch from torch.utils.data import ...
Nayantara Jeyaraj
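For the Captum question above, a minimal sketch of word-level attributions with LayerIntegratedGradients, assuming a BERT-style Hugging Face classifier (the checkpoint, forward wrapper, and example text are illustrative, not the asker's code):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from captum.attr import LayerIntegratedGradients

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def forward_fn(input_ids, attention_mask):
    # Captum expects a tensor of scores per example
    return model(input_ids, attention_mask=attention_mask).logits

text = "an example sentence from the test set"
enc = tokenizer(text, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

with torch.no_grad():
    pred_class = int(forward_fn(input_ids, attention_mask).argmax())

# baseline: same shape as the input, filled with the [PAD] token id
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_fn, model.bert.embeddings)
attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(attention_mask,),
    target=pred_class,
    return_convergence_delta=True,
)

# one score per token: sum the attribution over the embedding dimension
token_scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
print(sorted(zip(tokens, token_scores.tolist()), key=lambda t: -abs(t[1])))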
0 votes
0 answers
64 views

Memory increasing after hugging face generate method

I wanted to run inference with the codegemma model from Hugging Face, but when I use the model.generate(**inputs) method, GPU memory cost increases from 39 GB to 49 GB at peak usage; with torch profiler no ...
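For the codegemma memory question above, a minimal sketch of the usual mitigation, assuming the google/codegemma-7b checkpoint and an illustrative prompt: run generation under torch.inference_mode() so no autograd state is retained, and release cached allocator blocks between calls.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/codegemma-7b"  # illustrative; use the checkpoint you actually load
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)

# inference_mode keeps generate from holding on to autograd buffers,
# one common reason peak memory climbs well past the weight footprint
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

del output_ids, inputs
torch.cuda.empty_cache()  # return cached blocks to the driver between requests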
0 votes
0 answers
30 views

NaN loss when training LSTM-Attention

During model training, the loss value suddenly became NaN. Even though I changed the parameters a lot, it still failed. I checked the error during training and it prints the error in the output, not the ...
Khatu Huynh
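For the NaN-loss question above, a minimal self-contained sketch of the usual first checks (the tiny LSTM, data, and hyperparameters are stand-ins, not the asker's model): turn on anomaly detection to find the op that first produces NaN, skip non-finite batches, and clip gradients to rule out explosion.

import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # reports the backward op that first produced NaN (debug only; slow)

# stand-in model and data
model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 3)
params = list(model.parameters()) + list(head.parameters())
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(32, 10, 8)       # (batch, seq_len, features)
y = torch.randint(0, 3, (32,))   # class labels

for step in range(100):
    optimizer.zero_grad()
    out, _ = model(x)
    loss = criterion(head(out[:, -1]), y)

    if not torch.isfinite(loss):          # skip the batch instead of corrupting the weights
        print(f"non-finite loss at step {step}")
        continue

    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # rules out exploding gradients
    optimizer.step()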
0 votes
1 answer
84 views

SBERT fine-tuning always stops before finishing all epochs

I'm working on a project using the SBERT pre-trained models (specifically MiniLM) for a text classification task with 995 classes. I am following the steps laid out here for the most part ...
SohmOuse
0 votes
0 answers
69 views

Transformer Model Repeating Same Codon During Inference Despite High Training Accuracy

I'm working on a transformer-based model to translate amino acids to codons. During training and validation, my model achieves 95-98% accuracy. However, during inference, I encounter an issue where ...
Farshid B
0 votes
1 answer
534 views

ImportError and TypeError Issues in Nougat OCR with BARTDecoder and cached_property

I'm facing issues while running an OCR process using Nougat with two different errors for two different users. The errors are related to importing cached_property and an unexpected keyword argument ...
Charlie Parker
0 votes
0 answers
55 views

Text to OpenPose and weird RNN bugs

I want to create an AI that generates OpenPose output from a textual description; for example, if the input is "a man running", the output would be like the image I provided. Is there any model architecture you would recommend ...
Peemmaphat Sripongsai
1 vote
2 answers
882 views

Why do I get different embeddings when I perform batch encoding in huggingface MT5 model?

I am trying to encode some text using HuggingFace's mt5-base model. I am using the model as shown below: from transformers import MT5EncoderModel, AutoTokenizer model = MT5EncoderModel.from_pretrained(...
BBloggsbott
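For the MT5 batch-encoding question above, a minimal sketch of the usual explanation, assuming google/mt5-base: padded positions get hidden states of their own, so pooled sentence embeddings only match the single-sentence case when pooling is restricted to real tokens via the attention mask.

import torch
from transformers import AutoTokenizer, MT5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5EncoderModel.from_pretrained("google/mt5-base")
model.eval()

sentences = ["a short sentence",
             "a considerably longer sentence that forces padding of the first one"]

with torch.no_grad():
    # encode the first sentence alone: no padding at all
    single = tokenizer(sentences[0], return_tensors="pt")
    h_single = model(**single).last_hidden_state          # (1, len0, d)

    # encode as a batch: the first sentence is padded to the batch max length
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    h_batch = model(**batch).last_hidden_state            # (2, max_len, d)

# the real (non-pad) positions of the first sentence are essentially unchanged
n = single["input_ids"].shape[1]
print((h_single[0] - h_batch[0, :n]).abs().max())         # tiny numerical noise at most

# sentence embedding: mean-pool over real tokens only, otherwise the pad
# positions shift the result and the batched embedding differs
mask = batch["attention_mask"].unsqueeze(-1).float()      # (2, max_len, 1)
sent_emb = (h_batch * mask).sum(dim=1) / mask.sum(dim=1)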
0 votes
1 answer
127 views

Why is the token embedding different from the embedding produced by the BartForConditionalGeneration model?

Why are both embeddings different even when I generate them using the same BartForConditionalGeneration model? The first embedding is generated by combining the token embedding and positional embedding from ...
New_user
0 votes
1 answer
359 views

How to convert Spacy Model .pkl file into .pt/.pth pytorch supported format

I have a spaCy model in .pkl format which I am using for inference. The datatype of the .pkl file is <class 'spacy.lang.en.English'>. I want to make the inference script run on GPU. I tried using ...
RajeshM • 89
0 votes
2 answers
298 views

Create a multilingual chatbot

I created a chatbot using PyTorch and I want to make it support the French language. Note that I want to train the chatbot so that it can respond to technical questions. One of the things that came to ...
Amine • 1
0 votes
1 answer
1k views

OutOfMemoryError: CUDA out of memory in LLM

I have a list of texts and I need to send each text to a large language model (llama2-7b). However, I am getting a CUDA out of memory error. I am running on an A100 on Google Colab. Here is my try: path = "...
grey • 59
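For the CUDA out-of-memory question above, a minimal sketch of how llama2-7b inference is usually kept inside A100 memory (checkpoint name, prompts, and generation settings are illustrative): load the weights in fp16, generate under no_grad, and loop over the texts instead of keeping everything resident on the GPU.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "meta-llama/Llama-2-7b-hf"   # illustrative; substitute the path from the question
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,      # ~14 GB of weights instead of ~28 GB in fp32
    device_map="auto",
)
model.eval()

texts = ["first document ...", "second document ..."]   # the list from the question
outputs = []
for text in texts:
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=2048).to(model.device)
    with torch.no_grad():                    # no autograd buffers during generation
        out = model.generate(**inputs, max_new_tokens=256)
    outputs.append(tokenizer.decode(out[0], skip_special_tokens=True))
    del inputs, out
    torch.cuda.empty_cache()                 # return cached blocks between texts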
2 votes
1 answer
2k views

How does one reinitialize the weights of a Hugging Face LLaMA v2 model the official way, i.e., the same way as the original model?

I want to reinitialize the weights of a LLaMA v2 model I'm using/downloading. I went through all the documentation and the source code from their Hugging Face code: https://github.com/huggingface/...
Charlie Parker
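For the reinitialization question above, a minimal sketch of the two ways this is usually done through the Hugging Face API (the checkpoint name is illustrative, and whether either counts as the "official" recipe is the asker's question, not something this sketch settles): build a fresh model from the config, which routes through the class's own _init_weights, or re-apply that init hook to an already-loaded model.

from transformers import AutoConfig, AutoModelForCausalLM

name = "meta-llama/Llama-2-7b-hf"   # illustrative

# Option 1: never load the pretrained weights. from_config builds the
# architecture and initializes every module via the model's _init_weights.
config = AutoConfig.from_pretrained(name)
fresh_model = AutoModelForCausalLM.from_config(config)

# Option 2: load the pretrained model, then re-run the same internal init
# hook over all of its submodules.
model = AutoModelForCausalLM.from_pretrained(name)
model.apply(model._init_weights)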
2 votes
2 answers
2k views

How to get perplexity per token rather than average perplexity?

I can get the perplexity of a whole sentence from here: device = "cuda" from transformers import GPT2LMHeadModel, GPT2TokenizerFast device = "cuda" model_id = "gpt2" ...
Penguin • 2,581
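For the per-token perplexity question above, a minimal sketch with GPT-2 (the example sentence is illustrative): instead of taking the loss the model averages over the sequence, compute the cross-entropy of each predicted token separately and exponentiate.

import math
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
model.eval()

text = "The quick brown fox jumps over the lazy dog"
enc = tokenizer(text, return_tensors="pt").to(device)
input_ids = enc["input_ids"]

with torch.no_grad():
    logits = model(**enc).logits                        # (1, seq_len, vocab)

# token i is predicted from positions < i, so shift logits and labels by one
shift_logits = logits[:, :-1, :]
shift_labels = input_ids[:, 1:]
token_nll = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    reduction="none",                                   # one NLL per predicted token
)

for tok, nll in zip(tokenizer.convert_ids_to_tokens(shift_labels[0]), token_nll.tolist()):
    print(f"{tok!r}: {math.exp(nll):.2f}")              # per-token "perplexity"

# the usual sentence-level perplexity is exp of the mean NLL
print("sentence perplexity:", torch.exp(token_nll.mean()).item())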
0 votes
1 answer
180 views

How does an instance of pytorch's `nn.Linear()` process a tuple of tensors?

In the annotated transformer's implementation of multi-head attention, three tensors (query, key, value) are all passed to a nn.Linear(d_model, d_model): # some class definition ... self.linears = ...
Lukas • 543
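For the nn.Linear question above, a minimal sketch of the two facts at play in the annotated transformer's multi-head attention: the zip pairs each projection layer with exactly one tensor, so no Linear ever sees a tuple, and nn.Linear transforms only the last dimension, treating all leading dimensions as batch.

import torch
import torch.nn as nn

d_model, batch, seq_len = 512, 2, 10

query = torch.randn(batch, seq_len, d_model)
key = torch.randn(batch, seq_len, d_model)
value = torch.randn(batch, seq_len, d_model)

# as in the annotated transformer: a separate Linear per input tensor
linears = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(3)])
q, k, v = (layer(x) for layer, x in zip(linears, (query, key, value)))

# the projection mixes only the last dimension; (batch, seq_len) pass through
print(q.shape)   # torch.Size([2, 10, 512])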
