All Questions
88 questions
2 votes · 1 answer · 93 views
Error in getting Captum text explanations for text classification
I have the following code, which I am using to identify the most influential words behind correct predictions on the test dataset:
import pandas as pd
import torch
from torch.utils.data import ...
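A minimal sketch of embedding-level attribution with Captum's LayerIntegratedGradients; the checkpoint name, the forward wrapper, and the target class index below are illustrative assumptions, not the asker's actual setup:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from captum.attr import LayerIntegratedGradients

# Placeholder checkpoint; substitute the fine-tuned classifier from the question.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_func(input_ids, attention_mask):
    # Return class logits so Captum can attribute a chosen target class.
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

enc = tokenizer("This movie was surprisingly good", return_tensors="pt")
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

# Attribute against the embedding layer rather than the integer token ids.
lig = LayerIntegratedGradients(forward_func, model.get_input_embeddings())
attributions = lig.attribute(
    enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"],),
    target=1,  # class index to explain
)
# Sum over the embedding dimension to get one influence score per token.
token_scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
print(list(zip(tokens, token_scores.tolist())))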
0 votes · 0 answers · 64 views
Memory increasing after hugging face generate method
I wanted to run inference with the codegemma model from Hugging Face, but when I use the model.generate(**inputs) method, peak GPU memory usage increases from 39 GB to 49 GB. With the torch profiler no ...
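The jump in peak memory during generation usually comes from the growing key/value cache plus activation buffers rather than a leak. A minimal defensive sketch, assuming model, tokenizer, and inputs are already set up as in the question:
import gc
import torch

model.eval()
with torch.inference_mode():            # generation needs no autograd buffers
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,             # bounds the KV cache, which grows with output length
        use_cache=True,
    )
text_out = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Release cached allocator blocks between calls if peak usage is the concern.
del output_ids
gc.collect()
torch.cuda.empty_cache()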
0 votes · 0 answers · 30 views
NaN loss when training LSTM-Attention
During model training, the loss value suddenly became NaN. Even though I changed the parameters a lot, it still fails.
I checked for errors during training, and it prints the error in the output, not the ...
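Sudden NaNs in recurrent models are most often exploding gradients or a non-finite input; a minimal sketch of the usual defenses, assuming generic model, optimizer, criterion, and loader objects:
import torch

torch.autograd.set_detect_anomaly(True)   # reports the first op that produces NaN/Inf

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    if not torch.isfinite(loss):
        print("non-finite loss, skipping batch")
        continue
    loss.backward()
    # Clip exploding gradients, a common cause of sudden NaNs in LSTMs.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()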
0 votes · 1 answer · 84 views
SBERT fine-tuning always stops before finishing all epochs
I'm working on a project using SBERT pre-trained models (specifically MiniLM) for text classification with 995 classes. I am following the steps laid out here for the most part ...
0 votes · 0 answers · 69 views
Transformer Model Repeating Same Codon During Inference Despite High Training Accuracy
I'm working on a transformer-based model to translate amino acids to codons. During training and validation, my model achieves 95-98% accuracy. However, during inference, I encounter an issue where ...
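Repetitive output at inference time, despite high teacher-forced accuracy, typically points at the decoding loop rather than the weights. A greedy-decoding sketch for a generic encoder-decoder whose forward is model(src, tgt) -> logits of shape (batch, tgt_len, vocab); that signature and the BOS/EOS ids are assumptions about the asker's model:
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=200):
    model.eval()
    ys = torch.full((src.size(0), 1), bos_id, dtype=torch.long, device=src.device)
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, ys)                       # re-run on the tokens decoded so far
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            ys = torch.cat([ys, next_token], dim=1)
            if (next_token == eos_id).all():
                break
    return ys
If the model still repeats one codon with a loop like this, the usual suspects are a missing causal mask on the decoder during training or feeding the full ground-truth target, rather than the decoded prefix, at inference.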
0 votes · 1 answer · 534 views
ImportError and TypeError Issues in Nougat OCR with BARTDecoder and cached_property
I'm facing issues while running an OCR process using Nougat with two different errors for two different users. The errors are related to importing cached_property and an unexpected keyword argument ...
0 votes · 0 answers · 55 views
Text to Openpose and Weird RNN bugs
I want to create an AI that generates an OpenPose output from a textual description. For example, if the input is "a man running", the output would be like the image I provided. Is there any model architecture recommended ...
1 vote · 2 answers · 882 views
Why do I get different embeddings when I perform batch encoding in huggingface MT5 model?
I am trying to encode some text using Hugging Face's mt5-base model. I am using the model as shown below:
from transformers import MT5EncoderModel, AutoTokenizer
model = MT5EncoderModel.from_pretrained(...
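One common cause is padding: in a batch, shorter texts are padded, and pooling over the padded positions changes the vectors compared with encoding each text alone. A mask-aware mean-pooling sketch, assuming google/mt5-base as in the question:
import torch
from transformers import MT5EncoderModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5EncoderModel.from_pretrained("google/mt5-base")
model.eval()

texts = ["a short sentence", "a noticeably longer sentence than the first one"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state          # (batch, seq_len, dim)

# Exclude padding positions when pooling, otherwise the padded batch
# yields different sentence vectors than encoding each text on its own.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)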
0 votes · 1 answer · 127 views
Why is the token embedding different from the embedding produced by the BartForConditionalGeneration model?
Why are the two embeddings different even when I generate them using the same BartForConditionalGeneration model?
The first embedding is generated by combining the token embedding and positional embedding from
...
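For reference, a small sketch of the two quantities being compared, assuming facebook/bart-base; the encoder output is not just token plus positional embeddings, since BART also applies an embedding layer norm and the encoder's self-attention layers on top:
import torch
from transformers import BartForConditionalGeneration, AutoTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model.eval()

ids = tokenizer("a test sentence", return_tensors="pt").input_ids
with torch.no_grad():
    # Raw token embeddings: just the lookup table, no positions, no layer norm.
    token_emb = model.get_input_embeddings()(ids)
    # Encoder output: token + positional embeddings, embedding layer norm,
    # and every encoder layer applied on top.
    enc_out = model.model.encoder(input_ids=ids).last_hidden_state

print(torch.allclose(token_emb, enc_out))  # False, as expected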
0 votes · 1 answer · 359 views
How to convert Spacy Model .pkl file into .pt/.pth pytorch supported format
I have a spaCy model in .pkl format that I am using for inference. The type of the unpickled .pkl object is <class 'spacy.lang.en.English'>. I want to make the inference script run on GPU. I tried using ...
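A spaCy pipeline is not a plain PyTorch module, so there is no direct .pkl to .pt/.pth conversion; the usual route to GPU inference is to activate spaCy's GPU ops before the pipeline is created. A sketch, with the pickle path as a placeholder and the assumption that the pickled object is a full spacy.lang.en.English pipeline:
import pickle
import spacy

# Requires cupy; call this before loading/unpickling the pipeline so that
# thinc allocates the model's tensors on the GPU.
spacy.prefer_gpu()          # or spacy.require_gpu() to fail loudly without a GPU

with open("model.pkl", "rb") as f:   # placeholder path
    nlp = pickle.load(f)

doc = nlp("Sample text for inference")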
0 votes · 2 answers · 297 views
Create a multilingual chatbot
I created a chatbot using PyTorch and I want to make it support the French language. Note that I want to train the chatbot so that it can respond to technical questions.
One of the things that came to ...
0 votes · 1 answer · 1k views
OutOfMemoryError: CUDA out of memory in LLM
I have a list of texts and I need to send each text to a large language model (llama2-7b). However, I am getting a CUDA out-of-memory error. I am running on an A100 on Google Colab. Here is my attempt:
path = "...
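A defensive sketch for this setup: load the weights in half precision, generate one text at a time under no_grad, truncate long inputs, and release cached blocks between texts. The checkpoint name and generation limits are illustrative, not taken from the question:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"   # stands in for the truncated path above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,    # ~14 GB of weights instead of ~28 GB in fp32
    device_map="auto",
)
model.eval()

texts = ["first document ...", "second document ..."]
results = []
for text in texts:
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=2048).to(model.device)
    with torch.no_grad():                       # no activations kept for backprop
        out = model.generate(**inputs, max_new_tokens=128)
    results.append(tokenizer.decode(out[0], skip_special_tokens=True))
    torch.cuda.empty_cache()                    # free allocator blocks between texts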
2 votes · 1 answer · 2k views
How does one reinitialize the weights of a Hugging Face LLaMA v2 model the official way, as in the original model?
I want to reinitialize the weights of a LLaMA v2 model I'm using/downloading. I went through all the documentation and the source code from their Hugging Face code:
https://github.com/huggingface/...
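For what it's worth, the initialization path the library itself uses is the model's _init_weights, which runs when a model is built from its config instead of from pretrained weights. A sketch, with the checkpoint name as an assumption:
from transformers import AutoConfig, AutoModelForCausalLM

name = "meta-llama/Llama-2-7b-hf"         # placeholder checkpoint
config = AutoConfig.from_pretrained(name)

# from_config builds the architecture and runs the library's own weight
# initialization; no pretrained weights are loaded at all.
model = AutoModelForCausalLM.from_config(config)

# To re-randomize an already loaded model, the commonly used equivalent is
# to apply the same hook to every submodule:
# model.apply(model._init_weights)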
2 votes · 2 answers · 2k views
How to get perplexity per token rather than average perplexity?
I can get the perplexity of a whole sentence from here:
device = "cuda"
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
device = "cuda"
model_id = "gpt2"
...
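A sketch of per-token perplexity: keep the per-position cross-entropy terms with reduction="none" instead of letting the loss average over the sequence, then exponentiate each one. The model and sample text below are illustrative:
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda"
model_id = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id).to(device)
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)

input_ids = tokenizer("The quick brown fox jumps over the lazy dog",
                      return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    logits = model(input_ids).logits                 # (1, seq_len, vocab)

# Shift so that position i predicts token i + 1, then keep one NLL per token.
shift_logits = logits[:, :-1, :]
shift_labels = input_ids[:, 1:]
nll = torch.nn.functional.cross_entropy(
    shift_logits.transpose(1, 2), shift_labels, reduction="none"
).squeeze(0)

per_token_ppl = nll.exp()                            # one perplexity per predicted token
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])[1:]
for tok, ppl in zip(tokens, per_token_ppl.tolist()):
    print(f"{tok!r}: {ppl:.2f}")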
0 votes · 1 answer · 180 views
How does an instance of pytorch's `nn.Linear()` process a tuple of tensors?
In the annotated transformer's implementation of multi-head attention, three tensors (query, key, value) are all passed to an nn.Linear(d_model, d_model):
# some class definition ...
self.linears = ...
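In that implementation the linear layers never see a tuple: zip pairs one nn.Linear with one tensor, and each projection acts on the last dimension of its own input. A small self-contained sketch of the same pattern:
import torch
import torch.nn as nn

d_model, batch, seq_len = 512, 2, 10
linears = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(4)])

query = key = value = torch.randn(batch, seq_len, d_model)

# Each Linear receives a single (batch, seq_len, d_model) tensor and projects
# its last dimension; the tuple only exists in the Python list comprehension.
q, k, v = [lin(x) for lin, x in zip(linears, (query, key, value))]
print(q.shape)   # torch.Size([2, 10, 512])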