All Questions
24 questions
0
votes
1
answer
116
views
Python BERT is failing at preprocessed_inputs = preprocessor(text_input). A KerasTensor is symbolic: it's a placeholder for a shape and dtype
I'm trying to load a BERT model, and text_input prints
<KerasTensor shape=(None,), dtype=string, sparse=None, name=keras_tensor_54>
and the preprocessor also loads, but I am still getting this error ...
1
vote
0
answers
437
views
What if I have too many documents labelled as the -1 cluster in BERTopic?
I'm generating topics using BERTopic on a multilingual dataset (mainly Russian and English). I'm reducing the number of topics to 140. After generating the topics, I'm analyzing their quality using the ...
1
vote
0
answers
219
views
Classification report for multi-label tasks
I am trying to use BERT for multi-label tasks. My dataset has 1000 samples. I first use train_test_split to take 80% of the dataset as a training set and 20% as a validation set. It is reasonable to say that ...
0
votes
1
answer
130
views
CUDA batch out of memory
I have a small dataset and am running a script called LightXML, which is on GitHub: https://github.com/kongds/LightXML
I am getting this error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to ...
2
votes
0
answers
602
views
Using a RoBERTa model with the transformers-interpret library
I've been trying to use the transformers-interpret library and have been successful in getting results for Facebook's BART model, but not for RoBERTa.
My code for the BART model is as follows:
...
-1
votes
1
answer
535
views
Low F1-Score after balancing dataset
I have a binary classification problem with tweets: 17000 in the positive class and 122000 in the negative class. I have balanced the data to 17000 tweets in each class. I have implemented ...
1
vote
1
answer
890
views
How to read BertForMaskedLM with BertModel?
I have fine-tuned BertForMaskedLM and now I want to read it with BertModel. But my saved model looks like this:
BertForMaskedLM(
(bert): BertModel(
(embeddings): BertEmbeddings(
(...
8
votes
1
answer
9k
views
How to add to a Python FAISS index incrementally
I am using Faiss to index the embeddings of my huge dataset; the embeddings are generated by a BERT model. I want to add the embeddings incrementally. It works fine if I only add them with faiss.IndexFlatL2, but ...
1
vote
1
answer
954
views
AI Based Deduplication using Textual Similarity Measure in Python
Given I have a dataframe that contains rows like this
ID | Title | Abstract | Keywords | Author | Year
5875 | Textual Similarity: A Review | Textual Similarity has been used for measuring ... | X, Y, Z | James Thomas | ...
3
votes
1
answer
4k
views
How to combine BERT embedding vectors with other features?
I am working on a classification task with 3 labels (0,1,2 = neg, pos, neu). The data are sentences. To produce vectors/embeddings of the sentences, I use a BERT encoder to get embeddings for each sentence ...
1
vote
2
answers
2k
views
How can I train an XGBoost with a generator?
I'm attempting to stack a BERT TensorFlow model with an XGBoost model in Python. To do this, I have trained the BERT model and have a generator that takes the predictions from BERT (which ...
0
votes
1
answer
2k
views
Cosine similarity between columns of two different DataFrames
I want to compute the cosine similarity between two DataFrames (of different sizes) and store the result as new data. The similarity is calculated using BERT embeddings.
df1
title
Lorem ipsum ...
0
votes
1
answer
1k
views
HuggingFace SciBert AutoModelForMaskedLM cannot be imported
I am trying to use the pretrained SciBERT model (https://huggingface.co/allenai/scibert_scivocab_uncased) from Hugging Face to evaluate masked words in scientific/biomedical text for bias using CrowS-...
0
votes
0
answers
192
views
How to use BigBirdModel to create a neural network in Python?
I am trying to create a network with TensorFlow and BigBird.
from transformers import BigBirdModel
import tensorflow as tf
classic_model = BigBirdModel.from_pretrained('google/bigbird-roberta-base')
...
2
votes
2
answers
6k
views
"Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers." ValueError: Input is not valid
I am using a BERT tokenizer for French and I am getting this error, but I cannot seem to solve it. Do you have a suggestion?
Traceback (most recent call last):
File "training_cross_data_2....