All Questions
23 questions
0 votes · 0 answers · 21 views
Why does it print the content automatically when I use the BERT tokenizer?
class BertEncoder:
    def __init__(self):
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        self.model = BertModel.from_pretrained('bert-base-uncased')
        self.device = torch.device("cuda:...
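A hedged guess at the cause: from_pretrained logs download and initialization messages by default, and transformers ships a logging helper to silence them. A minimal sketch, assuming the output comes from the library's logger rather than notebook echoing:

from transformers import logging

logging.set_verbosity_error()   # show only errors, not the info messages printed on load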
0 votes · 1 answer · 203 views
Map BERT token indices to Spacy token indices
I’m trying to map BERT’s (bert-base-uncased) tokenization token indices (not ids, token indices) to spaCy’s tokenization token indices. In the following example, my approach doesn’t work because ...
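A common approach (a sketch, not the asker's code; it assumes BertTokenizerFast for offset mappings and the en_core_web_sm spaCy model) is to align the two tokenizations through character offsets:

import spacy
from transformers import BertTokenizerFast

nlp = spacy.load("en_core_web_sm")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

text = "Johanson lives in Copenhagen."   # illustrative sentence
doc = nlp(text)
enc = tokenizer(text, return_offsets_mapping=True)

# Map each BERT token index to the spaCy token whose character span covers it.
bert_to_spacy = {}
for bert_i, (start, end) in enumerate(enc["offset_mapping"]):
    if start == end:                     # special tokens like [CLS]/[SEP]
        continue
    for spacy_i, tok in enumerate(doc):
        if tok.idx <= start < tok.idx + len(tok.text):
            bert_to_spacy[bert_i] = spacy_i
            break
print(bert_to_spacy)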
1 vote · 1 answer · 35 views
bert-base-uncased does not use newly added suffix token
I want to add custom tokens to the BertTokenizer. However, the model does not use the new token.
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-...
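One likely explanation: tokens added with add_tokens are matched as standalone words before WordPiece runs, so a "##suffix"-style piece is never applied inside other words, and the model also needs its embedding matrix resized. A sketch under those assumptions (the token name is hypothetical):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

tokenizer.add_tokens(["mynewtoken"])             # matched only as a whole token
model.resize_token_embeddings(len(tokenizer))    # keep embeddings in sync with the vocab
print(tokenizer.tokenize("a sentence with mynewtoken"))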
0 votes · 1 answer · 4k views
Loading local tokenizer
I'm trying to load a local tokenizer using:
from transformers import RobertaTokenizerFast
tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer')
However, this gives me the ...
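For reference, from_pretrained accepts a directory produced by save_pretrained; a sketch with a hypothetical path (forward slashes sidestep Windows escape issues):

from transformers import RobertaTokenizerFast

# The directory should contain the files written by tokenizer.save_pretrained(...),
# e.g. vocab.json, merges.txt and/or tokenizer.json.
tokenizer = RobertaTokenizerFast.from_pretrained("C:/models/my_tokenizer")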
0 votes · 0 answers · 55 views
Value Error when using add_tokens, 'the truth value of an array with more than one element is ambiguous'
I'm trying to improve a basic pretrained BERT tokenizer model. I'm adding new tokens using add_tokens, but running into issues with the built-in method.
Namely:
ValueError ...
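That ValueError is the usual symptom of a NumPy array being used in a boolean context; if the new tokens live in an array, converting them to a plain list may sidestep it. A sketch (token names hypothetical):

import numpy as np
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
new_tokens = np.array(["tok_a", "tok_b"])
tokenizer.add_tokens(new_tokens.tolist())   # a plain list avoids the ambiguous truth-value check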
2 votes · 1 answer · 195 views
bert_vocab.bert_vocab_from_dataset returning wrong vocabulary [closed]
I'm trying to build a tokenizer following the TF tutorial https://www.tensorflow.org/text/guide/subwords_tokenizer. I'm basically doing the same thing, only with a different dataset. The dataset in ...
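For comparison, the call as the linked tutorial presents it (ds stands in for the asker's dataset and is assumed to be a tf.data.Dataset of text lines; the helper expects it batched):

from tensorflow_text.tools.wordpiece_vocab import bert_vocab_from_dataset as bert_vocab

vocab = bert_vocab.bert_vocab_from_dataset(
    ds.batch(1000).prefetch(2),       # ds: a tf.data.Dataset of text lines (assumed)
    vocab_size=8000,
    reserved_tokens=["[PAD]", "[UNK]", "[START]", "[END]"],
    bert_tokenizer_params=dict(lower_case=True),
)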
1 vote · 0 answers · 183 views
How to obtain the [CLS] sentence embedding of multiple sentences successively without facing a RAM crash?
I would like to obtain the [CLS] token's sentence embedding (as it represents the whole sentence's meaning) using BERT. I have many sentences (about 40) that belong to a Document, and 246 such ...
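A standard way to keep memory bounded (a sketch, not the asker's code): run the model in batches under torch.no_grad() and move each batch of [CLS] vectors to the CPU.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

def cls_embeddings(sentences, batch_size=8):
    chunks = []
    with torch.no_grad():                          # no autograd graph, far less memory
        for i in range(0, len(sentences), batch_size):
            enc = tokenizer(sentences[i:i + batch_size], padding=True,
                            truncation=True, return_tensors="pt")
            hidden = model(**enc).last_hidden_state
            chunks.append(hidden[:, 0, :].cpu())   # [CLS] sits at position 0
    return torch.cat(chunks)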
1 vote · 1 answer · 635 views
NER Classification Deberta Tokenizer error: You need to instantiate DebertaTokenizerFast
I'm trying to perform an NER classification task using Deberta, but I'm stuck with a tokenizer error. This is my code (my input sentence must be split word by word by ","):
from transformers ...
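The error text points at the usual requirement for word-level NER preprocessing: word_ids() and is_split_into_words need a fast tokenizer. A sketch (checkpoint name illustrative):

from transformers import DebertaTokenizerFast

tokenizer = DebertaTokenizerFast.from_pretrained("microsoft/deberta-base")
enc = tokenizer(["Tom", "lives", "in", "Paris"], is_split_into_words=True)
print(enc.word_ids())   # word-to-token alignment, available on fast tokenizers only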
0 votes · 1 answer · 305 views
How to get access to the tokenizer after loading a saved custom BERT model using Keras and TF2?
I am working on an intent classification problem and need your help.
I fine-tuned one of the BERT models for text classification, then trained and evaluated it on a small dataset for detecting five intents. I ...
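One thing to note (a sketch with hypothetical paths): a Keras/TF2 SavedModel does not bundle the Hugging Face tokenizer, so it has to be saved and restored separately.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.save_pretrained("saved_model/tokenizer")      # save next to the Keras model

# ...later, after loading the Keras model back:
tokenizer = BertTokenizer.from_pretrained("saved_model/tokenizer")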
0 votes · 1 answer · 2k views
How to replace BERT tokenizer special tokens
I am using an AutoTokenizer --> tokenizer1 = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True) which is more complete than the tokenizer of bert-base-uncased. The ...
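For reference, add_special_tokens can override entries in the tokenizer's special-token map; a sketch (the replacement token is illustrative, and a genuinely new token also requires resizing the model's embeddings):

from transformers import AutoTokenizer

tokenizer1 = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)
tokenizer1.add_special_tokens({"mask_token": "[MASK]"})
print(tokenizer1.mask_token, tokenizer1.mask_token_id)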
1 vote · 1 answer · 2k views
How to preprocess a dataset for a BERT model implemented in TensorFlow 2.x?
Overview
I have a dataset made for a classification problem. There are two columns: one is sentences and the other is labels (10 labels in total). I'm trying to convert this dataset to implement it in a ...
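A minimal preprocessing sketch under common assumptions (column contents are stand-ins): tokenize with return_tensors="tf" and wrap the result in a tf.data.Dataset.

import tensorflow as tf
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["first example", "second example"]   # stand-ins for the sentence column
labels = [0, 3]                                   # stand-ins for the label column
enc = tokenizer(sentences, padding=True, truncation=True, max_length=128,
                return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(32)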
2 votes · 2 answers · 6k views
"Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers." ValueError: Input is not valid
I am using a BERT tokenizer for French and I am getting this error, but I can't seem to solve it. Any suggestions?
Traceback (most recent call last):
File "training_cross_data_2....
2 votes · 0 answers · 800 views
UnparsedFlagAccessError: Trying to access flag --preserve_unused_tokens before flags were parsed
Hello, I am a beginner in ML. I tried to use BERT, but the tokenizer didn't work, as shown below.
train_input = bert_encode(train.text.values, tokenizer, max_len=160)
test_input = bert_encode(test.text.values, ...
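The flag belongs to the BERT tokenization module's absl flags, which here are read before absl ever parses them; one workaround (hedged, not guaranteed for every environment) is to mark the flags as parsed before calling the tokenizer:

from absl import flags

flags.FLAGS.mark_as_parsed()   # tell absl its flags are parsed before the tokenizer reads them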
0 votes · 1 answer · 2k views
Split a sentence by words just as BERT Tokenizer would do?
I'm trying to locate all the [UNK] tokens the BERT tokenizer produces in my text. Once I have the position of the UNK token, I need to identify what word it belongs to. For that, I tried to get the position ...
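A fast tokenizer's offset mapping gives the character span of each piece, which locates [UNK] directly in the original string; a sketch (sentence illustrative):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
text = "an example with a ☃ glyph"     # ☃ is outside the vocabulary
enc = tokenizer(text, return_offsets_mapping=True)
for tok_id, (start, end) in zip(enc["input_ids"], enc["offset_mapping"]):
    if tok_id == tokenizer.unk_token_id:
        print("[UNK] covers:", text[start:end])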
7 votes · 2 answers · 10k views
How to untokenize BERT tokens?
I have a sentence and I need to return the text corresponding to N BERT tokens to the left and right of a specific word.
from transformers import BertTokenizer
tz = BertTokenizer.from_pretrained("...
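For the reconstruction step, convert_tokens_to_string (or decode on ids) reassembles WordPiece output; a minimal sketch:

from transformers import BertTokenizer

tz = BertTokenizer.from_pretrained("bert-base-uncased")
tokens = tz.tokenize("tokenization is fun")      # e.g. ['token', '##ization', 'is', 'fun']
print(tz.convert_tokens_to_string(tokens))       # -> 'tokenization is fun'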