
All Questions

0 votes
0 answers
21 views

Why does it print the content automatically when I use the BERT tokenizer?

class BertEncoder: def __init__(self): self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') self.model = BertModel.from_pretrained('bert-base-uncased') self.device = torch.device("cuda:...
dylan xie
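Not a confirmed diagnosis, but a hedged sketch: from_pretrained() often routes warning/progress text through the transformers logger, which can look like the class is printing on its own. Silencing that logger is one thing to try; the device fallback below is an added assumption, since the excerpt is truncated.

```python
import torch
from transformers import BertModel, BertTokenizer, logging as hf_logging

hf_logging.set_verbosity_error()  # show only errors, no info/warning chatter

class BertEncoder:
    def __init__(self):
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        self.model = BertModel.from_pretrained("bert-base-uncased")
        # assumed completion of the truncated line from the excerpt
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
```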
0 votes
1 answer
203 views

Map BERT token indices to Spacy token indices

I'm trying to make BERT's (bert-base-uncased) tokenization token indices (not ids, token indices) map to spaCy's tokenization token indices. In the following example, my approach doesn't work because ...
lrthistlethwaite
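One way to do this alignment is through character offsets rather than token positions. A sketch under two assumptions: a fast tokenizer (BertTokenizerFast) is acceptable, since only fast tokenizers expose return_offsets_mapping, and the spaCy model en_core_web_sm is installed.

```python
import spacy
from transformers import BertTokenizerFast

nlp = spacy.load("en_core_web_sm")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

text = "Tokenizers don't always line up neatly."
doc = nlp(text)
enc = tokenizer(text, return_offsets_mapping=True, add_special_tokens=True)

# Map every character position to the index of the spaCy token covering it.
char_to_spacy = {}
for tok in doc:
    for i in range(tok.idx, tok.idx + len(tok.text)):
        char_to_spacy[i] = tok.i

# For each BERT token, look up the spaCy token via the token's start character.
bert_to_spacy = []
for start, end in enc["offset_mapping"]:
    if start == end:                      # special tokens ([CLS]/[SEP]) get (0, 0)
        bert_to_spacy.append(None)
    else:
        bert_to_spacy.append(char_to_spacy.get(start))

print(list(zip(tokenizer.convert_ids_to_tokens(enc["input_ids"]), bert_to_spacy)))
```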
1 vote
1 answer
35 views

bert-base-uncased does not use newly added suffix token

I want to add custom tokens to the BertTokenizer. However, the model does not use the new token. from transformers import BertTokenizer tokenizer = BertTokenizer.from_pretrained("bert-base-...
Lulacca · 13
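A sketch of the usual explanation, offered tentatively: tokens added with add_tokens are matched as standalone words in a separate pass before WordPiece runs, so a continuation piece like "##xyz" never fires; a whole-word custom token does, provided the model's embedding matrix is resized to match.

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

num_added = tokenizer.add_tokens(["mycustomword"])   # whole word, not a "##..." piece
model.resize_token_embeddings(len(tokenizer))        # keep the embedding matrix in sync

print(tokenizer.tokenize("this is mycustomword here"))
# expected: ['this', 'is', 'mycustomword', 'here']
```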
0 votes
1 answer
4k views

Loading local tokenizer

I'm trying to load a local tokenizer using: from transformers import RobertaTokenizerFast tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer') however, this gives me the ...
Jon · 91
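A minimal sketch of the round trip, assuming the folder was produced by save_pretrained() and therefore contains the files the tokenizer expects (vocab.json, merges.txt, tokenizer.json, ...). The path is hypothetical, and from_pretrained should point at the folder, not an individual file.

```python
from transformers import RobertaTokenizerFast

save_dir = r"C:\models\my-roberta-tokenizer"   # hypothetical local folder

# Save once (e.g. after customizing a tokenizer) ...
RobertaTokenizerFast.from_pretrained("roberta-base").save_pretrained(save_dir)

# ... then load it back from the same directory.
tokenizer = RobertaTokenizerFast.from_pretrained(save_dir)
print(tokenizer("hello world")["input_ids"])
```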
0 votes
0 answers
55 views

ValueError when using add_tokens: 'the truth value of an array with more than one element is ambiguous'

I'm trying to improve a basic pretrained BERT tokenizer model. I'm adding new tokens using add_tokens, but running into issues with the built-in method. Namely: ValueError ...
Manny · 35
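Only a guess without the full traceback: that ValueError is what NumPy raises when an array ends up in a boolean test, so one plausible cause is that the new tokens reached add_tokens as NumPy objects (e.g. rows sliced from a DataFrame) rather than plain Python strings. A sketch of the coercion:

```python
import numpy as np
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

new_tokens = np.array(["covid19", "mrna", "lockdown"])   # stand-in for the real source

# Hand add_tokens a flat list of plain Python str, not NumPy objects.
tokenizer.add_tokens([str(t) for t in new_tokens.ravel()])
print(len(tokenizer))
```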
2 votes
1 answer
195 views

bert_vocab.bert_vocab_from_dataset returning wrong vocabulary [closed]

I'm trying to build a tokenizer following the TensorFlow tutorial https://www.tensorflow.org/text/guide/subwords_tokenizer. I'm basically doing the same thing, only with a different dataset. The dataset in ...
Niccolò Tiezzi
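The call below mirrors the linked TensorFlow tutorial; it is a sketch, not a diagnosis. Two things worth checking when the learned vocabulary looks wrong: the dataset must yield raw text strings, and on a small corpus the learner will mostly return characters and reserved tokens, because vocab_size is an upper bound rather than a target.

```python
import tensorflow as tf
from tensorflow_text.tools.wordpiece_vocab import bert_vocab_from_dataset as bert_vocab

# Toy corpus standing in for the real dataset; it must yield plain text strings.
train_ds = tf.data.Dataset.from_tensor_slices(
    ["the quick brown fox", "jumps over the lazy dog", "tokenizers build vocabularies"]
)

reserved_tokens = ["[PAD]", "[UNK]", "[START]", "[END]"]

vocab = bert_vocab.bert_vocab_from_dataset(
    train_ds.batch(1000).prefetch(2),
    vocab_size=8000,                               # upper bound on vocabulary size
    reserved_tokens=reserved_tokens,
    bert_tokenizer_params=dict(lower_case=True),
    learn_params={},
)
print(vocab[:20])
```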
1 vote
0 answers
183 views

How to obtain the [CLS] sentence embedding of multiple sentences successively without facing a RAM crash?

I would like to obtain the [CLS] token's sentence embedding (as it represents the whole sentence's meaning) using BERT. I have many sentences (about 40) that belong to a Document, and 246 such ...
Aadithya Seshadri
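A sketch of batched [CLS] extraction, under the assumption that the memory blow-up comes from keeping autograd graphs and GPU tensors alive across sentences; the sentence list below is placeholder data.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

sentences = ["first sentence of a document", "second sentence"] * 100   # placeholder

cls_embeddings = []
batch_size = 32
with torch.no_grad():                                   # no autograd graphs kept around
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True,
                        max_length=128, return_tensors="pt").to(device)
        out = model(**enc)
        # [CLS] is the first position of last_hidden_state; detach it from the GPU.
        cls_embeddings.append(out.last_hidden_state[:, 0, :].cpu())

cls_embeddings = torch.cat(cls_embeddings)              # (num_sentences, hidden_size)
print(cls_embeddings.shape)
```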
1 vote
1 answer
635 views

NER classification DeBERTa tokenizer error: You need to instantiate DebertaTokenizerFast

I'm trying to perform an NER classification task using DeBERTa, but I'm stuck with a tokenizer error. This is my code (my input sentence must be split word by word by ","): from transformers ...
Chiara · 510
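A sketch of the usual way around that exact message: when the input is already split into words, the fast DeBERTa tokenizer has to be created with add_prefix_space=True and called with is_split_into_words=True. The model name and example words are placeholders.

```python
from transformers import DebertaTokenizerFast

tokenizer = DebertaTokenizerFast.from_pretrained(
    "microsoft/deberta-base", add_prefix_space=True
)

words = ["John", "lives", "in", "New", "York"]          # sentence pre-split into words
enc = tokenizer(words, is_split_into_words=True, truncation=True)

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
print(enc.word_ids())   # maps each sub-token back to a word index, handy for NER labels
```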
0 votes
1 answer
305 views

How to get access to the tokenizer after loading a saved custom BERT model using Keras and TF2?

I am working on an intent classification problem and need your help. I fine-tuned one of the BERT models for text classification, then trained and evaluated it on a small dataset for detecting five intents. I ...
Rohit · 7,189
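A sketch of one common pattern, assuming the Keras model was saved with model.save(): the tokenizer is not part of that file, so it gets persisted and reloaded separately via save_pretrained / from_pretrained. Paths and the commented model lines are placeholders.

```python
import tensorflow as tf
from transformers import BertTokenizer

# --- training time ---
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# model = ...  build / fine-tune the Keras intent classifier ...
# model.save("intent_model")                          # saves the network only
tokenizer.save_pretrained("intent_model_tokenizer")   # saves vocab + tokenizer config

# --- inference time ---
# model = tf.keras.models.load_model("intent_model")
tokenizer = BertTokenizer.from_pretrained("intent_model_tokenizer")
enc = tokenizer(["book a flight to paris"], padding=True, return_tensors="tf")
# preds = model.predict(dict(enc))
```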
0 votes
1 answer
2k views

How to replace BERT tokenizer special tokens

I am using an AutoTokenizer --> tokenizer1 = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True) which is more complete than the tokenizer of bert-base-uncased. The ...
javafest
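One tentative angle on this: BERTweet is RoBERTa-style, so its special tokens (<s>, </s>, <pad>) simply differ from BERT's [CLS]/[SEP]/[PAD]. Reading them from the tokenizer's own attributes, instead of hard-coding BERT's strings, usually removes the need to "replace" anything.

```python
from transformers import AutoTokenizer

tokenizer1 = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)

# Inspect what this tokenizer actually uses instead of assuming [CLS]/[SEP]/[PAD].
print(tokenizer1.cls_token, tokenizer1.sep_token, tokenizer1.pad_token)

ids = tokenizer1.encode("hello world")        # special tokens are added automatically
print(tokenizer1.convert_ids_to_tokens(ids))  # starts with '<s>' and ends with '</s>'
```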
1 vote
1 answer
2k views

How to preprocess a dataset for BERT model implemented in Tensorflow 2.x?

Overview: I have a dataset made for a classification problem. There are two columns: one is sentences and the other is labels (total: 10 labels). I'm trying to convert this dataset to implement it in a ...
Y4RD13 · 984
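A minimal sketch of one route, with toy data standing in for the real 10-label table: tokenize the sentence column in one go and wrap the result in a tf.data.Dataset that a TFBertForSequenceClassification head can train on. The commented lines show how it would plug into a model.

```python
import tensorflow as tf
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

sentences = ["i want to book a flight", "play some jazz music"]   # placeholder column
labels = [3, 7]                                                   # integer labels in [0, 10)

enc = tokenizer(sentences, padding=True, truncation=True, max_length=64,
                return_tensors="tf")

dataset = (
    tf.data.Dataset.from_tensor_slices((dict(enc), labels))
    .shuffle(len(sentences))
    .batch(16)
)

# from transformers import TFBertForSequenceClassification
# model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=10)
# model.compile(optimizer="adam",
#               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(dataset, epochs=3)
```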
2 votes
2 answers
6k views

"Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers." ValueError: Input is not valid

I am using a BERT tokenizer for French and I am getting this error, but I can't seem to solve it. Any suggestions? Traceback (most recent call last): File "training_cross_data_2....
emma · 363
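Without the full traceback this is only a guess: that ValueError is raised when the tokenizer receives something that is not a string (a NaN read from a CSV is the classic case). A sketch of the usual defensive coercion; camembert-base stands in for whichever French model is actually used, and the DataFrame is fabricated.

```python
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")   # placeholder French model

df = pd.DataFrame({"text": ["Bonjour le monde", None, 42]})   # messy placeholder data

texts = df["text"].fillna("").astype(str).tolist()            # every item becomes a str
enc = tokenizer(texts, padding=True, truncation=True)
print(enc["input_ids"])
```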
2 votes
0 answers
800 views

UnparsedFlagAccessError: Trying to access flag --preserve_unused_tokens before flags were parsed

Hello, I am a beginner in ML. I tried to use BERT, but the tokenizer didn't work, as shown below. train_input = bert_encode(train.text.values, tokenizer, max_len=160) test_input = bert_encode(test.text.values, ...
Tony · 21
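This error comes from the research BERT code reading an absl flag before any flags were parsed, which is typical in notebooks. A commonly cited workaround, shown here as a sketch, is to parse the flags manually (with their defaults) before calling the tokenizer helpers.

```python
from absl import flags

# Any dummy program name will do; after this call absl considers all flags parsed
# and --preserve_unused_tokens takes its default value.
flags.FLAGS(["bert_notebook"])

# train_input = bert_encode(train.text.values, tokenizer, max_len=160)  # as in the question
```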
0 votes
1 answer
2k views

Split a sentence by words just as BERT Tokenizer would do?

I'm trying to locate all the [UNK] tokens produced by the BERT tokenizer in my text. Once I have the position of an [UNK] token, I need to identify which word it belongs to. For that, I tried to get the position ...
Andrea NR · 1,747
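A sketch of one way to do this with a fast tokenizer's offset mapping, which records the character span each wordpiece came from, so an [UNK] can be mapped straight back to the original word; the example sentence is made up.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

text = "the new emoji 🥘 looks great"              # the emoji has no wordpiece -> [UNK]
enc = tokenizer(text, return_offsets_mapping=True, add_special_tokens=False)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"])

for tok, (start, end) in zip(tokens, enc["offset_mapping"]):
    if tok == tokenizer.unk_token:                 # "[UNK]"
        print(f"[UNK] covers characters {start}:{end} -> {text[start:end]!r}")
```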
7 votes
2 answers
10k views

How to untokenize BERT tokens?

I have a sentence and I need to return the text corresponding to N BERT tokens to the left and right of a specific word. from transformers import BertTokenizer tz = BertTokenizer.from_pretrained("...
JayJay · 203
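A sketch of the two helpers that usually cover "untokenizing": convert_tokens_to_string for a slice of wordpieces, and decode for ids; both merge "##" continuation pieces back into words. The slicing here is illustrative, not the asker's exact windowing logic.

```python
from transformers import BertTokenizer

tz = BertTokenizer.from_pretrained("bert-base-uncased")

text = "The quick brown fox jumps over the lazy dog"
tokens = tz.tokenize(text)
ids = tz.convert_tokens_to_ids(tokens)

# Rebuild text from a window of tokens (e.g. N tokens around a target word) ...
window = tokens[2:6]
print(tz.convert_tokens_to_string(window))

# ... or decode from ids, which likewise merges wordpieces back together.
print(tz.decode(ids))
```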
