Newest 'nlp' Questions - Stack Overflow

0 votes

0 answers

15 views

Stuck reducing Docker image size

I'm working on a text summarization logic using Google's t5-small model. My implementation relies on the torch and transformers libraries. However, the problem is that when I build the Docker image, ...

Sarvesh

15

asked 1 hour ago

0 votes

0 answers

15 views

import gensim binary incompatibility

import gensim import numpy import scipy print("gensim version:", gensim.__version__) print("numpy version:", numpy.__version__) print("scipy version:", scipy.__version__) ...

Nguyễn Anh Minh

1

asked 3 hours ago

-2 votes

0 answers

35 views

When training a neural network, if I have label-encoded my features, do I need to scale or normalize them?

I am working on a project that predicts customer satisfaction scores. I have several categorical features. One feature has 3 unique values, while others have 59 and 1600 unique values. My question is can I ...

Arpit shourya

1

asked yesterday

3 votes

0 answers

30 views

Can older spaCy models be ported to future spaCy versions?

The latest spaCy versions have better performance and compatibility for GPU acceleration on Apple devices, but I have an existing project that depends on spaCy 3.1.4 and some of the specific behavior ...

synchronizer

2,105

asked Apr 20 at 19:10

-1 votes

1 answer

42 views

Unsupervised Topic Modeling for Short Event Descriptions

I have a dataset of approximately 750 lines containing quite short texts (fewer than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...

Arthur GONAY

9

asked Apr 16 at 11:17

1 vote

0 answers

21 views

Is there a way to reuse a heavy service across tasks in Airflow?

I'm building an Airflow DAG where some of the steps should do ML/NLP processing. I have a service class that loads an NLP model in its constructor. E.g.: class SentenceService: def __init__(self, model: ...

LordMsz

304

asked Apr 14 at 21:04

3 votes

2 answers

666 views

NameError: name 'init_empty_weights' is not defined while using Hugging Face models

I am trying to set up Hugging Face locally and I'm running into this issue: NameError: name 'init_empty_weights' is not defined. Here is the code I tested my installation with: from transformers ...

cosm1c v1bes

105

asked Apr 7 at 11:02

0 votes

0 answers

55 views

Sentencepiece not generating models after preprocessing (SOLVED)

So this is the log that I see on the terminal: sentencepiece_trainer.cc(78) LOG(INFO) Starts training with : trainer_spec { input: C:\Users\xxxx\OneDrive\Documents\Projects\py\xxxxx\data\...

Crazy Programmer

9

asked Apr 5 at 18:21

0 votes

0 answers

36 views

No attention output in jinaai/jina-embeddings-v3 embedding model

When I use this model like so - from transformers import AutoModel, AutoTokenizer model_id = "jinaai/jina-embeddings-v3" tokenizer = AutoTokenizer.from_pretrained(model_id, ...

Yash Mali

1

asked Apr 5 at 17:29

0 votes

1 answer

105 views

Why does Presidio with the spaCy NLP engine not recognize organizations and PESEL while spaCy does?

I'm using spaCy with the pl_core_news_lg model to extract named entities from Polish text. It correctly detects both organizations (ORG) and people's names (PER): import spacy nlp = spacy.load("...

Maltion

79

asked Apr 2 at 5:56

0 votes

1 answer

57 views

GPT-2 and other models from Hugging Face: -100 label index for training, instead of pad token [closed]

I understand the -100 label id is used so that the predictions for these are not included when calculating the loss. However, on Hugging Face, they state "complicated list comprehension here ...

jacqui_suis

81

asked Apr 1 at 9:21
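
The excerpt above says that positions labeled -100 are left out of the loss. A minimal, dependency-free sketch of that masking follows. It is not the Hugging Face or PyTorch implementation; the function name `masked_cross_entropy` and the toy logits are made up for illustration. The value -100 matches the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`.

```python
import math

IGNORE_INDEX = -100  # label id excluded from the loss, as described in the question


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    Hypothetical helper for illustration only. Returns 0.0 when every
    position is ignored, which is a choice made for this sketch.
    """
    total, counted = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: contributes nothing to the loss
        # log-sum-exp with max subtraction for numerical stability
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[label]
        counted += 1
    return total / counted if counted else 0.0


# Toy example: the third position is masked (e.g. a padding position).
logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2], [9.0, 9.0, 9.0]]
loss_masked = masked_cross_entropy(logits, [0, 1, IGNORE_INDEX])
loss_two_rows = masked_cross_entropy(logits[:2], [0, 1])
```

In this sketch the masked row changes neither the sum nor the divisor, so `loss_masked` equals the loss computed on the first two rows alone.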

0 votes

0 answers

28 views

Simpler Gmail filter syntax for "word family" [verif +(y/ied/ification)] + similar loanwords [term +(s/es/a)]?

Is there a simpler filter that I can use for the cases below? Google has a very smart AI, Gemini, so I hope there is a shortcut for this, as I am receiving bilingual emails and loanwords in Malay/Indonesian are ...

Quarky

13

asked Mar 31 at 12:10

0 votes

1 answer

74 views

Creating regular expression(s) that find capitalization errors

This is a Sentence which contains Some capitalization errors. So far I have this: (?<![.!?]\s)(?<!^)(?<!\sI\s)(?!I['’][a-z])(?!\b(?:Dr|Mr|Mrs)\.[\s\r\n])\b(?!I\b)[A-Z]\w* It will find "...

Stan Duncan

35

asked Mar 25 at 10:47
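
A quick way to see what the pattern from the excerpt above flags is to run it against the sample sentence. The sketch below assumes Python's `re` module, which the question does not specify; the pattern is copied unchanged from the excerpt, and the names `CAPS_ERROR`, `sample`, and `flagged` are illustrative only.

```python
import re

# Pattern copied verbatim from the question excerpt; the regex flavor
# (Python's re) is an assumption, the asker may be using another engine.
CAPS_ERROR = re.compile(
    r"(?<![.!?]\s)(?<!^)(?<!\sI\s)(?!I['’][a-z])"
    r"(?!\b(?:Dr|Mr|Mrs)\.[\s\r\n])\b(?!I\b)[A-Z]\w*"
)

sample = "This is a Sentence which contains Some capitalization errors."

# Words that start with a capital letter without being at the start of
# the text or right after sentence-ending punctuation.
flagged = CAPS_ERROR.findall(sample)
print(flagged)  # ['Sentence', 'Some']
```

The leading "This" is skipped by the `(?<!^)` lookbehind, while the mid-sentence capitals are reported.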

0 votes

1 answer

71 views

SFTTrainer error: prepare_model_for_kbit_training() got an unexpected keyword argument 'gradient_checkpointing_kwargs'

I'm trying to fine-tune a model using SFTTrainer from trl. This is what my SFTConfig arguments look like: from trl import SFTConfig training_arguments = SFTConfig( output_dir=output_dir, ...

sabira kabeer

11

asked Mar 22 at 16:58

0 votes

0 answers

25 views

All AllenNLP ccg_supertagger models are unavailable. How can I fix this or download them?

I am trying to use AllenNLP models to parse a file to create a CCG dataset because, as a student, I can't afford the CCGBank dataset. However, I need a dataset to help me train a model ...

刘睿萌

1

asked Mar 22 at 14:20
