Newest 'machine-learning+nlp' Questions

-2 votes

0 answers

36 views

For training a neural network, if I have label-encoded my feature, is there a need to scale or normalize it?

I am working on a project that predicts customer satisfaction scores. I have several categorical features. One feature has 3 unique values, while others have 59 and 1600 unique values. My question is can I ...

Arpit shourya

1

asked yesterday

-1 votes

1 answer

43 views

Unsupervised Topic Modeling for Short Event Descriptions

I have a dataset of approximately 750 lines containing quite short texts (less than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...

Arthur GONAY

9

asked Apr 16 at 11:17

0 votes

0 answers

36 views

No attention output in jinaai/jina-embeddings-v3 embedding model

When I use this model like so - from transformers import AutoModel, AutoTokenizer model_id = "jinaai/jina-embeddings-v3" tokenizer = AutoTokenizer.from_pretrained(model_id, ...

Yash Mali

1

asked Apr 5 at 17:29

2 votes

1 answer

81 views

How to Identify Similar Code Parts Using CodeBERT Embeddings?

I'm using CodeBERT to compare how similar two pieces of code are. For example: # Code 1 def calculate_area(radius): return 3.14 * radius * radius # Code 2 def compute_circle_area(r): return 3.14159 * ...

Nep

21

asked Mar 20 at 14:30

0 votes

0 answers

40 views

MiniBatchKMeans BERTopic not returning topics for half of data

I am trying to run topic modeling on a dataset of tweets. I have around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128GB) due to the embeddings. Therefore, I have been working on ...

Matthieu B

17

asked Feb 18 at 17:42

0 votes

1 answer

30 views

QuickUMLS Always Returns "UNK" for Any Input Text

I am using QuickUMLS to extract UMLS Concept Unique Identifiers (CUIs) from text, but no matter what word I input, it always returns "UNK". Here is my code: from quickumls import QuickUMLS ...

mubashir ali

1

asked Feb 2 at 14:12

0 votes

0 answers

40 views

How do I include a custom component in a spaCy training pipeline using the CLI?

I'm trying to implement a simple custom component in my spaCy training pipeline. I'm using the spaCy CLI for training, which means I'm directing the pipeline configuration through the config.cfg file, ...

Dumas.DED

626

asked Jan 11 at 20:14

1 vote

1 answer

54 views

How to correctly identify entity types for tokens with spaCy in Python?

I'm using spaCy to extract and identify entity types (like ORG, GPE, DATE, etc.) from a text description. However, I am noticing some incorrect results, and I'm unsure how to fix this. Here is the ...

PrakashT

901

asked Dec 17, 2024 at 12:09

2 votes

1 answer

93 views

Error in getting Captum text explanations for text classification

I have the following code that I am using to identify the most influential words used to correctly predict the text in the test dataset: import pandas as pd import torch from torch.utils.data import ...

Nayantara Jeyaraj

2,706

asked Dec 3, 2024 at 12:47

0 votes

0 answers

49 views

How can I fuse embeddings in a way that increases efficiency and score?

I've been working on a problem where the goal is to supplement traditional embeddings with LLM-generated embeddings (I'm using the last_hidden_state for this purpose). So far, I've tried simply ...

lazytux

307

asked Nov 28, 2024 at 13:01

1 vote

1 answer

107 views

How to extract specific entities from unstructured text

Given a generic text sentence (in a specific context) how can I extract word/entities of interest belonging to a specific "category" using python and any NLP library? For example given a ...

Riccardo Raffini

396

asked Nov 26, 2024 at 15:46

0 votes

0 answers

64 views

Memory increasing after Hugging Face generate method

I wanted to run inference with the CodeGemma model from Hugging Face, but when I use the model.generate(**inputs) method, GPU memory cost increases from 39 GB to 49 GB in peak usage with torch profiler no ...

user17751265

asked Nov 23, 2024 at 19:21

0 votes

0 answers

26 views

BERTopic partial_fit placeholder cluster representation

I have approximately 2M text documents, each small, and want to cluster them. (This will grow to about 500M eventually.) While I am open to suggestions, I am currently using on-line techniques with ...

mrbarret

47

asked Nov 16, 2024 at 12:50

0 votes

1 answer

336 views

Is it possible to get embeddings from NV-Embed using Candle?

What I want to build is a CLI program that outputs embeddings of an arbitrary input. To do that, I want to run inference with an embeddings model, and I chose NV-Embed-v2. My framework of choice is ...

Zomagk

325

asked Oct 31, 2024 at 15:55

0 votes

0 answers

75 views

Integrate Python Model into Power BI

In Power BI, I have to create an NLP-based chatbot of sorts, similar to the Q&A visual in Power BI, which can answer questions asked in natural language, but due to some of its ...

Santosh Pal

1

asked Oct 24, 2024 at 16:24
