All Questions

-2 votes
0 answers
36 views

When training a neural network, do I need to scale or normalize a feature that I have label-encoded?

I am working on a project that predicts a customer satisfaction score. I have several categorical features. One feature has 3 unique values, while others have 59 and 1600 unique values. My question is can I ...
Arpit shourya
-1 votes
1 answer
43 views

Unsupervised Topic Modeling for Short Event Descriptions

I have a dataset of approximately 750 lines containing quite short texts (fewer than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...
Arthur GONAY
0 votes
0 answers
36 views

No attention output in jinaai/jina-embeddings-v3 embedding model

When I use this model like so - from transformers import AutoModel, AutoTokenizer model_id = "jinaai/jina-embeddings-v3" tokenizer = AutoTokenizer.from_pretrained(model_id, ...
Yash Mali
2 votes
1 answer
81 views

How to Identify Similar Code Parts Using CodeBERT Embeddings?

I'm using CodeBERT to compare how similar two pieces of code are. For example: # Code 1 def calculate_area(radius): return 3.14 * radius * radius # Code 2 def compute_circle_area(r): return 3.14159 * ...
Nep • 21
0 votes
0 answers
40 views

MiniBatchKMeans BERTopic not returning topics for half of data

I am trying to topic-model a dataset of tweets. I have around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128 GB) due to the embeddings. Therefore, I have been working on ...
Matthieu B
0 votes
1 answer
30 views

QuickUMLS Always Returns "UNK" for Any Input Text

I am using QuickUMLS to extract UMLS Concept Unique Identifiers (CUIs) from text, but no matter what word I input, it always returns "UNK". Here is my code: from quickumls import QuickUMLS ...
mubashir ali
0 votes
0 answers
40 views

How do I include a custom component in a spaCy training pipeline using the CLI?

I'm trying to implement a simple custom component in my spaCy training pipeline. I'm using the spaCy CLI for training, which means I'm directing the pipeline configuration through the config.cfg file, ...
Dumas.DED • 626
1 vote
1 answer
54 views

How to correctly identify entity types for tokens using spaCy in Python?

I'm using spaCy to extract and identify entity types (like ORG, GPE, DATE, etc.) from a text description. However, I am noticing some incorrect results, and I'm unsure how to fix this. Here is the ...
PrakashT • 901
2 votes
1 answer
93 views

Error in getting Captum text explanations for text classification

I have the following code that I am using to identify the most influential words used to correctly predict the text in the test dataset import pandas as pd import torch from torch.utils.data import ...
Nayantara Jeyaraj
0 votes
0 answers
49 views

How can I fuse embeddings in a way that increases efficiency and score?

I've been working on a problem where the goal is to supplement traditional embeddings with LLM-generated embeddings (I'm using the last_hidden_state for this purpose). So far, I've tried simply ...
lazytux • 307
1 vote
1 answer
107 views

How to extract specific entities from unstructured text

Given a generic text sentence (in a specific context), how can I extract words/entities of interest belonging to a specific "category" using Python and any NLP library? For example given a ...
Riccardo Raffini
0 votes
0 answers
64 views

Memory increasing after Hugging Face generate method

I wanted to run inference with the CodeGemma model from Hugging Face, but when I use the model.generate(**inputs) method, GPU memory usage increases from 39 GB to 49 GB in peak usage with torch profiler no ...
0 votes
0 answers
26 views

BERTopic partial_fit placeholder cluster representation

I have approximately 2M text documents, each small, and want to cluster them. (This will grow to about 500M eventually.) While I am open to suggestions, I am currently using on-line techniques with ...
mrbarret
0 votes
1 answer
336 views

Is it possible to get embeddings from NV-Embed using Candle?

What I want to build is a CLI program that outputs embeddings of an arbitrary input. To do that, I want to run inference with an embeddings model, and I chose NV-Embed-v2. My framework of choice is ...
Zomagk • 325
0 votes
0 answers
75 views

Integrate a Python model into Power BI

In Power BI I have to create an NLP-based chatbot, similar to the Q&A visual in Power BI, which can answer questions asked in natural language, but due to some of its ...
Santosh Pal
