All Questions
Tagged with machine-learning nlp
2,624 questions
-2
votes
0
answers
36
views
When training a neural network, do I need to scale or normalize a label-encoded feature?
I am working on a project that predicts a customer satisfaction score. I have several categorical features. One feature has 3 unique values, while others have 59 and 1,600 unique values. My question is: can I ...
-1
votes
1
answer
43
views
Unsupervised Topic Modeling for Short Event Descriptions
I have a dataset of approximately 750 lines containing quite short texts (fewer than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...
0
votes
0
answers
36
views
No attention output in jinaai/jina-embeddings-v3 embedding model
When I use this model like so:
from transformers import AutoModel, AutoTokenizer
model_id = "jinaai/jina-embeddings-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id, ...
2
votes
1
answer
81
views
How to Identify Similar Code Parts Using CodeBERT Embeddings?
I'm using CodeBERT to compare how similar two pieces of code are. For example:
# Code 1
def calculate_area(radius):
    return 3.14 * radius * radius
# Code 2
def compute_circle_area(r):
    return 3.14159 * ...
0
votes
0
answers
40
views
MiniBatchKMeans BERTopic not returning topics for half of data
I am trying to topic-model a dataset of tweets. I have around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128 GB) due to the embeddings. Therefore, I have been working on ...
0
votes
1
answer
30
views
QuickUMLS Always Returns "UNK" for Any Input Text
I am using QuickUMLS to extract UMLS Concept Unique Identifiers (CUIs) from text, but no matter what word I input, it always returns "UNK". Here is my code:
from quickumls import QuickUMLS
...
0
votes
0
answers
40
views
How do I include a custom component in a spaCy training pipeline using the CLI?
I'm trying to implement a simple custom component in my spaCy training pipeline. I'm using the spaCy CLI for training, which means I'm directing the pipeline configuration through the config.cfg file, ...
1
vote
1
answer
54
views
How to correctly identify entity types for tokens using spaCy in Python?
I'm using spaCy to extract and identify entity types (like ORG, GPE, DATE, etc.) from a text description. However, I am noticing some incorrect results, and I'm unsure how to fix this.
Here is the ...
2
votes
1
answer
93
views
Error in getting Captum text explanations for text classification
I have the following code that I am using to identify the most influential words used to correctly predict the text in the test dataset:
import pandas as pd
import torch
from torch.utils.data import ...
0
votes
0
answers
49
views
How can I fuse embeddings in a way that improves efficiency and score?
I've been working on a problem where the goal is to supplement traditional embeddings with LLM-generated embeddings (I'm using the last_hidden_state for this purpose). So far, I've tried simply ...
1
vote
1
answer
107
views
How to extract specific entities from unstructured text
Given a generic text sentence (in a specific context), how can I extract words/entities of interest belonging to a specific "category" using Python and any NLP library?
For example given a ...
0
votes
0
answers
64
views
Memory increases after calling the Hugging Face generate method
I wanted to run inference with the CodeGemma model from Hugging Face, but when I use the model.generate(**inputs) method, GPU memory cost increases from 39 GB to 49 GB in peak usage with torch profiler no ...
0
votes
0
answers
26
views
BERTopic partial_fit placeholder cluster representation
I have approximately 2M text documents, each small, and want to cluster them. (This will grow to about 500M eventually.) While I am open to suggestions, I am currently using on-line techniques with ...
0
votes
1
answer
336
views
Is it possible to get embeddings from NV-Embed using Candle?
I want to build a CLI program that outputs embeddings of arbitrary input.
To do that, I want to run inference with an embedding model, and I chose NV-Embed-v2. My framework of choice is ...
0
votes
0
answers
75
views
Integrate a Python model into Power BI
In Power BI, I have to create an NLP-based chatbot of sorts, similar to the Q&A visual in Power BI, which can answer questions asked in natural language, but due to some of its ...