All Questions
1,052 questions
0 votes, 0 answers, 40 views
MiniBatchKMeans with BERTopic not returning topics for half of the data
I am trying to run topic modeling on a dataset of about 50 million tweets. Unfortunately, a dataset this large will not fit in RAM (even 128 GB) because of the embeddings. Therefore, I have been working on ...
0 votes, 1 answer, 30 views
QuickUMLS Always Returns "UNK" for Any Input Text
I am using QuickUMLS to extract UMLS Concept Unique Identifiers (CUIs) from text, but no matter what word I input, it always returns "UNK". Here is my code:
from quickumls import QuickUMLS
...
0 votes, 0 answers, 40 views
How do I include a custom component in a spaCy training pipeline using the CLI?
I'm trying to implement a simple custom component in my spaCy training pipeline. I'm using the spaCy CLI for training, which means I'm defining the pipeline configuration in the config.cfg file, ...
1 vote, 1 answer, 54 views
How to correctly identify entity types for tokens using spaCy in Python?
I'm using spaCy to extract and identify entity types (like ORG, GPE, DATE, etc.) from a text description. However, I am noticing some incorrect results, and I'm unsure how to fix this.
Here is the ...
0 votes, 0 answers, 49 views
How can I fuse embeddings in a way that increases efficiency and score?
I've been working on a problem where the goal is to supplement traditional embeddings with LLM-generated embeddings (I'm using the last_hidden_state for this purpose). So far, I've tried simply ...
1 vote, 1 answer, 107 views
How to extract specific entities from unstructured text
Given a generic sentence (in a specific context), how can I extract words/entities of interest belonging to a specific "category" using Python and any NLP library?
For example given a ...
0 votes, 0 answers, 75 views
Integrate a Python model into Power BI
In Power BI I have to create an NLP-based chatbot, similar to the Q&A visual in Power BI that answers questions asked in natural language, but due to some of its ...
0 votes, 1 answer, 84 views
SBERT fine-tuning always stops before finishing all epochs
I'm working on a text classification project with 995 classes using SBERT pre-trained models (specifically MiniLM). I am following the steps laid out here for the most part ...
0 votes, 1 answer, 74 views
AutoModelForSequenceClassification loss does not decrease
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from tqdm import tqdm
def ...
-1 votes, 1 answer, 42 views
Vector search to get a uniqueness score based on context
I have a single blog post with a title and description, and I want to compare its uniqueness against multiple blog entries in a CSV file. The CSV contains several blogs, each with a title and meta ...
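A minimal sketch of one possible scoring scheme, assuming uniqueness is defined as 1 minus the highest cosine similarity between the new post and any existing blog (title and description concatenated into one string). The word-count vectors are a hypothetical stand-in for real embeddings so the example runs with the standard library only; `bow_vector` and `uniqueness_score` are illustrative names, not part of any library, and reading the CSV is left out.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (missing words count as 0).
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def uniqueness_score(post: str, corpus: list[str]) -> float:
    # Assumed definition: 1 - similarity to the closest existing blog.
    # An empty corpus means nothing to compare against, so the post is fully unique.
    query = bow_vector(post)
    best = max((cosine(query, bow_vector(doc)) for doc in corpus), default=0.0)
    return 1.0 - best


blogs = [
    "How to bake sourdough bread at home",
    "A beginner's guide to training neural networks",
]
print(uniqueness_score("How to bake bread", blogs))
```

With a real embedding model, `bow_vector` would be replaced by the model's encoder while the max-similarity logic stays the same; a higher score means the post is less similar to its closest existing blog.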
0 votes, 0 answers, 41 views
Is it possible to train an NER model on en_core_web_lg without static_vectors?
I am trying to train an NER model with custom tokenization. It works fine with the en_core_web_sm model, but I am trying to increase accuracy, so I am now trying with en_core_web_lg. No matter what I ...
0 votes, 1 answer, 84 views
Layer expects 2 input(s), but it received 1 input tensors
I am trying to build a model to predict post likes. The model takes the text and the content type, which is a one-hot-encoded column.
I have made a TensorFlow dataset, but when trying to fit the model I got this ...
0 votes, 1 answer, 51 views
A language model for machine translation between a low-resource language and Portuguese using TensorFlow
I'm trying to train a language model for machine translation between a low-resource language and Portuguese using TensorFlow. Unfortunately, I'm getting the following error:
PS C:\Users\myuser\...
7 votes, 2 answers, 11k views
Unable to use nltk functions
I was trying to run some NLTK functions on the UCI spam message dataset but ran into a problem: word_tokenize does not work even after downloading the dependencies.
import nltk
nltk.download('punkt')
...
0 votes, 0 answers, 68 views
Why Is My Skip-Gram Implementation Producing Incorrect Results?
I'm implementing a Skip-Gram model for Word2Vec using Python. However, my model doesn't seem to be working correctly, as indicated by the resulting embeddings and their visualization. Here is an ...
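For reference when debugging a Skip-Gram implementation, the data-preparation step pairs each center word with every word within a fixed window on either side. The sketch below covers only that step (no embedding training or negative sampling); the function name `skipgram_pairs` and the default window size are illustrative assumptions, so it serves as a way to sanity-check training pairs rather than as a full Word2Vec implementation.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    # Each (center, context) pair is one positive training example for skip-gram.
    pairs = []
    for i, center in enumerate(tokens):
        # Context positions lie within `window` words on either side, clipped to the sentence.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # the center word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

Comparing these pairs against the ones an implementation actually feeds to the model is a quick way to rule out off-by-one errors in the window bounds before looking at the embedding or loss code.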