All Questions

0 votes, 0 answers, 40 views

MiniBatchKMeans with BERTopic not returning topics for half of the data

I am trying to topic-model a dataset of tweets. I have around 50 million tweets. Unfortunately, a dataset this large will not fit in RAM (even 128 GB) because of the embeddings. Therefore, I have been working on ...
Matthieu B
0 votes, 1 answer, 30 views

QuickUMLS Always Returns "UNK" for Any Input Text

I am using QuickUMLS to extract UMLS Concept Unique Identifiers (CUIs) from text, but no matter what word I input, it always returns "UNK". Here is my code:
from quickumls import QuickUMLS ...
mubashir ali
0 votes, 0 answers, 40 views

How do I include a custom component in a spaCy training pipeline using the CLI?

I'm trying to implement a simple custom component in my spaCy training pipeline. I'm using the spaCy CLI for training, which means I'm directing the pipeline configuration through the config.cfg file, ...
Dumas.DED (reputation 626)
1 vote, 1 answer, 54 views

How to correctly identify entity types for tokens using spaCy in Python?

I'm using spaCy to extract and identify entity types (like ORG, GPE, DATE, etc.) from a text description. However, I am noticing some incorrect results, and I'm unsure how to fix this. Here is the ...
PrakashT (reputation 901)
0 votes, 0 answers, 49 views

How can I fuse embeddings in a way that increases efficiency and score?

I've been working on a problem where the goal is to supplement traditional embeddings with LLM-generated embeddings (I'm using the last_hidden_state for this purpose). So far, I've tried simply ...
lazytux (reputation 307)
1 vote, 1 answer, 107 views

How to extract specific entities from unstructured text

Given a generic sentence (in a specific context), how can I extract words/entities of interest belonging to a specific "category" using Python and any NLP library? For example, given a ...
Riccardo Raffini
0 votes, 0 answers, 75 views

Integrate a Python Model into Power BI

In Power BI I have to create an NLP-based chatbot of sorts, similar to the Q&A visual in Power BI, which can answer questions asked in natural language, but due to some of its ...
Santosh Pal
0 votes, 1 answer, 84 views

SBERT fine-tuning always stops before finishing all epochs

I'm using the pre-trained SBERT models (specifically MiniLM) for a text classification project with 995 classes. I am following the steps laid out here for the most part ...
SohmOuse
0 votes, 1 answer, 74 views

AutoModelForSequenceClassification loss does not decrease

from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from tqdm import tqdm
def ...
naivebird
-1 votes, 1 answer, 42 views

Vector search to get a uniqueness score based on context

I have a single blog post with a title and description, and I want to compare its uniqueness against multiple blog entries in a CSV file. The CSV contains several blogs, each with a title and meta ...
Muhammad Hamd
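One common way to frame this is uniqueness = 1 minus the highest cosine similarity between the new post and any existing entry. A minimal sketch, assuming each blog's title and meta description have already been read from the CSV and joined into one string; `vectorize` uses plain bag-of-words counts as a stand-in for real sentence embeddings, and `vectorize`, `cosine` and `uniqueness` are hypothetical helpers:

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts: a stand-in for real sentence embeddings (assumption)
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors;
    # missing terms in a Counter read as 0
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def uniqueness(post, corpus):
    """1 minus the highest cosine similarity between `post` and any corpus entry."""
    query = vectorize(post)
    best = max((cosine(query, vectorize(doc)) for doc in corpus), default=0.0)
    return 1.0 - best


existing = ["how to bake bread at home", "tips for learning python"]
print(round(uniqueness("how to bake bread", existing), 3))  # 0.184
```

Swapping `vectorize` for a sentence-embedding model would keep the same scoring logic. Taking the maximum rather than the mean means a post counts as non-unique if it closely matches even one existing entry.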
0 votes, 0 answers, 41 views

Is it possible to train an NER model on en_core_web_lg without static_vectors?

I am trying to train an NER model with custom tokenization. It works fine with the en_core_web_sm model, but to increase accuracy I am now trying en_core_web_lg. No matter what I ...
Nicholas Talbot
0 votes, 1 answer, 84 views

Layer expects 2 input(s), but it received 1 input tensors

I am trying to build a model to predict post likes. The model takes the text and the content type, which is a one-hot-encoded column. I have made a TensorFlow dataset, but when trying to fit the model I got this ...
Abdulaziz Snobrah
0 votes, 1 answer, 51 views

A language model for machine translation between a low-resource language and Portuguese using TensorFlow

I'm trying to train a language model for machine translation between a low-resource language and Portuguese using TensorFlow. Unfortunately, I'm getting the following error: PS C:\Users\myuser\...
kim85 (reputation 1)
7 votes, 2 answers, 11k views

Unable to use NLTK functions

I was trying to run some NLTK functions on the UCI spam message dataset but ran into a problem: word_tokenize does not work even after downloading the dependencies.
import nltk
nltk.download('punkt') ...
Utsav Jana
0 votes, 0 answers, 68 views

Why Is My Skip-Gram Implementation Producing Incorrect Results?

I'm implementing a Skip-Gram model for Word2Vec using Python. However, my model doesn't seem to be working correctly, as indicated by the resulting embeddings and their visualization. Here is an ...
Mohan (reputation 1)
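When skip-gram embeddings look wrong, the (center, context) training pairs are a good first thing to check: each token should be paired with every neighbor inside the window, but never with itself. A minimal sketch of that step, assuming pre-tokenized input and a symmetric window (`skipgram_pairs` is a hypothetical helper, not part of any library):

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sentence boundaries
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

Each pair becomes one training example for the embedding model, so a window of size w yields up to 2w pairs per token; pairs that include the center word itself or cross the window boundary will distort the learned vectors.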
