
All Questions

0 votes
0 answers
40 views

MiniBatchKMeans BERTopic not returning topics for half of data

I am trying to topic-model a dataset of around 50 million tweets. Unfortunately, a dataset this large will not fit in RAM (even 128 GB) because of the embeddings. Therefore, I have been working on ...
Matthieu B
0 votes
1 answer
84 views

Correct topics from LDA Sequence Model in Gensim

Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...
hyco
  • 221
1 vote
1 answer
64 views

Inspect all probabilities of BERTopic model

Say I build a BERTopic model using

    from bertopic import BERTopic
    topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20)
    topics, probs = topic_model.fit_transform(docs)

Inspecting probs gives me ...
coolhand
  • 2,109
0 votes
0 answers
49 views

Topic modelling outputs are gender biased?

Has anyone had this issue? My topic modelling output seems to be heavily dominated by responses from male respondents. The volume of responses across three different questions is over 800 in each ...
GrBrn
  • 3
0 votes
1 answer
51 views

Stopwords problem in text data preprocessing in Python

I want to do topic modeling in Python. To remove stop words, I used my own stop word list, a stop word list I found on GitHub, and NLTK's stop word list. However, when I examined the ...
deniz
  • 11
4 votes
1 answer
343 views

Topic modelling many documents with low memory overhead

I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...
Bbrk24
  • 973
2 votes
3 answers
86 views

Find matching rows in dataframes based on number of matching items

I have two topic models, topics1 and topics2. They were created from very similar but different datasets. As a result, the words representing each topic/cluster as well as the topic numbers will be ...
Adam_G
  • 7,909
0 votes
0 answers
117 views

Applying Representation Model on Non-Leaf Nodes in Hierarchical Topics with BERTopic

I am currently using BERTopic for topic modeling on a set of documents and have integrated an OpenAI model as the representation layer, as outlined in the BERTopic documentation. I am also interested ...
Alioio
  • 1
2 votes
0 answers
44 views

Top2Vec model gets stuck on Colab

I'm trying to implement Top2Vec on Colab. The following code is working fine with the dataset "https://raw.githubusercontent.com/wjbmattingly/bap_sent_embedding/main/data/vol7.json" ...
PS Nayak
  • 423
0 votes
1 answer
159 views

topic modeling from quotes

Based on the following link: quotes, with the help of the following code (this site is based on JavaScript, so I disabled it first): import selenium from selenium import webdriver from selenium....
2 votes
1 answer
943 views

BERTopic: "Make sure that the iterable only contains strings"

I'm still fairly new to Python so this might be easier than it appears to me, but I'm stuck. I'm trying to use BERTopic and visualize the results with PyLDAVis. I want to compare the results with the ...
Dominik
  • 23
1 vote
1 answer
1k views

Summarization and Topic Extraction with LLMs (private) and LangChain or LlamaIndex using flan-t5-small

Has anyone used LangChain or LlamaIndex imports to deal with single documents that amount to >512 tokens? Yes, I know there are other approaches to dealing with it, but it is difficult to find ...
Ja4H3ad
  • 61
2 votes
1 answer
119 views

BERTopic: add legend to term score decline

I plot the term score decline for a topic model I created on Google Colab with BERTopic. Great function. Works neat! But I need to add a legend. This parameter is not specified in the topic_model....
Simone
  • 625
-1 votes
1 answer
1k views

BERTopic classifying over a quarter of documents in outlier topic -1

I am running BERTopic with default options: import pandas as pd from sentence_transformers import SentenceTransformer import time import pickle from bertopic import BERTopic llm_mod = "all-...
RM-
  • 1,018
1 vote
0 answers
199 views

BERTopic Visualization in dark

I want to change the default visualizations within BERTopic to use a dark theme rather than the default light theme. Basically I'm trying to do: import plotly.io as pio pio.templates.default ...
RobjSky
  • 31
