All Questions
Tagged with topic-modeling python
347 questions
0
votes
0
answers
40
views
MiniBatchKMeans BERTopic not returning topics for half of data
I am trying to topic-model a dataset of around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128 GB) because of the embeddings. Therefore, I have been working on ...
0
votes
1
answer
84
views
Correct topics from LDA Sequence Model in Gensim
Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...
1
vote
1
answer
64
views
Inspect all probabilities of BERTopic model
Say I build a BERTopic model using
from bertopic import BERTopic
topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20)
topics, probs = topic_model.fit_transform(docs)
Inspecting probs gives me ...
0
votes
0
answers
49
views
Topic modelling outputs are gender biased?
Has anyone had this issue?
My topic modelling output seems to be heavily dominated by responses from male respondents.
The volume of responses across three different questions is over 800 in each ...
0
votes
1
answer
51
views
Stopwords problem in text data preprocessing in Python
I want to do topic modeling in Python. To remove stopwords, I used my own stop word list, a stop word list I found on GitHub, and nltk's stop word list. However, when I examined the ...
4
votes
1
answer
343
views
Topic modelling many documents with low memory overhead
I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...
2
votes
3
answers
86
views
Find matching rows in dataframes based on number of matching items
I have two topic models, topics1 and topics2. They were created from very similar but different datasets. As a result, the words representing each topic/cluster as well as the topic numbers will be ...
0
votes
0
answers
117
views
Applying Representation Model on Non-Leaf Nodes in Hierarchical Topics with BERTopic
I am currently using BERTopic for topic modeling on a set of documents and have integrated an OpenAI model as the representation layer, as outlined in the BERTopic documentation. I am also interested ...
2
votes
0
answers
44
views
Top2Vec model gets stuck on Colab
I'm trying to implement Top2Vec on Colab. The following code is working fine with the dataset "https://raw.githubusercontent.com/wjbmattingly/bap_sent_embedding/main/data/vol7.json" ...
0
votes
1
answer
159
views
Topic modeling from quotes
Based on the following link: quotes
with the help of the following code (this site was based on JavaScript, so I disabled it first)
import selenium
from selenium import webdriver
from selenium....
2
votes
1
answer
943
views
BERTopic: "Make sure that the iterable only contains strings"
I'm still fairly new to Python so this might be easier than it appears to me, but I'm stuck. I'm trying to use BERTopic and visualize the results with pyLDAvis. I want to compare the results with the ...
1
vote
1
answer
1k
views
Summarization and Topic Extraction with LLMs (private) and LangChain or LlamaIndex using flan-t5-small
Has anyone used LangChain or LlamaIndex imports to deal with single documents that amount to >512 tokens? Yes, I know there are other approaches to dealing with it, but it is difficult to find ...
2
votes
1
answer
119
views
BERTopic: add legend to term score decline
I plot the term score decline for a topic model I created on Google Colab with BERTopic. Great function. Works neat! But I need to add a legend. This parameter is not specified in the topic_model....
-1
votes
1
answer
1k
views
BERTopic classifying over a quarter of documents in outlier topic -1
I am running BERTopic with default options
import pandas as pd
from sentence_transformers import SentenceTransformer
import time
import pickle
from bertopic import BERTopic
llm_mod = "all-...
1
vote
0
answers
199
views
BERTopic visualization in a dark theme
I want to change the default visualizations within BERTopic to display a dark theme rather than a white or bright theme.
Basically I'm trying to do:
import plotly.io as pio
pio.templates.default ...