978 questions
-1
votes
1
answer
43
views
Unsupervised Topic Modeling for Short Event Descriptions
I have a dataset of approximately 750 lines, each containing a short text (fewer than 150 words). These are all event descriptions related to a single broad topic (which I cannot specify for ...
0
votes
0
answers
40
views
MiniBatchKMeans BERTopic not returning topics for half of data
I am trying to run topic modeling on a dataset of tweets. I have around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128 GB) because of the embeddings. Therefore, I have been working on ...
0
votes
0
answers
30
views
Calculating Topic Correlations or Co-occurrences for keyATM
I have been working extensively with the keyATM package, but I have found no approach for calculating topic correlations and co-occurrences once the model has been fitted. I already ...
0
votes
1
answer
83
views
Correct topics from LDA Sequence Model in Gensim
Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...
1
vote
1
answer
64
views
Inspect all probabilities of BERTopic model
Say I build a BERTopic model using
from bertopic import BERTopic
topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20)
topics, probs = topic_model.fit_transform(docs)
Inspecting probs gives me ...
0
votes
0
answers
30
views
importing util library failed
I am trying to install BERTopic with pip and use a BERTopic model. Here is my code:
from bertopic import BERTopic
topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia"...
0
votes
0
answers
61
views
Unhashable type when calling HuggingFace topic model `topic_labels_` function
When I follow the topic modeling tutorial at https://huggingface.co/docs/hub/en/bertopic,
the first few lines give me an error:
from bertopic import BERTopic
topic_model = BERTopic.load("...
0
votes
0
answers
24
views
PackagesNotFound error even when verified packages as installed
I am trying to follow this tutorial for BERT topic modeling:
https://jpcompartir.github.io/BertopicR/
library(reticulate)
reticulate::install_miniconda()
library(BertopicR)
BertopicR::...
0
votes
0
answers
49
views
Topic modelling outputs are gender biased?
Has anyone had this issue?
My topic modelling output seems to be heavily dominated by responses from male respondents.
The volume of responses across three different questions is over 800 in each ...
0
votes
1
answer
51
views
Stopwords problem in text data preprocessing in Python
I want to do topic modeling in Python. To remove stop words, I combined my own stop word list, a stop word list I found on GitHub, and NLTK's stop word list. However, when I examined the ...
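The excerpt is cut off before it says what went wrong, so the following is only a minimal sketch of one way to merge several stop word lists and apply them consistently. The function names and the sample lists are hypothetical, and the case and whitespace normalization is an assumption about a common cause of stop words surviving the filter, not a diagnosis of the asker's problem.

```python
def build_stopword_set(*word_lists):
    """Merge any number of stop word lists into one normalized set."""
    # Strip whitespace and case-fold so "The", "the " and "THE" all match.
    return {
        word.strip().casefold()
        for word_list in word_lists
        for word in word_list
        if word.strip()
    }


def remove_stopwords(tokens, stopwords):
    """Keep only tokens whose case-folded form is not a stop word."""
    return [token for token in tokens if token.casefold() not in stopwords]


# Hypothetical stand-ins for a custom list, a downloaded list, and NLTK's list.
custom_list = ["The", "and "]
downloaded_list = ["of"]
nltk_like_list = ["a", "the"]

stopwords = build_stopword_set(custom_list, downloaded_list, nltk_like_list)
cleaned = remove_stopwords(["The", "Topic", "of", "models"], stopwords)
print(cleaned)
```

Normalizing both the lists and the tokens in the same way matters more than which lists are combined: a token that was lemmatized or lowercased differently from the list entries will pass through the filter.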
0
votes
0
answers
37
views
Cannot find AIC/BIC of my topic modelling after using "lda.collapsed.gibbs.sampler" in LDA package
I have used "lda.collapsed.gibbs.sampler" for my topic modelling and LDA visualisation, and now I want to determine which number of topics (K) best fits my data. Then I tried to use AIC/...
4
votes
1
answer
341
views
Topic modelling many documents with low memory overhead
I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...
0
votes
1
answer
38
views
How to extract terms and probabilities from tmResult$terms in topic modeling?
I would like to create a separate word cloud for each of my 8 topics in an LDA model. I extracted the top 40 words for each of the 8 topics, which gives an object of length 320 containing the top words and their occurrence probabilities.
I ...
0
votes
1
answer
88
views
How is coherence score calculated in Mallet?
I understand how the diagnostics output shows the coherence values for each topic, but my values range between -150 and -600, and other posts that I have seen where Mallet was used show coherence ...
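As a rough illustration of why such scores can be large and negative, here is a minimal sketch of a pairwise document co-occurrence coherence in the style of Mimno et al. (2011). It is not Mallet's actual implementation: the function name, the smoothing value `beta=0.01`, and the choice of the higher-ranked word as the denominator are all assumptions made for the example.

```python
import math


def pairwise_coherence(top_words, documents, beta=0.01):
    """Sum log((D(higher, lower) + beta) / D(higher)) over ranked word pairs.

    top_words: a topic's top words, highest-ranked first.
    documents: list of token lists.
    beta: smoothing constant (assumed value) that avoids log(0).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            higher, lower = top_words[j], top_words[i]
            denom = doc_freq(higher)
            if denom == 0:
                continue  # word never occurs; skip instead of dividing by zero
            score += math.log((doc_freq(higher, lower) + beta) / denom)
    return score


# Tiny hypothetical corpus: three documents, one topic with three top words.
docs = [["a", "b"], ["a"], ["a", "b", "c"]]
score = pairwise_coherence(["a", "b", "c"], docs)
print(score)
```

Under this formulation a topic with N top words contributes N(N-1)/2 log terms, most of them negative, so the magnitude of the total grows with the number of top words. Scores computed with different numbers of top words, or on different corpora, are therefore not directly comparable.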
0
votes
0
answers
53
views
Understanding and improving coherence values using Mallet
I am attempting to run an LDA topic model using Mallet. My corpus consists of user comments from news websites. It's a relatively small corpus with approx. 614k words.
The first approach I took was to ...