All Questions
Tagged with nlp text-mining
472 questions
0 votes, 1 answer, 59 views
Catalog sentences into 5 words that represent them
I have a dataframe with 1000 text rows in df['text'].
I also have 5 words, and for each of them I want to know how well it represents the text (a score between 0 and 1).
every score will be in df["word1&...
0 votes, 1 answer, 66 views
Similarity from word to sentence after doing word embedding
I have a dataframe with 1000 text rows.
I ran word2vec on it.
Now I want to create a new field that gives me the distance from each sentence to a word of my choice, let's say the word "king".
I ...
0 votes, 1 answer, 46 views
Extract Keywords from Text Vector -- one set of keywords for each element
Please consider the reprex at the end of the post.
It works along the lines of
https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html
It extracts a set ...
1 vote, 1 answer, 53 views
I cannot get past data(stop_words) to analyze text in text mining
It's my first attempt at text mining and I have run into a wall. This is what I have done thus far:
library(tm)
library(tidytext)
library(dplyr)
library(ggplot2)
text1 <- c("Dear land of ...
2 votes, 1 answer, 93 views
Count of most commonly occurring bigram in 1500 IDs without repeating the count within an ID
I'm trying to count the most commonly occurring bigram in 1500 IDs (1 ID per row with 1 event) without counting the bigram more than once in each ID (row). For example, if I have the IDs below, I would ...
2 votes, 1 answer, 75 views
How to extract only those rows of the DataFrame where the values of two columns of the DataFrame are in English Language?
I have a dataframe which has 27 columns including columns FonctionsStagiaire and ExigencesParticulieres. The dataframe has 13774 rows which are either entirely in English or French. The csv file can ...
0 votes, 0 answers, 88 views
How to use bag-of-words with one-hot encoding on txt file?
EDIT: If there's a better way to ask this question or anything I should articulate to facilitate answers, please let me know. Thanks!
I’m trying to integrate one-hot encoding into my R code so I can ...
0 votes, 2 answers, 118 views
How can I use Regex to differentiate between a fully uppercase word, and an uppercase word attached to a lower case character with missing whitespace?
Apologies for the convoluted title. I am trying to process text, with some undesirable features: some words are all in upper-case, such as 'EXAMPLE WORD', whilst in other cases there are two words ...
3 votes, 1 answer, 179 views
Implementing tf-idf in wordclouds
I have some Google reviews for some universities in a dataframe like df_unis below. The column uni_name contains the university names. I wish to create word clouds for each university separately but ...
1 vote, 0 answers, 25 views
How to apply PoS tags to a nested list in R across multiple rows?
I am trying to apply PoS tagging in R to a column whose values were originally sentences. Each value has been tokenised with all irrelevancies removed (punctuation, spaces, spelling, etc.).
Here ...
0 votes, 1 answer, 124 views
Tool for detecting differences between text passages from two different groups
I have text data from two different groups. In total I have around 4000 text passages with around 300 words.
I am searching for a tool that allows me to analyze the difference between these two groups....
1 vote, 1 answer, 81 views
Processing large text files in R (Speed up a loop for separating sentences)
I have large text documents (150K lines per document; 160 documents). When I read them in as a large VCorpus and convert them to a dataframe, it runs quite quickly. Now, I want to separate each ...
0 votes, 1 answer, 101 views
Creating a token count by date and co-occurrence term proportion by date using quanteda
I have a fairly massive dataset of customer reviews of utility services from all over the UK. This is a small sample of what the data looks like:
df <- data.frame (text = c("The ...
1 vote, 0 answers, 369 views
How to calculate the coherence score in topic modeling (R)
I built the topic model with
topicModel <- LDA(DTM, K, method = "Gibbs", control = list(iter = 500, verbose = 25))
How can I calculate the coherence of this topic model?
Can I use the ...
0 votes, 0 answers, 70 views
Nonsense word associations in text mining
Hi, I ran this text analysis for word associations. However, the word associations do not make any sense. For example, I was interested in the association between "women" and other words. ...