
All Questions

0 votes
1 answer
59 views

Catalog sentences into 5 words that represent them

I have a dataframe with 1000 text rows in df['text']. I also have 5 words, and for each of them I want to know how much it represents the text (a score between 0 and 1). Every score will be in df["word1&...
rafine's user avatar
  • 471
0 votes
1 answer
66 views

Similarity from word to sentence after doing word embedding

I have a dataframe with 1000 text rows. I ran word2vec. Now I want to create a new field that gives me the distance from each sentence to a word of my choice, let's say the word "king". I ...
rafine's user avatar
  • 471
0 votes
1 answer
46 views

Extract Keywords from Text Vector -- one set of keywords for each element

Please consider the reprex at the end of the post. It works along the lines of https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html It extracts a set ...
larry77's user avatar
  • 1,533
1 vote
1 answer
53 views

I cannot get past data(stop_words) to analyze text in text mining

It's my first attempt at text mining and I have run into a wall. This is what I have done thus far: library(tm) library(tidytext) library(dplyr) library(ggplot2) text1 <- c("Dear land of ...
Rohan Sagar's user avatar
2 votes
1 answer
93 views

Count of most commonly occurring bigram in 1500 IDs without repeating the count within an ID

I'm trying to count the most commonly occurring bigram in 1500 IDs (1 ID per row with 1 event) without counting the bigram more than once in each ID (row). For example, if I have the IDs below, I would ...
MfM's user avatar
  • 21
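
One way to read the question above: count, for each bigram, the number of rows (IDs) it appears in, so that repeats inside a single row are ignored. The sketch below is a minimal illustration in plain Python and is not taken from the original post. The function names `bigrams` and `bigram_row_counts` and the sample rows are hypothetical, and splitting on whitespace after lowercasing is an assumed tokenization.

```python
from collections import Counter


def bigrams(text):
    """Return the list of adjacent word pairs in a text (assumed tokenization:
    lowercase, split on whitespace)."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_row_counts(rows):
    """Count in how many rows each bigram occurs.

    Each row contributes at most 1 to a bigram's count, because the
    bigrams of a row are de-duplicated with set() before counting.
    """
    counts = Counter()
    for text in rows:
        counts.update(set(bigrams(text)))
    return counts


# Hypothetical sample data: one text per ID (row).
sample_rows = [
    "the cat sat the cat",  # "the cat" occurs twice here but counts once
    "the cat ran",
    "a dog ran",
]

row_counts = bigram_row_counts(sample_rows)
top_bigram, top_count = row_counts.most_common(1)[0]
print(top_bigram, top_count)
```

The same idea should carry over to a dataframe workflow: de-duplicate (ID, bigram) pairs first, then count bigrams.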
2 votes
1 answer
75 views

How to extract only those rows of the DataFrame where the values of two columns of the DataFrame are in English Language?

I have a dataframe which has 27 columns including columns FonctionsStagiaire and ExigencesParticulieres. The dataframe has 13774 rows which are either entirely in English or French. The csv file can ...
gmohor21's user avatar
0 votes
0 answers
88 views

How to use bag-of-words with one-hot encoding on txt file?

EDIT: If there's a better way to ask this question or anything I should articulate to facilitate answers, please let me know. Thanks! I’m trying to integrate one-hot encoding into my R code so I can ...
Maggie's user avatar
  • 1
0 votes
2 answers
118 views

How can I use Regex to differentiate between a fully uppercase word, and an uppercase word attached to a lower case character with missing whitespace?

Apologies for the convoluted title. I am trying to process text, with some undesirable features: some words are all in upper-case, such as 'EXAMPLE WORD', whilst in other cases there are two words ...
Boyd1878's user avatar
3 votes
1 answer
179 views

Implementing tf-idf in wordclouds

I have some Google reviews for some universities in a dataframe like df_unis below. The column uni_name contains the university names. I wish to create word clouds for each university separately but ...
Saeed's user avatar
  • 2,119
1 vote
0 answers
25 views

How to apply PoS tags to a nested list in R across multiple rows?

I am trying to apply PoS tagging in R to a column that contains a variable that was a sentence. This variable has been tokenised with all irrelevancies removed (punctuation, spaces, spelling, etc.). Here ...
Anon's user avatar
  • 11
0 votes
1 answer
124 views

Tool for detecting differences between text passages from two different groups

I have text data from two different groups. In total I have around 4000 text passages with around 300 words. I am searching for a tool that allows me to analyze the difference between these two groups....
Irazall's user avatar
  • 167
1 vote
1 answer
81 views

Processing large text files in R (Speed up a loop for separating sentences)

I have large text documents (150K lines per document; 160 documents). Reading them in as a large VCorpus and converting them to a dataframe runs quite quickly. Now, I want to separate each ...
Alex's user avatar
  • 33
0 votes
1 answer
101 views

Creating a token count by date and co-occurrence term proportion by date using quanteda

I have a quite massive dataset that contains reviews of utilities services from customers all over the UK. This is a small sample of what the data looks like: df <- data.frame (text = c("The ...
R_Student's user avatar
  • 779
1 vote
0 answers
369 views

How to calculate the coherence score in topic modeling. (R)

I build the topic models with topicModel <- LDA(DTM, K, method = "Gibbs", control = list(iter = 500, verbose = 25)) How to calculate the coherence in this topic modeling? Can I use the ...
Emily's user avatar
  • 37
0 votes
0 answers
70 views

Nonsense word associations in text mining

Hi, I ran this text analysis for word associations. However, the word associations do not make any sense. For example, I was interested in the association between "women" and other words. ...
Xian Zhao's user avatar
