Newest 'nlp+text-mining' Questions

0 votes

1 answer

59 views

Catalog sentences into 5 words that represent them

I have a dataframe with 1000 text rows, df['text']. I also have 5 words, and for each one of them I want to know how much it represents the text (between 0 and 1). Every score will be in df["word1...

rafine

471

asked Dec 19, 2024 at 10:16

0 votes

1 answer

66 views

Similarity from word to sentence after doing word embedding

I have a dataframe with 1000 text rows. I ran word2vec. Now I want to create a new field which gives me the distance from each sentence to the word that I want, let's say the word "king". I ...

rafine

471

asked Dec 9, 2024 at 8:14

0 votes

1 answer

46 views

Extract Keywords from Text Vector -- one set of keywords for each element

Please consider the reprex at the end of the post. It works along the lines of https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html It extracts a set ...

larry77

1,533

asked Jun 18, 2024 at 20:30

1 vote

1 answer

53 views

I cannot get past data(stop_words) to analyze text in text mining

It's my first attempt at text mining and I have run into a wall. This is what I have done thus far: library(tm) library(tidytext) library(dplyr) library(ggplot2) text1 <- c("Dear land of ...

Rohan Sagar

31

asked Apr 13, 2024 at 18:02

2 votes

1 answer

93 views

Count of most commonly occurring bigram in 1500 IDs without repeating the count within an ID

I'm trying to count the most commonly occurring bigram in 1500 IDs (1 ID per row with 1 event) without counting the bigram more than once in each ID (row). For example, if I have the IDs below, I would ...

MfM

21

asked Aug 14, 2023 at 18:46

2 votes

1 answer

75 views

How to extract only those rows of the DataFrame where the values of two columns of the DataFrame are in English Language?

I have a dataframe which has 27 columns including columns FonctionsStagiaire and ExigencesParticulieres. The dataframe has 13774 rows which are either entirely in English or French. The csv file can ...

gmohor21

45

asked Jun 6, 2023 at 17:58

0 votes

0 answers

88 views

How to use bag-of-words with one-hot encoding on txt file?

EDIT: If there's a better way to ask this question or anything I should articulate to facilitate answers, please let me know. Thanks! I’m trying to integrate one-hot encoding into my R code so I can ...

Maggie

1

asked Mar 16, 2023 at 22:55

0 votes

2 answers

118 views

How can I use Regex to differentiate between a fully uppercase word, and an uppercase word attached to a lower case character with missing whitespace?

Apologies for the convoluted title. I am trying to process text, with some undesirable features: some words are all in upper-case, such as 'EXAMPLE WORD', whilst in other cases there are two words ...

Boyd1878

11

asked Feb 20, 2023 at 15:03

3 votes

1 answer

179 views

Implementing tf-idf in wordclouds

I have some google reviews for some universities in a dataframe like below df_unis. The column uni_name contains the university names. I wish to create word clouds for each university separately but ...

Saeed

2,119

asked Feb 10, 2023 at 23:43

1 vote

0 answers

25 views

How to apply PoS tags to a nested list in R across multiple rows?

I am trying to apply PoS tagging in R to a column that contains a variable that was a sentence. This variable has been tokenised with all irrelevancies removed (punctuation, spaces, spelling, etc.). Here ...

Anon

11

asked Feb 9, 2023 at 16:26

0 votes

1 answer

124 views

Tool for detecting differences between text passages from two different groups

I have text data from two different groups. In total I have around 4000 text passages with around 300 words. I am searching for a tool that allows me to analyze the difference between these two groups....

Irazall

167

asked Jan 22, 2023 at 18:01

1 vote

1 answer

81 views

Processing large text files in R (Speed up a loop for separating sentences)

I have large text documents (150K lines per document; 160 documents). When I read them in as a large VCorpus and convert them to a dataframe, it runs quite quickly. Now, I want to separate each ...

Alex

33

asked Jan 19, 2023 at 15:40

0 votes

1 answer

101 views

Creating a token count by date and co-occurrence term proportion by date using quanteda

I have quite a massive dataset that contains reviews of utilities services from customers all over the UK. This is a small sample of what the data looks like: df <- data.frame (text = c("The ...

R_Student

779

asked Dec 27, 2022 at 23:58

1 vote

0 answers

369 views

How to calculate the coherence score in topic modeling. (R)

I built the topic models with topicModel <- LDA(DTM, K, method = "Gibbs", control = list(iter = 500, verbose = 25)). How do I calculate the coherence for this topic model? Can I use the ...

Emily

37

asked Nov 21, 2022 at 7:28

0 votes

0 answers

70 views

Nonsense word associations in Text Mining

Hi, I ran this text analysis for word associations. However, the word associations do not make any sense. For example, I was interested in the association between "women" and other words. ...

Xian Zhao

81

asked Nov 10, 2022 at 17:53
