10 events
when what by license comment
Jul 28, 2016 at 0:17 history edited bones.felipe CC BY-SA 3.0
added 617 characters in body
Jul 26, 2016 at 0:20 answer added Has QUIT--Anony-Mousse timeline score: 1
Jul 24, 2016 at 23:57 history edited bones.felipe CC BY-SA 3.0
added 19 characters in body
Jul 24, 2016 at 23:56 comment added bones.felipe Not exactly. I am not using only the term-frequency matrix; I also apply inverse document frequency, so the kind of effect you mention is neutralized. But indeed, at some point the size of the documents imposes some bias.
Jul 24, 2016 at 22:01 comment added roundsquare I see. Why is that your measure of importance? It basically means that documents with more words are more important... is that really true for what you are doing? I.e., with a term-frequency matrix, the "importance" is the number of words in each document. This seems problematic, since there are often clusters that pick up "small" documents (those whose vectors are closer to the origin) and others that pick up "large" documents (those further from the origin). I think something like the number of documents would scale better and give you a better idea of the importance of a cluster.
Jul 24, 2016 at 3:53 history edited bones.felipe CC BY-SA 3.0
added 728 characters in body
Jul 24, 2016 at 3:26 comment added bones.felipe It's the "importance" of the cluster. Since every document is a vector, I take the sum of every vector in a cluster and then sum all the components; that gives a sense of the cluster's "importance", which is what I call the weight.
Jul 24, 2016 at 2:58 comment added roundsquare What do you mean by the "weight for each cluster"?
Jul 24, 2016 at 1:11 review First posts
Jul 24, 2016 at 3:18
Jul 24, 2016 at 1:10 history asked bones.felipe CC BY-SA 3.0
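
The cluster "weight" discussed in the comments above (sum every document vector in a cluster, then sum all components) can be sketched as follows, alongside the document count that roundsquare suggests as an alternative. This is only an illustration under stated assumptions: the function names `cluster_weights` and `cluster_sizes`, the toy matrix, and the labels are hypothetical and do not come from the original post.

```python
import numpy as np


def cluster_weights(X, labels):
    """Per cluster, sum every component of every document vector.

    X      : (n_documents, n_terms) array, e.g. TF or TF-IDF values (assumed dense).
    labels : (n_documents,) array of cluster ids.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {int(c): float(X[labels == c].sum()) for c in np.unique(labels)}


def cluster_sizes(labels):
    """Per cluster, count the documents assigned to it."""
    ids, counts = np.unique(np.asarray(labels), return_counts=True)
    return {int(c): int(n) for c, n in zip(ids, counts)}


# Hypothetical toy data: three documents over three terms.
X = np.array([
    [1.0, 0.0, 2.0],  # document 0, cluster 0
    [0.5, 0.5, 0.0],  # document 1, cluster 0
    [3.0, 1.0, 1.0],  # document 2, cluster 1 (a single "large" document)
])
labels = np.array([0, 0, 1])

weights = cluster_weights(X, labels)
sizes = cluster_sizes(labels)
print(weights)
print(sizes)
```

In this toy case the cluster holding one large document gets a higher weight than the cluster holding two smaller ones, which is the document-size bias raised in the comments; the document count ranks them the other way around.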