Timeline for answer to Kmean clustering on text data by Latent
Current License: CC BY-SA 4.0
Post Revisions
11 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Feb 23, 2019 at 20:43 | comment | added | jen ki | I was looking through one hot encoding and tried this for my dataset but I don't know how to approach it. | |
| Feb 23, 2019 at 9:02 | comment | added | Latent | @Anony-Mousse I've updated the main answer and added some more advanced methods that can be more beneficial for categorical clustering. | |
| Feb 23, 2019 at 9:01 | history | edited | Latent | CC BY-SA 4.0 | updated more approaches |
| Feb 23, 2019 at 7:50 | comment | added | Has QUIT--Anony-Mousse | While you can use one-hot encoding and similar, that usually yields quite poor and uninterpretable results. Using a method that is actually designed for text or factors is better. | |
| Feb 22, 2019 at 17:09 | comment | added | Latent | @jenki, your dataset is categorical, so I've added the common method for handling that type of data to the main answer. There are more advanced methods, but one-hot encoding is (as far as I know) the most common method for that type of data. | |
| Feb 22, 2019 at 17:07 | history | edited | Latent | CC BY-SA 4.0 | added 544 characters in body |
| Feb 22, 2019 at 16:11 | comment | added | jen ki | Hi, thanks for replying. In my dataset, I have text and some numeric values with plus and pound symbols. This is where I got the data from, and it's related to crime: old.datahub.io/dataset/uk-criminal-justice/resource/…. | |
| Feb 22, 2019 at 15:25 | history | edited | Latent | CC BY-SA 4.0 | added 511 characters in body |
| Feb 22, 2019 at 15:13 | comment | added | HFulcher | Maybe you could flesh out your answer a bit more by suggesting what preprocessing could be done to convert strings to a suitable format? | |
| Feb 22, 2019 at 15:08 | history | edited | Latent | CC BY-SA 4.0 | added 1 character in body |
| Feb 22, 2019 at 15:03 | history | answered | Latent | CC BY-SA 4.0 | |