Revisions to Calculation of clustering metric in Python

deleted 9 characters in body

Source Link

edited Feb 27, 2018 at 15:45

1
10
25
35

So whenWhen I try to run the following code for arrays with more than 10k elements, it takes hours and I don't know how to make it in the most efficient way.

Any ideas?

Thank you!

Updated data file.

Source Link

edited Feb 9, 2018 at 16:23

kibs

51
2

Link to example data:Link to example data:

The data file is a Python 2.7 pickle objectxlsx. The format is as followsIt has three columns: One array "C"label (an array of clusters, in this case len(C) = 2 = 2 clusterscluster label), feature_1 and feature_2. Each cluster is an array of arrays (each cluster contains

The process to get C from the vectorized representation offile and get the observations).functions working should be something like this:

import pandas as pd
import numpy as np
df = pd.read_excel('example_data.xlsx')
c1 = np.asanyarray(df[df['labels'] == 0].apply(lambda row: ([row['feature_1'], row['feature_2']]), axis=1))
c2 = np.asanyarray(df[df['labels'] == 1].apply(lambda row: ([row['feature_1'], row['feature_2']]), axis=1))
C = [c1,c2]
H_score(C)

Link to example data:

The data file is a Python 2.7 pickle object. The format is as follows: One array "C" (an array of clusters, in this case len(C) = 2 = 2 clusters). Each cluster is an array of arrays (each cluster contains the vectorized representation of the observations).

Link to example data:

The data file is a xlsx. It has three columns: label (cluster label), feature_1 and feature_2.

The process to get C from the file and get the functions working should be something like this:

import pandas as pd
import numpy as np
df = pd.read_excel('example_data.xlsx')
c1 = np.asanyarray(df[df['labels'] == 0].apply(lambda row: ([row['feature_1'], row['feature_2']]), axis=1))
c2 = np.asanyarray(df[df['labels'] == 1].apply(lambda row: ([row['feature_1'], row['feature_2']]), axis=1))
C = [c1,c2]
H_score(C)

Uploaded example data

Source Link

edited Feb 6, 2018 at 15:59

kibs

51
2

Link to example data:

The data file is a Python 2.7 pickle object. The format is as follows: One array "C" (an array of clusters, in this case len(C) = 2 = 2 clusters). Each cluster is an array of arrays (each cluster contains the vectorized representation of the observations).

Thank you!

edited tags; edited title

Link

edited Feb 5, 2018 at 21:36

200_success

145.7k
22
191
481

Loading

Source Link

asked Feb 5, 2018 at 21:08

kibs

51
2

Loading

Stack Exchange Network

Return to Question