Link to example data:
The data file is an .xlsx file. It has three columns: label (the cluster label), feature_1 and feature_2.
The process to get C from the file and get the functions working should be something like this:
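import pandas as pd
import numpy as np
# Load the spreadsheet into a DataFrame
df = pd.read_excel('example_data.xlsx')
# Each cluster becomes an array of [feature_1, feature_2] pairs for one label value
c1 = np.asanyarray(df[df['labels'] == 0].apply(lambda row: ([row['feature_1'], row['feature_2']]), axis=1))
c2 = np.asanyarray(df[df['labels'] == 1].apply(lambda row: ([row['feature_1'], row['feature_2']]), axis=1))
# C is the list of clusters passed to H_score
C = [c1,c2]
H_score(C)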
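
If the file ever holds more than two clusters, the same C can be built without one line per label. This is only a minimal sketch: `build_clusters` is a hypothetical helper name, the column names are assumed to match the snippet above (`labels`, `feature_1`, `feature_2`), and the small DataFrame is made-up data standing in for example_data.xlsx. Note that each cluster here is a 2-D float array rather than an object array of lists, so H_score may need a small adjustment if it relies on the original layout.

```python
import numpy as np
import pandas as pd

def build_clusters(df, label_col="labels", feature_cols=("feature_1", "feature_2")):
    """Return a list with one (n_points, n_features) array per cluster label."""
    clusters = []
    # groupby sorts the group keys by default, so clusters come out in label order
    for _, group in df.groupby(label_col):
        clusters.append(group[list(feature_cols)].to_numpy())
    return clusters

# Made-up stand-in for pd.read_excel('example_data.xlsx')
demo = pd.DataFrame({
    "labels": [0, 0, 1, 1, 1],
    "feature_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_2": [0.5, 1.5, 2.5, 3.5, 4.5],
})

C = build_clusters(demo)
print(len(C), C[0].shape, C[1].shape)  # prints: 2 (2, 2) (3, 2)
```

With the real file, `C = build_clusters(pd.read_excel('example_data.xlsx'))` would take the place of the c1/c2 lines.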