I'm performing clustering analysis and visualization (hierarchal, PCA, T-SNE etc.) on a dataset, and a bit confused about the method for data preparation. I understand that the typical options are to standardize, normalize, or log transform, but it seems like there are no hard and fast rules regarding when you apply one over the other?
With standardization and log-transformation - my dataset splits into two clusters with a number of different algorithms. One cluster is large and heterogeneous (which is actually interesting as this is a biological problem and makes logical sense). However, if I normalize the data, I get three clusters out of it - splits the heterogeneous cluster into two. This could make sense as well, but it would be a stretch, and the clusters are not as clean. What could be causing this? The non-heterogeneous cluster remains the same, which is reassuring. Is it reasonable to conclude that the "instability" of the second cluster is further evidence of the heterogeneity in the dataset?