q = 2;
k= 2^q;
x1 = [0.0975000000000000,  0.980987500000000, -0.924672950312500, -0.710040130079246];

for i = 1 : length(x1)
    [idx_centers,location] = kmeans(x1',q);
end

temp = idx_centers;

for i = 1 : length(x1)
    if temp(i)== 2
        idx_centers(i) = 0;
    end
    BinaryCode_KMeans(i) =  idx_centers(i);  % output is say [0,0,1,1];
end

strng = num2str(BinaryCode_KMeans);  
DecX = bin2dec(strng); 

In the code snippet above, I want to convert a binary string, obtained from kmeans clustering, to its decimal equivalent. The decimal equivalent should be one of 1, 2, 3, or 4, i.e., within k = 2^q when q = 2.

But sometimes, after conversion, the decimal equivalent is 12, because a 4-bit binary code gives decimal numbers in the range 0 to 15 (or 1 to 16). The number of elements in x1 can vary and can be less than or greater than k. What should I do so that the decimal equivalent of the binary code always falls within k, for any value of q?

  • Have you noticed that you're passing q instead of k as the number of clusters? That means you only have 2 clusters and not 4, whereas in your question you say that the decimal value could be 1, 2, 3, or 4, which suggests 4 clusters. Also, the binary-to-decimal conversion works fine. The question is quite vague; I think you need to change the way you encode the kmeans results!
    – hmofrad
    Commented Jan 2, 2017 at 19:36
  • @hmofrad: I am passing q because there are 2 clusters or 2 symbols. Each element in the x1 array will belong to either cluster 1 or cluster 2. As a result, an array of 4 elements containing the symbols 1 and 2 is produced. This is the variable idx_centers. I treat this array as a string and replace the symbol 2 with 0 in order to get a binary string, BinaryCode_KMeans. I convert this string to its decimal equivalent, which should be 1, 2, 3, or 4. Basically, I want the decimal number to be in the range of k. For this, how should I cluster, and what should q be in kmeans?
    – SKM
    Commented Jan 2, 2017 at 19:48
  • Given that you can't control the order of cluster naming in kmeans, it can produce idx_centers = [1 2 2 2] or idx_centers = [2 1 1 1]; the binary string would then be string = [1 0 0 0] or string = [0 1 1 1], and the decimal value would be 8 or 7. How would you interpret these results, given that neither value falls between 1 and 4?
    – hmofrad
    Commented Jan 2, 2017 at 20:02
  • Can you suggest in general with any random array and any number of elements, how I can apply the technique to get a decimal representation within a range k using q bits -- how many elements should be in the data array x1 and what should be the input to kmeans so that after converting the clusters symbols, I can get a decimal representation that is in the range k ?
    – SKM
    Commented Jan 2, 2017 at 20:36
  • If you think this will help you, I can post it as an answer here?
    – hmofrad
    Commented Jan 2, 2017 at 22:15

1 Answer


First off, there is no need to run kmeans multiple times; a single run calculates the cluster centers. Note that the code below tries to find a mapping between the clustering results and n, the number of samples. It shows three ways to encode this information.

clear
clc

q = 2;
k= 2^q;
n = 4;
x1 = rand(n,1);
fprintf('x1 = [ '); fprintf('%g ', x1); fprintf(']\n');  % %g, since x1 holds non-integer values

[idx_centers, location] = kmeans(x1, q);
fprintf('idx_centers = [ '); fprintf('%d ', idx_centers); fprintf(']\n');

for i = 1:q
    idx_centers(idx_centers == i) = i-1;
end

fprintf('idx_centers = [ '); fprintf('%d ', idx_centers); fprintf(']\n');

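% Build the bit string without separators; the name binStr avoids
% shadowing the built-in string function
binStr = sprintf('%d', idx_centers);

% Original decimal value
DecX = bin2dec(binStr);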
fprintf('0 to     (2^n) - 1: %d\n', DecX);

% Reduced space decimal value
% Ignoring the 0/1 order as [ 1 1 0 0 ]
% would be the same      as [ 0 0 1 1 ]
if DecX >= (2^n)/2
    complement = bitget(bitcmp(int64(DecX)),n:-1:1);
    DecX = bin2dec(num2str(complement));
end
fprintf('0 to ((2^n)/2) - 1: %d\n', DecX);

% Minimal Decimal value based on the number of samples  
% in the 0's cluster which is in the range of 0 to n-1
fprintf('0 to         n - 1: %d\n', numel(find(idx_centers == 0)));
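
If bitcmp is not available for your integer type, the second encoding can also be sketched without it. This is a minimal sketch, not part of the code above: the labels vector is hard-coded and hypothetical, standing in for the 0/1 output of kmeans, and the weights variable is introduced only for illustration. It relies on the fact that DecX >= (2^n)/2 exactly when the first bit is 1, so complementing the bits whenever the first label is 1 gives the same reduced range.

```matlab
% Hypothetical 0/1 labels, standing in for the relabelled kmeans output
labels = [1 0 0 0];
n = numel(labels);

% Canonical form: force the first sample into cluster 0, so that
% [1 0 0 0] and [0 1 1 1] map to the same code
if labels(1) == 1
    labels = 1 - labels;      % swap the roles of 0 and 1
end

% Binary-to-decimal conversion without strings or bitcmp
weights = 2.^(n-1:-1:0);      % [8 4 2 1] for n = 4
DecX = sum(labels .* weights);
fprintf('0 to ((2^n)/2) - 1: %d\n', DecX);
```

Because the result no longer depends on which cluster kmeans happens to call 1 or 2, repeated runs that produce the same partition give the same code.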

Hint: If you change q to more than 2, the code will not work, because bin2dec only accepts zeros and ones. With more than 2 clusters, you need to extend the code and use multidimensional arrays to store the pairwise clustering results.
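
One possible alternative for q > 2, different from the pairwise approach mentioned in the hint, is to read the label vector as a number in base q instead of base 2. The sketch below assumes the labels have already been shifted to 0..q-1 (as the loop in the code above does); the labels vector is hypothetical and hard-coded rather than taken from kmeans.

```matlab
q = 3;                        % number of clusters (assumed > 2 here)
labels = [2 0 1 1];           % hypothetical labels, already in 0..q-1
n = numel(labels);

% Treat the labels as digits of a base-q number, most significant first
weights = q.^(n-1:-1:0);      % [27 9 3 1] for q = 3, n = 4
code = sum(labels .* weights);
fprintf('0 to (q^n) - 1: %d\n', code);
```

The resulting value lies between 0 and (q^n) - 1, so it still grows with the number of samples n. Like the binary version, it depends on how kmeans names the clusters.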

  • Thank you for the reply. But I am getting an error Undefined function 'bitcmp' for input arguments of type 'int64'. I do have the bitcmp function but cannot understand why the error occurs. Also, what is the final output that I should be getting? The first DecX below the comment % Original decimal value gives 12. How to fix the error?
    – SKM
    Commented Jan 3, 2017 at 4:02
  • I have another Question related to this one, actually it is in continuation to this Question asked here stackoverflow.com/questions/41387284/…. It would be very kind of you, if you can take a look at the question since you have answered this one, I am sure you would be able to help in that Question as well.
    – SKM
    Commented Jan 3, 2017 at 4:02
  • Try which bitcmp to see if Matlab finds the function path. It may return bitcmp not found or Has no license available. Depending on the output, try to resolve the issue. BTW, I'm working with Matlab R2016b. Moreover, since the code is using random data, every time the final clustering value would be different. The 2nd representation method (which is not working for you right now) would convert 12 to 3 ignoring the 0 and 1s places. Finally, Sure, I'll take a look at your other question!
    – hmofrad
    Commented Jan 3, 2017 at 4:30
  • bitcmp for int16 works. Is there a way to make kmeans deterministic so that it gives the same cluster value each time? Can you please mention the formula / logic which you have used? Thanks for your time and effort; it would be of immense help if you could provide a solution for the other question.
    – SKM
    Commented Jan 3, 2017 at 18:11
  • NO, you can't control the cluster naming using Matlab's kmeans. One way to achieve this is to modify the cluster names after running kmeans, based on the data labels, i.e. [ 1 0 0 0 ] will be converted to [ 0 1 1 1 ]. HINT: The int16 just works with n=4. If you want to increase the number of samples, you need to work with int32 or int64. THE FORMULA is a mapping between the clustering output and the number of samples n. Clustering n samples may lead to (2^n) different combinations for the clustering index. Also, you can reduce this by ignoring the 0s and 1s places. I WILL!
    – hmofrad
    Commented Jan 4, 2017 at 2:34
