added 46 characters in body; edited title


edited Mar 6, 2014 at 23:29

Jamal


Optimize Cython code with np.ndarray contained

Can someone help me further optimize the following Cython code snippets? Specifically, a and b are one-dimensional np.ndarray objects of dynamic length whose elements are int values in range(256). resultHamming is a one-dimensional array of float values (also of dynamic length), and bits is a list of 256 int values.
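As a point of reference, the bits lookup table can be sketched in plain Python. The posted code fills it with a helper nnz() that is not shown, so bin(v).count("1") is used here as a stand-in, and count_ones is a hypothetical name chosen for illustration:

```python
# Build a 256-entry popcount table: bits[v] is the number of 1-bits in byte v.
# bin(v).count("1") stands in for the nnz() helper, which is not shown.
bits = [bin(v).count("1") for v in range(256)]


def count_ones(block):
    """Total number of 1-bits in a sequence of byte values (0..255)."""
    return sum(bits[v] for v in block)


print(count_ones([0, 1, 3, 255]))  # 0 + 1 + 2 + 8 = 11
```

Building the table once means each byte costs a single list lookup afterwards.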

The function compares two bit vectors of dynamic length and returns a similarity value, computed as the distance between the two. The length of each vector is a multiple of 2048 bits (256 bytes). I want to find the best match between the two bit vectors by comparing them one 2048-bit block at a time. Each bit vector is represented as an ndarray (the bit sequence is read byte by byte, so each element ranges from 0 to 255, i.e. 2^8 = 256 possible values). The matching rule is to find the globally smallest distances among all block pairs, allowing one block in A to be matched with more than one block in B if that gives a smaller distance. The smaller vector is always compared against the larger one.
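The per-block distance described above can be sketched in plain Python as follows. This is only an illustration of the metric, not the Cython code under review: block_distance and all_pair_distances are hypothetical names, Python 3 true division is assumed, and two all-zero blocks are assumed to have distance 0.

```python
BLOCK = 256  # bytes per 2048-bit block

# Popcount lookup table for a single byte value
bits = [bin(v).count("1") for v in range(256)]


def block_distance(a, b, i, j):
    """Normalized Hamming distance between block i of b and block j of a."""
    block_a = a[j * BLOCK:(j + 1) * BLOCK]
    block_b = b[i * BLOCK:(i + 1) * BLOCK]
    # Number of differing bits between the two blocks
    hamming = sum(bits[x ^ y] for x, y in zip(block_a, block_b))
    # Total number of 1-bits in both blocks, used as the normalizer
    ones = sum(bits[x] for x in block_a) + sum(bits[y] for y in block_b)
    # Assumption: two all-zero blocks are identical, so their distance is 0
    return hamming / ones if ones else 0.0


def all_pair_distances(a, b):
    """Distances for every (block of b, block of a) pair, as a flat list."""
    num_a, num_b = len(a) // BLOCK, len(b) // BLOCK
    return [block_distance(a, b, i, j)
            for i in range(num_b) for j in range(num_a)]
```

For example, with a made of one all-ones block followed by one all-zero block, and b made of a single all-ones block, the two distances are 0.0 and 1.0.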

The following code assumes that b is the smaller vector. We could limit resultHamming to at most numArrayB entries and record only the numArrayB smallest distance values, but we would then need to track its current size when inserting a new value. Even in the current case (recording all pairwise distances), we actually know the final size of resultHamming at the beginning.
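One way to record only the smallest values, sketched here under the assumption that the distances are available as any Python iterable, is heapq.nsmallest from the standard library. The function name mean_of_k_smallest is hypothetical, and k plays the role of numArrayB:

```python
import heapq


def mean_of_k_smallest(distances, k):
    """Average of the k smallest values in distances, rounded to 3 decimals."""
    # nsmallest accepts any iterable, so the distances can come from a
    # generator and the full list never has to be kept in sorted order
    smallest = heapq.nsmallest(k, distances)
    return round(sum(smallest) / k, 3)


print(mean_of_k_smallest([0.5, 0.1, 0.3, 0.2], 2))
```

This avoids tracking the current size of the result by hand, at the cost of leaving the selection to the library.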

added 1155 characters in body


edited Mar 6, 2014 at 22:07

Rain Lee


Can someone help me further optimize the following Cython code snippets? Specifically, a and b are np.ndarray objects holding int values in range(256); they are one-dimensional arrays of dynamic length. resultHamming is a one-dimensional array of float values (dynamic length), and bits is an int list of size 256.

The function compares two bit vectors of dynamic length and returns a similarity value as the distance between the two. The length of each vector is a multiple of 2048 bits (256 bytes). I want to find the best match between the two bit vectors by comparing each 2048-bit block; each bit vector is represented as an ndarray (the bit sequence is read byte by byte, so each position ranges from 0 to 255). The rule for matching is to find the global minimum distance between all block pairs, allowing one block in A to be matched with more than one block in B if they have a smaller distance. Always compare the smaller vector against the larger one.

The following code assumes that the b vector is smaller. We can limit resultHamming to at most numArrayB entries and record only the numArrayB smallest distance values, but we then need to track the current size when inserting a new value into it. Even in the current case (recording all the pairwise distances), we actually know the final size of resultHamming at the beginning.

added 1155 characters in body


edited Mar 6, 2014 at 21:59

Rain Lee


The function compares two bit vectors of dynamic length and returns a similarity value as the distance between the two. The length of each vector is a multiple of 2048 bits (256 bytes), and the best match between the two bit vectors is found by comparing each 2048-bit block. The rule for matching is to find the global minimum distance between all block pairs, allowing one block in A to be matched with more than one block in B if they have a smaller distance. Always compare the smaller vector against the larger one.

The following code assumes that the b vector is smaller and records only the numArrayB smallest distance values. We can limit resultHamming to at most numArrayB entries, but we then need to track the current size when inserting a new value into it. Even in the current case (recording all the pairwise distances), we actually know the final size of resultHamming at the beginning.

import numpy as np

def compare(a, b):
    cdef double ip, hammingTotal = 0
    cdef int numArrayA = a.size // 256
    cdef int numArrayB = b.size // 256
    cdef int i, j, k, l, index
    bits = list(xrange(256))
    # Prepare a bit number table for fast query
    for l in xrange(256):
        # nnz() counts the number of 1s in value (it is not defined in this snippet)
        bits[l] = nnz(l)

    resultHamming = np.zeros((numArrayB, numArrayA))
    for i in xrange(numArrayB):
        # Count the number of 1-bits in i-th block of B
        onesB = sum(bits[b[k+256*i]] for k in xrange(256))
        for j in xrange(numArrayA):
            # Count the number of 1-bits in j-th block of A
            onesA = sum(bits[a[k+256*j]] for k in xrange(256))
            # Calculate the hamming distance between i-th block of B and j-th block of A
            hammingCur = sum(bits[b[i*256+k] ^ a[j*256+k]] for k in xrange(256))
            ip = 0
            if hammingCur != 0:
                # float() forces true division; int / int truncates under Python 2
                ip = float(hammingCur) / (onesA + onesB)
            resultHamming[i, j] = ip

    # Repeatedly take the smallest remaining distance, numArrayB times in total
    for k in xrange(numArrayB):
        index = np.argmin(resultHamming)
        # After k deletions the matrix has numArrayA - k columns
        i, j = divmod(index, numArrayA - k)
        hammingTotal += resultHamming[i, j]
        # Update the distance matrix, remove the row and col that contain the smallest distance
        resultHamming = np.delete(np.delete(resultHamming, i, 0), j, 1)
    return round(hammingTotal/(numArrayB), 3)

def compare(a, b):
    cdef double dis, hammingTotal = 0
    cdef int numArrayA = int(a.size/256)
    cdef int numArrayB = int(b.size/256)
    cdef int i, j, k, l, index
    bits = list(xrange(256))
    # Prepare a bit number table for fast query
    for l in xrange(256):
        # nnz() counts the number of 1s in value
        bits[l] = nnz(l)

    resultHamming = []
    for i in xrange(numArrayB):
        # Count the number of 1-bits in i-th block of B
        onesB = sum(bits[b[k+256*i]] for k in xrange(256))
        for j in xrange(numArrayA):
            # Count the number of 1-bits in j-th block of A
            onesA = sum(bits[a[k+256*j]] for k in xrange(256))
            # Calculate the hamming distance between i-th block of B and j-th block of A 
            hammingCur = sum(bits[b[i*256+k] ^ a[j*256+k]] for k in xrange(256))
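            # Guard against two all-zero blocks, and force float division
            # (int / int truncates under Python 2)
            dis = 0
            if hammingCur != 0:
                dis = float(hammingCur) / (onesA + onesB)
            # Insert the current dis into resultHamming, keeping it sorted
            k = len(resultHamming) - 1
            # An empty list has no last element, so append directly in that case
            if not resultHamming or dis >= resultHamming[-1]:
                resultHamming.append(dis)
            else:
                resultHamming.append(resultHamming[k])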
                while k > 0 and resultHamming[k-1] > dis:
                    resultHamming[k] = resultHamming[k-1]
                    k -= 1
                resultHamming[k] = dis
            
    # Extract k smallest distance from the distance array
    for k in xrange(numArrayB):
        hammingTotal += resultHamming[k]
    return round(hammingTotal/(numArrayB), 3)
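
The hand-written sorted insertion in the loop above can also be expressed with the standard library's bisect module. The following is a minimal plain-Python sketch rather than a drop-in Cython replacement, and insert_sorted is a hypothetical name:

```python
import bisect


def insert_sorted(values, dis):
    """Insert dis into the already-sorted list values, keeping it sorted."""
    # insort finds the position by binary search and also handles an
    # empty list, so no special case is needed for the first insertion
    bisect.insort(values, dis)


result = []
for d in [0.4, 0.1, 0.3, 0.1]:
    insert_sorted(result, d)
print(result)  # [0.1, 0.1, 0.3, 0.4]
```

The search is logarithmic, although the insertion itself still shifts list elements, so this mainly removes bookkeeping rather than changing the overall cost.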

Tweeted twitter.com/#!/StackCodeReview/status/440889916487385088

occurred Mar 4, 2014 at 16:42


asked Mar 4, 2014 at 15:18

Rain Lee

