1

Assume I have a sorted array of tuples which is sorted by the first value. I want to find the first index where a condition on the first element of the tuple holds. i.e. How do I replace the following code

test_array = [(1,2),(3,4),(5,6),(7,8),)(9,10)]
min_value = 5
index = 0
for c in test_array:
        if c[0] > min_value:
           break
        else:
            index = index + 1

With the equivalent of a matlab find ?

i.e. At the end of this loop I expect to get 3 but I'd like to make this more efficient. I an fine with using numpy for this. I tried using argmax but to no avail.

Thanks

1
  • Don't you mean you want to find the last index where the condition holds, rather than the first? Because that's what you're doing here. Can you add a brief example of how you would do this in matlab so we can better understand what you're asking? Commented Feb 9, 2017 at 20:12

3 Answers 3

4

Since the list is sorted and if you know the max possible value for the second element (or if there can only be 1 element with the same first value), you could apply bisect on the list of tuples (returns the sorted insertion position in the list)

import bisect
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5

print(bisect.bisect_left(test_array,(min_value,10000)))

Hardcoding to 10000 is bad, so if you only have integers you can do that instead:

print(bisect.bisect_left(test_array,(min_value+1,)))

result: 3

if you had floats (also works with integers) you could use sys.float_info.epsilon like this:

print(bisect.bisect_left(test_array,(min_value*(1+sys.float_info.epsilon),)))

It has O(log(n)) complexity so it's much better than a simple for loop when there are a lot of elements.

Sign up to request clarification or add additional context in comments.

4 Comments

I did not know about bisect. Excellent! Thanks.
yes, that's a good one. See my edit. Depending on the data, my previous answer could fail.
It appears you can use None as the second parameter instead of 1000 as well.
not in python 3 you can't unorderable types: NoneType() < int(). But without element it works (but it's minimum value, not maximum): >>> (5,) < (5,-30000) yields True
0

In general, numpy's where is used in a fashion similar to MATLAB's find. However, from an efficiency standpoint, I where cannot be controlled to return only the first element found. So, from a computational perspective, what you're doing here is not arguably less inefficient.

The where equivalent would be

index = numpy.where(numpy.array([t[0] for t in test_array]) >= min_value)
index = index[0] - 1

Comments

0

You can use numpy to indicate the elements that obey the conditions and then use argmax(), to get the index of the first one

import numpy
test_array = numpy.array([(1,2),(3,4),(5,6),(7,8),(9,10)])
min_value = 5

print (test_array[:,0]>min_value).argmax()

if you would like to find all of the elements that obey the condition, use can replace argmax() by nonzero()[0]

5 Comments

I'd say this is overkill
I would like the index where the condition holds. Further, this returned (3,4). I want to return the index of (7,8). Thanks
@jphollowed, what do you mean by overkill? That's a simple one line solution.
@user2476373 Just because numpy isn't necessary for something you can do trivially with python builtins. But more so because nonzero() is doing more than is needed. I don't consider it clean coding to immediately index the return of a function because you only even need part of it. Might as well use a more appropriate tool
@jphollowed, Thanks for the feedback. I edited the answer and replaced nonzero with argmax, and now it makes the solution cleaner. Re libraries, I disagree with you. I don't see a difference between import bisect as the accepted answer or import numpy as in this answer. I think numpy is preferable since it makes the code more concise, readble and in many times, faster

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.