Replace a loop in python with the equivalent of a matlab find

Question

Assume I have a sorted array of tuples which is sorted by the first value. I want to find the first index where a condition on the first element of the tuple holds. i.e. How do I replace the following code

test_array = [(1,2),(3,4),(5,6),(7,8),)(9,10)]
min_value = 5
index = 0
for c in test_array:
        if c[0] > min_value:
           break
        else:
            index = index + 1

With the equivalent of a matlab find ?

i.e. At the end of this loop I expect to get 3 but I'd like to make this more efficient. I an fine with using numpy for this. I tried using argmax but to no avail.

Thanks

Don't you mean you want to find the last index where the condition holds, rather than the first? Because that's what you're doing here. Can you add a brief example of how you would do this in matlab so we can better understand what you're asking? — pretzlstyle
– pretzlstyle, Commented Feb 9, 2017 at 20:12

Jean-François Fabre · Accepted Answer · 2017-02-10 13:16:48Z

4

Since the list is sorted and if you know the max possible value for the second element (or if there can only be 1 element with the same first value), you could apply bisect on the list of tuples (returns the sorted insertion position in the list)

import bisect
test_array = [(1,2),(3,4),(5,6),(7,8),(9,10)]
min_value = 5

print(bisect.bisect_left(test_array,(min_value,10000)))

Hardcoding to 10000 is bad, so if you only have integers you can do that instead:

print(bisect.bisect_left(test_array,(min_value+1,)))

result: 3

if you had floats (also works with integers) you could use sys.float_info.epsilon like this:

print(bisect.bisect_left(test_array,(min_value*(1+sys.float_info.epsilon),)))

It has O(log(n)) complexity so it's much better than a simple for loop when there are a lot of elements.

edited Feb 10, 2017 at 13:16

answered Feb 9, 2017 at 20:19

Jean-François Fabre♦

141k24 gold badges179 silver badges246 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

LostInTheFrequencyDomain Over a year ago

I did not know about bisect. Excellent! Thanks.

Jean-François Fabre Over a year ago

yes, that's a good one. See my edit. Depending on the data, my previous answer could fail.

LostInTheFrequencyDomain Over a year ago

It appears you can use None as the second parameter instead of 1000 as well.

Jean-François Fabre Over a year ago

not in python 3 you can't unorderable types: NoneType() < int(). But without element it works (but it's minimum value, not maximum): >>> (5,) < (5,-30000) yields True

heyiamt · Accepted Answer · 2017-02-09 20:23:01Z

0

In general, numpy's where is used in a fashion similar to MATLAB's find. However, from an efficiency standpoint, I where cannot be controlled to return only the first element found. So, from a computational perspective, what you're doing here is not arguably less inefficient.

The where equivalent would be

index = numpy.where(numpy.array([t[0] for t in test_array]) >= min_value)
index = index[0] - 1

answered Feb 9, 2017 at 20:23

heyiamt

2112 silver badges6 bronze badges

Comments

Yuval Atzmon · Accepted Answer · 2017-02-10 14:06:44Z

0

You can use numpy to indicate the elements that obey the conditions and then use argmax(), to get the index of the first one

import numpy
test_array = numpy.array([(1,2),(3,4),(5,6),(7,8),(9,10)])
min_value = 5

print (test_array[:,0]>min_value).argmax()

if you would like to find all of the elements that obey the condition, use can replace argmax() by nonzero()[0]

edited Feb 10, 2017 at 14:06

answered Feb 9, 2017 at 20:16

Yuval Atzmon

6,0034 gold badges48 silver badges76 bronze badges

5 Comments

pretzlstyle Over a year ago

I'd say this is overkill

LostInTheFrequencyDomain Over a year ago

I would like the index where the condition holds. Further, this returned (3,4). I want to return the index of (7,8). Thanks

Yuval Atzmon Over a year ago

@jphollowed, what do you mean by overkill? That's a simple one line solution.

pretzlstyle Over a year ago

@user2476373 Just because numpy isn't necessary for something you can do trivially with python builtins. But more so because nonzero() is doing more than is needed. I don't consider it clean coding to immediately index the return of a function because you only even need part of it. Might as well use a more appropriate tool

Yuval Atzmon Over a year ago

@jphollowed, Thanks for the feedback. I edited the answer and replaced nonzero with argmax, and now it makes the solution cleaner. Re libraries, I disagree with you. I don't see a difference between import bisect as the accepted answer or import numpy as in this answer. I think numpy is preferable since it makes the code more concise, readble and in many times, faster

Collectives™ on Stack Overflow

Replace a loop in python with the equivalent of a matlab find

3 Answers 3

4 Comments

Comments

5 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

Comments

5 Comments

Linked

Related