1. Bug
The code in the post does not work!
>>> index_of('abba', 'bba')
>>> # expected 1 but got None
Read on to see how you could have discovered this.
2. Review of the unit tests
Having a suite of unit tests is excellent, better than 99% of submissions to Code Review.
Using assert for the unit tests means that you don't get much information out of a test case failure. For example, if I introduce a bug then I might get this test output:
======================================================================
FAIL: test_index_of_with_matching_patterns (cr179311.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "cr179311.py", line 38, in test_index_of_with_matching_patterns
assert index_of('abc', 'c') == 2
AssertionError
----------------------------------------------------------------------
At this point it would be helpful to know what the incorrect value was. The unittest.TestCase class has a bunch of assert methods that produce better output. For example, if I replace the line:
assert index_of('abc', 'c') == 2
with:
self.assertEqual(index_of('abc', 'c'), 2)
then the test failure message includes the incorrect value:
======================================================================
FAIL: test_index_of_with_matching_patterns (cr179311.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "cr179311.py", line 38, in test_index_of_with_matching_patterns
self.assertEqual(index_of('abc', 'c'), 2)
AssertionError: -1 != 2
----------------------------------------------------------------------
Each unit test knows its expected result. But you had to type these in by hand and could have made a mistake. It would be more reliable if the expected result were computed. In this case we could use the built-in str.find method, for example:
self.assertEqual(index_of('abc', 'c'), 'abc'.find('c'))
This approach is known as using a test oracle.
The unit tests are very repetitive. Every test has the form:
assert index_of(haystack, needle) == expected_index
or if you follow my suggestions above,
self.assertEqual(index_of(haystack, needle), str.find(haystack, needle))
The repetition could be avoided by making the tests data-driven, like this:
class TestIndexOf(unittest.TestCase):
CASES = [
# Positive cases
('abc', ''), # all strings contain empty string
('abc', 'a'), # single letters are easy
('abc', 'b'),
('abc', 'c'),
('abc', 'ab'), # multiple letters are harder
('abc', 'bc'),
('abc', 'abc'), # all strings contain themselves
('aaa', 'a'), # multiple occurrences
('aaa', 'aa'), # overlapping pattern
('thisisabsolutelycrazy', 't'),
('abcdef', 'abcdef'), # all strings contain themselves
('ababc', 'abc'), # overlapping prefix
('bananas', 'nas'), # overlapping prefix
('abcabcabc', 'abc'), # multiple occurrences
('abcabcab', 'abc'), # multiple occurrences
('abcabcdef', 'abcd'), # overlapping prefix
('abcabcdef', 'abcdef'), # overlapping prefix
('abcabcdabcde', 'abcde'), # overlapping prefix
('abcabcdabcde', 'abcd'), # multiple occurrences, overlapping prefix
('abra cadabra', 'abra'), # multiple occurrences
('abra cadabra', 'adab'), # overlapping prefix
# Negative cases
('abc', 'z'), # remember to test other letters
('abc', 'ac'), # important to test close cases
('abc', 'az'), # first letter, but not last
('abc', 'abz'), # first 2 letters, but not last
]
def check(self, haystack, needle):
expected = haystack.find(needle)
if expected == -1:
expected = None
self.assertEqual(index_of(haystack, needle), expected,
"index_of({!r}, {!r})".format(haystack, needle))
def test_cases(self):
for haystack, needle in self.CASES:
self.check(haystack, needle)
This makes it easy to add more test cases.
- Since we have a test oracle, it would make a lot of sense to generate random test cases. This allows us to explore a much larger space of possibilities than we could do just by writing out test cases by hand.
To implement this, it's convenient to have a function for choosing a random substring. (See this answer on Stack Overflow for an explanation.)
from math import sqrt
import random
def random_substring(string):
"Return a non-empty substring of string, chosen uniformly at random."
n = len(string)
k = random.randrange(n * (n + 1) // 2)
stop = int((sqrt(8 * k + 1) - 1) / 2) + 1
start = k - stop * (stop - 1) // 2
return string[start:stop]
Now it's easy to generate random (positive) test cases:
def test_random(self):
for n in range(1, 100):
haystack = ''.join(random.choice('abc') for _ in range(n))
needle = random_substring(haystack)
self.check(haystack, needle)
And if we run these randomly generated test cases, we soon get a test failure as described in §1 above:
======================================================================
FAIL: test_random (cr179311.TestIndexOf)
----------------------------------------------------------------------
Traceback (most recent call last):
File "cr179311.py", line 91, in test_random
self.check(haystack, needle)
File "cr179311.py", line 81, in check
"index_of({!r}, {!r})".format(haystack, needle))
AssertionError: None != 3 : index_of('cacaaaaaa', 'aaaaaa')
----------------------------------------------------------------------