I'm currently using the find function and found a slight problem.

theres gonna be a fire here

If I have a sentence containing both "here" and "theres" and I use find() to get the index of "here", I instead get the index of the "here" inside "theres".

I thought find() would behave like

if thisword in thatword:

in that it would find the whole word, not a substring within a string.
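A quick check, as a minimal sketch using the example sentence from this question, suggests that `in` between two strings is also a substring test, just like find(). Only `in` against a list of words (for instance the result of split()) compares whole words. The variable names below are illustrative and not part of the original code.

```python
sentence = "theres gonna be a fire here"

# find() returns the index of the first substring match: the "here" inside "theres".
substring_index = sentence.find("here")

# `in` between two strings is a substring test as well.
in_string = "here" in "theres"

# `in` against a list compares whole elements, so only exact words match.
in_word_list = "here" in "theres gonna be".split()

print(substring_index, in_string, in_word_list)  # 1 True False
```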

Is there another function that works similarly? I use find() quite heavily and would like to know of alternatives before I clog the code with string.split() and then iterate, with an index counter on the side, until I find the exact match.

MainLine = 'theres gonna be a fire here'
WordtoFind = 'here'
# String_Len = MainLine.find(WordtoFind)  # returns 1: the match inside "theres"
split_line = MainLine.split()

String_Len = -1   # stays -1 if the word is never found, like find()
indexCounter = 0  # start index of the current word (assumes single spaces)
for word in split_line:
    if word == WordtoFind:  # exact word comparison, not a substring test
        String_Len = indexCounter
        break
    indexCounter += len(word) + 1  # skip past the word and the space after it
  • Why not look for " here" with the leading space? Commented Nov 30, 2018 at 21:59
  • str.find finds substrings, not words. It has no notion of words. What you described about using split and then iterating is the first step beyond using find, that is, tokenization and search. The way to avoid clogging code is to reuse code with functions, classes, etc. Commented Nov 30, 2018 at 22:00
  • @DanielJimenez Not if "here" could be the first word; better to split or use a regex. Commented Nov 30, 2018 at 22:02
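The tokenize-then-search approach suggested in the comments can be wrapped in one reusable function. The sketch below is one possible version, not code from the thread. The name find_token_start is hypothetical, it treats any run of non-whitespace characters as a word (so "here," with a trailing comma would not match "here"), and it returns -1 on failure to mirror str.find().

```python
import re


def find_token_start(word, text):
    """Return the start index of the first whitespace-delimited token equal to word, or -1."""
    # re.finditer yields each token together with its offset,
    # so no manual index counter is needed and repeated spaces are handled.
    for match in re.finditer(r"\S+", text):
        if match.group() == word:
            return match.start()
    return -1


print(find_token_start("here", "theres gonna be a fire here"))  # 23
```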

1 Answer

The best route would be regular expressions. To find a "word", just make sure that the characters immediately before and after it are not alphanumeric. This approach uses no splits, has no exposed loops, and even works on an awkward sentence like "There is a fire,here". A find_word_start function might look like this:

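import re

def find_word_start(word, string):
    # The lookarounds require that the characters on either side of the word
    # are not letters or digits; re.escape keeps the word itself literal.
    pattern = "(?<![a-zA-Z0-9])" + re.escape(word) + "(?![a-zA-Z0-9])"
    result = re.search(pattern, string)
    # Return -1 when there is no match, mirroring str.find().
    return result.start() if result else -1

>>> find_word_start("here", "There is a fire,here")
16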

The regex uses a technique called lookarounds to make sure that the characters immediately before and after the word are not letters or numbers; see https://www.regular-expressions.info/lookaround.html. The term [a-zA-Z0-9] is a character set that matches a single character in any of the ranges a-z, A-Z, and 0-9. Look up the Python re module to find out more about regular expressions.
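To see how the lookarounds behave at the edges, here is a small sketch that runs the same pattern against three inputs: the word at the very start of the string, the word after a comma, and the word buried inside a longer word. The helper name word_pattern is hypothetical, and the re.escape call is an assumption added so that words containing regex metacharacters are matched literally.

```python
import re


def word_pattern(word):
    # re.escape keeps characters such as "." or "+" in the word literal.
    return re.compile("(?<![a-zA-Z0-9])" + re.escape(word) + "(?![a-zA-Z0-9])")


pattern = word_pattern("here")

# Nothing precedes the word, so the negative lookbehind passes.
first_word = pattern.search("here is a fire")
# "," is not alphanumeric, so the word after the comma matches.
after_comma = pattern.search("There is a fire,here")
# "t" before and "s" after are letters, so there is no match.
inside_word = pattern.search("theres")

print(first_word.start(), after_comma.start(), inside_word)  # 0 16 None
```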

2 Comments

Question: if the string were "There is a fire here, and here as well!", I notice it returns only the first location. Is there any way to get both locations?
stackoverflow.com/questions/4664850/… Try the finditer function, then iterate over the result and call m.start() on each match.
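As a rough sketch of that suggestion, the same lookaround pattern can be passed to re.finditer to collect every start index. The function name find_word_starts is hypothetical, and re.escape is an added assumption so the word is treated literally.

```python
import re


def find_word_starts(word, string):
    """Return the start index of every whole-word occurrence of word in string."""
    pattern = "(?<![a-zA-Z0-9])" + re.escape(word) + "(?![a-zA-Z0-9])"
    # finditer yields one match object per non-overlapping occurrence.
    return [m.start() for m in re.finditer(pattern, string)]


print(find_word_starts("here", "There is a fire here, and here as well!"))  # [16, 26]
```

An empty list means the word does not occur, which avoids a separate sentinel value such as -1.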
