String Searching Algorithms
1 Searching for single strings
1.1 Knuth-Morris-Pratt
kmp-string-contains
kmp-string-contains-ci
kmp-make-matcher
kmp-matcher?
kmp-matcher-ci?
kmp-matcher-pattern
kmp-find-string
kmp-find-all-strings
1.2 Boyer-Moore-Horspool
1.2.1 Strings
bmh-string-contains
bmh-string-contains-ci
bmh-make-matcher
bmh-matcher?
bmh-matcher-ci?
bmh-matcher-pattern
bmh-find-string
bmh-find-all-strings
1.2.2 Byte Strings
bmh-byte-string-contains
bmh-make-byte-matcher
bmh-byte-matcher?
bmh-byte-matcher-pattern
bmh-find-byte-string
bmh-find-all-byte-strings
2 Searching for multiple strings
2.1 Aho-Corasick
ahoc-make-matcher
list->ahoc-matcher
ahoc-matcher?
ahoc-matcher-patterns
ahoc-find-string
ahoc-find-allstrings
8.16

String Searching Algorithms๐Ÿ”—โ„น

    1 Searching for single strings

    2 Searching for multiple strings

 (require string-searchers) package: string-searchers

Provides a variety of string search algorithms written in Typed Racket. They look for sequences of exact code points or bytes, not equivalencies. When using with non-ASCII text, consider normalizing strings first.

1 Searching for single strings๐Ÿ”—โ„น

Functions for searching for a single substring in strings.

1.1 Knuth-Morris-Pratt๐Ÿ”—โ„น

Search for strings using the Knuth-Morris-Pratt algorithm. These functions are also available in the string-searchers/kmp module, without the kmp- prefix.

procedure

(kmp-string-contains haystack    
  needle    
  [start    
  end]) โ†’ (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(kmp-string-contains-ci haystack    
  needle    
  [start    
  end]) โ†’ (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, ignoring case, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(kmp-make-matcher pattern 
  [#:case-insensitive case-insensitive?]) 
 โ†’ kmp-matcher
  pattern : String
  case-insensitive? : Boolean = #f
Compiles a search string into a matcher object, optionally using case-insensitive searching.

procedure

(kmp-matcher? obj) โ†’ Boolean

  obj : Any
Returns true if its argument is a kmp-matcher object.

procedure

(kmp-matcher-ci? m) โ†’ Boolean

  m : kmp-matcher
Tests if a matcher is case-insensitive or not

procedure

(kmp-matcher-pattern m) โ†’ String

  m : kmp-matcher
Returns the search string for this matcher.

procedure

(kmp-find-string m text [start end]) โ†’ (Option Index)

  m : kmp-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
Return the index of the first occurance of the matcherโ€™s search string in text, or #f if not found.

procedure

(kmp-find-all-strings m    
  text    
  [start    
  end    
  #:overlap overlap?]) โ†’ (Listof Index)
  m : kmp-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
  overlap? : Boolean = #t
Return the indexs of all occurances of the matcherโ€™s search string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for "bb" in "bbb" will return '(0 1) when true, (0) when false.

1.2 Boyer-Moore-Horspool๐Ÿ”—โ„น

Search for strings using the Boyer-Moore-Horspool algorithm. These functions are also available in the string-searchers/bmh module, without the bmh- prefix.

1.2.1 Strings๐Ÿ”—โ„น

procedure

(bmh-string-contains haystack    
  needle    
  [start    
  end]) โ†’ (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-string-contains-ci haystack    
  needle    
  [start    
  end]) โ†’ (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, ignoring case, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-make-matcher pattern 
  [#:case-insensitive case-insensitive?]) 
 โ†’ bmh-matcher
  pattern : String
  case-insensitive? : Boolean = #f
Compiles a search string into a matcher object, optionally using case-insensitive searching.

procedure

(bmh-matcher? obj) โ†’ Boolean

  obj : Any
Returns true if its argument is a bmh-matcher object.

procedure

(bmh-matcher-ci? m) โ†’ Boolean

  m : bmh-matcher
Tests if a matcher is case-insensitive or not

procedure

(bmh-matcher-pattern m) โ†’ String

  m : bmh-matcher
Returns the search string for this matcher.

procedure

(bmh-find-string m text [start end]) โ†’ (Option Index)

  m : bmh-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
Return the index of the first occurance of the matcherโ€™s search string in text, or #f if not found.

procedure

(bmh-find-all-strings m    
  text    
  [start    
  end    
  #:overlap overlap?]) โ†’ (Listof Index)
  m : bmh-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
  overlap? : Boolean = #t
Return the indexes of all occurances of the matcherโ€™s search string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for "bb" in "bbb" will return '(0 1) when true, '(0) when false.

1.2.2 Byte Strings๐Ÿ”—โ„น

Note: There are no case-insensitive routines for byte strings.

procedure

(bmh-byte-string-contains haystack    
  needle    
  [start    
  end]) โ†’ (Option Index)
  haystack : Bytes
  needle : Bytes
  start : Index = 0
  end : Index = (bytes-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-make-byte-matcher pattern) โ†’ bmh-byte-matcher

  pattern : Bytes
Return a new matcher that searches for the given byte string.

procedure

(bmh-byte-matcher? obj) โ†’ Boolean

  obj : Any
Returns true if its argument is a bmh-byte-matcher object.

procedure

(bmh-byte-matcher-pattern m) โ†’ Bytes

  m : bmh-byte-matcher
Returns the search byte string for this matcher.

procedure

(bmh-find-byte-string m text [start end]) โ†’ (Option Index)

  m : bmh-byte-matcher
  text : Bytes
  start : Index = 0
  end : Index = (bytes-length text)
Return the index of the first occurance of the matcherโ€™s search byte string in text, or #f if not found.

procedure

(bmh-find-all-byte-strings m    
  text    
  [start    
  end    
  #:overlap overlap?]) โ†’ (Listof Index)
  m : bmh-byte-matcher
  text : Bytes
  start : Index = 0
  end : Index = (bytes-length text)
  overlap? : Boolean = #t
Return the indexes of all occurances of the matcherโ€™s search byte string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for #"bb" in #"bbb" will return '(0 1) when true, '(0) when false.

2 Searching for multiple strings๐Ÿ”—โ„น

Sections for searching for any of multiple different substrings in a string.

2.1 Aho-Corasick๐Ÿ”—โ„น

Search for strings using the Aho-Corasick algorithm. These functions are also available in the string-searchers/ahoc module, without the ahoc- prefix.

procedure

(ahoc-make-matcher s ...) โ†’ ahoc-matcher

  s : String
Create a new Aho-Corasick matcher object that looks for the given string(s).

procedure

(list->ahoc-matcher strings) โ†’ ahoc-matcher

  strings : (Listof String)
Create a new Aho-Corasick matcher object that looks for the string(s) in the given list.

procedure

(ahoc-matcher? obj) โ†’ Boolean

  obj : Any
Tests if its argument is an Aho-Corasick matcher or not.

procedure

(ahoc-matcher-patterns m) โ†’ (Listof String)

  m : ahoc-matcher
Return the search strings this matcher looks for.

procedure

(ahoc-find-string m text) โ†’ (Option (Pair Index String))

  m : ahoc-matcher
  text : String
Returns the first index of a matched string and which string it is, or #f if there are no matches.

procedure

(ahoc-find-allstrings m text) โ†’ (List (Pair Index String))

  m : ahoc-matcher
  text : String
Returns the locations of all matched strings, or an empty list if none match. If one of the search strings is a prefix of another, multiple results can have the same index.