Hottest 'string-matching' Answers - Software Engineering Stack Exchange

6 votes

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

The regular expression libraries in some languages let you specify a function to determine the replacement instead of a single string. That, combined with a hash of your strings and their ...

Blrfl

20.5k

answered Jul 14, 2018 at 23:13

6 votes

What algorithm would you best use for string similarity?

You ask about string similarity algorithms but your strings are addresses. I would submit the addresses to a location API such as Google Place Search and use the formatted_address as a point of ...

Dan Wilson

3,163

answered Sep 11, 2018 at 19:59

5 votes

Efficient multiple substrings search

Create a regular expression from your list (something like "word1|word2|word3") and use the regular expression functions available for your language. It will hopefully create a data structure ...

Hans-Martin Mosner

18.6k

answered Apr 15, 2018 at 8:40

4 votes

Accepted

Find a string in list of strings

Your binary search is probably good enough. Alternative approaches will lead to a lot more complexity, for likely little gain. There are a couple of optimization approaches that you can try: A trie ...

amon

136k

answered Aug 3, 2018 at 21:07

4 votes

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

This seems like something which can be efficiently addressed by building a Trie out of all the strings that need to be replaced. Then in one pass, go through the string to be replaced on, character ...

hocho

193

answered Jul 15, 2018 at 5:28

4 votes

Data structure for grouping strings in a collection when they share common substrings

Hmm depends on what your goal is. On the one hand your description is for a data structure that sort of functions like a dictionary except each word references other words that are partials of itself, ...

Kain0_0

16.6k

answered Feb 8 at 11:13

3 votes

Are there typo-tolerance algorithms (as opposed to string similarity)?

I don't think there is a single algorithm that's going to do what you want. For point 1 (transpositions) you can use Damerau-Levenshtein Distance, or the simpler Optimal String Alignment distance (...

rghome

688

answered May 23, 2022 at 14:20

2 votes

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Your idea of using regexes and a hash table is probably ideal. It is likely much better to perform all replacements in a single pass instead of using multiple replacement passes, because: correctness: ...

amon

136k

answered Jul 14, 2018 at 21:36

2 votes

Algorithm for optimizing text compression

Text compression algorithms Word based compression You already explained yourself the basic principles of a dictionary compression that works by word. However interesting article that you referred ...

Christophe

82.3k

answered Aug 23, 2018 at 18:26

1 vote

Data Matching In VBA - Best way to deal with dynamic data and user entry?

Your approach looks pretty reasonable to me. You can store your unused row data in a 2nd worksheet. For your Update logic, you might try for each row on the main worksheet: copy entire row to ...

Dan Pichelman

13.9k

answered Jan 25, 2019 at 17:56

1 vote

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

As others have said, you must first define "distance". Once you have done so, however, standard approaches can be used. I have implemented Levenshtein this way--most changes were counted ...

Loren Pechtel

3,441

answered Sep 26, 2022 at 2:02

1 vote

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

First: Define what the distance between two characters is. For example, p and b, g and k, d and t Are similar. A and e are reasonably similar. If I had software converting speech to text, and the ...

gnasher729

49.4k

answered Sep 22, 2022 at 21:21

1 vote

Are there typo-tolerance algorithms (as opposed to string similarity)?

There are a couple of string similarity algorithms that could be useful in an editor for detecting typos. The trick is to use the current position, find the start of the current token by moving back ...

Christophe

82.3k

answered May 23, 2022 at 19:20

1 vote

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Interesting problem! I am sure that someone can and will come up with a better solution, but here are my initial thoughts: Your biggest bottleneck will be the continual reallocation of memory. ...

Aphton

98

answered Jul 14, 2018 at 21:39

1 vote

Why try to match for nth negatives, as opposed to matching lesser positives?

This is a matter of what you want the default to be. If you want to allow by default, you need to list the blocked ones (blacklist). If you want to deny by default, you need to list the allowed ones (...

Jan Hudec

18.5k

answered Dec 4, 2017 at 10:27

Stack Exchange Network

Tag Info

Hot answers tagged string-matching

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

What algorithm would you best use for string similarity?

Efficient multiple substrings search

Find a string in list of strings

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Data structure for grouping strings in a collection when they share common substrings

Are there typo-tolerance algorithms (as opposed to string similarity)?

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Algorithm for optimizing text compression

Data Matching In VBA - Best way to deal with dynamic data and user entry?

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

Are there typo-tolerance algorithms (as opposed to string similarity)?

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Why try to match for nth negatives, as opposed to matching lesser positives?

Tag Info

Hot answers tagged string-matching

Related Tags