Skip to main content
6 votes

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

The regular expression libraries in some languages let you specify a function to determine the replacement instead of a single string. That, combined with a hash of your strings and their ...
Blrfl's user avatar
  • 20.5k
6 votes

What algorithm would you best use for string similarity?

You ask about string similarity algorithms but your strings are addresses. I would submit the addresses to a location API such as Google Place Search and use the formatted_address as a point of ...
Dan Wilson's user avatar
  • 3,163
5 votes

Efficient multiple substrings search

Create a regular expression from your list (something like "word1|word2|word3") and use the regular expression functions available for your language. It will hopefully create a data structure ...
Hans-Martin Mosner's user avatar
4 votes
Accepted

Find a string in list of strings

Your binary search is probably good enough. Alternative approaches will lead to a lot more complexity, for likely little gain. There are a couple of optimization approaches that you can try: A trie ...
amon's user avatar
  • 136k
4 votes

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

This seems like something which can be efficiently addressed by building a Trie out of all the strings that need to be replaced. Then in one pass, go through the string to be replaced on, character ...
hocho's user avatar
  • 193
4 votes

Data structure for grouping strings in a collection when they share common substrings

Hmm depends on what your goal is. On the one hand your description is for a data structure that sort of functions like a dictionary except each word references other words that are partials of itself, ...
Kain0_0's user avatar
  • 16.6k
3 votes

Are there typo-tolerance algorithms (as opposed to string similarity)?

I don't think there is a single algorithm that's going to do what you want. For point 1 (transpositions) you can use Damerau-Levenshtein Distance, or the simpler Optimal String Alignment distance (...
rghome's user avatar
  • 688
2 votes

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Your idea of using regexes and a hash table is probably ideal. It is likely much better to perform all replacements in a single pass instead of using multiple replacement passes, because: correctness: ...
amon's user avatar
  • 136k
2 votes

Algorithm for optimizing text compression

Text compression algorithms Word based compression You already explained yourself the basic principles of a dictionary compression that works by word. However interesting article that you referred ...
Christophe's user avatar
  • 82.3k
1 vote

Data Matching In VBA - Best way to deal with dynamic data and user entry?

Your approach looks pretty reasonable to me. You can store your unused row data in a 2nd worksheet. For your Update logic, you might try for each row on the main worksheet: copy entire row to ...
Dan Pichelman's user avatar
1 vote

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

As others have said, you must first define "distance". Once you have done so, however, standard approaches can be used. I have implemented Levenshtein this way--most changes were counted ...
Loren Pechtel's user avatar
1 vote

Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?

First: Define what the distance between two characters is. For example, p and b, g and k, d and t Are similar. A and e are reasonably similar. If I had software converting speech to text, and the ...
gnasher729's user avatar
  • 49.4k
1 vote

Are there typo-tolerance algorithms (as opposed to string similarity)?

There are a couple of string similarity algorithms that could be useful in an editor for detecting typos. The trick is to use the current position, find the start of the current token by moving back ...
Christophe's user avatar
  • 82.3k
1 vote

What is the optimal way to perform 5000 unique string replace functions in terms of performance?

Interesting problem! I am sure that someone can and will come up with a better solution, but here are my initial thoughts: Your biggest bottleneck will be the continual reallocation of memory. ...
Aphton's user avatar
  • 98
1 vote

Why try to match for nth negatives, as opposed to matching lesser positives?

This is a matter of what you want the default to be. If you want to allow by default, you need to list the blocked ones (blacklist). If you want to deny by default, you need to list the allowed ones (...
Jan Hudec's user avatar
  • 18.5k

Only top scored, non community-wiki answers of a minimum length are eligible