6
votes
What is the optimal way to perform 5000 unique string replace functions in terms of performance?
The regular expression libraries in some languages let you specify a function to determine the replacement instead of a single string. That, combined with a hash of your strings and their ...
6
votes
What algorithm would you best use for string similarity?
You ask about string similarity algorithms but your strings are addresses. I would submit the addresses to a location API such as Google Place Search and use the formatted_address as a point of ...
5
votes
Efficient multiple substrings search
Create a regular expression from your list (something like "word1|word2|word3") and use the regular expression functions available for your language. It will hopefully create a data structure ...
4
votes
Accepted
Find a string in list of strings
Your binary search is probably good enough. Alternative approaches will lead to a lot more complexity, for likely little gain.
There are a couple of optimization approaches that you can try:
A trie ...
4
votes
What is the optimal way to perform 5000 unique string replace functions in terms of performance?
This seems like something which can be efficiently addressed by building a Trie out of all the strings that need to be replaced.
Then in one pass, go through the string to be replaced on, character ...
4
votes
Data structure for grouping strings in a collection when they share common substrings
Hmm depends on what your goal is.
On the one hand your description is for a data structure that sort of functions like a dictionary except each word references other words that are partials of itself, ...
3
votes
Are there typo-tolerance algorithms (as opposed to string similarity)?
I don't think there is a single algorithm that's going to do what you want.
For point 1 (transpositions) you can use Damerau-Levenshtein Distance, or the simpler Optimal String Alignment distance (...
2
votes
What is the optimal way to perform 5000 unique string replace functions in terms of performance?
Your idea of using regexes and a hash table is probably ideal. It is likely much better to perform all replacements in a single pass instead of using multiple replacement passes, because:
correctness: ...
2
votes
Algorithm for optimizing text compression
Text compression algorithms
Word based compression
You already explained yourself the basic principles of a dictionary compression that works by word. However interesting article that you referred ...
1
vote
Data Matching In VBA - Best way to deal with dynamic data and user entry?
Your approach looks pretty reasonable to me.
You can store your unused row data in a 2nd worksheet.
For your Update logic, you might try
for each row on the main worksheet:
copy entire row to ...
1
vote
Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?
As others have said, you must first define "distance". Once you have done so, however, standard approaches can be used. I have implemented Levenshtein this way--most changes were counted ...
1
vote
Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?
First: Define what the distance between two characters is. For example, p and b, g and k, d and t Are similar. A and e are reasonably similar. If I had software converting speech to text, and the ...
1
vote
Are there typo-tolerance algorithms (as opposed to string similarity)?
There are a couple of string similarity algorithms that could be useful in an editor for detecting typos. The trick is to use the current position, find the start of the current token by moving back ...
1
vote
What is the optimal way to perform 5000 unique string replace functions in terms of performance?
Interesting problem!
I am sure that someone can and will come up with a better solution, but here are my initial thoughts:
Your biggest bottleneck will be the continual reallocation of memory.
...
1
vote
Why try to match for nth negatives, as opposed to matching lesser positives?
This is a matter of what you want the default to be.
If you want to allow by default, you need to list the blocked ones (blacklist). If you want to deny by default, you need to list the allowed ones (...
Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
string-matching × 39algorithms × 24
strings × 17
data-structures × 5
search × 4
regular-expressions × 4
java × 3
c# × 2
python × 2
data × 2
sequence × 2
database × 1
javascript × 1
c × 1
coding-standards × 1
optimization × 1
functions × 1
wpf × 1
complexity × 1
big-o × 1
null × 1
conventions × 1
comparison × 1
problem-solving × 1
numbers × 1