Fastest way to build string feature vectors Python

Ask Question

Asked 7 years, 10 months ago

Modified 7 years, 10 months ago

Viewed 249 times

I have a list of strings to search along with a list of words and want to produce boolean feature vectors indicating presence of each word within the strings. What is the fastest way to build these feature vectors in Python?

In the example below the first string would output [1, 0, 1, 1] for example. I'm currently using the Aho Corasick algorithm (https://stackoverflow.com/questions/44310516/python-find-occurrences-of-list-of-strings-within-string) to search each string in a list comprehension but think it's possible there's a faster way. The code below uses this method and averages the time over 10 runs.

import time
import numpy as np
import ahocorasick

def check_strings(A, search_list, string_to_search):
    """Use Aho Corasick algorithm to produce boolean list indicating
    prescence of strings within a longer string"""
    index_list = []
    for item in A.iter(string_to_search):
        index_list.append(item[1][0])

    output_list = np.array([0] * len(search_list))
    output_list[index_list] = 1
    return output_list.tolist()


word_list = ["foo", "bar", "hello", "world"]
strings_to_check = ["hello world foo", "foo bar", "bar world"]

A = ahocorasick.Automaton()
for idx, s in enumerate(word_list):
    A.add_word(s, (idx, s))
A.make_automaton()

run_times = []
for i in range(10):
    t0 = time.time()
    feature_vectors = [check_strings(A, word_list, s) for s in strings_to_check]
    run_times.append(time.time()-t0)

print(feature_vectors)
print(np.mean(np.array(run_times)))

Output is:

[[1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
1.65939331055e-05

Looking for a more efficient way of doing this, I thought it might be possible somehow by concatenating the strings to avoid the loop but can't quite figure out how to get it to work.

asked Jun 25, 2017 at 11:08

Danny Friar

1311 bronze badge

Add a comment |

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Stack Exchange Network

Fastest way to build string feature vectors Python

0

Hot Network Questions

Fastest way to build string feature vectors Python

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Related

Hot Network Questions