1

I have a blacklist of banned substrings. I need an if statement that checks whether ANY of the banned substrings is contained in a given URL. If the URL contains none of them, I want to do A. If it contains at least one of them, I want to do B. Either way, the action should run only once per URL, not once for each banned substring.

black_list = ['linkedin.com', 'yellowpages.com', 'facebook.com', 'bizapedia.com', 'manta.com',
              'yelp.com', 'nextdoor.com', 'industrynet.com', 'twitter.com', 'zoominfo.com', 
              'google.com', 'yellow-listings.com', 'kompass.com', 'dnb.com', 'tripadvisor.com']

Here are two simple example URLs that I'm using to check whether it works. url1 contains a banned substring, while url2 doesn't.

url1 = 'https://www.dnb.com/'
url2 = 'https://www.ok/'

I tried the code below, which works, but I was wondering whether there is a better (more computationally efficient) way of doing it. I have a data frame of 100k+ URLs, so I'm worried that this will be very slow.

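url = url1  # the URL under test; swap in url2 to hit the other branch

mask = []
for banned in black_list:
    # record, for every banned substring, whether it occurs in the URL
    if banned in url:
        mask.append(True)
    else:
        mask.append(False)

if any(mask):
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")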

Does anybody know a more efficient way of doing this?

1
  • I'm afraid the proposed solutions are not very efficient when black_list is huge. They have a time complexity of O(mn), where m and n are the sizes of black_list and the URL set. I think that with proper preprocessing it should be possible to reduce this to O(n). The only method I can come up with is to use re, but I'm not sure whether it delivers this improvement. Commented Feb 25, 2023 at 9:56
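For reference, the re idea from the comment above could look roughly like the sketch below. It is only an illustration, not a measured improvement: the shortened black_list, the helper name is_banned, and the urls list are assumptions made for the example. Note that Python's re is a backtracking engine, so a plain alternation is not guaranteed to bring the cost down to O(n); it mainly moves the inner loop out of Python code, and whether that beats the any() loop is something to time on real data.

```python
import re

# Shortened copy of the blacklist from the question (illustration only).
black_list = ['linkedin.com', 'dnb.com', 'yelp.com']

# Escape every entry so that '.' matches a literal dot, then join the
# entries into a single alternation and compile it once up front.
banned_re = re.compile('|'.join(re.escape(b) for b in black_list))

def is_banned(url):
    """Return True if url contains at least one banned substring."""
    return banned_re.search(url) is not None

# Hypothetical sample URLs, taken from url1 and url2 in the question.
urls = ['https://www.dnb.com/', 'https://www.ok/']
mask = [is_banned(u) for u in urls]
print(mask)  # [True, False]
```

The pattern is compiled once and reused for every URL, so the per-URL work is a single search call.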

2 Answers

2

Here is a possible one-line solution:

print('there is a banned substring inside'
      if any(banned_str in url for banned_str in black_list)
      else 'no banned substrings inside')

If you prefer a less pythonic approach:

if any(banned_str in url for banned_str in black_list):
    print('there is a banned substring inside')
else:
    print('no banned substrings inside')
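To run the same check over many URLs, the any() test can be wrapped in a small function. The sketch below is just one way to do it: the name contains_banned, the shortened black_list, and the urls list are made up for the example. Because any() stops at the first match, each URL is flagged at most once no matter how many banned substrings it contains.

```python
# Shortened copy of the blacklist from the question (illustration only).
black_list = ['linkedin.com', 'dnb.com', 'yelp.com']

def contains_banned(url, banned=black_list):
    """Return True if any banned substring occurs in url.

    any() short-circuits, so the scan stops at the first match.
    """
    return any(b in url for b in banned)

# Hypothetical sample URLs, taken from url1 and url2 in the question.
urls = ['https://www.dnb.com/', 'https://www.ok/']
flags = [contains_banned(u) for u in urls]

for u, flagged in zip(urls, flags):
    # Exactly one branch runs per URL: B if banned, A otherwise.
    print(u, '-> B (banned)' if flagged else '-> A (clean)')
```

If the URLs live in a data frame column, the same function could be applied to that column to build a boolean mask.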

0

You should set a flag while looping and then perform either A or B depending on its value.

ban_flag = False
for banned in black_list:
    if banned in url:
        ban_flag = True
        break  # one match is enough, so skip the remaining entries
if ban_flag:
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")
