I have a huge directory list of URLs from my Web site. Example:

/folder/folder2/folder3/page.htm
/folder/folder2/folder3/page2.htm
/folder/folder2/folder3/page3.htm
/folder/folder2/folder3/page4.htm

I want to remove every entry that has /folder2/ in its path. I need a regular expression for find and replace that matches each line containing /folder2/, so I can replace it with nothing and delete those lines from my list.

Does anyone know what the proper regular expression for this would be? I should specify that I am using Dreamweaver as my editor, which may use a different regular expression syntax.

2 Answers

This expression matches any entire line that contains the string "/folder2/":

^.+?\/folder2/.+$
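
As a quick sanity check, the pattern can be tried against the sample paths from the question. This is only a sketch using Python's re module, not Dreamweaver's engine, so it shows what the expression does in one common regex flavor rather than what Dreamweaver will do. The third path is a hypothetical entry added for illustration.

```python
import re

# The pattern from this answer, copied verbatim.
pattern = re.compile(r"^.+?\/folder2/.+$")

# Sample paths from the question, plus one hypothetical path without /folder2/.
lines = [
    "/folder/folder2/folder3/page.htm",
    "/folder/folder2/folder3/page2.htm",
    "/folder/other/page.htm",  # hypothetical, not in the original list
]

# Keep only the lines the pattern does not match.
kept = [line for line in lines if not pattern.match(line)]
print(kept)  # ['/folder/other/page.htm']
```

Note that the leading .+? requires at least one character before /folder2/, so a path that starts with /folder2/ at the root would not be matched by this pattern.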

HTH.

4 Comments

Hi - this didn't work in Dreamweaver. Sorry, I should have specified that this is the program I am using. Maybe the regular expressions are different in Dreamweaver?
Hmmm...shouldn't be. This is plain-vanilla regex. Maybe it doesn't like my second forward-slash, which is unescaped...try this: ^.+\/folder2\/.+$.
Nothing with that either. I have the "Use regular expressions" box checked as well, so it's just not finding any matches.
Simplified even further: .+/folder2/.+
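
For reference, here is how a find-and-replace-with-nothing over the whole list might look outside Dreamweaver. This is a minimal sketch in Python's re module; it assumes the list is held in one multi-line string, and whether Dreamweaver's engine treats ^ and line breaks the same way is not something this example can confirm. The middle path is a hypothetical entry added for illustration.

```python
import re

# The list as one block of text, the way an editor's find/replace sees it.
text = (
    "/folder/folder2/folder3/page.htm\n"
    "/folder/other/page.htm\n"  # hypothetical, not in the original list
    "/folder/folder2/folder3/page2.htm\n"
)

# With re.MULTILINE, ^ anchors at the start of every line.
# The trailing \r?\n? also consumes the line break, so no empty line is left behind.
line_with_folder2 = re.compile(r"^.*/folder2/.*\r?\n?", re.MULTILINE)

# Replace every matching line with nothing.
cleaned = line_with_folder2.sub("", text)
print(cleaned)
```

Using .* instead of .+ on both sides means a path that begins or ends with /folder2/ is removed as well.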

In Python that would be:

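import re

# Matches any line that contains "/folder2/".
regex = re.compile(r'.*/folder2/.*')

# Copy every line that does not match into the output file.
# An explicit loop is used because map() is lazy in Python 3 and would write nothing.
with open("input.txt") as src, open("filtered_file.txt", "w") as dst:
    for line in src:
        if not regex.match(line):
            dst.write(line)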
