Confusion with Linux find regex

Question

I'm getting quite confused with Linux find command's regular expression usage.

I'm aware that there is an option -regextype, but without that, according to the current man page, it is supposed to use Emacs regular expressions. This page seems to say that character classes are supported ("this is a POSIX feature"), but my experiments seem to show that nothing like [[:ascii:]] or [[:digit:]] or [[:alnum:]] ever works, quite apart from the fact that these are truly archaic ways of handling character classes. Instead you seem to have to use [a-zA-Z] which, apart from anything else, is useless for Unicode characters.

So I turned to regextype: I find that you get a list of possible settings by going find -regextype help. This gives:

find: Unknown regular expression type ‘help’; valid types are ‘findutils-default’, ‘awk’, ‘egrep’, ‘ed’, ‘emacs’, ‘gnu-awk’, ‘grep’, ‘posix-awk’, ‘posix-basic’, ‘posix-egrep’, ‘posix-extended’, ‘posix-minimal-basic’, ‘sed’.

... so I assumed that by including -regextype posix-basic, for example, I'd be able to run something like this:

find . -maxdepth 1 -regextype posix-basic -regex .*\d.*

This produces results, but not the ones I was hoping for: all the files and folders in the current directory with the lower-case letter "d" in their names! I was expecting all names with at least one digit.

I've looked at quite a lot of Linux find regex questions here on Stack Exchange, but I don't think I've seen a single one where "modern" character class handling is demonstrated. Is any of the regextype options able to handle something like this:

find . -maxdepth 1 -regextype ??? -regex '.*\d{3}\s+.*'

where I mean "contains three digits followed by one or more empty space characters". I.e. something like regex rules from a normal language like Java, Python, Javascript, etc...?

later, following comments

Here's an exercise: make a directory and put a few files into it with random names. Then add files with the following names: 'ctb117b', 'ctb117c', 'trt117a'.

I then want to isolate the '117' files. There may be files called 'xxx0009333qqq'. So using a modern regex engine I'd go like this, for example (allowing for the preceding ./):

find . -regex './\w{3}\d{3}.*'

Using these more venerable Linux regex rules, what do I put that works?

find . -regextype posix-basic -regex '.*[[:digit:]]{3}.*'

produces nothing. Nor does '.*[[:digit:]]+.*', for example. If anyone's sufficiently interested, please show me something which works for you (lists the above files).

\d is a Perl-like expression, also supported by some GNU tools as a short way of writing what would be written as [[:digit:]] in a POSIX expression. Same for \s ([[:blank:]]). The {3} modifier is a POSIX extended expression modifier.
– Kusalananda ♦, Commented Jan 7, 2020 at 20:34
When using *, be sure to quote it. find . -maxdepth 1 -regextype posix-basic -regex .*\d.* probably gave you unexpected results because *\d.* matched something in the current directory, so the shell expanded it before find ever saw your regex.
– Andy Dalton, Commented Jan 7, 2020 at 20:35
[[:digit:]] works with all posix-* -regextype, and also with other (egrep, etc).
– user313992, Commented Jan 7, 2020 at 20:36
You could use -name '*[[:digit:]][[:digit:]][[:digit:]][[:blank:]]*'
– Kusalananda ♦, Commented Jan 7, 2020 at 20:36
@AndyDalton I just tried that command with first single quotes and then double quotes. Both returned the same (wrong) results.
– mike rodent, Commented Jan 7, 2020 at 20:38

Stéphane Chazelas · Accepted Answer · 2025-11-11 06:41:22Z

With GNU find, I would recommend using this:

find . -maxdepth 1 -regextype posix-extended -regex '.*[[:digit:]]{3}[[:space:]].*'

On BSDs, you'd use the -E option (like for grep or sed) instead of the -regextype posix-extended predicate:

find -E . -maxdepth 1 -regex '.*[[:digit:]]{3}[[:space:]].*'
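For the exercise in the question (no spaces in the names), a minimal sketch along the same lines, assuming GNU find. The scratch directory and the extra file 'notes' are made up for the demo, and [[:alnum:]_] is used as a POSIX stand-in for \w:

```shell
# Scratch directory holding the names from the question's exercise.
dir=$(mktemp -d)
touch "$dir/ctb117b" "$dir/ctb117c" "$dir/trt117a" "$dir/xxx0009333qqq" "$dir/notes"

# Direct translation of \w{3}\d{3}.* : three word characters, three digits,
# then anything. -regex must match the whole path, hence the leading '.*/'.
loose=$(cd "$dir" && find . -regextype posix-extended \
    -regex '.*/[[:alnum:]_]{3}[[:digit:]]{3}[^/]*' | LC_ALL=C sort)

# Same, but the three digits must be followed by a non-digit, which
# leaves out xxx0009333qqq.
exact=$(cd "$dir" && find . -regextype posix-extended \
    -regex '.*/[[:alnum:]_]{3}[[:digit:]]{3}[^[:digit:]/][^/]*' | LC_ALL=C sort)

printf 'loose:\n%s\n' "$loose"
printf 'exact:\n%s\n' "$exact"

rm -rf "$dir"
```

The first pattern also lists xxx0009333qqq, since its first six characters fit; the second one is only one possible way to require exactly three digits.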

GNU find, including with posix-extended regextype, also supports \s¹ as an equivalent of POSIX [[:space:]], but not \d as an equivalent of POSIX [[:digit:]].

Note that the + in your perl-like regexp .*\d{3}\s+.* would be redundant: one or more (+, short for {1,}) whitespace characters (\s) followed by any number of (*) characters (.) matches the same strings as one whitespace character followed by any number of characters (\s.*).
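A small sketch to check that redundancy, assuming GNU find; the file names below are invented for the test:

```shell
# Scratch directory with names that do and do not contain
# three digits followed by whitespace.
dir=$(mktemp -d)
touch "$dir/abc123 x" "$dir/abc123   y" "$dir/abc123z" "$dir/12 3"

# With the + quantifier on the whitespace class.
with_plus=$(cd "$dir" && find . -regextype posix-extended \
    -regex '.*[[:digit:]]{3}[[:space:]]+.*' | LC_ALL=C sort)

# Without it: the trailing .* absorbs any further whitespace.
without_plus=$(cd "$dir" && find . -regextype posix-extended \
    -regex '.*[[:digit:]]{3}[[:space:]].*' | LC_ALL=C sort)

printf 'with +:\n%s\n' "$with_plus"
printf 'without +:\n%s\n' "$without_plus"

rm -rf "$dir"
```

Both commands should list the same two files, the ones with one space and with three spaces after the digits.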

¹ Like in perl, which itself possibly got it in 2.0 in 1988 from Henry Spencer's library. In emacs, \s is for something else (and was long before perl or Spencer's regexps), but while GNU find's default regexps are emacs-like, they are not the same as in emacs.

edited Nov 11 at 6:41

Stéphane Chazelas

answered Jan 7, 2020 at 20:42

Philippe

Thanks. I tried it. Firstly it produced no results, but that wouldn't be surprising, as there are no spaces in the names of the files (see suggested exercise). If I drop the \s+ things get listed. But explain this: if I put '.*[[:digit:]]{80}.*' it returns exactly the same files! It completely ignores the "number of matches" directive!
– mike rodent, Commented Jan 7, 2020 at 20:59
In your post, you said "contains three digits followed by one or more empty space characters". Are you sure you get same files when you put '.*[[:digit:]]{80}.*' and -regextype posix-extended ?
– Philippe, Commented Jan 7, 2020 at 21:02
Sorry, no... I've got that wrong. The {80} directive is working as expected. Apologies. By the way, are you saying that backslash character classes work with posix-extended? So should "\w" work? I find that it does... but './\w{3}\d{3}.*' doesn't... I must find the right documentation for posix-extended obviously!
– mike rodent, Commented Jan 7, 2020 at 21:07
\w does not work with posix-extended
– Philippe, Commented Jan 7, 2020 at 21:09
Nitpick: \w, \d, \s etc. are not POSIX extended regular expression things, but GNU's implementation of POSIX extended regular expressions includes them (for convenience). They are originally from Perl.
– Kusalananda ♦, Commented Jan 7, 2020 at 21:23


Stéphane Chazelas · Accepted Answer · 2025-11-11 06:49:56Z

I believe the basic mistake in all the attempts is overlooking that the -regex primary checks the entire path, not only the final file name as -name does, and that it is implicitly anchored at the start and end (as if there were a hidden ^ at the start and $ at the end of the regex).

Therefore, if you want to check on the last part (component) of the path, you need to prepend .*/ to your search pattern, so that you match any parent path for the file name you want to find.

In other words:

find . -name 'filename'

is equivalent to:

find . -regextype posix-extended -regex '(.*/)?filename' # GNU
find -E . -regex '(.*/)?filename' # BSD

(with the caveat that that .* may fail to match the leading part to the filename if it can't be decoded as valid text in the user's locale).

And for the equivalent of something like -name 'foo*.bar', you'd need (again with extended regex): -regex '(.*/)?foo[^/]*\.bar'.
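To see that equivalence in practice, here is a rough sketch assuming GNU find; the directory layout and file names are invented for the comparison:

```shell
# Scratch tree with matching and non-matching names at two depths.
dir=$(mktemp -d)
mkdir -p "$dir/sub/foo.bar.d"
touch "$dir/foo1.bar" "$dir/sub/foo.bar" "$dir/sub/xfoo.bar" \
      "$dir/sub/foo.bar.d/other"

# -name only looks at the last path component.
by_name=$(cd "$dir" && find . -name 'foo*.bar' | LC_ALL=C sort)

# -regex sees the whole path, so '(.*/)?' soaks up the parent
# directories and [^/]* keeps the match inside one component.
by_regex=$(cd "$dir" && find . -regextype posix-extended \
    -regex '(.*/)?foo[^/]*\.bar' | LC_ALL=C sort)

printf 'by -name:\n%s\n' "$by_name"
printf 'by -regex:\n%s\n' "$by_regex"

rm -rf "$dir"
```

Both variables should end up holding ./foo1.bar and ./sub/foo.bar; xfoo.bar and the foo.bar.d directory are rejected by both forms.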
