5

I'm running a reasonably up-to-date Msys2:

$ uname -a; bash -version
MSYS_NT-10.0-26100 md2syz7c 3.6.6-2369286a.x86_64 2026-01-14 12:29 UTC x86_64 Msys
GNU bash, version 5.3.9(1)-release (x86_64-pc-cygwin)

I'm confronted with a third-party script which tries to match a string which may be empty. The simplified version is:

[[ 0 =~ ^(0|)$ ]] && echo match

The goal is to match a single zero or the empty string (note the ^ and $ anchors; without them an empty alternative would match anything). This works as expected under old and new GNU/Linux systems (e.g. GNU bash, version 4.4.12(1)-release (i686-pc-linux-gnu), or GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu)), as well as under Cygwin (CYGWIN_NT-10.0-26100 md2syz7c 3.4.6-1.x86_64 2023-02-14 13:23 UTC x86_64 Cygwin).

However, it does not work with the recent Msys2 with the specs mentioned at the top:

$ [[ 0 =~ ^(0|)$ ]] && echo match
-bash: [[: invalid regular expression `^(0|)$': empty (sub)expression

Question: Should it work? (Is this a legit regular expression?) If yes, is this a known bug in Msys2's bash?

4
  • Always paste your script into https://shellcheck.net, a syntax checker, or install shellcheck locally. Make using shellcheck part of your development process. Commented 2 days ago
  • 2
    @waltinator Thanks for the pointer; shellcheck finds no issue though. Commented 2 days ago
  • 1
    Is this for just a one-off or occasional regex match in your script? If so, using bash regex matches is fine. If not, if you're using it in, say, a while read loop or a for loop, then you really should rewrite that section of the script to use sed or awk or perl. If regex matches are the majority of the script then rewrite the whole thing in either awk or perl, optionally with a shell wrapper. Use shell for what it's good at, and use other tools for what they're good at. Commented yesterday
  • @cas The real expression is used to check for any of a fixed number of keywords with the same meaning ("n", "no", "none", "0", "false" or, which creates the complication, an empty string ) that may be present in a variable, once, in the beginning of the script. I think the use case is totally fine. Commented yesterday

1 Answer

9

That's not standard extended regular expression syntax.

bash defers regexp evaluation to the system's regcomp()/regexec() regexp interface, so all you can count on portably is the common denominator, which should be the POSIX standard (though not yet its latest (2024) edition, as the new perl-like *? and related operators have not made it into many implementations).

Quoting the standard (emphasis mine).

|
The <vertical-line> is special except when used in a bracket expression (see RE Bracket Expression). A <vertical-line> appearing first or last in an ERE, or immediately following a <vertical-line> or a <left-parenthesis>, or immediately preceding a <right-parenthesis>, produces undefined results.

If you had to use regexp, you'd use:

[[ $var =~ ^0?$ ]]

but you might as well use ksh-style glob operators (which are enabled by default inside [[...]] in later versions of bash regardless of the setting of the extglob option):

[[ $var = ?(0) ]]

Contrary to regexp matching, bash does its glob pattern matching internally. It may defer to the system's fnmatch() for some of it (such as for things like [[=equiv=][.collating-element.]]), though not for ?(0) here.
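To compare the two bash forms side by side, here is a minimal sketch. The function names match_regex and match_glob are made up for illustration, and the shopt line is an assumption for older bash versions where extended globs are not implied inside [[...]].

```shell
#!/usr/bin/env bash
# Harmless on recent bash; older versions need it for ?(...) inside [[ ]].
shopt -s extglob

# Hypothetical helpers: both succeed only for the empty string or a single "0".
match_regex() {
  local re='^0?$'   # ERE without an empty alternative
  [[ $1 =~ $re ]]
}

match_glob() {
  [[ $1 = ?(0) ]]   # ksh-style glob: zero or one occurrence of "0"
}

for v in '' 0 00 x; do
  for f in match_regex match_glob; do
    if "$f" "$v"; then result='match'; else result='no match'; fi
    printf '%-11s "%s": %s\n' "$f" "$v" "$result"
  done
done
```

Keeping the regexp in a variable also sidesteps quoting surprises on the right-hand side of =~.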

With POSIX sh syntax:

[ "${var:-0}" = 0 ]

Or more legibly:

case $var in
  (0 | '') ...
esac

Or:

[ -z "$var" ] || [ "$var" = 0 ]
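As a quick sanity check that the three POSIX sh spellings agree, something like the following could be used. The function names are hypothetical, and each takes the value to test as its first argument rather than reading $var.

```shell
#!/bin/sh
# Three POSIX sh spellings of "empty, or exactly one 0".
# Function names are hypothetical, chosen for this sketch only.

by_default() {
  [ "${1:-0}" = 0 ]          # empty (or unset) is replaced by 0 first
}

by_case() {
  case $1 in
    (0 | '') return 0 ;;
    (*)      return 1 ;;
  esac
}

by_test() {
  [ -z "$1" ] || [ "$1" = 0 ]
}

for v in '' 0 00 no; do
  for f in by_default by_case by_test; do
    if "$f" "$v"; then r=yes; else r=no; fi
    printf '%-10s "%s": %s\n' "$f" "$v" "$r"
  done
done
```

The case form extends most naturally when more literal values need to be accepted, since further alternatives can simply be added to the pattern list.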

See also [ "$var" -eq 0 ], which does a numeric decimal integer¹ comparison, so it would also return true if $var contained 000, -0 or +0; some implementations also allow whitespace around the number.
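A small sketch of how the numeric comparison differs from the string comparisons. The helper name is_numeric_zero is hypothetical; errors for non-numeric input are silenced, and, given the footnote's caveat, this should only be fed trusted values.

```shell
#!/bin/sh
# Hypothetical helper: numeric comparison against zero.
# Non-numeric input makes [ fail with an error, which is discarded here.
# Only pass trusted values: some shells evaluate the operand as arithmetic.
is_numeric_zero() {
  [ "$1" -eq 0 ] 2>/dev/null
}

for v in 0 000 -0 +0 1 abc; do
  if is_numeric_zero "$v"; then r='zero'; else r='not zero'; fi
  printf '"%s": %s\n' "$v" "$r"
done
```

Unlike the string tests, this accepts several spellings of zero, which may or may not be what the script wants.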

For completeness, the zsh glob syntax would actually be (0|). That's the way you do the regexp ? there (though with extendedglob enabled you could also do 0(#c0,1) as equivalent of ERE 0{0,1}).

zsh's =~ uses either the system's extended regexps or PCRE2 (formerly PCRE) if the rematchpcre option is enabled. [[ $var =~ '^(0|)$' ]] will work with PCRE, but with ERE, as with bash, whether it works depends on the system's regexp implementation. Obviously, with rematchpcre, you can use *? and all other PCRE operators.


¹ Except in ksh/mksh, which accept arithmetic expressions there, causing the same kind of vulnerabilities as with the equivalent [[...]] and ((...)) operators.

3
  • 2
    The closest valid (according to those rules) version of the OP's regex would be (^$|^0$). Which does indeed simplify to ^0?$. Is it always the case that an empty part of a | can be replaced by using ?, even with (x|y|)? Probably yes, with arbitrary x and y that should still be equivalent to (x|y)? unless there's any gotchas with capture groups, especially if you need a () around an expression which you wouldn't otherwise. I didn't know POSIX REs had this limitation on empty operand to an | operation. Bash extglob @(x|y|) is sometimes useful, so easy to make this mistake. Commented yesterday
  • 1
    @PeterCordes, note that what you call Bash extglobs are actually ksh globs (from the 80s). zsh's equivalent of ksh's @(x|y|) would be (x|y|), but in ksh, as noted here, you may prefer ?(x|y), of which zsh has no equivalent other than that (x|y|) or (x|y)(#c0,1) (like ksh93's {0,1}(x|y), that one not copied by bash) as I noted. POSIX EREs leave (x|y|) undefined because some implementations didn't allow that. I'd agree it's not particularly useful to reject those. Commented yesterday
  • The pointer to the POSIX standard in particular was convincing ;-). Always best to have a canonical reference. Commented yesterday
