They're not the same thing at all, even after IFS=$'\n'
.
In bash specifically (though that syntax was borrowed from zsh¹):
arr=( $(cmd) )
(arr+=( $(cmd) )
would be used to append elements to the array, so it would be compared with keys=( -1 "${!arr[@]}" ); readarray -tO "$(( ${keys[@]: -1} + 1))" arr < <(cmd)
²).
Does:
- Run
cmd
in a subshell with its stdout open on the writing end of a pipe.
- Simultaneously, the parent shell process reads from the other end of the pipe and:
- removes the NUL characters and trailing newline characters
- splits the resulting string based on the contents of the
$IFS
special variable. For those characters in $IFS
that are whitespace characters such as newline, the behaviour is more complex in that:
- leading and trailing ones are removed (in the case of newline, the trailing ones have been removed by command substitution already as seen above)
- sequences of one or more are treated as one separator. As an example, the output of
printf '\n\n\na\n\n\nb\n\n\n'
is split into two elements only: a
and b
.
- each of these words is then subject to filename generation aka globbing aka pathname expansion, whose behaviour is affected by a number of options including
noglob
, nullglob
, failglob
, extglob
, globasciiranges
, globstar
, nocaseglob
. That applies to those words that contain characters such as *
, ?
, [
, and with some bash versions \
, and more if extglob
is enabled.
- Then the resulting words are assigned as elements to the
$arr
array.
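The splitting step can be observed on its own by switching globbing off for the duration of the assignment. This is only a minimal sketch: the array name `words` is made up for illustration, and `set -o noglob` is used here purely to keep pathname expansion out of the picture.

```bash
#!/usr/bin/env bash
# Sketch: isolate the $IFS-splitting step of arr=( $(cmd) ) from globbing.

IFS=$'\n'        # split on newline only
set -o noglob    # disable pathname expansion so only splitting is observed

# Leading, trailing and repeated newlines collapse, so two words remain.
# "b *" stays in one piece because space is not in $IFS and globbing is off.
words=( $(printf '\n\n\na\n\n\nb *\n\n\n') )

set +o noglob    # re-enable globbing
unset -v IFS     # an unset IFS behaves like the default one

declare -p words
```

With globbing off, the empty lines are still lost, so even then the array does not hold the lines of the output.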
Example:
bash-5.1$ touch x '\x' '?x' aX $'foo\n\n\n\n*'
bash-5.1$ IFS=$'\n'
bash-5.1$ ls | cat
aX
foo



*
?x
\x
x
bash-5.1$ arr=( $(ls) )
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="aX" [3]=$'foo\n\n\n\n*' [4]="?x" [5]="\\x" [6]="x" [7]="?x" [8]="\\x" [9]="\\x" [10]="x")
As you can see, the $'foo\n\n\n\n*'
file name was split into foo
and *
, and *
was expanded to the list of files in the current working directory, which explains why we get both foo
and $'foo\n\n\n\n*'
. The same goes for ?x
, which explains why we get \x
(shown as "\\x"
) 3 times: there's the \x
line in the output of ls
, and it's matched by both *
and ?x
.
With bash 5.0, we get:
bash-5.0$ arr=( $(ls) )
bash-5.0$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="aX" [3]=$'foo\n\n\n\n*' [4]="?x" [5]="\\x" [6]="x" [7]="?x" [8]="\\x" [9]="x" [10]="x")
With \x
only twice but x
three times as in that version, backslash was a globbing operator even when not followed by a globbing operator so \x
as a glob matches x
.
After shopt -s nocaseglob
, we get:
bash-5.1$ shopt -s nocaseglob
bash-5.1$ arr=( $(ls) )
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="aX" [3]=$'foo\n\n\n\n*' [4]="?x" [5]="\\x" [6]="x" [7]="aX" [8]="?x" [9]="\\x" [10]="\\x" [11]="x")
With aX
shown 3 times as it matches ?x
as well.
After shopt -s failglob
:
bash-5.0$ shopt -s failglob
bash-5.0$ arr=( $(printf '\\z\n') )
bash: no match: \z
bash-5.0$ arr=( $(printf 'WTF?') )
bash: no match: WTF?
And arr=( $(echo '/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*') )
runs out of memory after having made your system unusable for several minutes.
So, to sum up, IFS=$'\n'; arr=( $(cmd) )
doesn't store the lines of the output of cmd
in the array, but the filenames resulting from the expansion of the non-empty lines of the output of cmd
which are treated as glob patterns.
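That summary can be checked in a scratch directory. The sketch below is an illustration under assumptions: the directory comes from `mktemp -d`, the file names `a` and `bb` and the array name `globbed` are invented, and the globbing options are reset to bash's defaults first.

```bash
#!/usr/bin/env bash
# Sketch: lines of output are treated as glob patterns by arr=( $(cmd) ).

scratch=$(mktemp -d) || exit
cd -- "$scratch" || exit
touch a bb

shopt -u nullglob failglob nocaseglob   # start from bash's default glob options
IFS=$'\n'

# The command prints four lines: "*", an empty line, "?" and "no match?".
globbed=( $(printf '%s\n' '*' '' '?' 'no match?') )
unset -v IFS

# "*" became a and bb, the empty line vanished, "?" became a,
# and "no match?" matched nothing so it was left as is.
declare -p globbed

cd / && rm -rf -- "$scratch"
```

Four lines went in and four elements came out, but only one of them is an actual line of the output.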
With mapfile
or its less misleading readarray
alias:
readarray -t arr < <(cmd)
- as above runs
cmd
in a subshell with its stdout open on the writing end of a pipe.
- the
<(...)
is expanded to something like /dev/fd/63
or /proc/self/fd/63
where 63
is a file descriptor of the parent shell open on the reading end of that pipe.
- with the
<
redirection short for 0<
, that /dev/fd/63 is opened for reading on fd 0, which means the stdin of readarray
will also be the reading end of that pipe.
readarray
reads each line from that pipe (concurrently with cmd
writing to it), discards the line delimiter (-t
), and stores it (up to the first NUL if it contains any, at least in current versions of bash) in a new element of the $arr
array.
So in the end $arr
, assuming cmd
outputs no NUL, will contain the contents of each line of the output of cmd
, regardless of whether they're empty or not, or of whether they contain glob characters or not.
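A quick way to see that, as a sketch with made-up input (the array name `lines` and the printed strings are only illustrative):

```bash
#!/usr/bin/env bash
# Sketch: readarray -t keeps each line as is, empty or not, glob characters or not.

# Four lines: a glob pattern, an empty line, another pattern, and text with spaces.
readarray -t lines < <(printf '%s\n' '*' '' '?x' 'a  b')

declare -p lines
```

No splitting on `$IFS` and no pathname expansion takes place here, so the current directory and the globbing options have no influence on the result.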
With the files from the earlier ls example:
bash-5.1$ readarray -t arr < <(ls)
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="" [3]="" [4]="" [5]="*" [6]="?x" [7]="\\x" [8]="x")
That's consistent with what we saw in the output of ls | cat
earlier, but that's still wrong if the intention was to get the list of files in the current working directory. The output of ls
cannot be post-processed unless you use some extensions of the GNU implementation of ls
such as --quoting-style=shell-always
or the --zero
of recent versions (9.0 or above):
bash-5.2$ readarray -td '' arr < <(ls --zero)
bash-5.2$ typeset -p arr
declare -a arr=([0]="aX" [1]=$'foo\n\n\n\n*' [2]="?x" [3]="\\x" [4]="x")
This time, readarray
stores the contents of the NUL-delimited records into $arr
. IFS=$'\0'
can't be used in bash
as bash
can't store NULs in its variables.
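The same reading technique works with any producer of NUL-delimited records, not just `ls --zero`. A minimal sketch, assuming a bash whose `readarray` supports `-d`; the `emit_records` function is a hypothetical stand-in for such a producer:

```bash
#!/usr/bin/env bash
# Sketch: read NUL-delimited records; needs a bash whose readarray supports -d.

# Hypothetical producer: emits each argument followed by a NUL byte.
emit_records() { printf '%s\0' "$@"; }

# -d '' selects NUL as the record delimiter, -t strips it from each record.
readarray -td '' records < <(emit_records 'aX' $'foo\n\n\n\n*' '?x')

declare -p records
```

The record containing newlines ends up in a single element, since newline is no longer the delimiter.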
Or:
bash-5.1$ eval "arr=( $(ls --quoting-style=shell-always) )"
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]=$'foo\n\n\n\n*' [2]="?x" [3]="\\x" [4]="x")
In any case, the correct way to get the list of non-hidden files in the current working directory into an array would be with:
shopt -s nullglob
shopt -u failglob
arr=( * )
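To try that out without touching real files, here is a sketch run in a scratch directory. The directory comes from `mktemp -d`, the file and array names are invented, and `dotglob` is additionally switched off (an assumption on my part, it is off by default) so that hidden files stay excluded.

```bash
#!/usr/bin/env bash
# Sketch: fill an array from a glob in a scratch directory.

scratch=$(mktemp -d) || exit
cd -- "$scratch" || exit

shopt -s nullglob
shopt -u failglob dotglob

empty=( * )     # no files yet: with nullglob the pattern expands to nothing

touch x '?x' $'foo\n\n\n\n*' .hidden
files=( * )     # one element per non-hidden file, names kept intact

declare -p empty files

cd / && rm -rf -- "$scratch"
```

The order of the elements in `files` depends on the locale's collation, but each file name is stored whole, newlines and glob characters included.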
You'd only resort to ls --zero
if you wanted for instance the list to be sorted by size or modification time which bash globs (contrary to zsh's) cannot do.
As in:
zsh:
new_to_old=( *.txt(Nom) )
four_largest=( *.txt(NOL[1,4]) )
recent GNU bash + GNU coreutils:
readarray -td '' new_to_old < <(ls -td --zero -- *.txt)
readarray -td '' four_largest < <(ls -dS --zero -- *.txt | head -zn4)
Another difference between a=($(cmd))
and readarray < <(cmd)
is the exit status which in the former is that of cmd
and in the latter that of readarray
. With recent versions of bash
, you can get the exit status of cmd
in the latter with wait "$!"; cmd_status=$?
.
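A sketch of where the status ends up in each case. The `cmd` function below is a hypothetical stand-in that prints two lines and fails with status 3, the variable names are invented, and the `wait "$!"` part assumes a bash recent enough to set `$!` for process substitutions.

```bash
#!/usr/bin/env bash
# Sketch: where the exit status of cmd ends up with each approach.

# Hypothetical command: prints two lines, then fails with status 3.
cmd() { printf 'one\ntwo\n'; return 3; }

IFS=$'\n'
a=( $(cmd) )
split_status=$?        # status of the command substitution, i.e. of cmd
unset -v IFS

readarray -t b < <(cmd)
readarray_status=$?    # status of readarray itself
wait "$!"              # wait for the process substitution started above
cmd_status=$?          # status of cmd, recovered after the fact

echo "split+glob: $split_status, readarray: $readarray_status, cmd: $cmd_status"
```

Note that `wait "$!"` has to come before anything else starts another asynchronous command, or `$!` will refer to that one instead.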
¹ the arr=( ... )
syntax comes from zsh (bash didn't have arrays until 2.0 in 1996), but note that in zsh, command substitution, while it's also stripping trailing newlines and subject to $IFS
-splitting, does not discard NULs (NUL is even in the default value of $IFS
there) and, unlike in other Bourne-like shells, is not subject to globbing, which contributes to making it a safer shell in general.
² readarray
aka mapfile
doesn't have an append mode, but in recent versions you can tell it the index of the first element where to start storing the elements with -O
as shown here. Finding the index of the last element in bash (where arrays are sparse like in ksh!) is awfully difficult. Here, to append the lines of the output of cmd
to $arr
, instead of that very convoluted code, you might as well read those lines into a temporary array with readarray -t tmp < <(cmd)
and append the elements to $arr
with arr+=( "${tmp[@]}" )
. Also note that if the arr
variable was declared as scalar or assoc, the behaviour will vary between those approaches.
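The temporary-array approach from this footnote, sketched on a sparse array. The initial contents of `arr` and the three input lines are made up for illustration:

```bash
#!/usr/bin/env bash
# Sketch: append the lines of a command's output to a sparse array.

arr=( [2]=first [7]=second )   # sparse: only indices 2 and 7 are set

# Read the lines into a temporary array, empty line and glob character included.
readarray -t tmp < <(printf '%s\n' 'third' '' 'fo*rth')

arr+=( "${tmp[@]}" )           # += continues after the highest index in use

declare -p arr
```

The new elements land at indices 8, 9 and 10, without any need to work out the last index by hand.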