Creating and appending to an array, mapfile vs arr+=(input) same thing or am I missing something?

Question

Is there a case where mapfile has benefits over arr+=(input)?

Simple examples

mapfile array name, arr:

mkdir {1,2,3}

mapfile -t arr < <(ls)

declare -p arr

output:

declare -a arr=([0]="1" [1]="2" [2]="3")

Edit:

Changed the title below: the body had y as the array name but the title had arr, which could lead to confusion.

y+=(input)

IFS=$'\n'

y+=($(ls))

declare -p y

output:

declare -a y=([0]="1" [1]="2" [2]="3")

An advantage of mapfile, I think, is that you don't have to worry about word splitting.

With the other approach you can limit word splitting by setting IFS=$'\n', although for this example it makes no difference.

The second example just seems easier to write. Is there anything I'm missing?

Stéphane Chazelas · Accepted Answer · 2025-01-31 14:29:02Z

They're not the same thing at all, even after IFS=$'\n'.

In bash specifically (though that syntax was borrowed from zsh¹):

arr=( $(cmd) )

(arr+=( $(cmd) ) would be used to append elements to the array; so would be compared with keys=( -1 "${!arr[@]}" ); readarray -tO "$(( ${keys[@]: -1} + 1))" arr < <(cmd)²).

Does:

Run cmd in a subshell with its stdout open on the writing end of a pipe.
Simultaneously, the parent shell process reads from the other end of the pipe and:
- removes the NUL characters and trailing newline characters
- splits the resulting string based on the contents of the $IFS special variable. For those characters in $IFS that are whitespace characters such as newline, the behaviour is more complex in that:
  - leading and trailing ones are removed (in the case of newline, the trailing ones have been removed by command substitution already as seen above)
  - sequences of one or more are treated as one separator. As an example, the output of printf '\n\n\na\n\n\nb\n\n\n' is split into two elements only: a and b.
- each of these words is then subject to filename generation aka globbing aka pathname expansion, whose behaviour is affected by a number of options including noglob, nullglob, failglob, extglob, globasciiranges, globstar, nocaseglob. That applies to those words that contain characters such as *, ?, [, and with some bash versions \, and more if extglob is enabled.
Then the resulting words are assigned as elements to the $arr array.
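The whitespace-separator behaviour described above can be checked with a short sketch. It assumes bash and reuses the printf output from the list; nothing else is taken from outside the answer.

```bash
#!/usr/bin/env bash
# Sketch: split command output on newlines only, as arr=( $(cmd) ) does.
IFS=$'\n'

# Leading, trailing and repeated newlines all collapse into single
# separators, so only the two non-empty lines survive as words.
arr=( $(printf '\n\n\na\n\n\nb\n\n\n') )

declare -p arr

unset IFS   # restore default splitting for whatever runs next
```

Neither a nor b contains a glob character, so the filename generation step leaves them untouched here.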

Example:

bash-5.1$ touch x '\x' '?x' aX $'foo\n\n\n\n*'
bash-5.1$ IFS=$'\n'
bash-5.1$ ls | cat
aX
foo



*
?x
\x
x
bash-5.1$ arr=( $(ls) )
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="aX" [3]=$'foo\n\n\n\n*' [4]="?x" [5]="\\x" [6]="x" [7]="?x" [8]="\\x" [9]="\\x" [10]="x")

As you can see, the $'foo\n\n\n\n*' file name was split into foo and *, and * was expanded to the list of files in the current working directory, which explains why we get both foo and $'foo\n\n\n\n*'. The same happened for ?x, which explains why we get \x (shown as "\\x") 3 times: there's the \x line in the output of ls, and \x is also matched by both * and ?x.

With bash 5.0, we get:

bash-5.0$ arr=( $(ls) )
bash-5.0$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="aX" [3]=$'foo\n\n\n\n*' [4]="?x" [5]="\\x" [6]="x" [7]="?x" [8]="\\x" [9]="x" [10]="x")

Here \x appears only twice but x three times: in that version, backslash was treated as a globbing operator even when not followed by another globbing operator, so \x as a glob matches x.

After shopt -s nocaseglob, we get:

bash-5.1$ shopt -s nocaseglob
bash-5.1$ arr=( $(ls) )
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="aX" [3]=$'foo\n\n\n\n*' [4]="?x" [5]="\\x" [6]="x" [7]="aX" [8]="?x" [9]="\\x" [10]="\\x" [11]="x")

With aX shown 3 times as it matches ?x as well.

After shopt -s failglob:

bash-5.0$ shopt -s failglob
bash-5.0$ arr=( $(printf '\\z\n') )
bash: no match: \z
bash-5.0$ arr=( $(printf 'WTF?') )
bash: no match: WTF?

And arr=( $(echo '/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*') )

Runs out of memory after having made your system unusable for several minutes.

So, to sum up, IFS=$'\n'; arr=( $(cmd) ) doesn't store the lines of the output of cmd in the array, but the filenames resulting from the expansion of the non-empty lines of the output of cmd which are treated as glob patterns.
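If splitting on newlines really is what you want, a common precaution is to switch globbing off around the assignment. The sketch below is not part of the answer's examples: it assumes bash and a made-up three-line input, and shows that the pattern is then kept literally while the empty line is still dropped.

```bash
#!/usr/bin/env bash
# Sketch: split on newline with pathname expansion switched off.
IFS=$'\n'
set -o noglob          # words from $(...) are no longer treated as patterns

# Hypothetical input: a glob character, an empty line, a line with a space.
arr=( $(printf '%s\n' '*' '' 'a b') )

set +o noglob          # re-enable globbing
unset IFS              # back to default splitting

declare -p arr
```

This still loses empty lines and still strips NULs, so it is a narrower fix than switching to readarray.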

With mapfile or its less misleading readarray alias:

readarray -t arr < <(cmd)

as above runs cmd in a subshell with its stdout open on the writing end of a pipe.
the <(...) is expanded to something like /dev/fd/63 or /proc/self/fd/63 where 63 is a file descriptor of the parent shell open on the reading end of that pipe.
with the < redirection short for 0<, that /dev/fd/63 is opened for reading on fd 0, which means the stdin of readarray will also be the reading end of that pipe.
readarray reads each line from that pipe (simultaneously from cmd writing to it), discards the line delimiter (-t), and stores it (up to the first NUL if it contains any, at least in current versions of bash) in a new element of the $arr array.

So in the end $arr, assuming cmd outputs no NUL, will contain the contents of each line of the output of cmd, regardless of whether they're empty or not, or whether they contain glob characters or not.
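A minimal sketch of that behaviour, assuming bash 4 or later and a made-up input containing a glob character, an empty line and a line with a space:

```bash
#!/usr/bin/env bash
# Sketch: readarray stores every line verbatim, one element per line.
# No word splitting and no globbing are involved, whatever $IFS contains.
readarray -t arr < <(printf '%s\n' '*' '' 'a b')

declare -p arr
```

All three lines end up in the array, including the empty one, and * is stored as a literal asterisk.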

With the example above:

bash-5.1$ readarray -t arr < <(ls)
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]="foo" [2]="" [3]="" [4]="" [5]="*" [6]="?x" [7]="\\x" [8]="x")

That's consistent with what we saw in the output of ls | cat earlier, but that's still wrong if the intention was to get the list of files in the current working directory. The output of ls cannot be post-processed unless you use some extensions of the GNU implementation of ls such as --quoting-style=shell-always or the --zero of recent versions (9.0 or above):

bash-5.2$ readarray -td '' arr < <(ls --zero)
bash-5.2$ typeset -p arr
declare -a arr=([0]="aX" [1]=$'foo\n\n\n\n*' [2]="?x" [3]="\\x" [4]="x")

This time, readarray stores the contents of the NUL-delimited records into $arr. IFS=$'\0' can't be used in bash as bash can't store NULs in its variables.
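For systems without a recent GNU ls, the same NUL-delimited reading can be tried with any producer of NUL-terminated records. In this sketch printf stands in for ls --zero; that substitution and the two sample names are assumptions for illustration, and readarray -d needs bash 4.4 or later.

```bash
#!/usr/bin/env bash
# Sketch: read NUL-delimited records into an array (bash 4.4+).
# printf '%s\0' is a stand-in for a command such as ls --zero.
readarray -td '' arr < <(printf '%s\0' $'foo\n\n\n\n*' '?x')

declare -p arr
```

The embedded newlines stay inside the first element because only NUL ends a record.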

Or:

bash-5.1$ eval "arr=( $(ls --quoting-style=shell-always) )"
bash-5.1$ typeset -p arr
declare -a arr=([0]="aX" [1]=$'foo\n\n\n\n*' [2]="?x" [3]="\\x" [4]="x")

In any case, the correct way to get the list of non-hidden files in the current working directory into an array would be with:

shopt -s nullglob
shopt -u failglob
arr=( * )
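To see what the two shopt settings buy you, here is a sketch that runs the glob in a scratch directory, first empty and then populated. The mktemp directory and the file names are assumptions made for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: collect non-hidden names with a glob, in a scratch directory.
shopt -s nullglob      # a glob with no match expands to nothing
shopt -u failglob      # ...rather than aborting the command

dir=$(mktemp -d) || exit
cd -- "$dir" || exit

empty=( * )            # nothing here yet, so the array has zero elements

touch -- x '?x' $'foo\n\n\n\n*'
arr=( * )              # one element per file; no splitting takes place

declare -p empty arr

cd - >/dev/null && rm -rf -- "$dir"   # clean up the scratch directory
```

Without nullglob, the first assignment would have stored a literal * in the array.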

You'd only resort to ls --zero if you wanted for instance the list to be sorted by size or modification time which bash globs (contrary to zsh's) cannot do.

As in:

| zsh | recent GNU bash + GNU coreutils |
| --- | --- |
| `new_to_old=( *.txt(Nom) )` | `readarray -td '' new_to_old < <(ls -td --zero -- *.txt)` |
| `four_largest=( *.txt(NOL[1,4]) )` | `readarray -td '' four_largest < <(ls -dS --zero -- *.txt \| head -zn4)` |

Another difference between a=($(cmd)) and readarray < <(cmd) is the exit status, which in the former is that of cmd and in the latter that of readarray. With recent versions of bash, you can get the exit status of cmd in the latter with wait "$!"; cmd_status=$?.
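The following sketch puts the two side by side. It assumes bash 4.4 or later (where $! refers to the process substitution) and uses a hypothetical function named cmd that prints two lines and fails with status 3.

```bash
#!/usr/bin/env bash
# Sketch: where the exit status of the command ends up.
cmd() { printf 'a\nb\n'; return 3; }   # hypothetical command, fails with 3

IFS=$'\n'
a=( $(cmd) )
split_status=$?        # status of cmd, via the command substitution
unset IFS

readarray -t b < <(cmd)
read_status=$?         # status of readarray itself
wait "$!"              # $! holds the PID of the process substitution
cmd_status=$?          # status of cmd, recovered after the fact

printf '%s\n' "$split_status" "$read_status" "$cmd_status"
```

Saving $? into read_status before calling wait matters, since wait overwrites it.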

¹ The arr=( ... ) syntax comes from zsh (bash didn't have arrays until 2.0 in 1996), but note that in zsh, command substitution, while it also strips trailing newlines and is subject to $IFS-splitting, does not discard NULs (NUL is even in the default value of $IFS there) and is not subject to globbing like in other Bourne-like shells, contributing to making it a safer shell in general.

² readarray aka mapfile doesn't have an append mode, but in recent versions you can tell it the index at which to start storing the elements with -O, as shown here. Finding the index of the last element in bash (where arrays are sparse like in ksh!) is awfully difficult. To append the lines of the output of cmd to $arr, instead of that very convoluted code, you might as well read those lines into a temporary array with readarray -t tmp < <(cmd) and append the elements to $arr with arr+=( "${tmp[@]}" ). Also note that if the arr variable was declared as scalar or assoc, the behaviour will vary between those approaches.
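A sketch of the temporary-array approach from footnote 2, assuming bash 4 or later. The sparse starting array and the three input lines are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: append lines to an existing, possibly sparse, array.
arr=( [0]=x [5]=y )                    # hypothetical sparse array

# Read the new lines into a scratch array first...
readarray -t tmp < <(printf '%s\n' 'line 1' '' '*')

# ...then append; += continues after the highest existing index.
arr+=( "${tmp[@]}" )

declare -p arr
```

The quoted "${tmp[@]}" expansion is what keeps the empty line and the literal * intact during the append.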

arr+=(cmd) creates and appends to the array, as does mapfile -t arr < <(cmd). From what I gather, mapfile is better to use in some cases as you don't have to set things; is that right? I notice you hate the word mapfile :), would mapArray be better? Point noted on not using ls, I'll do *.mp4 or just * next time. — Nickotine, Commented Jun 25, 2023 at 11:54
@Nickotine, it doesn't map, it only reads, readarray is perfectly fine. — Stéphane Chazelas, Commented Jun 25, 2023 at 12:06
@Nickotine, note that zsh had $mapfile[some/file] to truly map a file long before bash added its mapfile builtin, which doesn't map and works on streams, not files specifically (it can be used on sockets or pipes just the same, for instance) — Stéphane Chazelas, Commented Jun 25, 2023 at 12:11
I appreciate that zsh is better, but is the bash syntax not a bit easier? The zsh syntax is easy as well though. Well now I understand your frustration with mapfile since it doesn't map. Can you please answer my question on the first comment so I know I understand properly? On one of the questions I thought I understood you then you pointed out that I didn't and gave me an example which made me get it. — Nickotine, Commented Jun 25, 2023 at 12:25
I think you would've called me out if my understanding was incorrect. Much appreciated as always. — Nickotine, Commented Jun 25, 2023 at 12:59
