3

I am trying to create an array based upon filenames, and I am getting into trouble with whitespace. This seems to be a common problem, but as far as I can see the quotes are set correctly, so I guess it must be the way the array is built.

to_dump="$(find . -maxdepth 1 -print0 )"
to_dump_array=($to_dump)

read -p " ->  " final
case "$final" in
   a) for drop in "${to_dump_array[@]}" ;
      do cp "$drop" --recursive --force Destination_Folder && \
      echo "dropped \"$drop\"" ;
      done ;;
   b) echo "Won't drop anything" ;;
esac

I guess there should be a nicer way to build an array from a find query. Also, where else am I wrong?

2
  • I think that's an extra quote after drop.
    – mikeserv
    Commented Jul 13, 2015 at 7:54
  • @mikeserv ah yes, that was a typo, thanks
    – erch
    Commented Jul 13, 2015 at 8:07

4 Answers

7

-print0 should not be used in a $(...) substitution, because bash variables cannot hold NUL bytes (strings in bash are NUL-terminated), so the NUL delimiters produced by -print0 are simply dropped.
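You can see the problem directly (a minimal sketch; printf stands in for find's -print0 output here):

```shell
# bash cannot store NUL bytes in a variable, so $(...) drops them
# and the two names run together with no separator left between them.
out="$(printf 'file one\0file two\0')"
printf '%s\n' "$out"
```

Recent versions of bash also print a "ignored null byte in input" warning to stderr when this happens.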

I asked a question whose answer was similar to what this question requires: https://stackoverflow.com/a/30469553/1091693

Adapting that answer to your question:

to_dump=()
while IFS= read -r -d ''; do
  to_dump+=( "$REPLY" )
done < <(find . -maxdepth 1 -print0)

This creates an array called to_dump and uses the read command to read NUL-delimited elements from find. The reason < <(...) is being used here rather than a pipe is to avoid an implicit subshell, which would prevent the array from being modified.
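To see why the pipe would fail, here is a small comparison (printf fakes find's NUL-delimited output; this sketch is not from the original answer):

```shell
# With a pipe, the while loop runs in a subshell: the array it builds
# is discarded when the subshell exits.
arr=()
printf 'x\0y\0' | while IFS= read -r -d ''; do arr+=( "$REPLY" ); done
echo "${#arr[@]}"   # 0

# With < <(...), the loop runs in the current shell, so the array survives.
arr=()
while IFS= read -r -d ''; do arr+=( "$REPLY" ); done < <(printf 'x\0y\0')
echo "${#arr[@]}"   # 2
```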

It's worth noting that your original find command probably wants a -mindepth 1, or it will pick . (the current directory) and you'll end up doing a recursive copy on that.
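For instance (using a throwaway directory; the filenames are made up for illustration):

```shell
tmp=$(mktemp -d)
touch "$tmp/file one" "$tmp/file two"
find "$tmp" -maxdepth 1               # prints "$tmp" itself plus both files
find "$tmp" -mindepth 1 -maxdepth 1   # prints only the two files
rm -rf "$tmp"
```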


I've noticed you use -maxdepth 1 as an argument to find, so perhaps this will be more useful:

shopt -s nullglob
to_dump=( * .[!.]* ..?* )

Avoiding find, this uses bash builtins only, doesn't fork, and is for the most part quite clean.

The first line, shopt -s nullglob, is a bash(-only) command which turns on the nullglob option. This option is described in man 1 bash:

If set, bash allows patterns which match no files (see Pathname Expansion above) to expand to a null string, rather than themselves.

In simpler terms, if you use * and it doesn't match any files, the * is removed entirely. The default behaviour is to leave the literal * in place.

The second line adds 3 globs to the array:

  • *: All files not beginning with .
  • .[!.]*: All files beginning with one . and one non-. character. This is to avoid matching the . and .. directories.
  • ..?*: All files beginning with .. and at least one more character. Added for the same reason as the previous glob, covering the cases it missed.

Bash expands the globs into the definition of the array, and it expands them correctly -- no splitting on whitespace or anything like that.

A caveat on the usage of nullglob: if you have nullglob turned on, curl google.com/search?q=test will result in curl complaining at you for not passing it any arguments (the ? makes the URL a glob, and since it matches no files it expands to nothing), and ls /var/fasdfasafs* will give you a listing of the current directory for the same reason. This is one of the reasons it's not turned on by default.
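If that worries you, one option (my sketch, not part of the original answer) is to save nullglob's state and put it back after building the array:

```shell
# shopt -p prints a command that reproduces the option's current state;
# its exit status is non-zero when the option is off, hence the || true.
restore=$(shopt -p nullglob) || true
shopt -s nullglob
to_dump=( * .[!.]* ..?* )
eval "$restore"
```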

3
  • Very nice. But could you please explain your "bash builtin" solution a bit more? I think I get most of it, but then it's just guesswork with the missing bits. Thanks!
    – erch
    Commented Jul 13, 2015 at 21:45
  • I've gone into a little more depth for the pure-bash solution. Let me know if it helps. Commented Jul 15, 2015 at 6:52
  • Yes, helps a lot! As I'm just getting into shell scripting, this is a very interesting outlook into things to come. Thanks!
    – erch
    Commented Jul 15, 2015 at 16:45
3

Try building the array like this:

read -d $'\0' -r -a to_dump <<< $(find . -maxdepth 1 -print0)
3
  • I have some problems with that read command: -d '\n' splits on backslash, not newline (splitting on newline is the default, or you can explicitly do it with -d $'\n'). You haven't set IFS so it will still do word splitting before it lands in the array. Even with -a, read stops reading at the delimiter so read -d $'\n' will NOT read an entire newline-delimited array. Newline is a valid character in filenames, so this is an unsafe way of getting a list of filenames. You have a . at the end of the here-string which will corrupt the last filename. Commented Jul 13, 2015 at 8:14
  • You are right of course. I took out the typo and switched it to the null character. Setting or not setting IFS does not change the result for me. Commented Jul 13, 2015 at 10:10
  • It still has very strange behaviour: paste.pound-python.org/show/MMZXMlGzKFa0S3F6H9ma Commented Jul 15, 2015 at 6:41
1
 find . -maxdepth 1

...would appear to me to indicate you want:

a=()
for f in ./..?* ./.[!.]* ./*
do  [ -e "$f" ] && a+=("$f")
done
5
  • oh, wow. Yes, somehow. But I want quite a bit more and tried to keep the question simple. I'll keep this for later, thanks!
    – erch
    Commented Jul 13, 2015 at 8:14
  • I edited something similar into my answer, refreshed, and saw your answer - great minds think alike? ;) I have one question though, are you avoiding nullglob intentionally or is that a stylistic choice? Commented Jul 13, 2015 at 8:19
  • I wasn't aware of nullglob until now … this changes things quite a bit … I'll be back later
    – erch
    Commented Jul 13, 2015 at 8:22
  • 1
    @Score_Under - i never use that stuff if it can be helped. It doesn't require much to avoid it, and in doing so you can write code which you can expect to perform much the same in most any shell. I don't even use shell arrays - i kind of hate names and memory storage and would rather just process, but I think that's how the += thing works. If this were my script i wouldn't store anything, the above would be a function, would process args as needed, and return accordingly. This kind of thing always seems beside the point.
    – mikeserv
    Commented Jul 13, 2015 at 8:27
  • @mikeserv Could you please explain your solution a bit more? I think I get most of it, but then it's just guesswork with the missing bits. Thanks!
    – erch
    Commented Jul 13, 2015 at 20:58
0

I'm not sure the script you have is doing what you think it's doing. I don't think that this converts the NUL-terminated output of the find into a bash array of file names:

to_dump_array=($to_dump)

Have you checked the output of the for loop to see what you're getting?

for drop in "${to_dump_array[@]}"
do
    echo -e "$drop\n"
done

There's a Stack Overflow question with some suggestions on filling an array from find -print0:

https://stackoverflow.com/questions/1116992/capturing-output-of-find-print0-into-a-bash-array

You also may be better off using the array index to process the array items, rather than trying to assign to the variable directly within the for loop; it avoids relying on the shell splitting correctly:

for drop in $(seq 0 $((${#to_dump_array[@]} - 1)))
do
    cp --recursive --force "${to_dump_array[$drop]}" Destination_Folder
done
This may not be very popular, but if you need to use bash arrays and multiple escapes to manage white space (and other, even more unusual characters) in file names, then you might be going beyond what the shell was designed for.

You may find something like Python, Perl, or Ruby to be faster, more reliable, and easier to manage and debug.

I have a number of "red flags" which I use to tell me when I might be going beyond what is wise when shell scripting.

  • Multiple levels of escaping or null termination to handle all file name cases.
  • Several levels of shell functions calling one another.
  • Complex data structures, i.e. anything beyond strings and numbers.
  • I find myself doing "performance optimisation".
  • "Line noise" in expressions.

Yes, you certainly can do all of the above, but should you? Seriously... Python, Perl, Ruby, etc. In Python it's this simple, and you don't need to worry about escaping, or about file names with white space or binary characters.

import os

dirListing = os.listdir("somedirectory")

for eachEntry in dirListing:
    doSomething(eachEntry)
10
  • 1
    Ok, but what in the world is wrong w/ performance optimization? Isn't that what programmers are for?
    – mikeserv
    Commented Jul 13, 2015 at 10:12
  • There is nothing wrong with performance optimisation by itself. But If I'm performance optimising a shell script, there is almost certainly a "better way". Commented Jul 13, 2015 at 10:35
  • Maybe so, but I don't think it's fair to classify shells as a lump, especially while you advocate python in their stead. bash is probably the slowest shell of them all - I really do not understand why people use it - but for drop in $(seq 0 $((${#to_dump_array[@]} - 1))) that does not help matters. Why not just count? x=0; while [ "$((x+=1))" -lt "$#" ]; do ...; done?
    – mikeserv
    Commented Jul 13, 2015 at 10:54
  • I think it's fair. If you're worrying about the performance of a shell script you're certainly beyond the optimal usage of the shell language and likely making use of it inappropriately. Commented Jul 13, 2015 at 11:03
  • 1
    What is optimal use? It surely can't be many myriad layers and many hundreds of lines of shell-script (which is not to mention extremely, elaborately confusing sym-link indirection) to deal with namespacing issues for an interpreter which was already far too bloated before the schism in the first place, right? Python has never offered anything of value to me - not since community college - but shell-scripts can be made to work very fast in ways Python will never do. A shell can facilitate data flow through pipelines constructed out of proven, honed tools. That's powerful.
    – mikeserv
    Commented Jul 13, 2015 at 11:14
