
If I run the following grep command I get the output I am expecting:

$ grep -o -E '/ |:|\\|<|>|\*|\?|\"| $' <<< '/home/ttwaro/temp/Test/This is a Test: Testing Number 1?/This is Test: Testing Number 1?.eng.srt'
:
?
:
?

But when I attempt to put the output into an array I get the character 'C' instead of the question mark '?'. What am I doing wrong?

$ array=()
$ array+=($(grep -o -E '/ |:|\\|<|>|\*|\?|\"| $' <<< '/home/ttwaro/temp/Test/This is a Test: Testing Number 1?/This is Test: Testing Number 1?.eng.srt'))
$ printf '%s\n' "${array[@]}"
:
C
:
C
  • ... you have a file named C in the current directory, that is matched by the unquoted ? glob perhaps?
    Commented Mar 26, 2022 at 18:11
  • When I run this code, I get ? on the first line, then : and then ?. I don't get the first ':', but I get no 'C's either.
    – frabjous
    Commented Mar 26, 2022 at 18:49
  • @frabjous try again in a directory with at least one file or subdir whose name is a single character. The OP happens to have a file/dir named C which is why they see C.
    – terdon
    Commented Mar 26, 2022 at 18:51
  • I see. The middle command is basically array+=(? : ?) which is globbing to the single character file. The reason I don't get the first : is that the OP is using a different string after the <<< in the two examples, which is confusing.
    – frabjous
    Commented Mar 26, 2022 at 19:10
  • @frabjous exactly. And yes, it is indeed confusing, it took me a while to spot it while writing my answer too!
    – terdon
    Commented Mar 26, 2022 at 19:12

1 Answer

This happens because you have a file or subdirectory named C in the directory where you are running the command, so the glob ? expands to match it. To illustrate with a simpler example:

$ ls -la
total 92
drwxr-xr-x   2 terdon terdon 16384 Mar 26 18:53 .
drwxr-xr-x 170 terdon terdon 73728 Mar 26 13:57 ..

$ array=( $( printf '?\n' ) )
$ printf '%s\n' "${array[@]}"
?

Now, I create a file named C and try again:

$ touch C
$ array=( $( printf '?\n' ) )
$ printf '%s\n' "${array[@]}"
C

This is down to the order of operations. When you run array=( $( printf '?\n' ) ), what happens, in order, is:

  1. The printf '?\n' is executed and returns a ?.
  2. The unquoted ? is expanded by the shell to any matching file names, in this case one called C.
  3. The expanded glob, the C, is stored as element 0 of the array array.
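The three steps above can be reproduced in isolation. The following is a minimal sketch, assuming bash and a throwaway directory created with mktemp; the variable names (scratch, output, unquoted, quoted) are made up for illustration:

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the demo does not depend on existing files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch C                     # a single-character file name that ? can match

output=$(printf '?\n')      # step 1: the command substitution yields a literal ?
unquoted=( $output )        # steps 2-3: the unquoted ? is globbed to C, then stored
quoted=( "$output" )        # quoted: no globbing, the ? is stored as-is

printf 'unquoted: %s\n' "${unquoted[@]}"   # prints "unquoted: C"
printf 'quoted:   %s\n' "${quoted[@]}"     # prints "quoted:   ?"

# Return to the previous directory and clean up.
cd - >/dev/null && rm -r "$scratch"
```

The only difference between the two assignments is the pair of double quotes, which is what stops the shell from treating the ? as a pattern.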

To avoid this, you need to quote the command substitution inside the array assignment so that any globbing characters are protected from the shell:

$ array=( "$( printf '?\n' )" )
$ printf '%s\n' "${array[@]}"
?

Or, with your original example:

$ array=( "$(grep -o -E '/ |:|\\|<|>|\*|\?|\"| $' <<< '/home/ttwaro/temp/Test/This is a Test: Testing Number 1?/This is Test: Testing Number 1?.eng.srt')" )
$ printf '%s\n' "${array[@]}"
:
?
:
?

However, because it is quoted, the entire result is presented as a single string and so the array will only have one element:

$ for ((i=0;i<${#array[@]}; i++)); do 
    printf 'element %s is %s\n' "$i" "${array[i]}"; 
done
element 0 is :
?
:
?

To get each result as a separate element in the array, you need to do something like this as suggested by Gordon Davisson in the comments:

$ readarray -t array < <(grep -o -E '/ |:|\\|<|>|\*|\?|\"| $' <<< '/home/ttwaro/temp/Test/This is a Test: Testing Number 1?/This is Test: Testing Number 1?.eng.srt')

$ for ((i=0;i<${#array[@]}; i++)); do 
    printf 'element %s is %s\n' "$i" "${array[i]}"; 
done
element 0 is :
element 1 is ?
element 2 is :
element 3 is ?

This removes the need for quoting: readarray reads grep's output directly, so it never goes through the shell's word splitting or filename expansion.
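If readarray is not available (it was added in bash 4.0), two other approaches may work. The sketch below is not part of the original answer; it assumes bash, uses a made-up path and a shortened pattern, and shows (a) turning globbing off with set -f around an unquoted command substitution and (b) a while read loop:

```shell
#!/usr/bin/env bash
# Hypothetical input and a shortened pattern, chosen only for illustration.
path='/tmp/This is a Test: Number 1?/file?.srt'

# (a) Disable filename expansion while the unquoted substitution is expanded.
set -f                                   # same as: set -o noglob
array=( $(grep -o -E ':|\?' <<< "$path") )
set +f                                   # turn globbing back on

# (b) Read grep's output line by line; nothing here is subject to globbing.
array2=()
while IFS= read -r line; do
    array2+=("$line")
done < <(grep -o -E ':|\?' <<< "$path")

printf 'set -f:     %s\n' "${array[*]}"
printf 'while read: %s\n' "${array2[*]}"
```

Note that variant (a) still word-splits the output on whitespace, so a match that contains a space (such as the '/ ' or ' $' alternatives in the original pattern) would be altered or lost. Variant (b) and readarray -t keep each line intact.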

Finally, you got two Cs in your result because each ? was expanded to a C before it was stored in the array:

$ array=( ? ? ? )
$ printf '%s\n' "${array[@]}"
C
C
C
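The same pattern behaves differently depending on what is in the directory, which is why the result can look inconsistent between machines. A small sketch, assuming bash with its default options (nullglob and failglob off) and a scratch directory; the file names C, D and ab are arbitrary:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch" || exit 1

none=( ? )        # empty directory: nothing matches, so the pattern stays a literal ?
touch C ab        # C is one character long, ab is two
one=( ? )         # ? matches exactly one character, so only C matches
touch D
two=( ? )         # both C and D match, in sorted order

printf 'empty dir: %s\n' "${none[*]}"   # prints "empty dir: ?"
printf 'with C:    %s\n' "${one[*]}"    # prints "with C:    C"
printf 'with C, D: %s\n' "${two[*]}"    # prints "with C, D: C D"

# Return to the previous directory and clean up.
cd - >/dev/null && rm -r "$scratch"
```

In a directory with several single-character names, each unquoted ? would therefore add more than one element to the array.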
  • The version with double-quotes creates an array with just a single element (which contains all of the matches, separated by newline characters). To get each match in a separate element, use readarray -t array < <(grep ...)
    Commented Mar 26, 2022 at 20:44
  • @terdon: Thank you for the great explanation. I now get the expected result by enclosing the array in quotes in the print statement, but I still don't understand why the '?' is replaced by a 'C'. There is no file with the name 'C'.
    – Tommy
    Commented Mar 27, 2022 at 10:59
  • @GordonDavisson: Thanks, that was going to be my next question. My reason for doing this is to find all the file names that contain invalid Windows characters. I not only print the invalid file name but also accumulate the invalid characters and finally sort them with the --unique parameter to quickly see which characters are invalid.
    – Tommy
    Commented Mar 27, 2022 at 11:45
  • @GordonDavisson duh, of course! Thanks, answer edited.
    – terdon
    Commented Mar 27, 2022 at 12:46
  • @Tommy please see the updated answer, which includes Gordon's fix. That said, there must be a file or directory or something named C. Try running your original command in a new, empty directory: do you still get the C?
    – terdon
    Commented Mar 27, 2022 at 12:47
