This is a bit tricky; I'm trying to work out the best approach to this problem. I have a couple of approaches, but they seem really hacky and I'd like something a little more elegant.

I want to parse a whitespace-delimited file, ignoring #comment lines and complaining about any non-empty lines that don't have exactly 4 fields. This is easy enough in awk:

awk '/^#/ {next}; NF == 0 {next}; NF != 4 {exit 1}; (dostuff)'

The tricky part is what I want to do with the data: set the fields as variables in bash and then run a bash function, unless $2 contains a specific value.

Here is some pseudocode (mostly real but mixed languages) to explain what I mean:

# awk
/^#/ {next}
NF == 0 {next}
NF != 4 {exit 1}
$2 == "manual" {next}
# bash
NAME=$1
METHOD=$2
URL=$3
TAG=$4
complicated_bash_function_that_calls_lots_of_external_commands
# then magically parse the next line with awk.

I don't know how to do this without some ugly workarounds, such as calling awk or sed separately for each line of the file. (Originally I put the question as "How to call bash function from within awk or each output line of awk from within bash?")

Possibly it would work to modify the bash function into its own script, and make it accept arguments 1, 2, 3, 4 as above. I'm not sure how to call that from within awk, though; hence my question title.

What I would actually prefer is to keep the whole thing in one file as a bash script, calling awk from within bash rather than bash from within awk. But I would still need to call the bash function from within awk, once for each non-comment line of the input file.

How can I do this?

  • can you pipe from awk into a while IFS= read -r loop or similar?
    – cas
    Commented Nov 13, 2015 at 2:10
  • Just to mention for future readers another possibility I considered, was the system() command in awk.
    – Wildcard
    Commented Nov 13, 2015 at 3:58
  • another method is to have your awk/perl/whatever script generate shell commands and pipe them into /bin/sh or /bin/bash. pipe into /bin/cat for testing, then pipe into /bin/sh to run them. another non-UUOC from the makers of "substitute head -n 10000 for cat when testing scripts with huge input files".
    – cas
    Commented Nov 13, 2015 at 4:07
  • No, the system() command just executes the command in a subshell and returns to awk. In some cases the output of that command may even be interleaved with the output of your awk script unless you flush first with fflush("/dev/stdout"). If you need to parse the output of the command, you need to use the | getline syntax.
    – asoundmove
    Commented Nov 17, 2015 at 1:31

2 Answers

You may be able to do what you want by piping awk's output into a while read loop. For example:

awk '/^#/ {next}; NF == 0 {next}; NF != 4 {exit 1} ; {print}' "$1" |
    while read -r NAME METHOD URL TAG ; do
        :  # do stuff with $NAME, $METHOD, $URL, $TAG
        echo "$NAME:$METHOD:$URL:$TAG"
    done

# ${PIPESTATUS[0]} is awk's exit status; check it right after the pipeline
if [ "${PIPESTATUS[0]}" -eq 1 ] ; then
    : # do something to handle awk's exit code
fi

Tested with:

$ cat input.txt 
# comment
NAME METHOD URL TAG
a b c d
1 2 3 4
x y z
a b c d

$ ./testawk.sh input.txt 
NAME:METHOD:URL:TAG
a:b:c:d
1:2:3:4

Note that it correctly exits on the fifth input line (x y z), which has only three fields, so the final a b c d line is never processed.
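Applied to the question's requirements, the same pattern might look like the sketch below. The function name process_entry, the wrapper run_entries, and the sample data are hypothetical stand-ins, not anything from the question; the "manual" check is done in awk, as in the question's pseudocode.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the question's complicated bash function.
process_entry() {
    printf 'processing %s (%s) from %s at %s\n' "$NAME" "$METHOD" "$URL" "$TAG"
}

# Run process_entry once per valid, non-"manual" line of the file in $1.
# Returns awk's exit status (1 if a line had the wrong number of fields).
run_entries() {
    awk '/^#/ {next}; NF == 0 {next}; NF != 4 {exit 1}; $2 == "manual" {next}; {print}' "$1" |
        while read -r NAME METHOD URL TAG; do
            # </dev/null stops commands inside the function from
            # consuming the remaining lines of awk's output.
            process_entry </dev/null
        done
    return "${PIPESTATUS[0]}"
}

# Made-up sample data, written to a temporary file.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
# comment
foo git https://example.com/foo.git v1.0
bar manual https://example.com/bar v2.0

baz svn https://example.com/baz r42
EOF

run_entries "$input"
echo "awk exit status: $?"
```

The redirect from /dev/null matters when the function calls external commands that read standard input (ssh is a common culprit): without it, such a command can swallow the rest of the lines meant for read.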


It's worth pointing out that because the while loop is the target of a pipe, it executes in a sub-shell and is therefore unable to alter the environment (including environment variables) of its parent script.

If that is required, don't use a pipe; use redirection and process substitution instead:

while read -r NAME METHOD URL TAG ; do
  :  # do stuff with $NAME, $METHOD, $URL, $TAG
  echo "$NAME:$METHOD:$URL:$TAG"
done < <(awk '(/^#/ || NF == 0) {next};
              NF != 4 {
                printf "%s:%s:Wrong number of fields\n", FILENAME, NR > "/dev/stderr";
                exit 1
               };
              {print}' input.txt)

# getting the exit code from the <(...) requires bash 4.4 or newer:
wait $!

if [ "$?" -ne 0 ] ; then
 : # something went wrong in the process substitution, deal with it
fi
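The difference between the pipe form and the process-substitution form can be seen with a simple counter. This is a minimal sketch with made-up variable names, assuming bash's default settings (in particular, that shopt -s lastpipe is not enabled):

```bash
#!/usr/bin/env bash
# Loop fed by a pipe: it runs in a subshell, so the increments are lost
# when the loop ends.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
    count_pipe=$((count_pipe + 1))
done

# Loop fed by process substitution: it runs in the current shell, so the
# increments persist.
count_procsub=0
while read -r line; do
    count_procsub=$((count_procsub + 1))
done < <(printf 'a\nb\nc\n')

echo "pipe: $count_pipe, process substitution: $count_procsub"
# prints: pipe: 0, process substitution: 3
```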

Alternatively, you can use the coproc built-in to run the awk script in the background as a co-process:

# By default, array var $COPROC holds the co-process' stdout and
# stdin file descriptors.   See `help coproc`.
coproc {
  awk '(/^#/ || NF == 0) {next};
       NF != 4 {
         printf "%s:%s:Wrong number of fields\n", FILENAME, NR > "/dev/stderr";
         exit 1
       };
       {print}' input.txt
}
awkpid="$!"
#declare -p COPROC # uncomment to see the FDs

while read -r NAME METHOD URL TAG ; do
  echo "$NAME:$METHOD:$URL:$TAG"
done <&"${COPROC[0]}"

wait "$awkpid"
echo "$?"
  • Wow! This is great! (I didn't realize a newline after a pipe doesn't have to be escaped...that's just an extra bonus for me.) ;)
    – Wildcard
    Commented Nov 13, 2015 at 2:34
  • ps: if there were too many other complications (e.g. white-space in the fields) i'd probably just rewrite the whole thing in perl - which is the ideal language when you need to combine the features of awk (and sed and tr etc) and shell.
    – cas
    Commented Nov 13, 2015 at 2:34
  • Yes, I definitely need to learn perl. Thank you! Regarding traps—can you point me to where I can learn about them with a view to implementing one for this command? (I'm sure that's another thing that perl would handle more easily for this case....)
    – Wildcard
    Commented Nov 13, 2015 at 2:37
  • btw, the while loop will finish when the awk script stops piping data into it - i.e. when it exits.
    – cas
    Commented Nov 13, 2015 at 3:11
  • anonymous downvoter: if you're going to downvote, at least have the courtesy to explain why.
    – cas
    Commented Nov 17, 2015 at 2:15

cas's answer is good, but if you need to parse a command's output in awk again, and want to do so from within the first awk command, awk has a fantastic pipe syntax for reading command output:

awk '
{
  cmd = "echo name:tag:url:method" # (very simple example)
  while ((cmd | getline) > 0)  # getline returns -1 on error, so test for > 0
  {
    #process output ($0)
    print
  }
  close(cmd)
}
'
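As a runnable illustration of the same syntax applied per input record, here is a small sketch. The input words and the tr command are made up for the example. It reads the command's output into a variable (getline line) so that $0 and $1 of the current record stay intact. Note that the field is interpolated into the command string unquoted, which is only safe for trusted input without shell metacharacters.

```bash
#!/usr/bin/env bash
# For each input record, run a shell command built from the first field
# and read its output back into awk with `cmd | getline var`.
result=$(printf 'foo\nbar\n' | awk '
{
    cmd = "echo " $1 " | tr a-z A-Z"
    while ((cmd | getline line) > 0)
        print $1 " -> " line
    close(cmd)   # close each command so its pipe is released
}')

echo "$result"
# prints:
# foo -> FOO
# bar -> BAR
```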
  • Interesting...could you provide a link to the documentation for the awk pipe command syntax? Also is this POSIX compatible? And, what is that getline in there?
    – Wildcard
    Commented Nov 17, 2015 at 1:44
  • Sorry, I just closed my laptop, so I'm not sure whether it is POSIX. Look up getline in the info page or man page: it is the built-in awk function for reading a line. It can read from awk's input, from a different file, or from a command.
    – asoundmove
    Commented Nov 17, 2015 at 1:51
  • I use gawk (GNU awk) and it seems to be POSIX compliant, see awk(1) - Linux man page, look for command |
    – asoundmove
    Commented Nov 18, 2015 at 0:49
