
I have been searching extensively, but I still have not found a satisfactory solution. I have some output that looks like the following:

kdeconnec   1625     1000   11u  IPv6 414426      0t0  UDP *:1716 
vivaldi-b   1937     1000  263u  IPv4 440390      0t0  UDP 224.0.0.251:5353 
electron    9522     1000   23u  IPv4 414465      0t0  TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask      27084     1000    3u  IPv4 109532      0t0  TCP 127.0.0.1:3000 (LISTEN)
firefox    27094     1000   99u  IPv4 425877      0t0  TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python     36425     1000    3u  IPv4 109532      0t0  TCP 127.0.0.1:3000 (LISTEN)
chromium  110937     1000  130u  IPv4 439461      0t0  UDP 224.0.0.251:5353 

I want to apply a function called exec_path_from_process_id to each value in the second column (the PID) and insert the result as a new second column, resulting in the following. The exact formatting is not important, as long as the columns are aligned.

kdeconnec  /usr/lib/kdeconnectd        1625    1000  11u   IPv6  414426  0t0  UDP  *:1716                                  
vivaldi-b  /opt/vivaldi/vivaldi-bin    1937    1000  263u  IPv4  440390  0t0  UDP  224.0.0.251:5353                        
electron   /usr/lib/electron/electron  9522    1000  23u   IPv4  414465  0t0  TCP  192.168.0.17:58692->157.240.194.18:443  (ESTABLISHED)
flask      /usr/bin/python3.10         27084   1000  3u    IPv4  109532  0t0  TCP  127.0.0.1:3000                          (LISTEN)
firefox    /usr/lib/firefox/firefox    27094   1000  99u   IPv4  425877  0t0  TCP  192.168.0.17:34114->54.191.222.112:443  (ESTABLISHED)
python     /usr/bin/python3.10         36425   1000  3u    IPv4  109532  0t0  TCP  127.0.0.1:3000                          (LISTEN)
chromium   /usr/lib/chromium/chromium  110937  1000  130u  IPv4  439461  0t0  UDP  224.0.0.251:5353                        
kioslave5  /usr/lib/kf5/kioslave5      133514  1000  6u    IPv4  499063  0t0  TCP  192.168.0.17:54238->84.208.4.225:443    (ESTABLISHED)

My current code is a hot mess, but I got it working at least. The only constraint is that it has to work on bash 3.2+.

listeners=$(
    lsof -Pnl +M -i |
        awk -F" " '!_[$1]++' |
        tail -n +2
)

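function exec_path_from_process_id () {
    local pid="${1}"
    # Keep path local so it does not leak into the caller's scope
    local path
    path=$(readlink -f /proc/"$pid"/exe)
    if [ -z "${path}" ]; then
        # Fall back to the last field of the ls listing for the link
        path=$(awk '{print $(NF)}' <<< "$(ls -alF /proc/"$pid"/exe)")
    fi
    echo "${path:-null}"
}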

pids=($(awk '{ print $2 }' <<< "$listeners"))
IFS=$'\n' read -rd '' -a listeners_array <<< "$listeners"
# Quote the command substitution so its newlines survive on older bash versions
IFS=$'\n' read -rd '' -a paths <<< "$(for i in "${pids[@]}"; do exec_path_from_process_id "$i"; done)"
for i in "${!pids[@]}"; do
  row="${listeners_array[i]}"
  row=$(awk -v r="${paths[i]}" '{ print $1 " " r " " $2 " " $3 " " $4 " " $5 " " $6 " " $7 " " $8 " " $9 " " $10}' <<< "$row")
  printf '%s\n' "$row"
done | column -t
  • @ibuprofen For instance I get it for flatpak and sometimes snap packages Commented Aug 28, 2022 at 10:45
  • Ah, OK. Could use realpath -m instead perhaps
    – ibuprofen
    Commented Aug 28, 2022 at 10:51
  • YMMV with piping the output to column -t to get alignment since that will split the input at any field that contains blanks, e.g. a file path. You can see in your question it's already splitting the final field on each line into 2 separate fields, e.g. <127.0.0.1:3000 (LISTEN)> becomes <127.0.0.1:3000> <(LISTEN)>.
    – Ed Morton
    Commented Aug 28, 2022 at 12:49

2 Answers


Perhaps something like:

lsof -Pnl +M -i | awk '
# Use: NR > 1 to skip header
NR > 1 && !x[$1]++ {
    # realpath -m
    # (no path components need exist or be a directory)
    cmd = "realpath -m /proc/"$2"/exe"
    cmd | getline path
    close(cmd)
    # We can edit field $2 and print $0
    $2 = path" "$2
    print $0
}' | column -t

The line cmd | getline path executes the command cmd and reads the first line of its output into the variable path. The pipe to the command stays open until close() is called with the same command string, which is why I keep that string in a variable.
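
A slightly more defensive variant of the same idea, as a sketch rather than a drop-in replacement: the path handed to the shell is single-quoted, and the result of getline is checked before it is used. The function name insert_exec_path and the "null" fallback are assumptions for illustration, and it still expects lsof-style rows on stdin with the PID in the second field.

```shell
#!/usr/bin/env bash
# Sketch: insert the executable path after the first field of each row.
# insert_exec_path is a hypothetical name; rows are read from stdin.
insert_exec_path() {
    awk '
    !seen[$1]++ {
        # \047 is a single quote, so the shell sees the path quoted.
        cmd = "realpath -m \047/proc/" $2 "/exe\047 2>/dev/null"
        path = "null"
        # getline returns 1 on success, 0 at end of input, -1 on error.
        if ( (cmd | getline line) > 0 ) {
            path = line
        }
        close(cmd)
        # Assigning to a field rebuilds $0 with single blanks between fields.
        $2 = path " " $2
        print
    }'
}
```

It could be used as lsof -Pnl +M -i | tail -n +2 | insert_exec_path | column -t, with the caveat that column -t still splits any path containing blanks.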

  • This looks really promising! What does the line c | getline p do? Commented Aug 28, 2022 at 11:00
  • @N3buchadnezzar expanded on it a little. See also for example: stackoverflow.com/q/1960895/3342816 , unix.stackexchange.com/a/139559/140633 etc.
    – ibuprofen
    Commented Aug 28, 2022 at 11:10
  • cmd = "realpath -m /proc/"$2"/exe" should be cmd = "realpath -m \047/proc/"$2"/exe\047" or the contents of $2 will be exposed to the shell for globbing, word splitting, and filename expansion. cmd | getline path should be if ( (cmd | getline path) > 0 ) { ... or similar to protect against failures in cmd | getline, see awk.freeshell.org/AllAboutGetline.
    – Ed Morton
    Commented Aug 28, 2022 at 12:08
  • Regarding !x[$1]++: an array used in that context is commonly, idiomatically named seen instead of x for clarity, i.e. !seen[$1]++. Changing a field as in $2 = path" "$2 will change all of the white space on the line to individual blank chars, which may be undesirable. I know the OP is outputting to column -t but that will corrupt the output if a path contains spaces. print $0 can be written as just print since $0 is what is printed by default.
    – Ed Morton
    Commented Aug 28, 2022 at 12:10
  • @EdMorton Thanks for the pointers.
    – ibuprofen
    Commented Aug 30, 2022 at 5:46

You said you don't care about formatting as long as the fields are aligned so, just pick a width that'll be wide enough for your needs and then:

$ while read -r a pid b; do
    printf "%-12s%-10s%10s %s\n" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"
done < <(lsof -Pnl +M -i)
kdeconnec   <5>             1625 1000   11u  IPv6 414426      0t0  UDP *:1716
vivaldi-b   <5>             1937 1000  263u  IPv4 440390      0t0  UDP 224.0.0.251:5353
electron    <5>             9522 1000   23u  IPv4 414465      0t0  TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask       <6>            27084 1000    3u  IPv4 109532      0t0  TCP 127.0.0.1:3000 (LISTEN)
firefox     <6>            27094 1000   99u  IPv4 425877      0t0  TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python      <6>            36425 1000    3u  IPv4 109532      0t0  TCP 127.0.0.1:3000 (LISTEN)
chromium    <7>           110937 1000  130u  IPv4 439461      0t0  UDP 224.0.0.251:5353

The above assumes your first column doesn't contain any spaces.

Obviously, just change <$(wc -c <<<"$pid")> to whatever command you really need to run, and change the first %-10s to the maximum width of the string that command could output. If you REALLY feel there is no maximum value you could use for that width, let us know, as it would then take a 2-pass approach: pass 1 to produce the output and pass 2 to format it. If you're happy with using column -t for the formatting then it'd be (replace file with <(lsof -Pnl +M -i), whose output I obviously don't have available):

$ while read -r a pid b; do
    printf "%s %s %s %s\n" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"
done < file | column -t
kdeconnec  <5>  1625    1000  11u   IPv6  414426  0t0  UDP  *:1716
vivaldi-b  <5>  1937    1000  263u  IPv4  440390  0t0  UDP  224.0.0.251:5353
electron   <5>  9522    1000  23u   IPv4  414465  0t0  TCP  192.168.0.17:58692->157.240.194.18:443  (ESTABLISHED)
flask      <6>  27084   1000  3u    IPv4  109532  0t0  TCP  127.0.0.1:3000                          (LISTEN)
firefox    <6>  27094   1000  99u   IPv4  425877  0t0  TCP  192.168.0.17:34114->54.191.222.112:443  (ESTABLISHED)
python     <6>  36425   1000  3u    IPv4  109532  0t0  TCP  127.0.0.1:3000                          (LISTEN)
chromium   <7>  110937  1000  130u  IPv4  439461  0t0  UDP  224.0.0.251:5353

but that would fail if any part of your line contained spaces, e.g. the output of the command you're running on the pid.
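
To see why, here is a small sketch of how a whitespace-splitting tool counts the fields of a row whose path contains a blank. The helper count_fields and both sample rows are made up for illustration; awk's default field splitting stands in for what column -t does with its default separators.

```shell
#!/usr/bin/env bash
# Sketch: count whitespace-separated fields per input line.
# count_fields is a hypothetical helper; the sample rows are invented.
count_fields() {
    awk '{ print NF }'
}

# The first row splits into 3 fields, the second into 4,
# because "/opt/My App/bin/app" contains a blank.
printf '%s\n' 'editor /usr/bin/ed 7' 'app /opt/My App/bin/app 42' | count_fields
```

Any formatter that sees 4 fields on one row and 3 on the next will put the PID of the second row in the wrong column.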

Since you asked, here's a 2-pass approach:

  1. Instead of outputting text that has spaces separating fields and newlines separating records as above, produce output that uses newlines to separate fields and NUL to separate records:
while read -r a pid b; do printf "%s\n%s\n%s\n%s\0" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"; done < file
  2. Write an awk script that reads NUL-separated records containing newline-separated fields, calculates the max width of each field while reading the input, and outputs each field in that width when printing, recombining the fields into single lines:
$ while read -r a pid b; do printf "%s\n%s\n%s\n%s\0" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"; done < file |
awk -v RS='\0' -F'\n' '
    { recs[NR]=$0; for (i=1; i<=NF; i++) wids[i]=(length($i)>wids[i] ? length($i) : wids[i]) }
    END { for (n=1; n<=NR; n++) { $0=recs[n]; for (i=1;i<=NF;i++) printf "%-*s%s", wids[i], $i, (i<NF ? OFS : ORS) } }
'
kdeconnec <5> 1625   1000   11u  IPv6 414426      0t0  UDP *:1716
vivaldi-b <5> 1937   1000  263u  IPv4 440390      0t0  UDP 224.0.0.251:5353
electron  <5> 9522   1000   23u  IPv4 414465      0t0  TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask     <6> 27084  1000    3u  IPv4 109532      0t0  TCP 127.0.0.1:3000 (LISTEN)
firefox   <6> 27094  1000   99u  IPv4 425877      0t0  TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python    <6> 36425  1000    3u  IPv4 109532      0t0  TCP 127.0.0.1:3000 (LISTEN)
chromium  <7> 110937 1000  130u  IPv4 439461      0t0  UDP 224.0.0.251:5353

That requires an awk that can read NUL-separated input, e.g. GNU awk. It assumes that none of your path names or other fields can contain newlines.
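
If GNU awk is not available, the same two passes can be sketched with a tab as the field separator and a newline as the record separator, which any awk can read. This is a variant under a different assumption than the one above: no field may contain a tab or a newline. The names align_rows and describe_pid are hypothetical, and describe_pid is only a stand-in for the real per-PID command (it prints the length of the PID in angle brackets).

```shell
#!/usr/bin/env bash
# Sketch: two-pass alignment with tab-separated fields.
# Assumes no field contains a tab or a newline.

# Hypothetical stand-in for the real command run on each PID.
describe_pid() {
    printf '<%s>' "${#1}"
}

align_rows() {
    while read -r name pid rest; do
        # Pass 1: emit one record per line, fields separated by tabs.
        printf '%s\t%s\t%s\t%s\n' "$name" "$(describe_pid "$pid")" "$pid" "$rest"
    done |
    awk -F'\t' '
        {
            recs[NR] = $0
            for (i = 1; i <= NF; i++)
                if (length($i) > wids[i]) wids[i] = length($i)
        }
        END {
            # Pass 2: pad every field except the last to its column width.
            for (n = 1; n <= NR; n++) {
                nf = split(recs[n], f, "\t")
                for (i = 1; i < nf; i++)
                    printf "%-" wids[i] "s ", f[i]
                print f[nf]
            }
        }
    '
}
```

The format string is built by concatenation instead of using %-*s, since not every awk is guaranteed to support * in printf. Usage would be along the lines of lsof -Pnl +M -i | tail -n +2 | align_rows.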

If you REALLY wanted to do all of the above in a single awk script, awk would have to spawn a subshell every time your external command is called, which would be slow, and you'd have to ensure you get the quoting right (see http://awk.freeshell.org/AllAboutGetline). But here you go, assuming your input has no spaces within fields that you care about retaining; non-newline spaces in paths would be fine:

$ awk '
    {
        recs[NR] = $0
        for (i=1; i<=NF; i++) {
            lgth = length($i)
            wids[i] = ( lgth > wids[i] ? lgth : wids[i] )
        }

        cmd = "wc -c <<<\047" $2 "\047"
        paths[NR] = ( (cmd | getline line) > 0 ? line : "N/A" )
        close(cmd)
        lgth = length(paths[NR])
        pathWid = ( lgth > pathWid ? lgth : pathWid )

    }
    END {
        for (n=1; n<=NR; n++) {
            $0 = recs[n]
            for (i=1; i<=NF; i++) {
                if ( i == 2 ) {
                    printf "%-*s%s", pathWid, paths[n], OFS
                }
                printf "%-*s%s", wids[i], $i, (i<NF ? OFS : ORS)
            }
        }
    }
' < file
kdeconnec 5 1625   1000 11u  IPv6 414426 0t0 UDP *:1716
vivaldi-b 5 1937   1000 263u IPv4 440390 0t0 UDP 224.0.0.251:5353
electron  5 9522   1000 23u  IPv4 414465 0t0 TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask     6 27084  1000 3u   IPv4 109532 0t0 TCP 127.0.0.1:3000                         (LISTEN)
firefox   6 27094  1000 99u  IPv4 425877 0t0 TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python    6 36425  1000 3u   IPv4 109532 0t0 TCP 127.0.0.1:3000                         (LISTEN)
chromium  7 110937 1000 130u IPv4 439461 0t0 UDP 224.0.0.251:5353
  • Hmm, would normal file paths ever contain spaces? I do not mind the end of the output getting aligned. Commented Aug 28, 2022 at 12:59
  • @N3buchadnezzar yes, file paths often contain spaces. Assuming they won't, and so not handling them properly when they do, is a common point of failure for shell scripts.
    – Ed Morton
    Commented Aug 28, 2022 at 13:02
  • Hmm, interesting. I would be interested in the two pass solution then. As the number of elements is quite small the difference in speed would be negligible. Commented Aug 28, 2022 at 13:06
  • OK I posted a 2-pass approach. I'm using a shell loop to run your shell command (which is what shell is good for), then an awk script to format the output (which is what awk is good for).
    – Ed Morton
    Commented Aug 28, 2022 at 13:19
