Getting output of /usr/bin/time in CSV format

Question

I am using gawk to parse the output of macOS's /usr/bin/time into CSV format, as shown below. The problem is that gawk returns the 'involuntary context switches' value for 'voluntary context switches'. I tried /^voluntary context switches/ { vol=$1 }, but then 'vol' has no value.

gawk '
    /real/ { real=$1 }
    /user/ { user=$3 }
    /sys/ { sys=$5 }
    /maximum resident set size/ { maxrss=$1 }
    /page reclaims/ { minor=$1 }
    /page faults/ { major=$1 }
    /voluntary context switches/ { vol=$1 }
    /involuntary context switches/ { invol=$1 }
    /instructions retired/ { instret=$1 }
    /cycles elapsed/ { cycle=$1 }
    /peak memory footprint/ { pkmem=$1 }
    
    END { print real "," user "," sys "," maxrss "," major "," minor "," vol "," invol "," instret "," cycle "," pkmem }
' "$inputfile"

Sample input:

57.03 real       212.49 user        16.24 sys
88588288  maximum resident set size
0  average shared memory size
0  average unshared data size
0  average unshared stack size
5531  page reclaims
2  page faults
0  swaps
0  block input operations
0  block output operations
0  messages sent
0  messages received
0  signals received
1  voluntary context switches
3337714  involuntary context switches
2273379580064  instructions retired
693450611012  cycles elapsed
87999936  peak memory footprint

Results (actual output; note that the seventh field, vol, wrongly repeats the invol value):

57.03,212.49,16.24,88588288,2,5531,3337714,3337714,2273379580064,693450611012,87999936

Which distro are you on, which version of time are you running, with which parameters? Mine (debian 12, GNU time 1.9-0.2) doesn't output things like context switches by default ... please add that info to your question (not in the comments). — tink, Commented Jun 11, 2024 at 19:38
Please edit your question and add i) an example input and ii) the output you expect from that example so we can understand what you need. Also, /usr/bin/time has a lot of formatting options, have you considered using those? Have you seen /usr/bin/time --help? Looks like it can mostly do what I think you want itself. — terdon, Commented Jun 11, 2024 at 19:42
Please edit your question to change Results to either Expected Output or Output I Get depending on whether that's the output you want or the output you get that you don't want and, if it's the latter, also add the expected output. Also, your code is printing column header line so make sure to include that in your expected output and get rid of the **s in the output that are making it such that we can't copy/paste it to test with. — Ed Morton, Commented Jun 12, 2024 at 12:17

ilkkachu · Accepted Answer · 2024-06-13 12:24:30Z

You didn't mention what system you're running on, or what the output from time looks like.

If you're on macOS or FreeBSD, where the output of time -l looks something like this (as described in the macOS and FreeBSD man pages):

% /usr/bin/time -l sleep 1
        1.00 real         0.00 user         0.00 sys
             1441792  maximum resident set size
                      [...]
                   2  voluntary context switches
                   7  involuntary context switches
                      [...]

then /^voluntary context switches/ indeed would not work, because the label is not at the start of the line. You'd need to use something else, like / voluntary context switches/ (matching the space before "voluntary"), or $2 == "voluntary" (checking only the second field of the line), or e.g. /^ *[0-9]+ +voluntary context switches *$/ (matching against the full line).
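As a minimal sketch of the field-comparison idea, the asker's script could compare whole fields instead of searching for substrings. The function name time_l_to_csv is made up for illustration, and the sketch assumes the time -l layout shown in the question (number first, label after it):

```shell
# Sketch: convert macOS/FreeBSD `time -l` output (read from stdin) to one CSV line.
# Labels are compared as whole fields, so "voluntary" can never match "involuntary".
# time_l_to_csv is a hypothetical name, not an existing tool.
time_l_to_csv() {
    awk '
        $2 == "real"                     { real = $1; user = $3; sys = $5 }
        $2 == "maximum"                  { maxrss = $1 }
        $2 == "page" && $3 == "reclaims" { minor = $1 }
        $2 == "page" && $3 == "faults"   { major = $1 }
        $2 == "voluntary"                { vol = $1 }
        $2 == "involuntary"              { invol = $1 }
        $2 == "instructions"             { instret = $1 }
        $2 == "cycles"                   { cycle = $1 }
        $2 == "peak"                     { pkmem = $1 }
        END {
            # Same column order as the END block in the question
            print real "," user "," sys "," maxrss "," major "," minor "," \
                  vol "," invol "," instret "," cycle "," pkmem
        }
    '
}
```

It would be used as time_l_to_csv < "$inputfile". Because awk splits on runs of whitespace by default, the leading indentation in the real output does not matter here.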

Then again, if your system happens to have the GNU version of time as described e.g. in the Debian manpage, you could use the --format option to have it produce the desired format directly.

E.g.

$ /usr/bin/time -f "real %E, user %U, system %S, maxrss %M, major %F, minor %R, vol %w, invol %c" sleep 1
real 0:01.00, user 0.00, system 0.00, maxrss 1872, major 0, minor 86, vol 2, invol 0

or

$ /usr/bin/time --format "real, user, system, maxrss, major, minor, vol, invol\n%E, %U, %S, %M, %F, %R, %w, %c\n" sleep 1
real, user, system, maxrss, major, minor, vol, invol
0:01.00, 0.00, 0.00, 1856, 0, 85, 2, 0

You could do a similar thing with the TIMEFMT variable in zsh (which macOS should have). On a quick look at the manual, it seems to support the same format specifiers as the GNU implementation, so the same format strings might work (but you may want to double-check anyway):

% TIMEFMT="real %E, user %U, system %S, maxrss %M, major %F, minor %R, vol %w, invol %c";  time  sleep 1
real 1.01s, user 0.00s, system 0.00s, maxrss 1408, major 18, minor 196, vol 5, invol 6

or

% TIMEFMT=$'real, user, system, maxrss, major, minor, vol, invol\n%E, %U, %S, %M, %F, %R, %w, %c\n'; time sleep 1
real, user, system, maxrss, major, minor, vol, invol
1.01s, 0.00s, 0.00s, 1408, 0, 226, 0, 10

The OP's looks like macos time which doesn't have a -f but macos does come with zsh whose time keyword also has a customisable output format ($TIMEFMT variable) — Stéphane Chazelas, Commented Jun 11, 2024 at 19:58
I edited my post and added the time output, and the result of the gawk parsing. — jonathannah, Commented Jun 12, 2024 at 6:33
Changing '/voluntary ...' to '/ voluntary ...' fixed the issue I was seeing. Thanks! — jonathannah, Commented Jun 12, 2024 at 6:40

Stéphane Chazelas · Accepted Answer · 2024-06-13 20:27:53Z

If the output is in a format like that of time -v with the GNU standalone utility (not the time builtin of the GNU shell):

        Command being timed: "sh -c sleep "5""
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 0%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1920
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 176
        Voluntary context switches: 7
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 88
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

then converting to CSV may just be a matter of piping to:

perl -MText::CSV -ne '
  if (/^\s*(.*):\s+(.*)/) {
    push @header, $1;
    push @values, $2;
  }
  END {
    $csv = Text::CSV->new;
    $csv->say(*STDOUT, \@header);
    $csv->say(*STDOUT, \@values);
  }'

Which on the sample above gives me:

"Command being timed","User time (seconds)","System time (seconds)","Percent of CPU this job got","Elapsed (wall clock) time (h:mm:ss or m:ss)","Average shared text size (kbytes)","Average unshared data size (kbytes)","Average stack size (kbytes)","Average total size (kbytes)","Maximum resident set size (kbytes)","Average resident set size (kbytes)","Major (requiring I/O) page faults","Minor (reclaiming a frame) page faults","Voluntary context switches","Involuntary context switches",Swaps,"File system inputs","File system outputs","Socket messages sent","Socket messages received","Signals delivered","Page size (bytes)","Exit status"
"""sh -c sleep ""5""""",0.00,0.00,0%,0:05.00,0,0,0,0,1920,0,1,176,7,1,0,88,0,0,0,0,4096,0

If it's more like the format shown on the macOS time(1) man page for /usr/bin/time -l -h -p sleep 5:

perl -MText::CSV -ne '
  if (/^\s*(?|(?<v>\d+)\s+(?<h>.*)|(?<h>\S+)\s+(?<v>\S+))/) {
    push @header, $+{h};
    push @values, $+{v};
  }
  END {
    $csv = Text::CSV->new;
    $csv->say(*STDOUT, \@header);
    $csv->say(*STDOUT, \@values);
  }'

Which on the sample at that page gives me:

real,user,sys,"maximum resident set size","average shared memory size","average unshared data size","average unshared stack size","page reclaims","page faults",swaps,"block input operations","block output operations","messages sent","messages received","signals received","voluntary context switches","involuntary context switches","instructions retired","cycles elapsed","peak memory footprint"
5.01,0.00,0.00,0,0,0,0,80,0,0,1,0,0,0,0,3,0,2054316,2445544,241664

For the sample now added to your question:

perl -MText::CSV -ne '
  if (/^(\S+)\s+(real)\s+(\S+)\s+(user)\s+(\S+)\s+(sys)/) {
    push @header, $2,$4,$6;
    push @values, $1,$3,$5;
  } elsif (/^\s*(\d+)\s+(.*)/) {
    push @header, $2;
    push @values, $1;
  }
  END {
    $csv = Text::CSV->new;
    $csv->say(*STDOUT, \@header);
    $csv->say(*STDOUT, \@values);
  }'

Which gives:

real,user,sys,"maximum resident set size","average shared memory size","average unshared data size","average unshared stack size","page reclaims","page faults",swaps,"block input operations","block output operations","messages sent","messages received","signals received","voluntary context switches","involuntary context switches","instructions retired","cycles elapsed","peak memory footprint"
57.03,212.49,16.24,88588288,0,0,0,5531,2,0,0,0,0,0,0,1,3337714,2273379580064,693450611012,87999936

Ed Morton · Accepted Answer · 2024-06-12 13:22:32Z

The problem in your code is that the regexp /voluntary context switches/ also matches the string involuntary context switches. It should be /[[:space:]]voluntary context switches/ or /[^n]voluntary context switches/ or similar, so that it doesn't match involuntary...:

/[^n]voluntary context switches/ { vol=$1 }
/involuntary context switches/ { invol=$1 }

or reverse the order of the comparisons and add a next:

/involuntary context switches/ { invol=$1; next }
/voluntary context switches/ { vol=$1 }
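To see why the original rule misfires: a regexp without anchors matches anywhere in the line, and "voluntary context switches" is a substring of "involuntary context switches". The vol rule therefore fires on both lines, and since the involuntary line comes later in the input, vol ends up holding that later value. A small sketch that counts matching lines shows the difference (count_matches is a hypothetical helper name, and the two sample lines are taken from the question):

```shell
# Hypothetical helper: count the input lines that match an awk regexp.
# $1 is the regexp; the text to test is read from stdin.
count_matches() {
    awk -v re="$1" '$0 ~ re { n++ } END { print n + 0 }'
}

sample='                   1  voluntary context switches
             3337714  involuntary context switches'

# Unanchored: matches both lines, so prints 2
printf '%s\n' "$sample" | count_matches 'voluntary context switches'

# Requires a non-"n" character before "voluntary": prints 1
printf '%s\n' "$sample" | count_matches '[^n]voluntary context switches'
```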

But I wouldn't bother with regexp comparisons and a bunch of variables for this. Instead, create an array (tag2val[] below) that maps tags (strings used to identify values) to the associated values; then you can access the values by their tags to do anything you like regarding printing, comparing, re-arranging, etc.

For example using any POSIX awk:

$ cat tst.awk
BEGIN { OFS="," }
NR == 1 {
    for ( i=2; i<=NF; i+=2 ) {
        tag2val[$i] = $(i-1)
    }
    next
}
{
    val = $1
    sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+/,"")
    tag2val[$0] = val
}
END {
    n = split("real,real,user,user,sys,sys,maximum resident set size,maxrss,page reclaims,minor,page faults,major,voluntary context switches,vol,involuntary context switches,invol,instructions retired,instret,cycles elapsed,cycle,peak memory footprint,pkmem",tmp,OFS)

    for ( i=2; i<=n; i+=2 ) {
        hdr = tmp[i]
        printf "%s%s", hdr, (i<n ? OFS : ORS)
    }

    for ( i=2; i<=n; i+=2 ) {
        tag = tmp[i-1]
        val = tag2val[tag]
        printf "%s%s", val, (i<n ? OFS : ORS)
    }
}

$ awk -f tst.awk file
real,user,sys,maxrss,minor,major,vol,invol,instret,cycle,pkmem
57.03,212.49,16.24,88588288,5531,2,1,3337714,2273379580064,693450611012,87999936

With that you can change the output field order just by changing the order in which the tags appear in the split(), print any value by its tag, compare values against each other or against hard-coded values, etc. You can even compare values you don't intend to print by their tags. It is also easy to change it to use the output header string as the tag if you prefer (and are happy just discarding the values you don't intend to print):

$ cat tst.awk
BEGIN {
    OFS = ","
    n = split("real,real,user,user,sys,sys,maximum resident set size,maxrss,page reclaims,minor,page faults,major,voluntary context switches,vol,involuntary context switches,invol,instructions retired,instret,cycles elapsed,cycle,peak memory footprint,pkmem",tmp,OFS)
    for ( i=2; i<=n; i+=2 ) {
        inTag = tmp[i-1]
        outTag = tmp[i]
        in2out[inTag] = outTag
        tags[++numTags] = outTag
    }
}
NR == 1 {
    for ( i=2; i<=NF; i+=2 ) {
        tag2val[in2out[$i]] = $(i-1)
    }
    next
}
{
    val = $1
    sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+/,"")
    tag2val[in2out[$0]] = val
}
END {
    for ( i=1; i<=numTags; i++ ) {
        printf "%s%s", tags[i], (i<numTags ? OFS : ORS)
    }

    for ( i=1; i<=numTags; i++ ) {
        printf "%s%s", tag2val[tags[i]], (i<numTags ? OFS : ORS)
    }
}

Kaz · Accepted Answer · 2024-06-29 09:00:58Z

Solution in TXR:

We write a program that works like the external /usr/bin/time utility, taking arguments that are executed as a command, and which prints the CSV format:

$ uname -s
Darwin
$ txr mactime.txr echo foo
foo
0.00,0.00,0.00,1441792,2,167,5,4,3758622,1628194,901696

Code in mactime.txr, which requires TXR 295 or newer due to the use of "?2" (see below).

@(next (open-process  "/usr/bin/time" "?2" (cons "-l" *args*)))
 @real real @user user @sys sys
 @maxrss maximum resident set size
@(skip)
 @minor page reclaims
 @major page faults
@(skip)
 @vol voluntary context switches
 @invol involuntary context switches
 @instret instructions retired
 @cycle cycles elapsed
 @pkmem peak memory footprint
@(output)
@real,@user,@sys,@maxrss,@major,@minor,@vol,@invol,@instret,@cycle,@pkmem
@(end)

Note: this approach assumes that the timed command itself writes nothing to standard error, so that the report from /usr/bin/time is the only thing arriving on that stream.
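If the timed command does write to standard error, one possible workaround at the shell level (outside TXR, and not part of the program above) is to route the command's own stderr through a spare file descriptor so that only the timing report lands in a file. The sketch below is an illustration under assumptions: timed_report and the TIME_BIN override are invented names, and the extra sh that execs the command adds a small overhead to the measurement.

```shell
# Hypothetical helper: run a command under `time -l`, writing the timing report
# to the file named by $1 while the command's own stderr passes through untouched.
# TIME_BIN is an assumed override for the path of the time utility.
timed_report() {
    report=$1
    shift
    {
        # fd 3 is a copy of the caller's original stderr. The inner sh points the
        # timed command's stderr at fd 3, so only time's report reaches "$report".
        "${TIME_BIN:-/usr/bin/time}" -l sh -c 'exec "$@" 2>&3 3>&-' sh "$@" 2>"$report"
    } 3>&2
}
```

For example, timed_report timing.txt some-command arg1 would leave the report in timing.txt, which could then be parsed by any of the approaches in the other answers.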

In TXR 294 or earlier, we need to write a function to open a process in such a way that we capture its standard error rather than standard output as a stream; the "?2" mode syntax is not recognized:

@(do
  (defun capture-stderr (cmd . args)
    (tree-bind (rdfd . wrfd) (pipe)
      (let ((*stderr* (open-fileno wrfd "w")))
        (run cmd args))
      (close wrfd)
      (open-fileno rdfd))))
@(next (capture-stderr "/usr/bin/time" "-l" . *args*))
 @real real @user user @sys sys
 @maxrss maximum resident set size
@(skip)
 @minor page reclaims
 @major page faults
@(skip)
 @vol voluntary context switches
 @invol involuntary context switches
 @instret instructions retired
 @cycle cycles elapsed
 @pkmem peak memory footprint
@(output)
@real,@user,@sys,@maxrss,@major,@minor,@vol,@invol,@instret,@cycle,@pkmem
@(end)
