standard deviation using awk [closed]

Question

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 9 years ago.

I am using the command below to get the standard deviation of the values in file A, which contains:

   1         2         3        avg 
23.3107  20.0372    21.7236   21.6905

awk '{x[NR]=$0 ;} END{a=$4; for (i in x){ss += (x[i]-a)^2} sd = sqrt(ss/n); print $5 = sd}'

This fails with: fatal: division by zero attempted

I modified the command as follows:

awk '{x[NR]=$0 ;} END{a=$4; for (i in x){if (a == 0) $6 ="N/A"; else ss += (x[i]-a)^2} sd = sqrt(ss/n); print $5 = sd}'

but the error persists. How can I fix it? Thank you, cas, for understanding my question.

Please edit your post to 1) get rid of the "hi" at the beginning and "any help would be appreciated" at the end, 2) do some formatting (indent code by 4 spaces), and 3) add a question (a sentence with a question mark). Now it reads like you use the file command to get standard deviations, which is unlikely. - Anthon, Commented Apr 25, 2016 at 18:37
Your awk script makes little or no sense. Why are you putting all input lines into an array? Why are you setting a=$5 when there are only 4 fields and, worse, doing it in the END block, where there aren't any input fields anyway (because you've already processed them in the main block)? Why are you trying to process the array of input lines in mathematical functions? Those array elements don't contain single floating-point numbers; they are strings containing entire lines (with all 4 fields). Finally, n isn't defined anywhere, so ss/n is always going to be a division by 0 error. - cas, Commented Apr 25, 2016 at 23:36

Guido · Accepted Answer · 2016-04-25 18:56:05Z

Where is "n"?

You write:

sd = sqrt(ss/n)

but where in your code did you assign the variable "n"? The way awk sees it, "n" is zero.

Also, where is column 5 in a=$5 (and, third issue, why is this assignment in the END section)? Your example contains only 4 columns.
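As a quick illustration of the point about "n" (a sketch only; check_n is a made-up helper name and is not part of the script in the question), an awk variable that was never assigned evaluates to 0 in a numeric context, which is why ss/n fails:

```shell
# check_n is a hypothetical helper used only for this illustration.
# Any arguments (e.g. -v n=3) are passed to awk before the program text.
check_n() {
  awk "$@" 'BEGIN {
    # n is never assigned inside the program; unless it is supplied
    # with -v, it evaluates to 0 in a numeric comparison.
    if (n == 0) print "n is 0: sqrt(ss/n) would divide by zero"
    else        print "n is " n
  }'
}

check_n          # n unset
check_n -v n=3   # n supplied on the command line
```

The first call reports that n is 0; the second prints "n is 3". Checking n before dividing is one way to print something like "N/A" instead of hitting the fatal error.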

cas · Accepted Answer · 2016-04-26 01:06:25Z

Did you mean to do something like this? It's the only way I can think of to make sense of your script.

awk -v OFS=$'\t' '
FNR == 1 { $5 = "sdev" ; print }

FNR > 1  { a = $4    # field 4 is 'avg'
           n = NF-1  # exclude the 'avg' field from the ss calculations.
           ss = 0    # reset the sum of squares for each input line

           for (i=1; i <= n; i++) { ss += ($i - a)^2 }

           $5 = sqrt(ss/n)
           print
         }' inputfile

Note: $i on the for line refers NOT to the value of i, but to the input field numbered i - i.e. it loops through $1, $2, and $3. This may not be obvious to shell or perl users where (scalar) variables are normally prefixed by $.
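A tiny sketch of that difference, using made-up input values rather than the data from the question:

```shell
# i is an ordinary awk variable; $i is the field whose number is stored in i.
printf '10 20 30\n' | awk '{ i = 2; print i, $i }'
```

This prints "2 20": the value of i, followed by the contents of field 2.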

NF is the number of fields on a line, and FNR is the record (line) number within the current input file, so this awk script supports multiple input files, each with its own header line. If there's only ever going to be one input file at a time, you could use NR instead of FNR.
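To see how NR and FNR differ, here is a small sketch with two throwaway files (the file names and contents are invented for the demonstration):

```shell
# Two throwaway files, each with a header line and one data line.
dir=$(mktemp -d)
printf 'a b avg\n1 3 2\n' > "$dir/first"
printf 'a b avg\n5 7 6\n' > "$dir/second"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{ print NR, FNR, NF }' "$dir/first" "$dir/second")
printf '%s\n' "$counts"

rm -r "$dir"
```

The first column (NR) runs from 1 to 4, while the second (FNR) goes 1, 2, 1, 2, so a test like FNR == 1 matches the header of each file and not just the first one.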

Sample output:

1       2       3       avg     sdev
23.3107 20.0372 21.7236 21.6905 1.33661

Here's another version which works with any number of fields per line. It assumes that the last field of a line contains the average of all the previous fields on that line.

$NF refers to the value of the last field (i.e. the 'avg') and $new refers to the (last field + 1), i.e. assigning a value to it adds a new field to the end of the line.
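A minimal sketch of how assigning to a field past the end extends the line (the input and the value "x" are placeholders chosen for the example):

```shell
# Assigning to field NF+1 appends a field and rebuilds the line using OFS.
printf '1 2 3\n' | awk '{ new = NF + 1; $new = "x"; print; print NF }'
```

This prints "1 2 3 x" followed by 4, showing that NF grows once the new field has been assigned.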

awk -v OFS=$'\t' '
FNR == 1 { new = NF+1   # number of new field to add
           $new = "sdev"
           print 
         }

FNR > 1  { a = $NF   # last field is 'avg'
           n = NF-1  # exclude the 'avg' field from the ss calculations.
           ss = 0    # reset the sum of squares for each input line

           for (i=1; i <= n; i++) { ss += ($i - a)^2 }

           $new = sqrt(ss/n)
           print
         }' inputfile

Sample output with 5 values plus an average on each input line:

1       2       3       4       5       avg     sdev
23.3107 20.0372 21.7236 20.5328 21.2016 21.3611 1.13107
