I have a list of data, like a CSV, but some lines are missing a value. I'd like to generate a value for each missing line based on the lines before and after it, using a Linux shell script.
Take this table for instance.
line | person | age |
---|---|---|
1 | Adam | 45 |
2 | Bob | 50 |
3 | Cindy | 47 |
4 | * | # |
5 | Ed | 49 |
What I'd like to do is fill in the "*" in line 4 with "Cindy:Ed" (a concatenation of the nearest valid values in each direction in the person column, with a ":" delimiter) and the "#" with 48 (the average of 47 and 49, the nearest valid values in each direction in the age column).
Output:
line | person | age |
---|---|---|
1 | Adam | 45 |
2 | Bob | 50 |
3 | Cindy | 47 |
4 | Cindy:Ed | 48 |
5 | Ed | 49 |
My data is formatted as a space-delimited text file of arbitrary row count. All rows are three columns.
While I know my way around a for loop, grep, etc., I'm at a loss as to how I'd handle this in a vanilla Linux shell script.
My guess is to make an initial pass to find the lines that have asterisks and hashes, then make a second pass to replace each asterisk with (awk '{print $2}'):(awk '{print $2}') taken from the nearest valid lines before and after, respectively.
If the missing data is on the first or last line, I'm happy to leave it as is. If missing data is on consecutive lines, I'm OK with setting all missing lines to the same "Cindy:Ed" and the same average. It'd be even cooler if I could set "Cindy:Ed:1", "Cindy:Ed:2", etc.
An accurate example of worst-case raw input (it's a traceroute with "#" added for the missing latency):
1 192.168.200.2 1
2 192.168.200.1 1
3 10.10.10.1 1
4 11.22.33.44 2
5 11.22.33.55 5
6 * #
7 11.22.44.66 9
8 * #
9 * #
10 8.8.8.0 25
11 * #
12 * #
13 * #
What I'd like:
1 192.168.200.2 1
2 192.168.200.1 1
3 10.10.10.1 1
4 11.22.33.44 2
5 11.22.33.55 5
6 11.22.33.55:11.22.44.66 7
7 11.22.44.66 9
8 11.22.44.66:8.8.8.0 17
9 11.22.44.66:8.8.8.0 17
10 8.8.8.0 25
11 * #
12 * #
13 * #
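For reference, the rules above can be sketched as a single awk pass that buffers consecutive missing rows until the next valid row arrives. This is only a sketch under my stated assumptions (three space-delimited columns, "*" marking a missing second field, a numeric third field); `fill_gaps` and `emit_gaps` are names made up for illustration.

```shell
# fill_gaps [FILE...]: fill "* #" rows from the nearest valid rows above and below.
# Assumes three space-delimited columns and a numeric third column.
fill_gaps() {
  awk '
    # Print the buffered gap rows. have_next is 1 when a valid row follows the gap;
    # next_name and next_val are the 2nd and 3rd fields of that row.
    function emit_gaps(have_next, next_name, next_val,    i) {
      for (i = 1; i <= n; i++) {
        if (have_prev && have_next)
          print gap_no[i], prev_name ":" next_name, (prev_val + next_val) / 2
        else
          print gap_line[i]            # leading or trailing gap: leave unchanged
      }
      n = 0
    }
    # Missing row: remember its line number and original text, print nothing yet.
    $2 == "*" { n++; gap_no[n] = $1; gap_line[n] = $0; next }
    # Valid row: resolve any pending gap, print the row, remember it as "previous".
    {
      emit_gaps(1, $2, $3)
      print
      prev_name = $2; prev_val = $3; have_prev = 1
    }
    # End of input: any rows still buffered have no valid row after them.
    END { emit_gaps(0) }
  ' "$@"
}
```

Called as `fill_gaps trace.txt` (or with the data on stdin), this should turn the raw input above into the desired output. For the numbered variant, changing the second printed field to `prev_name ":" next_name ":" i` would presumably give "Cindy:Ed:1", "Cindy:Ed:2", and so on.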