Help with awk / sed shell script

Question

I have to write a script using the data in the following table (fake data):

AnimalNumber,DOB,Gender,Breed,Date-moved-in
IE161289240602,04/02/2010,M,AAX,20/07/2011,
IE141424490333,13/01/2009,M,LMX,21/09/2010,
IE151424420395,19/01/2007,F,LMX,20/08/2010,

Basically, I need to list only the DOB and the animal number, but the animal number should be broken up like this:

IE161289240602 should be 1612892 4 0602

Also, only the month and year of birth should be listed, so something like this for the first line:

Feb 2010 1612892 4 0602

Any ideas on how to do this? I am afraid it's a bit outside my skill set.

Costas · Accepted Answer · 2015-10-05 12:18:52Z

With GNU awk (gawk):

awk -F, '
    NR>1{                              #skip the header line
        sub("..","")                   #remove the first two characters (the IE prefix)
        d=""
        for(i=split($2,D,"/");i>0;i--) #rebuild the 2nd field as `YYYY MM DD `
            d=d D[i] " "
        print strftime("%b %Y",mktime(d 0" "0" "0)),gensub("[0-9]"," & ",8,$1)
    }' file

mktime converts a string in the format YYYY MM DD HH MM SS into a timestamp (seconds since the epoch).
strftime formats a timestamp as requested (here %b %Y, i.e. abbreviated month name and four-digit year).
gensub replaces the 8th digit ([0-9]) in the 1st field ($1) with itself (&) surrounded by spaces.
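
If gawk is not available, the same formatting can be sketched in plain POSIX awk. This is not the method of the answer above: it swaps mktime/strftime for a hard-coded English month table and gensub for substr, it assumes the animal number always has the layout IE + 12 digits, and the function name format_animals is made up for illustration.

```shell
# Sketch without gawk-only functions (mktime, strftime, gensub).
# format_animals is a hypothetical helper; it reads the CSV on stdin.
format_animals() {
    awk -F, '
    BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ") }
    NR > 1 {
        split($2, d, "/")      # d[1]=day, d[2]=month, d[3]=year
        n = substr($1, 3)      # drop the leading "IE"
        # 7 digits, 1 digit, then the remaining 4 digits
        print mon[d[2] + 0], d[3], substr(n, 1, 7), substr(n, 8, 1), substr(n, 9)
    }'
}

printf '%s\n' \
    'AnimalNumber,DOB,Gender,Breed,Date-moved-in' \
    'IE161289240602,04/02/2010,M,AAX,20/07/2011,' |
    format_animals
```

Unlike strftime, the month names here do not follow the locale; they are always the English abbreviations listed in the BEGIN block.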

This is only string formatting, so sed can do it as well (GNU sed, because of -r and the e flag; date -d also needs GNU date):

sed -r '
    1d
    s/./ & /10
    s|(../)(../)|\2\1|
    s/..([^,]*),([^,]*).*/date -d "\2" +"%b %Y \1"/e
    ' file

or, for a sed without the e flag:

sed '
    1d
    s/./ & /10
    s|\(../\)\(../\)|\2\1|
    s/..\([^,]*\),\([^,]*\).*/date -d "\2" +"%b %Y \1"/
    ' file | bash

or

sed '
    s/./ & /10
    s/../+"%b %Y /
    s/,/" -d /
    s|\(../\)\(../\)|\2\1|
    s/,/\n/
    1!P
    d' file | xargs -n3 date
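
To see what the "generate a command, then run it" variants hand to the shell, it can help to apply the substitutions without executing anything. A minimal sketch using the same commands as the `| bash` variant, reading from stdin instead of file (show_date_commands is only an illustrative name):

```shell
# Same substitutions as the "| bash" variant, but the generated commands
# are printed instead of executed. show_date_commands is a hypothetical name.
show_date_commands() {
    sed '
        1d
        s/./ & /10
        s|\(../\)\(../\)|\2\1|
        s/..\([^,]*\),\([^,]*\).*/date -d "\2" +"%b %Y \1"/
    '
}

printf '%s\n' \
    'AnimalNumber,DOB,Gender,Breed,Date-moved-in' \
    'IE161289240602,04/02/2010,M,AAX,20/07/2011,' |
    show_date_commands
```

For the first data row this prints date -d "02/04/2010" +"%b %Y 1612892 4 0602": the 10th character of the line (the 8th digit) has been padded with spaces, day and month have been swapped for date, and the rest of the animal number rides along inside the output format string.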

edited Oct 5, 2015 at 12:18

answered Oct 5, 2015 at 9:24

Costas

Can I get an explanation of this code, please? – johndoe12345, 2015-10-05 09:44:17 +00:00
Awesome, thanks. And how can I sort based on the last 4 digits? – johndoe12345, 2015-10-05 10:04:47 +00:00
The simplest way is to pipe the output through sort: ...}' file | sort -k5n – Costas, 2015-10-05 10:39:46 +00:00
Or save the data in an array and print it sorted: …d=d D[i] " " ; S[$1%10000]=strftime("%b %Y", mktime(d 0" "0" "0)) " " gensub("[0-9]"," & ", 8, $1)}END{n=asorti(S, N, "@ind_num_asc");for(i=1;i<=n;i++)print S[N[i]]}' file – Costas, 2015-10-05 10:56:46 +00:00
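
For reference, a small sketch of the sort step applied to lines that are already formatted. It uses -k5,5n, which limits the key to the 5th field; the -k5n from the comment runs the key to the end of the line, which comes to the same thing here because the 5th field is the last one. sort_by_last4 is an illustrative name, not something from the answer.

```shell
# Sort formatted lines numerically on the 5th whitespace-separated field
# (the last four digits). sort_by_last4 is a hypothetical helper name.
sort_by_last4() {
    sort -k5,5n
}

printf '%s\n' \
    'Feb 2010 1612892 4 0602' \
    'Jan 2009 1414244 9 0333' \
    'Jan 2007 1514244 2 0395' |
    sort_by_last4
```

With the three sample rows the order becomes 0333, 0395, 0602; adding r to the key (-k5,5nr) reverses it.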


Sobrique · Accepted Answer · 2015-10-05 10:56:50Z

I'd be thinking "use perl":

#!/usr/bin/env perl 
use strict;
use warnings;

use Time::Piece;

#get the column names out of the file. We remove the trailing linefeed. 
#<> is the magic input file handle, so it reads from STDIN or files
#specified on command line, e.g. myscript.pl file_to_process.csv
my @headers = split ( /,/, <> =~ s/\n//r );

while ( <> ) { 
    chomp; #strip linefeed. 
    my %stuff;
    #this makes use of the fact we know the headers already
    #so we can map from the line into named columns. 
    @stuff{@headers} = split /,/; #read comma sep into hash

    #DOB:
    #take date, parse it into a unix time, then use strftime to output "Mon year"
    print Time::Piece -> strptime ( $stuff{'DOB'}, "%d/%m/%Y" ) -> strftime("%b %Y");
    #regex match against AnimalNumber, and then join it with space separation. 
    print "\t"; #separator
    print join ( " ", $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/ );
    print "\n";
}

This outputs:

Feb 2010    1612892 4 0602
Jan 2009    1414244 9 0333
Jan 2007    1514244 2 0395

This works by:

Reading <>, which is the magic file handle: it takes input from pipes or from filenames given on the command line.
We read the first line and turn it into an array, @headers.
We iterate over each remaining line and map the comma-separated values into a hash (called %stuff).
We extract DOB from %stuff and process it with strptime/strftime into the date format required.
We extract AnimalNumber from %stuff and use a regex to pull out the numbers you're after. Because the pattern uses multiple capture groups, the captured elements are returned as a list, which we can then stick together (with a space as delimiter) using join.
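
The header-to-name mapping is not specific to Perl. As a rough awk analogue (an illustration only, not part of this answer; by_header and the col array are made-up names), the first line can be used to build a name-to-column-number table, so that later lines are addressed by column name rather than by position:

```shell
# Hypothetical awk analogue of the @headers / %stuff idea:
# remember each column name's position, then look fields up by name.
by_header() {
    awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # name -> index
    { print $(col["DOB"]) "\t" $(col["AnimalNumber"]) }
    '
}

printf '%s\n' \
    'AnimalNumber,DOB,Gender,Breed,Date-moved-in' \
    'IE161289240602,04/02/2010,M,AAX,20/07/2011,' |
    by_header
```

This only prints the two raw fields, tab-separated; the point is that the script keeps working if the columns are reordered, as long as the header names stay the same.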

Edit: Because you also want to sort, you'll have to read the whole lot into memory first (which the script above avoids for efficiency reasons).

However:

#!/usr/bin/env perl 
use strict;
use warnings;

use Time::Piece;

my @headers = split( /,/, <> =~ s/\n//r );

my @records;

while (<>) {
    chomp;    #strip linefeed.
    my %stuff;

    #this makes use of the fact we know the headers already
    #so we can map from the line into named columns.
    @stuff{@headers} = split /,/;    #read comma sep into hash

    #DOB:
    #take date, parse it into a unix time, then use strftime to output "Mon year"
    $stuff{'formtime'} =
        Time::Piece->strptime( $stuff{'DOB'}, "%d/%m/%Y" )->strftime("%b %Y");

    #regex match against AnimalNumber, and keep the three captured groups
    #in an array reference so we can sort on the last one later.
    $stuff{'number_arr'} = [ $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/ ];

    push( @records, \%stuff );
}

#sort on the last group of 4 digits, highest first ($b <=> $a)
foreach my $record (
    sort { $b->{'number_arr'}->[2] <=> $a->{'number_arr'}->[2] } @records )
{
    print join( "\t",
        $record->{'formtime'}, join( " ", @{ $record->{'number_arr'} } ) ),
        "\n";
}

Similar to the above, but we pre-process each record into an array of hashes and then sort the output before printing, based on the "key" field: the last group of 4 digits in number_arr. The comparison is $b <=> $a, so the order is descending; swap $a and $b for ascending.

terdon · Accepted Answer · 2015-10-05 11:14:31Z

Another Perl way, using GNU date:

$ perl -F, -lane 'next if $.==1; $F[0]=~s/IE(\d{7})(\d)(\d{4})/$1 $2 $3/; 
                  $F[1]=~s#(..).(..).(.*)#$2/$1/$3#; 
                  chomp($d=`date -d "$F[1]" +"%b %Y"`); 
                  print "$d $F[0]"' file
Feb 2010 1612892 4 0602
Jan 2009 1414244 9 0333
Jan 2007 1514244 2 0395

The -a makes perl act like awk, splitting its input line on the character given by -F and saving it as the array @F. The $F[0]=~s/IE... removes the IE from the first field and splits the rest as requested. The $F[1]=~s#... will reformat the date into MM/DD/YYYY. The chomp(... will run GNU date, asking it to return Mon YYYY format (the chomp removes trailing newlines) which is saved as $d. Finally, $d and the modified 1st field are printed.
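
The date step can be tried on its own in the shell. GNU date reads nn/nn/nnnn as month/day/year, which is why the answer swaps the first two parts. The sketch below takes a slightly different route and rearranges the date to ISO YYYY-MM-DD, which leaves no day/month ambiguity. month_year is a made-up helper name, -d is assumed to be the GNU date option, and LC_ALL=C is set so the month abbreviation comes out in English.

```shell
# month_year is a hypothetical helper: DD/MM/YYYY in, "Mon YYYY" out.
month_year() {
    # Rearrange to ISO YYYY-MM-DD so the day/month order is unambiguous.
    iso=$(printf '%s\n' "$1" | sed 's|\(..\)/\(..\)/\(....\)|\3-\2-\1|')
    # -d (parse the given date string) assumes GNU date.
    LC_ALL=C date -d "$iso" +'%b %Y'
}

month_year 04/02/2010
```

For 04/02/2010 this gives Feb 2010, matching the first line of the output above.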
