I'd be thinking "use perl":
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;
#get the column names out of the file. We remove the trailing linefeed.
#<> is the magic input file handle, so it reads from STDIN or files
#specified on command line, e.g. myscript.pl file_to_process.csv
my @headers = split ( /,/, <> =~ s/\n//r );
while ( <> ) {
    chomp; #strip linefeed.
    my %stuff;
    #this makes use of the fact we know the headers already
    #so we can map from the line into named columns.
    @stuff{@headers} = split /,/; #read comma sep into hash
    #DOB:
    #parse the date into a Time::Piece object, then use strftime to output "Mon year"
    print Time::Piece -> strptime ( $stuff{'DOB'}, "%d/%m/%Y" ) -> strftime("%b %Y");
    print "\t"; #separator
    #regex match against AnimalNumber, then join the captured groups with spaces.
    print join ( " ", $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/ );
    print "\n";
}
This outputs:
Feb 2010 1612892 4 0602
Jan 2009 1414244 9 0333
Jan 2007 1514244 2 0395
This works by:
- Reading <> which is the magic file handle - takes input from pipes or filenames.
- We read the first line, and turn that into an array of @headers.
- We iterate each additional line, and map the comma separated values into a hash (called %stuff).
- Extract DOB from %stuff - and process it using strptime/strftime into a date as required.
- Extract AnimalNumber from %stuff and use a regex pattern to extract the numbers you're after.
- Because we use multiple capture groups, the captured elements are returned as a list, which we can then stick together (with a delimiter of space) using join.
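To see the hash slice and the list-context regex match in isolation, here is a minimal sketch on a single hard-coded line. The sample row is made up for illustration (the day of the month in particular is an assumption), and the names $line, $formatted and @parts are not part of the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical header and data row, standing in for two lines of the CSV.
my @headers = split /,/, 'DOB,AnimalNumber';
my $line    = '18/02/2010,161289240602';

# Hash slice: assign the split fields to the header names in one step.
my %stuff;
@stuff{@headers} = split /,/, $line;

# strptime returns a Time::Piece object; strftime turns it back into text.
my $formatted =
    Time::Piece->strptime( $stuff{'DOB'}, '%d/%m/%Y' )->strftime('%b %Y');

# In list context a successful match returns its capture groups:
# everything before the last five digits, one digit, then the last four.
my @parts = $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/;

print "$formatted\t", join( ' ', @parts ), "\n";
```

Because \d+ is greedy, it first grabs every digit and then backtracks just far enough to leave one digit and a group of four for the other two captures.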
Edit: Because you want the output sorted, you'll have to read the whole lot into memory first (which the above avoids, for efficiency reasons).
That looks like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;
my @headers = split( /,/, <> =~ s/\n//r );
my @records;
while (<>) {
    chomp; #strip linefeed.
    my %stuff;
    #this makes use of the fact we know the headers already
    #so we can map from the line into named columns.
    @stuff{@headers} = split /,/; #read comma sep into hash
    #DOB:
    #parse the date into a Time::Piece object, then use strftime to format it as "Mon year"
    $stuff{'formtime'} =
        Time::Piece->strptime( $stuff{'DOB'}, "%d/%m/%Y" )->strftime("%b %Y");
    #regex match against AnimalNumber, and keep the captured groups in an array ref.
    $stuff{'number_arr'} = [ $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/ ];
    #store a reference to this record so we can sort later
    push( @records, \%stuff );
}
foreach my $record ( sort { $b->{'number_arr'}->[2] <=> $a->{'number_arr'}->[2] } @records ) {
    print join( "\t",
        $record->{'formtime'}, join( " ", @{ $record->{'number_arr'} } ) ),
        "\n";
}
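Similar to the above, but we pre-process each record into an array of hash references, and then sort before printing - based on the "key" field, which is the last group of 4 digits in number_arr (element [2]). Because the comparison puts $b before $a, the order is numeric descending; swap them for ascending.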
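The sort step on its own can be sketched like this. The three numbers are reconstructed from the sample output shown earlier, so treat them as an assumption about the input, and @numbers and @sorted are names introduced only for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# AnimalNumber values reconstructed from the sample output (an assumption).
my @numbers = qw(161289240602 141424490333 151424420395);

# Build an array of hash references, each holding the captured groups.
my @records;
for my $number (@numbers) {
    push @records, { number_arr => [ $number =~ m/(\d+)(\d)(\d{4})/ ] };
}

# $b before $a gives descending numeric order on the last capture group.
my @sorted = sort { $b->{number_arr}[2] <=> $a->{number_arr}[2] } @records;

print join( ' ', @{ $_->{number_arr} } ), "\n" for @sorted;
```

The <=> operator compares the groups as numbers, so leading zeros such as in 0395 don't affect the ordering.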