Creating two output files from two formatted files

Question

I've been working on a project to take data from csv files, re-format it, and display it as charts. I'm working for a company over the summer and when I leave, I would like my code to be readable/editable by whoever takes over in the future.

This code reads in two files, formats them, and creates two output files. It takes in one command line argument as a filepath to the source folder for the data. The formatting just counts the number of occurrences of a word and increments a corresponding value in a 2D array.

Would you understand what this is doing if I hadn't given an explanation? Should I change anything about it to make it more readable?

#!/usr/bin/perl

use strict;
use warnings;

my $src = $ARGV[0];
my %files = (
    'LM' => [$src . '\raw\casesByCategory_LM.csv', $src . '\casesByCategoryLM.csv'],
    'TM' => [$src . '\raw\casesByCategory_TM.csv', $src . '\casesByCategoryTM.csv'],
);

foreach my $file (keys %files) {
    my @categories = (
        ['RMA Request', 0],
        ['Product Info Request', 0],
        ['Dealer Locator (New or Resupply)', 0],
        ['Order Request', 0],
        ['Order Status', 0],
        ['Wireless Activation/Deactivation', 0],
        ['Pricing', 0],
        ['Account Setup/Registration', 0],
        ['Other', 0],
    );
    my $total = 0;
    open (DATA, $files{$file}[0]) or die $!;

        # Remove header
        <DATA>;
        my $match = 0;
        while (my $line = <DATA>) {
            # Remove \n character
            chomp $line;
            # Search through list of categories to find a match
            if ($line =~ /^"(.+)"$/) {
                foreach my $cat (@categories) {
                    if ($1 eq $cat->[0]) {
                        $cat->[1]++;
                        $match = 1;
                    }
                }
            }
            # Add case to 'Other' if no match was found
            if (not $match) {
                $categories[8]->[1]++;
            } else {
                $match = 0;
            }
            $total++;
        }

    close (DATA);

    # Write organized data to new CSV file for chart
    open (OUTPUT,'>', $files{$file}[1]) or die $!;

        print OUTPUT "Total: $total\n";
        foreach my $cat (@categories) {
            print OUTPUT "$cat->[0],$cat->[1]\n";
        }

    close (OUTPUT);
}

I actually wouldn't have, but I was asked to. Gave me a reason to learn a new language, although this may be the only time I use it! — Ryan McClure
– Ryan McClure, Commented Aug 1, 2014 at 17:46

rolfl · Accepted Answer · 2014-08-01 18:37:24Z

All in all I can understand it OK, it reads fairly well. There are a few structures I would recommend you change.... for a start, I would separate the category definitions from the counters, as it would be easier to maintain:

Note, I am defining @categories as a simple array, not a nested array here.... so it is a different thing from yours:

my @categories = ('RMA Request', 'Product Info Request', 'Dealer Locator (New or Resupply)',
                  'Order Request', 'Order Status', 'Wireless Activation/Deactivation',
                  'Pricing', 'Account Setup/Registration', 'Other');

my %catcounts;
foreach (@categories) {
    $catcounts{$_} = 0;
}

OK, this does three things, it makes adding/changing the categories easy, and it separates the count setup from the category setup. It also sets up a hash fro the counter, so you don't need to loop through them all but can do it by key, so the code:

        if ($line =~ /^"(.+)"$/) {
            foreach my $cat (@categories) {
                if ($1 eq $cat->[0]) {
                    $cat->[1]++;
                    $match = 1;
                }
            }
        }
        # Add case to 'Other' if no match was found
        if (not $match) {
            $categories[8]->[1]++;
        } else {
            $match = 0;
        }

can become:

        if ($line =~ /^"(.+)"$/ && defined $catcount{$1}) {
            $catcount{$1}++;
        } else {
            $catcount{'Other'}++;
        }

This will be much faster because it uses the hash lookup instead of the array loop to find the category.

you will need to adapt the output to loop through the @categories, but print out the %catcount

Thanks for the suggestions, I modified the code accordingly and it works great. I had originally used a hash to store the categories and count but couldn't think of a way to read them back out in order. This is what I was looking for. — Ryan McClure
– Ryan McClure, Commented Aug 1, 2014 at 18:55

janos · Accepted Answer · 2014-08-03 19:16:26Z

Since you handle all the file I/O with open(X, $path) || die, execution will stop immediately. I'm not sure if that's fully intended, but maybe it is. Otherwise, this could be a more graceful alternative:

if (open(my $fh, $path)) {
    <$fh>;
    while (<$fh>) {
        if (/^"(.+)"$/) {
            // ...
        }
    }
    close($fh);
}

This segment demonstrates some other improvements:

The form open(NAME, $path) is not recommended, it's better to use a proper variable.
In a while (<$fh>) loop, each line read is stored in the $_ special variable. The if (/.../) statement can use it too.

Stack Exchange Network

Creating two output files from two formatted files

2 Answers 2

You must log in to answer this question.

Hot Network Questions

Creating two output files from two formatted files

2 Answers 2

You must log in to answer this question.

Related

Hot Network Questions