I wrote this simple code in 2003, and I haven't modified it since because it just works the way I need it to.
It reads in a list of numeric values and displays an ASCII histogram of the data. If the numbers are in a file, the file should be a simple list of numbers, with one number per line.
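The one-number-per-line rule is enforced by check_if_numeric in the script below, which uses a Perl Cookbook pattern and is applied after all whitespace has been stripped from each line. As a sketch of what that pattern accepts, the snippet runs it against a few sample strings chosen only for illustration. Integers and decimals, including forms such as .5 and 7., pass. A leading plus sign, scientific notation and hex values are rejected.

```perl
use strict;
use warnings;

# Same pattern as check_if_numeric in hist (Perl Cookbook, sec. 2.1).
my $numeric = qr/^-?(?:\d+(?:\.\d*)?|\.\d+)$/;

# Illustrative inputs (not from the original post).
for my $value (qw(42 -3.5 .5 7. +7 1e3 0x1F)) {
    printf "%-5s %s\n", $value, ($value =~ $numeric ? 'accepted' : 'rejected');
}
```

Data from tools that print exponents (for example 1e3) would therefore need converting before it is piped into hist.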
Here is an example of usage, where I use Perl to generate a stream of random integers and pipe it into the hist program:
perl -E 'say int rand 99 for 1..666' | hist

        Number of samples in population: 666
        Value range: 0 - 98
        Mean value: 49
        Median value: 51
        10% value: 10
        90% value: 89

                 < 0:      0
               0 - 9:     54
              9 - 18:     56
             18 - 27:     59
             27 - 36:     60
             36 - 45:     66
             45 - 54:     60
             54 - 63:     65
             63 - 72:     70
             72 - 81:     60
             81 - 90:     51
               >= 90:     65

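The bin labels in this output follow from the arithmetic in read_input. With no -l or -u option, the limits are the data's minimum and maximum (0 and 98), and the bin width is int((98 - 0) / 10) = 9. The snippet below reproduces that calculation with the sample run's values hard-coded for illustration. Because the width is truncated to an integer, the ten 9-wide rows stop at 90, and the ">= 90" row also holds the values 90 through 98.

```perl
use strict;
use warnings;

# Limits from the sample run above (hard-coded for illustration).
my ($lower_lim, $upper_lim, $nbins) = (0, 98, 10);

# Same formula as read_input: the width is truncated to an integer.
my $binsize = int(($upper_lim - $lower_lim) / $nbins);

# Lower edges of the ten labelled rows, plus the start of the ">=" row.
my @edges = map { $lower_lim + $_ * $binsize } 0 .. $nbins;

print "bin size: $binsize\n";
print "edges: @edges\n";
```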
To see the usage, run:
hist -h
Here is the hist code:
#!/usr/bin/env perl
# Create a histogram of numeric values.
#
# Input is a file (or a pipe) consisting of a single column of values.
# Output is STDOUT.
#
# Usage:   hist [-h] [-l lower_lim] [-u upper_lim] [file ...]
# Example: hist data.txt

use warnings;
use strict;
use List::Util qw(sum);

my $nbins = 10;    # Number of bins
my @sorted;
my @freq;
my $nsamp;
my $maxval;
my $minval;
my $upper_lim;
my $lower_lim;
my $user_upper_lim;
my $user_lower_lim;
my $binsize;

parse_args();
read_input();
create_histogram();
print_histogram();

# Errors are reported with die(), which already exits with a non-zero
# status. $! is only meaningful immediately after a failed system call
# (it can hold a stale value here even when nothing went wrong), so
# check for output errors explicitly when closing STDOUT instead.
close STDOUT or die "Error: Can't write output: $!\n";
exit 0;

########################################################################
########################################################################

sub read_input {
    my @raw;
    while (<>) {
        s/\s+//g;              # Remove all whitespace
        next unless length;    # Skip blank lines
        unless (check_if_numeric($_)) {
            die "Error: Non-numeric data '$_' found. " .
                "Can't calculate histogram.";
        }
        push @raw, $_;
    }
    unless (@raw) {
        die "Error: No data. Can't calculate histogram.\n";
    }
    @sorted = sort { $a <=> $b } @raw;
    $nsamp  = scalar @sorted;    # Number of elements in array.
    $maxval = $sorted[-1];       # Last element in sorted array
                                 # is the maximum value.
    $minval = $sorted[0];        # First element is the minimum.
    $lower_lim = int( (defined $user_lower_lim) ? $user_lower_lim : $minval );
    $upper_lim = int( (defined $user_upper_lim) ? $user_upper_lim : $maxval );
    if ($upper_lim < ($lower_lim + 10)) {
        $upper_lim = $lower_lim + 10;
    }
    if ($lower_lim > $maxval) {
        die "Error: Lower limit must not be greater than the maximum " .
            "value ($maxval). Can't calculate histogram.";
    }
    $binsize = int(($upper_lim - $lower_lim) / $nbins);
}
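
sub create_histogram {
    # Walk the sorted values once. Bin 0 counts values below $lower_lim,
    # the middle bins are $binsize wide, and the last bin is open-ended.
    # Bins above the largest value are never created.
    my $absmin = -9e99;
    my $absmax = 9e99;
    my $lo = $absmin;
    my $hi = $lower_lim;
    my $cnt = 0;
    for (@sorted) {
        # Close bins (possibly empty ones) until $_ falls in [$lo, $hi).
        until ( ($lo <= $_) && ($_ < $hi) ) {
            push @freq, $cnt;
            $cnt = 0;
            $lo = $hi;
            if ($hi < $upper_lim) {
                $hi += $binsize;
            }
            else {
                $hi = $absmax;    # Past the upper limit: open-ended bin
            }
        }
        $cnt++;
    }
    push @freq, $cnt;    # Count for the bin holding the largest value
}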

sub check_if_numeric {
    # Check to see if a value is numeric.
    # This nasty regular expression belongs in its own sub.
    # From Perl Cookbook, sec. 2.1
    my $value = shift;
    if ($value =~ /^-?(?:\d+(?:\.\d*)?|\.\d+)$/) {
        return 1;    # value is numeric
    }
    else {
        return 0;    # value contains non-numeric characters
    }
}

sub print_histogram {
    my $lower = $lower_lim;
    my $nfreq = scalar @freq;    # Number of elements in array.
    my $mid   = int($nsamp/2) - 1;
    # Indices of the 10% and 90% values. With fewer than 10 samples,
    # int($nsamp * 0.1) - 1 is -1, which would wrongly pick the largest
    # value ($sorted[-1]), so clamp both indices at 0.
    my $pct10 = int($nsamp * 0.1) - 1;
    my $pct90 = int($nsamp * 0.9) - 1;
    $pct10 = 0 if $pct10 < 0;
    $pct90 = 0 if $pct90 < 0;
    my $median;
    if (($nsamp % 2) == 0) {    # Even number of samples
        $median = int( ($sorted[$mid] + $sorted[$mid+1])/2 );
    }
    else {                      # Odd number of samples
        $median = int($sorted[$mid+1]);
    }
    print "\n";
    print "\tNumber of samples in population: $nsamp\n";
    print "\tValue range: $minval - $maxval\n";
    print "\tMean value: ", int(sum(@sorted)/$nsamp), "\n";
    print "\tMedian value: $median\n";
    print "\t10% value: $sorted[$pct10]\n";
    print "\t90% value: $sorted[$pct90]\n";
    print "\n";
    my $range = sprintf ' < %d', $lower;
    printf "%20s: %6d\n", $range, $freq[0];
    for (my $i=1; $i<($nfreq-1); $i++) {
        $range = sprintf '%d - %d', $lower, ($lower + $binsize);
        printf "%20s: %6d\n", $range, $freq[$i];
        $lower = $lower + $binsize;
    }
    $range = sprintf ' >= %d', $lower;
    printf "%20s: %6d\n", $range, $freq[$nfreq-1];
    print "\n";
    if (sum(@freq) != $nsamp) {
        die "Error: Histogram not calculated properly. " .
            "Number of samples ($nsamp) should be equal to " .
            "sum of frequencies (". sum(@freq) . ").\n";
    }
}

sub parse_args {
    use Getopt::Std;
    use vars qw($opt_h $opt_l $opt_u);
    unless (getopts('hl:u:')) {
        print_usage();
        die "Error: Unsupported command option.";
    }
    if ($opt_h) {
        print_usage();
        exit 0;    # Printing help on request is not an error
    }
    # The "defined" check is necessary because 0 is a valid limit,
    # but "0" is false in Perl, so a plain truth test would ignore it.
    if (defined $opt_l) {
        $user_lower_lim = $opt_l;
        unless (check_if_numeric($user_lower_lim)) {
            print_usage();
            die "Error: Lower limit must not contain non-numerics.";
        }
    }
    if (defined $opt_u) {
        $user_upper_lim = $opt_u;
        unless (check_if_numeric($user_upper_lim)) {
            print_usage();
            die "Error: Upper limit must not contain non-numerics.";
        }
    }
}

sub print_usage {
    warn <<"EOF";

USAGE
    $0 [-h] [-u upper_lim] [-l lower_lim] [file ...]

DESCRIPTION
    Create a histogram from a column of numeric data values.
    The histogram is printed to STDOUT. The input data must be formatted as
    a single column of numeric data. By default, the histogram is auto-scaled
    based on the minimum and maximum values of the input data. The histogram
    can be rescaled by the user.

OPTIONS
    -h              Print this help message
    -u upper_lim    User-defined upper limit
    -l lower_lim    User-defined lower limit (lower-case letter L)

OPERANDS
    file            A path name of a file containing numerical data.
                    If no file operands are specified, the standard input
                    will be used.

EXAMPLES
    $0 data.txt
    awk '{print \$4}' file.txt | $0
    $0 -l 0.0 -u 300 data.txt
    $0 -h

EXIT STATUS
    0               Successful completion
    >0              An error occurred

NOTES
    This program performs some rounding off to integer values
    to simplify printout.

EOF
}
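The until loop in create_histogram relies on the data being sorted: it closes bins one after another until the current value fits. As a cross-check of which bin a given value lands in, here is a sketch that computes the bin index arithmetically instead. bin_index is a hypothetical helper, not part of hist, and it assumes integer limits and a positive bin size, which is what read_input produces. It mirrors the loop's rule that finite bins stop at the first edge lower + k * binsize that reaches the upper limit.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: the index into @freq that create_histogram would
# use for $v. Index 0 is the "< lower" bin; index $k + 1 is the
# open-ended top bin.
sub bin_index {
    my ($v, $lower, $upper, $binsize) = @_;
    return 0 if $v < $lower;
    # Number of finite bins: edges advance until one is >= $upper.
    my $k   = ceil(($upper - $lower) / $binsize);
    my $top = $lower + $k * $binsize;
    return $k + 1 if $v >= $top;
    return 1 + floor(($v - $lower) / $binsize);
}

# Tally a few values with the sample run's limits (0, 98, bin size 9).
my @freq;
$freq[ bin_index($_, 0, 98, 9) ]++ for (0, 8, 9, 90, 98);
print join(' ', map { $_ // 0 } @freq), "\n";
```

With these limits, 90 and 98 both land in index 11, the row printed as ">= 90" in the sample output. The direct formula does not need sorted input, but hist sorts anyway for the median and percentiles, so a single walk over the sorted list is a natural fit there.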
If I were to rewrite this in 2026, I would use a different style in a number of places. I just thought it would be fun to revisit some code from a different era. Feel free to offer any kind of feedback.