I have a file that contains the following data (only sample data is shown; the file will contain at most 2001 lines):

0001:3002:2018/07/16:12.34.31:ERR 
0002:3002:2018/07/16:12.34.44:ERR 
0003:3002:2018/07/16:12.34.57:ERR 
0004:3002:2018/07/16:12.35.10:ERR 
0005:3002:2018/07/16:12.35.23:ERR 
0006:3002:2018/07/16:12.35.36:ERR 
0007:3002:2018/07/16:12.35.49:ERR 
0008:3002:2018/07/16:12.36.02:ERR 
0009:3002:2018/07/16:12.36.15:ERR

I'll be passing a date, say 2018/07/16:12.36.15, to the bash script. I want to read each line from this file, compare the date in the line with the passed date, and return the lines whose date is greater than the passed date.

What I have done so far:

#!/bin/sh

SEARCH_DATE=$1
errorCodeFilePath=/home/.errorfile.log
lines=`cat $errorCodeFilePath`
for line in $lines; do
   errorCodeDate=$(echo $line |grep -Eo '[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2}:[[:digit:]]{2}.[[:digit:]]{2}.[[:digit:]]{2}');  
   if [ $errorCodeDate -ge $SEARCH_DATE ];
    then
        echo $errorCodeDate
    fi
done

Questions

  1. I'm not sure if the date comparison will work. I'm getting an "integer expression expected" error. I have no idea how to write Bash scripts, and this is my very first try.

  2. How can I make this date comparison work? Also, once the date comparison works, I need to get the digits between the first : and the second : for all the matching lines.

  • In order to use integer comparison with dates you would first need to convert them into something like seconds since epoch. Regardless, I encourage you to use something like Awk or Perl rather than shell scripting for this task - see for example Why is using a shell loop to process text considered bad practice? Commented Jul 16, 2018 at 14:24
  • Thanks for the valuable suggestion. As this is my first time (or first day) with bash scripting, I don't know what counts as good or bad practice, but I'll surely look into this. Thanks.
    – Sharon
    Commented Jul 17, 2018 at 5:04

5 Answers

Your script reads the whole file into a variable and then iterates over the value of that variable. This has three issues:

  1. In the most general case, one may not know the size of the input file, which means that under some circumstances, the variable may become very big.
  2. Looping over the unquoted value of the variable relies on the shell splitting the data on whitespace (spaces, tabs and newlines). If the data contains any whitespace apart from newlines, the loop will probably do the wrong thing.
  3. The shell will perform filename globbing on the values of the unquoted variable before looping over it. This means that if the data contains globbing patterns, such as * or [...], then these will be matched against existing filenames.
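
The second and third issues can be reproduced with a small sketch. The input line (with its extra "disk * full" text) and the two function names are made up for illustration; they are not part of the original data.

```shell
#!/bin/sh
# Hypothetical input: a single line containing spaces and a glob character
data='0001:3002:2018/07/16:12.34.31:ERR disk * full'

# for-loop over an unquoted expansion: split on whitespace, "*" is globbed
count_for_items() {
    n=0
    for item in $data; do
        n=$((n + 1))
    done
    echo "$n"
}

# while/read loop: the line arrives intact, exactly once
count_read_lines() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done <<EOF
$data
EOF
    echo "$n"
}

count_for_items     # 4 or more, depending on what "*" matches in the current directory
count_read_lines    # 1
```

The for loop sees one item per whitespace-separated word (plus one per file name matched by *), while the while loop sees the single line that was actually in the data.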

This answer uses the fact that the timestamps used are sane in the sense that later timestamps sort after earlier ones (at least in the POSIX locale).

#!/bin/bash

while IFS= read -r line; do
    timestamp=${line%:*}            # Remove ":ERR" at the end
    timestamp=${timestamp#*:*:}     # Remove numbers from start ("0001:3002:")
    if [[ "$timestamp" > "$1" ]]; then
        # According to the current locale, the timestamp in "$timestamp"
        # sorts after the timestamp in "$1".
        printf "Greater: %s\n" "$line"
    fi
done <file

This script takes a timestamp in the same format as the one used in the file as its only argument. It iterates over the contents of the file named file and, for each line, parses out the timestamp and compares it with the timestamp given on the command line. The comparison is made using the > operator in bash and will be true if the timestamp in the file sorts (lexicographically) after the given timestamp in the current locale. If the comparison is true, the line from the file is printed.
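
To see the string comparison in isolation, here is a minimal sketch. The function name is_after is hypothetical, and setting LC_ALL=C is an assumption added here so that the ordering is byte-wise regardless of the user's locale. The last call shows why the approach depends on the zero-padded, fixed-width format.

```shell
#!/bin/bash
# Force byte-wise ordering so the result does not depend on the user's locale
LC_ALL=C

# Succeeds if timestamp $1 sorts strictly after timestamp $2
is_after() {
    [[ "$1" > "$2" ]]
}

is_after '2018/07/16:12.36.02' '2018/07/16:12.36.00' && echo 'later'
is_after '2018/07/16:12.35.49' '2018/07/16:12.36.00' || echo 'not later'

# Without zero padding the ordering breaks: July 16 sorts after July 17 here
is_after '2018/7/16:12.36.02' '2018/07/17:12.36.02' && echo 'wrong order'
```

All three echo commands run: the first two comparisons behave chronologically, the third does not because "7" sorts after "0".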

The two separate substitutions to parse out the timestamp from the line by deleting parts of the end and beginning of the line could be replaced by

timestamp=$( cut -d ':' -f 3,4 <<<"$line" )

but this would run slower as it's calling an external utility.

Testing:

$ bash script.sh '2018/07/16:12.36.00'
Greater: 0008:3002:2018/07/16:12.36.02:ERR
Greater: 0009:3002:2018/07/16:12.36.15:ERR

If you want to output just the timestamp from the file rather than the original line, change "$line" to "$timestamp" in the printf command.

In that case, you may also speed up things by doing the looping like this:

#!/bin/bash

cut -d ':' -f 3,4 file |
while IFS= read -r timestamp; do
    if [[ "$timestamp" > "$1" ]]; then
        # According to the current locale, the timestamp in "$timestamp"
        # sorts after the timestamp in "$1".
        printf "Greater: %s\n" "$timestamp"
    fi
done

Here, we use cut to get the 3rd and 4th :-delimited columns from the file (the timestamp), which means we don't have to do any parsing of the original lines.
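
The question also asks for the digits between the first and second : on every matching line. A possible sketch, reusing the same parameter expansions, is shown here. The function name codes_after is made up, and it reads the data on standard input rather than from a fixed file.

```shell
#!/bin/bash
# Print the second ":"-delimited field of every line whose timestamp
# sorts after the cutoff given as $1. Input is read from stdin.
codes_after() {
    local cutoff=$1 line timestamp code
    while IFS= read -r line; do
        timestamp=${line%:*}            # drop the trailing ":ERR"
        timestamp=${timestamp#*:*:}     # drop the two leading fields
        if [[ "$timestamp" > "$cutoff" ]]; then
            code=${line#*:}             # drop the first field
            code=${code%%:*}            # keep everything before the next ":"
            printf '%s\n' "$code"
        fi
    done
}

# Sample lines taken from the question; prints 3002 twice
codes_after '2018/07/16:12.36.00' <<'EOF'
0007:3002:2018/07/16:12.35.49:ERR
0008:3002:2018/07/16:12.36.02:ERR
0009:3002:2018/07/16:12.36.15:ERR
EOF
```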

  • When I pass the current datetime, it still returns 'Greater' even if all the records in the file have dates earlier than the current date.
    – Sharon
    Commented Jul 17, 2018 at 12:37
  • @Sharon Are you using the same timestamp format on the command line as you have in the file? In the question you mention 2018/07/16:12.36.15. I have tested the above with your data and with 2018/07/16:12.36.00 and it seems to work.
    – Kusalananda
    Commented Jul 17, 2018 at 12:41
  • Extremely sorry. My parameter-passing mechanism converts the params to base 64. Fixed, and it is working now. Thanks.
    – Sharon
    Commented Jul 17, 2018 at 12:52
  • @Sharon I saw that you now only want to get the first number of the line. I'll modify my answer as soon as I get to a computer.
    – Kusalananda
    Commented Jul 17, 2018 at 13:04
  • I just edited my question. Actually, I need the digits between the first : and the second :, and I extract them using errorcode=${line#*:}; errorcode=${errorcode%:*:*:*}
    – Sharon
    Commented Jul 17, 2018 at 13:08

Your idea is right but you could fix a few things to make the script work as expected.

  1. Firstly, using cat on a file, storing the result in a variable and looping over it is an anti-pattern at best. The approach breaks strings at whitespace. Use file redirection with a while loop instead.
  2. Always quote shell variables to preserve their content and to prevent them from undergoing word-splitting, as mentioned in the previous point.
  3. Instead of grep, use the native regex support of bash to extract the date string for the epoch conversion.
  4. bash has no native date type, so for a numeric comparison you need to convert the timestamps into their equivalent epoch values and do an integer comparison.

Putting this together, the script below uses only bash built-ins plus the date command. It needs date from GNU coreutils for its -d flag and may not work with the native date on *BSD machines.

#!/usr/bin/env bash

errorCodeFilePath="/home/.errorfile.log"

re='[0-9]+/[0-9]+/[0-9]+:[0-9]+\.[0-9]+\.[0-9]+'

# Convert "YYYY/MM/DD:HH.MM.SS" to seconds since the epoch (GNU date)
convDateString() {
    local day="${1%%:*}"     # "YYYY/MM/DD"
    local time="${1##*:}"    # "HH.MM.SS"
    date -d "$day ${time//./:}" +%s
}

# Convert the search date once, outside the loop
inputEPOCH="$(convDateString "$1")"

while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        lineEPOCH="$(convDateString "${BASH_REMATCH[0]}")"
        if [ "$lineEPOCH" -gt "$inputEPOCH" ]; then
            echo "${BASH_REMATCH[0]} is greater"
        fi
    fi
done <"$errorCodeFilePath"

Testing the script on the sample input from the question:

$ bash script.sh "2018/07/16:12.36.00"
2018/07/16:12.36.02 is greater
2018/07/16:12.36.15 is greater

With all that said, you should consider reading Why is using a shell loop to process text considered bad practice?, because text processing with the shell is slow compared to tools that are designed for file processing.
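
If starting one date process per line is too slow, an integer comparison is still possible without date. This is a different technique from the epoch conversion above, sketched under the assumption that every timestamp is zero-padded and fixed-width (YYYY/MM/DD:HH.MM.SS): stripping the non-digits then leaves a number such as 20180716123602 that orders the same way as the time. The function name newer_than is hypothetical, and it reads the data on standard input.

```shell
#!/bin/bash
# Fixed-width timestamp pattern (assumption: all fields are zero-padded)
re='[0-9]{4}/[0-9]{2}/[0-9]{2}:[0-9]{2}\.[0-9]{2}\.[0-9]{2}'

# Print the timestamps on stdin that are later than the cutoff in $1
newer_than() {
    local cutoff=${1//[^0-9]/} line stamp    # "2018/07/16:12.36.00" -> 20180716123600
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue
        stamp=${BASH_REMATCH[0]//[^0-9]/}    # digits only, no external command
        if [ "$stamp" -gt "$cutoff" ]; then
            printf '%s is greater\n' "${BASH_REMATCH[0]}"
        fi
    done
}

# Sample lines taken from the question
newer_than '2018/07/16:12.36.00' <<'EOF'
0007:3002:2018/07/16:12.35.49:ERR
0008:3002:2018/07/16:12.36.02:ERR
0009:3002:2018/07/16:12.36.15:ERR
EOF
```

Unlike the epoch conversion, this does not validate the date or handle time zones; it only preserves ordering for well-formed input.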

  • Each loop iteration took close to half a second to print the match, so with 2000 matching lines it takes a lot of time. Any optimization suggestions?
    – Sharon
    Commented Jul 17, 2018 at 12:31

Try this,

#!/bin/sh

SEARCH_DATE="$1"
errorCodeFilePath=/home/nagios/temp/test1
lines=`cat $errorCodeFilePath`
for line in $lines; do
   errorCodeDate=$(echo $line |grep -Eo '[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2}:[[:digit:]]{2}.[[:digit:]]{2}.[[:digit:]]{2}');
if [ $(date -d "`echo $errorCodeDate| tr ':' ' '| tr '.' ':'`" +%s) -ge $(date -d "`echo $SEARCH_DATE| tr ':' ' '| tr '.' ':'`" +%s) ];
    then
        echo $errorCodeDate
    fi
done
  • Why do you store the whole file in a variable (you don't know how big it is)? You are also using unquoted expansions throughout.
    – Kusalananda
    Commented Jul 16, 2018 at 16:39

Your date formats happen to sort lexically the same as chronologically (at least in the C locale), so it's just a matter of doing string comparisons here:

#! /bin/sh -
# Usage: that-script <YYYY/MM/DD:HH.MM.SS> [<file> [<file>...]]

search_date=${1?Please specify the cut off date}
shift

LC_ALL=C exec awk -v min="$search_date" \
                  -v date_field=3 \
                  -v time_field=4 \
                  -F : -- '$date_field FS $time_field > min' "$@"

Here the input can be given either on stdin or as file arguments, but beware that awk treats arguments in the var=value format as variable assignments and - as meaning stdin, so if you have files named like that, you'll want to prefix the file names with ./.
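
The var=value pitfall can be checked with a short sketch. The function name count_records and the scratch directory are made up for this demonstration; the file name my=file.txt matches the one used in the examples here.

```shell
#!/bin/sh
# Report how many records awk reads when given "$1" as its only operand.
# A scratch directory containing a one-line file "my=file.txt" is created
# for the duration of the call.
count_records() {
    dir=$(mktemp -d) || return
    printf '%s\n' '0009:3002:2018/07/16:12.36.15:ERR' >"$dir/my=file.txt"
    # stdin is /dev/null so awk cannot block if it ends up reading stdin
    ( cd "$dir" && awk 'END { print NR }' "$1" </dev/null )
    rm -r "$dir"
}

count_records 'my=file.txt'     # 0: taken as the assignment my="file.txt"
count_records './my=file.txt'   # 1: taken as a file name
```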

For instance, call that script as

that-script 2018/07/16:12.36.15 ./my=file.txt

or

that-script 2018/07/16:12.36.15 < my=file.txt

On the other hand, that means you can also call it as:

that-script 2018/07/16:12.36.15 date_field=5 time_field=6 file.txt

to process input where the date and time are in fields other than the 3rd and 4th.

In any case, you don't want to use a shell loop to process text.
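
For the second part of the question (the digits between the first and second :), the same awk comparison can print only field 2. This is a sketch rather than part of the script above; the function name error_codes_after is hypothetical, and the field numbers are hard-coded to the layout shown in the question.

```shell
#!/bin/sh
# Print field 2 of every line on stdin whose "date:time" (fields 3 and 4)
# sorts after the cutoff given as $1
error_codes_after() {
    LC_ALL=C awk -F : -v min="$1" '$3 FS $4 > min { print $2 }'
}

# Sample lines taken from the question; prints 3002 twice
error_codes_after '2018/07/16:12.36.00' <<'EOF'
0007:3002:2018/07/16:12.35.49:ERR
0008:3002:2018/07/16:12.36.02:ERR
0009:3002:2018/07/16:12.36.15:ERR
EOF
```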

  • No space in the shebang: 'bash: that-script: command not found'
    – user380915
    Commented Nov 9, 2019 at 4:15

If you want to iterate over lines with for, you need to set IFS to newline. This will be slightly faster than a while loop.

#!/bin/bash

IFS=$'\n'
for a in $(<file.txt); do
    [[ $1:ERR < ${a#*:*:} ]] && echo "$a"
done

$ ./script.sh 2018/07/16:12.35.10

(awk version)

#!/usr/bin/awk -bf

BEGIN { FS=OFS=":" } {
    if (d < $3 FS $4) { print $0 }
}

$ ./script.awk -vd=2018/07/16:12.35.10 file.txt

If you already have a date that you know exists in the file and just want to print the lines after it, you can sort the file by date and time and use grep -A to get the context after the matched line. tail -n +2 starts the output on line two, effectively removing the matched line from the output.

$ grep < <(sort -t : -k 3,4 < file.txt) \
    -A2000 -Fe '2018/07/16:12.35.10' | tail -n +2 | sort -n
