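#!/bin/awk -f
# Load every line of the file named by compareTo into orderIds.
BEGIN {
    # Parenthesize getline so that "> 0" tests its return value.
    while ((getline var < compareTo) > 0)
    {
        orderIds[var]=var;
    }
}
# Print each input line that was not found in compareTo.
{
    if (orderIds[$0] == "")
    {
        print $0;
    }
}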
Running it as:
awk -v compareTo="ids.log.remote" -f sample.awk ids.log.local
This works, but instead of using an associative array (like a HashMap), is there anything like a HashSet in awk?
I got the following timings:
bash-3.2$ time grep -xFvf ids.log.local ids.log.remote > /dev/null
real 0m0.130s
user 0m0.127s
sys 0m0.002s
bash-3.2$ time grep -xFvf ids.log.local ids.log.remote > /dev/null
real 0m0.126s
user 0m0.125s
sys 0m0.000s
bash-3.2$ time grep -xFvf ids.log.local ids.log.remote > /dev/null
real 0m0.131s
user 0m0.128s
sys 0m0.002s
bash-3.2$ time awk 'NR == FNR {
orderIds[$0]; next
}
!($0 in orderIds)
' ids.log.local ids.log.remote > /dev/null
real 0m0.053s
user 0m0.051s
sys 0m0.003s
bash-3.2$ time awk 'NR == FNR {
orderIds[$0]; next
}
!($0 in orderIds)
' ids.log.local ids.log.remote > /dev/null
real 0m0.052s
user 0m0.051s
sys 0m0.001s
bash-3.2$ time awk 'NR == FNR {
orderIds[$0]; next
}
!($0 in orderIds)
' ids.log.local ids.log.remote > /dev/null
real 0m0.053s
user 0m0.051s
sys 0m0.002s
bash-3.2$ time awk -v compareTo="ids.log.local" -f checkids.awk ids.log.remote > /dev/null
real 0m0.066s
user 0m0.060s
sys 0m0.006s
bash-3.2$ time awk -v compareTo="ids.log.local" -f checkids.awk ids.log.remote > /dev/null
real 0m0.065s
user 0m0.058s
sys 0m0.008s
bash-3.2$ time awk -v compareTo="ids.log.local" -f checkids.awk ids.log.remote > /dev/null
real 0m0.061s
user 0m0.053s
sys 0m0.007s
@Dimitre Radoulov Looks like your awk is faster. Thanks.
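For anyone reading the timings, here is a minimal sketch of how the faster two-file one-liner can be read. The sample IDs and the temporary directory are invented for illustration; only the awk program itself comes from the transcript above.

```shell
#!/bin/sh
# Invented sample data; the real ids.log.* files are not shown in the thread.
tmpdir=$(mktemp -d)
printf '%s\n' 101 102 103 > "$tmpdir/ids.log.local"
printf '%s\n' 102 103 104 105 > "$tmpdir/ids.log.remote"

only_remote=$(awk '
NR == FNR {          # true only while reading the first file
    orderIds[$0]     # a bare reference creates the key; no value is assigned
    next             # skip the second rule for lines of the first file
}
!($0 in orderIds)    # second file: print lines whose key is absent
' "$tmpdir/ids.log.local" "$tmpdir/ids.log.remote")

printf '%s\n' "$only_remote"
rm -rf "$tmpdir"
```

Because only keys are created and membership is tested with the in operator, the array behaves much like a set here, and the in test does not add new elements while checking.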
orderIds[var]=var is storing var twice: once as the index, and then again as the value held by that array element. This is not necessary, and it inhibits the detection of blank lines in the main section, i.e. when var == "". In the BEGIN section, you can set orderIds[var]=1. The 1 is just a flag to indicate that this particular index (var) has been encountered in "ids.log.remote".

var here has a string value. But that string isn't going to be freed anyway, since you're using it as the array index, so you're only using up the space of an active pointer. That is probably the same size as an integer value, but it's better hygiene not to think of storing a pointer as "storing an int". I also find it more natural to use orderIds[var]=1 here; that's a familiar pattern for implementing sets in terms of associative arrays, one that others will more immediately recognize when reading your code.
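The blank-line point can be checked with a small sketch. The file contents below are invented, with one blank line in each file; the two awk programs differ only in the value stored in the BEGIN section.

```shell
#!/bin/sh
# Invented sample data; both files contain one blank line.
tmpdir=$(mktemp -d)
printf '%s\n' 101 102 '' 103 > "$tmpdir/ids.log.remote"
printf '%s\n' 101 '' 104 105 > "$tmpdir/ids.log.local"

# Value is the line itself: a stored blank line looks like a missing entry,
# so the blank line is printed even though both files contain it.
with_value=$(awk -v compareTo="$tmpdir/ids.log.remote" '
BEGIN { while ((getline var < compareTo) > 0) orderIds[var]=var }
orderIds[$0] == ""
' "$tmpdir/ids.log.local")

# Value is the flag 1: every stored entry is non-empty, so the same test works.
with_flag=$(awk -v compareTo="$tmpdir/ids.log.remote" '
BEGIN { while ((getline var < compareTo) > 0) orderIds[var]=1 }
orderIds[$0] == ""
' "$tmpdir/ids.log.local")

printf 'with value: [%s]\n' "$with_value"
printf 'with flag:  [%s]\n' "$with_flag"
rm -rf "$tmpdir"
```

With this data the first variant should report the blank line as missing while the second should not. Note that testing orderIds[$0] in the main rule creates an element for every line looked up; testing with $0 in orderIds avoids that side effect.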