I have just written a small application to aggregate the lines of a text file, for example to group the lines according to the frequency of IP addresses in a log file.
Would the code be sufficiently commented (and self-explanatory) or is something still missing?
I would appreciate any comments from you.
Example call
javac Main.java
and java Main input.txt ".*? (\d+) .*"
with an input.txt file:
g 23 foo
a 234 bar
b 234 baz
c 123 qux
d 32 quux
e 234 corge
f 32 grault
will print
0003 - 234 - a 234 bar
0002 - 32 - d 32 quux
0001 - 23 - g 23 foo
0001 - 123 - c 123 qux
Code
import java.io.BufferedReader;
import java.io.FileReader;
import java.nio.charset.Charset;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
/**
* Main method to aggregate and count the occurrences of a regex pattern in a text file.
*
* @param args input text file, aggregate regex, and optional ignore regex.
* @throws Exception if the input file is not found or cannot be read.
*/
public static void main(String[] args) throws Exception {
if (args.length < 2) {
System.out.println(
"Usage: java -jar <jar file> <input text file> <aggregate regex> (<ignore regex>)");
System.out.println(
"Example: java -jar <jar file> \"input.txt\" \".*? (\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}) .*\"");
System.out.println(" to aggregate IP addresses from the input file.");
throw new IllegalArgumentException("Invalid number of arguments");
}
Pattern aggregatePattern = Pattern.compile(args[1]);
Pattern ignorePattern = args.length > 2 ? Pattern.compile(args[2]) : Pattern.compile("(?!x)x");
LinkedHashMap<String, Map.Entry<ArrayList<String>, Integer>> map = new LinkedHashMap<>();
try (BufferedReader reader =
new BufferedReader(new FileReader(args[0], Charset.defaultCharset()))) {
String line;
while ((line = reader.readLine()) != null) {
Matcher aggregateMatcher = aggregatePattern.matcher(line);
if (aggregateMatcher.find() && !ignorePattern.matcher(line).find()) {
String key = aggregateMatcher.group(1);
map.computeIfAbsent(key, k -> new AbstractMap.SimpleEntry<>(new ArrayList<>(), 0));
Map.Entry<ArrayList<String>, Integer> entry = map.get(key);
entry.getKey().add(line);
entry.setValue(entry.getValue() + 1);
}
}
}
ArrayList<Map.Entry<String, Map.Entry<ArrayList<String>, Integer>>> list =
new ArrayList<>(map.entrySet());
// Sort by count in descending order, if counts are equal, sort by input order (earlier first).
list.sort((o1, o2) -> o2.getValue().getValue().compareTo(o1.getValue().getValue()));
for (Map.Entry<String, Map.Entry<ArrayList<String>, Integer>> entry : list) {
System.out.printf(
"%04d - %s - %s%n",
entry.getValue().getValue(), entry.getKey(), entry.getValue().getKey().get(0));
}
}
}
main()
. It would be helpful to crack argv and then call some appropriately named helper function. Perhaps one that has a/** javadoc */
comment. What I'm looking for is, "what is the contract?", and there's room for the OP code to do a better job of explaining that. There's no Answer posted, yet, so there's still time to edit and improve the Question. \$\endgroup\$