4
\$\begingroup\$

I had written a batch processor in Java a long time ago.

The code basically accepts Callables and executes them.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class BatchProcessor {
  private List<Callable<?>> tasks;
  private final ExecutorService executors;

  private final static Logger logger = LoggerFactory.getLogger(BatchProcessor.class);

  public BatchProcessor() {
    executors = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    tasks = new ArrayList<>();
  }

  public <T> void addTask(Callable<T> task) {
    if (task == null) return;

    tasks.add(task);
  }

  public List<Object> run() {
    if (tasks.isEmpty()) return null;

    List<Object> results = new ArrayList<>();
    List<Future<?>> futures = new ArrayList<>();

    for (Callable<?> task: tasks) {
      Future<?> f = executors.submit(task);
      futures.add(f);
    }

    int done = 0;
    logger.info(String.format("run - Progress %d / %d completed", done, tasks.size()));

    try {
      for (int i = 0; i < tasks.size(); i++) {
        results.add(futures.get(i).get());
         logger.info(String.format("run - Progress %d / %d completed", ++done, tasks.size()));
      }

      executors.shutdown();
      if (!executors.awaitTermination(10, TimeUnit.MINUTES)) {
        logger.error("run - timed out");
      }

    } catch(ExecutionException ee) {
      logger.info("run - ExecutionException " + ee.toString());
    } catch(InterruptedException ee) {
      logger.info("run - InterruptedException");
    }

    return results;
  }

  public static void main(String[] args) {
    BatchProcessor bp = new BatchProcessor();
    bp.addTask(new Callable<String>() {
      @Override
      public String call() {
        return "hello world";
      }
    });
    bp.addTask(new Callable<Integer>() {
      @Override
      public Integer call() {
        return 29;
      }
    });
    List<Object> lst = bp.run();
    for (Object o: lst) {
      if (o instanceof Integer) {
        System.out.println((int) o);
      }
      if (o instanceof String) {
        System.out.println((String) o);
      }
    }
  }
}

Are there any improvements that I can make in this code?

\$\endgroup\$
6
  • \$\begingroup\$ Instead of creating your own ExecutorService, check this stackoverflow.com/a/52277821/1047418 \$\endgroup\$ Commented Feb 25 at 13:41
  • 1
    \$\begingroup\$ What's the purpose of this? What kind of tasks are you planning to execute? \$\endgroup\$ Commented Feb 25 at 15:48
  • 2
    \$\begingroup\$ Your ExecutorService already has an invokeAll method Your logging is pointless as it only logs the progress of asking for the results which says nothing about the actual completion which can be far ahead of that. There’s also no need to call awaitTermination when there are knowingly no jobs left. And Java has lambda expressions for more than a decade now; there is no need to write inner classes to implement trivial Callables. \$\endgroup\$ Commented Feb 26 at 9:29
  • 2
    \$\begingroup\$ And if (tasks.isEmpty()) return null; is a huge code smell, turning a valid input into a null result. If an empty list goes in, an empty list should go out, if (tasks.isEmpty()) return Collections.emptyList(); \$\endgroup\$ Commented Feb 26 at 9:36
  • 1
    \$\begingroup\$ @Holger, Thanks :-) please post it as an answer. \$\endgroup\$ Commented Feb 26 at 11:32

4 Answers 4

4
\$\begingroup\$

Layout

The code uses 2 spaces per indent level, but 4 spaces are recommended for better readability. Also, there is inconsistent indentation in places.

Efficiency

The 2 if statements in the main function are mutually exclusive and can be combined into an if/else:

if (o instanceof Integer) {
    System.out.println((int) o);
} else if (o instanceof String) {
    System.out.println((String) o);
}

This makes code intent clearer, and it improves efficiency since the 2nd check is not needed if the first is true.

Comments

You should add a comment to the top of the code to summarize its purpose (what you are batch processing, how you are doing it, etc.).

\$\endgroup\$
2
  • 4
    \$\begingroup\$ The if-statements aren't really needed at all, can be replaced with a simple System.out.println(o); as there's a function to simply print an object. If only ints and strings should be printed, it can be a single if with an or condition. \$\endgroup\$ Commented Feb 25 at 15:46
  • \$\begingroup\$ Makes sense, thanks :-) \$\endgroup\$ Commented Feb 26 at 11:32
3
\$\begingroup\$

main() never checks whether the tasks all completed successfully. If any failed, we want to know about that in the program's exit status, as that likely affects the user's next actions (e.g. when run from Make, it's important not to continue making targets that depend on a failed dependency).

\$\endgroup\$
1
  • \$\begingroup\$ The main function is just a sample on how one can use the class. In real use case, one can return a value of type, for example, ImmutablePair<TransformedData, Errors>, something like that. \$\endgroup\$ Commented Feb 26 at 11:35
2
\$\begingroup\$

Here's some non-trivial suggestions. (The basic code style could be better in many ways, but that's a triviality as it is so easy to fix with modern tooling.)

Better Logging

The logging of the ExecutionException loses where that exception occurred, which makes debugging buggy tasks unnecessarily difficult. Where it is impractical to pitch the cause of the ExecutionException into a dedicated handler, at least pass the exception object to the logger, perhaps like this:

    // ...
    } catch (ExecutionException ee) {
      logger.info("task failed", ee.getCause());
    }
    // ...

(The wrapping ExecutionException itself isn't very interesting; you probably don't need to log that.)

I also wouldn't log the name of the method doing the logging. The logger does that for you if you configure it to, and it's otherwise pure noise.

Return Type

In practice, I'd suggest making the BatchProcessor generic on the result type of the Callables so that the result of run() can be a List<T> instead of a bare List<Object>. It wouldn't help in your example case as the common supertype of Integer and String is Object anyway, but it does help a lot more in more practical usage.

Support DI-Provided Executors

Add another constructor that lets the creator of the BatchProcessor specify what Executor to use. That makes a lot of sense if you're ever in an environment like Spring Boot...

And the number of threads to use isn't necessarily the number of CPU cores available. Those sorts of things can require complex tuning. But it's OK for a "the user didn't say anything else to do" case; a bit wrong, but probably not catastrophically so.

\$\endgroup\$
0
1
\$\begingroup\$

The early return in such a short, simple method is probably unnecessary:

  public <T> void addTask(Callable<T> task) {
    if (task == null) return;

    tasks.add(task);
  }

Instead:

Braces added for consistency, but unnecessary.

  public <T> void addTask(Callable<T> task) {
    if (task != null) {
      tasks.add(task);
    }
  }

It makes a lot more sense in run where it saves us from having to indent 29 lines of code another level.

\$\endgroup\$
1
  • 4
    \$\begingroup\$ This is a matter of preference and personally I'd prefer to have an early return instead of an extra indentation level. \$\endgroup\$ Commented Feb 25 at 15:45

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.