
Parallelization of loo using mirai and mori #378

Draft: florence-bockting wants to merge 6 commits into loo-v3.0.0 from parallelization

@florence-bockting (Contributor)

Summary

Fixes #308

  • Replaces parallel::mclapply() / parLapply() with mirai + mori for per-observation parallelism (cross-platform, including Windows).
  • Adds three parallelism modes:
    • per-call cores (a temporary pool created and torn down inside each call),
    • persistent session pool (loo.daemons option / LOO_DAEMONS environment variable), and
    • user-managed mirai::daemons() (remote/SSH/HPC).
  • Parallel output matches serial; only scheduling changes.
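A minimal usage sketch of the three modes, assuming loo is installed from this branch. The option name loo.daemons comes from this PR, but treating its value as the number of local daemons is an assumption; the toy normal model, llfun, and data are illustrative only. The persistent-pool mode is run last so that the user-managed pool can be reset with mirai::daemons(0) without interfering with it.

```r
library(loo)

set.seed(123)
N <- 50    # observations (illustrative)
S <- 1000  # posterior draws (illustrative)
dat <- data.frame(y = rnorm(N, mean = 1))
draws <- cbind(mu = rnorm(S, 1, 0.1), sigma = exp(rnorm(S, 0, 0.05)))

# loo.function calls llfun(data_i, draws) once per observation; data_i is a
# one-row data frame and the result is a length-S vector of log-likelihoods
llfun <- function(data_i, draws) {
  stats::dnorm(data_i$y, mean = draws[, "mu"], sd = draws[, "sigma"], log = TRUE)
}
r_eff <- rep(1, N)  # independent toy draws, so relative efficiency is 1

# Serial reference
fit_serial <- loo(llfun, data = dat, draws = draws, r_eff = r_eff, cores = 1)

# Mode 1: per-call cores (pool spawned and torn down inside this call)
fit_cores <- loo(llfun, data = dat, draws = draws, r_eff = r_eff, cores = 2)

# Mode 3: user-managed pool; a connected pool wins, so cores = 8 is ignored
mirai::daemons(2)
fit_user <- loo(llfun, data = dat, draws = draws, r_eff = r_eff, cores = 8)
mirai::daemons(0)

# Mode 2: persistent session pool (ASSUMED: value = number of daemons;
# the PR says LOO_DAEMONS can be used as an alternative)
options(loo.daemons = 2)
fit_persist <- loo(llfun, data = dat, draws = draws, r_eff = r_eff)
```

Because only scheduling changes, all four fits should report the same estimates.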

What changed

Core (R/parallel.R): with_loo_daemons(), loo_map(), loo_pool_is_local(), loo_persist_config().

Parallelized functions: loo() (function method), psis()/sis()/tis(), relative_eff(), loo_subsample(), loo_moment_match(), loo_model_weights().

Pool precedence: if a pool is already connected (user-managed or persistent), it is always used and cores is ignored. Local pools share broadcast objects (e.g. draws) via mori zero-copy; remote pools serialize them.
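The precedence rule can be restated as a small decision function. This is a hypothetical sketch, not code from R/parallel.R: resolve_loo_backend() and its arguments are illustrative names, and the transport assumed for the per-call pool (local, hence zero-copy) is an inference from "local pools use mori".

```r
# Hypothetical helper, NOT part of loo: encodes the precedence rule above.
# pool_connected: is a mirai pool (user-managed or persistent) already up?
# pool_is_local:  do that pool's daemons run on this machine?
# cores:          the per-call cores argument
resolve_loo_backend <- function(pool_connected, pool_is_local = TRUE, cores = 1L) {
  if (isTRUE(pool_connected)) {
    # A connected pool always wins; cores is ignored entirely
    transport <- if (isTRUE(pool_is_local)) "mori zero-copy" else "serialize"
    return(list(mode = "existing pool", transport = transport))
  }
  if (cores > 1) {
    # Assumption: the temporary per-call pool is local, so broadcast
    # objects can take the zero-copy path
    return(list(mode = "per-call pool", transport = "mori zero-copy"))
  }
  list(mode = "serial", transport = "none")
}

# Usage: a remote user-managed pool overrides cores = 8 and serializes draws
resolve_loo_backend(pool_connected = TRUE, pool_is_local = FALSE, cores = 8)
```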

Also: mirai and mori added to DESCRIPTION; vignette vignettes/loo2-parallel.Rmd; tests in tests/testthat/test_parallel.R; benchmark/ scripts plus bench-comparison.md.

Review guide

  1. vignettes/loo2-parallel.Rmd: the user-facing model
  2. R/parallel.R: pool lifecycle and the loo_map() transport
  3. Example call site, loo.function in R/loo.R: with_loo_daemons() → loo_map(broadcast = list(draws = ...))
  4. tests/testthat/test_parallel.R: serial/parallel equivalence and pool precedence
  5. benchmark/README.md: a first, small baseline-vs-new comparison (first results in benchmark/bench-comparison.md)
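The broadcast pattern in item 3 (map over observations while every task sees the same draws) can be approximated with plain mirai. This sketch uses mirai::mirai_map() with .args for the constant objects, which corresponds to the serialized path used for remote pools; it does not show mori's zero-copy sharing, and ll_i() plus the toy data are illustrative, not loo internals.

```r
library(mirai)

set.seed(42)
y <- rnorm(20)
draws <- cbind(mu = rnorm(500), sigma = exp(rnorm(500, sd = 0.1)))

# Per-observation work: S log-likelihood values for observation i
ll_i <- function(i, y, draws) {
  stats::dnorm(y[i], mean = draws[, "mu"], sd = draws[, "sigma"], log = TRUE)
}

daemons(2)  # local pool of 2 background R processes
# y and draws are the same for every task, so they are passed via .args;
# [] waits for all tasks and collects the results as a list
res <- mirai_map(seq_along(y), ll_i, .args = list(y = y, draws = draws))[]
daemons(0)  # tear the pool down

ll_parallel <- do.call(cbind, res)  # S x N matrix, one column per observation
```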

Initial benchmarks (Linux, single machine): loo.function with large draws benefits most (~4× with a persistent pool); matrix psis() does not benefit (it is communication-bound); a per-call pool pays ~1 s of spawn/teardown overhead per call.

Follow-up work

  • Reviewing: please check the current implementation for correctness and usability. Comments and suggested improvements are welcome.
  • Benchmarking: compare master against this branch across problem sizes, all (or a selected subset of) parallelized functions, operating systems (Linux/macOS/Windows), and metrics (wall-clock time, allocations, peak RSS), e.g. by extending benchmark/.
  • LSAT case study: posteriorDB lsat-data; showcase speedup and all three parallelism modes (one-off, persistent pool, simulation loop).
  • Remote SSH: two-machine test; verify correctness, measure speedup, document setup (mirai::daemons(url = ..., remote = ssh_config(...))).
  • Documentation: expand vignettes/loo2-parallel.Rmd with assumptions, when-to-use guidance, function-specific notes, memory model.

Current limitations of the implementation

  • Matrix psis() rarely speeds up, because the large input matrix has to be shipped to each worker.
  • With per-call cores > 1 and no loo.daemons pool, small problems can run slower than serial, since pool spawn/teardown dominates.
  • Remote SSH is not yet tested in CI.
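To put "large data" in perspective, a back-of-envelope size check for a matrix of the shape the matrix psis() method works on (S draws × N observations). The dimensions are illustrative, not taken from the benchmarks.

```r
S <- 4000  # posterior draws (illustrative)
N <- 2000  # observations (illustrative)

# A double matrix costs 8 bytes per entry plus a small header
log_ratios <- matrix(0, nrow = S, ncol = N)
bytes <- as.numeric(object.size(log_ratios))
bytes / 2^20  # about 61 MiB that has to reach the workers
```

Pareto smoothing of one column is comparatively cheap, so moving this much data can outweigh the per-column work saved by parallelism.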

github-actions Bot commented Jul 1, 2026


This is how benchmark results would change (along with a 95% confidence interval in relative change) if 2bee14f is merged into master:

  • ❗🐌loo_function: 1.98s -> 2.09s [+4.63%, +6.14%]
  • 🚀loo_matrix: 1.9s -> 1.87s [-2.08%, -0.46%]
Further explanation regarding interpretation and methodology can be found in the documentation.
@florence-bockting florence-bockting mentioned this pull request Jul 1, 2026
@florence-bockting florence-bockting changed the base branch from master to loo-v3.0.0 July 1, 2026 09:33
