Scriptum
Full-text search with branching
Git-like branching for Apache Lucene. Fork a 100GB index by sharing immutable segment files. Time-travel queries, branch isolation, and safe experimentation on your search indices.
Why Scriptum
- Zero-cost forking - Branch any index in a few ms regardless of size. Copies metadata, not data.
- Structural sharing - Branches share immutable Lucene segments via copy-on-write overlay directories.
- Time travel - Open readers at any historical commit point. Query past index states.
- Full Lucene 10.x - Text search, KNN vectors, facets, highlighting - all branch-aware.
- Apache-2.0 - Open source, permissive license.
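To make "copies metadata, not data" concrete, here is a minimal toy model in plain Java. It is not Scriptum's implementation: `ToyBranch` and its methods are hypothetical names, and a "segment" is just a file name. The sketch only shows why a fork can be cheap when segments are immutable (the fork copies a list of names) and how retained commit points allow time travel.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: a commit point is an immutable list of segment file names. */
final class ToyBranch {
    private final String name;
    private final List<List<String>> commits = new ArrayList<>(); // retained commit points
    private final List<String> pending = new ArrayList<>();       // segments not yet committed

    ToyBranch(String name) { this.name = name; }

    String name() { return name; }

    /** Stands in for flushing a new immutable segment file. */
    void addSegment(String segmentFile) { pending.add(segmentFile); }

    /** Snapshot head plus pending segments as a new commit point; returns its index. */
    int commit() {
        List<String> next = new ArrayList<>(head());
        next.addAll(pending);
        pending.clear();
        commits.add(List.copyOf(next));
        return commits.size() - 1;
    }

    /** Fork: only the head's list of segment names is shared, no segment data is copied. */
    ToyBranch fork(String newName) {
        ToyBranch child = new ToyBranch(newName);
        if (!commits.isEmpty()) child.commits.add(head());
        return child;
    }

    /** Time travel: the segment list as of an earlier commit point. */
    List<String> segmentsAt(int commit) { return commits.get(commit); }

    /** Latest commit point, or an empty list if nothing was committed yet. */
    List<String> head() {
        return commits.isEmpty() ? List.of() : commits.get(commits.size() - 1);
    }
}

final class ToyBranchDemo {
    public static void main(String[] args) {
        ToyBranch main = new ToyBranch("main");
        main.addSegment("_0.cfs");
        int first = main.commit();

        ToyBranch experiment = main.fork("experiment");
        experiment.addSegment("_1.cfs");
        experiment.commit();

        main.addSegment("_2.cfs");
        main.commit();

        System.out.println(main.head());            // [_0.cfs, _2.cfs]
        System.out.println(experiment.head());      // [_0.cfs, _1.cfs]
        System.out.println(main.segmentsAt(first)); // [_0.cfs]
    }
}
```

Both branches keep referring to `_0.cfs` without duplicating it, which is the structural-sharing idea in miniature.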
How it works
Scriptum extends Lucene with four components that enable copy-on-write branching:
- BranchedDirectory - Overlay pattern: reads fall back to base, writes go to branch overlay.
- BranchDeletionPolicy - Retains all commit points until explicit garbage collection.
- BranchAwareMergePolicy - Prevents merging shared segments that would break other branches.
- BranchIndexWriter - Main API for create, fork, commit, merge, and GC operations.
See LUCENE_EXTENSION.md for the full technical deep-dive.
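The overlay pattern described for BranchedDirectory can be sketched without Lucene at all. The class below is a hypothetical, in-memory stand-in (`ToyOverlayDirectory` is not part of Scriptum, and files are byte arrays in a map): writes land in a branch-private overlay, reads fall back to the shared base, and `isShared` shows the kind of check a branch-aware merge policy might need before touching a segment. How Scriptum actually implements these checks is documented in LUCENE_EXTENSION.md, not here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy overlay: illustrates read-fallback only, not Scriptum's BranchedDirectory. */
final class ToyOverlayDirectory {
    private final Map<String, byte[]> base;                      // shared, treated as read-only
    private final Map<String, byte[]> overlay = new HashMap<>(); // branch-private writes

    ToyOverlayDirectory(Map<String, byte[]> base) { this.base = base; }

    /** Writes never touch the base; they go to the overlay. */
    void write(String file, byte[] data) { overlay.put(file, data.clone()); }

    /** Reads prefer the overlay and fall back to the base. */
    byte[] read(String file) {
        byte[] data = overlay.get(file);
        if (data == null) data = base.get(file);
        if (data == null) throw new IllegalArgumentException("no such file: " + file);
        return data.clone();
    }

    /** Union of base and overlay file names, sorted. */
    Set<String> listAll() {
        Set<String> names = new TreeSet<>(base.keySet());
        names.addAll(overlay.keySet());
        return names;
    }

    /** True if the file lives only in the shared base, so other branches may depend on it. */
    boolean isShared(String file) {
        return base.containsKey(file) && !overlay.containsKey(file);
    }
}

final class ToyOverlayDemo {
    public static void main(String[] args) {
        Map<String, byte[]> base = new HashMap<>();
        base.put("_0.cfs", new byte[] {1, 2, 3});

        ToyOverlayDirectory branch = new ToyOverlayDirectory(base);
        branch.write("_1.cfs", new byte[] {4});

        System.out.println(branch.listAll());          // [_0.cfs, _1.cfs]
        System.out.println(base.keySet());             // [_0.cfs]
        System.out.println(branch.isShared("_0.cfs")); // true
        System.out.println(branch.isShared("_1.cfs")); // false
    }
}
```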
Clojure API
(require '[scriptum.core :as sc])
;; Create an index
(def writer (sc/create-index "/tmp/my-index"))
;; Add documents
(sc/add-doc writer {:title {:type :text :value "Hello World"}
                    :id {:type :string :value "doc-1"}})
(sc/commit! writer "Initial commit")
;; Fork a branch (3-5ms regardless of index size)
(def experiment (sc/fork writer "experiment"))
;; Add to branch (doesn't affect main)
(sc/add-doc experiment {:title {:type :text :value "Branch only"}
                        :id {:type :string :value "doc-2"}})
(sc/commit! experiment "Added experimental doc")
;; Main still has 1 doc, branch has 2
(count (sc/search writer {:match-all {}} 100)) ;; => 1
(count (sc/search experiment {:match-all {}} 100)) ;; => 2
;; Merge back when ready
(sc/merge-from! writer experiment)
Java API
import org.replikativ.scriptum.BranchIndexWriter;
import org.apache.lucene.document.*;
import java.nio.file.Path;
// Create an index
BranchIndexWriter main = BranchIndexWriter.create(
Path.of("/tmp/my-index"), "main");
// Add documents
Document doc = new Document();
doc.add(new TextField("title", "Hello World", Field.Store.YES));
main.addDocument(doc);
main.commit("Initial commit");
// Fork a branch (a few ms regardless of index size)
BranchIndexWriter feature = main.fork("experiment");
// Branches evolve independently
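Document anotherDoc = new Document();
anotherDoc.add(new TextField("title", "Branch only", Field.Store.YES));
feature.addDocument(anotherDoc);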
feature.commit("Feature work");
// Merge back
main.mergeFrom(feature);
When to use Scriptum vs Proximum
Scriptum
Full-text search with Lucene
- Keyword search, facets, highlighting
- Text analysis pipelines
- Document-oriented indices
- When you need Lucene's query language
Proximum
Vector similarity search
- Embedding-based retrieval (RAG)
- Semantic search
- Faster parallelized insertion than Lucene HNSW
- Advanced vector search features
Both have branching, snapshots, and time-travel. Choose based on your search workload.
Requirements
- Java 21+ - Required for Lucene 10.x (Foreign Memory API, Vector API)
- Lucene 10.3.2 - Pulled from Maven Central
- Clojure 1.12.0+ - For the Clojure API
Install
Available on Clojars. See the GitHub repository for current version and installation instructions.
Maven/Gradle users: add the Clojars repository to your build configuration.