Automated Construction Risk Identifcation System
The ACRIS system is organized as a monolithic modular Python package under the acris/ directory. Main modules:
dp/: Data Processingupq/: User & Project Queryqo/: Question Organizationro/: Risk Outputra/: Risk Analysiscommon/: Shared utilities, config, and data models
source .venv/bin/activate
uv pip installLinting and formatting:
# ruff check is the primary entrypoint to the Ruff linter
ruff check # Lint files in the current directory.
ruff check --fix # Lint files in the current directory and fix any fixable errors.
ruff check --watch # Lint files in the current directory and re-lint on change.
ruff check path/to/code/ # Lint files in `path/to/code`.
# ruff format is the primary entrypoint to the formatter
ruff format # Format all files in the current directory.
ruff format path/to/code/ # Format all files in `path/to/code` (and any subdirectories).
ruff format path/to/file.py # Format a single file.
To test the build:
source .venv/bin/activate && uv pip install dist/acris-0.1.0-py3-none-any.whl && python -c 'import acris; import acris.main'The main entry point is acris/main.py.
See acris/README.md for module details.
ACRIS supports semantic search over risk cases using DuckDB with the VSS (Vector Similarity Search) extension and Sentence Transformers for embedding generation.
-
Install dependencies (already in
pyproject.toml):duckdbsentence-transformersnumpy
-
Environment
-
Activate your virtual environment:
source .venv/bin/activate -
Ensure dependencies are installed:
uv pip install
-
-
Configuration
- By default, the DuckDB database is stored as
acris.duckdbin the project root. - To override the path, set the
ACRIS_DUCKDB_PATHenvironment variable.
- By default, the DuckDB database is stored as
The main API is provided in acris/vector_db.py:
add_risk_case(description: str) -> int- Adds a risk case and stores its embedding.
find_similar_cases(query: str, top_k: int = 5) -> List[Tuple[int, str, float]]- Returns the most similar risk cases to the query string.
rebuild_vss_index() -> None- Rebuilds the VSS index (call after bulk updates).
get_all_risk_cases() -> List[Tuple[int, str]]- Returns all risk cases (id, description).
from acris import vector_db
# Add a risk case risk_id = vector_db.add_risk_case("Risk of fall from height")
# Query similar cases
results = vector_db.find_similar_cases("fall prevention", top_k=3)
for rid, desc, score in results:
print(f"ID: {rid}, Desc: {desc}, Score: {score:.4f}")- The embedding model used is
all-MiniLM-L6-v2(384-dim). - The VSS extension is loaded automatically.
- All code follows PEP 8, uses type hints, and is documented with docstrings.
- For production, ensure the database file is secured and sensitive data is handled appropriately.