feat: support `CLUSTER BY [AUTO, NONE]` for Databricks by EhabEasee · Pull Request #5846 · SQLMesh/sqlmesh

EhabEasee · 2026-06-17T15:39:40Z

Description

Databricks supports two keyword forms of liquid clustering that don't take column arguments:

`CLUSTER BY AUTO`: lets Databricks automatically select the clustering columns
`CLUSTER BY NONE`: disables liquid clustering on a table

Previously, SQLMesh had no way to express these in a model definition. This PR adds support for both.

Constants (constants.py): Adds LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"}), a shared constant used by the parser, the validators, and the adapter.

Parsing (dialect.py): The clustered_by property parser now recognises bare AUTO and NONE tokens (unquoted VAR tokens) as liquid clustering keywords rather than column references. Backtick-quoted `auto` / `none` are still treated as regular column names, preserving backwards compatibility for columns that happen to share those names.
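A minimal sketch of the parsing rule above, written in plain Python rather than against sqlglot's tokenizer. `Token`, `Keyword`, `Column`, and `classify_clustered_by` are hypothetical names for illustration, and the case-insensitive match on the token text is an assumption. What it shows is the decision order: only a single, unquoted AUTO/NONE becomes a keyword, while backtick-quoted names and multi-item lists such as (a, AUTO) stay column references.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed to mirror constants.LIQUID_CLUSTERING_KEYWORDS from this PR.
LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a tokenizer token (not sqlglot's Token class)."""

    text: str
    quoted: bool = False  # True when the identifier was written with backticks


@dataclass(frozen=True)
class Keyword:
    """Hypothetical sentinel for a bare AUTO/NONE (plays the role of exp.Var)."""

    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical column reference."""

    name: str
    quoted: bool = False


def classify_clustered_by(tokens: list[Token]) -> Keyword | list[Column]:
    """Classify the argument(s) of a clustered_by property.

    A single unquoted AUTO/NONE token becomes a keyword sentinel; anything else,
    including backtick-quoted `auto`/`none` and mixed lists, is a list of columns.
    """
    if len(tokens) == 1:
        tok = tokens[0]
        # Assumption: keyword matching is case-insensitive.
        if not tok.quoted and tok.text.upper() in LIQUID_CLUSTERING_KEYWORDS:
            return Keyword(tok.text.upper())
    return [Column(t.text, t.quoted) for t in tokens]
```

For example, `classify_clustered_by([Token("auto", quoted=True)])` yields a column, while `classify_clustered_by([Token("AUTO")])` yields the keyword sentinel.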

Validation (meta.py): A single string passed to clustered_by is normalised to a list before processing. The validator then skips the column-count check for exp.Var(AUTO|NONE), but only when the field is clustered_by and the dialect is databricks. On deserialisation from JSON, keyword strings are restored to exp.Var sentinels before list_of_fields_validator can normalise them into quoted columns.
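The validator flow can be approximated as follows. This is a hypothetical, simplified sketch: `validate_fields`, `Keyword`, and `ConfigError` are stand-ins rather than SQLMesh's actual validator or exception, the case-insensitive keyword match is an assumption, and a keyword sentinel outside the clustered_by + databricks scope is rejected directly instead of falling through to the column-count check.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Assumed to mirror constants.LIQUID_CLUSTERING_KEYWORDS from this PR.
LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


class ConfigError(Exception):
    """Hypothetical stand-in for SQLMesh's ConfigError."""


@dataclass(frozen=True)
class Keyword:
    """Hypothetical sentinel playing the role of exp.Var(AUTO|NONE)."""

    name: str


def validate_fields(field: str, value: Any, dialect: str) -> list[Any]:
    """Hypothetical sketch of the clustered_by validation flow."""
    # Normalise a single value, e.g. clustered_by="AUTO", into a one-element list.
    items = value if isinstance(value, list) else [value]

    keyword_allowed = field == "clustered_by" and dialect == "databricks"

    # A lone keyword string (from the Python API, or from JSON on deserialisation)
    # is restored to the sentinel before it can be treated as a quoted column.
    if (
        keyword_allowed
        and len(items) == 1
        and isinstance(items[0], str)
        and items[0].upper() in LIQUID_CLUSTERING_KEYWORDS
    ):
        items = [Keyword(items[0].upper())]

    if len(items) == 1 and isinstance(items[0], Keyword):
        if not keyword_allowed:
            raise ConfigError(
                f"{field} {items[0].name} is only supported for clustered_by on databricks"
            )
        return items  # keyword form: skip the column checks entirely

    # Regular path: every entry must be a column name (placeholder check).
    for item in items:
        if not isinstance(item, str):
            raise ConfigError(f"Unexpected {field} entry: {item!r}")
    return items
```

The gate is computed once, so the same condition controls both the string-to-sentinel restoration and the decision to skip column checks.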

Validation (definition.py): The validate_definition column-existence check skips keyword sentinels for the same clustered_by + databricks scope.
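The column-existence check with the keyword skip might look like the following. `check_columns_exist`, `Keyword`, and this `ConfigError` are hypothetical stand-ins, and matching column names exactly (case-sensitively) is a simplification. Outside the clustered_by + databricks scope, a keyword sentinel is looked up like any other name and reported as missing.

```python
from __future__ import annotations

from dataclasses import dataclass


class ConfigError(Exception):
    """Hypothetical stand-in for SQLMesh's ConfigError."""


@dataclass(frozen=True)
class Keyword:
    """Hypothetical sentinel playing the role of exp.Var(AUTO|NONE)."""

    name: str


def check_columns_exist(
    field: str, values: list[str | Keyword], model_columns: list[str], dialect: str
) -> None:
    """Raise ConfigError if any referenced column is not produced by the model."""
    skip_keywords = field == "clustered_by" and dialect == "databricks"
    known = set(model_columns)
    missing = []
    for value in values:
        if isinstance(value, Keyword):
            if skip_keywords:
                continue  # AUTO/NONE are not columns, so there is nothing to look up
            value = value.name  # out of scope: treat the keyword like a column name
        if value not in known:
            missing.append(value)
    if missing:
        raise ConfigError(f"{field} references unknown columns: {', '.join(missing)}")
```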

Code generation (databricks.py): _build_table_properties_exp checks whether clustered_by holds a single exp.Var. If so, it raises a ValueError when the Var holds anything other than AUTO or NONE, and otherwise emits CLUSTER BY AUTO / CLUSTER BY NONE without wrapping the keyword in a tuple. Multi-column paths are unchanged.
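The adapter-side branching can be sketched as a plain string builder. `build_cluster_by_sql`, `quote_identifier`, and the `Keyword` sentinel are hypothetical; the real adapter builds sqlglot expressions in `_build_table_properties_exp` rather than concatenating strings. The sketch covers the three outcomes described above: a bare keyword, a ValueError for an unexpected keyword, and the unchanged parenthesised column list.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed to mirror constants.LIQUID_CLUSTERING_KEYWORDS from this PR.
LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Keyword:
    """Hypothetical sentinel playing the role of exp.Var(AUTO|NONE)."""

    name: str


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_cluster_by_sql(clustered_by: list[Keyword | str]) -> str:
    """Render a CLUSTER BY clause for a list of columns or a single keyword."""
    if len(clustered_by) == 1 and isinstance(clustered_by[0], Keyword):
        keyword = clustered_by[0].name
        # Guard: only AUTO/NONE may be emitted as a bare keyword.
        if keyword not in LIQUID_CLUSTERING_KEYWORDS:
            raise ValueError(f"Unexpected clustered_by keyword: {keyword!r}")
        return f"CLUSTER BY {keyword}"  # no parentheses around the keyword
    columns = ", ".join(quote_identifier(col) for col in clustered_by)
    return f"CLUSTER BY ({columns})"
```

Columns literally named auto or none go through the quoted path, so `build_cluster_by_sql(["auto", "none"])` renders them as identifiers inside parentheses rather than as keywords.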

Usage:

```sql
-- In a SQLMesh model definition
MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by AUTO
);

MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by NONE
);
```

Via the Python API, both a plain string and exp.Var are accepted:

```python
from sqlglot import exp
from sqlmesh.core.model import create_sql_model

create_sql_model(..., dialect="databricks", clustered_by="AUTO")
create_sql_model(..., dialect="databricks", clustered_by=exp.Var(this="AUTO"))
```

Columns with the names auto or none are still supported via backtick quoting:

```sql
MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by (`auto`, `none`)
);
```

Test Plan

tests/core/test_dialect.py — parser round-trips: AUTO/NONE keywords, backtick-quoted columns, paren-wrapped single columns, multi-column lists, mixed list (a, AUTO), non-Databricks dialect
tests/core/test_model.py — model DDL; Python API with both exp.Var and plain string; backtick-quoted column names; render_definition output; JSON serialisation round-trip; non-Databricks dialect rejection; mixed-list column treatment
tests/core/engine_adapter/test_databricks.py — adapter emits CLUSTER BY AUTO / CLUSTER BY NONE without column parens

Checklist

I have run make style and fixed any issues
I have added tests for my changes (if applicable)
All existing tests pass (make fast-test)
My commits are signed off (git commit -s) per the DCO

StuffbyYuki · 2026-06-29T05:27:44Z

@EhabEasee Thanks for this PR!

Not trying to be nit-picky, but here are a few items:

Docs: Add a note in model docs that Databricks supports clustered_by AUTO / NONE, and that backticks are needed for real columns named auto/none.
Test: test_clustered_by_keyword_non_databricks_dialect: perhaps use pytest.raises(ConfigError) instead of (ConfigError, Exception).

Let me know if I'm missing anything!

EhabEasee · 2026-06-29T08:53:15Z

@StuffbyYuki both comments make sense and I've made the updates. However, the comment in the docs feels misplaced and easy to miss.

I was considering adding it in the Databricks engine docs but couldn't find a reasonable place to add it. Do you have any suggestions on a more relevant place to add that note? The StarRocks docs seem to have something similar so I could imitate that?

StuffbyYuki · 2026-06-29T15:04:10Z

@EhabEasee thanks! Yeah, I don't think it has to be a big block like the StarRocks docs have, but I just figured adding something somewhere in the docs might be helpful! I'll let you decide where and how to put it in the docs.

EhabEasee · 2026-06-30T07:13:02Z

@StuffbyYuki I added a new section to the databricks integration docs. Let me know if you have any more feedback

StuffbyYuki · 2026-06-30T15:15:14Z

@EhabEasee It looks like your commits need DCO checks!

…id clustering

Adds parser, validator, and Databricks adapter support for the keyword forms of liquid clustering. Bare AUTO/NONE (unquoted VAR tokens) are recognised as keywords; backtick-quoted `auto`/`none` and parenthesised forms remain real column references.

- Add LIQUID_CLUSTERING_KEYWORDS constant to avoid repeating the sentinel set across dialect, meta, definition, and adapter
- Parser (dialect.py): detect VAR-token AUTO/NONE on clustered_by; strip Paren from single-column clustered_by to match partitioned_by normalisation
- Validator (meta.py): normalise single string input to list; restore keyword sentinels from JSON strings on deserialisation; skip column-count check for keywords, gated on clustered_by + databricks
- validate_definition (definition.py): skip keyword sentinels in the column-existence check, same gate
- Adapter (databricks.py): emit CLUSTER BY AUTO / CLUSTER BY NONE without a tuple wrapper; raise ValueError on unexpected bare Var
- Tests: parser round-trips, Python API (exp.Var and plain string), backtick-quoted columns, render_definition, JSON round-trip, non-Databricks rejection, mixed-list behaviour, adapter SQL emission

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

…ed_by docs Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

…d_non_databricks_dialect Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

… clustered_by docs" This reverts commit bb70305. Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

…tion docs Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

EhabEasee · 2026-06-30T18:51:08Z

@StuffbyYuki Done

EhabEasee force-pushed the feat/clustered-by-auto-none branch from 6f3e9a9 to 4f29141 June 25, 2026 09:39

EhabEasee changed the title from ~~feat: support CLUSTER BY AUTO and CLUSTER BY NONE for Databricks liquid clustering~~ to feat: support `CLUSTER BY [AUTO, NONE]` for Databricks Jun 25, 2026

StuffbyYuki self-requested a review June 29, 2026 05:27

EhabEasee force-pushed the feat/clustered-by-auto-none branch from fb8c119 to 9b49578 June 30, 2026 18:43

EhabEasee added 5 commits June 30, 2026 20:48

docs: note Databricks liquid clustering AUTO/NONE keywords in cluster…

59b3fca


test: narrow pytest.raises to ConfigError in test_clustered_by_keywor…

3e41b85


Revert "docs: note Databricks liquid clustering AUTO/NONE keywords in…

7892710


docs: note Databricks liquid clustering AUTO/NONE keywords in integra…

4ce5a4e


docs: add seperator lines

a26e4f6


EhabEasee force-pushed the feat/clustered-by-auto-none branch 2 times, most recently from 1c7d73a to a26e4f6 June 30, 2026 18:49

Merge branch 'main' into feat/clustered-by-auto-none

aadd4e6

StuffbyYuki approved these changes Jun 30, 2026


StuffbyYuki merged commit 9a25aa1 into SQLMesh:main Jun 30, 2026
32 checks passed

StuffbyYuki merged 7 commits into SQLMesh:main from EhabEasee:feat/clustered-by-auto-none
