Upgrade Apache Calcite from 1.40.0 to 1.42.0 #18658

Open
yashmayya wants to merge 1 commit into apache:master from yashmayya:calcite-1.42.0-upgrade

Conversation

Contributor

@yashmayya yashmayya commented Jun 2, 2026

Summary

Upgrades Apache Calcite from 1.40.0 to 1.42.0. Pinot's master was never moved to 1.41, so this folds in both the 1.40→1.41 and 1.41→1.42 deltas in a single bump.

The bulk of the upgrade is a faithful re-sync of Pinot's customized SQL parser grammar to upstream 1.42, plus a handful of targeted workarounds for behavioral changes Calcite introduced across these two releases. No public Pinot API or wire/segment format changes.

Changes

Dependency

  • pom.xml: calcite.version → 1.42.0 (covers calcite-core and calcite-babel).
  • Pinned joou-java-6 to 0.9.5 in dependencyManagement to resolve a dependency-convergence conflict: calcite-core 1.42 pulls joou-java-6 0.9.5 while the transitive avatica-core 1.28.0 still wants 0.9.4.
  • LICENSE-binary: bumped the Calcite/Avatica entries to 1.42.0 / 1.28.0 (calcite-core, calcite-babel, calcite-linq4j, avatica-core, avatica-metrics) and added org.jooq:joou-java-6:0.9.5 — matching the binary distribution's DEPENDENCIES manifest.

SQL parser codegen sync (pinot-common/src/main/codegen)

  • Re-synced templates/Parser.jj, config.fmpp, and default_config.fmpp to upstream Calcite 1.42, preserving every PINOT CUSTOMIZATION region.
  • The new babel feature flags introduced upstream — includeStarExclude (SELECT * EXCLUDE/REPLACE), includeSelectBy (SELECT ... BY), and includeIntervalWithoutQualifier — are intentionally kept OFF. The grammar is synced but inactive: the multi-stage engine has no downstream support for these features, so enabling them would parse syntax the planner/runtime can't execute. They are wired through default_config.fmpp so a future change can flip them on deliberately.
  • UNSIGNED is added as a non-reserved keyword (needed for the unsigned integer types below).

Behavioral workarounds for 1.41/1.42 changes

  • CALCITE-7189 (non-strict GROUP BY): 1.41+ BABEL enables MySQL-style non-strict GROUP BY (wrapping non-grouped columns in ANY_VALUE()), but the implementation NPEs when a window function is combined with GROUP BY (e.g. SELECT MIN(col) OVER() FROM t GROUP BY col). Validator now uses a SqlDelegatingConformance over BABEL that overrides isNonStrictGroupBy() to false. This is also the semantically correct behavior for Pinot, which requires all non-aggregated columns to appear in GROUP BY. The feature remains present (un-reverted) in 1.42, so the override is retained.
  • CALCITE-7379 (decorrelation type assertion): the upstream fix does not fully cover the correlated-subquery shapes Pinot produces — a post-decorrelation Litmus.THROW type assertion still fires for a nullability-only divergence. PinotRelDecorrelator (new, see below) relaxes that one assertion: it logs a warning when the row types differ only in nullability and continues, but still fails fast on any structural type change.
  • CALCITE-7351: RelDataTypeSystem#getMaxNumericScale/getMaxNumericPrecision became final. TypeSystem drops the now-illegal overrides; the equivalent behavior is preserved via the type-specific getMaxScale/getMaxPrecision(DECIMAL) overrides Pinot already defines.
  • Filtered MIN/MAX nullability: 1.42 exposes SqlOperatorBinding#hasEmptyGroup(). PinotMinMaxReturnTypeInference now also treats a possibly-empty group as nullable (alongside the existing getGroupCount() == 0 / hasFilter() checks), matching the runtime's null-on-empty behavior for MIN(x) FILTER (WHERE ...).
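
The CALCITE-7379 workaround above hinges on telling a nullability-only divergence apart from a structural one. A minimal sketch of that decision, assuming a simplified row type: `Field`, `Kind`, and `classify` are hypothetical names for illustration, not the PR's actual code or Calcite's API, and field names are deliberately ignored here.

```java
import java.util.List;

final class RowTypeDivergence {
  // Hypothetical stand-in for one field of a row type (not Calcite's RelDataTypeField).
  static final class Field {
    final String typeName;
    final boolean nullable;

    Field(String typeName, boolean nullable) {
      this.typeName = typeName;
      this.nullable = nullable;
    }
  }

  enum Kind { IDENTICAL, NULLABILITY_ONLY, STRUCTURAL }

  // Compares two row types position by position.
  // A differing field count or type name is structural; a differing nullable flag alone is not.
  static Kind classify(List<Field> before, List<Field> after) {
    if (before.size() != after.size()) {
      return Kind.STRUCTURAL;
    }
    Kind result = Kind.IDENTICAL;
    for (int i = 0; i < before.size(); i++) {
      Field b = before.get(i);
      Field a = after.get(i);
      if (!b.typeName.equals(a.typeName)) {
        return Kind.STRUCTURAL;
      }
      if (b.nullable != a.nullable) {
        result = Kind.NULLABILITY_ONLY;
      }
    }
    return result;
  }
}
```

Under this sketch, a caller would log a warning and continue on `NULLABILITY_ONLY` and fail fast on `STRUCTURAL`, which mirrors the behavior described for PinotRelDecorrelator.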

Unsigned integer types (CALCITE-1466)

BABEL now parses UTINYINT/USMALLINT/UINTEGER/UBIGINT. Pinot has no native unsigned storage, so each is mapped to the narrowest signed type that holds its full range without loss — and UBIGINT (BIGINT UNSIGNED), which has no such type, is rejected rather than silently wrapping:

  • UTINYINT/USMALLINT → INT, UINTEGER → LONG (a signed INT would wrap UINTEGER values above 2³¹−1); applied in RelToPlanNodeConverter / v2 PRelToPlanNodeConverter (convertToColumnDataType), the single-stage DataTypeConversionFunctions.cast, TypeSystem.deriveSumType (widens to signed BIGINT so SUM doesn't overflow a 32-bit INT), and ArithmeticFunctionUtils.normalizeNumericType (keeps arithmetic integral instead of widening to DOUBLE).
  • UBIGINT is rejected at planning (convertToColumnDataType throws): its 0..2⁶⁴−1 range exceeds signed LONG (2⁶³−1), so mapping it to LONG would silently wrap values above Long.MAX_VALUE into negatives — a silent wrong result. Failing fast (with a clear message suggesting CAST … AS BIGINT/DECIMAL) is safer; UBIGINT was a parse error pre-1.41 anyway, and only ever arises from an explicit CAST(… AS BIGINT UNSIGNED). (Per review feedback from @xiangfu0.)
  • PinotEvaluateLiteralRule folds a constant unsigned cast into its signed-equivalent type by delegating to convertToColumnDataType — so the representable types fold, and a UBIGINT literal cast is rejected on the same path.
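
The mapping rule in the bullets above can be sketched as a single switch. This is an illustration only: `SqlType` and `ColumnType` are hypothetical stand-ins for Calcite's SqlTypeName and Pinot's ColumnDataType, and the exception message is a placeholder, not the PR's actual wording.

```java
final class UnsignedTypeMapping {
  // Hypothetical stand-ins for the real Calcite and Pinot enums.
  enum SqlType { UTINYINT, USMALLINT, UINTEGER, UBIGINT }
  enum ColumnType { INT, LONG }

  // Maps each unsigned type to the narrowest signed type that holds its full range,
  // and rejects UBIGINT because no signed 64-bit type can hold 0..2^64-1.
  static ColumnType toSigned(SqlType type) {
    switch (type) {
      case UTINYINT:   // 0..255 fits in a 32-bit INT
      case USMALLINT:  // 0..65535 fits in a 32-bit INT
        return ColumnType.INT;
      case UINTEGER:   // 0..2^32-1 needs a 64-bit LONG
        return ColumnType.LONG;
      case UBIGINT:
        // Placeholder message for illustration.
        throw new IllegalArgumentException("UBIGINT cannot be represented losslessly");
      default:
        throw new IllegalStateException("Unexpected type: " + type);
    }
  }
}
```

Failing in the `UBIGINT` branch rather than returning `LONG` is the design choice the review asked for: an error at planning time is visible, whereas a wrapped value is not.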

New class

  • org.apache.pinot.calcite.sql2rel.PinotRelDecorrelator — a minimal subclass of Calcite's RelDecorrelator that exists solely to relax the CALCITE-7379 assertion described above. It lives under the org.apache.calcite.sql2rel package because the relevant members (CorelMap, decorrelate, etc.) are package/protected-subclass visible in Calcite.

Testing & validation

  • pinot-query-planner unit suite: 1262/1262 pass.
  • pinot-query-runtime result-correctness vs H2 — ResourceBasedQueriesTest 3571 pass / 0 fail (6 pre-existing skips), QueryRunnerTest 130/130.
  • OfflineClusterIntegrationTest run locally; updated testQueryWithRepeatedColumnsV2 to reflect that 1.42 now accepts repeated columns in GROUP BY but still rejects ambiguous repeated columns in ORDER BY.
  • New/updated regression tests pin every workaround: filtered MIN/MAX nullability, the CALCITE-7379 decorrelation path (the structural-vs-nullability divergence decision is extracted and unit-tested), the non-strict-GROUP-BY-with-window NPE, unsigned-type cast acceptance, SUM/arithmetic return types over unsigned operands, and single-stage unsigned casts.
  • The 8 pinot-query-planner/src/test/resources/queries/*.json EXPLAIN-plan snapshots are mechanically regenerated (label/whitespace deltas from upstream rule changes), not hand-edited.

Behavior & compatibility notes

  • UNSIGNED casts: new accepted/rejected query surface (user-facing). As a consequence of CALCITE-1466, BABEL now parses CAST(x AS <type> UNSIGNED) on both engines. Pinot accepts the representable ones, mapping to the narrowest lossless signed type (TINYINT/SMALLINT UNSIGNED → INT, INTEGER UNSIGNED → LONG), and rejects BIGINT UNSIGNED at planning with a clear IllegalArgumentException (no signed type holds its full range). Net: some ... UNSIGNED casts that were previously parse errors now succeed, and BIGINT UNSIGNED now produces a specific planning-time rejection. Worth a release note.
  • Repeated GROUP BY key now accepted (MSE): under the multi-stage engine, a query with a repeated grouping key (e.g. SELECT x, COUNT(*) FROM t GROUP BY x, x) previously failed validation and now succeeds — Calcite 1.42 de-duplicates the repeated key. This is a benign relaxation (no previously-working query breaks), but during a mixed-version rolling upgrade the same query is rejected by a 1.40 broker and accepted by a 1.42 broker. Repeated columns in ORDER BY are still rejected as ambiguous (covered by the updated testQueryWithRepeatedColumnsV2).
  • Plan shape: semi → inner join in a few IN/EXISTS shapes. The 1.41 subquery-decorrelation rework rewrites the outer semi-join to an inner join in a small number of nested IN/semi-join shapes (visible in the regenerated JoinPlans.json/PhysicalOptimizerPlans.json). This is a sound, plan-equivalent rewrite — Calcite only applies it where the join's right input is already distinct (e.g. fed by an aggregate) — and result-equivalence is covered by the H2-comparison suites (ResourceBasedQueriesTest) and the integration tests, which all pass.
  • New parseable bitwise infix operators (<<, &, ^). The faithful parser sync makes these parseable for the first time (upstream-stock 1.42 additions to BinaryRowOperator, mapping to SqlStdOperatorTable.BIT_LEFT_SHIFT/BITAND_OPERATOR/BITXOR_OPERATOR); previously they were parse errors. No operator wiring is added by this PR. End-to-end status differs by engine: in the multi-stage engine PinotOperatorTable is a curated allow-list that does not register them, so such expressions fail at operator resolution (validation) — same status as the other synced-but-inert 1.42 grammar. In the single-stage engine the canonical names of &/^ (bitand/bitxor) coincide with Pinot's existing bitAnd/bitXor scalar functions, so a & b / a ^ b may resolve to those (a MySQL-style infix alias), while << has no Pinot equivalent and errors. This v1/v2 divergence is an inherent consequence of faithfully syncing the upstream grammar; explicitly wiring up or rejecting these operators is left as a separate, deliberate decision.
  • Unsigned casts in the single-stage engine. Because the single-stage parser also uses BABEL, CAST(x AS INTEGER UNSIGNED) (and the other representable unsigned types) is now parseable in v1 too. DataTypeConversionFunctions.cast maps them to their signed equivalent (UTINYINT/USMALLINT → INT, UINTEGER → LONG), mirroring the multi-stage converter, and rejects BIGINT UNSIGNED (UBIGINT) with the same clear error. Covered by DataTypeConversionFunctionsTest#testCastToUnsignedTypes.
  • No config keys, SPI signatures, enum/DataType additions, JSON/Protobuf fields, or DataTable/segment-version changes — nothing else has mixed-version visibility.
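
For readers who want to see the wrap-around that motivates the unsigned-cast rules in the notes above, the following standalone sketch uses only JDK arithmetic. The class and method names are hypothetical and are not part of the PR.

```java
final class UnsignedWrapDemo {
  // Narrowing an unsigned 32-bit value into a signed int keeps only the low 32 bits,
  // so anything above Integer.MAX_VALUE comes out negative.
  static int asSignedInt(long unsignedInt) {
    return (int) unsignedInt;
  }

  // Parses an unsigned 64-bit decimal string and returns its two's-complement bits as a
  // signed long, so anything above Long.MAX_VALUE comes out negative.
  static long asSignedLong(String unsignedDecimal) {
    return Long.parseUnsignedLong(unsignedDecimal);
  }

  public static void main(String[] args) {
    // A UINTEGER value that a signed INT cannot hold but a signed LONG can.
    System.out.println(asSignedInt(3_000_000_000L));
    // The largest UBIGINT value, 2^64-1, which no signed LONG can hold.
    System.out.println(asSignedLong("18446744073709551615"));
  }
}
```

The first case is why UINTEGER maps to LONG rather than INT; the second is why UBIGINT has no lossless signed target and is rejected.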

Notes for reviewers

  • The synced-but-disabled grammar (EXCLUDE/REPLACE/SELECT-BY/bare-INTERVAL) is deliberately inert — included to keep Parser.jj a faithful upstream sync rather than a divergent fork. Flipping the three fmpp flags is a separate, future decision.
  • The upstream colon-path field-access grammar (AddOptionalColonPath/ColonBracketSegment) was also synced and is likewise inert under Pinot — but it is gated by the upstream conformance method SqlConformance.isColonFieldAccessAllowed() (which returns false for BABEL), not by an fmpp flag. These productions are byte-for-byte from upstream Calcite 1.42.0 (verified against calcite-1.42.0 Parser.jj), so they intentionally carry no PINOT CUSTOMIZATION markers.
  • All PINOT CUSTOMIZATION markers from the prior grammar are preserved.
  • BIGINT UNSIGNED (UBIGINT) is rejected rather than mapped to a lossy LONG (per @xiangfu0's review) — see the unsigned-types section. The representable unsigned types (TINYINT/SMALLINT/INTEGER UNSIGNED) map losslessly, so no unsigned value silently wraps.
  • Follow-up (not in this PR): the unsigned→signed handling touches several type-dispatch switches (the two converters, DataTypeConversionFunctions, ArithmeticFunctionUtils.normalizeNumericType, TypeSystem.deriveSumType). Each maps to a different target enum and uses case labels (which must be compile-time constants, so they can't delegate to a shared predicate), so the per-switch listing is largely inherent. The one genuinely-collapsible duplication is RelToPlanNodeConverter.convertToColumnDataType and the v2 PRelToPlanNodeConverter.convertToColumnDataType, which are byte-for-byte identical public static copies — a pre-existing smell this diff merely extends. Collapsing those two (have v2 delegate to v1) is worth a dedicated refactor; left out here to keep the diff scoped to the version bump.
  • The filtered MIN/MAX fix (hasEmptyGroup()) is a planning-time type-nullability correction — it lets the query validate under 1.42's stricter nullability checks. Pinot's DataSchema/ColumnDataType erases nullability, so this does not change runtime values; the empty-filtered-group → NULL runtime semantics are pre-existing and unchanged. The fix is covered by a compile-time regression test (testFilteredMinMaxAggregateNullability).
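
The filtered MIN/MAX nullability rule reduces to a three-way disjunction. A minimal sketch, assuming the three inputs are read from SqlOperatorBinding's getGroupCount(), hasFilter(), and hasEmptyGroup() as the description states; the class and method below are hypothetical and omit whatever else PinotMinMaxReturnTypeInference considers.

```java
final class MinMaxNullability {
  // Returns true when MIN/MAX may see no input rows and therefore must be typed as nullable.
  // Parameters are plain values standing in for the corresponding SqlOperatorBinding calls.
  static boolean isNullable(int groupCount, boolean hasFilter, boolean hasEmptyGroup) {
    return groupCount == 0  // global aggregate: the input may be empty
        || hasFilter        // FILTER (WHERE ...) may exclude every row
        || hasEmptyGroup;   // the additional condition checked after the 1.42 upgrade
  }
}
```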

@yashmayya yashmayya added the dependencies Pull requests that update a dependency file label Jun 2, 2026
@yashmayya yashmayya force-pushed the calcite-1.42.0-upgrade branch from b8bf3a6 to 3cbe3eb Compare June 2, 2026 21:14

codecov-commenter commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 64.46%. Comparing base (f6b930b) to head (a8c9fe5).
⚠️ Report is 32 commits behind head on master.

Files with missing lines Patch % Lines
...he/pinot/calcite/sql2rel/PinotRelDecorrelator.java 95.23% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18658      +/-   ##
============================================
+ Coverage     64.45%   64.46%   +0.01%     
- Complexity     1282     1291       +9     
============================================
  Files          3352     3372      +20     
  Lines        207171   208583    +1412     
  Branches      32348    32573     +225     
============================================
+ Hits         133534   134465     +931     
- Misses        62910    63312     +402     
- Partials      10727    10806      +79     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.46% <97.29%> (+0.01%) ⬆️
temurin 64.46% <97.29%> (+0.01%) ⬆️
unittests 64.46% <97.29%> (+0.01%) ⬆️
unittests1 56.90% <97.29%> (+0.08%) ⬆️
unittests2 37.08% <24.32%> (-0.08%) ⬇️


Contributor

@xiangfu0 xiangfu0 left a comment

Found 1 high-signal issue; see inline comment.

case UINTEGER:
// UBIGINT (0..2^64-1) has no wider signed type, so values above Long.MAX_VALUE wrap (two's-complement) - this is
// unavoidable and acceptable since Pinot has no unsigned storage type.
case UBIGINT:
Contributor

Accepting UBIGINT here turns BIGINT UNSIGNED into a lossy signed LONG. Any value above Long.MAX_VALUE will now silently wrap, so this is a wrong-result regression rather than a harmless type downgrade. Since Pinot cannot represent the full unsigned 64-bit range, this needs to fail validation/planning instead of being mapped to LONG.

Contributor Author

Good catch — agreed, and fixed. BIGINT UNSIGNED (UBIGINT) now fails fast at planning instead of being mapped to a lossy LONG: convertToColumnDataType throws a clear "Unsigned BIGINT is not supported … CAST to BIGINT or DECIMAL instead" error.

I kept the other unsigned types accepted, since those map losslessly and don't have the wrap problem: TINYINT/SMALLINT UNSIGNED → INT, INTEGER UNSIGNED → LONG. UBIGINT is the only one with no signed Pinot type wide enough for its full 0..2⁶⁴−1 range, so it's the one that has to be rejected.

The rejection is applied consistently across every site that touches the type (RelToPlanNodeConverter.convertToColumnDataType and its v2 PRelToPlanNodeConverter counterpart, the single-stage DataTypeConversionFunctions.cast, TypeSystem.deriveSumType, and ArithmeticFunctionUtils.normalizeNumericType) and is covered by regression tests (QueryCompilationTest#testUnsignedBigintCastIsRejected for both the column- and literal-cast planning paths, plus updated RelToPlanNodeConverterTest/PRelToPlanNodeConverterTest/DataTypeConversionFunctionsTest unit assertions). PR description updated too. Thanks!

Bump calcite-core/babel to 1.42.0 (folds in both the 1.40->1.41 and
1.41->1.42 deltas, since master was never moved to 1.41) and pin
joou-java-6 to 0.9.5 to resolve a dependency-convergence conflict
between calcite-core and the transitive avatica-core 1.28.0.

Sync the customized SQL parser (Parser.jj + the fmpp configs) to upstream
1.42, preserving all PINOT CUSTOMIZATION regions. The new babel feature
flags (includeStarExclude, includeSelectBy, includeIntervalWithoutQualifier)
are intentionally kept OFF: the grammar is synced but inactive, as the
multi-stage engine has no downstream support for those features yet. The
upstream colon-path field-access grammar was likewise synced and is inert
under Pinot's BABEL conformance (isColonFieldAccessAllowed() returns false) -
gated by conformance rather than an fmpp flag. The sync also makes the bitwise
infix operators '<<', '&' and infix '^' parseable (upstream-stock 1.42
additions); they are not registered in PinotOperatorTable, so they are not yet
supported end-to-end.

Handle 1.42 behavioral changes:
- CALCITE-7189: Validator disables non-strict GROUP BY (BABEL enables it
  in 1.41+, which NPEs for window functions combined with GROUP BY).
- CALCITE-7379: PinotRelDecorrelator relaxes the post-decorrelation type
  assertion for the nullability-only divergence that still fires on some
  Pinot correlated-subquery shapes; it fails fast on structural changes.
- CALCITE-7351: drop the now-final getMaxNumericScale/getMaxNumericPrecision
  overrides (they delegate to the type-specific overrides Pinot keeps).
- Filtered MIN/MAX are now nullable via SqlOperatorBinding.hasEmptyGroup().
- Unsigned integer types (CALCITE-1466) parse under BABEL. The representable
  ones are mapped to the narrowest lossless signed type (TINYINT/SMALLINT
  UNSIGNED -> INT, INTEGER UNSIGNED -> LONG) throughout (converters, literal
  folding, SUM return type, arithmetic normalization). BIGINT UNSIGNED (UBIGINT)
  has no signed type wide enough for its full 0..2^64-1 range, so it is rejected
  at planning rather than silently wrapping values above Long.MAX_VALUE.

Regenerate the EXPLAIN plan snapshots and update/extend the affected tests.
@yashmayya yashmayya force-pushed the calcite-1.42.0-upgrade branch from 3cbe3eb to a8c9fe5 Compare June 4, 2026 01:08