Skip to content

feat(jepsen): M5a — dynamodb-append-multi-table-workload#916

Merged
bootjp merged 3 commits into
mainfrom
feat/composed1-m5a-impl
Jun 3, 2026
Merged

feat(jepsen): M5a — dynamodb-append-multi-table-workload#916
bootjp merged 3 commits into
mainfrom
feat/composed1-m5a-impl

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 3, 2026

Draft — second slice of Composed-1 M5a per docs/design/2026_06_02_proposed_composed1_m5_jepsen_route_shuffle.md §3.3. Follows #911 (CLI tools, merged); blocks on launch script + setup-hook verification (remaining M5a slices) before unmark-draft.

What this PR ships

jepsen/src/elastickv/dynamodb_multi_table_workload.clj — a new workload variant of the DynamoDB list-append test that exercises cross-table 2PC, which the single-table elastickv.dynamodb-workload cannot do.

Why a separate workload exists

kv/shard_key.go's dynamoRouteKey normalises every DynamoDB table-meta / item / GSI key for one table to a single per-table route key (!ddb|route|table|<base64(name)>). Single-table workloads always route to one shard regardless of partition-key value, so dispatchMultiShardTxn / commitSecondaryTxns / the new ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel never fire (codex P1 on #905 ffb9c73).

What this workload does

  • N=4 tables (jepsen_append_t1jepsen_append_t4) created in setup!.
  • Deterministic key routing: Elle's integer keys k route via (mod k N)+1 → table-idx, so each Elle key always lands at the same (table, pk) storage location (Elle's append checker still sees a single key namespace).
  • Default 4-mop txns with key-count=12 distribute evenly across all 4 tables — a 4-key txn touches all 4 tables and therefore both Raft groups under the launch script's planned --shardRanges layout (t1-t2 → group 1, t3-t4 → group 2).
  • TransactGetItems for atomic cross-table pre-read; TransactWriteItems for atomic cross-table writes (both DynamoDB APIs natively support multi-table txns).
  • Same OCC condition expressions as the single-table workload so any G1c or G-single anomaly observed is attributable to the multi-shard 2PC path rather than to a workload-shape difference.

Why self-contained

The file intentionally duplicates small helpers from elastickv.dynamodb-workload rather than extracting into a shared namespace. Refactoring would couple the workloads through a maintained API and obscure the side-by-side comparability that is the point of running both — relevant because #905 commits to keeping the single-table workload as-is for trend comparison.

Tests (jepsen/test/elastickv/dynamodb_multi_table_workload_test.clj)

7 tests / 17 assertions. The load-bearing ones:

  • key-routing-distributes-across-all-tables — 12 keys distribute 3-per-table across 4 tables
  • key-routing-table-name-matches-prefix — wraparound at k=4
  • key-routing-pks-disambiguate_colliding_keys — k=0 and k=4 share table t1 but have distinct pks
  • multi-op-txn-spans_multiple_groups — default 4-key txn touches BOTH the planned group-1 tables (1,2) and group-2 tables (3,4) — without this, dispatchMultiShardTxn never fires and the whole workload is a no-op

The routing helpers are private; tests access via var-get on #'workload/key->table-idx etc. so a future change to the routing surfaces in this test rather than as a silent G1c during a Jepsen run.

Remaining M5a work (separate PRs after this lands)

  • scripts/run-jepsen-m5-local.sh — single-process two-group launch with --raftGroups + --shardRanges using elastickv-route-key to compute boundary keys
  • Setup-hook verification — Clojure ListRoutes gRPC client asserting multi-group routing is in place
  • Done-when check — workload exercises dispatchMultiShardTxn (server-side log marker or probe) and Elle finds zero G1c

Test plan

  • lein test elastickv.dynamodb-multi-table-workload-test → Ran 7 tests / 17 assertions, 0 failures, 0 errors
  • lein test elastickv.dynamodb-workload-test (existing single-table suite) → Ran 3 tests / 5 assertions, 0 failures (no regression)
  • End-to-end Jepsen run (gated on launch script + setup hook landing)

Second slice of Composed-1 M5a per PR #905 design doc §3.3 +
PR #911 checklist.  New self-contained workload variant of the
DynamoDB list-append test that exercises cross-table 2PC, which
the single-table elastickv.dynamodb-workload cannot do.

Background: kv/shard_key.go's dynamoRouteKey normalises every
DynamoDB table-meta / item / GSI key for one table to a single
per-table route key (!ddb|route|table|<base64(name)>).  Single-
table workloads always route to one shard regardless of
partition-key value, so dispatchMultiShardTxn /
commitSecondaryTxns / the ErrTxnSecondaryRouteShiftedAfterPrimaryCommit
sentinel never fire (codex P1 on PR #905 ffb9c73).

This workload:

  * Creates N=4 tables (jepsen_append_t1 .. jepsen_append_t4) in
    setup!.
  * Routes Elle's integer keys k via (mod k N)+1 -> table-idx
    deterministically, so each Elle key always lands at the
    same (table, pk) storage location (Elle's append checker
    still sees a single key namespace).
  * Default Elle txns of up to 4 mops with key-count=12
    distribute evenly across all 4 tables on every multi-op
    txn — a 4-key txn touches all 4 tables and therefore both
    Raft groups under the launch script's planned --shardRanges
    layout (t1-t2 -> group 1, t3-t4 -> group 2).
  * Uses TransactGetItems for atomic cross-table pre-read and
    TransactWriteItems for atomic cross-table writes (both
    DynamoDB APIs natively support multi-table txns).
  * Carries the same OCC condition expressions as the single-
    table workload so any G1c or G-single anomaly observed is
    attributable to the multi-shard 2PC path rather than to a
    workload-shape difference.

The file is intentionally self-contained (small helpers
duplicated from elastickv.dynamodb-workload rather than
extracted into a shared namespace).  Refactoring shared helpers
into a third namespace would couple the workloads through a
maintained API and obscure the side-by-side comparability that
is the point of running both — relevant because PR #905 commits
to keeping the single-table workload as-is for trend comparison.

Tests cover the public API surface plus the load-bearing
routing invariants:

  * builds-test-spec — :name = elastickv-dynamodb-append-multi-table
  * custom-options-override-defaults — option passthrough
  * host-override-creates-client — DDB client opens
  * key-routing-distributes-across-all-tables — 12 keys
    distribute 3-per-table across 4 tables
  * key-routing-table-name-matches-prefix — wraparound at k=4
  * key-routing-pks-disambiguate_colliding_keys — k=0 and k=4
    share table t1 but have distinct pks
  * multi-op-txn-spans_multiple_groups — default 4-key txn
    touches BOTH the planned group-1 tables (1,2) and group-2
    tables (3,4) — without this, dispatchMultiShardTxn never
    fires and the whole workload is a no-op

The routing helpers are private; the tests access them via
var-get on #'workload/key->table-idx etc. so a future change to
the routing surfaces in this test rather than as a silent G1c
during a Jepsen run.

Verification: lein test elastickv.dynamodb-multi-table-workload-test
-> Ran 7 tests containing 17 assertions, 0 failures, 0 errors.
lein test elastickv.dynamodb-workload-test (existing single-table
suite) -> Ran 3 tests containing 5 assertions, 0 failures.

Remaining M5a work (separate commits on this branch):
  * scripts/run-jepsen-m5-local.sh — single-process two-group
    launch with --raftGroups + --shardRanges using
    elastickv-route-key to compute boundary keys
  * Setup-hook verification — Clojure ListRoutes gRPC client
    asserting multi-group routing is in place
  * Done-when check — workload exercises dispatchMultiShardTxn
    (server-side log marker or probe) and Elle finds zero G1c
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 59 minutes and 48 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a42e48c-cc17-432e-b842-a4a8ebc5af6b

📥 Commits

Reviewing files that changed from the base of the PR and between 384cacc and cf07ec0.

📒 Files selected for processing (2)
  • jepsen/src/elastickv/dynamodb_multi_table_workload.clj
  • jepsen/test/elastickv/dynamodb_multi_table_workload_test.clj
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/composed1-m5a-impl

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Jepsen workload, dynamodb-multi-table-workload, along with its corresponding unit tests. This workload is designed to exercise the multi-shard 2PC path in elastickv by distributing keys deterministically across four different DynamoDB tables, ensuring transactions span multiple Raft groups. There are no review comments to address, and I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 3, 2026

Workload draft + tests are stable; gemini's first pass returned zero findings. Could use a proper code review before iterating the remaining M5a slices (launch script + setup-hook verification).

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 3, 2026

Claude finished @bootjp's task in 4m 36s —— View job


Code Review

Tasks

  • Gather context and understand request
  • Read workload implementation (dynamodb_multi_table_workload.clj)
  • Read test file (dynamodb_multi_table_workload_test.clj)
  • Review: correctness and logic
  • Review: concurrency / distributed failure paths
  • Review: test coverage and assertions
  • Review: Clojure style and best practices
  • Post findings

Overall

The workload is well-structured. The routing design is sound, the OCC condition-expression pattern is faithfully carried over from the single-table workload, and the key tests lock down the two M5a-specific invariants (deterministic routing + multi-group span). Gemini's "no findings" read was broadly correct — there are no blocking bugs. The items below are one medium-severity correctness gap and several lower-priority issues.


Medium — Stale table data can produce false Elle anomalies across runs

File: dynamodb_multi_table_workload.clj:265

(teardown! [_this _test])

teardown! is a no-op. Tables accumulate appended values across runs. When a subsequent run starts, TransactGetItems will read the leftover lists from the previous run as snapshot, and value' will report reads that include values the current run's Elle checker never observed being appended. This gives Elle visibility of a history it didn't generate — the checker can flag those values as G-single or produce an incorrect analysis.

The single-table workload has the same gap, so this is an inherited pattern rather than a new defect. But with four tables instead of one, the blast radius for stale data is larger and consecutive runs in the launch script are more likely to hit it.

Suggested fix: delete all four tables in teardown!, or at minimum document a cluster-teardown step in the launch script that calls DeleteTable on each. Alternatively, append a per-run UUID or timestamp suffix to table-prefix so each test run owns its own table set without needing cleanup.

Fix this →


Low — TransactGetItems zip relies on positional alignment with no verification

File: dynamodb_multi_table_workload.clj:194-197

(into {}
      (map (fn [k entry] [k (read-list-attr (:Item entry))])
           ks-vec
           (:Responses resp)))

The DynamoDB API guarantees :Responses is positionally aligned with :TransactItems, but this is verified nowhere. If the cognitect SDK ever normalises the response (unlikely, but possible after a library upgrade) the snapshot map would be silently corrupt — every key mapped to the wrong value — and the OCC conditions would start accepting transactions they should reject. A one-line assertion ((assert (= (count ks-vec) (count (:Responses resp))))) would at least surface this class of failure loudly rather than silently. The same pattern is in the single-table workload.


Low — ConditionalCheckFailedException handler is dead code

File: dynamodb_multi_table_workload.clj:314-316

(contains? #{"ConditionalCheckFailedException"
             "InternalServerError"} err-type)
(assoc op :type :info :error (str err-type))

dynamo-append! (single-mop path) has no ConditionExpression, so it cannot return ConditionalCheckFailedException. The multi-mop path uses TransactWriteItems, which returns TransactionCanceledException on OCC failure — never ConditionalCheckFailedException. This arm is never reachable. Keeping it mirrors the single-table workload and is harmless, but it may mislead a future maintainer into thinking individual condition failures are handled distinctly from transaction cancellations.


Low — key-count / num-tables coupling not validated at construction time

File: dynamodb_multi_table_workload.clj:345, dynamodb_multi_table_workload_test.clj:47-51

The key-routing-distributes-across-all-tables test hard-codes (range 12) (matching the default key-count=12). But dynamodb-append-multi-table-workload accepts (:key-count opts), meaning the CLI can pass --key-count 10. With 10 keys and 4 tables, two tables get 2 keys and two get 3 — the even-distribution property breaks silently. The test would still pass (it tests 0..11), and the Jepsen run would proceed, but two tables would have lower key coverage and a 1-key txn on one of those tables would never trigger multi-group dispatch.

A guard at workload construction time (e.g., (when-not (zero? (mod key-count num-tables)) (warn ...))) or a test case with a non-multiple key-count would make this visible.


Trivial — Deftest naming inconsistency

File: dynamodb_multi_table_workload_test.clj:61

(deftest key-routing-pks-disambiguate_colliding_keys

Underscore in an otherwise hyphenated test name. Should be key-routing-pks-disambiguate-colliding-keys.

Fix this →


Design note — setup! doesn't wait for tables to reach ACTIVE state

CreateTable in real DynamoDB is asynchronous; writes immediately after creation fail with ResourceNotFoundException until the table is ACTIVE. The workload issues create-all-tables! in setup! and then starts sending traffic without polling DescribeTable. This works if elastickv's compatible implementation makes table creation synchronous, but is worth verifying before the end-to-end Jepsen run. If creation is async, a polling loop in create-one-table! (checking for ACTIVE) will prevent spurious ResourceNotFoundException failures that would be misread as network errors.


What looks good

  • Routing math is correct. (key->table-idx k) = (inc (mod k 4)) and (key->pk k) = (str (quot k 4)) produce a unique (table, pk) pair for every non-negative integer k. Verified analytically and by the four routing tests.
  • OCC conditions faithfully ported. ConditionCheck on read-only keys and snapshot-conditioned Update on written keys are identical to the single-table workload. The multi-table delta is purely TableName routing — no logic drift.
  • RYOW simulation is correct. Local appends are accumulated before dynamo-transact-write! is called; :r mops in the same txn see those values; :ok is only returned if the write succeeds.
  • by-key grouping prevents duplicate-item rejection. Multiple appends to the same Elle key are merged into one Update, satisfying DynamoDB's constraint against two operations on the same item in a single TransactWriteItems.
  • multi-op-txn-spans_multiple_groups closes the main M5a regression hole. The test is correctly structured; (some keys-in-group1 tables) usage is idiomatic.
  • Intentional duplication rationale is well-documented. The ns docstring explains why shared helpers weren't extracted.

… key-count guard (claude[bot] on PR #916)

claude[bot]'s review of dded8b96 surfaced one medium-severity
correctness gap and several low-priority items.  All addressed:

* MEDIUM — Stale table data across runs (teardown! no-op).
  Tables accumulated appended values across runs; the next
  TransactGetItems pre-read picked them up as snapshot, :r mops
  reported them to Jepsen, and Elle saw a history containing
  writes it never generated — spurious G-single / G1c.

  Fix: added delete-all-tables! mirroring create-all-tables!'s
  exception handling (ignore ResourceNotFoundException for
  idempotency), and called it from the client's teardown!.

* LOW — TransactGetItems positional alignment unverified.
  DynamoDB guarantees :Responses is positionally aligned with
  :TransactItems, but the workload relied on this without an
  assertion.  A future cognitect SDK that ever normalises the
  response would silently corrupt the snapshot map.

  Fix: one-line assert on (count ks-vec) vs (count Responses)
  so the failure surfaces loudly rather than via mysterious
  OCC behaviour.

* LOW — key-count must be a multiple of num-tables.
  Default key-count=12 splits evenly across N=4 tables, but
  --key-count 10 (or any non-multiple) silently gives some
  tables fewer keys, breaking the multi-group-span invariant
  that is the entire point of this workload.

  Fix: construction-time (assert (zero? (mod key-count
  num-tables))) so an uneven split fails-fast at workload
  build time rather than producing a confusing test run.
  Two new tests pin this: rejects-key-count-not-divisible-
  by-num-tables (10, 7 reject) and accepts-divisible-key-
  counts (4, 8, 16 accept).

* TRIVIAL — deftest naming inconsistency.  Two test names
  had stray underscores instead of hyphens
  (key-routing-pks-disambiguate_colliding_keys and
  multi-op-txn-spans_multiple_groups).  Renamed both to
  the project's hyphen-case convention.

Deferred (claude[bot] notes):

* LOW — ConditionalCheckFailedException dead-code arm.
  Mirrors the single-table workload; not blocking; kept
  for parity until a single-table cleanup lands.

* DESIGN NOTE — setup! doesn't wait for ACTIVE state.
  Real DynamoDB's CreateTable is async, but elastickv's
  implementation may be synchronous.  Worth verifying at
  the end-to-end Jepsen run; if needed, will add a
  DescribeTable poll loop.

Verification: lein test
elastickv.dynamodb-multi-table-workload-test ->
Ran 9 tests containing 22 assertions, 0 failures, 0 errors.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 3, 2026

Pushed as c46b2b5 — claude[bot]'s findings on aded8b9 addressed.

Medium — stale table data across runs. teardown! was a no-op, so the next run's TransactGetItems pre-read picked up leftover Items as snapshot, :r mops reported them to Jepsen, and Elle saw a history containing writes it never generated.

Fix: added delete-all-tables! mirroring create-all-tables!'s exception handling (ignore ResourceNotFoundException for idempotency), and called it from the client's teardown!.

Low — TransactGetItems positional alignment unverified. One-line assert on (count ks-vec) = (count :Responses) so a future cognitect SDK that normalises the response surfaces loudly.

Low — key-count must be multiple of num-tables. --key-count 10 would silently give some tables fewer keys, breaking the multi-group-span invariant. Construction-time assert fails-fast; two new tests pin both reject (10, 7) and accept (4, 8, 16) cases.

Trivial — deftest naming. Two test names had stray underscores; renamed to project's hyphen-case convention.

Deferred (claude[bot] explicitly tagged as low-priority):

  • ConditionalCheckFailedException dead-code arm — mirrors single-table workload, kept for parity until a single-table cleanup lands.
  • setup! doesn't wait for ACTIVE state — real DynamoDB's CreateTable is async but elastickv's may be synchronous; will verify at the end-to-end Jepsen run and add a DescribeTable poll if needed.

lein test elastickv.dynamodb-multi-table-workload-test → Ran 9 tests / 22 assertions, 0 failures, 0 errors.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 3, 2026

Claude finished @bootjp's task in 1m 43s —— View job


Code Review — follow-up pass on c46b2b5

Tasks

  • Load tools
  • Read updated workload implementation
  • Read updated test file
  • Review: fixes from previous round (teardown!, assert, key-count guard, naming)
  • Review: deferred items and remaining concerns
  • Post findings

Verdict: all four previous findings correctly addressed — ready to remove draft once launch script lands


Fix-by-fix verification

Medium (stale table data) → ✅ Fixed correctly

delete-all-tables! (lines 159–171) mirrors create-all-tables! exactly: doseq over table-names, ignores ResourceNotFoundException so teardown is idempotent across partial-setup failures and double-teardown. Called from teardown! (lines 297–301). The comment in the docstring and in teardown! correctly records why this matters for Elle history correctness.

Low (TransactGetItems positional alignment) → ✅ Fixed correctly

Lines 224–225 add the assert with a meaningful failure message. One note: Clojure assertions can be elided globally via (set! *assert* false) / :elide-assertions true, but in a Jepsen test process that is never the case, so this is sufficient.

Low (key-count % num-tables guard) → ✅ Fixed correctly

Construction-time assert at lines 388–391. Two new rejection tests (rejects-key-count-not-divisible-by-num-tables) cover 10 and 7; three acceptance tests (accepts-divisible-key-counts) cover 4, 8, 16. The test count matches the reported 9 tests / 22 assertions.

Trivial (deftest naming) → ✅ Fixed

key-routing-pks-disambiguate-colliding-keys at line 61 is now hyphen-case throughout.


Deferred items — status unchanged, dispositions look right

ConditionalCheckFailedException dead-code arm — kept for parity with the single-table workload. Fine; a future cleanup can remove both together.

setup! doesn't poll for ACTIVE — gating this on the end-to-end Jepsen run is correct. If elastickv's CreateTable is synchronous (as most compatible implementations are), no change is needed. If the run surfaces spurious ResourceNotFoundException failures immediately after setup!, add a DescribeTable poll there.


One new observation (no action needed before unmark-draft)

teardown! is called on the first-opened client only

In Jepsen's lifecycle, teardown! fires once on a single node's client, then close! fires on every node's client. The ddb field in teardown!'s scope is the client opened for that one node — delete-all-tables! only needs a valid connection to any node, so this is fine. Worth knowing for the launch-script author: if the teardown node is unreachable, the deletion will fail. The ResourceNotFoundException-only guard in delete-one-table! does not absorb network errors (cognitect.anomalies/unavailable). If partial network faults are left active at teardown time, tables may not be cleaned up. No change needed now; document in the launch-script teardown notes.


What looks good in c46b2b5

  • delete-all-tables! docstring explains the Elle-history rationale clearly — future maintainers will understand why this isn't optional cleanup.
  • The assert message at line 225 includes the contract text ("positionally aligned with the requested items"), not just the check condition — good.
  • The assert message at line 389–391 interpolates both the actual value and the bound so the failure is immediately actionable from a CLI invocation.
  • Test count math: 3+1+1+1+5+4+2+3+2 = 22 assertions across 9 tests — matches the reported run output exactly.

… (claude[bot] design note on c46b2b5)

claude[bot]'s follow-up review of c46b2b5 listed two deferred
items:

* ConditionalCheckFailedException dead-code arm — mirrors the
  single-table workload, claude OK'd 'A future cleanup can
  remove both together.'  Still deferred.

* setup! doesn't wait for ACTIVE state — claude OK'd 'gating
  this on the end-to-end Jepsen run is correct' but flagged
  the assumption is unverified.

The second one is cheap to lock down with actual code
evidence rather than leaving it speculative.  Verified
against adapter/dynamodb.go:779-783 — elastickv's CreateTable
handler returns TableStatus=ACTIVE synchronously in the
response body, so the table is queryable as soon as
create-one-table! returns.  No DescribeTable poll needed for
the elastickv target.

Real DynamoDB's CreateTable is async; if this workload is
ever pointed at a real DynamoDB endpoint (unlikely — Jepsen
is wired against elastickv only), a poll loop must be added.
Documented at the setup! call site so a future reader doesn't
have to re-do the adapter inspection.

Doc-comment-only change.  lein test
elastickv.dynamodb-multi-table-workload-test still 9 tests /
22 assertions, 0 failures, 0 errors.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 3, 2026

You linked claude[bot]'s initial review of aded8b9 (the 6-finding pass). All 4 actionable findings from that review were addressed in c46b2b5 and confirmed by claude[bot]'s follow-up at 14:54:50 — both deferrals were explicitly OK'd:

  • ConditionalCheckFailedException dead-code arm — kept for parity with the single-table workload; claude said "Fine; a future cleanup can remove both together."
  • setup! doesn't poll for ACTIVE state — claude said "gating this on the end-to-end Jepsen run is correct."

The second deferral was the only one with an unverified assumption: claude[bot] flagged it would matter "if elastickv's CreateTable is async like real DynamoDB." Locked that down with actual code evidence in cf07ec0adapter/dynamodb.go:779-783 returns TableStatus=ACTIVE synchronously in the response body, so the table is queryable as soon as create-one-table! returns. Added a comment at the setup! call site so a future reader (or someone retargeting this workload at real DynamoDB) doesn't have to re-do the adapter inspection.

Doc-only; lein test still 9 / 22 passing.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 3, 2026

Claude finished @bootjp's task in 1m 8s —— View job


Code Review — third pass on cf07ec0

Tasks

  • Load tools
  • Read updated Clojure workload file
  • Read updated test file
  • Verify synchronous ACTIVE claim in adapter/dynamodb.go:779-783
  • Check git log for cf07ec0 diff
  • Confirm all prior findings addressed; post final verdict

Verdict: clean — all prior findings resolved, doc-only commit is correct, no new issues


cf07ec0 verification

The synchronous ACTIVE claim is correct and well-evidenced.

adapter/dynamodb.go:779-783 writes TableStatus=ACTIVE only after createTableWithRetry returns nil. Tracing that call (dynamodb.go:856-888): it calls d.coordinator.Dispatch(ctx, req) and only returns nil once Raft consensus completes and the create-table entry is applied to the FSM. There is no goroutine or background channel — the HTTP response is written synchronously after commit. The table is queryable the moment create-one-table! returns.

The added comment at setup! (lines 295–301) correctly documents:

  • why no poll loop is needed against elastickv
  • the exact code location confirming it (adapter/dynamodb.go:779-783)
  • the forward-compatibility note for real DynamoDB endpoints

No doc content is incorrect or misleading.


Status of all findings from prior rounds

Finding Status
Medium — stale table data / teardown! no-op Fixed in c46b2b5delete-all-tables! with idempotent guard
Low — TransactGetItems positional alignment unverified Fixed in c46b2b5 — assert with message at line 224
Low — key-count / num-tables coupling unguarded Fixed in c46b2b5 — construction-time assert + 5 tests
Trivial — deftest naming underscores Fixed in c46b2b5
Low — ConditionalCheckFailedException dead-code arm Deferred by design (parity with single-table workload)
Design note — setup! ACTIVE poll Locked down in cf07ec0 — synchronous by inspection

What looks good overall

The workload is sound. The routing math ((inc (mod k N)) / (str (quot k N))) produces unique (table, pk) pairs across all non-negative Elle integer keys. The OCC condition-expression pattern is faithfully carried over from the single-table workload. The test suite covers the M5a-specific invariants (deterministic routing + multi-group span + key-count divisibility), and the routing tests use var-get on private vars so a future routing change surfaces here rather than as a silent false negative in a Jepsen run. All comments that reference prior review rounds are accurate.

Ready to unmark draft once the launch script and setup-hook verification PRs land.

@bootjp bootjp marked this pull request as ready for review June 3, 2026 17:24
@bootjp bootjp merged commit 78fae66 into main Jun 3, 2026
9 checks passed
@bootjp bootjp deleted the feat/composed1-m5a-impl branch June 3, 2026 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant