backup: #35 - validate name/dir consistency in S3 + DDB reverse encoders #912
The S3 and DDB reverse encoders use two different identifiers for
the same logical entity:
- The on-disk subdirectory name (`s3/<bucketDir>/`,
`dynamodb/<tableDir>/`), which the decoder spells as
`EncodeSegment([]byte(name))`.
- The name embedded inside `_bucket.json` / `_schema.json`, which
the encoder uses to build MVCC keys
(`s3keys.BucketMetaKey(pub.Name)`,
`base64.RawURLEncoding.EncodeToString([]byte(tableName))`).
Until now, the encoder trusted both sides independently. A tampered
or hand-rebuilt dump where the operator renamed the directory
without touching the JSON (or vice versa) would silently emit
records keyed by the JSON name while pulling object/item bytes
from a mismatched filesystem path — a name/dir consistency
violation that splits a single restore artifact across two
different identities.
## Fix
`encodeBucket` (S3) and `encodeTable` (DDB) now check, immediately
after the empty-name / empty-key guards, that
`EncodeSegment([]byte(name)) == <dir>`. Mismatch fails closed with
the existing per-adapter sentinel (`ErrS3EncodeInvalidBucket` /
`ErrDDBEncodeInvalidSchema`), with an error message that names
both the JSON value and the encoded form so an operator can
correct the dump.
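A minimal sketch of the check described above, not the PR's actual code. `encodeSegment` is a hypothetical stand-in built on `url.PathEscape`, since the real `EncodeSegment` encoding is not shown here; `checkNameDir` and the sentinel's message text are likewise illustrative names.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// ErrS3EncodeInvalidBucket mirrors the sentinel named in the PR.
// The message text is an assumption.
var ErrS3EncodeInvalidBucket = errors.New("s3 encode: invalid bucket")

// encodeSegment is a HYPOTHETICAL stand-in for the project's EncodeSegment.
// The real encoding may differ; only the comparison pattern matters here.
func encodeSegment(b []byte) string {
	return url.PathEscape(string(b))
}

// checkNameDir fails closed when dir is not the encoded form of name.
// The error names the JSON value, its encoded form, and the directory.
func checkNameDir(name, dir string) error {
	if want := encodeSegment([]byte(name)); want != dir {
		return fmt.Errorf("%w: name %q encodes to %q, but directory is %q",
			ErrS3EncodeInvalidBucket, name, want, dir)
	}
	return nil
}

func main() {
	// Consistent dump: directory equals the encoded name.
	fmt.Println(checkNameDir("real", "real"))
	// Renamed directory: the check returns the wrapped sentinel.
	err := checkNameDir("real", "wrong-dir")
	fmt.Println(errors.Is(err, ErrS3EncodeInvalidBucket), err)
}
```

Wrapping the sentinel with `%w` keeps `errors.Is` working for callers while the message carries enough detail for an operator to repair the dump.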
## Pinned by
- `TestS3EncodeRejectsNameDirMismatch`: dir `wrong-dir` +
`_bucket.json` with `name=real` → `ErrS3EncodeInvalidBucket`.
- `TestDDBEncodeRejectsNameDirMismatch`: dir `wrong-dir` +
`_schema.json` with `table_name=real` → `ErrDDBEncodeInvalidSchema`.
All existing positive tests use `EncodeSegment([]byte(name))` for
the dir name, so they continue to pass. Existing rejection tests
trip earlier guards (empty name, format_version, schema-shape)
before the new consistency check fires, so they also remain green.
## Caller audit per CLAUDE.md semantic-change rule
- `encodeBucket` (S3): success path gains one new fail-closed
check between the empty-name guard and the meta-record emission.
Sole production caller is `S3RecordEncoder.Encode`.
- `encodeTable` (DDB): success path gains one new fail-closed
check between the empty-name guard and the schema-marshal step.
Sole production caller is `DynamoDBEncoder.Encode`.
Both new checks use the existing per-adapter sentinels, so the
encoder's existing `errors.Mark(ErrEncodeAdapterData)` wrap at
`runAdapterEncoders` still routes the failure through the CLI's
exit-2 (data-correctness) classification — no new sentinel needed
in `classifyEncodeError`.
Tests + lint green.
Code Review
This pull request introduces name/directory consistency checks for both DynamoDB and S3 encoders to ensure that the on-disk directory name matches the encoded table or bucket name, preventing silent consistency violations in tampered or hand-rebuilt dumps. Unit tests have been added to verify that the encoders fail closed on mismatches. All review comments have been filtered out, and I have no feedback to provide.
@claude review This is the implementation PR for follow-up task #35 (name/dir consistency check). Adds a fail-closed validation in both Tests pinned:
Gemini posted "no feedback" earlier; no other bot reviews yet.
Claude finished @bootjp's task in 2m 58s
Code Review
The implementation is correct and the security intent is sound. One pre-existing test is silently broken by the check ordering, and there's a minor redundancy in the error messages. Finding 1 —
Claude's #912 review caught that this test no longer exercises its target guard after the name/dir consistency check landed:
- Test used dir="tbl" with table_name="x"
- New name/dir check (line 130 of encodeTable) fires first since EncodeSegment("x")="x" != "tbl", returning ErrDDBEncodeInvalidSchema
- Empty-hash-key guard at line 138 is never reached
- Same sentinel means the test silently passed while pinning the WRONG behavior (a name/dir mismatch test, not the empty-hash-key test it documents)

Fix: align dir to EncodeSegment("x") so the name/dir check passes and the empty-hash-key guard at line 138 is actually exercised. Added a comment explaining the fixture's gate ordering so a future guard insertion can't silently break this test again.

No production code change: encoder gates are unchanged; only the test fixture is realigned.
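The masking effect described in this commit message can be reproduced with a small sketch. It is illustrative only: `validate` and `reachedHashKeyGuard` are hypothetical names, the guard messages are assumed, and `encodeSegment` is an identity stand-in (the source only confirms that `EncodeSegment("x")` is `"x"`).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Sentinel name follows the PR text; the message string is an assumption.
var ErrDDBEncodeInvalidSchema = errors.New("dynamodb encode: invalid schema")

// encodeSegment is a HYPOTHETICAL identity stand-in for EncodeSegment.
func encodeSegment(b []byte) string { return string(b) }

// validate runs the guards in the order described above:
// the name/dir check first, then the empty-hash-key guard.
// Both return the same sentinel, which is what hid the regression.
func validate(dir, tableName, hashKey string) error {
	if encodeSegment([]byte(tableName)) != dir {
		return fmt.Errorf("%w: name/dir mismatch", ErrDDBEncodeInvalidSchema)
	}
	if hashKey == "" {
		return fmt.Errorf("%w: empty hash key", ErrDDBEncodeInvalidSchema)
	}
	return nil
}

// reachedHashKeyGuard reports whether err came from the empty-hash-key
// guard rather than from the earlier name/dir check.
func reachedHashKeyGuard(err error) bool {
	return errors.Is(err, ErrDDBEncodeInvalidSchema) &&
		strings.Contains(err.Error(), "empty hash key")
}

func main() {
	stale := validate("tbl", "x", "") // old fixture: dir does not match table_name
	fixed := validate("x", "x", "")   // realigned fixture
	// Both errors match the sentinel, but only the realigned fixture
	// actually reaches the guard the test is named after.
	fmt.Println(reachedHashKeyGuard(stale), reachedHashKeyGuard(fixed)) // false true
}
```

An assertion on the sentinel alone passes for both fixtures, which is why the stale fixture kept the test green while exercising the wrong guard.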
@claude review

Fixes the claude review finding on the prior revision:

Finding 1 (TestDDBEncodeRejectsEmptyHashKey silent regression): The test used

Finding 2 (cosmetic redundancy): Left as-is per claude's note that it's cosmetic-only.

No production code change: encoder gates are unchanged; only the test fixture is realigned.
Claude finished @bootjp's task in 1m 38s
Code Review
Finding 1 (TestDDBEncodeRejectsEmptyHashKey) — Fixed correctly. The revised fixture at Remaining All four tests that still use a mismatched
Each test is exercising exactly the guard it names.

S3 tests: unaffected and clean.

The fix is minimal, correct, and addresses the regression exactly as described. No further issues found. LGTM.
## Summary
- S3 (`encodeBucket`) and DDB (`encodeTable`) now check that the on-disk subdirectory matches `EncodeSegment([]byte(name))` from `_bucket.json` / `_schema.json` before emitting any records.

## Background
The reverse encoders use two identifiers for the same logical entity:
- `s3/<bucketDir>/`, `dynamodb/<tableDir>/`: decoder spells as `EncodeSegment([]byte(name))`.
- `_bucket.json` / `_schema.json`: encoder uses for MVCC keys (`s3keys.BucketMetaKey(pub.Name)`, base64'd `tableName`).

Until now, both sides were trusted independently. A tampered or hand-rebuilt dump that renamed the directory without touching the JSON (or vice versa) would silently emit records keyed by the JSON name while pulling object / item bytes from a mismatched filesystem path, a name/dir consistency violation that splits a single restore artifact across two identities.

## Fix
After the existing empty-name guard, both encoders verify `EncodeSegment([]byte(name)) == <dir>`. Mismatch fails closed with the existing per-adapter sentinel (`ErrS3EncodeInvalidBucket` / `ErrDDBEncodeInvalidSchema`), and the error message names both the JSON value and the encoded form so the operator can correct the dump.

## Caller audit (CLAUDE.md semantic-change rule)
- `encodeBucket` (S3): success path gains one new fail-closed check; sole production caller is `S3RecordEncoder.Encode`.
- `encodeTable` (DDB): success path gains one new fail-closed check; sole production caller is `DynamoDBEncoder.Encode`.
- `runAdapterEncoders`' `errors.Mark(ErrEncodeAdapterData)` wrap still routes failures through the CLI's exit-2 classification; no new sentinel needed in `classifyEncodeError`.

## Self-review of code changes (5 lenses)
- `EncodeSegment` call and string compare per bucket / per table. Negligible.
- `<dir> == EncodeSegment(name)` for every published record.
- `EncodeSegment` for the dir spelling, so they remain green; existing rejection tests trip earlier guards, also green.

## Test plan
- `go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/...`
- `golangci-lint --config=.golangci.yaml run`