docs: M5-3 proposed - SQS partitioned-FIFO reverse encoder#914
Conversation
Phase 0b M5-3 design doc. Lifts the ErrSQSEncodeUnsupportedPartitioned gate that M5-1 (#849) and M5-2 (#892) explicitly deferred. Adds: - sqsMessageRecord.Partition field + decoder plumbing. - Partitioned key-family emission (data/vis/byage/dedup) when meta.PartitionCount > 1. - Out-of-range and classic-vs-partitioned mismatch fail-closed guards. Group lock rows remain unemitted (carry-over from M5-2 — emitting any row falsely blocks the group). CLAUDE.md design-doc-first workflow: this doc lands before the implementation PR.
|
Warning Review limit reached
More reviews will be available in 30 minutes and 3 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 744ecbb5db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | `data` | `!sqs\|msg\|data\|<queue-seg><gen-BE><msgID-seg>` | `!sqs\|msg\|data\|p\|<queue-seg>\|<part-BE><gen-BE><msgID-seg>` | | ||
| | `vis` | `!sqs\|msg\|vis\|<queue-seg><gen-BE><visibleAt-BE><msgID-seg>` | `!sqs\|msg\|vis\|p\|<queue-seg>\|<part-BE><gen-BE><visibleAt-BE><msgID-seg>` | | ||
| | `byage` | `!sqs\|msg\|byage\|<queue-seg><gen-BE><sendTs-BE><msgID-seg>` | `!sqs\|msg\|byage\|p\|<queue-seg>\|<part-BE><gen-BE><sendTs-BE><msgID-seg>` | | ||
| | `dedup` | `!sqs\|msg\|dedup\|<queue-seg><gen-BE><dedupID-seg>` | `!sqs\|msg\|dedup\|p\|<queue-seg>\|<part-BE><gen-BE><group-seg><dedupID-seg>` | |
There was a problem hiding this comment.
Add the missing group/dedup delimiter
For partitioned FIFO dedup keys, this shape omits the literal terminator between the encoded group ID and dedup ID. The live constructor appends sqsPartitionedQueueTerminator after encodeSQSSegment(groupID) and before encodeSQSSegment(dedupID) in adapter/sqs_keys.go, with comments explaining that back-to-back raw-base64 segments are ambiguous and can make distinct (groupID, dedupID) pairs collapse onto the same key. If the M5-3 implementation follows this table, restored partitioned queues can reintroduce false duplicate suppression across groups; the documented shape should include the separator.
Useful? React with 👍 / 👎.
|
|
||
| ## Decoder lift (M5-1 follow-up) | ||
|
|
||
| The decoder's `decodeSQSMessageValue` (`internal/backup/sqs.go:719`) already runs after the partition trailer has been parsed by `sqsParsePartitionedMsgKey` (`sqs.go:600`). M5-3 plumbs the partition number through `sqsMessageRecord.Partition` and writes it to `messages.jsonl`. |
There was a problem hiding this comment.
Specify returning the partition from key parsing
This describes plumbing the partition after parsing, but the current decoder path only validates the partitioned trailer and returns the encoded queue segment: parseSQSMessageDataKey calls parseSQSPartitionedQueueAndTrailer, whose return type is just (string, error). Since the live message value decoded by decodeSQSMessageValue does not contain the partition, an implementation that only updates sqsMessageRecord/decodeSQSMessageValue as written here still has no source for the partition and will emit all new dumps with the zero default. The M5-3 slice should explicitly require changing the data-key parser/call site to return the parsed uint32 partition from the key.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request proposes a design document for the SQS partitioned-FIFO reverse encoder (Phase 0b M5-3) to support queues with multiple partitions. The feedback highlights critical improvements to the design: using a pointer *uint32 for the Partition field to prevent silent corruption and ensure JSON consistency, including sequence_number in the sorting criteria to guarantee deterministic re-encodes, and adding a fail-closed validation check when partition information is missing in partitioned queues.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // sqsMessageRecord adds (in M5-3): | ||
| Partition uint32 `json:"partition,omitempty"` | ||
| ``` | ||
|
|
||
| `omitempty` is load-bearing — every classic-queue dump produced before M5-3 lands has no `partition` field, and the encoder MUST default to `partition=0` (the only valid value for `partition_count == 1`). New partitioned-queue dumps populate `partition` from the live key's partition trailer. |
There was a problem hiding this comment.
Using a plain uint32 with omitempty means that for partitioned queues (PartitionCount > 1), any message belonging to partition 0 will have its partition field omitted from the JSON. This has two major drawbacks:
- Inconsistent JSON representation: Some messages in a partitioned queue dump will have
"partition": Nwhile others will completely lack the field, making the schema inconsistent and harder for external tools to parse. - Silent Corruption Risk: If an operator attempts to restore a legacy classic dump (which has no
partitionfields) into a partitioned queue (e.g., by manually changingPartitionCountin_queue.json), the encoder will silently default all messages toPartition = 0. This leads to silent routing/FIFO consistency corruption because the messages are not distributed across partitions according to theirMessageGroupIDhash.
Recommendation:
Use a pointer *uint32 for the Partition field:
// sqsMessageRecord adds (in M5-3):
Partition *uint32 `json:"partition,omitempty"`- For classic queues,
Partitioncan be left asnil, so it is completely omitted, preserving backward compatibility. - For partitioned queues,
Partitionis populated as a pointer to the partition number (even if it is0). This ensures that"partition": 0is explicitly written, making the JSON consistent and allowing the encoder to detect and reject legacy dumps being restored to partitioned queues without repartitioning.
|
|
||
| 1. **Drop `ErrSQSEncodeUnsupportedPartitioned`.** Remove the `meta.PartitionCount > 1` gate at line 162. | ||
| 2. **Branch on `PartitionCount`.** When `> 1`, use partitioned key constructors (duplicated from `adapter/sqs_keys.go` following the established M3b-3 GSI pattern). When `<= 1`, classic constructors as today. | ||
| 3. **Group-by-partition before emit.** Sort messages by `(partition, send_timestamp_millis, message_id)` so per-partition order is stable across runs — required for byte-identical re-encodes. |
There was a problem hiding this comment.
The sorting criteria for partitioned FIFO queues should include sequence_number to ensure strict stability and correctness, matching the classic FIFO sorting logic in sortMessagesForEmit.
Without sequence_number, messages with the same send_timestamp_millis within the same partition might be reordered across runs, breaking the deterministic byte-identical re-encode guarantee.
Recommendation:
Update the sorting criteria to:
Sort messages by (partition, send_timestamp_millis, sequence_number, message_id)
|
|
||
| The encoder fails closed with the existing per-adapter sentinels on: | ||
|
|
||
| - `meta.PartitionCount > 1` AND any message has `Partition == 0` AND the dump's record count for partition 0 doesn't match the live partition assignment. (Detectable only if the encoder can recompute the partition; deferred to a self-test invariant rather than a runtime check.) |
There was a problem hiding this comment.
If we adopt the *uint32 pointer representation for the Partition field, we can implement a robust, simple runtime validation check instead of deferring it to a self-test invariant.
Specifically, if meta.PartitionCount > 1 and any message has Partition == nil (indicating a legacy dump without partition info), the encoder should immediately fail closed. This prevents accidental/silent corruption when restoring a classic dump into a partitioned queue.
Recommendation:
Update this invariant to:
meta.PartitionCount > 1AND any message hasPartition == nil-> fail closed with a new sentinel (e.g.,ErrSQSEncodeMissingPartition).
Two doc fixes from the first-round review: 1. Codex P2 (L16): the partitioned-dedup key shape was missing the sqsPartitionedQueueTerminator between <group-seg> and <dedupID-seg>. The live sqsPartitionedMsgDedupKey (adapter/sqs_keys.go:389) emits this delimiter because base64.RawURLEncoding has no padding, so back-to-back raw-base64 segments would let distinct (group,dedup) pairs collapse onto the same key. Add the delimiter to the table + a notation paragraph explaining why every partitioned key carries the terminator after <queue-seg> AND why dedup additionally carries one between group/dedup. 2. Codex P2 (L38): v1 said the partition is 'plumbed through sqsMessageRecord.Partition' but parseSQSMessageDataKey / parseSQSPartitionedQueueAndTrailer currently return only the encoded queue (with the partition u32 parsed-then-discarded inside parseSQSPartitionedQueueAndTrailer). v2 explicitly requires extending both signatures to return the partition, and clarifies the message value itself never carried partition — the key is the source of truth.
|
@claude review v2 addresses both codex P2 findings from the v1 review: Codex P2 (L16) — partitioned-dedup key was missing the sqsPartitionedQueueTerminator between and . The live sqsPartitionedMsgDedupKey (adapter/sqs_keys.go:389) emits this delimiter because base64.RawURLEncoding has no padding and back-to-back segments would let distinct (group,dedup) pairs collapse onto the same key. Table now shows the delimiter; notation paragraph explains why every partitioned key carries a terminator after AND why dedup additionally carries one between group/dedup. Codex P2 (L38) — v1 was under-specified on partition plumbing. parseSQSMessageDataKey / parseSQSPartitionedQueueAndTrailer currently return only the encoded queue; partition uint32 is parsed-then-discarded inside the partitioned trailer parser. v2 explicitly requires extending both signatures to return the partition; clarifies the message value itself never carried partition — the key is the source of truth. |
|
Claude finished @bootjp's task in 3m 55s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v2)
Codex P2 fixes — both verified ✅P2 (L16) — terminator in dedup key. The v2 table (line 16) now shows P2 (L38) — partition plumbing. The current code at Remaining open items from Gemini — my take1. With The specific scenario Gemini flags: an operator manually edits Adopting The tradeoff is pointer-dereference noise in the encoder loop. That's real but minor. Recommend the implementation PR adopt 2. The doc specifies Recommend: 3. Fail-closed for nil partition (Gemini L86) — follows from #1. If New finding:
|
| Item | Severity | Recommendation |
|---|---|---|
*uint32 for Partition field |
Medium — silent data corruption risk | Adopt in impl PR; add ErrSQSEncodeMissingPartition |
sequence_number in sort key |
Low — byte-identical re-encode correctness | Add as tiebreaker |
| Fail-closed for nil partition | Medium — follows from *uint32 decision |
Implement if *uint32 is adopted |
parseSQSGenericKey not mentioned |
Low — mechanical update to a call site | Add to "Files to modify" |
decodeSQSMessageValue partition wiring |
Low — unclear abstraction level | Specify in doc |
The design doc can merge as-is with these noted as items for the implementation PR, or the doc can be updated to resolve the *uint32 decision and the sequence_number sort key before merge so the implementation PR has a fully-specified contract. Either is workable; I lean toward resolving them in the doc before the impl PR starts, since the *uint32 decision affects the fail-closed invariants, the JSON schema, and the validation test inputs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48bcab7512
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| ```go | ||
| // sqsMessageRecord adds (in M5-3): | ||
| Partition uint32 `json:"partition,omitempty"` |
There was a problem hiding this comment.
Require partition presence for partitioned dumps
Because partition 0 is a valid partition, omitempty makes a partition-0 message indistinguishable from a message whose partition was never captured. The current decoder already recognizes partitioned data keys but discards the parsed partition, so an existing format_version=1 dump of a partitioned queue can contain no partition fields; after this design lifts the encoder gate and defaults missing values to 0, replaying that dump would silently move every message from partitions 1..N-1 into partition 0. The implementation needs a way to preserve presence (for example no omitempty on partitioned dumps or a pointer/presence bit) and fail closed when partition_count > 1 but the field is absent.
Useful? React with 👍 / 👎.
…x P1 + claude review) Three findings from the v2 review: 1. Codex P1 (L27) + Gemini #1: uint32+omitempty made partition-0 indistinguishable from absent. Replace with *uint32: nil = pre-M5-3 dump or never-captured, non-nil = explicitly captured (incl. 0). New ErrSQSEncodeMissingPartition fail-closed sentinel when partition_count>1 AND rec.Partition==nil. Decision matrix table added to make all 6 (partition_count × Partition shape) cases explicit. 2. Gemini #2 (sort): include sequence_number in the per-partition sort key. Now (partition, send_timestamp_millis, sequence_number, message_id) — matches the classic path's tiebreakers. 3. Claude review (call-site plumbing): explicit notes that HandleMessageData wires rec.Partition = &partition only when isPartitioned; decodeSQSMessageValue keeps its current (value)->(*record, error) signature; parseSQSGenericKey wrapper updated mechanically to discard the new partition return. New test pinned: TestSQSEncodeRejectsMissingPartitionOnPartitionedQueue.
|
@claude review v3 addresses codex P1 + 3 gemini mediums + 2 claude findings:
|
|
Claude finished @bootjp's task in 3m 25s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v3)
All 5 claimed v3 improvements verified ✅1. Codex P1 / Gemini #1 — 2. Gemini #2 — sort key. 3. Gemini #3 — 4. Claude (HandleMessageData) — call-site wiring. 5. Claude (parseSQSGenericKey) — wrapper mechanical update. Minor inaccuracies — fix before or alongside the implementation PR1. "Classic path uses the same four-field sort" (line 93) — classic is 3-field. The live case a.SendTimestampMillis != b.SendTimestampMillis: ...
case a.SequenceNumber != b.SequenceNumber: ...
default: a.MessageID < b.MessageIDThat's 3 fields, not 4. The partitioned path is the 4-field version (adds 2. Line 75 says: " The live signature ( func decodeSQSMessageValue(value []byte) (sqsMessageRecord, error)It returns a value type, not a pointer. The design doc's description is wrong; either a typo or a copy of the proposed M5-3 change. Clarify whether M5-3 keeps the value return or switches to a pointer return — the current doc implies the latter while claiming "keeps its current signature." Low-severity implementation gaps — note for the impl PR3.
4.
Editorial
SummaryAll 5 v3 improvements are correctly implemented in the doc — the
|
…cher (claude v914 v3) Four findings from the v3 claude review, all editorial / mechanical: 1. Minor #1: "classic path four-field sort" wording at L93 was wrong. The live sortMessagesForEmit (sqs.go:815) is a 3-field tuple (send_timestamp_millis, sequence_number, message_id). Partitioned path prepends partition as the leading key. Reworded. 2. Minor #2: decodeSQSMessageValue return type at L75 — live sig is (value []byte) (sqsMessageRecord, error), a value not a pointer. Doc previously said *sqsMessageRecord — corrected. 3. Low #3: encodeMessageSideRecords partition param plumbing now spelled out explicitly in the §"Encoder lift" side-record section. Branches on partition != nil rather than re-reading meta.PartitionCount. 4. Low #4: addMessage / partitioned dispatch named. v4 threads partition *uint32 through addMessage (rather than spawning a peer addPartitionedMessage) so addMessage + encodeMessageSideRecords share the same dispatch source. 5. Editorial: Files-to-modify comment for sqs.go now lists all four touched functions (sqsMessageRecord +Partition, parseSQSMessageDataKey + parseSQSPartitionedQueueAndTrailer new returns, HandleMessageData wiring, parseSQSGenericKey wrapper discard).
|
@claude review v4 addresses all four v3 findings:
|
|
Claude finished @bootjp's task in 2m 39s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v4)
All 5 v4 improvements verified ✅1. Classic sort field count (line 93). case a.SendTimestampMillis != b.SendTimestampMillis: ...
case a.SequenceNumber != b.SequenceNumber: ...
default: a.MessageID < b.MessageIDDoc now correctly says "3-field, partition prepended" — not the erroneous "four-field sort" from v3. ✅ 2. 3. 4. 5. Editorial: One remaining naming error — correct before or during the implementation PR
The doc refers to the per-message side-record dispatcher as
The actual live function is if err := e.addSideRecords(b, meta.Name, &meta, &records[i]); err != nil {This is the only concrete naming error remaining. An implementation PR author searching for Everything else looks correct
SummaryAll 5 v3 findings are cleanly resolved in v4. The doc is ready to merge with one correction:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 810634d596
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| The full decision matrix lives in the table under §"Dump-format change." The encoder fails closed with these sentinels: | ||
|
|
||
| - `meta.PartitionCount > 1` AND `rec.Partition == nil` (pre-M5-3 dump under lifted gate, or M5-3 decoder bug) → **new sentinel `ErrSQSEncodeMissingPartition`**. The operator must re-decode with an M5-3 decoder; replaying a legacy dump into a partitioned queue would silently move every message to partition 0 (codex P1 / gemini #914 v2). | ||
| - `meta.PartitionCount > 1` AND `*rec.Partition >= meta.PartitionCount` → **new sentinel `ErrSQSEncodeOutOfRangePartition`** (out-of-range partition number, dump is malformed). |
There was a problem hiding this comment.
Reject nonzero partitions for perQueue HT-FIFO
For valid HT-FIFO queues with partition_count > 1 but fifo_throughput_limit == "perQueue", the live router forces every group to partition 0 (adapter/sqs_partitioning.go:71-72) and ReceiveMessage only scans one effective partition (adapter/sqs_keys_dispatch.go:125-126). This invariant only checks *rec.Partition < meta.PartitionCount, so a dump line with partition 1..N-1 would be accepted and restored under |p|<nonzero>, where it will not be found by the live receive fanout. Please fail closed against effectivePartitionCount/perQueue semantics, or explicitly require partition 0 in that mode.
Useful? React with 👍 / 👎.
… + claude v914 v4) Two findings from the v4 review: 1. Codex P2 v914 v4 (L110): For FIFO partitioned queues with FifoThroughputLimit == "perQueue", the live partitionFor at adapter/sqs_partitioning.go:71-72 forces every group to partition 0 regardless of PartitionCount, and ReceiveMessage only scans the partition-0 lane (adapter/sqs_keys_dispatch.go:125-126). The v4 invariants only checked *rec.Partition < PartitionCount, so a dump line with partition 1..N-1 would be accepted and restored onto |p|<nonzero> lanes the live receive never visits — silent data loss on first read. v5 adds new sentinel ErrSQSEncodePartitionRoutingMismatch and the TestSQSEncodeRejectsNonzeroPartitionOnPerQueueHTFIFO regression. Also documents the effectivePartitionCount(meta) helper concept so the per-gate inline checks stay one-line clean. 2. Claude v914 v4 (Low-Medium): The per-message side-record dispatcher is named addSideRecords (encode_sqs_side.go:133), called from encode_sqs.go:214 as e.addSideRecords(b, meta.Name, &meta, &records[i]). v4 used encodeMessageSideRecords in three places — that function does not exist. Renamed throughout the doc. The perQueue gate is semantic-changing (new fail-closed branch). Callers of the new helper effectivePartitionCount: only addMessage and addSideRecords in M5-3's encoder. The validation site itself is the encoder's per-message loop, which is the only place that has both meta and the candidate rec.Partition.
|
@claude review v5 addresses both v4 findings:
|
|
Claude finished @bootjp's task in 3m 20s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v5)
Both v5 improvements verified ✅1. Codex P2 (perQueue HT-FIFO routing). if meta.FifoThroughputLimit == htfifoThroughputPerQueue {
return 0
}Every group is forced to partition 0 for 2. New finding:
|
| Item | Severity | Recommendation |
|---|---|---|
| perQueue routing mismatch gate | ✅ Fixed in v5 | |
addSideRecords name |
✅ Fixed in v5 | |
effectivePartitionCount naming collision with adapter |
Low–Medium | Note in doc or impl PR: must define a new copy in internal/backup/, cannot import from adapter; mirror "perQueue" constant too |
| Key dispatch vs. effective-count separation | Low | One-liner clarification in §"Encoder lift" |
The doc can land as-is with these two items tracked in the implementation PR, or the naming-collision note can be added to §"Encoder lift" before merge.
…distinction (claude v914 v5) Two impl-PR notes from claude's v5 review, both low-severity but worth pinning in the doc so the impl PR doesn't trip on them: 1. effectivePartitionCount naming collision: the adapter already has an identically-named function operating on *adapter.sqsQueueMeta (sqs_keys_dispatch.go, sqs_admin_peek.go callers). The backup package cannot import it (M3b-3 circular-dep pattern, same as sqsFifoDedupWindowMillis mirrored at encode_sqs_side.go:15). v6 spells out: define a NEW copy in internal/backup/ operating on sqsQueueMetaPublic, and mirror the htfifoThroughputPerQueue constant (= "perQueue") locally. 2. Key-shape vs validation distinction: 'Branch on PartitionCount' in Encoder lift uses raw meta.PartitionCount > 1 for key shape, NOT effectivePartitionCount. perQueue queues with PartitionCount=2 still use partitioned-shape keys (|p| prefix, partition=0) because that's what the live adapter writes. effectivePartitionCount controls only validation gates and ReceiveMessage fan-out. v6 adds an explicit note so an impl author doesn't conflate them and accidentally restore perQueue dumps under classic keys.
|
@claude review v6 addresses both v5 impl-PR notes:
|
|
Claude finished @bootjp's task in 1m 58s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v6)
Both v6 improvements verified ✅1.
2. Key-shape vs. effective-count distinction (Claude v5 finding). One remaining minor issue — wrong file path for
|
| Item | Severity | Status |
|---|---|---|
effectivePartitionCount naming collision + new copy note |
✅ Fixed in v6 | |
| Key-shape vs. effective-count distinction | ✅ Fixed in v6 | |
sqsQueueMetaPublic file reference (encode_sqs.go:71-87 → sqs.go:148-162) |
Minor — wrong file path | ❌ Remaining |
claude v6 caught that v6's reference to sqsQueueMetaPublic at encode_sqs.go:71-87 was wrong: - sqsQueueMetaPublic actually lives at internal/backup/sqs.go:148 - encode_sqs.go:66 defines a DIFFERENT struct, sqsStoredQueueMeta, used for the !sqs|queue|meta| record value (live-adapter shape), not the dump-side public projection. An impl PR author searching encode_sqs.go:71-87 would have found the wrong struct. v7 corrects the file path and adds a one-line note distinguishing the two structs.
|
@claude review v7 fixes the file path:
|
|
Claude finished @bootjp's task in 1m 56s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v7)
v7 fix verified ✅
v7 doc (line 116) now correctly reads: "The new copy operates on Cross-checked against source:
The distinction (dump-side projection vs. live-adapter JSON shape for All other previously-verified items remain correct
No new issues foundAll line number references verified against the live codebase:
SummaryThe v7 fix is correct and complete. The design doc is ready to merge — all 14 findings from 7 review rounds are resolved, every line number reference checks out against the live source, and the mechanical implementation notes are precise enough that the implementation PR has a fully-specified contract. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c7782fc0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - `meta.PartitionCount == 1` (classic) AND `rec.Partition != nil && *rec.Partition != 0` → reuses **`ErrSQSInvalidMessage`** (dump is internally inconsistent). | ||
| - `meta.PartitionCount > 1` AND `meta.FifoThroughputLimit == "perQueue"` AND `*rec.Partition != 0` → **new sentinel `ErrSQSEncodePartitionRoutingMismatch`**. The live router (`adapter/sqs_partitioning.go:71-72` in `partitionFor`) forces every group to partition 0 whenever `FifoThroughputLimit == "perQueue"`, regardless of `PartitionCount`; ReceiveMessage only scans the partition-0 lane (`adapter/sqs_keys_dispatch.go:125-126`). Accepting `*rec.Partition >= 1` for a `perQueue` queue would restore messages onto `|p|1|...` lanes the live receive fan-out never visits — silent data loss on first read. Codex P2 v914 v4 caught this gap. Pinned by `TestSQSEncodeRejectsNonzeroPartitionOnPerQueueHTFIFO`. | ||
|
|
||
| The effective partition count is `1` whenever `FifoThroughputLimit == "perQueue"`, regardless of the declared `PartitionCount`. The first two gates above only fire when the effective count is `> 1` (i.e. partitioned by message group, not collapsed by per-queue throughput limiting). An impl-side helper `effectivePartitionCount(meta) uint32` is therefore preferable to inlining the perQueue check at every gate. |
There was a problem hiding this comment.
Do not collapse missing-partition checks for perQueue
For PartitionCount > 1 queues with FifoThroughputLimit == "perQueue", this note exempts the missing/out-of-range gates by switching them to effectivePartitionCount, but the live key dispatch still selects the partitioned keyspace solely from meta.PartitionCount > 1 (adapter/sqs_keys_dispatch.go) and this doc also says key construction must use raw PartitionCount. If a perQueue dump lacks rec.Partition, the proposed partition != nil dispatch can fall back to classic keys (or otherwise lacks the required partition-0 value), making restored rows invisible to the partitioned readers. Please keep the presence/range checks based on meta.PartitionCount > 1, with only the nonzero routing mismatch using the perQueue collapse.
Useful? React with 👍 / 👎.
…odex P2 v914 v7) v6 introduced a subtle data-loss bug by saying the missing-partition and out-of-range gates should use effectivePartitionCount. Codex P2 v914 v7 caught it: For perQueue queues with PartitionCount=2: - effectivePartitionCount collapses to 1 - v6 said: missing-partition gate doesn't fire (since effective=1) - So a perQueue dump with rec.Partition == nil would slip past validation - addMessage branches on partition != nil; nil -> classic sqsMsgDataKeyBytes - But the LIVE key dispatch selects the partitioned keyspace from raw meta.PartitionCount > 1 (sqs_keys_dispatch.go) - Result: restored messages get classic keys for a queue whose readers scan only the partitioned keyspace -> silent data loss v8 reverts: ALL four gates use raw meta.PartitionCount > 1 as the partitioned-queue predicate. effectivePartitionCount stays useful for diagnostics and the ReceiveMessage scan fan-out, but MUST NOT appear in any encoder-side validation gate predicate. Updated the 'Key-shape vs validation distinction' section to unify both choices (key shape AND validation) on raw PartitionCount. Caller audit: this is a semantic change in the gate predicate from 'effective' back to 'raw'. The only callers of these predicates are the encoder's per-message loop (single site), which is what we're fixing. The effectivePartitionCount helper itself is unchanged — it still mirrors adapter/sqs_keys_dispatch.go's helper for the ReceiveMessage fan-out cross-check.
|
@claude review v8 addresses codex P2 v7 — a real data-loss bug in my v6 The bug v6 introduced: I said the missing-partition and out-of-range gates should use
v8 fix: All four validation gates use raw Caller audit: semantic-changing fix to the predicate. Only call site is the encoder's per-message loop in |
|
Claude finished @bootjp's task in 2m 6s —— View job Review: M5-3 SQS partitioned-FIFO reverse encoder (v8)
Codex P2 v7 fix — verified ✅The v6 bug was real. For a
v8 correctly fixes this. All four gates (lines 109-112) now use raw
Line 114 explicitly states the fix and the rationale, calling out the data-loss failure mode of v6. ✅ Gate ordering is correct for the perQueue+nil caseFor a perQueue queue with
For a perQueue queue with
For a perQueue queue with
The gate ordering and coverage is complete. Key-shape / effective-count separation — verified ✅Lines 114–118 now explicitly document:
This is the correct mental model and matches the live dispatch helpers. ✅ All previously verified items remain correct
No new issues foundThe design doc is ready to merge. The v8 fix is correct, complete, and well-explained. All 15 findings from 8 review rounds are resolved, gate ordering is correct for all six |
Summary
ErrSQSEncodeUnsupportedPartitioned, plumb apartitionfield throughsqsMessageRecord, and emit the partitioneddata/vis/byage/dedupkey family for queues withpartition_count > 1.Decision gates documented
messages.jsonlper queue with"partition": <N>per line (Option A, recommended) vs per-partition file layout (Option B).omitemptyonPartitioncovers pre-M5-3 dumps; format_version stays at 1.Fail-closed invariants added
meta.PartitionCount > 1ANDmessage.Partition >= PartitionCount→ newErrSQSEncodeOutOfRangePartition.meta.PartitionCount == 1ANDmessage.Partition != 0→ErrSQSInvalidMessage(reused).Out of scope (documented)
PartitionCountmismatch between dump and target — needs a future--repartitionflag).vis=0rule as M5-2).Test plan (planned for the impl PR)
TestSQSEncodePartitionedQueueRoundTrip|p|prefixTestSQSEncodePartitionedDedupBuildsGroupSegment<group-seg>matchesmessage_group_idTestSQSEncodeRejectsOutOfRangePartitionPartition >= PartitionCount→ErrSQSEncodeOutOfRangePartitionTestSQSEncodeRejectsNonZeroPartitionOnClassicQueuePartitionCount=1+Partition=2→ErrSQSInvalidMessageTestSQSEncodeLegacyDumpsWithoutPartitionStillRoundTrippartitionfield) round-trips unchangedTestSQSEncodePartitionedSideRecordsByteCrossCheckLiveAdaptersqsPartitionedMsg{...}Key(...)Self-review (5 lenses)
messages.jsonl. Group-by-partition sort before emit ensures byte-identical re-encodes.Test plan