storage: discard checksum corrupt chunks on startup #11886

Open
edsiper wants to merge 3 commits into master from storage-corrupt-chunks-11847

@edsiper edsiper commented May 29, 2026

Fixes #11847.

Upgrade to chunkio v1.5.5 and fix interfaces.

When filesystem storage finds a chunk with a bad checksum during startup scan/backlog loading, treat it as irrecoverable when storage.delete_irrecoverable_chunks is enabled.

This updates the storage backlog path to discard checksum-corrupt chunks, matching ChunkIO's irrecoverable-chunk handling, and adds an in_tail integration test covering startup cleanup.
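To make "bad checksum" concrete, here is a minimal sketch of a CRC-32 content check. It is not ChunkIO's implementation: the function names `crc32_ieee` and `chunk_checksum_ok` are hypothetical, and the sketch assumes the standard IEEE CRC-32 and a checksum stored separately from the content, without modeling the real chunk file layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Illustrative only: ChunkIO ships its own CRC-32 routine.
 */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            }
            else {
                crc >>= 1;
            }
        }
    }

    return crc ^ 0xFFFFFFFFu;
}

/*
 * Hypothetical helper: returns true when the checksum recomputed over
 * the content matches the value that was stored alongside it.
 */
static bool chunk_checksum_ok(const unsigned char *content, size_t len,
                              uint32_t stored_crc)
{
    return crc32_ieee(content, len) == stored_crc;
}
```

A mismatch here corresponds to the condition that the startup scan now treats as irrecoverable when storage.delete_irrecoverable_chunks is enabled.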


Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

edsiper added 3 commits May 29, 2026 16:04
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

coderabbitai Bot commented May 29, 2026

Walkthrough

The PR extends storage chunk error handling to treat checksum corruption (CIO_ERR_BAD_CHECKSUM) as an irrecoverable error in the scan and backlog segregation layers, enabling automatic cleanup of corrupted chunks during startup when deletion is enabled. An integration test validates the behavior.

Changes

Checksum Error Irrecoverability

| Layer / File(s) | Summary |
|---|---|
| Treat CIO_ERR_BAD_CHECKSUM as irrecoverable error: lib/chunkio/src/cio_scan.c, plugins/in_storage_backlog/sb.c | Both the filesystem scan and backlog segregation paths now include CIO_ERR_BAD_CHECKSUM alongside CIO_ERR_BAD_FILE_SIZE and CIO_ERR_BAD_LAYOUT in their irrecoverable error checks. Chunks matching these conditions are deleted when the respective delete-on-error flags are enabled. |
| Integration test for checksum cleanup on startup: tests/integration/scenarios/in_tail/config/tail_storage_corrupt_chunk.yaml, tests/integration/scenarios/in_tail/tests/test_in_tail_001.py | Adds a StorageFailureService test harness with log polling, a YAML configuration for the tail storage scenario with checksum validation enabled, and a test that deliberately corrupts a chunk, verifies the startup log emits an "invalid crc32" message, and confirms the corrupted chunk file is deleted from disk. |
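The irrecoverable-error check described in this summary can be sketched roughly as follows. This is a sketch under stated assumptions, not the code from cio_scan.c or sb.c: the numeric values of the CIO_ERR_* constants are placeholders (the real ones come from ChunkIO's headers), and `SKETCH_ERR_OTHER`, `is_irrecoverable`, and `should_delete_chunk` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/*
 * Placeholder values for illustration only; the real constants are
 * defined by ChunkIO. SKETCH_ERR_OTHER is a hypothetical stand-in
 * for any error that is not considered irrecoverable.
 */
enum {
    CIO_ERR_BAD_CHECKSUM  = -10,
    CIO_ERR_BAD_LAYOUT    = -11,
    CIO_ERR_BAD_FILE_SIZE = -13,
    SKETCH_ERR_OTHER      = -99
};

/* Hypothetical helper: true for errors that cannot be repaired. */
static bool is_irrecoverable(int err)
{
    switch (err) {
    case CIO_ERR_BAD_CHECKSUM:   /* newly included by this PR */
    case CIO_ERR_BAD_FILE_SIZE:
    case CIO_ERR_BAD_LAYOUT:
        return true;
    default:
        return false;
    }
}

/*
 * Hypothetical helper: a chunk is deleted only when the error is
 * irrecoverable AND the delete-on-error flag is enabled.
 */
static bool should_delete_chunk(int err, bool delete_irrecoverable)
{
    return delete_irrecoverable && is_irrecoverable(err);
}
```

Gating deletion on the flag keeps the default behavior conservative: with the flag off, a corrupt chunk is still reported but stays on disk.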

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • fluent/fluent-bit#11389: Modifies plugins/in_storage_backlog/sb.c within sb_segregate_chunks to change error/stream case handling during backlog processing.

Suggested labels

backport to v4.0.x, backport to v4.2.x

Suggested reviewers

  • koleini
  • fujimotos

Poem

🐰 A checksum's cry for help rings clear,
Now marked as broken, far and near,
We sweep the storage clean with care,
And test to prove it works—with flair! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main change: handling of checksum-corrupted chunks during startup by discarding them. It's concise, clear, and directly reflects the primary objective across multiple files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Development

Successfully merging this pull request may close these issues.

Fluent Bit 5.0.5 becomes non-functional after corrupted filesystem storage chunks