fix(slack): persist working pill across bridge restarts #83
Merged
When config-sync restarts the bridge (which happens on every applied bundle, sometimes several times an hour while iterating on the box), the in-memory `active` pill map is wiped. The 30s refresh loop is gated on that map being populated, so until the agent's next channel-root text post relights it, the pill stays dark. An agent that goes long on tools without an intermediate text post (common when debugging or planning) looks dead in Slack for the rest of the turn.
Reproduced in prod today: the bridge restarted at 16:28 UTC, and the agent kept working but didn't emit channel-root text until 18:31 (it was Manifesting on a long Bash sequence the whole time). The channel showed no progress and no pill for ~2 hours.
Persist the `active` map to `~/.kanban-code/active-pills/<slug>` (same atomic write-then-rename pattern as `thread-root`) and restore it on bridge startup. Don't restore pills older than `MAX_RESTORE_AGE_MS` (10 min), to avoid falsely advertising work on a turn that actually finished before the restart; Slack's own idle TTL would have cleared the visual pill by then anyway.
Re-light immediately on restore rather than waiting for the next refresh tick, so the gap between bridge start and a visible "is working…" is seconds, not a minute.
Tests: 6 new for active-pill (round-trip, missing file, corrupt file, partial record, clear, idempotent clear). 242/242 passing.
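The restore path described above can be sketched in TypeScript roughly as follows. This is a minimal sketch, not the bridge's actual code: the `PillRecord` shape, its `key`/`updatedAt` fields, and the `restorable`, `restoreActivePills`, `refreshTick`, and `relight` names are hypothetical, and it assumes a pill's age is measured from its last set or refresh. Only `MAX_RESTORE_AGE_MS` (10 min) and the 30s refresh loop come from the PR.

```typescript
// Hypothetical shape of one entry in the `active` map.
interface PillRecord {
  key: string;       // hypothetical map key (e.g. a channel id)
  updatedAt: number; // assumed: epoch ms of the last set/refresh
}

const MAX_RESTORE_AGE_MS = 10 * 60 * 1000; // 10 min, from the PR

// Keep only entries recent enough that the turn is plausibly still running.
function restorable(saved: PillRecord[], now: number): PillRecord[] {
  return saved.filter((r) => now - r.updatedAt <= MAX_RESTORE_AGE_MS);
}

// Rebuild the in-memory map on startup and re-light each surviving pill
// right away, instead of waiting up to 30s for the first refresh tick.
// `relight` stands in for whatever call actually sets the Slack status.
function restoreActivePills(
  saved: PillRecord[],
  now: number,
  relight: (key: string) => void,
): Map<string, PillRecord> {
  const active = new Map<string, PillRecord>();
  for (const r of restorable(saved, now)) {
    active.set(r.key, { ...r });
    relight(r.key);
  }
  return active;
}

// One tick of the 30s refresh loop. It only visits keys already in
// `active`, so an empty map after a restart refreshes nothing.
function refreshTick(
  active: Map<string, PillRecord>,
  now: number,
  relight: (key: string) => void,
): void {
  for (const [key, rec] of active) {
    rec.updatedAt = now;
    relight(key);
  }
}
```

With one entry refreshed a minute before startup and another refreshed twenty minutes before, only the first goes back into the map and is re-lit; the second is treated as a finished turn. Calling `refreshTick` on a fresh, empty map does nothing, which is the original bug in miniature.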
Summary
When the bridge restarts (config-sync does this on every applied bundle, sometimes several times an hour while iterating), the in-memory `active` pill map is wiped. The 30s refresh loop is gated on that map being populated, so until the agent's next channel-root text post relights it, the pill stays dark. An agent that goes long on tools without an intermediate text post (common when debugging or planning) ends up looking dead in Slack for the rest of the turn.
Repro from earlier today: the bridge restarted at 16:28 UTC, and the dependabot-scout agent kept working but didn't emit channel-root text until 18:31 (it was "Manifesting" on a long Bash sequence the entire time). The channel showed no progress and no pill for ~2 hours.
Fix
Persist `active` to `~/.kanban-code/active-pills/<slug>` (same atomic write-then-rename pattern as `thread-root`), restore on bridge startup. Two guards on restore:
Persist on every set / refresh / clear so the on-disk state stays in lockstep with the in-memory map.
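A sketch of the write side, with the same caveats: the on-disk format (a JSON array), the temp-file naming, and the `writePillFile`, `readPillFile`, and `PersistedPills` names are assumptions for illustration, not what `thread-root` actually does. It shows the write-then-rename step for atomicity, a read-back that treats a missing or unparsable file as "nothing to restore", and a wrapper that rewrites the file on every set, refresh, and clear.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical record shape; the real `active` map's values may differ.
interface PillRecord {
  key: string;       // hypothetical map key (e.g. a channel id)
  updatedAt: number; // assumed: epoch ms of the last set/refresh
}

const PILL_DIR = join(homedir(), ".kanban-code", "active-pills");

// Write to a temp file in the same directory, then rename over the target.
// On POSIX, rename within one filesystem replaces the target atomically,
// so a restart mid-write leaves either the old file or the new one.
function writePillFile(path: string, records: PillRecord[]): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(records));
  renameSync(tmp, path);
}

// Read back; a missing or unparsable file yields an empty list.
// No per-record validation in this sketch.
function readPillFile(path: string): PillRecord[] {
  try {
    const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
    return Array.isArray(parsed) ? (parsed as PillRecord[]) : [];
  } catch {
    return [];
  }
}

// Keeps the on-disk copy in lockstep with the in-memory map:
// every mutation rewrites the whole per-slug file.
class PersistedPills {
  private readonly active = new Map<string, PillRecord>();
  private readonly path: string;

  constructor(slug: string, dir: string = PILL_DIR) {
    this.path = join(dir, slug);
  }

  // Used for both the initial set and each refresh.
  set(key: string, now: number): void {
    this.active.set(key, { key, updatedAt: now });
    this.flush();
  }

  clear(key: string): void {
    if (this.active.delete(key)) this.flush();
  }

  private flush(): void {
    writePillFile(this.path, [...this.active.values()]);
  }
}
```

Rewriting the whole file on each change keeps the format trivial at the cost of re-serializing the full map every time; for a map holding only a few live pills that seems an acceptable trade, though the typical size is an assumption here.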
Test plan
active-pill: round-trip, missing file, corrupted file, partial record, clear, idempotent clear. 242/242 passing.
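The partial-record and clear cases in the test plan can be illustrated with a small sketch. It assumes the same hypothetical `PillRecord` shape; `isPillRecord`, `validPills`, and `clearActivePills` are illustrative names, and treating "clear" as deleting the per-slug file is an assumption. Node's `rmSync` with `force: true` ignores a missing path, which is what makes a second clear a no-op.

```typescript
import { rmSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical record shape, as assumed elsewhere in these sketches.
interface PillRecord {
  key: string;
  updatedAt: number;
}

// A record counts only if every field is present with the right type;
// anything else (the "partial record" case) is dropped, not guessed at.
function isPillRecord(x: unknown): x is PillRecord {
  if (typeof x !== "object" || x === null) return false;
  const r = x as { key?: unknown; updatedAt?: unknown };
  return (
    typeof r.key === "string" &&
    typeof r.updatedAt === "number" &&
    Number.isFinite(r.updatedAt)
  );
}

// Filter parsed file contents down to well-formed records;
// a non-array payload yields nothing to restore.
function validPills(parsed: unknown): PillRecord[] {
  return Array.isArray(parsed) ? parsed.filter(isPillRecord) : [];
}

// Remove the persisted per-slug file. `force: true` suppresses the
// ENOENT error when the file is already gone, so clearing is idempotent.
function clearActivePills(
  slug: string,
  dir: string = join(homedir(), ".kanban-code", "active-pills"),
): void {
  rmSync(join(dir, slug), { force: true });
}
```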