
fix(slack): persist working pill across bridge restarts #83

Merged
rogeriochaves merged 1 commit into main from fix/persist-working-pill-across-restarts
May 31, 2026


@rogeriochaves
Contributor

Summary

When the bridge restarts (config-sync does this on every applied bundle, sometimes several times an hour while iterating), the in-memory active pill map is wiped. The 30s refresh loop is gated on that map being populated, so until the agent's next channel-root text post relights it, the pill stays dark. An agent that goes long on tools without an intermediate text post (common when debugging or planning) ends up looking dead in Slack for the rest of the turn.
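To make the failure mode concrete, here is a minimal sketch of a refresh pass that is driven entirely by the in-memory map. The names `Pill`, `refreshTick` and `relight` are hypothetical, and TypeScript is an assumption, since the PR does not show the bridge's actual code. The point is only that an empty map after a restart gives the refresh loop nothing to relight.

```typescript
// Hypothetical shape; the real bridge's types are not shown in this PR.
type Pill = { channel: string; litAt: number };

// One refresh pass: re-sends the status for every tracked pill
// and returns how many pills it touched.
function refreshTick(
  active: Map<string, Pill>,
  relight: (slug: string, pill: Pill) => void,
): number {
  let refreshed = 0;
  active.forEach((pill, slug) => {
    relight(slug, pill);
    refreshed++;
  });
  return refreshed;
}

const relit: string[] = [];

// Before the restart: one agent is tracked, so the tick relights it.
const beforeRestart = new Map<string, Pill>();
beforeRestart.set("dependabot-scout", { channel: "C123", litAt: Date.now() });
const countBefore = refreshTick(beforeRestart, (slug) => relit.push(slug));

// After the restart: the process starts with an empty map, so the tick
// has nothing to do even though the agent is still working.
const afterRestart = new Map<string, Pill>();
const countAfter = refreshTick(afterRestart, (slug) => relit.push(slug));
```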

Repro from earlier today: bridge restarted at 16:28 UTC, the dependabot-scout agent kept working but didn't emit channel-root text until 18:31 (it was "Manifesting" on a long Bash sequence the entire time). Channel showed no progress and no pill for ~2 hours.

Fix

Persist the `active` map to ~/.kanban-code/active-pills/<slug> (same atomic write-then-rename pattern as thread-root) and restore it on bridge startup. Two rules apply on restore:

  1. Skip pills older than 10 min — Slack's own idle TTL would have cleared the visual pill by then anyway, and an agent that's been silent that long has likely finished its turn. Re-lighting would falsely advertise active work.
  2. Re-light immediately on restore rather than waiting for the next refresh tick — gap between bridge start and visible "is working…" becomes seconds instead of a minute.
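The two restore behaviours listed above could be sketched roughly as follows. Only `MAX_RESTORE_AGE_MS` and its 10 minute value come from this PR; the `PillRecord` shape, `restorePills` and the `relight` callback are illustrative assumptions, not the merged code.

```typescript
// Assumed record shape; the PR does not show the persisted format.
type PillRecord = { slug: string; text: string; updatedAt: number };

const MAX_RESTORE_AGE_MS = 10 * 60 * 1000; // 10 min, as stated in the PR

function restorePills(
  persisted: PillRecord[],
  now: number,
  relight: (record: PillRecord) => void,
): Map<string, PillRecord> {
  const active = new Map<string, PillRecord>();
  for (const record of persisted) {
    // Skip stale pills: the turn has probably ended, so do not advertise work.
    if (now - record.updatedAt > MAX_RESTORE_AGE_MS) continue;
    active.set(record.slug, record);
    // Relight right away instead of waiting for the next refresh tick.
    relight(record);
  }
  return active;
}

const now = 1700000000000; // arbitrary fixed clock for the example
const relit: string[] = [];
const restored = restorePills(
  [
    { slug: "fresh-agent", text: "is working…", updatedAt: now - 2 * 60 * 1000 },
    { slug: "stale-agent", text: "is working…", updatedAt: now - 11 * 60 * 1000 },
  ],
  now,
  (record) => relit.push(record.slug),
);
```

Passing `now` and `relight` in as parameters keeps the age check testable without a real clock or a Slack client.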

Persist on every set / refresh / clear so the on-disk state stays in lockstep with the in-memory map.
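For readers unfamiliar with the write-then-rename pattern, a minimal sketch using Node's `fs` module might look like this. The JSON record format and the helper names `writePill` and `clearPill` are assumptions for illustration, and the demo writes to a temporary directory rather than the real ~/.kanban-code/active-pills path.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed record shape; the PR does not show the actual on-disk format.
type PillRecord = { text: string; updatedAt: number };

// Write to a temp file in the same directory, then rename it over the target.
// A rename within one filesystem replaces the target in a single step,
// so a reader never sees a half-written file.
function writePill(dir: string, slug: string, record: PillRecord): void {
  fs.mkdirSync(dir, { recursive: true });
  const target = path.join(dir, slug);
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(record));
  fs.renameSync(tmp, target);
}

// Clearing is idempotent: force=true ignores a file that is already gone.
function clearPill(dir: string, slug: string): void {
  fs.rmSync(path.join(dir, slug), { force: true });
}

// Demo against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "active-pills-"));
writePill(demoDir, "dependabot-scout", { text: "is working…", updatedAt: Date.now() });
const afterWrite = fs.readdirSync(demoDir);
clearPill(demoDir, "dependabot-scout");
clearPill(demoDir, "dependabot-scout"); // second clear is a no-op
const afterClear = fs.readdirSync(demoDir);
```

Calling a helper like `writePill` from every set and refresh, and `clearPill` from every clear, is one way to keep the files in step with the map.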

Test plan

  • 6 new unit tests for active-pill: round-trip, missing file, corrupted file, partial record, clear, idempotent clear. 242/242 passing.
  • Build clean.
  • After merge + box pulls main, trigger an agent (e.g. force-sync the bundle, which restarts the bridge), confirm the pill survives the restart on a long-running tool sequence.
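As an illustration of the cases the unit tests name (round-trip, missing file, corrupted file, partial record), a tolerant parser could look like the sketch below. It assumes a JSON record with `text` and `updatedAt` fields and a hypothetical `parsePill` helper; the merged code may differ.

```typescript
// Assumed record shape and JSON encoding; not confirmed by the PR.
type PillRecord = { text: string; updatedAt: number };

// Returns null for anything that is not a complete record, so a bad file
// on disk can never stop the bridge from starting.
function parsePill(raw: string | null): PillRecord | null {
  if (raw === null) return null; // missing file
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // corrupted file
  }
  if (typeof value !== "object" || value === null) return null;
  const fields = value as Record<string, unknown>;
  const text = fields["text"];
  const updatedAt = fields["updatedAt"];
  // Partial record: a required field is absent or has the wrong type.
  if (typeof text !== "string" || typeof updatedAt !== "number") return null;
  return { text, updatedAt };
}

const original: PillRecord = { text: "is working…", updatedAt: 1700000000000 };
const roundTrip = parsePill(JSON.stringify(original));
const missing = parsePill(null);
const corrupted = parsePill("{not json");
const partial = parsePill('{"text":"is working…"}');
```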

When config-sync restarts the bridge (which happens on every applied
bundle, sometimes several times an hour while iterating on the box),
the in-memory `active` pill map is wiped. The 30s refresh loop is
gated on that map being populated, so until the agent's next channel-
root text post relights it, the pill stays dark. An agent that goes
long on tools without an intermediate text post — common when
debugging or planning — looks dead in Slack for the rest of the turn.

Reproduced in prod today: bridge restarted at 16:28 UTC, agent kept
working but didn't emit channel-root text until 18:31 (Manifesting on
a long Bash sequence the whole time). Channel showed no progress and
no pill for ~2 hours.

Persist the `active` map to ~/.kanban-code/active-pills/<slug> (same
atomic write-then-rename pattern as thread-root), restore on bridge
startup. Don't restore pills older than MAX_RESTORE_AGE_MS (10 min)
to avoid falsely advertising work on a turn that actually finished
before the restart — Slack's own idle TTL would have cleared the
visual pill by then anyway. Re-light immediately on restore rather
than waiting for the next refresh tick so the gap between bridge
start and visible "is working…" is seconds, not a minute.

Tests: 6 new for active-pill (round-trip, missing file, corrupt file,
partial record, clear, idempotent clear). 242/242 passing.
@rogeriochaves merged commit d8591b1 into main on May 31, 2026
1 check passed