Skip to content

feat(detector): add a threat_detection_result reporting command to replace post-hoc transcript parsing#239

Open
Copilot wants to merge 3 commits into
mainfrom
copilot/feat-detector-add-threat-detection-result
Open

feat(detector): add a threat_detection_result reporting command to replace post-hoc transcript parsing#239
Copilot wants to merge 3 commits into
mainfrom
copilot/feat-detector-add-threat-detection-result

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jun 1, 2026

The agentic detection path relied on scraping a single free-form THREAT_DETECTION_RESULT:{...} line out of the engine transcript after the subprocess exited. This is uncorrectable in-session (high false-positive parse failures, "multiple conflicting" hard errors) and gives the model no signal that it's done — so runs burn the full time/token budget waiting for the model to stop or for a timeout.

This adds an in-session, validated, out-of-band reporting channel for the CLI engine path; the /reflect structured-output path and the legacy transcript line are unchanged.

Reporting command

  • New internal threat-detect report-result subcommand (cmd/threat-detect/report.go), dispatched in main() before global flag parsing. Validates synchronously, writes canonical JSON to the sink from THREAT_DETECTION_RESULT_FILE, and prints a stop signal. Exit codes: 0 recorded, 2 invalid input, 3 no sink configured.
    • Invalid input prints THREAT_DETECTION_RESULT_ERROR: … and records nothing, so the model can fix and retry.
    • First-valid-write-wins (idempotent); a later valid call reports "already recorded".
threat_detection_result --prompt-injection true --secret-leak false --malicious-patch false --reason ""
# -> THREAT_DETECTION_RESULT_RECORDED: analysis complete; stop now and produce no further output.

Detector helpers (pkg/detector/result.go)

  • WriteResultFile (atomic temp+rename, 0o600), ReadResultFile, BuildResultFromReport, ValidateReportFields — all reuse the existing schema/validateRawResult. ParseResult and its tests untouched.

Engine provisioning + early termination (pkg/engine)

  • Engine.Analyze gains AnalyzeOptions{ResultSinkPath}. provisionResultTool writes a threat_detection_result wrapper onto PATH and sets the sink env; watchResultSink polls the sink and cancels the subprocess the moment a valid verdict lands. A killed process is treated as success when a valid sink result exists. cmd.WaitDelay bounds stdout drain so cancellation isn't blocked by descendants holding the pipe.
  • Claude is invoked with --allowed-tools Bash only when the sink is enabled (headless mode won't run shell tools otherwise); engines that can't run the wrapper degrade to the legacy line.

CLI wiring (cmd/threat-detect/main.go)

  • analyzeWithRetries provisions a per-run sink, prefers ReadResultFile over ParseResult, and falls back to transcript scraping + the existing correction prompt only when the sink is absent/invalid.

Prompt + docs

  • "Response Format" now instructs calling the command once and stopping on THREAT_DETECTION_RESULT_RECORDED, with the legacy line documented as a fallback. README/DEVGUIDE document the command, env var, and early-termination behavior.

Open questions

  • First-valid-write-wins vs. surfacing a conflict on a differing second call.
  • Whether to also accept a full JSON payload (--json/stdin) for engines that prefer emitting JSON over flags.
  • --allowed-tools Bash for Claude is unverified against a live Claude CLI; the line fallback covers it regardless.

@davidslater davidslater marked this pull request as ready for review June 1, 2026 23:51
Copilot AI review requested due to automatic review settings June 1, 2026 23:51
Copilot AI changed the title [WIP] Add threat detection result reporting command feat(detector): add a threat_detection_result reporting command to replace post-hoc transcript parsing Jun 1, 2026
Copilot AI requested a review from davidslater June 1, 2026 23:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an in-session, out-of-band result reporting channel for the agentic CLI engine path, reducing reliance on post-hoc transcript scraping and enabling early termination once a validated verdict is recorded.

Changes:

  • Adds an internal report-result subcommand plus a provisioned threat_detection_result wrapper that records validated results to a per-run sink file.
  • Extends CLI-engine execution to watch the sink and cancel the subprocess as soon as a valid verdict is written (treating the resulting kill as success when a sink verdict exists).
  • Adds detector helpers for atomic result-file write/read and updates prompts/docs to prefer the in-session command with a legacy transcript fallback.
Show a summary per file
File Description
README.md Documents the in-session threat_detection_result command, sink env var, and early termination behavior.
pkg/engine/tool.go Adds wrapper provisioning (threat_detection_result) and sink polling cancellation logic.
pkg/engine/tool_test.go Tests wrapper provisioning, cleanup, sink watcher cancellation, and sink-based error suppression.
pkg/engine/engine.go Extends Engine.Analyze with sink options and implements early termination + killed-process suppression.
pkg/engine/engine_test.go Updates Claude args tests and adds coverage for optional --allowed-tools Bash.
pkg/detector/result.go Adds result-file helpers (atomic write/read) and report-field helpers.
pkg/detector/result_file_test.go Adds tests for write/read roundtrip, error cases, and field validation helpers.
pkg/detector/prompts/threat_detection.md Updates response format to call threat_detection_result once, with legacy fallback.
DEVGUIDE.md Documents the new command, sink env var, early termination, and fallback parsing behavior.
cmd/threat-detect/report.go Implements the internal report-result subcommand and validation/idempotency behavior.
cmd/threat-detect/report_test.go Adds tests for valid write, invalid inputs, missing config, and idempotency.
cmd/threat-detect/main.go Dispatches report-result before global flags; provisions sink and prefers sink result over transcript parsing.
cmd/threat-detect/main_test.go Adds integration-style tests ensuring sink is preferred and early termination occurs.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 13/13 changed files
  • Comments generated: 4

Comment on lines +62 to +65
if resultFile == "" {
fmt.Fprintln(os.Stderr, "THREAT_DETECTION_RESULT_ERROR: no result sink configured; THREAT_DETECTION_RESULT_FILE is unset.")
return reportExitConfig
}
Comment on lines +86 to +89
if err := detector.WriteResultFile(resultFile, result); err != nil {
fmt.Fprintf(os.Stderr, "THREAT_DETECTION_RESULT_ERROR: failed to record result: %v.\n", err)
return reportExitConfig
}
Comment thread pkg/detector/result.go
Comment on lines +184 to +188
if r.Reasons == nil {
r.Reasons = []string{}
}
data, err := json.MarshalIndent(r, "", " ")
if err != nil {
Comment on lines +33 to +50
fs := flag.NewFlagSet("report-result", flag.ContinueOnError)
var (
promptInjection bool
secretLeak bool
maliciousPatch bool
reasons stringSliceFlag
resultFile string
)
fs.BoolVar(&promptInjection, "prompt-injection", false, "Whether a prompt injection threat was detected (required)")
fs.BoolVar(&secretLeak, "secret-leak", false, "Whether a secret leak threat was detected (required)")
fs.BoolVar(&maliciousPatch, "malicious-patch", false, "Whether a malicious patch threat was detected (required)")
fs.Var(&reasons, "reason", "Reason explaining a detected threat (repeatable)")
fs.StringVar(&resultFile, "result-file", os.Getenv("THREAT_DETECTION_RESULT_FILE"), "Path to the result sink file (defaults to env THREAT_DETECTION_RESULT_FILE)")

if err := fs.Parse(args); err != nil {
fmt.Fprintf(os.Stderr, "THREAT_DETECTION_RESULT_ERROR: %v. Re-run threat_detection_result with corrected values.\n", err)
return reportExitInvalid
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(detector): add a threat_detection_result reporting command to replace post-hoc transcript parsing

3 participants