Goal or problem
The full (agentic) detection path requires the model to emit a single free-form line of the form THREAT_DETECTION_RESULT:{...json...} somewhere in its transcript. The orchestrating CLI captures the whole transcript and only parses that line after the engine subprocess has exited (pkg/detector/result.go:79-141, invoked from cmd/threat-detect/main.go:160-182). This post-hoc, text-scraping contract has two recurring failure modes called out in the task:
- Unrecoverable parsing errors / high false-positive rate. Because parsing happens after the engine finishes, any deviation (prose around the line, a stringified boolean, two non-identical result lines, fenced code, etc.) cannot be corrected within the same model turn. The model never learns during the run that it produced malformed output. Two conflicting lines hard-fail with "multiple conflicting THREAT_DETECTION_RESULT entries found" (pkg/detector/result.go:117-119), and a missing or garbled line fails with "no THREAT_DETECTION_RESULT found ..." (pkg/detector/result.go:111-113). The only remedy today is to re-run the entire engine with a correction prompt appended (cmd/threat-detect/main.go:177-179, pkg/detector/correction.go:22-25), which is expensive and still post-hoc.
- Dead spiral / high latency and cost. Nothing tells the model when it has successfully reported. Models frequently keep "confirming" by emitting the line repeatedly, or keep reasoning until the engine hits its own timeout. There is no in-band signal that the job is done, so a single run can burn the full time/token budget. Repeated non-identical lines additionally trip the conflicting-results hard error above.
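The sketch below makes the two failure modes concrete with a deliberately simplified model of post-hoc scraping. It is not ParseResult: there is no stream-json extraction and no schema validation, and scrapeResult is a hypothetical name. It shows why repeated identical lines are harmless, while a second, different line is a hard error that the model never sees during its run.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// resultPrefix mirrors the ResultPrefix constant described in this issue.
const resultPrefix = "THREAT_DETECTION_RESULT:"

// scrapeResult is a hypothetical, simplified model of post-hoc transcript
// scraping. It keeps every line that starts with the prefix, drops exact
// duplicates, and fails when zero or more than one distinct payload remains.
// Unlike the real ParseResult it does no stream-json extraction and no
// schema validation.
func scrapeResult(transcript string) (map[string]any, error) {
	seen := map[string]bool{}
	var payloads []string
	for _, line := range strings.Split(transcript, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, resultPrefix) {
			continue // reasoning, prose, fence markers, etc.
		}
		payload := strings.TrimPrefix(line, resultPrefix)
		if !seen[payload] {
			seen[payload] = true
			payloads = append(payloads, payload)
		}
	}
	switch len(payloads) {
	case 0:
		return nil, errors.New("no THREAT_DETECTION_RESULT found")
	case 1:
		var verdict map[string]any
		if err := json.Unmarshal([]byte(payloads[0]), &verdict); err != nil {
			return nil, err
		}
		return verdict, nil
	default:
		return nil, errors.New("multiple conflicting THREAT_DETECTION_RESULT entries found")
	}
}
```

Because this check only runs after the engine exits, the model has no chance to react to either error within the same run.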
The task proposes giving the model a single callable command (e.g. threat_detection_result) that it invokes with the verdict. The command validates the input synchronously, returns actionable errors the model can immediately fix in-session, records the verdict to an out-of-band sink the agent cannot tamper with after the fact, and signals "done — stop now". This issue specifies that mechanism end to end.
Background or context
Relevant architecture:
- The detector has three result paths. (a) A non-agentic /reflect structured-output triage and full pass that already use strict JSON Schema and need no transcript scraping (pkg/engine/reflect.go:38-63, cmd/threat-detect/main.go:94-142). (b) The agentic CLI engine path (copilot, claude, codex), which is the subject of this issue (pkg/engine/engine.go:16-77, cmd/threat-detect/main.go:144-157). (c) Static prompt analysis that only enriches the prompt (pkg/detector/static.go).
- The agentic engines run a CLI subprocess and capture only stdout (pkg/engine/runCLIEnv, pkg/engine/engine.go:190-211). The Copilot engine already runs the model with shell/tool access: --disable-builtin-mcps, --no-ask-user, --allow-all-tools (pkg/engine/engine.go:90-102). The Codex engine runs with --dangerously-bypass-approvals-and-sandbox (pkg/engine/engine.go:145-159). This means the model can already execute arbitrary shell commands during the run, so a command on PATH is a viable in-session reporting channel for at least Copilot and Codex.
- The result schema is fixed and already centralized: the Result struct, ResultJSONSchema, and validateRawResult (pkg/detector/result.go:14-36, pkg/detector/result.go:143-174). The ResultPrefix constant is THREAT_DETECTION_RESULT: (pkg/detector/result.go:12).
- Retry/correction infrastructure already exists and is bounded by --triage-retries (default 1, env THREAT_DETECTION_TRIAGE_RETRIES) via analyzeWithRetries (cmd/threat-detect/main.go:160-182) and BuildCorrectionPrompt (pkg/detector/correction.go:22-25).
- The model is instructed to print the line by the prompt template's "Response Format" section (pkg/detector/prompts/threat_detection.md:70-81).
- The runtime container ships only the threat-detect binary plus CA certs, runs as non-root uid 1000, and bundles no engine CLIs (Dockerfile:14-30). Any new tool the model calls must therefore be either the same threat-detect binary (a subcommand) or a tiny script generated at runtime, not a new third-party dependency.
Assumption: the task's phrase "writes to a log file" refers to this transcript-scraping contract (the model writes the result line into its output/log, which is parsed afterwards). There is no separate dedicated result log file in this repository; the parsed source is the engine's captured stdout transcript (pkg/engine/engine.go:199-210 → pkg/detector/result.go:79).
Current state
- Prompt instructs: "output exactly one line containing only the JSON response ... Do not include any other text" (pkg/detector/prompts/threat_detection.md:70-81).
- Engine runs; stdout is captured (pkg/engine/engine.go:199-210).
- analyzeWithRetries calls ParseResult(rawOutput); on error it appends a correction message and re-invokes the whole engine, up to triageRetries + 1 attempts (cmd/threat-detect/main.go:160-182).
- ParseResult scrapes the transcript: stream-json extraction first, then raw line matching, then dedup, then JSON validation, with hard errors for "none found", "multiple conflicting", and schema violations (pkg/detector/result.go:79-141).
- There is no way for the model to know, during the run, whether its reported verdict was accepted, and no way for it to be told to stop. There is no out-of-band, tamper-resistant result sink; the only record is the transcript, which shares space with the analyzed agent's content.
Target state
Introduce an in-session, validated, out-of-band reporting channel for the agentic path:
- A new threat-detect report-result subcommand that the model invokes (via a generated threat_detection_result wrapper on PATH). It:
  - validates the verdict synchronously against the existing schema;
  - on invalid input, exits non-zero, prints a precise, bounded, actionable error to stdout and stderr telling the model exactly what to fix, and records nothing;
  - on valid input, atomically writes the canonical JSON result to a sink file path provided via THREAT_DETECTION_RESULT_FILE, prints an unambiguous success-and-stop message, and is idempotent (a second valid call reports "already recorded, stop now" without changing the recorded result);
  - keeps the recorded sink file outside the analyzed content. Only the detector binary writes the file, which satisfies TD-38 ("Detection results MUST NOT be modifiable by the agent being analyzed", specs/threat-detection-spec.md:248) at least as well as transcript scraping does.
- The CLI engine path provisions the sink + tool before each engine invocation, prefers the sink result over transcript scraping, and cancels the engine subprocess as soon as a valid result is recorded (early termination). Early termination is what eliminates the dead-spiral latency/cost: the run ends the instant a valid verdict exists rather than waiting for the model to stop or for a timeout.
- The transcript-scraped THREAT_DETECTION_RESULT: line is retained as a backward-compatible fallback for engines/environments where in-session shell execution is unavailable. ParseResult and its tests are unchanged.
- The /reflect structured-output path (pkg/engine/reflect.go) is unchanged and out of scope; it already has a synchronous validated contract.
Required changes
1. Result file helpers — pkg/detector/result.go
Add canonical write/read helpers and a constructor, reusing validateRawResult and resultFromRaw.
```go
// WriteResultFile atomically writes r as canonical THREAT_DETECTION_RESULT JSON
// to path (temp file in the same dir + rename), with 0o600 permissions.
func WriteResultFile(path string, r *Result) error

// ReadResultFile reads path and parses it with ParseStructuredResult, returning
// a validated *Result. Returns an error if the file is missing, empty, or invalid.
func ReadResultFile(path string) (*Result, error)

// BuildResultFromReport constructs a *Result from individual report fields.
// reasons may be nil; it is normalized to a non-nil empty slice.
func BuildResultFromReport(promptInjection, secretLeak, maliciousPatch bool, reasons []string) *Result

// ValidateReportFields validates a report payload using the same rules as
// validateRawResult and returns a single bounded, human-readable error string
// suitable for feeding back to the model (already passed through
// TruncateCorrectionMessage). Returns "" when valid.
func ValidateReportFields(promptInjection, secretLeak, maliciousPatch any, reasons any) string
```
Notes:
- WriteResultFile must marshal via the existing Result shape (same field names/order as the json.MarshalIndent call in writeResult, cmd/threat-detect/main.go:184-189) so the recorded file round-trips through ParseStructuredResult (pkg/detector/result.go:51-75).
- Atomic write: create a temp file in filepath.Dir(path), write, Close, then os.Rename over path. This avoids a partial-read race against the engine-side watcher (see change 4).
2. New subcommand — cmd/threat-detect/report.go (new file)
```go
// runReport implements the "report-result" subcommand invoked in-session by the
// detection model through the generated threat_detection_result wrapper.
func runReport(args []string) int
```
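Flags (own flag.FlagSet, not the global flag.CommandLine):
- --prompt-injection (bool, required)
- --secret-leak (bool, required)
- --malicious-patch (bool, required)
- --reason (repeatable string, collected into a slice; optional in general, but required to be non-empty when any boolean is true, mirroring the prompt guidance in pkg/detector/prompts/threat_detection.md:80)
- --result-file (string; defaults to env THREAT_DETECTION_RESULT_FILE)

The three booleans must be passed with explicit values. If they are registered with flag.BoolVar, Go's flag package accepts --prompt-injection=false but not --prompt-injection false: the separate "false" becomes the first positional argument, the flag is set to true, and parsing stops. Either register them as value-taking flag.Value types or have the prompt use the = form.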
Behavior and exit codes (define local constants, e.g. reportExitOK = 0, reportExitInvalid = 2, reportExitConfig = 3):
- Resolve the sink path from --result-file or THREAT_DETECTION_RESULT_FILE. If it is empty, print a config error to stderr and return reportExitConfig.
- Validate via detector.ValidateReportFields. If invalid, print a single bounded, imperative message to stdout (so it is visible in the model's tool output) and to stderr, e.g. "THREAT_DETECTION_RESULT_ERROR: <reason>. Re-run threat_detection_result with corrected values.", and return reportExitInvalid. Do not write the sink.
- If valid and the sink already contains a valid result (detector.ReadResultFile succeeds), print "THREAT_DETECTION_RESULT_RECORDED: result already recorded; analysis complete; stop now and produce no further output." and return reportExitOK. This keeps the call idempotent: the first valid write wins and is never overwritten.
- Otherwise, write atomically via detector.WriteResultFile, print "THREAT_DETECTION_RESULT_RECORDED: analysis complete; stop now and produce no further output.", and return reportExitOK.
Assumption: first-valid-write-wins is the desired conflict policy (it removes the current "multiple conflicting" hard failure for the tool path while keeping results tamper-resistant). This is also listed under Unresolved questions in case a different policy is wanted.
3. Subcommand dispatch — cmd/threat-detect/main.go
Dispatch before the global flag parsing in run() (which uses the package-global flag and would otherwise reject subcommand flags):
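```go
func main() {
	// Handle the internal report-result subcommand before run() parses the
	// package-global flag set, which would reject the subcommand's flags.
	if len(os.Args) > 1 && os.Args[1] == "report-result" {
		os.Exit(runReport(os.Args[2:]))
	}
	os.Exit(run())
}
```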
The existing main() (cmd/threat-detect/main.go:40-42) is replaced with the above. Keep report-result undocumented in --help output for end users (it is an internal tool surface), but document it in DEVGUIDE.md.
4. Engine tool provisioning + early termination — pkg/engine/engine.go (plus new pkg/engine/tool.go)
Extend the Engine contract so the CLI path can pass a sink and stop early. Change the interface (only three implementations exist: copilotEngine, claudeEngine, codexEngine; ReflectClient does not implement Engine):
```go
// AnalyzeOptions carries optional in-session reporting configuration.
type AnalyzeOptions struct {
	// ResultSinkPath, when non-empty, enables the threat_detection_result tool:
	// the engine provisions the wrapper on PATH, sets THREAT_DETECTION_RESULT_FILE,
	// and cancels the subprocess as soon as a valid result is written to this path.
	ResultSinkPath string
}

// Engine is implemented by copilotEngine, claudeEngine, and codexEngine.
type Engine interface {
	Analyze(ctx context.Context, prompt string, opts AnalyzeOptions) (string, error)
}
```
New helpers in pkg/engine/tool.go:
```go
// provisionResultTool creates a temp dir containing an executable
// "threat_detection_result" wrapper that execs the current binary's
// report-result subcommand. It returns the env additions
// (THREAT_DETECTION_RESULT_FILE and a PATH prefix pointing at the temp dir)
// and a cleanup func that removes the temp dir.
func provisionResultTool(sinkPath string) (env []string, cleanup func(), err error)

// watchResultSink polls sinkPath; when ReadResultFile(sinkPath) first succeeds,
// it calls cancel() to terminate the engine subprocess. It returns when ctx is done.
func watchResultSink(ctx context.Context, cancel context.CancelFunc, sinkPath string)
```
Wrapper script content (written 0o700 into the temp dir as threat_detection_result), using the resolved binary path from os.Executable():
```sh
#!/bin/sh
exec "<self-path>" report-result "$@"
```
provisionResultTool env additions:
- THREAT_DETECTION_RESULT_FILE=<sinkPath>
- PATH=<toolDir> + string(os.PathListSeparator) + current PATH
Engine integration:
- Add a runCLIEnv variant (e.g. runCLIEnvWithSink) that accepts sinkPath. When sinkPath != "", derive ctx, cancel := context.WithCancel(ctx), start watchResultSink, and after cmd.Run() returns: if the run failed and detector.ReadResultFile(sinkPath) succeeds, treat it as success and return the captured stdout with a nil error (the process was intentionally killed because the verdict was already recorded). Only surface the subprocess error when no valid sink result exists.
- copilotEngine.Analyze, claudeEngine.Analyze, and codexEngine.Analyze (pkg/engine/engine.go:49-77) thread opts.ResultSinkPath into the new sink-aware run path and merge the provisioned env with the existing per-engine env (copilotEnv, pkg/engine/engine.go:130-135).
Assumption: Copilot (--allow-all-tools, pkg/engine/engine.go:96) and Codex (--dangerously-bypass-approvals-and-sandbox, pkg/engine/engine.go:149) can execute the wrapper. Whether claude --print (pkg/engine/engine.go:137-143) executes shell tools without an explicit --allowed-tools Bash grant is unverified; see Unresolved questions. The stdout-line fallback (change 6) covers any engine that cannot run the wrapper.
5. CLI wiring — cmd/threat-detect/main.go
- In the CLI-engine branch (cmd/threat-detect/main.go:144-157), create a sink path (e.g. os.CreateTemp("", "threat-detect-result-*.json"), then remove the empty file so ReadResultFile only succeeds once the tool writes it; defer cleanup of the path).
- Pass the sink path into analyzeWithRetries.
- Update analyzeWithRetries (cmd/threat-detect/main.go:160-182):

```go
func analyzeWithRetries(ctx context.Context, eng engine.Engine, prompt, sinkPath string, retries int) (*detector.Result, error)
```
Per attempt: remove any stale sink file first; call eng.Analyze(ctx, currentPrompt, engine.AnalyzeOptions{ResultSinkPath: sinkPath}); then prefer the sink: if detector.ReadResultFile(sinkPath) succeeds, return it; otherwise fall back to detector.ParseResult(rawOutput). On failure, build the correction prompt with BuildCorrectionPrompt as today (pkg/detector/correction.go:22-25), with its text updated to tell the model to call the tool again, and retry.
6. Prompt template — pkg/detector/prompts/threat_detection.md
Rewrite the "Response Format" section (pkg/detector/prompts/threat_detection.md:70-81) so the primary instruction is to call the command exactly once, e.g.:
```text
Report your verdict by running this command exactly once:

threat_detection_result --prompt-injection=<true|false> --secret-leak=<true|false> --malicious-patch=<true|false> --reason "..." --reason "..."

The command validates your input and prints THREAT_DETECTION_RESULT_ERROR with
the problem if anything is wrong. Fix it and run the command again. When it
prints THREAT_DETECTION_RESULT_RECORDED, the analysis is complete: stop
immediately and produce no further output.
```
Keep a clearly labeled fallback paragraph documenting the legacy single line THREAT_DETECTION_RESULT:{...} for engines without shell access, preserving the existing boolean-type warning (pkg/detector/prompts/threat_detection.md:78-79). Do not alter the triage prompt (pkg/detector/prompts/threat_detection_triage.md), which is non-agentic by design (lines 11-16).
7. Docs — README.md, DEVGUIDE.md
- Document report-result / the threat_detection_result tool, THREAT_DETECTION_RESULT_FILE, and the early-termination behavior in the CLI/usage sections of README.md and in the maintainer notes in DEVGUIDE.md.
Design decisions and rejected alternatives
- Same binary subcommand + generated wrapper (chosen). The runtime image ships only threat-detect (Dockerfile:14-30). A report-result subcommand reuses the existing schema/validation code and needs no new dependency or image change. The thin threat_detection_result wrapper gives the model a stable, prompt-friendly command name on PATH.
- MCP server / native tool-calling API (rejected for now). The Copilot engine explicitly disables built-in MCPs (--disable-builtin-mcps, pkg/engine/engine.go:94), and tool-call wiring differs per engine. A shell command works uniformly wherever the engine can run shell commands (Copilot, Codex) and degrades gracefully via the line fallback elsewhere. An MCP-based result tool can be a follow-up.
- Early termination via context cancel (chosen) vs. waiting for the model to stop. Cancelling the subprocess the moment a valid result is recorded is what actually fixes the dead-spiral latency/cost; relying on the model to self-terminate does not.
- First-valid-write-wins (chosen) vs. last-write-wins / conflict error. First-wins removes the current "multiple conflicting" hard failure (pkg/detector/result.go:117-119) for the tool path while keeping results stable and tamper-resistant.
- Keep the line-based fallback (chosen) vs. remove it. Removing transcript scraping would break engines/environments without in-session shell execution and is unnecessary for the fix.
Unresolved questions
- Does claude --print (pkg/engine/engine.go:137-143) execute shell tools (so it can call the wrapper) without an explicit --allowed-tools Bash grant? If not, decide whether to add that grant or rely solely on the line fallback for Claude.
- Is first-valid-write-wins the desired conflict policy, or should a second differing call surface an error/override?
- Should report-result accept a full JSON object (--json / stdin) in addition to discrete flags, for engines that find JSON easier to emit than multiple flags?
- Does any downstream gh-aw integration parse the detector's own stdout for THREAT_DETECTION_RESULT: (as opposed to consuming the --output JSON / exit code)? The detector emits canonical JSON via writeResult (cmd/threat-detect/main.go:184-205) and the recorded sink is internal, so this is expected to be safe, but confirm before relying on it.
- What poll interval / cancel latency is preferred for watchResultSink (e.g. 200–300 ms), and should it use filesystem notifications instead of polling?
Implications
- Lower latency and cost on the agentic path: a correct first call ends the run immediately instead of waiting for the model to stop or for a timeout.
- Lower false-positive parse failures: validation is synchronous and correctable in-session before the run ends.
- Slightly stronger tamper resistance for TD-38 (specs/threat-detection-spec.md:248): the verdict is recorded to a detector-written file rather than read back out of the same transcript that contains analyzed agent content.
- The Engine interface signature changes (Analyze gains AnalyzeOptions); all three engine implementations and their tests must be updated in lockstep.
Out of scope or notes
- The /reflect structured-output triage and full path (pkg/engine/reflect.go, cmd/threat-detect/main.go:94-142) are unchanged; they already use a synchronous validated schema contract.
- The triage prompt (pkg/detector/prompts/threat_detection_triage.md) stays non-agentic and tool-free (lines 11-16).
- No change to exit codes, the public JSON output contract (TD-08/TD-19/TD-20, specs/threat-detection-spec.md:80-89, specs/threat-detection-spec.md:176-181), or the release/lifecycle machinery.
- No new third-party dependencies; the generated wrapper and the report-result subcommand both use the already-shipped threat-detect binary.
Steps to take after merging
- Regenerate and verify the containerized smoke siblings if any generated workflow text references the result contract: run scripts/create-threat-detection-sibling-workflows.py, then run it again with --check.
- Follow-up issue: optionally expose the result tool over MCP for engines that prefer native tool-calling over shell.
- Follow-up issue: once telemetry confirms the tool path is reliable across engines, evaluate deprecating the transcript-scraped THREAT_DETECTION_RESULT: fallback (a separate spec/version-compatibility decision under specs/threat-detection-spec.md).
- Coordinate with gh-aw integration owners before changing or removing the line-based contract, per the version-coupling guidance in README.md.
Goal or problem
The full (agentic) detection path requires the model to emit a single free-form line of the form
THREAT_DETECTION_RESULT:{...json...}somewhere in its transcript. The orchestrating CLI captures the whole transcript and only parses that line after the engine subprocess has exited (pkg/detector/result.go:79-141, invoked fromcmd/threat-detect/main.go:160-182). This post-hoc, text-scraping contract has two recurring failure modes called out in the task:Unrecoverable parsing errors / high false-positive rate. Because parsing happens after the engine finishes, any deviation (prose around the line, a stringified boolean, two non-identical result lines, fenced code, etc.) cannot be corrected within the same model turn. The model never learns it produced malformed output during the run. Two conflicting lines hard-fail with
multiple conflicting THREAT_DETECTION_RESULT entries found(pkg/detector/result.go:117-119), and a missing/garbled line fails withno THREAT_DETECTION_RESULT found ...(pkg/detector/result.go:111-113). The only remedy today is to re-run the entire engine with a correction prompt appended (cmd/threat-detect/main.go:177-179,pkg/detector/correction.go:22-25), which is expensive and still post-hoc.Dead spiral / high latency and cost. Nothing tells the model when it has successfully reported. Models frequently keep "confirming" by emitting the line repeatedly, or keep reasoning until the engine hits its own timeout. There is no in-band signal that the job is done, so a single run can burn the full time/token budget. Repeated non-identical lines additionally trip the conflicting-results hard error above.
The task proposes giving the model a single callable command (e.g.
threat_detection_result) that it invokes with the verdict. The command validates the input synchronously, returns actionable errors the model can immediately fix in-session, records the verdict to an out-of-band sink the agent cannot tamper with after the fact, and signals "done — stop now". This issue specifies that mechanism end to end.Background or context
Relevant architecture:
The detector has three result paths. (a) A non-agentic
/reflectstructured-output triage and full pass that already use strict JSON Schema and need no transcript scraping (pkg/engine/reflect.go:38-63,cmd/threat-detect/main.go:94-142). (b) The agentic CLI engine path (copilot,claude,codex) which is the subject of this issue (pkg/engine/engine.go:16-77,cmd/threat-detect/main.go:144-157). (c) Static prompt analysis that only enriches the prompt (pkg/detector/static.go).The agentic engines run a CLI subprocess and capture only stdout (
pkg/engine/runCLIEnv,pkg/engine/engine.go:190-211). The Copilot engine already runs the model with shell/tool access:--disable-builtin-mcps,--no-ask-user,--allow-all-tools(pkg/engine/engine.go:90-102). The Codex engine runs with--dangerously-bypass-approvals-and-sandbox(pkg/engine/engine.go:145-159). This means the model can already execute arbitrary shell commands during the run, so a command-on-PATHis a viable in-session reporting channel for at least Copilot and Codex.The result schema is fixed and already centralized: the
Resultstruct,ResultJSONSchema, andvalidateRawResult(pkg/detector/result.go:14-36,pkg/detector/result.go:143-174). TheResultPrefixconstant isTHREAT_DETECTION_RESULT:(pkg/detector/result.go:12).Retry/correction infrastructure already exists and is bounded by
--triage-retries(default 1, envTHREAT_DETECTION_TRIAGE_RETRIES) viaanalyzeWithRetries(cmd/threat-detect/main.go:160-182) andBuildCorrectionPrompt(pkg/detector/correction.go:22-25).The model is instructed to print the line by the prompt template's "Response Format" section (
pkg/detector/prompts/threat_detection.md:70-81).The runtime container ships only the
threat-detectbinary plus CA certs, runs as non-root uid 1000, and bundles no engine CLIs (Dockerfile:14-30). Any new tool the model calls must therefore be either the samethreat-detectbinary (a subcommand) or a tiny script generated at runtime — not a new third-party dependency.Assumption: the task's phrase "writes to a log file" refers to this transcript-scraping contract (the model writes the result line into its output/log, which is parsed afterwards). There is no separate dedicated result log file in this repository; the parsed source is the engine's captured stdout transcript (
pkg/engine/engine.go:199-210→pkg/detector/result.go:79).Current state
pkg/detector/prompts/threat_detection.md:70-81).pkg/engine/engine.go:199-210).analyzeWithRetriescallsParseResult(rawOutput); on error it appends a correction message and re-invokes the whole engine, up totriageRetries + 1attempts (cmd/threat-detect/main.go:160-182).ParseResultscrapes the transcript: stream-json extraction first, then raw line matching, then dedup, then JSON validation, with hard errors for "none found", "multiple conflicting", and schema violations (pkg/detector/result.go:79-141).Target state
Introduce an in-session, validated, out-of-band reporting channel for the agentic path:
A new
threat-detect report-resultsubcommand that the model invokes (via a generatedthreat_detection_resultwrapper onPATH). It:THREAT_DETECTION_RESULT_FILE, prints an unambiguous success-and-stop message, and is idempotent (a second valid call reports "already recorded, stop now" without changing the recorded result);specs/threat-detection-spec.md:248) at least as well as transcript scraping does.The CLI engine path provisions the sink + tool before each engine invocation, prefers the sink result over transcript scraping, and cancels the engine subprocess as soon as a valid result is recorded (early termination). Early termination is what eliminates the dead-spiral latency/cost: the run ends the instant a valid verdict exists rather than waiting for the model to stop or for a timeout.
The transcript-scraped
THREAT_DETECTION_RESULT:line is retained as a backward-compatible fallback for engines/environments where in-session shell execution is unavailable.ParseResultand its tests are unchanged.The
/reflectstructured-output path (pkg/engine/reflect.go) is unchanged and out of scope; it already has a synchronous validated contract.Required changes
1. Result file helpers —
pkg/detector/result.goAdd canonical write/read helpers and a constructor, reusing
validateRawResultandresultFromRaw.Notes:
WriteResultFilemust marshal via the existingResultshape (same field names/order asjson.MarshalIndentused inwriteResult,cmd/threat-detect/main.go:184-189) so the recorded file round-trips throughParseStructuredResult(pkg/detector/result.go:51-75).filepath.Dir(path), write,Close,os.Renameoverpath. This avoids a partial-read race against the engine-side watcher (see change 3).2. New subcommand —
cmd/threat-detect/report.go(new file)Flags (own
flag.FlagSet, not the globalflag.CommandLine):--prompt-injection(bool, required)--secret-leak(bool, required)--malicious-patch(bool, required)--reason(repeatable string; collected into a slice; optional, but required to be non-empty when any boolean istrue— mirror prompt guidance inpkg/detector/prompts/threat_detection.md:80)--result-file(string; defaults to envTHREAT_DETECTION_RESULT_FILE)Behavior and exit codes (define local constants, e.g.
reportExitOK = 0,reportExitInvalid = 2,reportExitConfig = 3):--result-fileorTHREAT_DETECTION_RESULT_FILE. If empty, print a config error to stderr and returnreportExitConfig.detector.ValidateReportFields. If invalid: print a single bounded, imperative message to stdout (so it is visible in the model's tool output) and stderr, e.g.THREAT_DETECTION_RESULT_ERROR: <reason>. Re-run threat_detection_result with corrected values., and returnreportExitInvalid. Do not write the sink.detector.ReadResultFilesucceeds): printTHREAT_DETECTION_RESULT_RECORDED: result already recorded; analysis complete; stop now and produce no further output.and returnreportExitOK(idempotent — first valid write wins; do not overwrite).detector.WriteResultFile, printTHREAT_DETECTION_RESULT_RECORDED: analysis complete; stop now and produce no further output., and returnreportExitOK.Assumption: first-valid-write-wins is the desired conflict policy (it removes the current "multiple conflicting" hard failure for the tool path while keeping results tamper-resistant). Flag this in Unresolved questions if a different policy is wanted.
3. Subcommand dispatch —
cmd/threat-detect/main.goDispatch before the global flag parsing in
run()(which uses the package-globalflagand would otherwise reject subcommand flags):The existing
main()(cmd/threat-detect/main.go:40-42) is replaced with the above. Keepreport-resultundocumented in--helpoutput for end users (it is an internal tool surface), but document it inDEVGUIDE.md.4. Engine tool provisioning + early termination —
pkg/engine/engine.go(plus newpkg/engine/tool.go)Extend the
Enginecontract so the CLI path can pass a sink and stop early. Change the interface (only three implementations exist:copilotEngine,claudeEngine,codexEngine;ReflectClientdoes not implementEngine):New helpers in
pkg/engine/tool.go:Wrapper script content (written 0o700 into the temp dir as
threat_detection_result), using the resolved binary path fromos.Executable():provisionResultToolenv additions:THREAT_DETECTION_RESULT_FILE=<sinkPath>PATH=<toolDir>+string(os.PathListSeparator)+ currentPATHEngine integration:
- A runCLIEnv variant (e.g. runCLIEnvWithSink) that accepts sinkPath. When sinkPath != "":
  - derive ctx, cancel := context.WithCancel(ctx) and start watchResultSink;
  - after cmd.Run() returns, if the run failed but detector.ReadResultFile(sinkPath) succeeds, treat it as success and return the captured stdout with a nil error (the process was intentionally killed because the verdict was already recorded);
  - surface the subprocess error only when no valid sink result exists.
- copilotEngine.Analyze, claudeEngine.Analyze, and codexEngine.Analyze (pkg/engine/engine.go:49-77) thread opts.ResultSinkPath into the new sink-aware run path. Each merges the provisioned env with its existing per-engine env (copilotEnv, pkg/engine/engine.go:130-135).

Assumption: Copilot (--allow-all-tools, pkg/engine/engine.go:96) and Codex (--dangerously-bypass-approvals-and-sandbox, pkg/engine/engine.go:149) can execute the wrapper. Whether claude --print (pkg/engine/engine.go:137-143) executes shell tools without an explicit --allowed-tools Bash grant is unverified; see Unresolved questions. The stdout-line fallback (change 6) covers any engine that cannot run the wrapper.

5. CLI wiring — cmd/threat-detect/main.go

- In cmd/threat-detect/main.go:144-157, create a sink path (e.g. os.CreateTemp("", "threat-detect-result-*.json")). Then remove the empty file, so that ReadResultFile succeeds only once the tool has written it, and defer cleanup of the path. Pass the path to analyzeWithRetries.
- analyzeWithRetries (cmd/threat-detect/main.go:160-182), on each attempt:
  - First remove any stale sink file, then call eng.Analyze(ctx, currentPrompt, engine.AnalyzeOptions{ResultSinkPath: sinkPath}).
  - Prefer the sink: if detector.ReadResultFile(sinkPath) succeeds, return that result. Otherwise fall back to detector.ParseResult(rawOutput).
  - If both fail, build the correction prompt exactly as today (pkg/detector/correction.go:22-25) and retry. The correction prompt text should instruct the model to call the tool again.

6. Prompt template — pkg/detector/prompts/threat_detection.md

Rewrite the "Response Format" section (pkg/detector/prompts/threat_detection.md:70-81) so that the primary instruction is to call the command exactly once.

Keep a clearly labeled fallback paragraph documenting the legacy single line THREAT_DETECTION_RESULT:{...} for engines without shell access. Preserve the existing boolean-type warning (pkg/detector/prompts/threat_detection.md:78-79).

Do not alter the triage prompt (pkg/detector/prompts/threat_detection_triage.md), which is non-agentic by design (lines 11-16).

7. Docs — README.md, DEVGUIDE.md

Document the following in the CLI/usage sections of README.md and in the maintainer notes in DEVGUIDE.md:
- report-result / the threat_detection_result tool;
- THREAT_DETECTION_RESULT_FILE;
- the early-termination behavior.

Acceptance criteria
- threat-detect report-result exists and is dispatched before global flag parsing (cmd/threat-detect/main.go).
- With THREAT_DETECTION_RESULT_FILE set, a valid report-result call:
  - writes canonical JSON (round-trips through detector.ParseStructuredResult);
  - exits 0 with a "recorded; stop now" message on stdout.
- An invalid report-result call (missing required boolean, wrong type, empty --reason while a threat is true):
  - exits non-zero;
  - prints a bounded, actionable THREAT_DETECTION_RESULT_ERROR: message;
  - does not create or modify the sink file.
- A second report-result call is idempotent: it does not overwrite the first recorded result and reports "already recorded; stop now".
- detector.WriteResultFile writes atomically (temp file + rename) with mode 0o600. detector.ReadResultFile returns a validated *Result or an error.
- The engine layer puts the threat_detection_result wrapper on PATH and sets THREAT_DETECTION_RESULT_FILE before invoking the engine CLI.
- analyzeWithRetries prefers the sink result over transcript scraping. It falls back to detector.ParseResult only when the sink is absent or invalid.
- The legacy THREAT_DETECTION_RESULT: transcript fallback still works: pkg/detector/result.go and all existing tests in pkg/detector/result_test.go remain unchanged and passing.
- The prompt instructs the model to call threat_detection_result exactly once and keeps the labeled line fallback (pkg/detector/prompts/threat_detection.md).
- make lint, make test, and scripts/create-threat-detection-sibling-workflows.py --check pass.

Tests / validation steps
- pkg/detector/result_test.go (or a new result_file_test.go):
  - WriteResultFile/ReadResultFile round-trip for safe and threat results;
  - ReadResultFile errors on missing, empty, or invalid files;
  - ValidateReportFields rejects wrong-type and missing fields with the same messages as validateRawResult (pkg/detector/result.go:143-174).
- cmd/threat-detect/report_test.go:
  - valid flags write the sink and exit 0;
  - invalid flags exit non-zero, emit THREAT_DETECTION_RESULT_ERROR, and leave no sink;
  - a missing THREAT_DETECTION_RESULT_FILE exits with the config code;
  - a second call is idempotent.
- cmd/threat-detect/main_test.go:
  - Update writeFakeCopilot (cmd/threat-detect/main_test.go:118-134) and/or add a helper so the fake engine writes a valid JSON verdict to $THREAT_DETECTION_RESULT_FILE (simulating the model calling the tool).
  - Assert that run() returns the sink-derived result even when stdout contains no THREAT_DETECTION_RESULT: line.
  - Add an early-termination test in which the fake engine writes the sink and then sleeps, and assert that the run completes well under the sleep duration.
- pkg/engine/engine_test.go:
  - provisionResultTool creates an executable wrapper and the expected PATH/THREAT_DETECTION_RESULT_FILE env;
  - watchResultSink cancels the context once a valid sink file appears;
  - the sink-aware run path returns success (nil error) when the process is killed but a valid sink result exists.
- Manual check: run make build, then a Copilot/Codex agentic detection against a fixture (e.g. testdata/detection-only/). Confirm that:
  - the model calls threat_detection_result;
  - the run terminates immediately on a valid call;
  - the JSON output matches the recorded sink.

Design decisions and rejected alternatives
- Subcommand instead of a new binary: the image already ships threat-detect (Dockerfile:14-30). A report-result subcommand reuses the existing schema/validation code and needs no new dependency or image change. The thin threat_detection_result wrapper gives the model a stable, prompt-friendly command name on PATH.
- Shell command instead of an MCP tool: built-in MCP servers are disabled (--disable-builtin-mcps, pkg/engine/engine.go:94), and tool-call wiring differs per engine. A shell command works uniformly wherever the engine can run shell commands (Copilot, Codex) and degrades gracefully to the line fallback elsewhere. An MCP-based result tool can be a follow-up.
- First write wins: idempotent recording removes the conflicting-results hard error (pkg/detector/result.go:117-119) for the tool path while keeping results stable and tamper-resistant.

Unresolved questions
- Does claude --print (pkg/engine/engine.go:137-143) execute shell tools (so it can call the wrapper) without an explicit --allowed-tools Bash grant? If not, decide whether to add that grant or rely solely on the line fallback for Claude.
- Should report-result accept a full JSON object (--json / stdin) in addition to discrete flags, for engines that find JSON easier to emit than multiple flags?
- Does the gh-aw integration parse the detector's own stdout for THREAT_DETECTION_RESULT: (rather than consuming the --output JSON / exit code)? The detector emits canonical JSON via writeResult (cmd/threat-detect/main.go:184-205) and the recorded sink is internal, so this is expected to be safe, but confirm before relying on it.
- The polling interval for watchResultSink (e.g. 200–300 ms), and whether to use filesystem notifications instead of polling.

Implications
- Security (specs/threat-detection-spec.md:248): the verdict is recorded to a detector-written file rather than read back out of the same transcript that contains the analyzed agent content.
- The Engine interface signature changes (Analyze gains AnalyzeOptions); all three engine implementations and their tests must be updated in lockstep.

Out of scope or notes
- The /reflect structured-output triage and full path (pkg/engine/reflect.go, cmd/threat-detect/main.go:94-142) are unchanged; they already use a synchronous, validated schema contract.
- The triage prompt (pkg/detector/prompts/threat_detection_triage.md) stays non-agentic and tool-free (lines 11-16).
- No changes to … (specs/threat-detection-spec.md:80-89, specs/threat-detection-spec.md:176-181), or to the release/lifecycle machinery.
- No new binary or image change: the threat_detection_result wrapper and the report-result subcommand both use the already-shipped threat-detect binary.

Steps to take after merging
- Run scripts/create-threat-detection-sibling-workflows.py, then run it again with --check.
- Decide separately whether to deprecate the THREAT_DETECTION_RESULT: fallback (a spec/version-compatibility decision under specs/threat-detection-spec.md).
- Coordinate with the gh-aw integration owners before changing or removing the line-based contract, per the version-coupling guidance in README.md.