Contribution type
Feature request
What do you want to contribute?
Implement Stage 4 of the threat detection extraction: comprehensive testing and validation of the containerized threat detection integration after github/gh-aw switches to the external github/gh-aw-threat-detection container.
Stages 1 and 2 are complete. This issue tracks the post-integration testing work for Stage 4, including spec compliance, integration tests, regression tests, and end-to-end workflow validation.
Agent analysis and findings
Issue #4 defined Stage 4 as extensive testing after the parent repository is updated. The original testing scope included:
- spec compliance for TD-01 through TD-14
- integration tests using existing test workflows
- regression tests for existing threat detection behavior
- end-to-end workflow compilation and execution with container detection enabled
The standalone component now has local tests and container smoke validation, but Stage 4 needs to validate the composed system across repository boundaries:
- `github/gh-aw-threat-detection` provides the CLI/container contract.
- `github/gh-aw` compiles workflows that invoke the container.
- GitHub Actions executes the compiled workflows with realistic artifacts.
- Safe outputs remain blocked or allowed based on threat detection results.
Relevant existing test areas from the original issue:
- `pkg/workflow/threat_detection_test.go`
- `pkg/workflow/threat_detection_file_access_test.go`
- `pkg/workflow/threat_detection_isolation_test.go`
- `pkg/workflow/cache_memory_threat_detection_test.go`
- `pkg/workflow/safe_jobs_threat_detection_test.go`
- `pkg/workflow/detection_success_test.go`
- `pkg/cli/workflows/test-ollama-threat-detection.md`
- `specs/security-architecture-spec.md` section 9, TD-01 through TD-14
- the extracted `specs/threat-detection-spec.md` in this repository
Stage 4 should happen after, or alongside the final PR for, Stage 3 because it depends on the parent repo generating container-based detection workflows.
Use case and expected behavior
As a maintainer, I want comprehensive validation of the containerized threat detection flow so that replacing inline detection does not weaken safe-output protections, break workflow compilation, or regress documented security requirements.
Expected behavior after implementation:
- Containerized detection satisfies the extracted threat detection specification.
- Parent-repo workflow compilation tests cover the new container-based detection job.
- Existing threat detection regression tests pass after being updated for the new architecture.
- End-to-end workflow tests demonstrate safe, threatened, and infrastructure-error outcomes.
- Safe output jobs are allowed only when detection succeeds safely.
- Test coverage includes artifact handling, patch/bundle handling, cache/comment memory interactions, and detection result parsing behavior.
Complete step-by-step agentic plan
- Read the extracted specification in this repository:
  - `specs/threat-detection-spec.md`
- Read the original parent specification requirements:
  - `github/gh-aw` `specs/security-architecture-spec.md` section 9, TD-01 through TD-14
- Build a Stage 4 test matrix that maps each TD requirement to at least one validation method:
  - unit test
  - integration test
  - workflow compilation/golden test
  - end-to-end workflow execution test
  - manual validation if automation is not feasible
- In `github/gh-aw-threat-detection`, verify standalone component coverage:
  - CLI result parsing
  - prompt construction
  - artifact discovery
  - exit code behavior
  - Docker smoke behavior
  - supported engine command construction
- In `github/gh-aw`, update existing threat detection tests for the containerized workflow shape:
  - detection job generation
  - safe output dependency wiring
  - lock/schema metadata
  - Docker image pre-pull behavior
  - artifact upload/download behavior
- Add or update integration tests for representative workflows:
  - threat detection enabled with safe output allowed
  - threat detection enabled with threat detected
  - threat detection enabled with infrastructure/configuration failure
  - threat detection disabled
  - workflows with patches/bundles
  - workflows with comment memory/cache memory
- Add or update end-to-end workflow execution tests:
  - run the compiled workflow or a representative test workflow against the published/pinned container image
  - use deterministic or stubbed engine behavior where possible
  - verify detection JSON output and safe output gating
- Validate the Ollama/LlamaGuard path if it remains supported:
  - update `pkg/cli/workflows/test-ollama-threat-detection.md` or equivalent tests
  - confirm whether Ollama remains a custom-step pattern or needs a dedicated test path
- Add regression tests for compatibility-sensitive behavior:
  - artifact names
  - legacy detection artifact names if still supported
  - exit code mapping
  - safe output job conditions
  - generated workflow `needs:` ordering
- Run validation suites:
  - `make test` in this repository
  - `make docker-smoke` in this repository
  - parent-repo Go tests
  - parent-repo workflow/golden tests
  - any available end-to-end workflow tests
- Document the Stage 4 validation status:
  - list automated coverage
  - list manual checks, if any
  - link any remaining gaps to follow-up issues
Specific implementation details and examples
Suggested Stage 4 requirement-to-test matrix:
| Area | Validation |
| --- | --- |
| Prompt construction | Unit tests in this repository |
| Result parsing | Unit tests in this repository |
| Artifact discovery | Unit tests in this repository plus parent integration tests |
| Container invocation | Parent workflow generation/golden tests |
| Exit codes | CLI tests and parent workflow behavior tests |
| Safe output gating | Parent safe-output dependency tests and E2E tests |
| Lock metadata | Parent lock/schema tests |
| Docker pre-pull | Parent Docker image collection tests |
| Spec TD-01 through TD-14 | Compliance checklist mapped to automated tests |
Representative scenarios to cover:
- Safe result:

      {
        "prompt_injection": false,
        "secret_leak": false,
        "malicious_patch": false,
        "reasons": []
      }

- Threat detected:

      {
        "prompt_injection": true,
        "secret_leak": false,
        "malicious_patch": false,
        "reasons": ["Prompt attempts to override system instructions"]
      }

- Infrastructure error:
  - missing artifacts directory
  - unsupported engine
  - engine CLI missing or failing
  - invalid model output without `THREAT_DETECTION_RESULT`
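The three scenario classes above can be exercised with a small fail-closed classifier. The sketch below is illustrative only: the JSON field names come from the examples above, but the assumption that the result appears as a single `THREAT_DETECTION_RESULT:` line followed by JSON, along with the `Outcome` type and function names, is hypothetical and should be checked against `specs/threat-detection-spec.md`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Result mirrors the JSON fields shown in the scenarios above.
type Result struct {
	PromptInjection bool     `json:"prompt_injection"`
	SecretLeak      bool     `json:"secret_leak"`
	MaliciousPatch  bool     `json:"malicious_patch"`
	Reasons         []string `json:"reasons"`
}

// Outcome is a hypothetical three-way classification used for gating.
type Outcome string

const (
	Safe       Outcome = "safe"
	Threat     Outcome = "threat"
	InfraError Outcome = "infrastructure-error"
)

// marker uses the token named in this issue; the "marker:JSON on one
// line" layout is an assumption, not the documented contract.
const marker = "THREAT_DETECTION_RESULT:"

// parseResult finds the first marker line and decodes the JSON after it.
func parseResult(output string) (Result, error) {
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, marker) {
			continue
		}
		var r Result
		if err := json.Unmarshal([]byte(strings.TrimPrefix(line, marker)), &r); err != nil {
			return Result{}, fmt.Errorf("decode result: %w", err)
		}
		return r, nil
	}
	return Result{}, errors.New("no THREAT_DETECTION_RESULT line in output")
}

// classify fails closed: anything unparseable is an infrastructure error.
func classify(output string) Outcome {
	r, err := parseResult(output)
	switch {
	case err != nil:
		return InfraError
	case r.PromptInjection || r.SecretLeak || r.MaliciousPatch:
		return Threat
	default:
		return Safe
	}
}

// safeOutputsAllowed permits safe outputs only for a clean result.
func safeOutputsAllowed(o Outcome) bool { return o == Safe }

func main() {
	outputs := []string{
		`THREAT_DETECTION_RESULT:{"prompt_injection":false,"secret_leak":false,"malicious_patch":false,"reasons":[]}`,
		`THREAT_DETECTION_RESULT:{"prompt_injection":true,"secret_leak":false,"malicious_patch":false,"reasons":["Prompt attempts to override system instructions"]}`,
		`the model replied without a result line`,
	}
	for _, out := range outputs {
		o := classify(out)
		fmt.Printf("%s allowed=%t\n", o, safeOutputsAllowed(o))
	}
}
```

Treating a missing or malformed result as an infrastructure error, rather than as "safe", matches the expected behavior that safe output jobs are allowed only when detection succeeds safely.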
Relevant files/areas:
- this repository:
  - `specs/threat-detection-spec.md`
  - `cmd/threat-detect/main.go`
  - `pkg/artifacts/`
  - `pkg/detector/`
  - `pkg/engine/`
  - `Dockerfile`
  - `.github/workflows/ci.yml`
  - `.github/workflows/release.yml`
- parent `github/gh-aw` repository:
  - threat detection workflow tests
  - safe output tests
  - workflow compiler/golden tests
  - workflow execution tests
  - security architecture spec tests, if present
Acceptance criteria:
- Stage 4 has a documented test matrix for TD-01 through TD-14.
- Automated tests cover the critical containerized detection paths.
- Existing parent-repo threat detection regression tests are updated and pass.
- End-to-end validation covers safe, threat, and infrastructure-error outcomes.
- Safe output gating is verified with containerized detection enabled.
- Any remaining manual validation or unsupported edge cases are documented in follow-up issues.
Validation ideas:
- Run `make test` and `make docker-smoke` in this repository.
- Run all updated parent-repo tests.
- Compile representative workflows and inspect generated YAML for the pinned container image.
- Execute at least one representative workflow path using deterministic engine output or a stubbed engine.
- Confirm tests fail if the detection container exits with threat or infrastructure status unexpectedly.