Architecture
Overview
diffsan is a single-process (monolithic) CLI with clear internal module boundaries. It is designed for:
- strong debuggability (artifacts always written),
- robust agent handling (Cursor retry/repair path and Codex structured-output path),
- easy extension later (additional skip rules, agents, GitHub support).
The monolith is structured as a pipeline of modules with contracts defined in 02-contracts-and-schemas.md.
High-level components (internal modules)
- ConfigLoader: merge defaults + repo config + env + CLI args
- DiffProvider: obtain MR diff (CI path is primary)
- Preprocessor: ignore/prioritize/truncate + secret scan/redact
- Fingerprinting: sha256(raw diff), deterministic finding IDs (optional)
- PriorDigestResolver: fetch prior bot summary notes + inline discussions and extract digest
- SkipEngine: decide whether to skip (MVP: auto-merge)
- PromptBuilder: build agent prompt and inject diff + digest + flags (schema/rules are agent-dependent)
- AgentRunner (Cursor/Codex): run the selected agent CLI; Cursor uses a retry/repair loop, Codex uses a single structured-output attempt
- Parser/Validator: parse agent output as strict JSON and validate it with Pydantic
- Formatter: render summary markdown + collapsible metadata, applying truncation where needed
- GitLabPoster: post summary note and inline discussions with retries
- Artifacts/Events: write prompt/output/review + structured events JSONL
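ConfigLoader's merge of defaults, repo config, environment, and CLI arguments can be sketched as a left-to-right dictionary merge. This is a minimal sketch, not diffsan's actual loader: it assumes the precedence defaults < repo config < env < CLI args (a common convention the overview does not state), flat keys, and a hypothetical DIFFSAN_ environment prefix. The option names in DEFAULTS are illustrative as well.

```python
from typing import Any, Mapping

# Illustrative defaults; real option names come from diffsan's config schema.
DEFAULTS: dict[str, Any] = {
    "agent": "cursor",
    "max_diff_chars": 200_000,
    "verbosity": "normal",
}

ENV_PREFIX = "DIFFSAN_"  # hypothetical prefix


def env_layer(environ: Mapping[str, str]) -> dict[str, str]:
    """Turn DIFFSAN_* variables into lower-case keys; values stay strings."""
    return {
        key[len(ENV_PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX)
    }


def merge_config(*layers: Mapping[str, Any]) -> dict[str, Any]:
    """Merge layers left to right: later layers win; None means 'not set'."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


# Usage: pass os.environ instead of the literal dict in a real run.
repo_cfg = {"verbosity": "low"}
cli_args = {"agent": "codex", "max_diff_chars": None}  # None: flag not given
config = merge_config(
    DEFAULTS, repo_cfg, env_layer({"DIFFSAN_VERBOSITY": "high"}), cli_args
)
# config == {"agent": "codex", "max_diff_chars": 200000, "verbosity": "high"}
```

A shallow update is enough for flat keys; nested sections would need a recursive merge, and string values taken from the environment would still need type coercion before use.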
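The Preprocessor's secret scan/redact step can be illustrated with a small regex pass that replaces matches and records only per-rule counts, which is the kind of information redaction.json could hold without leaking the secrets themselves. The three patterns, the [REDACTED] placeholder, and the report shape are assumptions for illustration; a real scanner needs a much larger, tested rule set.

```python
import re
from dataclasses import dataclass, field

# Illustrative rules only; a production scanner needs a larger, tested rule set.
SECRET_PATTERNS: dict[str, re.Pattern] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "gitlab_pat": re.compile(r"\bglpat-[0-9A-Za-z_\-]{20,}"),
    # Redact the whole PEM block, not just its header line.
    "private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
}


@dataclass
class RedactionReport:
    # Match counts per rule; the secret values themselves are never recorded.
    counts: dict[str, int] = field(default_factory=dict)


def redact(diff_text: str, placeholder: str = "[REDACTED]") -> tuple[str, RedactionReport]:
    """Replace every rule match with the placeholder and count matches per rule."""
    report = RedactionReport()
    for name, pattern in SECRET_PATTERNS.items():
        diff_text, n = pattern.subn(placeholder, diff_text)
        if n:
            report.counts[name] = n
    return diff_text, report
```

Because redaction runs before the prompt is built, only the redacted text ever reaches the agent.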
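Fingerprinting can be sketched with hashlib. diff_fingerprint follows the overview directly (sha256 over the raw diff); finding_id is a hypothetical take on the optional deterministic finding IDs, assuming a finding has a file path, a rule name, and a title. Leaving line numbers out of the hash is a design assumption, chosen so an unchanged finding keeps its ID when unrelated edits shift lines.

```python
import hashlib


def diff_fingerprint(raw_diff: bytes) -> str:
    """sha256 over the raw diff, i.e. before ignore/truncate/redact."""
    return hashlib.sha256(raw_diff).hexdigest()


def finding_id(path: str, rule: str, title: str, length: int = 12) -> str:
    """Hypothetical deterministic ID built from stable finding fields.

    Line numbers are deliberately excluded so the ID survives unrelated edits
    that shift lines; titles are case- and whitespace-normalized.
    """
    normalized_title = " ".join(title.lower().split())
    # \x1f (unit separator) keeps ("a", "bc") and ("ab", "c") from colliding.
    key = "\x1f".join((path, rule, normalized_title))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]
```

Truncating to 12 hex characters keeps IDs short in posted notes while leaving collisions within a single MR very unlikely.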
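The Formatter's job of combining summary markdown, a collapsible metadata block, and truncation might look like the sketch below. It relies on the HTML details/summary elements, which GitLab-flavored Markdown renders as a collapsible section. The max_chars budget, the truncation marker text, and the choice to keep metadata whole while cutting only the body are assumptions.

```python
import json

FENCE = "`" * 3  # avoids a literal triple backtick inside this example


def render_summary(body_md: str, metadata: dict, max_chars: int = 50_000) -> str:
    """Summary markdown plus a collapsible JSON metadata block, within max_chars.

    Only the body is truncated; the metadata block is kept whole. If the
    metadata alone exceeds max_chars, the result can still be longer.
    """
    details = (
        "\n\n<details><summary>diffsan metadata</summary>\n\n"
        f"{FENCE}json\n{json.dumps(metadata, indent=2, sort_keys=True)}\n{FENCE}\n\n"
        "</details>\n"
    )
    marker = "\n\n_(summary truncated)_"
    budget = max_chars - len(details)
    if len(body_md) > budget:
        body_md = body_md[: max(budget - len(marker), 0)] + marker
    return body_md + details
```

Sorting metadata keys keeps the rendered note byte-stable across runs with identical input, which makes diffs of post_plan.json easier to read.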
Data flow (CI mode)
- load_config() -> AppConfig
- get_diff() -> DiffBundle + write diff.raw.patch
- prepare_diff() -> PreparedDiff + write diff.prepared.patch, truncation.json, redaction.json
- compute_fingerprint(raw_diff) -> Fingerprint
- get_prior_digest() -> PriorDigest | None + write prior_digest.json
- decide_skip() -> SkipDecision
  - if skip: write run.json with ok=true and the skip reason; exit 0
- build_agent_request() -> AgentRequest + write prompt.txt
- run_agent() -> AgentRawResponse + ReviewOutput
  - cursor: retry/repair loop (run_agent_with_retries())
  - codex: single-attempt structured run (run_codex_once())
  - write agent.raw.txt (and optionally per-attempt outputs)
- validate_review() -> ReviewOutput + write review.json
- build_post_plan() -> PostPlan + write post_plan.json
- post_to_gitlab() -> PostResults + write post_results.json
- Write run.json and events.jsonl throughout
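The CI data flow can be condensed into a driver that writes artifacts as it goes and always writes run.json, including on the skip path and on failure. This is a simplified sketch: the names run_ci and Artifacts are hypothetical, stages are injected as plain callables so the example is self-contained, and prompt building, the agent run, and validation are collapsed into a single review callable. The artifact file names are the ones used in the data flow.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class Artifacts:
    """Writes named artifacts plus an append-only events.jsonl (layout assumed)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def write(self, name: str, content: str) -> None:
        (self.root / name).write_text(content, encoding="utf-8")

    def event(self, kind: str, **fields: object) -> None:
        record = {"ts": time.time(), "event": kind, **fields}
        with (self.root / "events.jsonl").open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")


def run_ci(
    art: Artifacts,
    get_diff: Callable[[], str],
    decide_skip: Callable[[str], Optional[str]],  # skip reason, or None
    review: Callable[[str], dict],  # stands in for prompt -> agent -> validate
    post: Callable[[dict], None],
) -> int:
    run: dict = {"ok": False}
    try:
        diff = get_diff()
        art.write("diff.raw.patch", diff)
        art.event("diff_loaded", chars=len(diff))
        reason = decide_skip(diff)
        if reason is not None:
            run.update(ok=True, skipped=True, skip_reason=reason)
            return 0
        result = review(diff)
        art.write("review.json", json.dumps(result, indent=2))
        post(result)
        run["ok"] = True
        return 0
    except Exception as exc:  # any stage failure: non-zero exit, artifacts kept
        run["error"] = f"{type(exc).__name__}: {exc}"
        art.event("failed", error=run["error"])
        return 1
    finally:
        # Runs on every return path and after exceptions, so run.json always exists.
        art.write("run.json", json.dumps(run, indent=2))
```

Putting the run.json write in a finally block makes the "artifacts are written even on failure" invariant hold for every exit path without repeating the write in each branch.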
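The Cursor retry/repair loop (run_agent_with_retries() in the flow) can be sketched as: call the agent, try to parse and validate its output, and on failure re-prompt with the error and the previous reply. The helper names, the three-attempt default, the repair-prompt wording, and the REQUIRED_KEYS check (a stand-in for the Pydantic model) are assumptions. The Codex path would call the agent once and skip the loop.

```python
import json
from typing import Callable


class AgentOutputError(Exception):
    """Raised when no attempt produced valid output (hypothetical)."""


REQUIRED_KEYS = {"summary", "findings"}  # assumed subset of the real schema


def parse_review(raw: str) -> dict:
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def run_with_repair(
    call_agent: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> tuple[dict, list[str]]:
    """Return the parsed review and every raw attempt (for per-attempt artifacts)."""
    attempts: list[str] = []
    current = prompt
    for _ in range(max_attempts):
        raw = call_agent(current)
        attempts.append(raw)
        try:
            return parse_review(raw), attempts
        except ValueError as err:
            # Repair prompt: original request, the validation error, the bad reply.
            current = (
                f"{prompt}\n\nYour previous reply was not valid: {err}\n"
                f"Previous reply:\n{raw}\n\nReply again with a single JSON object only."
            )
    raise AgentOutputError(f"no valid JSON after {max_attempts} attempts")
```

Returning every raw attempt alongside the result lets the caller write agent.raw.txt and optional per-attempt outputs even when a later attempt succeeds.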
Standalone mode is minimal:
- acquire diff locally (simple)
- run agent and validate
- print summary to stdout
- no GitLab posting
Invariants (must hold)
- Artifacts must be written even on failure (prompt/raw output/review when available).
- Secret redaction occurs before prompting.
- Agent output must be validated as strict JSON before posting.
- Cursor path requires repair retries; Codex path uses single-attempt structured output.
- Avoid spam: verbosity is configurable, a compact prior digest is injected into the prompt, and prior findings are not repeated.
- The tool exits non-zero on failure (the CI job can be configured with allow_failure so a failed review does not block the pipeline).
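Beyond injecting the prior digest into the prompt, a post-processing filter is one way to enforce "avoid repeating prior findings". This sketch is a supplementary technique, not the documented mechanism: it assumes deterministic finding IDs are enabled (they are marked optional) and that the prior digest can be reduced to a set of those IDs. The Finding fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Finding:
    id: str  # deterministic finding ID (hypothetical field name)
    path: str
    title: str


def new_findings(findings: Iterable[Finding], prior_ids: Optional[set[str]]) -> list[Finding]:
    """Keep only findings whose IDs were not in the prior digest.

    prior_ids is None when no prior digest exists (first run), so nothing
    is filtered. Duplicate IDs within the current run are also dropped.
    """
    seen: set[str] = set(prior_ids or ())
    kept: list[Finding] = []
    for finding in findings:
        if finding.id not in seen:
            seen.add(finding.id)
            kept.append(finding)
    return kept
```

Filtering after validation keeps the guarantee independent of whether the agent actually honored the digest in its prompt.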
Extensibility points (future)
- Additional agents beyond Cursor/Codex via adapter modules
- Additional forges (GitHub) by swapping posting client and MR context
- Additional skip rules (draft/WIP, docs-only, etc.)
- More sophisticated diff selection/truncation (risk-based sampling)
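An adapter module for an additional agent could plug in through a small registry, sketched here with typing.Protocol. The AgentAdapter protocol, the register decorator, and the toy EchoAdapter are hypothetical; a real adapter would invoke its agent CLI (for example via subprocess) and declare whether it needs the retry/repair loop.

```python
from typing import Callable, Protocol


class AgentAdapter(Protocol):
    name: str
    uses_repair_loop: bool  # Cursor-style retries vs. one structured attempt

    def run(self, prompt: str) -> str:
        """Run the agent CLI and return its raw output."""
        ...


# Maps an agent name (e.g. from config) to a zero-argument adapter factory.
_REGISTRY: dict[str, Callable[[], AgentAdapter]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that makes an adapter selectable by name."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_adapter(name: str) -> AgentAdapter:
    factory = _REGISTRY.get(name)
    if factory is None:
        raise ValueError(f"unknown agent {name!r}; known: {sorted(_REGISTRY)}")
    return factory()


@register("echo")
class EchoAdapter:
    """Toy adapter; a real one would invoke its agent CLI, e.g. via subprocess."""

    name = "echo"
    uses_repair_loop = False

    def run(self, prompt: str) -> str:
        return '{"summary": "echo", "findings": []}'
```

The same pattern would carry over to forges: a posting-client protocol with a GitLab implementation now and a GitHub one later, selected by name from config.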