Contracts and Schemas
This document defines the canonical contracts for diffsan: schemas, artifact files, events, and error codes.
Agents should treat these as authoritative. If you change these, update this doc and the corresponding code.
Artifact directory and files¶
All runs create a workdir (default name TBD, e.g. .diffsan/).
Minimum artifacts (MVP v0):
run.json— final status + error info (always written)events.jsonl— structured events (always written)diff.raw.patch— raw diff text (as obtained)diff.prepared.patch— ignored/prioritized/truncated/redacted diff used in prompttruncation.json— truncation reportredaction.json— secret scanning/redaction reportprior_digest.json— extracted digest from prior bot summary notes + inline discussions (if any)prompt.txt— prompt passed to agentagent.raw.txt— raw agent stdout (may be invalid JSON)agent.raw.attemptN.txt— per-attempt raw agent stdout for retry debuggingreview.json— validatedReviewOutputJSONpost_plan.json— what we intended to post (summary + discussions)post_results.json— posting outcomes (IDs/errors)
Core schemas (Pydantic)¶
AppConfig (merged config)¶
Fields are grouped for clarity. Exact defaults are implementation-defined but should be opinionated and reasonable.
agent.agent is an enum with values: cursor or codex.
{
"workdir": ".diffsan",
"note_timezone": "system-local",
"mode": { "ci": false },
"limits": {
"max_diff_chars": 200000,
"max_files": 60,
"max_hunks_per_file": 40
},
"truncation": {
"priority_extensions": [
".py",
".js",
".ts",
".go",
".java",
".rb",
".php",
".rs"
],
"depriority_extensions": [".md", ".rst", ".txt", ".lock"],
"include_extensions": null,
"ignore_globs": ["docs/**", "**/*.generated.*"]
},
"secrets": {
"enabled": true,
"extra_patterns": [],
"post_warning_to_mr": true
},
"skip": {
"skip_on_auto_merge": true,
"skip_on_same_fingerprint": true
},
"agent": {
"agent": "cursor",
"cursor_command": null,
"codex_command": null,
"proxy_url": null,
"max_json_retries": 3,
"json_repair_prompt": "Return ONLY valid JSON that matches the schema.",
"verbosity": "medium",
"skills": [],
"prompt_template": null
},
"gitlab": {
"enabled": true,
"base_url": "https://gitlab.com",
"project_id": "12345",
"mr_iid": 67,
"token_env": "GITLAB_TOKEN",
"idempotent_summary": false,
"summary_note_tag": "ai-reviewer",
"retry_max": 3
},
"logging": { "level": "info", "structured": true }
}
agent.proxy_url is Codex-only. If it is set while agent.agent != "codex", config validation must fail.
DiffBundle (diff acquisition)¶
{
"source": {
"kind": "git-diff",
"ref": {
"target_branch": "main",
"source_branch": "feature-1",
"base_sha": "aaa111",
"head_sha": "bbb222"
}
},
"raw_diff": "diff --git a/... b/...\n...",
"files": [
{
"path": "app/auth.py",
"additions": 10,
"deletions": 2,
"is_binary": false
}
]
}
TruncationReport¶
{
"truncated": true,
"original_chars": 340000,
"final_chars": 200000,
"original_files": 120,
"final_files": 60,
"items": [
{
"kind": "file",
"path": "docs/guide.md",
"details": "Dropped (deprioritized extension)"
},
{
"kind": "chars",
"path": null,
"details": "Stopped at max_diff_chars=200000"
}
]
}
RedactionReport¶
Important: never store raw secrets. Only hashes/lengths.
{
"enabled": true,
"found": true,
"matches": [
{
"pattern_name": "AWS_ACCESS_KEY_ID",
"path": "app/config.py",
"line_hint": 12,
"match_sha256": "3d5e...",
"match_length": 20
}
],
"redaction_token": "[REDACTED]"
}
PreparedDiff¶
{
"prepared_diff": "diff --git a/... b/...\n... [REDACTED] ...",
"truncation": { "...": "see TruncationReport" },
"redaction": { "...": "see RedactionReport" },
"ignored_paths": ["docs/guide.md"],
"included_paths": ["app/auth.py", "app/config.py"]
}
Fingerprint¶
PriorDigest (compact memory to avoid repeats)¶
{
"prior_fingerprint": { "algo": "sha256", "value": "1111..." },
"findings": [
{
"finding_id": "f-9c21...",
"path": "app/auth.py",
"line_range": "88-106",
"title": "Avoid eval() on user input",
"severity": "high"
}
],
"summary_hint": "Previously flagged unsafe eval and missing tests.",
"summaries": [
{
"note_id": 98765,
"text": "### AI Review Summary\n- Prior summary markdown..."
}
],
"inline_comments": [
{
"discussion_id": "a1b2c3",
"note_id": 24680,
"path": "app/auth.py",
"line": 95,
"resolved": false,
"body": "Please avoid eval() here."
}
]
}
SkipDecision¶
{
"should_skip": false,
"reasons": [],
"fingerprint": { "algo": "sha256", "value": "b7d4..." },
"prior_digest": { "...": "see PriorDigest" }
}
Agent contracts¶
AgentRequest (internal)¶
Contains the prompt plus metadata used for artifacts/logging. - For cursor, the prompt includes strict JSON-only rules and embedded schema. - For codex, prompt omits JSON-only/schema sections because structured output is enforced by CLI flags.
{
"prompt": "You are diffsan, an AI code reviewer...\nReturn ONLY JSON...\nSCHEMA: ...\nDIFF:\n...",
"meta": {
"fingerprint": { "algo": "sha256", "value": "b7d4..." },
"truncation": { "truncated": true, "...": "..." },
"redaction_found": false,
"agent": "cursor",
"verbosity": "high",
"skills": ["security", "python"]
}
}
AgentReviewOutput (agent must emit this JSON)¶
- Cursor path is unstructured, so diffsan validates and may retry/repair.
- Codex path uses
--output-schemastructured output, and diffsan still validates before continuing.
{
"summary_markdown": "### AI Review Summary\n- ...\n\n<details><summary>Truncation</summary>\n...\n</details>",
"findings": [
{
"finding_id": "optional",
"severity": "high",
"category": "security",
"path": "app/auth.py",
"line_start": 88,
"line_end": 106,
"body_markdown": "User-controlled input is passed to `eval()`.",
"suggested_patch": {
"format": "unified-diff",
"content": "--- a/app/auth.py\n+++ b/app/auth.py\n@@\n- eval(expr)\n+ ast.literal_eval(expr)\n"
}
}
]
}
ReviewOutput (final validated artifact written by diffsan)¶
diffsan composes ReviewOutput after parsing AgentReviewOutput, and populates meta
outside the agent using runtime context:
fingerprintfromsha256(raw diff)agentfrom runtime configtimingsfrom agent execution timingtoken_usagebest-effort (empty if unavailable)truncatedandredaction_foundfrom preprocessor results
Finding fields (minimum required)¶
Each finding must include:
severity: info|low|medium|high|criticalcategory: correctness|security|performance|maintainability|style|testing|docs|otherpath: file pathline_start,line_end: integer line numbersbody_markdown: text- optional
suggested_patch
Posting contracts (internal)¶
PostPlan¶
{
"summary_markdown": "### AI Review Summary\n...",
"summary_meta_collapsible": "<details><summary>Metadata</summary>\n...\n</details>",
"discussions": [
{
"path": "app/auth.py",
"line_start": 88,
"line_end": 106,
"body_markdown": "**[security/high]** ...",
"position": {
"position_type": "text",
"base_sha": "aaa111",
"head_sha": "bbb222",
"start_sha": "aaa111",
"new_path": "app/auth.py",
"new_line": 95
},
"severity": "high",
"category": "security"
}
],
"idempotent_summary": false,
"prior_summary_note_id": null
}
PostResults¶
{
"ok": true,
"items": [
{
"kind": "summary_note",
"ok": true,
"http_status": 201,
"gitlab_id": 987654,
"retry_count": 0
},
{
"kind": "discussion",
"ok": true,
"http_status": 201,
"gitlab_id": 1234567,
"retry_count": 0
}
]
}
Summary note body should include:
- tag marker:
<!-- diffsan:<summary_note_tag> --> - fingerprint marker:
<!-- diffsan:fingerprint:<algo>:<value> --> - prior digest marker (when available):
<!-- diffsan:prior_digest:<base64-json> -->
Structured events (events.jsonl)¶
Each line is a JSON object:
{
"ts": "2026-02-10T12:01:00Z",
"level": "info",
"event": "diff.fetched",
"data": { "chars": 18342, "files": 12 }
}
Suggested events (MVP):
run.started,run.finishedconfig.loadeddiff.fetcheddiff.prepared(includes truncated/redaction flags)skip.decided(reasons)prompt.writtenagent.attemptreview.validatedpost.plan_builtgitlab.post.summarygitlab.post.discussionerror.raised
Errors (run.json + exceptions)¶
ErrorInfo (stored in run.json)¶
{
"error_code": "AGENT_OUTPUT_INVALID",
"message": "Failed to obtain valid JSON output after 3 attempts",
"retryable": false,
"context": { "attempts": 3, "agent": "cursor" },
"cause": "pydantic.ValidationError: ..."
}
The attempts field is typically present for cursor retry exhaustion and may be absent for single-attempt codex failures.
Suggested error codes¶
CONFIG_PARSE_ERRORDIFF_FETCH_FAILEDDIFF_PARSE_FAILEDREDACTION_ENGINE_FAILEDGITLAB_FETCH_PRIOR_FAILEDAGENT_EXEC_FAILEDAGENT_OUTPUT_INVALIDFORMAT_FAILEDGITLAB_AUTH_ERRORGITLAB_POSITION_INVALIDGITLAB_POST_FAILED
Contract change rules¶
If any schema, artifact name, event name, or error code changes:
- Update this doc (
02-contracts-and-schemas.md) - Update code in
src/diffsan/contracts/*