SENDA
Software Engineering with Natively Directed Agents
A complete framework for software teams operating with AI agents — covering roles, lifecycle phases, autonomy levels, governance metrics, and delivery standards.
5 Lifecycle Phases
5 Autonomy Levels
5 Defined Roles
5 Governance Metrics
The questions existing methodologies can't answer
Scrum, SAFe, and Kanban were designed for human-paced development. When AI agents enter the loop, every assumption breaks. Teams are left improvising governance in production.
Who reviews AI-generated code?
Existing methodologies assume developers write every line. When 70% of a sprint's output is AI-generated, code review becomes a bottleneck that no one is formally accountable for.
How do you measure quality when speed is infinite?
Velocity metrics collapse when an agent can generate 2,000 lines in an hour. Story points, burn-down charts, and throughput KPIs all become meaningless without a new contract with reality.
Who owns the technical debt AI introduces?
AI agents optimize for the immediate task. They hallucinate dependencies, ignore long-term architecture, and produce code that passes tests but violates implicit constraints no prompt ever captured.
Where do you draw the boundary between AI autonomy and human control?
Without a formal autonomy framework, teams either under-leverage AI (defaulting to manual everything) or over-trust it (shipping unreviewed output). Both failure modes are expensive.
The core philosophy
SENDA is built on three axioms that do not change regardless of which AI tools, which models, or which delivery framework you operate within.
01. Humans decide, agents execute.
Every architectural decision, every design trade-off, every acceptance of technical risk — these remain human responsibilities. AI agents are execution engines, not decision-makers. The moment you confuse the two, you have lost governance.
02. Every line of AI-generated code has a human owner.
Ownership is not the same as authorship. A developer who reviews, approves, and commits an AI-generated function owns that function with the same accountability as if they wrote it. There is no 'the AI did it' defense.
03. Speed is not productivity. Shipped quality is productivity.
AI agents can generate code faster than any human team. That speed is worthless — and actively dangerous — if it ships defects, accumulates unreviewed debt, or outpaces the team's ability to understand what it has built.
The SENDA Lifecycle
Five sequential phases, each with explicit entry conditions, defined outputs, and named human owners.
SCOPE
Replaces Sprint Planning
Define what the team is building, what the AI is allowed to generate, and what the boundaries of acceptable output are. SCOPE produces a Context Document that lives in the repository as code (/docs/context/*.md) — not a static PDF. It is version-controlled, injectable into agent system prompts, and triggers automatic alerts when updated. Without a Context Document, agents generate plausible code for the wrong problem.
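As a minimal sketch of what "injectable into agent system prompts" could look like, the Python below concatenates every Markdown file under docs/context/ onto a base prompt. The function name `build_system_prompt` and the "## Context:" section header are illustrative assumptions, not part of SENDA; only the /docs/context/*.md location comes from the text above.

```python
from pathlib import Path


def build_system_prompt(repo_root: str, base_prompt: str) -> str:
    """Append every Context Document under docs/context/ to a base system prompt.

    Hypothetical helper: the name and the section-header format are assumptions.
    """
    context_dir = Path(repo_root) / "docs" / "context"
    # Sort so the prompt is identical across runs for the same repository state.
    docs = sorted(context_dir.glob("*.md"))
    if not docs:
        # No Context Document means the agent has no scope, so fail loudly.
        raise FileNotFoundError(f"no Context Documents found in {context_dir}")
    sections = [base_prompt]
    for doc in docs:
        body = doc.read_text(encoding="utf-8").strip()
        sections.append(f"## Context: {doc.name}\n{body}")
    return "\n\n".join(sections)
```

Because the documents are read from the working tree, the prompt always reflects the version of the Context Document checked out alongside the code.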
The Autonomy Spectrum
SENDA defines five levels of AI autonomy (L0–L4). Each level specifies what the AI does, what the human does, and the conditions under which that level is appropriate. Teams do not operate at a single level — they calibrate per task type.
L0: Manual
AI does
Nothing. AI is not used.
Human does
Everything.
When to use
Security-critical code, compliance boundaries, or areas where the team has no AI context established.
L1: Assisted
AI does
Autocomplete, snippet suggestion, documentation drafting.
Human does
Reads, selects, accepts or rejects every AI suggestion. Writes all structural code.
When to use
Core business logic, any code touching money, auth, or data persistence.
L2: Directed
AI does
Generates complete functions, tests, and documentation from structured prompts.
Human does
Writes the prompt, reviews all output, owns the commit. Intervenes on any deviation from Context Document.
When to use
Standard feature development, CRUD operations, integration code, test suite generation.
L3: Supervised
AI does
Generates multi-file implementations from high-level task descriptions. Selects libraries, patterns, and structure within approved constraints.
Human does
Defines the task boundary. Reviews diff. Approves or rejects the complete implementation as a unit.
When to use
Scaffolding, boilerplate-heavy work, test fixtures, documentation systems. Only in areas with well-established Context Documents.
L4: Autonomous
AI does
Executes multi-step workflows end-to-end, selects from pre-approved architectural patterns, and ships to staging without per-commit review. Does not make novel architectural decisions — only applies patterns the Architect has pre-authorized.
Human does
Pre-defines the approved pattern library and guardrails. Reviews aggregate output at phase boundaries. Retains veto authority on any deployment. Handles escalations.
When to use
Mature codebases with high test coverage, well-documented architecture, established SENDA governance history, and Trust Score above 90%. The Architect must explicitly authorize L4 per task scope. Not for greenfield work.
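One way to make per-task calibration explicit is to keep the levels and the task-type mapping as data. The sketch below is an assumption-laden illustration: the task-type names, `level_for`, and `l4_allowed` are hypothetical, and the L4 check covers only the conditions that reduce to simple values (Trust Score above 90%, explicit Architect authorization, not greenfield). Test coverage, architecture documentation, and governance history would still need human judgment.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    L0_MANUAL = 0
    L1_ASSISTED = 1
    L2_DIRECTED = 2
    L3_SUPERVISED = 3
    L4_AUTONOMOUS = 4


# Hypothetical calibration table; the task-type names are illustrative.
TASK_LEVELS = {
    "auth": Autonomy.L1_ASSISTED,
    "payments": Autonomy.L1_ASSISTED,
    "standard_feature": Autonomy.L2_DIRECTED,
    "scaffolding": Autonomy.L3_SUPERVISED,
}


def level_for(task_type: str) -> Autonomy:
    """Unknown task types fall back to L0: no AI context is established for them."""
    return TASK_LEVELS.get(task_type, Autonomy.L0_MANUAL)


@dataclass(frozen=True)
class L4Request:
    trust_score: float          # percent, 0-100
    architect_authorized: bool  # explicit sign-off for this task scope
    greenfield: bool


def l4_allowed(req: L4Request) -> bool:
    """Check only the L4 preconditions that can be expressed as plain data."""
    return req.trust_score > 90 and req.architect_authorized and not req.greenfield
```

Defaulting unmapped work to L0 keeps the failure mode conservative: a task nobody calibrated is done manually rather than silently delegated.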
SENDA Roles
Every SENDA team has five defined roles with explicit boundaries. In small teams, individuals may hold multiple roles — but every responsibility must have a named owner. Unowned responsibilities are how governance fails.
Architect
Approves and owns the Context Document — accountable for its accuracy and completeness. Defines architectural guardrails, sets autonomy levels per task type, and designs multi-agent orchestration flows when applicable. The Architect does not write the Context Document; they review, challenge, and sign off on it.
Context Engineer
Drafts the Context Document by translating business requirements into structured, machine-usable briefs. The Context Engineer writes; the Architect approves. This separation ensures the person closest to the problem defines the scope, while the person with architectural authority validates it. The quality of this handoff determines everything downstream.
Reviewer
Executes the GOVERN phase through a tiered auditing model. An AI Auditor Agent performs the first-pass review, generating a Trust Score. When the Trust Score exceeds 90% and all automated tests pass, the human Reviewer performs a spot-check. Below that threshold, the Reviewer conducts a full line-by-line review. This scales governance without sacrificing quality.
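The tiered routing described above fits in a few lines. This is a sketch, not a prescribed implementation: `ReviewMode` and `review_mode` are hypothetical names, and sending a score of exactly 90% or any failing test run to full review is a conservative assumption about the boundary cases.

```python
from enum import Enum


class ReviewMode(Enum):
    SPOT_CHECK = "spot-check"
    FULL_REVIEW = "full line-by-line review"


def review_mode(trust_score: float, tests_passed: bool,
                threshold: float = 90.0) -> ReviewMode:
    """Pick the human review depth from the AI Auditor's first-pass result.

    trust_score is a percentage (0-100). Anything that does not clearly
    exceed the threshold with green tests gets a full review.
    """
    if trust_score > threshold and tests_passed:
        return ReviewMode.SPOT_CHECK
    return ReviewMode.FULL_REVIEW
```

For example, `review_mode(94.0, tests_passed=True)` selects a spot-check, while the same score with a failing test suite falls back to the full review.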
Builder
Directs AI agents during the GENERATE phase. Writes and iterates on prompts at the task level, monitors output quality, and escalates blockers. The Builder's primary metric is Generation Correctness — not lines produced.
Product Owner
Participates in SCOPE and VALIDATE. Owns the acceptance criteria that agents are measured against. Ensures the team is building the right thing, not just building things correctly. Approves Context Documents before GENERATE begins.
Primary Governance Metrics
SENDA teams track five primary metrics — including economic efficiency. All targets shown are calibration baselines, not fixed rules. Teams establish their own thresholds during the first three REFLECT cycles, then tighten them as governance matures.
DER
Defect Escape Rate
Target
< 5%
Formula
Measures the quality of the GOVERN phase. SENDA defines three defect classes: P1 (functional breakage or security vulnerability), P2 (architectural violation or implicit constraint breach), and P3 (style deviation or naming inconsistency). Only P1 and P2 count toward DER — P3 issues are tracked separately. The 5% baseline comes from industry-standard defect containment rates; teams calibrate their own threshold during the first three REFLECT cycles.
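The exact formula is not spelled out here, so the following Python sketch assumes one common reading: DER is the share of P1/P2 defects that escaped to production out of all P1/P2 defects found, whether in GOVERN or afterwards. The function name and the list-of-labels input are illustrative.

```python
# Only these classes count toward DER; P3 is tracked separately.
COUNTED_CLASSES = {"P1", "P2"}


def defect_escape_rate(caught: list[str], escaped: list[str]) -> float:
    """Assumed formula: escaped P1/P2 / (caught P1/P2 + escaped P1/P2).

    Each list holds one defect-class label ("P1", "P2", "P3") per defect.
    """
    caught_n = sum(1 for label in caught if label in COUNTED_CLASSES)
    escaped_n = sum(1 for label in escaped if label in COUNTED_CLASSES)
    total = caught_n + escaped_n
    if total == 0:
        return 0.0  # nothing counted yet, so nothing escaped
    return escaped_n / total


# 19 counted defects caught in GOVERN, 1 counted defect escaped; P3s are ignored.
caught = ["P1"] * 10 + ["P2"] * 9 + ["P3"] * 5
escaped = ["P2", "P3"]
rate = defect_escape_rate(caught, escaped)  # 1 / 20
```

Under this assumption the example sits exactly at the 5% baseline, so it would not yet meet a "< 5%" target.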
GC
Generation Correctness
Target
> 70%
Formula
Measures prompt quality and agent calibration. An artifact is "accepted" if it passes GOVERN without P1/P2 changes — cosmetic naming or formatting tweaks (P3) do not count as rejection. Pre-commit prompt iterations are not counted; only the artifact submitted to the Reviewer enters the formula. The 70% baseline is a starting point — teams should track their trend across cycles and tighten the target as Context Documents mature.
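Read literally, the acceptance rule above suggests GC is accepted artifacts divided by artifacts submitted to the Reviewer. The sketch below assumes that ratio; the function name and the input shape (one list of defect-class labels per submitted artifact) are hypothetical.

```python
BLOCKING_CLASSES = {"P1", "P2"}


def generation_correctness(findings_per_artifact: list[list[str]]) -> float:
    """Assumed formula: artifacts accepted in GOVERN / artifacts submitted.

    Each inner list holds the defect classes the Reviewer raised on one
    submitted artifact. P3-only findings do not count as rejection.
    """
    if not findings_per_artifact:
        raise ValueError("no artifacts were submitted to the Reviewer")
    accepted = sum(
        1 for findings in findings_per_artifact
        if not BLOCKING_CLASSES & set(findings)
    )
    return accepted / len(findings_per_artifact)


# Four submitted artifacts: clean, P3-only, one P1, one P2 plus P3.
gc = generation_correctness([[], ["P3"], ["P1"], ["P2", "P3"]])  # 2 / 4
```

Prompt iterations that never reached the Reviewer are simply never added to the input, which matches the rule that only submitted artifacts enter the calculation.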
ICR
Intervention Capture Rate
Target
> 95%
Formula
The single most important SENDA metric. It measures whether the governance process is actually catching problems before they reach production. A low ICR means governance is theater, not control. This metric is non-negotiable — it is the one number that tells you if SENDA is working or failing.
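A minimal sketch of the calculation, assuming ICR is issues caught before shipping divided by all issues found (caught plus escaped to production). The function names are illustrative, and the 0.95 default mirrors the target shown above.

```python
def intervention_capture_rate(caught_before_ship: int,
                              escaped_to_production: int) -> float:
    """Assumed formula: caught / (caught + escaped)."""
    total = caught_before_ship + escaped_to_production
    if total == 0:
        raise ValueError("no issues recorded yet; ICR is undefined")
    return caught_before_ship / total


def meets_icr_target(icr: float, target: float = 0.95) -> bool:
    """The target is a strict 'greater than', so hitting it exactly is not enough."""
    return icr > target


icr = intervention_capture_rate(caught_before_ship=39, escaped_to_production=1)
```

Raising an error on zero recorded issues is a deliberate choice in this sketch: an empty log says nothing about whether governance is working, so it should not be reported as a perfect score.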
CDS
Context Document Score
Target
> 80 / 100
Formula
A structured, rubric-based assessment of Context Document quality. Four dimensions are each scored 0–100 using a checklist: Specificity (are requirements unambiguous?), Completeness (are edge cases addressed?), Constraint Coverage (are boundaries explicit?), and Validation Coverage (are acceptance criteria testable?). The Reviewer scores the document during REFLECT using a standardized rubric, not subjective opinion.
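How the four dimension scores combine into a single number is not stated here. The sketch below assumes an unweighted mean, which is the simplest choice; a team could substitute weights. The dictionary keys and function name are hypothetical.

```python
from statistics import mean

DIMENSIONS = (
    "specificity",
    "completeness",
    "constraint_coverage",
    "validation_coverage",
)


def context_document_score(scores: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of the four rubric dimensions."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= 100:
            raise ValueError(f"{d} must be between 0 and 100, got {scores[d]}")
    return mean(scores[d] for d in DIMENSIONS)


cds = context_document_score({
    "specificity": 90,
    "completeness": 80,
    "constraint_coverage": 70,
    "validation_coverage": 85,
})  # (90 + 80 + 70 + 85) / 4
```

Requiring all four dimensions prevents a document from scoring well simply because a weak dimension was left unscored.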
TROI
Token ROI
Target
> 1.5
Formula
Measures the economic efficiency of AI agent usage. "Baseline Human Hours" is not a guess — it is derived from the team's historical velocity on comparable tasks before AI adoption (or from industry benchmarks for standard task types like CRUD, test generation, or documentation). Teams without historical data use conservative multipliers (2× for scaffolding, 1.5× for logic, 1× for novel architecture). Calibrate quarterly.
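The TROI formula is not reproduced here, so this sketch rests on several loudly stated assumptions: TROI is taken as the monetary value of Baseline Human Hours divided by the full AI-assisted cost (token spend plus the human hours actually spent), and the fallback multipliers are applied to the actual AI-assisted hours to estimate a baseline. Every name, the hourly rate, and the cost model are illustrative, not SENDA definitions.

```python
from typing import Optional

# Conservative multipliers from the text, for teams without historical data.
FALLBACK_MULTIPLIERS = {
    "scaffolding": 2.0,
    "logic": 1.5,
    "novel_architecture": 1.0,
}


def baseline_hours(task_type: str, actual_hours: float,
                   historical_hours: Optional[float] = None) -> float:
    """Prefer historical velocity; otherwise scale actual hours (assumption)."""
    if historical_hours is not None:
        return historical_hours
    return actual_hours * FALLBACK_MULTIPLIERS[task_type]


def token_roi(baseline_human_hours: float, hourly_rate: float,
              token_cost: float, actual_hours: float) -> float:
    """Assumed formula: baseline labour value / (token cost + actual labour cost)."""
    ai_assisted_cost = token_cost + actual_hours * hourly_rate
    if ai_assisted_cost <= 0:
        raise ValueError("AI-assisted cost must be positive")
    return (baseline_human_hours * hourly_rate) / ai_assisted_cost


# Scaffolding task, 4 hours with AI, no history: baseline estimated at 8 hours.
hours = baseline_hours("scaffolding", actual_hours=4.0)
troi = token_roi(hours, hourly_rate=100.0, token_cost=100.0, actual_hours=4.0)
# 800 / (100 + 400)
```

Whatever formula a team settles on, keeping the baseline derivation in one function makes the quarterly recalibration a single, reviewable change.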
SENDA Starter Kit
You don't need a consultant to start. Adopt these three practices in one week and you'll have more governance than 90% of teams using AI today. When you're ready to scale, we're here.
01. Start with the Context Document
Before your next task, write a one-page brief: what you're building, what the AI is allowed to touch, what the acceptance criteria are. Use a simple template in your repo (/docs/context/). Don't optimize the format — just start writing constraints down. This alone eliminates the majority of AI hallucination problems.
You stop generating code for the wrong problem.
02. Label your autonomy levels
Tag each task type in your backlog with an autonomy level: L1 for auth/payments, L2 for standard features, L3 for scaffolding. Don't use L0 or L4 yet — start in the middle. The act of labeling forces a conversation about where AI should and shouldn't operate. That conversation is the governance.
Your team has explicit, shared rules about AI boundaries.
03. Track one metric: ICR
After one week of labeled work, count how many issues your review process caught before shipping and how many escaped to production. The share caught before shipping is your Intervention Capture Rate. If it's below 90%, your review process has gaps (the full framework's baseline target is above 95%). You don't need dashboards; a spreadsheet works. The goal is to make governance measurable, not perfect.
You have a number that tells you if governance is working.
Service Tiers
Three engagement models designed to meet teams at their current maturity level and deliver SENDA governance in a way that sticks.
Assessment
Understand where you stand.
- Current workflow and tooling audit
- Autonomy level calibration workshop
- Governance gap analysis
- SENDA readiness scorecard
- Recommended implementation roadmap
Implementation
Stand up SENDA in your team.
- Full SENDA lifecycle setup
- Context Document templates and training
- Role assignment and RACI definition
- Metric tracking infrastructure
- Three complete SENDA cycles with embedded support
- Reviewer and Context Engineer training
Managed Governance
Continuous oversight and optimization.
- Monthly REFLECT facilitation
- Metric review and autonomy recalibration
- Context Document quality reviews
- Incident response for governance failures
- Quarterly methodology updates
Adaptive Governance Modes
SENDA is not one-size-fits-all. The same five phases apply everywhere, but the governance weight adapts to the project context. Standard mode for regulated environments; Flash mode for high-iteration product work. Same framework, different intensity.
Standard Mode: Full Governance
Flash Mode: Automated Governance
Documentation
- Standard Mode: Full Context Document, versioned in repo, reviewed before GENERATE.
- Flash Mode: Micro-briefs embedded in code (inline Markdown). Max 50 lines of scope per task.
Governance
- Standard Mode: Tiered human review with Trust Score thresholds. Reviewer signs off on every artifact.
- Flash Mode: Same Trust Score model, but above 90% the human reviews architecture diffs only. Same metrics, lighter touch.
Cycle Time
- Standard Mode: 1–2 week cycles with structured REFLECT phases.
- Flash Mode: Sub-day cycles. REFLECT is automated via CI metrics dashboard. Human REFLECT monthly.
Failure Response
- Standard Mode: Metric-driven recalibration in REFLECT.
- Flash Mode: Adds FCT (Failure to Commit Time), which measures the lag between generation and test rejection. High FCT auto-downgrades autonomy. Feeds back into the same DER/GC/ICR metrics.
Ideal For
- Standard Mode: Regulated industries, infrastructure, compliance-heavy environments.
- Flash Mode: Product teams, SaaS iteration, and MVPs, where governance must exist but cannot slow deployment below daily.
Automated Guardrail
Integration tests and static analysis replace line-by-line review. If code passes all automated gates, it ships to staging. Humans review architecture, not syntax.
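The gate itself can be very small. The sketch below assumes each automated check reports a name and a pass/fail flag; `GateResult`, `can_ship_to_staging`, and the gate names are hypothetical, and treating "no gates ran" as a failure is a conservative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str     # e.g. "integration-tests" or "static-analysis" (illustrative)
    passed: bool


def can_ship_to_staging(results: list[GateResult]) -> bool:
    """Ship only if at least one gate ran and every gate passed."""
    # An empty list means nothing was checked, which must not count as a pass.
    return bool(results) and all(r.passed for r in results)


def failed_gates(results: list[GateResult]) -> list[str]:
    """Names of the gates that blocked the release, for the escalation message."""
    return [r.name for r in results if not r.passed]
```

The explicit empty-list check matters because `all([])` is true in Python; without it, a misconfigured pipeline that runs no checks would ship everything.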
Micro-Contexts
Instead of 20-page Context Documents, atomized prompts scope each task to max 50 lines of logic. This keeps agents in L2 (Directed) and prevents hallucination at scale.
FCT Metric
Failure to Commit Time — how long between AI generation and test rejection. High FCT triggers an instant autonomy downgrade without waiting for the REFLECT phase.
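A sketch of the downgrade rule, under stated assumptions: autonomy levels are the integers 0 to 4, a "high" FCT means exceeding a team-chosen limit (30 minutes here is purely a placeholder), and a downgrade moves the task type down exactly one level. None of these specifics come from SENDA itself.

```python
from datetime import datetime, timedelta


def failure_to_commit_time(generated_at: datetime,
                           rejected_at: datetime) -> timedelta:
    """Lag between AI generation and the test run that rejected the output."""
    if rejected_at < generated_at:
        raise ValueError("rejection cannot precede generation")
    return rejected_at - generated_at


def next_autonomy_level(current_level: int, fct: timedelta,
                        max_fct: timedelta = timedelta(minutes=30)) -> int:
    """Drop one level when FCT exceeds the limit; never go below L0."""
    if fct > max_fct:
        return max(current_level - 1, 0)
    return current_level


generated = datetime(2024, 1, 1, 9, 0)
rejected = datetime(2024, 1, 1, 9, 45)
fct = failure_to_commit_time(generated, rejected)  # 45 minutes
level = next_autonomy_level(3, fct)                # exceeds 30 minutes, so 3 -> 2
```

Because the check runs on every rejection, the downgrade takes effect immediately instead of waiting for the next REFLECT.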
Ready to implement SENDA?
Your team is already using AI. The question is whether it is governed or improvised. SENDA gives you the framework to answer that question with confidence.