A Governance-First Methodology by Sturion AI

SENDA

Software Engineering with Natively Directed Agents

A complete framework for software teams operating with AI agents — covering roles, lifecycle phases, autonomy levels, governance metrics, and delivery standards.


5 Lifecycle Phases

5 Autonomy Levels

5 Defined Roles

5 Governance Metrics

The Challenge

The questions existing methodologies can't answer

Scrum, SAFe, and Kanban were designed for human-paced development. When AI agents enter the loop, every assumption breaks. Teams are left improvising governance in production.

Who reviews AI-generated code?

Existing methodologies assume developers write every line. When 70% of a sprint's output is AI-generated, code review becomes a bottleneck that no one is formally accountable for.

How do you measure quality when speed is infinite?

Velocity metrics collapse when an agent can generate 2,000 lines in an hour. Story points, burn-down charts, and throughput KPIs all become meaningless without a new contract with reality.

Who owns the technical debt AI introduces?

AI agents optimize for the immediate task. They hallucinate dependencies, ignore long-term architecture, and produce code that passes tests but violates implicit constraints no prompt ever captured.

Where do you draw the boundary between AI autonomy and human control?

Without a formal autonomy framework, teams either under-leverage AI (defaulting to manual everything) or over-trust it (shipping unreviewed output). Both failure modes are expensive.

First Principles

The core philosophy

SENDA is built on three axioms that do not change regardless of which AI tools, which models, or which delivery framework you operate within.

Humans decide, agents execute.

Every architectural decision, every design trade-off, every acceptance of technical risk — these remain human responsibilities. AI agents are execution engines, not decision-makers. The moment you confuse the two, you have lost governance.

Every line of AI-generated code has a human owner.

Ownership is not the same as authorship. A developer who reviews, approves, and commits an AI-generated function owns that function with the same accountability as if they wrote it. There is no 'the AI did it' defense.

Speed is not productivity. Shipped quality is productivity.

AI agents can generate code faster than any human team. That speed is worthless — and actively dangerous — if it ships defects, accumulates unreviewed debt, or outpaces the team's ability to understand what it has built.

The Methodology

The SENDA Lifecycle

Five sequential phases, each with explicit entry conditions, defined outputs, and named human owners.

Phase 1 / 5

SCOPE

Replaces Sprint Planning

Define what the team is building, what the AI is allowed to generate, and what the boundaries of acceptable output are. SCOPE produces a Context Document that lives in the repository as code (/docs/context/*.md) — not a static PDF. It is version-controlled, injectable into agent system prompts, and triggers automatic alerts when updated. Without a Context Document, agents generate plausible code for the wrong problem.
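As a minimal sketch of what "injectable into agent system prompts" could look like, the snippet below reads every Markdown file in a context directory and appends it to a base instruction string. The function name, the section header format, and the sorted ordering are illustrative assumptions, not part of SENDA itself.

```python
from pathlib import Path


def build_system_prompt(context_dir: Path, base_instructions: str) -> str:
    """Concatenate every Markdown context file into one system prompt.

    Files are read in sorted name order so the prompt is reproducible.
    The "## Context: <file>" header is a hypothetical convention.
    """
    sections = [base_instructions.strip()]
    for path in sorted(context_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        if body:  # skip empty files rather than injecting blank sections
            sections.append(f"## Context: {path.name}\n{body}")
    return "\n\n".join(sections)
```

A team would typically call this with Path("docs/context") at the start of each agent session, so that any change committed to the Context Document reaches the agent on its next run.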

Governance Framework

The Autonomy Spectrum

SENDA defines five levels of AI autonomy (L0–L4). Each level specifies what the AI does, what the human does, and the conditions under which that level is appropriate. Teams do not operate at a single level — they calibrate per task type.

L0 Manual

AI does

Nothing. AI is not used.

Human does

Everything.

When to use

Security-critical code, compliance boundaries, or areas where the team has no AI context established.

L1 Assisted

AI does

Autocomplete, snippet suggestion, documentation drafting.

Human does

Reads, selects, accepts or rejects every AI suggestion. Writes all structural code.

When to use

Core business logic, any code touching money, auth, or data persistence.

L2 Directed

AI does

Generates complete functions, tests, and documentation from structured prompts.

Human does

Writes the prompt, reviews all output, owns the commit. Intervenes on any deviation from Context Document.

When to use

Standard feature development, CRUD operations, integration code, test suite generation.

L3 Supervised

AI does

Generates multi-file implementations from high-level task descriptions. Selects libraries, patterns, and structure within approved constraints.

Human does

Defines the task boundary. Reviews diff. Approves or rejects the complete implementation as a unit.

When to use

Scaffolding, boilerplate-heavy work, test fixtures, documentation systems. Only in areas with well-established Context Documents.

L4 Autonomous

AI does

Executes multi-step workflows end-to-end, selects from pre-approved architectural patterns, and ships to staging without per-commit review. Does not make novel architectural decisions — only applies patterns the Architect has pre-authorized.

Human does

Pre-defines the approved pattern library and guardrails. Reviews aggregate output at phase boundaries. Retains veto authority on any deployment. Handles escalations.

When to use

Mature codebases with high test coverage, well-documented architecture, established SENDA governance history, and Trust Score above 90%. The Architect must explicitly authorize L4 per task scope. Not for greenfield work.
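The L4 preconditions listed above can be written down as an explicit gate. The sketch below is one possible encoding: the TaskScope fields and the l4_permitted name are hypothetical, and the boolean flags stand in for judgments (such as "high test coverage") for which the text gives no numeric bar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskScope:
    """Facts the Architect weighs before authorizing L4 for one task scope."""
    trust_score: float             # percent, 0-100
    high_test_coverage: bool       # team judgment; no numeric bar is assumed
    documented_architecture: bool
    governance_history: bool       # established SENDA governance history
    greenfield: bool
    architect_authorized: bool     # explicit sign-off for this scope


def l4_permitted(scope: TaskScope) -> bool:
    """Return True only when every stated L4 precondition holds."""
    return (
        scope.trust_score > 90
        and scope.high_test_coverage
        and scope.documented_architecture
        and scope.governance_history
        and not scope.greenfield
        and scope.architect_authorized
    )
```

Keeping the check as a single conjunction makes it obvious that no one condition can compensate for another: a 99% Trust Score on greenfield work still fails the gate.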

Team Structure

SENDA Roles

Every SENDA team has five defined roles with explicit boundaries. In small teams, individuals may hold multiple roles — but every responsibility must have a named owner. Unowned responsibilities are how governance fails.

Strategy Layer

Architect

Approves and owns the Context Document — accountable for its accuracy and completeness. Defines architectural guardrails, sets autonomy levels per task type, and designs multi-agent orchestration flows when applicable. The Architect does not write the Context Document; they review, challenge, and sign off on it.

Prompt Layer

Context Engineer

Drafts the Context Document by translating business requirements into structured, machine-usable briefs. The Context Engineer writes; the Architect approves. This separation ensures the person closest to the problem defines the scope, while the person with architectural authority validates it. The quality of this handoff determines everything downstream.

Governance Layer

Reviewer

Executes the GOVERN phase through a tiered auditing model. An AI Auditor Agent performs the first-pass review, generating a Trust Score. When the Trust Score exceeds 90% and all automated tests pass, the human Reviewer performs a spot-check. Below that threshold, the Reviewer conducts a full line-by-line review. This scales governance without sacrificing quality.
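The tiered rule described here reduces to a small routing function. This is a sketch under the assumption that the Trust Score is expressed as a percentage from 0 to 100; the function name and the returned labels are illustrative.

```python
def review_tier(trust_score: float, tests_passed: bool) -> str:
    """Choose the human review depth after the AI Auditor's first pass.

    A spot-check requires BOTH a Trust Score strictly above 90 and a fully
    passing automated test run; anything else gets a full review.
    """
    if trust_score > 90 and tests_passed:
        return "spot-check"
    return "full-review"
```

Note that a score of exactly 90 routes to a full review, since the text says the score must exceed the threshold.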

Execution Layer

Builder

Directs AI agents during the GENERATE phase. Writes and iterates on prompts at the task level, monitors output quality, and escalates blockers. The Builder's primary metric is Generation Correctness — not lines produced.

Validation Layer

Product Owner

Participates in SCOPE and VALIDATE. Owns the acceptance criteria that agents are measured against. Ensures the team is building the right thing, not just building things correctly. Approves Context Documents before GENERATE begins.

Measurement

Primary Governance Metrics

SENDA teams track five primary metrics — including economic efficiency. All targets shown are calibration baselines, not fixed rules. Teams establish their own thresholds during the first three REFLECT cycles, then tighten them as governance matures.

DER

Defect Escape Rate

Target

< 5%

Formula

DER = (Defects found post-GOVERN / Total AI-generated artifacts reviewed) × 100

Measures the quality of the GOVERN phase. SENDA defines three defect classes: P1 (functional breakage or security vulnerability), P2 (architectural violation or implicit constraint breach), and P3 (style deviation or naming inconsistency). Only P1 and P2 count toward DER — P3 issues are tracked separately. The 5% baseline comes from industry-standard defect containment rates; teams calibrate their own threshold during the first three REFLECT cycles.
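A minimal sketch of the DER calculation, assuming each escaped defect is recorded with its class label ("P1", "P2", or "P3"). The function name and input shape are hypothetical; the only rule taken from the text is that P3 issues are excluded from the numerator.

```python
def defect_escape_rate(escaped_defects: list[str], artifacts_reviewed: int) -> float:
    """Return DER as a percentage.

    escaped_defects holds one class label per defect found after GOVERN.
    Only P1 and P2 count toward DER; P3 is tracked separately.
    """
    if artifacts_reviewed <= 0:
        raise ValueError("artifacts_reviewed must be positive")
    counted = sum(1 for label in escaped_defects if label in ("P1", "P2"))
    return 100 * counted / artifacts_reviewed
```

For example, two countable defects across 80 reviewed artifacts gives a DER of 2.5%, which is inside the 5% baseline.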

GC

Generation Correctness

Target

> 70%

Formula

Generation Correctness = Artifacts accepted at commit / Total artifacts submitted to GOVERN

Measures prompt quality and agent calibration. An artifact is "accepted" if it passes GOVERN without P1/P2 changes — cosmetic naming or formatting tweaks (P3) do not count as rejection. Pre-commit prompt iterations are not counted; only the artifact submitted to the Reviewer enters the formula. The 70% baseline is a starting point — teams should track their trend across cycles and tighten the target as Context Documents mature.
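The acceptance rule can be sketched as follows, assuming each artifact submitted to GOVERN carries the list of defect classes the Reviewer raised against it. The function name and data shape are illustrative assumptions.

```python
def generation_correctness(findings_per_artifact: list[list[str]]) -> float:
    """Return the share of submitted artifacts accepted by GOVERN (0.0 to 1.0).

    Each inner list holds the defect classes raised for one artifact.
    An artifact is accepted when it has no P1 or P2 finding; P3-only
    findings (cosmetic tweaks) do not count as rejection.
    """
    if not findings_per_artifact:
        raise ValueError("at least one submitted artifact is required")
    accepted = sum(
        1
        for findings in findings_per_artifact
        if not any(label in ("P1", "P2") for label in findings)
    )
    return accepted / len(findings_per_artifact)
```

Pre-commit prompt iterations never appear in the input, which matches the rule that only artifacts actually submitted to the Reviewer enter the formula.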

ICR

Intervention Capture Rate

Target

> 95%

Formula

ICR = Interventions caught in GOVERN / Total interventions (including post-ship)

The single most important SENDA metric. It measures whether the governance process is actually catching problems before they reach production. A low ICR means governance is theater, not control. This metric is non-negotiable — it is the one number that tells you if SENDA is working or failing.
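As a worked example of the ratio, the sketch below takes the count of interventions caught in GOVERN and the count that escaped to post-ship, and returns ICR as a percentage. The function name and the two-counter input are assumptions made for illustration.

```python
def intervention_capture_rate(caught_in_govern: int, found_post_ship: int) -> float:
    """Return ICR as a percentage of all interventions, caught plus escaped."""
    total = caught_in_govern + found_post_ship
    if total == 0:
        raise ValueError("no interventions recorded; ICR is undefined")
    return 100 * caught_in_govern / total
```

With 38 issues caught in review and 2 found after shipping, ICR is 95%, which sits exactly on the target line rather than above it.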

CDS

Context Document Score

Target

> 80 / 100

Formula

CDS = (Specificity × 30%) + (Completeness × 25%) + (Constraints × 25%) + (Validation × 20%)

A structured, rubric-based assessment of Context Document quality. Each dimension is scored 0–100 using a checklist: Specificity (are requirements unambiguous?), Completeness (are edge cases addressed?), Constraint Coverage (are boundaries explicit?), Validation Coverage (are acceptance criteria testable?). Scored by the Reviewer during REFLECT using a standardized rubric — not subjective opinion.
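The weighted sum can be sketched as below. The weights come from the formula above; the dictionary keys and function name are hypothetical, and the rubric checklists that produce each 0-100 dimension score are outside the scope of this example.

```python
# Weights in whole percent, taken from the CDS formula.
CDS_WEIGHTS = {
    "specificity": 30,
    "completeness": 25,
    "constraint_coverage": 25,
    "validation_coverage": 20,
}


def context_document_score(scores: dict[str, float]) -> float:
    """Combine four 0-100 dimension scores into a single 0-100 CDS."""
    missing = CDS_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in CDS_WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(CDS_WEIGHTS[name] * scores[name] for name in CDS_WEIGHTS) / 100
```

Scores of 90, 80, 70, and 85 across the four dimensions yield 27 + 20 + 17.5 + 17 = 81.5, just above the 80-point target.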

TROI

Token ROI

Target

> 1.5

Formula

TROI = ((Baseline Hours × $/hr) − Token Cost) / Token Cost

Measures the economic efficiency of AI agent usage. "Baseline Hours" is not a guess: it is derived from the team's historical velocity on comparable tasks before AI adoption, or from industry benchmarks for standard task types such as CRUD, test generation, or documentation. Teams without historical data use conservative multipliers (2× for scaffolding, 1.5× for logic, 1× for novel architecture). Calibrate quarterly.
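A short worked example of the TROI arithmetic, assuming costs are tracked in a single currency. The function and parameter names are illustrative; how a team arrives at its baseline hours is left to the calibration process described above.

```python
def token_roi(baseline_hours: float, hourly_rate: float, token_cost: float) -> float:
    """Return (baseline human cost - token cost) / token cost."""
    if token_cost <= 0:
        raise ValueError("token_cost must be positive")
    baseline_cost = baseline_hours * hourly_rate
    return (baseline_cost - token_cost) / token_cost
```

If a task would have taken 10 hours at $80 per hour and the agent consumed $200 in tokens, TROI is (800 − 200) / 200 = 3.0, twice the 1.5 target. A negative result means the tokens cost more than the human baseline.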

Start Here

SENDA Starter Kit

You don't need a consultant to start. Adopt these three practices in one week and you'll have more governance than 90% of teams using AI today. When you're ready to scale, we're here.

Day 1–2

Start with the Context Document

Before your next task, write a one-page brief: what you're building, what the AI is allowed to touch, what the acceptance criteria are. Use a simple template in your repo (/docs/context/). Don't optimize the format — just start writing constraints down. This alone eliminates the majority of AI hallucination problems.

You stop generating code for the wrong problem.

Day 3–4

Label your autonomy levels

Tag each task type in your backlog with an autonomy level: L1 for auth/payments, L2 for standard features, L3 for scaffolding. Don't use L0 or L4 yet — start in the middle. The act of labeling forces a conversation about where AI should and shouldn't operate. That conversation is the governance.

Your team has explicit, shared rules about AI boundaries.
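One lightweight way to record the labels is a lookup table checked into the repo. The sketch below uses the starter mapping suggested above; the task-type strings and the choice to return None for unlabeled types (so they get flagged for discussion) are assumptions.

```python
from typing import Optional

# Starter mapping: L1 for auth/payments, L2 for standard features,
# L3 for scaffolding. The key names are hypothetical backlog tags.
STARTER_LEVELS = {
    "auth": "L1",
    "payments": "L1",
    "standard-feature": "L2",
    "scaffolding": "L3",
}


def label_task(task_type: str) -> Optional[str]:
    """Return the agreed autonomy level, or None if the team has not
    labeled this task type yet."""
    return STARTER_LEVELS.get(task_type)
```

Returning None instead of a default level keeps unlabeled work visible, which prompts the team conversation the labeling step is meant to trigger.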

Week 2

Track one metric: ICR

After one week of labeled work, count: how many issues did your review process catch before shipping, vs. how many escaped to production? That ratio is your Intervention Capture Rate. If it's below 90%, your review process has gaps. You don't need dashboards — a spreadsheet works. The goal is to make governance measurable, not perfect.

You have a number that tells you if governance is working.

Work with Sturion

Service Tiers

Three engagement models designed to meet teams at their current maturity level and deliver SENDA governance in a way that sticks.

Assessment

Understand where you stand.

Engagement: 2 weeks

Current workflow and tooling audit
Autonomy level calibration workshop
Governance gap analysis
SENDA readiness scorecard
Recommended implementation roadmap

Start with an Assessment

Implementation

Stand up SENDA in your team.

Engagement: 6–10 weeks

Full SENDA lifecycle setup
Context Document templates and training
Role assignment and RACI definition
Metric tracking infrastructure
Three complete SENDA cycles with embedded support
Reviewer and Context Engineer training

Implement SENDA

Managed Governance

Continuous oversight and optimization.

Retainer: Ongoing

Monthly REFLECT facilitation
Metric review and autonomy recalibration
Context Document quality reviews
Incident response for governance failures
Quarterly methodology updates

Discuss a Retainer

Operational Modes

Adaptive Governance Modes

SENDA is not one-size-fits-all. The same five phases apply everywhere, but the governance weight adapts to the project context. Standard mode for regulated environments; Flash mode for high-iteration product work. Same framework, different intensity.

Standard Mode: Full Governance

Flash Mode: Automated Governance

Documentation

Standard: Full Context Document, versioned in repo and reviewed before GENERATE.

Flash: Micro-briefs embedded in code (inline Markdown). Max 50 lines of scope per task.

Governance

Standard: Tiered human review with Trust Score thresholds. Reviewer signs off on every artifact.

Flash: Same Trust Score model, but above 90% the human reviews architecture diffs only. Same metrics, lighter touch.

Cycle Time

Standard: 1–2 week cycles with structured REFLECT phases.

Flash: Sub-day cycles. REFLECT is automated via CI metrics dashboard. Human REFLECT monthly.

Failure Response

Standard: Metric-driven recalibration in REFLECT.

Flash: Adds FCT (Failure to Commit Time), which measures the lag between generation and test rejection. High FCT auto-downgrades autonomy. Feeds back into the same DER/GC/ICR metrics.

Ideal For

Standard: Regulated industries, infrastructure, compliance-heavy environments.

Flash: Product teams, SaaS iteration, and MVPs, where governance must exist but cannot slow deployment below daily.

Automatic Guardrail

Integration tests and static analysis replace line-by-line review. If code passes all automated gates, it ships to staging. Humans review architecture, not syntax.

Micro-Contexts

Instead of 20-page Context Documents, atomized prompts scope each task to max 50 lines of logic. This keeps agents in L2 (Directed) and prevents hallucination at scale.

FCT Metric

Failure to Commit Time — how long between AI generation and test rejection. High FCT triggers an instant autonomy downgrade without waiting for the REFLECT phase.
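The FCT trigger could be sketched as follows. The text does not say what counts as "high" FCT or how far autonomy drops, so both the threshold parameter and the one-level downgrade are assumptions made for illustration, as are the function names.

```python
from datetime import datetime, timedelta


def failure_to_commit_time(generated_at: datetime, rejected_at: datetime) -> timedelta:
    """Return the lag between AI generation and test rejection."""
    if rejected_at < generated_at:
        raise ValueError("rejection cannot precede generation")
    return rejected_at - generated_at


def adjust_autonomy(level: int, fct: timedelta, threshold: timedelta) -> int:
    """Drop one autonomy level (never below L0) when FCT exceeds the
    team-chosen threshold; otherwise keep the current level.

    Assumption: a single-step downgrade. SENDA does not specify the size.
    """
    if not 0 <= level <= 4:
        raise ValueError("level must be between 0 and 4")
    if fct > threshold:
        return max(level - 1, 0)
    return level
```

Running this check in CI right after a test rejection is what lets the downgrade happen immediately instead of waiting for the next REFLECT phase.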

Context-as-Code

Generation Correctness

Defect Escape Rate

Autonomy Calibration

Governance-First

Human Ownership

Context Engineer

GOVERN Phase

REFLECT Cycle

ICR

CDS Score

Token ROI

Trust Score

Flash Mode

Context Drift

Recap Loop

Micro-Contexts

FCT Metric

L0 Manual

L4 Autonomous

Tiered Auditing

SENDA Lifecycle

SCOPE Phase

AI Guardrails

Starter Kit

Defect Taxonomy

Get Started

Ready to implement SENDA?

Your team is already using AI. The question is whether it is governed or improvised. SENDA gives you the framework to answer that question with confidence.
