A Governance-First Methodology by Sturion AI

SENDA

Software Engineering with Natively Directed Agents

A complete framework for software teams operating with AI agents — covering roles, lifecycle phases, autonomy levels, governance metrics, and delivery standards.

5 Lifecycle Phases

5 Autonomy Levels

5 Defined Roles

5 Governance Metrics

The Challenge

The questions existing methodologies can't answer

Scrum, SAFe, and Kanban were designed for human-paced development. When AI agents enter the loop, every assumption breaks. Teams are left improvising governance in production.

Who reviews AI-generated code?

Existing methodologies assume developers write every line. When 70% of a sprint's output is AI-generated, code review becomes a bottleneck that no one is formally accountable for.

How do you measure quality when speed is infinite?

Velocity metrics collapse when an agent can generate 2,000 lines in an hour. Story points, burn-down charts, and throughput KPIs all become meaningless without a new contract with reality.

Who owns the technical debt AI introduces?

AI agents optimize for the immediate task. They hallucinate dependencies, ignore long-term architecture, and produce code that passes tests but violates implicit constraints no prompt ever captured.

Where do you draw the boundary between AI autonomy and human control?

Without a formal autonomy framework, teams either under-leverage AI (defaulting to manual everything) or over-trust it (shipping unreviewed output). Both failure modes are expensive.

First Principles

The core philosophy

SENDA is built on three axioms that do not change regardless of which AI tools, which models, or which delivery framework you operate within.

01

Humans decide, agents execute.

Every architectural decision, every design trade-off, every acceptance of technical risk — these remain human responsibilities. AI agents are execution engines, not decision-makers. The moment you confuse the two, you have lost governance.

02

Every line of AI-generated code has a human owner.

Ownership is not the same as authorship. A developer who reviews, approves, and commits an AI-generated function owns that function with the same accountability as if they wrote it. There is no 'the AI did it' defense.

03

Speed is not productivity. Shipped quality is productivity.

AI agents can generate code faster than any human team. That speed is worthless — and actively dangerous — if it ships defects, accumulates unreviewed debt, or outpaces the team's ability to understand what it has built.

The Methodology

The SENDA Lifecycle

Five sequential phases, each with explicit entry conditions, defined outputs, and named human owners.

SENDA Lifecycle: 01 SCOPE, 02 GENERATE, 03 GOVERN, 04 VALIDATE, 05 REFLECT

Phase 1 / 5

SCOPE

Replaces Sprint Planning

Define what the team is building, what the AI is allowed to generate, and what the boundaries of acceptable output are. SCOPE produces a Context Document that lives in the repository as code (/docs/context/*.md) — not a static PDF. It is version-controlled, injectable into agent system prompts, and triggers automatic alerts when updated. Without a Context Document, agents generate plausible code for the wrong problem.
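
As a rough illustration of a Context Document being "injectable into agent system prompts", the sketch below reads every Markdown file in a context directory and appends it to a base prompt. The function name build_system_prompt, the section header format, and the choice to fail when no document exists are assumptions made for this example, not part of SENDA itself.

```python
from pathlib import Path


def build_system_prompt(context_dir: Path, base_prompt: str) -> str:
    """Append every Markdown Context Document in context_dir to a base prompt."""
    sections = []
    # Sort for a deterministic order, so prompt changes track document changes.
    for doc in sorted(context_dir.glob("*.md")):
        text = doc.read_text(encoding="utf-8").strip()
        sections.append(f"## Context: {doc.name}\n{text}")
    if not sections:
        # Assumption: refuse to build a prompt when no Context Document exists.
        raise FileNotFoundError(f"no Context Documents found in {context_dir}")
    return base_prompt + "\n\n" + "\n\n".join(sections)
```

A Builder might call build_system_prompt(Path("docs/context"), "You are a coding agent.") at the start of GENERATE, so the agent always sees the version of the scope that is currently committed.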

Governance Framework

The Autonomy Spectrum

SENDA defines five levels of AI autonomy (L0–L4). Each level specifies what the AI does, what the human does, and the conditions under which that level is appropriate. Teams do not operate at a single level — they calibrate per task type.

L0

Manual

AI does

Nothing. AI is not used.

Human does

Everything.

When to use

Security-critical code, compliance boundaries, or areas where the team has no AI context established.

L1

Assisted

AI does

Autocomplete, snippet suggestion, documentation drafting.

Human does

Reads, selects, accepts or rejects every AI suggestion. Writes all structural code.

When to use

Core business logic, any code touching money, auth, or data persistence.

L2

Directed

AI does

Generates complete functions, tests, and documentation from structured prompts.

Human does

Writes the prompt, reviews all output, owns the commit. Intervenes on any deviation from Context Document.

When to use

Standard feature development, CRUD operations, integration code, test suite generation.

L3

Supervised

AI does

Generates multi-file implementations from high-level task descriptions. Selects libraries, patterns, and structure within approved constraints.

Human does

Defines the task boundary. Reviews diff. Approves or rejects the complete implementation as a unit.

When to use

Scaffolding, boilerplate-heavy work, test fixtures, documentation systems. Only in areas with well-established Context Documents.

L4

Autonomous

AI does

Executes multi-step workflows end-to-end, selects from pre-approved architectural patterns, and ships to staging without per-commit review. Does not make novel architectural decisions — only applies patterns the Architect has pre-authorized.

Human does

Pre-defines the approved pattern library and guardrails. Reviews aggregate output at phase boundaries. Retains veto authority on any deployment. Handles escalations.

When to use

Mature codebases with high test coverage, well-documented architecture, established SENDA governance history, and Trust Score above 90%. The Architect must explicitly authorize L4 per task scope. Not for greenfield work.
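
Per-task calibration could be encoded as a small lookup plus a gate for L4. The sketch below is one possible reading: the task-type labels, the function name allowed_level, and the rule that L4 is only reachable from task types already cleared for L3 are assumptions for illustration. It also checks only three of the L4 conditions (Architect authorization, Trust Score above 90%, not greenfield); test coverage and governance history are left out.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    L0_MANUAL = 0
    L1_ASSISTED = 1
    L2_DIRECTED = 2
    L3_SUPERVISED = 3
    L4_AUTONOMOUS = 4


# Hypothetical task-type labels; levels follow the "When to use" guidance.
DEFAULT_LEVELS = {
    "security_critical": Autonomy.L0_MANUAL,
    "auth": Autonomy.L1_ASSISTED,
    "payments": Autonomy.L1_ASSISTED,
    "feature": Autonomy.L2_DIRECTED,
    "crud": Autonomy.L2_DIRECTED,
    "scaffolding": Autonomy.L3_SUPERVISED,
}


def allowed_level(task_type: str, requested: Autonomy, trust_score: float,
                  architect_authorized: bool, greenfield: bool) -> Autonomy:
    """Return the highest autonomy level permitted for this task."""
    # Unknown task types fall back to L0: no AI context is established.
    ceiling = DEFAULT_LEVELS.get(task_type, Autonomy.L0_MANUAL)
    l4_ok = architect_authorized and trust_score > 90 and not greenfield
    # Assumption: L4 can only be unlocked for task types already at L3.
    if l4_ok and ceiling >= Autonomy.L3_SUPERVISED:
        ceiling = Autonomy.L4_AUTONOMOUS
    return min(requested, ceiling)
```

With these defaults, a request for L4 on scaffolding work with a Trust Score of exactly 90 is capped at L3, and any request on auth code is capped at L1 regardless of the Trust Score.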

Team Structure

SENDA Roles

Every SENDA team has five defined roles with explicit boundaries. In small teams, individuals may hold multiple roles — but every responsibility must have a named owner. Unowned responsibilities are how governance fails.

Strategy Layer

Architect

Approves and owns the Context Document — accountable for its accuracy and completeness. Defines architectural guardrails, sets autonomy levels per task type, and designs multi-agent orchestration flows when applicable. The Architect does not write the Context Document; they review, challenge, and sign off on it.

Prompt Layer

Context Engineer

Drafts the Context Document by translating business requirements into structured, machine-usable briefs. The Context Engineer writes; the Architect approves. This separation ensures the person closest to the problem defines the scope, while the person with architectural authority validates it. The quality of this handoff determines everything downstream.

Governance Layer

Reviewer

Executes the GOVERN phase through a tiered auditing model. An AI Auditor Agent performs the first-pass review, generating a Trust Score. When the Trust Score exceeds 90% and all automated tests pass, the human Reviewer performs a spot-check. Below that threshold, the Reviewer conducts a full line-by-line review. This scales governance without sacrificing quality.
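
The tiered rule can be expressed as a single decision function. This is a minimal sketch: the name review_mode and the returned labels are assumptions, and a score of exactly 90 is treated conservatively as requiring a full review, since only scores that exceed 90% qualify for a spot-check.

```python
def review_mode(trust_score: float, all_tests_pass: bool) -> str:
    """Pick the human review depth from the AI Auditor's first-pass result."""
    # Spot-check only when the score exceeds 90% AND every automated test passes.
    if trust_score > 90.0 and all_tests_pass:
        return "spot-check"
    # Everything else gets a full line-by-line review.
    return "full-review"
```

A high Trust Score alone is not enough: review_mode(97.0, False) still returns "full-review" because a failing test overrides the score.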

Execution Layer

Builder

Directs AI agents during the GENERATE phase. Writes and iterates on prompts at the task level, monitors output quality, and escalates blockers. The Builder's primary metric is Generation Correctness — not lines produced.

Validation Layer

Product Owner

Participates in SCOPE and VALIDATE. Owns the acceptance criteria that agents are measured against. Ensures the team is building the right thing, not just building things correctly. Approves Context Documents before GENERATE begins.

Measurement

Primary Governance Metrics

SENDA teams track five primary metrics — including economic efficiency. All targets shown are calibration baselines, not fixed rules. Teams establish their own thresholds during the first three REFLECT cycles, then tighten them as governance matures.

DER

Defect Escape Rate

Target

< 5%

Formula

DER = (Defects found post-GOVERN / Total AI-generated artifacts reviewed) × 100

Measures the quality of the GOVERN phase. SENDA defines three defect classes: P1 (functional breakage or security vulnerability), P2 (architectural violation or implicit constraint breach), and P3 (style deviation or naming inconsistency). Only P1 and P2 count toward DER — P3 issues are tracked separately. The 5% baseline comes from industry-standard defect containment rates; teams calibrate their own threshold during the first three REFLECT cycles.
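
To make the counting rule concrete, here is a small sketch of the DER calculation that filters out P3 issues before dividing. The function name and the representation of escaped defects as a list of class labels are assumptions for illustration.

```python
def defect_escape_rate(escaped_defects: list[str], artifacts_reviewed: int) -> float:
    """DER as a percentage.

    escaped_defects holds one class label ("P1", "P2", "P3") per defect
    found after GOVERN.
    """
    if artifacts_reviewed <= 0:
        raise ValueError("artifacts_reviewed must be positive")
    # Only P1 and P2 count toward DER; P3 issues are tracked separately.
    counted = sum(1 for cls in escaped_defects if cls in ("P1", "P2"))
    return counted * 100 / artifacts_reviewed
```

For 200 reviewed artifacts with escaped defects ["P1", "P2", "P2", "P3", "P3"], three defects count and the result is 1.5, which is inside the 5% baseline.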

GC

Generation Correctness

Target

> 70%

Formula

GC = Artifacts accepted at commit / Total artifacts submitted to GOVERN

Measures prompt quality and agent calibration. An artifact is "accepted" if it passes GOVERN without P1/P2 changes — cosmetic naming or formatting tweaks (P3) do not count as rejection. Pre-commit prompt iterations are not counted; only the artifact submitted to the Reviewer enters the formula. The 70% baseline is a starting point — teams should track their trend across cycles and tighten the target as Context Documents mature.
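
The acceptance rule can be sketched as follows, with the result expressed as a percentage to match the target. The function name and the encoding of each review outcome as the worst change class the Reviewer required ("none", "P3", "P2", or "P1") are assumptions for this example.

```python
def generation_correctness(review_outcomes: list[str]) -> float:
    """GC as a percentage of artifacts accepted by GOVERN.

    Each entry is the worst change class required for one submitted
    artifact: "none", "P3", "P2", or "P1".
    """
    if not review_outcomes:
        raise ValueError("no artifacts were submitted to GOVERN")
    # P3 tweaks (naming, formatting) do not count as rejection.
    accepted = sum(1 for outcome in review_outcomes if outcome in ("none", "P3"))
    return accepted * 100 / len(review_outcomes)
```

Ten submissions with six clean passes, two P3 tweaks, one P2 change, and one P1 change give a GC of 80.0.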

ICR

Intervention Capture Rate

Target

> 95%

Formula

ICR = Interventions caught in GOVERN / Total interventions (including post-ship)

The single most important SENDA metric. It measures whether the governance process is actually catching problems before they reach production. A low ICR means governance is theater, not control. This metric is non-negotiable — it is the one number that tells you if SENDA is working or failing.
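
A minimal sketch of the ICR calculation, expressed as a percentage to match the target. The function and parameter names are assumptions; the key point is that the denominator includes issues that surfaced only after shipping.

```python
def intervention_capture_rate(caught_in_govern: int, found_post_ship: int) -> float:
    """ICR as a percentage of all interventions that GOVERN caught."""
    total = caught_in_govern + found_post_ship
    if total == 0:
        raise ValueError("no interventions recorded yet")
    return caught_in_govern * 100 / total
```

With 48 issues caught in GOVERN and 2 found after shipping, the ICR is 96.0, just above the 95% target.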

CDS

Context Document Score

Target

> 80 / 100

Formula

CDS = (Specificity × 30%) + (Completeness × 25%) + (Constraints × 25%) + (Validation × 20%)

A structured, rubric-based assessment of Context Document quality. Each dimension is scored 0–100 using a checklist: Specificity (are requirements unambiguous?), Completeness (are edge cases addressed?), Constraint Coverage (are boundaries explicit?), Validation Coverage (are acceptance criteria testable?). Scored by the Reviewer during REFLECT using a standardized rubric — not subjective opinion.
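
The weighted score can be computed as below. This is a sketch: the dictionary keys and function name are assumptions, while the weights are the ones listed in the formula. Integer percentage weights are used so the arithmetic stays exact.

```python
# Weights in percent, taken from the CDS formula.
CDS_WEIGHTS = {
    "specificity": 30,
    "completeness": 25,
    "constraints": 25,
    "validation": 20,
}


def context_document_score(scores: dict[str, float]) -> float:
    """Combine four 0-100 rubric scores into a single 0-100 CDS."""
    if set(scores) != set(CDS_WEIGHTS):
        raise ValueError(f"expected exactly these dimensions: {sorted(CDS_WEIGHTS)}")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(CDS_WEIGHTS[name] * scores[name] for name in CDS_WEIGHTS) / 100
```

Rubric scores of 90 for specificity, 80 for completeness, 70 for constraints, and 80 for validation give a CDS of 80.5, which narrowly clears the 80 / 100 target.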

TROI

Token ROI

Target

> 1.5

Formula

TROI = ((Baseline Hours × $/hr) - Token Cost) / Token Cost

Measures the economic efficiency of AI agent usage. "Baseline Human Hours" is not a guess — it is derived from the team's historical velocity on comparable tasks before AI adoption (or from industry benchmarks for standard task types like CRUD, test generation, or documentation). Teams without historical data use conservative multipliers (2× for scaffolding, 1.5× for logic, 1× for novel architecture). Calibrate quarterly.
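
A hedged sketch of the calculation, assuming the formula is read as net savings divided by token cost: the value of the baseline human hours, minus what was spent on tokens, over what was spent on tokens. The function and parameter names are illustrative, and the multipliers for teams without historical data are not modeled here.

```python
def token_roi(baseline_hours: float, hourly_rate: float, token_cost: float) -> float:
    """Token ROI, assuming the reading (human cost - token cost) / token cost."""
    if token_cost <= 0:
        raise ValueError("token_cost must be positive")
    # Cost of doing the same work at the team's pre-AI baseline.
    human_cost = baseline_hours * hourly_rate
    return (human_cost - token_cost) / token_cost
```

A task with a 4-hour baseline at $50 per hour and $40 of token spend gives (200 - 40) / 40 = 4.0, above the 1.5 target.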

Start Here

SENDA Starter Kit

You don't need a consultant to start. Adopt these three practices in one week and you'll have more governance than 90% of teams using AI today. When you're ready to scale, we're here.

01

Day 1–2

Start with the Context Document

Before your next task, write a one-page brief: what you're building, what the AI is allowed to touch, what the acceptance criteria are. Use a simple template in your repo (/docs/context/). Don't optimize the format — just start writing constraints down. This alone eliminates the majority of AI hallucination problems.

You stop generating code for the wrong problem.

02

Day 3–4

Label your autonomy levels

Tag each task type in your backlog with an autonomy level: L1 for auth/payments, L2 for standard features, L3 for scaffolding. Don't use L0 or L4 yet — start in the middle. The act of labeling forces a conversation about where AI should and shouldn't operate. That conversation is the governance.

Your team has explicit, shared rules about AI boundaries.

03

Week 2

Track one metric: ICR

After one week of labeled work, count: how many issues did your review process catch before shipping, vs. how many escaped to production? That ratio is your Intervention Capture Rate. If it's below 90%, your review process has gaps. You don't need dashboards — a spreadsheet works. The goal is to make governance measurable, not perfect.

You have a number that tells you if governance is working.

Work with Sturion

Service Tiers

Three engagement models designed to meet teams at their current maturity level and deliver SENDA governance in a way that sticks.

Assessment

Understand where you stand.

Engagement/2 weeks
  • Current workflow and tooling audit
  • Autonomy level calibration workshop
  • Governance gap analysis
  • SENDA readiness scorecard
  • Recommended implementation roadmap
Start with an Assessment
Most Popular

Implementation

Stand up SENDA in your team.

Engagement/6–10 weeks
  • Full SENDA lifecycle setup
  • Context Document templates and training
  • Role assignment and RACI definition
  • Metric tracking infrastructure
  • Three complete SENDA cycles with embedded support
  • Reviewer and Context Engineer training
Implement SENDA

Managed Governance

Continuous oversight and optimization.

Retainer/Ongoing
  • Monthly REFLECT facilitation
  • Metric review and autonomy recalibration
  • Context Document quality reviews
  • Incident response for governance failures
  • Quarterly methodology updates
Discuss a Retainer
Operational Modes

Adaptive Governance Modes

SENDA is not one-size-fits-all. The same five phases apply everywhere, but the governance weight adapts to the project context. Standard mode for regulated environments; Flash mode for high-iteration product work. Same framework, different intensity.

Standard Mode (Full Governance) vs. Flash Mode (Automated Governance)

Documentation

Standard: Full Context Document, versioned in repo, reviewed before GENERATE.

Flash: Micro-briefs embedded in code (inline Markdown). Max 50 lines of scope per task.

Governance

Standard: Tiered human review with Trust Score thresholds. Reviewer signs off on every artifact.

Flash: Same Trust Score model, but above 90% the human reviews architecture diffs only. Same metrics, lighter touch.

Cycle Time

Standard: 1–2 week cycles with structured REFLECT phases.

Flash: Sub-day cycles. REFLECT is automated via CI metrics dashboard. Human REFLECT monthly.

Failure Response

Standard: Metric-driven recalibration in REFLECT.

Flash: Adds FCT (Failure to Commit Time), which measures the lag between generation and test rejection. High FCT auto-downgrades autonomy. Feeds back into the same DER/GC/ICR metrics.

Ideal For

Standard: Regulated industries, infrastructure, compliance-heavy environments.

Flash: Product teams, SaaS iteration, MVPs, where governance must exist but cannot slow deployment below daily.

Automatic Guardrail

Integration tests and static analysis replace line-by-line review. If code passes all automated gates, it ships to staging. Humans review architecture, not syntax.

Micro-Contexts

Instead of 20-page Context Documents, atomized prompts scope each task to max 50 lines of logic. This keeps agents in L2 (Directed) and prevents hallucination at scale.

FCT Metric

Failure to Commit Time — how long between AI generation and test rejection. High FCT triggers an instant autonomy downgrade without waiting for the REFLECT phase.
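
One way the FCT rule could be wired up is sketched below. The threshold max_fct, the one-level step, and both function names are assumptions: the text says only that a high FCT triggers an immediate downgrade, not how high or by how much.

```python
from datetime import datetime, timedelta


def failure_to_commit_time(generated_at: datetime, rejected_at: datetime) -> timedelta:
    """Lag between AI generation and the test rejection of that output."""
    return rejected_at - generated_at


def next_autonomy_level(current_level: int, fct: timedelta, max_fct: timedelta) -> int:
    """Drop one level (never below L0) when FCT exceeds the team's threshold."""
    # Assumption: a single-level downgrade, applied at once rather than at REFLECT.
    if fct > max_fct:
        return max(current_level - 1, 0)
    return current_level
```

If output generated at 10:00 is rejected by tests at 10:45 and the team's threshold is 30 minutes, a task running at L3 would drop to L2 for its next iteration.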

Context-as-Code
Generation Correctness
Defect Escape Rate
Autonomy Calibration
Governance-First
Human Ownership
Context Engineer
GOVERN Phase
REFLECT Cycle
ICR
CDS Score
Token ROI
Trust Score
Flash Mode
Context Drift
Recap Loop
Micro-Contexts
FCT Metric
L0 Manual
L4 Autonomous
Tiered Auditing
SENDA Lifecycle
SCOPE Phase
AI Guardrails
Starter Kit
Defect Taxonomy
Get Started

Ready to implement SENDA?

Your team is already using AI. The question is whether it is governed or improvised. SENDA gives you the framework to answer that question with confidence.