Spec lifecycle
Specification-driven development

SDD is the contract layer between intent and code.

CafeKit turns a feature idea into source-backed requirements, design contracts, task packets, verification evidence, and synchronized machine state before implementation is allowed to move.

Spec-driven development loop

CafeKit keeps the assistant moving through persistent contracts instead of relying on chat memory.

| Step | Command or artifact | Stage | Purpose |
| --- | --- | --- | --- |
| 01 | /hapo:specs | Evidence | Gather codebase and external evidence before requirements or design are finalized. |
| 02 | requirements + design | Contract | Lock behavior, constraints, canonical contracts, traceability, and out-of-scope boundaries. |
| 03 | tasks/task-R*.md | Task packets | Split work into self-contained packets with related files, criteria, dependencies, and evidence. |
| 04 | validate-spec-output.cjs | Readiness | Deterministic validation must pass before ready_for_implementation can be true. |
| 05 | /hapo:develop | Develop | Implement one unblocked task after task-aware inspection of real entrypoints and blast radius. |
| 06 | /hapo:test + /hapo:code-review | Proof | Run exact evidence commands, prove reachability, and reject scope drift or hidden placeholders. |
| 07 | /hapo:sync | Sync | Update task_registry and task markdown only after proof, or audit drift before continuing. |

What SDD means in CafeKit

Specification-driven development in CafeKit is not a long planning ritual. It is a control system for AI coding work.

The assistant is allowed to code only after the work has a durable contract:

  • what problem is in scope
  • what is explicitly out of scope
  • which requirements must be satisfied
  • which design contracts cannot be changed silently
  • which task packet is active now
  • which commands and runtime proof will show the task is done
  • which state files must move after proof exists

This matters because chat memory is soft. A spec is hard. Chat can summarize, forget, rationalize, or drift. specs/<feature>/ is the durable source of truth that the next assistant, reviewer, or teammate can read without trusting prior conversation context.

The lifecycle in one line

/hapo:specs
  -> validate spec artifacts
  -> /hapo:develop one task packet
  -> /hapo:test + /hapo:code-review
  -> /hapo:sync after proof

This is the main SDD delivery loop. For a vague request, architecture debate, multi-system change, or unclear acceptance criteria, use /hapo:brainstorm as an optional pre-spec step before this loop starts.

Artifact contract

CafeKit specs are stored in this shape:

specs/<feature>/
├── spec.json
├── requirements.md
├── research.md
├── design.md
├── tasks/
│   ├── task-R0-01-<foundation>.md
│   ├── task-R1-01-<feature-step>.md
│   └── task-R2-01-<feature-step>.md
└── reports/
    └── red-team-report.md

| Artifact | Owns | Must contain |
| --- | --- | --- |
| spec.json | Machine state | status, current_phase, scope_lock, approvals, task_files, task_registry, readiness flags |
| requirements.md | Behavior contract | numeric requirement IDs, EARS-style acceptance criteria, constraints, NFRs |
| research.md | Evidence record | codebase scout, external research or skip rationale, decision, rejected alternatives |
| design.md | Implementation contract | architecture, canonical contracts, invariants, traceability, risk, test strategy |
| tasks/task-R*.md | Execution packets | context, constraints, related files, dependencies, steps, completion criteria, evidence |
| reports/ | Review trail | optional research, validation, red-team, and review reports for high-risk specs |

Forbidden legacy artifacts:

  • init.json
  • spec-state.json
  • hydration.md

Task hydration may happen in the assistant UI, but task files remain the source of truth. Hydration must not be written as a markdown artifact.
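As a rough illustration of the forbidden-artifact rule, the sketch below flags legacy file names in a list of spec-relative paths. The helper name `findForbiddenArtifacts` and the input shape are assumptions made for this example; the real validator's internals are not shown in this document.

```javascript
// Hypothetical helper, not the actual validate-spec-output.cjs implementation.
// The three names come from the "Forbidden legacy artifacts" list above.
const FORBIDDEN_ARTIFACTS = ["init.json", "spec-state.json", "hydration.md"];

// Returns every path whose file name is a forbidden legacy artifact.
function findForbiddenArtifacts(paths) {
  return paths.filter((p) => FORBIDDEN_ARTIFACTS.includes(p.split("/").pop()));
}

const found = findForbiddenArtifacts([
  "spec.json",
  "requirements.md",
  "hydration.md",
  "tasks/task-R0-01-foundation.md",
]);
console.log(found); // [ 'hydration.md' ]
```

An empty result means no legacy artifact is present in the listed paths.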

Phase-by-phase flow

1. Analyze intent

/hapo:specs <feature-description> first decides whether the request is ready for a spec.

It should route to /hapo:brainstorm when the request is vague, architecture choices are still open, scope boundaries are unclear, acceptance criteria are missing, or the work spans several independent subsystems.

It may warn that a spec is not necessary for a very small one-file change. CafeKit should be strict for meaningful feature work, not ceremonial for trivial edits.

2. Scan existing specs and dependencies

Before creating new artifacts, CafeKit scans specs/ for incomplete specs and overlapping scope.

The point is to avoid two specs fighting over the same files, contracts, migrations, commands, or runtime surfaces. If a dependency exists, the relationship should be recorded in spec.json so implementation order is visible.
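One way to picture the overlap scan is as a set intersection over the files each spec expects to touch. This is only a sketch: the `feature`, `status`, and `files` fields on each spec summary are hypothetical, and treating anything other than done or archived as "incomplete" is an assumption.

```javascript
// Hypothetical spec summaries; CafeKit's real scan inputs are not documented here.
// Assumption: a spec counts as incomplete unless its status is done or archived.
function findOverlaps(newSpec, existingSpecs) {
  const wanted = new Set(newSpec.files);
  return existingSpecs
    .filter((spec) => spec.status !== "done" && spec.status !== "archived")
    .map((spec) => ({
      feature: spec.feature,
      shared: spec.files.filter((file) => wanted.has(file)),
    }))
    .filter((hit) => hit.shared.length > 0);
}

const overlaps = findOverlaps(
  { feature: "password-reset", files: ["src/auth/session.ts", "src/mail/send.ts"] },
  [
    { feature: "sso-login", status: "in_progress", files: ["src/auth/session.ts"] },
    { feature: "old-billing", status: "done", files: ["src/mail/send.ts"] },
  ]
);
console.log(overlaps.map((hit) => hit.feature)); // [ 'sso-login' ]
```

A non-empty result is a prompt to record the dependency or resolve the conflict before new artifacts are created.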

3. Assess complexity and lock scope

CafeKit evaluates the request across intent, implementation hypothesis, gap size, risk, and blast radius.

The output is a scope lock:

| Scope lock field | Meaning |
| --- | --- |
| source | The original user request or approved design summary. |
| in_scope | Behavior the spec is allowed to deliver. |
| out_of_scope | Behavior the assistant must not add. |
| expansion_policy | Usually requires-user-approval. |

Once scope is confirmed, the assistant should not silently expand, shrink, or reinterpret it. A needed scope change is a spec update, not an implementation shortcut.
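To make the scope lock concrete, here is a minimal sketch of a lock object and a guard that classifies a proposed behavior. The four field names come from the table above; the array-of-strings value shapes, the sample feature, and the `classifyBehavior` helper are assumptions for illustration.

```javascript
// Illustrative scope lock. Field names follow the table above;
// the value shapes (arrays of strings) are an assumption.
const scopeLock = {
  source: "Add password reset by email",
  in_scope: ["request reset link", "set new password"],
  out_of_scope: ["social login", "account deletion"],
  expansion_policy: "requires-user-approval",
};

// Hypothetical guard: anything not explicitly listed falls back to the policy.
function classifyBehavior(lock, behavior) {
  if (lock.in_scope.includes(behavior)) return "allowed";
  if (lock.out_of_scope.includes(behavior)) return "forbidden";
  return lock.expansion_policy;
}

console.log(classifyBehavior(scopeLock, "set new password")); // allowed
console.log(classifyBehavior(scopeLock, "social login"));     // forbidden
console.log(classifyBehavior(scopeLock, "rate limiting"));    // requires-user-approval
```

The third case is the interesting one: behavior that nobody listed is neither silently added nor silently dropped, it goes back to the user.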

4. Create spec.json

spec.json is the machine-readable state file. It tracks status, current phase, approvals, scope, design context, task files, task registry, timestamps, and readiness.

Important fields:

| Field | Why it matters |
| --- | --- |
| status | Canonical values include in_progress, blocked, done, and archived. New specs should not emit legacy in-progress. |
| current_phase | Shows whether the spec is in init, requirements, design, tasks, develop, test, or review. |
| scope_lock | Prevents hidden scope changes during specs and implementation. |
| task_files | Must exactly match physical files under tasks/. |
| task_registry | Machine state for each task file. |
| ready_for_implementation | Hard gate; true only after approvals, registry sync, validation, and task readiness pass. |
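The "task_files must exactly match physical files" rule amounts to a two-way difference between what spec.json declares and what exists on disk. The sketch below assumes both sides are already available as arrays of relative paths; `diffTaskFiles` is a hypothetical helper, not part of CafeKit.

```javascript
// Hypothetical helper: compares declared task_files with files found under tasks/.
// Both inputs are plain arrays of relative paths (an assumption for this sketch).
function diffTaskFiles(declared, onDisk) {
  const declaredSet = new Set(declared);
  const diskSet = new Set(onDisk);
  return {
    missingOnDisk: declared.filter((file) => !diskSet.has(file)),
    unregistered: onDisk.filter((file) => !declaredSet.has(file)),
  };
}

const drift = diffTaskFiles(
  ["tasks/task-R0-01-foundation.md", "tasks/task-R1-01-reset-form.md"],
  ["tasks/task-R0-01-foundation.md", "tasks/task-R1-02-reset-email.md"]
);
console.log(drift.missingOnDisk); // [ 'tasks/task-R1-01-reset-form.md' ]
console.log(drift.unregistered);  // [ 'tasks/task-R1-02-reset-email.md' ]
```

An exact match means both lists come back empty.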

5. Write evidence before final requirements

For non-trivial specs, research.md must exist before requirements, design, or tasks are finalized.

Evidence can be:

  • targeted codebase scout findings
  • official or current external documentation
  • constraints from package/runtime files
  • explicit skip rationale for trivial docs-only or isolated work

The Evidence Summary should record:

  • codebase scout result
  • external research result or skip rationale
  • selected decision
  • rejected alternatives
  • remaining gaps
  • downstream task and test implications

The goal is to prevent requirements from being written from memory when the repo or current upstream docs can answer the question.

6. Write requirements

requirements.md turns intent into testable behavior. Requirements should use numeric IDs and acceptance criteria precise enough to verify.

Good requirements are:

  • singular
  • unambiguous
  • testable
  • mapped to later tasks
  • honest about non-functional needs such as security, performance, reliability, accessibility, migration, and rollback

If a requirement cannot be tested, it is not ready to become implementation work.
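The "mapped to later tasks" property can be checked mechanically once tasks exist. The following sketch assumes requirement IDs are strings and that each task exposes the IDs it covers in a `requirements` array; both shapes are assumptions for illustration.

```javascript
// Hypothetical coverage check: which requirement IDs are covered by no task?
// Assumes each task object carries a `requirements` array of ID strings.
function uncoveredRequirements(requirementIds, tasks) {
  const covered = new Set(tasks.flatMap((task) => task.requirements));
  return requirementIds.filter((id) => !covered.has(id));
}

const gaps = uncoveredRequirements(
  ["1", "2", "3"],
  [
    { file: "tasks/task-R0-01-foundation.md", requirements: ["1"] },
    { file: "tasks/task-R1-01-reset-form.md", requirements: ["1", "3"] },
  ]
);
console.log(gaps); // [ '2' ]
```

Any ID left in the result is a requirement that no task packet will deliver.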

7. Write design contracts

design.md explains how the feature will be built. It should not just say "implement the feature." It should lock the decisions a later implementation must preserve.

For auth, session, transport, persistence, schemas, generated artifacts, commands, package exports, or runtime-sensitive work, the design must include canonical contracts and invariants. These are the rules /hapo:develop must not silently replace.

Examples:

| Contract type | Example of what must be explicit |
| --- | --- |
| Runtime entrypoint | route, command, worker, provider, hook, API boundary, UI mount. |
| Persistence | schema, migration, datastore, deletion/retention policy. |
| Transport | request/response shape, event contract, queue topic, CLI args. |
| Integration | which prior task output must be imported, mounted, registered, or invoked. |
| Testing | unit, integration, E2E, visual, accessibility, smoke, security, or performance proof. |

8. Break into task packets

CafeKit does not hand /hapo:develop a vague feature. It hands over task packets.

Task files follow this naming convention:

tasks/task-R{N}-{SEQ}-<slug>.md

R0 is for shared foundation work. R1+ are feature clusters. Sequence numbers are two digits.
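The naming convention can be expressed as a regular expression. This is a sketch only: the slug character set (lowercase letters, digits, and hyphens) is an assumption, since the document does not define it, and `parseTaskPath` is a hypothetical helper.

```javascript
// Pattern for tasks/task-R{N}-{SEQ}-<slug>.md with a two-digit sequence.
// Assumption: slugs are lowercase alphanumerics separated by single hyphens.
const TASK_PATH = /^tasks\/task-R(\d+)-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$/;

// Returns the parsed parts, or null when the path breaks the convention.
function parseTaskPath(path) {
  const match = TASK_PATH.exec(path);
  if (!match) return null;
  const cluster = Number(match[1]);
  return {
    cluster,                     // 0 for shared foundation, 1+ for feature clusters
    sequence: match[2],          // kept as a string to preserve the leading zero
    slug: match[3],
    isFoundation: cluster === 0,
  };
}

console.log(parseTaskPath("tasks/task-R1-01-login-form.md").slug); // login-form
console.log(parseTaskPath("tasks/task-R1-1-login-form.md"));       // null
```

The second call fails because the sequence number has only one digit.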

Each task should be self-contained:

| Task section | Required role |
| --- | --- |
| Context | Why the task exists and what outcome it targets. |
| Constraints | MUST, SHOULD, MUST NOT, scope guardrails. |
| Steps | Implementation checklist with enough code-level detail to act. |
| Requirements | Requirement IDs and acceptance criteria covered. |
| Related Files | Exact paths when known; otherwise scout before finalizing. |
| Completion Criteria | Observable checks that prove the task is complete. |
| Evidence | Commands, artifact proof, runtime proof, negative-path proof, reachability proof. |
| Risk Assessment | Risks, severity, and mitigations. |

Vague task files are invalid. A junior developer or AI coding agent should be able to execute a task without guessing.

9. Build task_registry

spec.json.task_registry is keyed by task file path.

Status values:

  • pending
  • in_progress
  • blocked
  • done

Each registry entry must include:

  • id
  • title
  • status
  • dependencies
  • blocker
  • started_at
  • completed_at
  • last_updated_at

done is illegal without fresh verification evidence. If proof is missing, the task stays pending, in_progress, or blocked.
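The registry rules above can be sketched as a small entry check. The required keys and status values are taken from the lists in this section; the `hasFreshEvidence` flag is a hypothetical input standing in for however proof is actually tracked.

```javascript
// Keys and statuses come from the lists above; the helper itself is hypothetical.
const REQUIRED_KEYS = [
  "id", "title", "status", "dependencies",
  "blocker", "started_at", "completed_at", "last_updated_at",
];
const STATUSES = ["pending", "in_progress", "blocked", "done"];

// Returns a list of human-readable problems for one registry entry.
function checkRegistryEntry(path, entry, hasFreshEvidence) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    if (!(key in entry)) problems.push(`${path}: missing ${key}`);
  }
  if (!STATUSES.includes(entry.status)) {
    problems.push(`${path}: unknown status ${entry.status}`);
  }
  if (entry.status === "done" && !hasFreshEvidence) {
    problems.push(`${path}: done without fresh verification evidence`);
  }
  return problems;
}

const entry = {
  id: "R1-01", title: "Reset form", status: "done", dependencies: [],
  blocker: null, started_at: null, completed_at: null, last_updated_at: null,
};
console.log(checkRegistryEntry("tasks/task-R1-01-reset-form.md", entry, false));
// [ 'tasks/task-R1-01-reset-form.md: done without fresh verification evidence' ]
```

The timestamp values are left as null here because the document does not specify their format.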

10. Validate readiness

Before implementation, run:

node .claude/scripts/validate-spec-output.cjs specs/<feature>

The validator checks artifact shape, forbidden files, task path naming, required task sections, requirement coverage, task file inventory, task_registry, and readiness consistency.

  • Scope is locked: scope_lock is an object and scope expansion requires explicit approval.
  • Evidence exists: research.md has an Evidence Summary or a justified skip rationale.
  • Tasks are real files: spec.json task_files exactly matches the tasks/ directory.
  • Registry is synced: task_registry has one complete entry for every task file.
  • Every task proves done: Completion Criteria and Evidence are specific enough to execute.
  • Validator passes: node .claude/scripts/validate-spec-output.cjs specs/<feature> exits cleanly.

ready_for_implementation = true is allowed only when the spec is structurally valid, required approvals are complete, task files and registry match disk, validation requirements are satisfied, and tasks contain executable evidence expectations.
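The readiness rule is a strict conjunction: every gate must pass, and one failure keeps the flag false. A minimal sketch follows, where the check names are hypothetical labels for the conditions in the paragraph above and the booleans would come from the real validator.

```javascript
// Hypothetical gate: ready only when every named check is true.
function evaluateReadiness(checks) {
  const failed = Object.entries(checks)
    .filter(([, passed]) => !passed)
    .map(([name]) => name);
  return { ready_for_implementation: failed.length === 0, failed };
}

const readiness = evaluateReadiness({
  structurallyValid: true,
  approvalsComplete: true,
  taskFilesMatchDisk: true,
  registryMatchesDisk: true,
  validationSatisfied: false, // e.g. the validator has not passed yet
  tasksHaveExecutableEvidence: true,
});
console.log(readiness.ready_for_implementation); // false
console.log(readiness.failed);                   // [ 'validationSatisfied' ]
```

Returning the failed check names alongside the flag makes it clear what still blocks implementation.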

How develop uses the spec

/hapo:develop is not "start coding from the whole idea." It is a task executor.

It has two modes:

| Mode | Command | Behavior |
| --- | --- | --- |
| Specific-task mode | /hapo:develop <feature> <task-file> | Load exactly one task packet, implement it, verify it, sync it, then stop. |
| Full-spec mode | /hapo:develop <feature> | Build a queue from task_registry, select the next pending unblocked task, finish the full loop, then re-read state. |
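Full-spec mode's "next pending unblocked task" selection could look roughly like the sketch below. It assumes that dependencies are listed as task file paths (the same keys task_registry uses), that an empty blocker means unblocked, and that lexical path order is an acceptable queue order; none of these details are confirmed by this document.

```javascript
// Hypothetical selection logic for full-spec mode.
// Assumptions: dependencies are task file paths, a falsy blocker means unblocked,
// and lexical order of paths is the queue order (R0 before R1, 01 before 02).
function nextTask(registry) {
  const isDone = (path) => Boolean(registry[path]) && registry[path].status === "done";
  const candidate = Object.keys(registry)
    .sort()
    .find((path) => {
      const entry = registry[path];
      return entry.status === "pending" && !entry.blocker && entry.dependencies.every(isDone);
    });
  return candidate ?? null;
}

const registry = {
  "tasks/task-R1-02-reset-email.md": {
    status: "pending", blocker: null, dependencies: ["tasks/task-R1-01-reset-form.md"],
  },
  "tasks/task-R0-01-foundation.md": { status: "done", blocker: null, dependencies: [] },
  "tasks/task-R1-01-reset-form.md": {
    status: "pending", blocker: null, dependencies: ["tasks/task-R0-01-foundation.md"],
  },
};
console.log(nextTask(registry)); // tasks/task-R1-01-reset-form.md
```

Because state is re-read after each task, the function is meant to be called again once the selected task has been synced.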

Before coding, develop must extract:

  • objective and constraints
  • related files
  • completion criteria
  • evidence commands
  • requirement IDs
  • named technologies and runtime choices
  • design contracts and invariants
  • prior task outputs that must be consumed

Then it scouts real entrypoints and callers. Runtime-facing code is not done if it is only created as an orphan file. It must be imported, mounted, registered, invoked, or otherwise reachable from the real runtime boundary.

Quality gate after implementation

A task is complete only when all four proof layers pass:

| Layer | What must be true |
| --- | --- |
| Automated verification | Compile/typecheck/build and exact task evidence commands pass. |
| Spec compliance | Scoped requirements and design contracts are implemented with no hidden extras. |
| Code review | No blocking correctness, security, architecture, or regression findings. |
| Task evidence | Runtime/artifact proof, reachability, and negative paths match the task's Evidence section. |

NO_TESTS is not a pass. A command that exits 0 but runs zero tests is not a pass. Build success alone is not enough proof for user-facing or runtime-facing work.
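The "zero tests is not a pass" rule is easy to encode. The result shape below (`exitCode`, `testsRun`, `testsFailed`) is a hypothetical summary of a test command's outcome, not an actual CafeKit or test-runner format.

```javascript
// Hypothetical result summary for one evidence command.
// A run counts as proof only if it exited 0, ran at least one test, and none failed.
function evidencePassed(result) {
  return result.exitCode === 0 && result.testsRun > 0 && result.testsFailed === 0;
}

console.log(evidencePassed({ exitCode: 0, testsRun: 12, testsFailed: 0 })); // true
console.log(evidencePassed({ exitCode: 0, testsRun: 0, testsFailed: 0 }));  // false
console.log(evidencePassed({ exitCode: 1, testsRun: 12, testsFailed: 2 })); // false
```

The middle case is the one a plain exit-code check would wrongly accept.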

Practical SDD routine

Use this routine for planned feature delivery:

/hapo:specs <approved idea>
node .claude/scripts/validate-spec-output.cjs specs/<feature>
/hapo:develop <feature> <task-file>
/hapo:test <feature>
/hapo:code-review --pending
/hapo:sync <feature> <task-file> done

Use /hapo:brainstorm <idea> before this routine only when the idea is not ready for specs.

For larger specs, prefer one task per implementation session. Smaller diffs make evidence, review, and rollback easier.

When SDD is worth it

Use the full SDD path when the work changes product behavior, touches multiple files, crosses runtime boundaries, updates auth/data/schema/package contracts, affects users, needs review evidence, or will be handed between people or assistant sessions.

Skip or shorten it when the change is truly trivial: a typo, an isolated copy edit, a one-line config change, or an emergency fix that belongs in /hapo:hotfix.

The core idea

CafeKit SDD is a forcing function:

  • intent becomes scope
  • scope becomes requirements
  • requirements become design contracts
  • design becomes task packets
  • task packets become verified code
  • verified code becomes synchronized state

That is the center of CafeKit. The assistant is not successful because it produced code quickly. It is successful when the code matches the approved spec, is reachable in the real system, has evidence, passes review, and leaves state accurate for the next run.