The Cookbook · 05
Product & Engineering
Agent-driven feature development, bug triage and fix loops, QA, and release notes that write themselves.
If you're a non-engineer reading this, don't bounce. This section is exactly as much for you as for the founder with a CTO. Agents have collapsed the gap between "I have an idea" and "the code is shipping." Whether you write code or you don't, the same loops apply — you just toggle how much approval you sit in.

The thesis: most engineering work is not the creative leap. It's the scaffolding, the tests, the bug repro, the release notes, the PR review, the spec-to-branch translation. An agent does all of that, and a good engineer (or a non-engineer with an agent) becomes a force multiplier.

A few principles up front:

The agent writes code. You own the merge. Approval gates on anything touching production. Always.
Spec before code. Vibe-coding without a spec gets you to a working demo and a maintenance nightmare. Spec gets you to working and shippable.
Tests are the agent's safety net, not yours. Make it generate them. Make them run on every PR. Trust the green.
Branch hygiene matters. One PR per logical change. The agent will happily ship a 47-file PR — don't let it.

If you don't write code at all: the way you "earn out" on engineering agents is by reading the PR description, sanity-checking the demo, and clicking merge. You're the product manager. The agent is the IC.
1. Agent-Driven Feature Development
Tip 1.1 — Vibe-code loop: spec → branch → PR → demo
What it does: You describe what you want in plain English. The agent asks clarifying questions until it has a real spec. Then it scaffolds a branch, writes the code, writes the tests, opens a PR, deploys to a preview environment, and sends you a Loom of the demo. You approve and merge.

Why it wins: This is the actual workflow that lets non-engineers ship product. The agent owns the part that requires fluency in your codebase. You own the part that requires judgment about what's worth building.

Tools: A git host (GitHub), your codebase, a preview deploy environment (Vercel, Netlify, Fly.io), a CI runner, optionally a screen-recording tool for the demo Loom.

How to wire it:
1. The agent has read-write access to a fork or branch of your repo. Never main directly.
2. You start a session: "I want X." The agent runs a planning loop: clarifying questions until it has the spec.
3. Spec gets written to /specs/ .
4. The agent scaffolds a feature branch, implements against the spec, and writes the tests.
5. It opens a PR, deploys to the preview environment, and records the demo.
6. You watch the demo, read the PR description, and merge, or send it back with notes.
Tip 1.2 — PR review automation
What it does: Every PR (yours, the agent's, a teammate's) gets reviewed by a second agent before a human eye touches it. The reviewer checks: tests cover changes, no obvious bugs, no secrets leaked, lint passes, the PR description matches what the code does, the changes match the spec.

Why it wins: PR reviews are the bottleneck in any team bigger than one. A first-pass agent reviewer catches 60-70% of the issues a human would, in 30 seconds instead of 30 minutes. Humans only see PRs that are already clean.

Tools: GitHub Actions or a similar CI, an LLM call per PR (Claude, GPT, etc.), your repo's lint and test config.

How to wire it:
1. GitHub Action on pull_request : triggers the review agent.
2. The agent loads the diff, the spec (if linked), the PR description, and a checklist.
3. It comments on the PR with findings: ✅ for clean, 🟡 for nitpicks, 🔴 for blockers.
4. Blockers prevent merge until addressed. You override with a label if you disagree.
5. The agent re-runs on push so iterative fixes get instant re-review.

Example prompt to your agent: Set up a GitHub Action that on every pull_request to main: loads the diff, the linked spec, the PR description, and runs a review pass. Check: do tests cover the changes, any obvious bugs or perf issues, any secrets in diffs, lint pass, description matches code, code matches spec. Comment on the PR with a checklist. ✅ clean, 🟡 nits, 🔴 blockers. Blockers should fail the CI gate. Re-run on push.

Watch out for:
Reviewer agents are sycophantic by default. Prompt it explicitly: "find what's wrong. If it looks clean to you, look harder."
Don't let the reviewer agent commit fixes. Comments only. Mixing review and write authority leads to circular nonsense.
Calibrate the rubric by feeding it 10 PRs you've reviewed and seeing if it matches your calls.

Skill file: security-audit + a simplify review pattern. The simplify skill is pattern-only — tell your agent to write one from this recipe (heuristics: dead code, duplicate utilities, overabstracted layers, premature config).
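To make the wiring concrete, here is a minimal sketch of the review step as a script the Action could run. Octokit and the GitHub Action env vars are real; LLM_API_URL and the response shape are placeholders for whatever model you call.

```ts
// review-pr.ts - first-pass PR reviewer, run from a GitHub Action.
// GITHUB_TOKEN / GITHUB_REPOSITORY / PR_NUMBER come from the Action env.
// LLM_API_URL and the { text } response shape are stand-ins, not a vendor API.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const [owner, repo] = process.env.GITHUB_REPOSITORY!.split("/");
const prNumber = Number(process.env.PR_NUMBER);

async function main() {
  // Load the raw diff for the PR.
  const { data } = await octokit.rest.pulls.get({
    owner,
    repo,
    pull_number: prNumber,
    mediaType: { format: "diff" },
  });
  const diff = data as unknown as string; // "diff" format returns a string body

  // Ask the model for a review. "Look harder" counters sycophancy.
  const res = await fetch(process.env.LLM_API_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt:
        "Review this diff. Find what's wrong; if it looks clean, look harder.\n" +
        "Check: test coverage, obvious bugs, leaked secrets, lint,\n" +
        "description-vs-code, code-vs-spec. Prefix findings ✅ / 🟡 / 🔴.\n\n" +
        diff,
    }),
  });
  const { text: review } = (await res.json()) as { text: string };

  // Comments only: the reviewer never gets write authority.
  await octokit.rest.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body: review,
  });

  // Any 🔴 blocker fails the CI gate.
  if (review.includes("🔴")) process.exit(1);
}

main();
```

Re-running on push is free: the same Action fires on every new commit, so iterative fixes get instant re-review.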
Tip 1.3 — Spec-to-PR scaffolder
What it does: You drop a spec doc into a folder or Notion page. The agent reads it, plans the implementation, breaks it into 2-4 small PRs (not one giant one), and opens them in dependency order with checkboxes for each.
Why it wins: "Build this feature" with a 2-page spec usually produces one 47-file PR nobody can review. Forcing the agent to split the work into reviewable chunks is how shipping stays fast at any team size. Tools: A spec inbox (folder, Notion DB, Linear), GitHub, your branching convention. How to wire it: 1. Specs live in /specs/inbox/ . Each spec has: problem, proposed solution, acceptance criteria. 2. The agent watches that folder. New spec → run the breakdown: split into 2-4 ordered PRs. 3. It opens the first PR. When that merges, it auto-opens the next. 4. Each PR links back to the spec and to the sibling PRs. 5. Spec moves to /specs/in-flight/ and finally /specs/shipped/ . Example prompt to your agent: Watch /specs/inbox/ . For any new spec file, read it and propose a breakdown into 24 small PRs in dependency order. Send me the proposed breakdown for approval before any code. On approval, open the first PR (feature branch off main, implementation, tests, description with spec link). When PR 1 merges, automatically open PR 2 from the next branch. Repeat. Move the spec file through inbox/ → inflight/ → shipped/ . Watch out for: Some specs are genuinely one PR. Don't let the agent force a split where it doesn't help. Sibling PRs that depend on each other need careful base-branch management. Use stacked PRs if your platform supports them (Graphite, gh CLI stacks). Acceptance criteria in the spec is what the agent tests against. If criteria are vague, the implementation will be vague. Skill file: project-planning, scope-analyzer
2. Bug Triage & Fix Loop
Bugs are the most agent-shaped problem in engineering. The flow is: user reports → reproduce → diagnose → fix → test → ship. Each step is bounded and verifiable. Agents eat this for breakfast.
Tip 2.1 — Bug triage from user reports
What it does: A user reports a bug — by Slack, email, Discord, support ticket, anywhere. The agent classifies it (severity, area, likely root cause), tries to reproduce it locally, attaches the repro steps and stack trace to a GitHub issue, and assigns priority.

Why it wins: Most teams' bug intake is chaos. Reports come in five places, half don't have repro steps, the team rediscovers the same bug three times because nobody centralized. The agent imposes structure for free.

Tools: Your support channels (email, Discord/Slack webhooks, support tool API), your test runner / local repro env, GitHub Issues API.

How to wire it:
1. Per channel, the agent listens for new bug reports.
2. For each report, it classifies: severity (1-4), area (auth, payments, UI, etc.), likely root cause guess (1 sentence).
3. It searches your existing issues for duplicates. If duplicate, links and pings the original reporter.
4. If new, it tries to reproduce in a sandbox. Success → attaches repro steps + stack trace.
5. Opens a GitHub issue with all of the above, labels it, and pings whoever owns that area.

Example prompt to your agent: Listen for new bug reports in Discord #bug-reports, support@email, and the help widget API. For each: classify severity (1-4) and area, search existing GitHub issues for duplicates (similarity threshold 0.8), and if novel, attempt to reproduce in a sandbox using the repro steps in the report. If reproduction succeeds, capture the stack trace and the exact steps. Open a GitHub issue with: title, severity, area, reporter, original report, repro steps, stack trace, your root-cause guess. Label and assign per bug-routing.json .

Watch out for:
Don't let the agent close duplicates without human review. Sometimes "looks like a dupe" isn't.
Reproduction in a sandbox needs sample data. If the bug is data-specific, the agent should request anonymized data from the user with a one-click upload.
Don't accept user-reported severity at face value. Users always say critical.

Skill file: security-audit (issue-creation pattern)
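The duplicate check is the step most worth pinning down. A sketch, assuming an embed() helper for whatever embedding model you run; the 0.8 cosine threshold mirrors the prompt above.

```ts
// Duplicate detection over open issues. embed() is an assumed helper
// wrapping your embedding model; Issue is a trimmed-down shape.
type Issue = { number: number; title: string; body: string };

declare function embed(text: string): Promise<number[]>; // assumed helper

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function findDuplicate(
  report: string,
  openIssues: Issue[],
): Promise<Issue | null> {
  const target = await embed(report);
  let best: { issue: Issue; score: number } | null = null;
  for (const issue of openIssues) {
    const score = cosine(target, await embed(`${issue.title}\n${issue.body}`));
    if (!best || score > best.score) best = { issue, score };
  }
  // Link, don't close: a human confirms every "looks like a dupe".
  return best && best.score >= 0.8 ? best.issue : null;
}
```

In practice you'd cache the issue embeddings instead of recomputing them per report; the threshold is a starting point to calibrate against your own backlog.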
Tip 2.2 — Auto-repro and fix draft
What it does: For any issue tagged agent-fix-candidate , the agent reads the issue, reproduces locally, writes a fix on a branch, writes a regression test, runs the full test suite, and opens a PR.

Why it wins: Most bugs are small. A typo, an off-by-one, a missing null check. Humans spend 20 minutes per bug on the boilerplate (reproduce, fix, test, PR). The agent does it in 60 seconds and humans only review.

Tools: Your repo, your test suite, GitHub Actions / your CI.

How to wire it:
1. Tag agent-fix-candidate on small, bounded issues.
2. The agent picks up the tag, reads the issue and the related code area.
3. Repro: runs the failing scenario.
4. Drafts the fix. Tests it locally.
5. Adds a regression test. Re-runs the suite.
6. Opens a PR linked to the issue.

Example prompt to your agent: Watch GitHub Issues for agent-fix-candidate label. For each tagged issue: read the issue, reproduce the bug locally, draft a minimal fix on a new branch, add a regression test, run the full test suite. If green, open a PR linking to the issue with a description: bug, root cause, fix approach, tests added. If the suite fails or you can't repro, comment on the issue with what you tried and remove the label.

Watch out for:
The agent will sometimes "fix" symptoms instead of root cause. Force it to write the regression test first, fail it, then fix until it passes.
Don't tag systemic bugs agent-fix-candidate . Those need a human architect.
A green test suite doesn't mean the fix is right. Always demo the fix on the preview deploy.

Skill file: security-audit. A simplify companion is pattern-only — tell your agent to write one from this recipe.
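The "regression test first" discipline from the watch-outs is worth enforcing in code rather than prose. A sketch of the two gates, assuming a vitest project; the command strings are illustrative and should be swapped for your own suite.

```ts
// Gates for the auto-fix loop. `run` shells out to the test runner and
// reports pass/fail. Commands assume vitest; adjust for your stack.
import { execSync } from "node:child_process";

function run(cmd: string): boolean {
  try {
    execSync(cmd, { stdio: "inherit" });
    return true;
  } catch {
    return false;
  }
}

// Gate 1 - run BEFORE the fix lands: the new regression test must be
// red on the buggy code, or it doesn't actually capture the bug.
export function regressionTestCapturesBug(testPath: string): boolean {
  return !run(`npx vitest run ${testPath}`);
}

// Gate 2 - run AFTER the fix: the regression test goes green AND the
// full suite stays green. Only then does the agent open the PR.
export function fixIsReadyForPr(testPath: string): boolean {
  return run(`npx vitest run ${testPath}`) && run("npx vitest run");
}
```

Gate 1 is the one agents skip when left alone: a test that passes on the buggy code proves nothing, which is exactly how symptom-fixes sneak through.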
3. QA Loop
Tip 3.1 — Auto-generate test cases from changed code
What it does: On every PR, the agent reads the diff and generates test cases for the changed code. Unit tests for changed functions, integration tests for changed flows, edge cases the human probably forgot.

Why it wins: Coverage on greenfield code is fine. Coverage on changed-but-existing code is where bugs hide. Auto-generating tests on the diff means the test suite grows with the code, not behind it.

Tools: Your test framework, the PR diff, an LLM, your CI.

How to wire it:
1. GitHub Action on pull_request : agent loads the diff and the test files.
2. For each changed function/component, agent proposes 3-5 test cases: happy path, edge cases, error cases.
3. Drafts the tests in the right test file. Runs them. If any fail, that's interesting — surface to the PR comments.
4. Author can accept, modify, or dismiss the generated tests.

Example prompt to your agent: On every PR, read the diff. For each changed function or component, propose 3-5 test cases covering happy path, edge cases, and error cases. Draft the tests in the appropriate test file in the same PR. Run them. If any pass-on-buggy-code or fail-on-correct-code, flag in PR comments. The author can accept/modify/dismiss any test.

Watch out for:
Don't generate tests for trivial code. A getter doesn't need a test.
Generated tests can over-fit to current implementation. Force "test behavior, not implementation."
Coverage % can be gamed. Look at branch coverage and mutation testing if you care about quality.

Skill file: security-audit. A simplify companion is pattern-only — tell your agent to write one from this recipe.
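What a good generated test looks like: behavior-level assertions, not implementation mirroring. parseDiscount and its module path are hypothetical stand-ins for a function the diff touched; the framework here is vitest.

```ts
// Hypothetical generated tests for a changed function in the PR diff.
import { describe, it, expect } from "vitest";
import { parseDiscount } from "../src/pricing"; // hypothetical module

describe("parseDiscount (changed in this PR)", () => {
  it("happy path: parses a percent code", () => {
    expect(parseDiscount("SAVE20")).toEqual({ kind: "percent", value: 20 });
  });

  it("edge case: 100% is allowed, 101% is not", () => {
    expect(parseDiscount("SAVE100").value).toBe(100);
    expect(() => parseDiscount("SAVE101")).toThrow();
  });

  it("error case: garbage input rejects loudly", () => {
    expect(() => parseDiscount("")).toThrow();
  });
});
```

Note what's absent: no assertions on internal helpers or call order. Tests that pin the implementation instead of the behavior are the over-fitting failure mode the watch-outs warn about.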
Tip 3.2 — Browser-driven E2E tests via agent-browser
What it does: The agent uses agent-browser (headed Playwright/Chromium) to drive your app like a user, exploring critical flows on every deploy and flagging anything broken. Real browser, real UI, real assertions.

Why it wins: Unit tests pass while the production UI is broken. End-to-end browser tests catch the gap. Writing them by hand is tedious — driving them with an agent that can read the UI and decide what to click is dramatically faster.

Tools: agent-browser , your preview deploy URL, a list of critical user flows.

How to wire it:
1. Define your critical flows in flows.md : per flow, the goal, the start URL, the expected end state.
2. On every preview deploy, the agent runs each flow via agent-browser : open URL, snapshot interactive elements, click through, assert.
3. Failure → screenshot + stack trace into the PR comments.
4. Optional: have the agent propose new flows after major releases by reading the changelog.

Example prompt to your agent: On every preview deploy, run flows.md against the preview URL using agent-browser. Per flow: open URL, snapshot, walk through the steps, assert the end state matches. On failure: capture screenshots and DOM state, attach to PR comments, fail the deploy check. After each merge to main, re-run on production and ping me if anything fails (could be data-specific).

Watch out for:
Flaky tests destroy trust fast. Build in retries and clear failure modes.
Test data has to be predictable. Don't run E2E on a database where data changes mid-test.
Browser tests are slow. Parallelize across flows.

Skill file: agent-browser, _auto-headed-browser
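If you want flows checked in rather than re-derived by the agent each run, one flow from flows.md might compile down to a plain Playwright test like this. The signup flow, selectors, and BASE_URL convention are hypothetical; the retries line is the anti-flake guard from the watch-outs.

```ts
// One flows.md entry as a checked-in Playwright test. Selectors and the
// signup flow itself are illustrative; BASE_URL points at the preview deploy.
import { test, expect } from "@playwright/test";

test.describe("flow: signup → first project", () => {
  test.describe.configure({ retries: 2 }); // flaky E2E destroys trust

  test("new user can sign up and land on the dashboard", async ({ page }) => {
    await page.goto(process.env.BASE_URL ?? "http://localhost:3000");
    await page.getByRole("link", { name: "Sign up" }).click();

    // Predictable seed data: never run this against live customer records.
    await page.getByLabel("Email").fill("e2e+seed@example.com");
    await page.getByLabel("Password").fill("correct-horse-battery");
    await page.getByRole("button", { name: "Create account" }).click();

    // Expected end state, straight from flows.md.
    await expect(page).toHaveURL(/\/dashboard/);
    await expect(
      page.getByRole("heading", { name: "Your projects" }),
    ).toBeVisible();
  });
});
```

Playwright runs test files in parallel by default, which covers the "parallelize across flows" point for free as long as each flow lives in its own file.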
4. Release Notes & Communication
Tip 4.1 — Auto-draft release notes from merged PRs
What it does: At the end of each sprint (or daily, weekly, whatever cadence), the agent reads every merged PR since the last release, groups them by theme (features, fixes, perf, refactor), and drafts release notes in two flavors: external customer-facing, and internal team changelog.

Why it wins: Release notes are the highest-leverage doc nobody writes. Customers want them, marketing wants them, support wants them. The agent assembles them for free.

Tools: GitHub API for merged PRs, your release notes destination (changelog page, customer email, Notion).

How to wire it:
1. Cron weekly Friday afternoon. Agent pulls every PR merged since last release date.
2. Groups by labels ( feature , bug , perf , chore ).
3. For each, drafts a one-line customer-readable description (skip internal refactors).
4. Assembles two drafts: external (marketing tone, highlights only) and internal (full changelog).
5. Stages both for review. On approval: publish to changelog page, email customers, post in Slack.

Example prompt to your agent: Every Friday at 3pm, generate release notes for the week. Pull every PR merged to main since last Friday. Group by labels. For each customer-facing change, draft a one-line description in marketing tone. Skip internal refactors. Build two artifacts: external changelog (features + notable fixes, no internal noise) and internal changelog (everything). Stage at /release-notes/YYYY-MM-DD/ . Ping me Friday 4pm. On approval, publish to the changelog page and Slack #releases.

Watch out for:
PR descriptions are usually engineer-speak. The agent has to translate. Force "explain like the customer never reads code."
Don't ship release notes that promote experimental flags as features. Filter on released label, not merged .
Coordinating release notes with the marketing/content stack means the customer email and the LinkedIn post are aligned. Cross-link to the marketing section's content engine.

Skill file: content-engine, email-followups
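A sketch of the Friday pull using Octokit's search API. The repo slug and label names are placeholders; note the external notes filter on released, per the watch-out above.

```ts
// Pull merged PRs since the last release and bucket them by label.
// Repo slug and label conventions are placeholders for your own.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function weeklyNotes(sinceISO: string) {
  const { data } = await octokit.rest.search.issuesAndPullRequests({
    q: `repo:your-org/your-repo is:pr is:merged merged:>=${sinceISO}`,
    per_page: 100,
  });

  const internal: Record<string, string[]> = {
    feature: [], bug: [], perf: [], chore: [],
  };
  const external: string[] = [];

  for (const pr of data.items) {
    const labels = pr.labels.map((l) => l.name ?? "");
    const group =
      Object.keys(internal).find((g) => labels.includes(g)) ?? "chore";

    // Internal changelog gets everything.
    internal[group].push(`- ${pr.title} (#${pr.number})`);

    // External notes: only changes actually released, never mere merges.
    if (group !== "chore" && labels.includes("released")) {
      external.push(`- ${pr.title}`); // agent rewrites in customer language
    }
  }
  return { internal, external };
}
```

The raw titles are still engineer-speak; the rewrite pass over the external bucket is where "explain like the customer never reads code" gets enforced.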
Tip 4.2 — Status page / postmortem drafter
What it does: When an incident happens (CI broken, prod down, deploy failed), the agent drafts the status page update and, after resolution, the postmortem doc. You edit and publish.

Why it wins: During an incident is the worst time to write good prose. The agent has the templates and the facts (logs, timeline, who touched what) and produces a clean draft so the humans can focus on the fix.

Tools: Your monitoring (Sentry, Datadog, custom), your status page (Statuspage, Instatus, or a custom MDX page), your git history.

How to wire it:
1. Monitor triggers fire to the agent.
2. Agent confirms severity (read recent alerts, count user reports), drafts a status page update with severity tag and a clean one-paragraph customer-facing description.
3. On resolution, the agent assembles the postmortem template: timeline, root cause, what we fixed, what we'll prevent.
4. You polish, publish.

Example prompt to your agent: When Datadog fires a severity: critical or severity: high alert: read the alert, the last 30 min of logs, and any new bug reports in the last 15 min. Draft a status page update at the matching severity with a customer-readable description. Ping me on Telegram with the draft and a one-tap publish button. On resolution (alert clears for 30+ min), assemble the postmortem at postmortems/YYYY-MM-DD/ using the template: timeline, root cause, immediate fix, follow-up actions.

Watch out for:
Don't auto-publish status pages. The wrong update is worse than a late update.
Postmortems should be blameless. Have the agent strip out names of specific engineers.
Keep customer-facing language concrete: "checkout was failing for new EU customers" beats "service degradation."

Skill file: security-audit, email-followups
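The postmortem template reduces to a tiny renderer. The Incident shape is hypothetical; the four sections come straight from the recipe's template.

```ts
// Assemble the postmortem doc from structured incident facts.
// The Incident shape is an illustrative convention, not a standard.
interface Incident {
  date: string;                             // YYYY-MM-DD
  timeline: { at: string; what: string }[]; // events, not engineers' names
  rootCause: string;
  immediateFix: string;
  followUps: string[];
}

export function renderPostmortem(i: Incident): string {
  return [
    `# Postmortem ${i.date}`,
    "",
    "## Timeline",
    // Blameless by construction: the timeline records what happened,
    // never who did it.
    ...i.timeline.map((t) => `- ${t.at}: ${t.what}`),
    "",
    "## Root cause",
    i.rootCause,
    "",
    "## Immediate fix",
    i.immediateFix,
    "",
    "## Follow-up actions",
    ...i.followUps.map((f) => `- [ ] ${f}`),
  ].join("\n");
}
```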
How it all stacks
The whole engineering pipeline is one loop: idea → spec → branch → PR → tests → merge → release → bugs → back to start. Agents instrument every stage, and the multiplier compounds: a feature shipped today comes with its own tests, gets reviewed in 60 seconds, gets a release note auto-drafted on Friday, and any bug that comes in next week gets repro'd and fix-drafted before a human reads it. Install order:
PR review automation (Tip 1.2). Lowest-risk install, immediate quality lift on every PR.
Auto-test generation (Tip 3.1). Pairs with PR review. Coverage starts growing the day you turn it on.
Bug triage (Tip 2.1). The day inbound bug volume gets messy, install this.
Vibe-code loop (Tip 1.1). Once the safety net (review + tests) is in, let the agent ship features.
Release notes (Tip 4.1). Cheap to install, makes everyone downstream love you.
Auto-repro + fix (Tip 2.2). Once the agent has demonstrated it doesn't break things, give it small bugs to fix end-to-end.
E2E browser tests (Tip 3.2). Higher setup cost, very high return on a maturing product.
Postmortems (Tip 4.2). Install before you need it.

If you're a non-engineer founder: install 1.2, 2.1, 4.1 first. Those work without you needing to read code. Add 1.1 when you have a clear feature you want shipped and you trust the agent to scaffold the safety rails.