Agentic Coding Guide

Testing & QA

Verification feedback — unit tests, E2E tests, and agentic QA.

Tests are the agent's way of proving its own work. Without them, you're the verification step — and the loop only moves as fast as you do.

Test behavior, not implementation

LLM-generated tests tend to mirror the implementation — testing what the code does, not what the product requires. This produces tests that pass when they should fail.

  • Write tests that describe outcomes from the user's or system's perspective
  • Avoid asserting on internal state or implementation details — if behavior didn't change, tests shouldn't break
  • Ask the LLM "What would a QA engineer test here?" rather than "Write tests for this function"
  • Describe the contract (given these inputs, these outputs are guaranteed) and test that

Unit tests

Fast, isolated, the first feedback signal that runs in CI.

  • Test pure functions aggressively — easy to test and agents tend to generate them
  • Mock at the boundary — mock external services, not internal functions
  • One assertion per test — tests that assert many things are hard to diagnose when they fail
  • Test edge cases explicitly — agents often forget null, empty, and boundary cases
  • Name tests as specifications: it("returns empty array when user has no orders"), not it("works")

Coverage: high on business logic, lower on glue code. Use mutation testing (Stryker, mutmut) to verify tests actually catch bugs.
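For Stryker, a minimal config might look like the sketch below; the glob, test runner, and reporter choices are assumptions to adapt to your project:

```json
{
  "mutate": ["src/**/*.ts"],
  "testRunner": "vitest",
  "reporters": ["clear-text", "html"],
  "coverageAnalysis": "perTest"
}
```

A surviving mutant (a deliberate bug no test caught) points at exactly the assertion you're missing.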

E2E tests

Slower and more brittle, but catch a different class of bugs — full system integration.

  • Test critical paths only in CI — login, checkout, core workflows
  • Test from the user's perspective — click buttons, fill forms, read outcomes
  • Stable selectors — use data-testid, not CSS classes or XPath
  • Idempotent setup — each test sets up and tears down its own state
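These practices combine into a short Playwright sketch; the routes, test ids, and reset endpoint are assumptions for illustration, not a real app:

```typescript
import { test, expect } from "@playwright/test";

// Idempotent setup: each test resets its own state via a test-only API,
// so tests can run in any order or in parallel.
test.beforeEach(async ({ request }) => {
  await request.post("/api/test/reset-cart"); // hypothetical endpoint
});

// Critical path only: the checkout flow, driven from the user's perspective.
test("checkout shows the order confirmation", async ({ page }) => {
  await page.goto("/cart");
  // Stable selectors: data-testid survives CSS and markup refactors.
  await page.getByTestId("checkout-button").click();
  await page.getByTestId("card-number").fill("4242 4242 4242 4242");
  await page.getByTestId("pay-button").click();
  await expect(page.getByTestId("order-confirmation")).toBeVisible();
});
```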

Tools: Playwright (recommended), Cypress, Selenium

Agentic QA

Use an agent to QA features the way a human tester would — browse the UI, interact with it, check for regressions, inspect network traffic, and report findings.

Tools:

  • agent-browser CLI — headless browser the agent controls directly
  • Playwright MCP — full browser automation via MCP tool calls
  • Chrome DevTools MCP — console errors, network requests, performance profiles, DOM state
  • Maestro (mobile / React Native) — declarative YAML flows at the native view-hierarchy level; agents can author and iterate on flows directly

An agent QA workflow: open the feature, walk through the user flow, take screenshots at key steps, check console for errors, inspect network requests, report findings with specific failure details.
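That workflow can be handed to the agent as a single prompt; the feature, URL, and coupon code below are placeholders:

```text
QA the new coupon-code feature at http://localhost:3000/checkout:
1. Open the page and walk through the purchase flow as a new user.
2. Take a screenshot at each step.
3. Apply the coupon "SAVE10" and confirm the total updates.
4. Check the console for errors and inspect any failed network requests.
5. Report each issue with the step, expected vs. actual behavior,
   and the relevant screenshot or request.
```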

Where this beats traditional E2E scripts:

  • No brittle selectors — the agent finds elements by label or visual context
  • Exploratory testing — give the agent a feature description and let it try to break it
  • Natural language assertions — "confirm the total updates when the quantity changes" is the test
  • Debugging included — when something fails, the agent explains why

Agentic QA is the difference between "the tests pass" and "a human-equivalent tester tried to use it and here's what they found."
