How We Guide Agentic Development, From Product Idea to Merged Code
Agentic development is everywhere right now. Open any feed, and someone is talking about how an agent can write a feature, fix a bug, or ship a whole app in an afternoon. But how many people have actually done it in real life?
The hard part of development has never been generating code. It is ensuring that what gets built is what the client’s business actually needs and what their end users expect, not a literal reading of a request. This is a product management problem, and it is where most agentic development workflows quietly fall apart.
This post is the second in our series about building with AI. The first covered the governed AI tooling we built and why we built it ourselves. This post covers how we run delivery with our AI tools: how an idea becomes a verified plan, how that plan becomes working code, and how we keep humans firmly in control. The short version of our thesis: the answer is not a better model. It is a better plan, plus a person who owns the outcome.
The Lesson: Context Is the Bottleneck, Not Capability
When you can build anything quickly, the real challenge is understanding the context of the problem. We have learned this from two directions at once: from coding with AI ourselves, and from working with clients on what they were really trying to achieve.
Modern models are good. When an agent knows what to build and where to find the context, the output is strong and fast. The trouble starts in the gaps. When the instructions are thin, the agent fills the space with assumptions, and those assumptions are rarely the ones the business would have chosen. The devil is in the details, and the details are exactly what a one-line prompt leaves out.
What we’re finding is that the work has moved upstream. The challenge is not in how the agent writes code. It is in how completely we can define what it should build before it writes a single line.
From Client Ask to What the Client Actually Needs
A product manager’s job is not to build what the client asked for. It is to start from that ask and ensure the deliverable meets the client’s needs and their end users’ expectations. That distinction underpins our whole approach.
We treat the start of every engagement as product management, and we have built agentic AI tooling that makes it a repeatable part of our daily operations rather than an improvised first step. We work with the client to capture the business intent and the end-user outcome behind the request; our agentic AI tooling drafts that into a structured, written plan; and we review and correct the draft before it goes anywhere. The plan sets out:
- The objectives and the outcome the client is really after
- What is in scope, and what is explicitly out of it
- The design decisions that matter
- A breakdown into epics and stories an agent can act on
The AI does the drafting, but the judgment about whether the draft is right stays with us. The resulting plan is detailed enough for an agent to act on and explicit enough for our team and the client to recognize the intent.
The plan is a living artifact, kept under version control alongside the code so it stays current as the work evolves. It is not a slide deck that ages the moment the meeting ends. Most importantly, we verify it with the client before any code is written. Agreement on what “done” means comes first, not last. Because it is structured, that same plan becomes the single source the rest of our workflow builds on.
The Handoff: A Plan Becomes a Tracked Backlog
From there, the same agentic AI tooling reads the approved plan and creates work items directly in our issue tracker: epics broken into stories, with acceptance criteria, inter-story dependencies, and a starting priority. This is the agentic workflow we built and run every day, not manual glue we reinvent for each project. The backlog lives on the same board the team runs delivery on, so planning and execution share one source of truth instead of drifting into separate documents.
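To make the handoff concrete, here is a minimal Python sketch of turning a structured plan into a proposed backlog. It is an illustration under stated assumptions, not our production tooling: the `Epic`, `Story`, and `WorkItem` types, the numeric priority scale (1 is highest), and the `plan_to_backlog` function are all hypothetical, and a real integration would write to an issue tracker rather than return a list. It shows two properties the handoff relies on: dependencies between stories are checked against the plan, and every generated item starts as a proposal rather than as executable work.

```python
from dataclasses import dataclass, field
import heapq

@dataclass
class Story:
    key: str                          # hypothetical story identifier, e.g. "AUTH-2"
    title: str
    acceptance_criteria: list[str]
    depends_on: list[str] = field(default_factory=list)
    priority: int = 3                 # 1 = highest; a starting value humans may change

@dataclass
class Epic:
    key: str
    title: str
    stories: list[Story]

@dataclass
class WorkItem:
    epic: str
    story: Story
    status: str = "proposed"          # nothing is executable until a human approves it

def plan_to_backlog(epics: list[Epic]) -> list[WorkItem]:
    """Flatten epics into work items, dependencies first, ties broken by priority."""
    items = {s.key: WorkItem(epic=e.key, story=s) for e in epics for s in e.stories}
    # Reject dependencies on stories the plan does not contain.
    for item in items.values():
        missing = [d for d in item.story.depends_on if d not in items]
        if missing:
            raise ValueError(f"{item.story.key} depends on unknown stories: {missing}")
    # Kahn's topological sort, using a heap so ready items come out by priority.
    remaining = {k: set(it.story.depends_on) for k, it in items.items()}
    dependents: dict[str, list[str]] = {k: [] for k in items}
    for k, deps in remaining.items():
        for d in deps:
            dependents[d].append(k)
    ready = [(items[k].story.priority, k) for k, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    ordered: list[WorkItem] = []
    while ready:
        _, key = heapq.heappop(ready)
        ordered.append(items[key])
        for child in dependents[key]:
            remaining[child].discard(key)
            if not remaining[child]:
                heapq.heappush(ready, (items[child].story.priority, child))
    if len(ordered) != len(items):
        raise ValueError("dependency cycle in plan")
    return ordered

# Usage with a tiny, made-up plan.
plan = [
    Epic("AUTH", "Sign-in", [
        Story("AUTH-1", "Login form", ["Valid credentials reach the dashboard"]),
        Story("AUTH-2", "Password reset", ["Reset email is sent"],
              depends_on=["AUTH-1"], priority=2),
    ]),
    Epic("PROF", "Profile", [
        Story("PROF-1", "Edit profile", ["Changes persist"],
              depends_on=["AUTH-1"], priority=1),
    ]),
]
print([it.story.key for it in plan_to_backlog(plan)])
# ['AUTH-1', 'PROF-1', 'AUTH-2']
```

Ordering by dependencies first and priority second is only one reasonable default; the human review of the board is where that starting order gets corrected.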
We deliberately keep the language agile: epics, stories, acceptance criteria, a definition of done. These are not new concepts invented for AI. They are the shared vocabulary engineering teams already use, now carrying enough detail and context that an agent can act on them reliably.
Our agentic AI tooling generates the backlog, but it generates it as a proposal. A human reviews it before anything is executed. Before work starts, we go through the board ourselves:
- Re-scope a story that is too broad, or split and merge items
- Correct a priority or fix a dependency the plan missed
- Assign the work to specific team members
People decide what actually gets built and in what order. Nothing moves from planning to execution without that review.
During delivery, the stories on the board are what drive the agents. An agent picks up a story with its context attached, does the work, and the item’s state moves with it as the change goes through review. When something surfaces that is not in the plan (a regression, an edge case, or a defect caught in review), it is captured back into the tracker as a new item and followed like any other, rather than fixed silently off the board. This is also where the lesson from the first post pays off: in areas where AI is a reliable accelerator, we let an agent carry more of a story, and in areas where it is weaker, such as infrastructure-as-code and other narrow, high-precision work, we scope the items tighter and keep a heavier human hand.
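The item lifecycle described above can be sketched as a small state machine. This is a hypothetical model, not our tracker’s actual workflow: the state names, the `TRANSITIONS` table, and the in-memory `Board` class are assumptions for illustration. What it encodes is that nothing reaches execution without passing a human approval state, and that anything found mid-story becomes its own tracked item, linked back to where it was found.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lifecycle; a real tracker's workflow states will differ.
TRANSITIONS = {
    "proposed": {"approved"},               # human review gate before any execution
    "approved": {"in_progress"},            # an agent (or person) picks it up
    "in_progress": {"in_review"},
    "in_review": {"in_progress", "done"},   # review can send work back
    "done": set(),
}

@dataclass
class Item:
    key: str
    title: str
    status: str = "proposed"
    found_during: Optional[str] = None      # set when captured during another item's work

class Board:
    """A minimal in-memory stand-in for an issue tracker (hypothetical)."""

    def __init__(self) -> None:
        self.items: dict[str, Item] = {}

    def add(self, title: str, found_during: Optional[str] = None) -> Item:
        key = f"ITEM-{len(self.items) + 1}"
        item = Item(key, title, found_during=found_during)
        self.items[key] = item
        return item

    def move(self, key: str, new_status: str) -> None:
        item = self.items[key]
        if new_status not in TRANSITIONS[item.status]:
            raise ValueError(f"{key}: cannot move {item.status} -> {new_status}")
        item.status = new_status

    def capture(self, parent_key: str, title: str) -> Item:
        """Record a regression, edge case, or review defect as a new tracked
        item instead of fixing it silently off the board."""
        return self.add(title, found_during=parent_key)

board = Board()
story = board.add("Password reset email")
board.move(story.key, "approved")      # human sign-off
board.move(story.key, "in_progress")   # agent picks up the story
board.move(story.key, "in_review")
bug = board.capture(story.key, "Reset link expires too early")
board.move(story.key, "done")
print(story.status, bug.key, bug.status, bug.found_during)
# done ITEM-2 proposed ITEM-1
```

Note that the captured defect starts as `proposed`, so it goes through the same human review as anything else on the board.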
The effect is that the agent is never guessing at scope. It is working on a well-formed story with the background it needs, the same way a well-briefed developer would.
This is also why a comprehensive plan does not mean a giant prompt. The details live in the structure, spread across well-scoped stories, so each unit of work carries exactly the context it needs and nothing it does not.
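One way to picture “exactly the context it needs” is a brief assembled per story from only the plan sections that story references. The sketch below assumes a hypothetical `context_refs` field on each story and a plan stored as named sections; `build_brief`, the section names, and the sample content are illustrative only and not part of any real tool.

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    key: str
    title: str
    acceptance_criteria: list[str]
    # Hypothetical: names of the plan sections this story depends on.
    context_refs: list[str] = field(default_factory=list)

def build_brief(story: Story, plan_sections: dict[str, str]) -> str:
    """Assemble the context for one story from only the plan sections it references."""
    missing = [r for r in story.context_refs if r not in plan_sections]
    if missing:
        raise ValueError(f"{story.key} references unknown plan sections: {missing}")
    lines = [f"Story {story.key}: {story.title}", "", "Acceptance criteria:"]
    lines += [f"- {c}" for c in story.acceptance_criteria]
    for ref in story.context_refs:
        lines += ["", f"## {ref}", plan_sections[ref]]
    return "\n".join(lines)

# Made-up plan sections for illustration.
plan_sections = {
    "scope": "In scope: email-based password reset. Out of scope: SMS.",
    "design/auth": "Tokens are single-use and expire after 30 minutes.",
    "design/billing": "Invoices are generated nightly.",
}
story = Story("AUTH-2", "Password reset", ["Reset email is sent"],
              ["scope", "design/auth"])
brief = build_brief(story, plan_sections)
print("billing" in brief)  # False: unrelated design notes are left out
```

The plan as a whole can be long; each brief stays short because it only pulls the sections its story names.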
Human in the Middle, by Design
We do not let agents run end-to-end. A person shapes the work, adjusts the instructions as it proceeds, and owns the gate that confirms the deliverable is what the client needs and what we are willing to stand behind.
This is the same discipline we described in the first post. Validation runs before anything ships, and confidentiality checks run automatically with no bypass. Agentic output enters the same quality gates as any production code. Changes that don’t meet the bar don’t merge.
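A merge gate of this kind reduces to a simple rule: run every check, and merge only if all of them pass. The sketch below assumes three hypothetical checks (`validation`, `confidentiality`, `human_review`) over a toy `Change` record; the confidentiality check is a crude string match standing in for a real scanner. `can_merge` deliberately has no parameter for skipping a gate, which is the “no bypass” property expressed in code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Change:
    diff: str
    tests_passed: bool
    reviewed_by_human: bool

Check = Callable[[Change], bool]

def validation(change: Change) -> bool:
    return change.tests_passed

def confidentiality(change: Change) -> bool:
    # Hypothetical stand-in: a real check would scan for secrets and client data.
    banned = ("BEGIN PRIVATE KEY", "client_secret")
    return not any(marker in change.diff for marker in banned)

def human_review(change: Change) -> bool:
    return change.reviewed_by_human

GATES: tuple[tuple[str, Check], ...] = (
    ("validation", validation),
    ("confidentiality", confidentiality),
    ("human review", human_review),
)

def can_merge(change: Change) -> tuple[bool, list[str]]:
    """Run every gate; the change merges only if all pass. Agent-written and
    human-written changes go through the same gates, with no skip option."""
    failed = [name for name, check in GATES if not check(change)]
    return (not failed, failed)

ok, failed = can_merge(Change("+ client_secret = 'abc'", tests_passed=True,
                              reviewed_by_human=True))
print(ok, failed)  # False ['confidentiality']
```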
This review discipline has its own complexities at the developer level, which our engineering team covers in a separate post on how to code more effectively with coding agents.
The reason we built this on agile rather than on a new method is adoption. Developers already work this way, so we improved the process they know, step by step, instead of asking them to learn a different one. That is what makes the change stick.
The difference between demos and delivery is human ownership. A demo ends when the code runs. A delivery ends when the right thing has been built, reviewed, and shipped.
How the Team Has Changed
The shape of a delivery team used to be fairly fixed: a product manager working with the client, a project manager driving the development team, a senior architect owning technical direction, and two to five developers writing the code.
That shape has not been replaced. It has matured. The same roles are present, but each role’s boundary moves up. Developers take on more product-manager thinking, because writing a good story and judging whether output meets intent are now part of the job. Product managers take on more of the project-management ownership. Project managers, in turn, lead delivery, coordinate the work, and keep the engagement on track.
The senior architect matters as much as ever, and arguably more. Someone has to ensure that the plans and designs reflect the right technical choices, not in the abstract, but against the client’s existing stack, constraints, and direction of travel. An agent will happily propose a clean solution that ignores the system it has to live in. The judgment stays human.
The pattern across all of this is the same. The roles that are growing in value are the ones facing the client and owning judgment. The work that compresses is the mechanical middle. A smaller, more senior team now covers ground that used to need a larger one.
Why This Is Hard to Fake
It is easy to write about how agentic development should work. The feeds are full of it. It is much harder to point to a way of working that has survived contact with real engagements.
We built the tooling, ran it on live client work, found where it broke, and fixed both the tools and the process. The planning discipline, the backlog structure, the human gates, and the architecture review all exist because we needed them, not because they made a good post. This is not how we think it should be done; it is how we do it today.
The Best Way Today, and We Keep Moving
For a CTO or CIO, this is a repeatable way to compress delivery without losing control of quality or direction. For any executive weighing the investment, the takeaway is simpler. AI changes the economics of building software, but only when the plan is right and a human owns the judgment.
We are also clear-eyed about one thing. This is the best way to run agentic development with today’s models and today’s capabilities, but these are improving constantly. As we covered in the first post, our system improves through a learnings loop, so as the models advance, we update our tooling and our process to match. We do not expect this to be the final shape of the work. We expect to keep moving, and to keep using the best approach available.
If you are working through these decisions and want to talk to a team that already runs this in production, we are easy to reach.