Building Internal AI Tools

Most organizations adopting AI start with whatever tools are available and figure out governance later. We chose a different path. At A-CX, we spent considerable time building internal AI tools from the ground up, and the process forced us to rethink not just what we use, but how we work.

This post covers the decisions we made, the system we built, and what we have learned about where AI genuinely increases productivity and where it does not. If you are a technical leader seeking a blueprint, you will find concrete guidance here. If you are an executive evaluating whether AI tooling is worth the investment, you will find an honest account of what it takes and what you get.

The Control Problem with Public Plugins

The most common starting point for AI tooling is to install public or community plugins. The appeal is obvious: they are ready-made, well-documented, and fast to adopt. The problem is that, without a deliberate review step, you do not know what is in a public plugin at the point of adoption. More critically, you have no control over what changes in it after an update. Every automatic update is a change you inherit without a review step, a diff, or the ability to roll back.

For a consultancy handling client work, this is not a theoretical concern. The AI tools we use touch conversations that contain client context, commercial information, and strategic thinking. An unverified plugin update that changes outputs or introduces data handling we never agreed to is an unacceptable risk. The decision to build our own tooling was a governance decision first, and a productivity decision second.

The Verified Exception Path

We did not conclude that everything needed to be built from scratch. Some specialized tooling is better sourced than built, and building everything in-house is its own form of inefficiency.

Our answer was a defined verification process. Before any third-party plugin enters the system, it goes through a structured review:

We confirm the source is auditable and reputable.
We review the plugin’s full content before adoption.
We pin the version, so updates require an explicit decision rather than automatic inheritance.
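As an illustration only, the version pin can be thought of as a record of exactly what was reviewed, plus a check that refuses anything else. The Python sketch below assumes a pin that stores a version string and a SHA-256 digest of the reviewed content. The class, field, and function names are hypothetical and do not describe the actual A-CX tooling.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedPlugin:
    """One approved third-party plugin (all field names are hypothetical)."""
    name: str
    source: str    # where the reviewed copy came from
    version: str   # the exact version that was reviewed
    sha256: str    # digest of the content that was reviewed


def content_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the plugin content."""
    return hashlib.sha256(content).hexdigest()


def is_approved(pin: PinnedPlugin, version: str, content: bytes) -> bool:
    """Accept a plugin only if it matches the reviewed version and content."""
    return version == pin.version and content_digest(content) == pin.sha256


# Example data, invented for illustration.
reviewed = b"name: example-plugin\ninstructions: summarize the meeting notes\n"
pin = PinnedPlugin(
    name="example-plugin",
    source="https://example.com/example-plugin",
    version="1.2.0",
    sha256=content_digest(reviewed),
)
```

With a check like this, a new upstream release or a silently changed file fails the comparison, so adopting it requires someone to review the change and update the pin on purpose.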

We drew on published best practices in this space before finalizing our criteria.

The result is a system that is largely internal and fully controlled, with a defined, narrow path for verified external additions. That path has gates. Nothing gets through without a review.

Figure: AI plugin verification flow (source audit, content review, version pin, and approval gate for internal AI tooling).

Quality Gates and Validation

A system is only as reliable as the discipline behind what goes into it. Most DIY AI tooling efforts focus on building content rather than validating it. That is where quality degrades over time.

We built validation tooling that runs before anything ships. It catches structural errors in skills, enforces content standards, and flags issues that would cause silent failures in production. The validation step runs during the commit process. A skill that fails validation does not reach the live system.
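To make the idea concrete, here is a minimal sketch of a structural check that a commit hook could run. It assumes a skill is a text file with a short `key: value` header, a `---` separator line, and a body of instructions. That file layout, the required fields, and the length limit are assumptions made for this example, not a description of our actual skill format.

```python
REQUIRED_FIELDS = ("name", "description")  # assumed schema, for illustration
MAX_DESCRIPTION_CHARS = 200                # assumed content standard


def validate_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means the skill passes."""
    header, separator, body = text.partition("\n---\n")
    if not separator:
        return ["missing '---' separator between header and body"]

    # Parse "key: value" lines from the header.
    fields = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            fields[key.strip()] = value.strip()

    errors = []
    for field in REQUIRED_FIELDS:
        if not fields.get(field):
            errors.append(f"missing required field: {field}")
    if len(fields.get("description", "")) > MAX_DESCRIPTION_CHARS:
        errors.append("description exceeds the length limit")
    if not body.strip():
        errors.append("skill body is empty")
    return errors


good_skill = (
    "name: meeting-capture\n"
    "description: Capture decisions and action items from a meeting.\n"
    "---\n"
    "List each decision, its owner, and its due date.\n"
)
broken_skill = "name: meeting-capture\n---\n"
```

A hook built on this would call `validate_skill` on every changed skill file and exit with a non-zero status if any list of errors is non-empty, which is what stops the commit.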

For engineers, this means the same discipline that governs production code governs the AI tooling: a pull request that breaks the build does not merge. For non-technical stakeholders, it means the system has a quality control step that most AI tooling implementations skip entirely.

Security and Confidentiality Posture

AI tools that touch client conversations need explicit confidentiality design, not an afterthought. For A-CX, this was non-negotiable.

We built confidentiality controls into the tooling at the content level. Certain patterns trigger a gate that requires explicit human confirmation before any output leaves a session. This is not a policy document sitting somewhere waiting to be consulted. The check runs automatically, every time, with no bypass.
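A simplified sketch of such a gate follows. The trigger patterns are invented for the example, and the `confirm` callback stands in for whatever mechanism asks a human to approve. Real patterns would be specific to the organization and its clients.

```python
import re
from typing import Callable, Optional

# Hypothetical trigger patterns; real ones would be organization-specific.
TRIGGERS = [
    re.compile(r"\bconfidential\b", re.IGNORECASE),
    re.compile(r"\bNDA\b"),
    re.compile(r"[$€£]\s?\d"),  # a currency symbol followed by a digit
]


def matched_triggers(text: str) -> list[str]:
    """Return the patterns that match the text, in declaration order."""
    return [p.pattern for p in TRIGGERS if p.search(text)]


def release(text: str, confirm: Callable[[list[str]], bool]) -> Optional[str]:
    """Return the text only if nothing triggers or a human confirms.

    The check runs on every call; there is no flag to skip it.
    """
    hits = matched_triggers(text)
    if hits and not confirm(hits):
        return None  # output is held back
    return text
```

The design choice worth noting is that `release` is the only way out of the session in this sketch, and it takes no option to disable the scan. Confirmation is an explicit human answer, not a default.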

Beyond automated checks, we defined clear rules about what information can appear in an AI-assisted output and what requires human review. These rules are part of the tooling itself, not separate documentation that team members might or might not read.

Designed for the Whole Team

One of our core design goals was that the tooling should work for everyone on the team, not just the people who built it. Operations staff should be able to use it without understanding how it works. Developers should get what they need without configuration overhead at the start of each project.

We achieved this by treating operations use cases and engineering use cases as distinct surfaces with different distribution models. Operations tools cover work that spans the whole team: meeting capture, document creation, client briefs, proposals, and administrative workflows. These are available to everyone at the organization level. They just work.

Engineering tools cover developer workflows: code review, testing discipline, architecture decision documentation, and security checks. These are distributed at the repository level, scoped to the work that happens in each codebase. A developer in a Java repository has a different active toolset than one working in a C# repository. Neither has to configure anything. The right tools are present when they open the project.
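One way to picture the two surfaces is as a lookup: everyone gets the organization-level set, and each repository adds the set that matches its stack. The sketch below detects the stack from well-known build files. The skill names and the detection rules are hypothetical and only illustrate the idea.

```python
from pathlib import PurePath

# Hypothetical skill names, for illustration only.
ORG_SKILLS = {"meeting-capture", "client-brief", "proposal"}
REPO_SKILLS = {
    "java": {"code-review", "adr", "java-testing"},
    "csharp": {"code-review", "adr", "dotnet-testing"},
}

# Marker files used to guess a repository's stack.
NAME_MARKERS = {"pom.xml": "java", "build.gradle": "java"}
SUFFIX_MARKERS = {".csproj": "csharp", ".sln": "csharp"}


def detect_stacks(filenames: list[str]) -> set[str]:
    """Guess the stacks present from file names at the repository root."""
    stacks = set()
    for filename in filenames:
        path = PurePath(filename)
        if path.name in NAME_MARKERS:
            stacks.add(NAME_MARKERS[path.name])
        if path.suffix in SUFFIX_MARKERS:
            stacks.add(SUFFIX_MARKERS[path.suffix])
    return stacks


def active_skills(filenames: list[str]) -> set[str]:
    """Combine organization-level skills with the repository-scoped ones."""
    skills = set(ORG_SKILLS)
    for stack in detect_stacks(filenames):
        skills |= REPO_SKILLS[stack]
    return skills
```

In practice the repository-level set can simply live in the repository itself, which makes detection unnecessary. The lookup form is used here because it shows the relationship between the two surfaces in a few lines.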

Figure: Two-surface AI tooling distribution (operations skills at org level, engineering skills scoped per repository).

A System That Improves Itself

The most consequential design decision we made was building a feedback loop from the start. A one-time build degrades. A system with a defined improvement mechanism gets better.

Every project where we use the tooling generates learnings. We built a dedicated skill that captures what worked, what did not, and what surprised us. Those observations feed directly back into the system. When a skill consistently underperforms, that shows up in the learning. When a new pattern proves reliable, it gets codified.
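As a rough sketch of how captured learnings can surface an underperforming skill, the example below records one observation per use and flags skills whose success rate stays low across several reports. The record shape and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Learning:
    """One observation from project work (hypothetical record shape)."""
    skill: str
    worked: bool
    note: str


def underperforming(learnings, min_reports=3, max_success_rate=0.5):
    """Return skills with enough reports and a success rate below the limit."""
    outcomes_by_skill = defaultdict(list)
    for item in learnings:
        outcomes_by_skill[item.skill].append(item.worked)

    flagged = []
    for skill, outcomes in outcomes_by_skill.items():
        success_rate = sum(outcomes) / len(outcomes)
        if len(outcomes) >= min_reports and success_rate < max_success_rate:
            flagged.append(skill)
    return sorted(flagged)


# Example data, invented for illustration.
learnings = [
    Learning("client-brief", False, "missed the commercial context"),
    Learning("client-brief", False, "tone too formal"),
    Learning("client-brief", True, "good structure"),
    Learning("client-brief", False, "too long"),
    Learning("code-review", True, "caught a null check"),
    Learning("code-review", True, "useful naming feedback"),
    Learning("code-review", True, "flagged a missing test"),
    Learning("proposal", False, "single report, not enough to judge"),
]
```

The minimum report count matters: a single bad outcome is a note, not a trend, so it should not trigger a rewrite of the skill.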

The skill-creation skill itself is a direct product of this loop. It encodes what we have learned about writing effective AI instructions: what makes a skill reliable versus brittle, and what content belongs in a skill versus a reference document. New skills are built against that standard because the standard is maintained inside the system. It evolves as we learn.

Figure: Feedback loop for AI tooling (project work, learnings capture, skill updates, better tools).

Rethinking the Process, Not Just the Tools

We did not build AI tools for our existing development process. We reviewed the process itself and updated it for what AI makes possible.

Architecture Decision Records are a good example. Documenting architectural decisions is important. It is also consistently skipped under time pressure, which means decisions go undocumented, and the reasoning behind them is lost. We built ADR creation into the engineering tooling so that the record is produced as part of the workflow, not as a follow-up task that competes with the next sprint. Developers do not spend time on it. It happens.
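For readers who have not worked with ADRs, the sketch below renders one using the widely used sections of status, context, decision, and consequences. It only shows the shape of the artifact. In the workflow described above the content would come from the AI-assisted session, and the function names and file naming scheme here are hypothetical.

```python
import re
from datetime import date


def adr_filename(number: int, title: str) -> str:
    """Build a sortable file name such as '0007-some-title.md'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{number:04d}-{slug}.md"


def render_adr(number: int, title: str, decided_on: date, context: str,
               decision: str, consequences: str,
               status: str = "Proposed") -> str:
    """Render an ADR as plain text with the conventional sections."""
    return (
        f"ADR {number:04d}: {title}\n"
        f"Date: {decided_on.isoformat()}\n"
        f"Status: {status}\n\n"
        f"Context\n{context}\n\n"
        f"Decision\n{decision}\n\n"
        f"Consequences\n{consequences}\n"
    )


# Example content, invented for illustration.
example = render_adr(
    number=7,
    title="Use PostgreSQL for the order service",
    decided_on=date(2024, 1, 15),
    context="The order service needs transactional writes.",
    decision="Store orders in PostgreSQL.",
    consequences="The team operates one more database.",
)
```

Numbered file names keep the records in decision order, and a default status of "Proposed" leaves the final acceptance with a human reviewer.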

Code review, test coverage analysis, security scanning, and naming consistency checks all benefit from AI assistance that compounds over time. The discipline is higher, and the overhead is lower. Less time on process artifacts. More time on the work that actually requires a human.

AI-era tooling is not a layer you put on top of your current process. It is an opportunity to remove inefficient work and make high-value work faster and more consistent.

Where AI Helps Most and Where It Does Not

We have run this tooling across different parts of our work, and the results are not uniform. That is worth saying directly, because the honest version of this story is more useful than a uniformly positive one.

In frontend development and Java, productivity has increased significantly. These environments have established conventions, large training datasets, and pattern-heavy tasks that AI handles well. The quality of AI assistance in these areas consistently justifies the overhead of maintaining the tooling.

In C++ development within specialized audio frameworks and in infrastructure-as-code work, the picture is more limited. The tooling provides value, but less of it. These environments have narrower training data, tighter precision requirements, and enough domain specificity that AI suggestions need more careful review before use. In these areas, AI is a useful assistant rather than a reliable accelerator.

Knowing where AI helps most lets you invest the tooling effort where it returns the most. It also sets honest expectations for your team, which matters more than any productivity claim.

What This Means for Your Organization

The decisions we made are not unique to A-CX. Any organization adopting AI at team or company scale is making these same decisions, whether deliberately or not. The plugin you install without a review process is still a decision. The absence of a feedback loop is still a design choice. The gap between your operations tools and your engineering tools either gets addressed or becomes a source of inconsistency over time.

Building this for ourselves taught us what the hard problems actually are, where the standard advice falls short, and what the governance layer needs to look like for a team that uses AI on real client work. That experience is the foundation of how we help other organizations build theirs.

If you are working through these decisions and want to talk to a team that has already made the mistakes, we are easy to reach.

Ilpo Niva

Ilpo, Co-Founder and Chief AI Officer of A-CX, is a seasoned product creation executive with over 20 years of experience in innovation, strategy, and technology leadership. With a background at industry leaders like Nokia and Microsoft, Ilpo has a proven track record in product development, rapid prototyping, and operational excellence across global markets. His work emphasizes a forward-thinking approach to customer experience and organizational transformation, highlighting his expertise in driving growth and technological advancement within competitive markets.
