Carroll Groomes Holding (CGH)

AI that earns its keep

Capability is not the question. The right question is whether the payback covers the supervision cost. A working test for putting AI into real operations.

AI, Supervised automation, Operations

There is a strange asymmetry in how AI gets introduced into operational software. The pitch is usually about capability: what the model can do, how it has scored on benchmarks, what is theoretically possible. The buying decision is rarely framed around the question that actually determines whether the feature survives: does it earn its keep?

In a service business, "earning its keep" has a specific meaning. The AI has to deliver enough operational leverage to cover the cost of supervising it. If the supervision cost is higher than the value of the work it replaces, you have added a chore, not a tool.

The supervision tax

Every AI feature in operations carries supervision cost. Someone has to review what the model did before it goes out, approve templates and boundaries, spot-check outputs, catch and unwind mistakes, and update prompts and configuration when the work changes.

That cost can be small. It can also be larger than the value of the work the model handled, in which case the feature is a net negative even when it "works." The supervision tax is invisible in demos because demos do not run for six months and there is no operator on the hook for the output.
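
The supervision tax can be made concrete with a little arithmetic. The sketch below is one possible way to tally it, assuming everything is measured in operator minutes per week. The class, the field names, and the example numbers are all hypothetical and chosen only for illustration; they are not figures from a real deployment.

```python
from dataclasses import dataclass


@dataclass
class WeeklyWorkload:
    """Hypothetical weekly figures for one AI feature (times in operator minutes)."""
    items_handled: int               # items the model handles per week
    minutes_saved_per_item: float    # manual work the model replaces, per item
    review_minutes_per_item: float   # review or spot-check time, averaged per item
    error_rate: float                # fraction of items that go wrong
    unwind_minutes_per_error: float  # time to catch and unwind one mistake
    upkeep_minutes_per_week: float   # prompt, template, and configuration updates


def supervision_minutes(w: WeeklyWorkload) -> float:
    """Total weekly supervision cost: review + unwinding mistakes + upkeep."""
    review = w.items_handled * w.review_minutes_per_item
    unwind = w.items_handled * w.error_rate * w.unwind_minutes_per_error
    return review + unwind + w.upkeep_minutes_per_week


def net_minutes(w: WeeklyWorkload) -> float:
    """Work replaced minus supervision cost; negative means a net chore."""
    return w.items_handled * w.minutes_saved_per_item - supervision_minutes(w)


# Illustrative numbers only (assumptions, not measurements).
high_volume = WeeklyWorkload(
    items_handled=200, minutes_saved_per_item=3.0, review_minutes_per_item=0.5,
    error_rate=0.02, unwind_minutes_per_error=20.0, upkeep_minutes_per_week=60.0,
)
low_volume = WeeklyWorkload(
    items_handled=20, minutes_saved_per_item=3.0, review_minutes_per_item=2.0,
    error_rate=0.10, unwind_minutes_per_error=30.0, upkeep_minutes_per_week=60.0,
)

print(round(net_minutes(high_volume)))  # 360: the feature pays back
print(round(net_minutes(low_volume)))   # -100: it "works" but is a net negative
```

In the second case the model still saves three minutes on every item it touches. Full review, a higher error rate, and fixed upkeep cost more than those savings are worth.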

When AI earns its place

A useful test: would a small operating team keep running this feature for six months without being told to?

The features that pass that test share a few characteristics:

  • The work is repeatable, so the model handles enough volume to matter.
  • The work is bounded enough that mistakes are catchable and reversible.
  • The output is inspectable, so the operator can do efficient spot checks rather than full review.
  • The supervision is scoped — the operator reviews structured exceptions, not raw drafts.
  • The failure mode is non-catastrophic — the system asks a human when in doubt rather than acting and apologizing later.
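
The last two characteristics, scoped supervision and asking a human when in doubt, can be sketched as a routing step. This is a minimal illustration, assuming the model returns a category and some confidence score for each draft. `Draft`, `ExceptionRecord`, `route`, and the 0.9 threshold are hypothetical names and values, and a model-reported score is not guaranteed to be well calibrated.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Draft:
    item_id: str
    category: str      # what kind of work the model thinks this is
    confidence: float  # assumed model-reported score in [0, 1]


@dataclass
class ExceptionRecord:
    """A structured reason for review, so the operator does not reread raw drafts."""
    item_id: str
    reason: str


def route(draft: Draft, approved_categories: Set[str],
          min_confidence: float = 0.9) -> Optional[ExceptionRecord]:
    """Return None if the draft may go out, else an exception for a human."""
    if draft.category not in approved_categories:
        return ExceptionRecord(draft.item_id, "category outside approved boundaries")
    if draft.confidence < min_confidence:
        return ExceptionRecord(draft.item_id, "confidence below threshold")
    return None  # within bounds: proceed without full review


approved = {"appointment_reminder", "invoice_follow_up"}
drafts = [
    Draft("a1", "appointment_reminder", 0.97),
    Draft("a2", "invoice_follow_up", 0.62),
    Draft("a3", "refund_decision", 0.99),
]

# Only the exceptions reach the operator's queue.
review_queue = [e for e in (route(d, approved) for d in drafts) if e is not None]
for e in review_queue:
    print(e.item_id, "->", e.reason)
# a2 -> confidence below threshold
# a3 -> category outside approved boundaries
```

The design choice here is that a draft outside the approved boundaries always goes to a human, however confident the model is, so the expensive and hard-to-reverse cases never run unattended.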

Most useful AI in operations sits on the boring end of the spectrum: drafting templated messages, summarizing structured calls, categorizing inbound work, surfacing exceptions. These pay back because the volume is high and the supervision tax stays low.

When it doesn't

The reverse is also true. AI features that look impressive but do not earn their keep tend to share a few traits:

  • The work is rare enough that the supervision setup costs more than it saves.
  • The output is hard to inspect without re-doing the underlying reasoning.
  • Mistakes are expensive or hard to reverse.
  • The model is acting somewhere the operator does not have time to keep watching.

These features die quietly. The operator stops using them, then turns them off, then forgets they existed. The line item on the budget stays for a while longer.

The honest framing

The right framing for AI in operations is not "can this model do X?" — current models can do an enormous range of things. The right framing is: for this specific operational workflow, does the model add more leverage than the cost of supervising it?

That question can have different answers for the same model and the same business at different scales. A workflow that does not earn its keep at five customers a week may earn it at fifty. The shape of the work matters more than the impressiveness of the model.
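
The effect of scale can be worked through as a break-even calculation. The sketch below assumes supervision splits into a fixed weekly part (setup and upkeep) and a per-item part (review); that split, the function names, and the numbers are hypothetical simplifications for illustration.

```python
import math
from typing import Optional


def weekly_net(items: int, saved_per_item: float,
               review_per_item: float, fixed_per_week: float) -> float:
    """Net operator minutes per week at a given volume."""
    return items * (saved_per_item - review_per_item) - fixed_per_week


def break_even_items(saved_per_item: float, review_per_item: float,
                     fixed_per_week: float) -> Optional[int]:
    """Smallest weekly volume at which the feature stops losing time.

    Returns None when per-item review costs at least as much as the work
    saved, because then no volume ever pays back.
    """
    margin = saved_per_item - review_per_item
    if margin <= 0:
        return None
    return math.ceil(fixed_per_week / margin)


# Illustrative numbers only: 3 minutes saved and 1 minute of review per item,
# plus 60 minutes of fixed upkeep per week.
print(break_even_items(3.0, 1.0, 60.0))  # 30
print(weekly_net(5, 3.0, 1.0, 60.0))     # -50.0 at five customers a week
print(weekly_net(50, 3.0, 1.0, 60.0))    # 40.0 at fifty customers a week
```

Under these assumptions the same model and the same workflow lose time at five customers a week and gain it at fifty, because the fixed upkeep is spread over more items.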

Closing

We are building on the assumption that the AI features that survive in service operations will be the ones that pay back the supervision tax. That is a high bar. It rules out a lot of demo-friendly features. It also tends to rule in the ones that quietly hold up over time. That is the cohort we care about.