Human-in-the-Loop for AI Agents

Building AI agents with a human in the loop is the work that defines the next phase of enterprise software. ERP systems hold the most valuable business data in the company: customers, orders, invoices, inventory, employees, financials. AI agents that can read that data, reason about it, and act on it are the difference between an ERP that records history and an ERP that drives the business. This guide lays out a field-tested pattern for building such agents in an ERP context.

What is an AI agent

An AI agent is a software system that:

Perceives its environment (reads ERP data, receives emails, watches for events).
Reasons about what to do (uses an LLM to plan, decide, summarise).
Acts on the environment (creates records, sends emails, triggers workflows).
Learns from feedback (ratings, corrections, outcomes).

The difference between an agent and a chatbot is action. A chatbot talks; an agent does.

The agent is only as good as its tools

An LLM that can read and write Acumatica records is more useful than an LLM that can chat about Acumatica. The tools — the function calls the agent can make — are what determine the agent’s value. Spend the time designing the tools, not the prompt.

The agent architecture

The standard pattern for an enterprise AI agent has six components:

Component	What it does	Example
Perception	Reads state	Acumatica REST API for AR balance, ageing
Tool registry	Defines what the agent can do	list_overdue_invoices, send_payment_reminder
Planner	Decides what to do	LLM with a tool-using prompt
Executor	Does the work	Function call to the tool, then to the Acumatica API
Memory	Remembers what happened	Vector store of past actions; conversation history
Evaluator	Checks the work	Human feedback, automated checks, outcome tracking

PYTHON · AGENT TOOL DEFINITION

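from typing import Optional

import requests
from langchain.tools import tool

@tool
def list_overdue_invoices(customer_id: Optional[str] = None) -> list:
    """List overdue AR invoices, optionally filtered by customer."""
    url = "https://acumatica.example.com/entity/Default/24.200.001/ARInvoice"
    # Straight quotes only: a curly quote here produces an invalid OData filter.
    # Verify the status code and Today() against your own endpoint version.
    params = {"$filter": "Status eq 'O' and DueDate lt Today()"}
    if customer_id:
        # Double any single quote so the value cannot break out of the OData literal.
        safe_id = customer_id.replace("'", "''")
        params["$filter"] += f" and CustomerID eq '{safe_id}'"
    # get_acumatica_token() is assumed to be defined elsewhere and to return a bearer token.
    headers = {"Authorization": f"Bearer {get_acumatica_token()}"}
    r = requests.get(url, params=params, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()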

@tool
def send_payment_reminder(invoice_refnbr: str, customer_email: str) -> str:
    """Send a payment reminder email for a specific invoice."""
    # ... send via your transactional email provider
    return f"Reminder sent to {customer_email} for {invoice_refnbr}"
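
To show how tools like these plug into the architecture, here is a minimal sketch of the loop that connects the tool registry, planner, executor and memory. The Agent class, fake_planner and fake_list_overdue_invoices are hypothetical stand-ins written for this illustration: a real planner would call an LLM, and the real tool would call the Acumatica API. Perception and evaluation are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    tools: dict[str, Callable[..., object]]   # tool registry: name -> function
    planner: Callable[[str, list], dict]      # returns {"tool": ..., "args": {...}}
    memory: list = field(default_factory=list)

    def step(self, observation: str) -> object:
        # Planner: decide which tool to call for this observation.
        plan = self.planner(observation, self.memory)
        name = plan["tool"]
        if name not in self.tools:
            raise ValueError(f"unknown tool: {name}")
        # Executor: run the chosen tool with the planned arguments.
        result = self.tools[name](**plan["args"])
        # Memory: keep a record of what happened.
        self.memory.append({"observation": observation, "plan": plan, "result": result})
        return result


def fake_planner(observation: str, memory: list) -> dict:
    # Hypothetical stand-in for an LLM call; always picks the same tool.
    return {"tool": "list_overdue_invoices", "args": {"customer_id": "ABC01"}}


def fake_list_overdue_invoices(customer_id=None) -> list:
    # Hypothetical stand-in for the real tool; returns canned data, no network.
    return [{"ReferenceNbr": "INV001", "CustomerID": customer_id}]


agent = Agent(
    tools={"list_overdue_invoices": fake_list_overdue_invoices},
    planner=fake_planner,
)
result = agent.step("Customer ABC01 asked about their balance")
```

Because the planner is just a callable, swapping the stub for a real LLM call does not change the loop.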

For the broader API integration patterns, see the REST API definitive guide.

RAG over ERP data

A common agent pattern is "ask questions about your data." The implementation: Retrieval-Augmented Generation (RAG). The flow:

Index the data. Embed the ERP records (or summaries) into a vector store.
Retrieve on question. Find the K most similar records to the user's question.
Augment the prompt. Pass the retrieved records as context to the LLM.
Generate the answer. The LLM answers the question with the records as context.

PYTHON · RAG PIPELINE

# Import paths follow the older single-package LangChain layout used in this guide.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.chains import RetrievalQA

# Step 1: Index (run nightly)
# ar_invoices is assumed to be a list of LangChain Document objects built from ERP records.
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone.from_documents(ar_invoices, embeddings, index_name="ar")

# Steps 2-4: Query
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
)
answer = qa.run("Which customers have overdue invoices over $50k?")
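
If you want to see the retrieve and augment steps without any external service, the sketch below does the same thing with a toy bag-of-words "embedding" and cosine similarity. The functions embed, cosine, retrieve and build_prompt are hypothetical names for this illustration, and the sample records are made up. A real pipeline would use a learned embedding model and a vector store, as in the pipeline above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts. Not a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, records: list, k: int = 2) -> list:
    # Rank every record by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, records: list, k: int = 2) -> str:
    # Augment: place the retrieved records in the prompt as context.
    context = "\n".join(f"- {r}" for r in retrieve(question, records, k))
    return f"Answer using only these records:\n{context}\n\nQuestion: {question}"


# Made-up sample records for illustration only.
records = [
    "Invoice INV001 customer ACME overdue 62000 USD",
    "Invoice INV002 customer BOLT paid 1200 USD",
    "Sales order SO010 customer ACME shipped",
]
top = retrieve("overdue invoice for ACME", records, k=1)
prompt = build_prompt("overdue invoice for ACME", records, k=1)
```

The generate step is then a single LLM call with the prompt as input.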

Prompt patterns that work

The prompt is the agent contract. The patterns that work:

System prompt sets the persona and the rules. "You are an AR assistant for a mid-sized distributor. You can read invoices, send payment reminders, and create credit memos. You cannot delete records or change prices."
Few-shot examples for ambiguous cases. Show the LLM 3-5 examples of the right output for each kind of ambiguous request.
Structured output for tool calls. Force the LLM to output JSON in the shape the tool expects.
Re-ask on uncertainty. If the LLM is uncertain, ask it to re-read the context, or to ask the user for clarification.

PROMPT · STRUCTURED OUTPUT

You are an Acumatica AR assistant.

When asked to send a payment reminder, respond with JSON only:
{
  "action": "send_reminder" | "ask_clarification" | "no_action",
  "invoice_refnbr": string | null,
  "customer_email": string | null,
  "message": string | null,
  "reason": string
}

Do not add any text outside the JSON.
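
A prompt like this only asks for the shape; the calling code still has to check it. Below is a minimal sketch of a parser for that JSON contract. The name parse_reply and the rule that send_reminder needs both an invoice number and an email are my assumptions for this illustration, not something the prompt itself enforces.

```python
import json

ALLOWED_ACTIONS = {"send_reminder", "ask_clarification", "no_action"}
REQUIRED_KEYS = {"action", "invoice_refnbr", "customer_email", "message", "reason"}


def parse_reply(raw: str) -> dict:
    """Parse the LLM reply and raise ValueError if it breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    if not isinstance(data["reason"], str):
        raise ValueError("reason must be a string")
    for key in ("invoice_refnbr", "customer_email", "message"):
        if data[key] is not None and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string or null")
    # Assumed rule: a reminder cannot be sent without a target invoice and address.
    if data["action"] == "send_reminder" and not (
        data["invoice_refnbr"] and data["customer_email"]
    ):
        raise ValueError("send_reminder needs invoice_refnbr and customer_email")
    return data


good = parse_reply(
    '{"action": "send_reminder", "invoice_refnbr": "INV001", '
    '"customer_email": "ap@example.com", "message": "Friendly reminder", '
    '"reason": "Invoice is 30 days overdue"}'
)
```

When parse_reply raises, the caller can feed the error message back to the LLM and re-ask, which is the re-ask pattern from the list above.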

Safety, guardrails, and observability

AI agents that touch production data need guardrails. The minimum:

Allow-list of tools. The agent can only call tools on the allow list. No shell access, no arbitrary file reads.
Rate limits per agent. The agent cannot make 1000 API calls in a minute.
Cost limits per conversation. The agent cannot run for 30 minutes and rack up a $500 LLM bill.
Audit log of every action. Every tool call is logged with the prompt, the result, the timestamp, the user.
Human-in-the-loop for high-impact actions. The agent cannot release a $1M invoice without a human approval.
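
As a rough sketch of how four of these guardrails can sit in one place, the class below checks the allow-list, a per-minute rate limit and a human-approval threshold, and writes every decision to an audit log. The Guardrails class, its parameter names and the threshold values are hypothetical, chosen for this illustration. Cost limits are left out, and a production audit log would also store the prompt and the tool result.

```python
import time
from collections import deque


class Guardrails:
    def __init__(self, allowed_tools, max_calls_per_minute=30,
                 approval_threshold=10_000.0, clock=time.monotonic):
        self.allowed_tools = set(allowed_tools)
        self.max_calls = max_calls_per_minute
        self.approval_threshold = approval_threshold
        self.clock = clock          # injectable so tests can control time
        self.calls = deque()        # monotonic times of recently allowed calls
        self.audit_log = []

    def check(self, tool, args, user, amount=0.0, approved_by=None):
        now = self.clock()
        # Drop calls that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if tool not in self.allowed_tools:
            decision = "denied: tool not on allow-list"
        elif len(self.calls) >= self.max_calls:
            decision = "denied: rate limit"
        elif amount >= self.approval_threshold and approved_by is None:
            decision = "pending: human approval required"
        else:
            decision = "allowed"
            self.calls.append(now)
        # Every decision is logged, including denials.
        self.audit_log.append({
            "timestamp": time.time(), "user": user, "tool": tool,
            "args": args, "amount": amount, "approved_by": approved_by,
            "decision": decision,
        })
        return decision


guard = Guardrails(
    allowed_tools={"send_payment_reminder", "release_invoice"},
    max_calls_per_minute=2,
)
d1 = guard.check("send_payment_reminder", {"invoice_refnbr": "INV001"}, user="agent-1")
d2 = guard.check("delete_customer", {}, user="agent-1")
d3 = guard.check("release_invoice", {"invoice_refnbr": "INV002"},
                 user="agent-1", amount=1_000_000)
d4 = guard.check("release_invoice", {"invoice_refnbr": "INV002"},
                 user="agent-1", amount=1_000_000, approved_by="cfo@example.com")
```

The tool itself only runs when check returns "allowed"; a "pending" decision is where the human approval step hooks in.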

The agent will hallucinate

No matter how good the prompt, the LLM will sometimes invent a tool call that does not exist, or return a wrong value. The only protection is a deterministic layer between the LLM and the production system. The agent suggests; the deterministic layer validates and executes.
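
One way to build that deterministic layer, sketched under simple assumptions: keep a registry of real functions, and refuse any suggested call whose tool name is not registered or whose arguments do not fit the function's signature. REGISTRY and execute are hypothetical names for this illustration, and the send_payment_reminder here is a stub that sends nothing.

```python
import inspect


def send_payment_reminder(invoice_refnbr: str, customer_email: str) -> str:
    # Stub for illustration; no email is sent.
    return f"Reminder sent to {customer_email} for {invoice_refnbr}"


REGISTRY = {"send_payment_reminder": send_payment_reminder}


def execute(call: dict) -> dict:
    """Validate an LLM-suggested call, then run it only if it checks out."""
    name = call.get("tool")
    func = REGISTRY.get(name)
    if func is None:
        # The LLM invented a tool: refuse instead of guessing.
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        bound = inspect.signature(func).bind(**call.get("args", {}))
    except TypeError as exc:
        return {"ok": False, "error": f"bad arguments: {exc}"}
    for param, value in bound.arguments.items():
        expected = func.__annotations__.get(param)
        # Only plain class annotations (str, int, ...) are checked here.
        if isinstance(expected, type) and not isinstance(value, expected):
            return {"ok": False, "error": f"{param} must be {expected.__name__}"}
    return {"ok": True, "result": func(**bound.arguments)}


invented = execute({"tool": "refund_everything", "args": {}})
incomplete = execute({"tool": "send_payment_reminder",
                      "args": {"invoice_refnbr": "INV001"}})
valid = execute({"tool": "send_payment_reminder",
                 "args": {"invoice_refnbr": "INV001",
                          "customer_email": "ap@example.com"}})
```

This checks shape only. Whether INV001 actually exists, or whether the email belongs to that customer, still needs a lookup against the ERP before the call goes out.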

For the broader observability patterns, see the monitoring guide and the distributed tracing guide.

Evaluating the agent

An agent without evaluation is a science experiment. The metrics that matter:

Metric	What it measures	How to compute
Task success rate	Did the agent complete the task?	Human eval on 100 sample tasks
Tool call accuracy	Did the agent call the right tool?	Compare against ground truth
Hallucination rate	Did the agent invent a tool or value?	Schema validation + manual review
Latency p50/p95	How long does a task take?	Traces
Cost per task	How much does a task cost?	Token usage × rate
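
Most of these metrics fall out of a simple log of evaluated runs. The sketch below assumes each run is a dict with success, tool_called, tool_expected, latency_s and tokens fields, and that cost is a flat rate per 1,000 tokens. Both the field names and the rate are assumptions for this illustration, and the sample runs are made up. Hallucination rate is left out because it needs schema validation and manual review.

```python
import statistics


def summarise(runs: list, usd_per_1k_tokens: float) -> dict:
    n = len(runs)
    latencies = sorted(r["latency_s"] for r in runs)
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    total_tokens = sum(r["tokens"] for r in runs)
    return {
        "task_success_rate": sum(r["success"] for r in runs) / n,
        "tool_call_accuracy": sum(
            r["tool_called"] == r["tool_expected"] for r in runs
        ) / n,
        "latency_p50": cuts[49],
        "latency_p95": cuts[94],
        "cost_per_task_usd": total_tokens / 1000 * usd_per_1k_tokens / n,
    }


# Made-up evaluation runs for illustration only.
runs = [
    {"success": True, "tool_called": "list_overdue_invoices",
     "tool_expected": "list_overdue_invoices", "latency_s": 1.0, "tokens": 1000},
    {"success": True, "tool_called": "send_payment_reminder",
     "tool_expected": "send_payment_reminder", "latency_s": 2.0, "tokens": 2000},
    {"success": False, "tool_called": "send_payment_reminder",
     "tool_expected": "list_overdue_invoices", "latency_s": 3.0, "tokens": 3000},
    {"success": True, "tool_called": "list_overdue_invoices",
     "tool_expected": "list_overdue_invoices", "latency_s": 4.0, "tokens": 2000},
]
report = summarise(runs, usd_per_1k_tokens=0.01)
```

With only a handful of runs the p95 figure is not meaningful; the 100-task sample from the table is a more sensible floor.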

Wrapping up

The architecture, the tools, the prompts, the safety, the evaluation. Get all five right and the agent is a product. Skip the safety or the evaluation and the agent is a liability. The discipline is the same as any production system, with the LLM as the variable you cannot fully control.

That is the working approach I use on Acumatica projects. The same patterns show up whether you are in Nairobi, Johannesburg, Kigali, Lusaka or Harare — and they are the things that keep work moving when an upgrade lands at 6 PM on a Friday. If you are stuck on something specific, reach out or keep reading through the rest of the Acumatica blog.

John Kihiu

Acumatica ERP Developer · Laravel Engineer

Independent software engineer in Nairobi specialising in Acumatica customisations, Laravel backends, and tax fiscalisation integrations across East and Southern Africa.
