Human-in-the-loop design for AI agents is the work that defines the next phase of enterprise software. ERP systems hold the most valuable business data in the company: customers, orders, invoices, inventory, employees, financials. AI agents that can read that data, reason about it, and act on it are the difference between an ERP that records history and an ERP that drives the business. This guide sets out a field-tested pattern for building such agents, with a human in the loop, in an ERP context.
What is an AI agent
An AI agent is a software system that:
- Perceives its environment (reads ERP data, receives emails, watches for events).
- Reasons about what to do (uses an LLM to plan, decide, summarise).
- Acts on the environment (creates records, sends emails, triggers workflows).
- Learns from feedback (ratings, corrections, outcomes).
The difference between an agent and a chatbot is action. A chatbot talks; an agent does.
An LLM that can read and write Acumatica records is more useful than an LLM that can only chat about Acumatica. The tools, meaning the function calls the agent can make, are what determine the agent's value. Spend the time designing the tools, not the prompt.
The agent architecture
The standard pattern for an enterprise AI agent has six components:
| Component | What it does | Example |
|---|---|---|
| Perception | Reads state | Acumatica REST API for AR balance, ageing |
| Tool registry | Defines what the agent can do | list_overdue_invoices, send_payment_reminder |
| Planner | Decides what to do | LLM with a tool-using prompt |
| Executor | Does the work | Function call to the tool, then to the Acumatica API |
| Memory | Remembers what happened | Vector store of past actions; conversation history |
| Evaluator | Checks the work | Human feedback, automated checks, outcome tracking |
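To make the six components concrete, here is a minimal, framework-free sketch of one agent loop. It is an illustration under stated assumptions, not a production design: the planner is a scripted function standing in for the LLM, and `fake_list_overdue` and `fake_send_reminder` are hypothetical stand-ins for the real Acumatica-backed tools. Perception and memory collapse into the `memory` list the planner reads, and the evaluator is reduced to an unknown-tool check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Agent:
    tools: Dict[str, Callable[..., object]]                    # tool registry
    planner: Callable[[str, List[dict]], Optional[ToolCall]]  # stands in for the LLM
    memory: List[dict] = field(default_factory=list)          # every step taken so far

    def run(self, goal: str, max_steps: int = 5) -> List[dict]:
        for _ in range(max_steps):
            call = self.planner(goal, self.memory)  # read state, plan the next step
            if call is None:                        # planner says the goal is met
                break
            if call.name not in self.tools:         # evaluator: reject invented tools
                self.memory.append({"tool": call.name, "error": "unknown tool"})
                continue
            result = self.tools[call.name](**call.args)  # executor
            self.memory.append({"tool": call.name, "args": call.args, "result": result})
        return self.memory


# Hypothetical stand-ins for the real Acumatica-backed tools.
def fake_list_overdue(customer_id=None):
    return [{"RefNbr": "INV001", "Email": "ap@acme.example"}]


def fake_send_reminder(invoice_refnbr, customer_email):
    return f"Reminder sent to {customer_email} for {invoice_refnbr}"


def scripted_planner(goal, memory):
    # A real planner would send the goal, tool descriptions and memory to an LLM.
    if not memory:
        return ToolCall("list_overdue_invoices", {})
    if len(memory) == 1:
        inv = memory[0]["result"][0]
        return ToolCall("send_payment_reminder",
                        {"invoice_refnbr": inv["RefNbr"], "customer_email": inv["Email"]})
    return None


agent = Agent(
    tools={"list_overdue_invoices": fake_list_overdue, "send_payment_reminder": fake_send_reminder},
    planner=scripted_planner,
)
log = agent.run("Chase overdue invoices")
```

The `max_steps` bound matters: a planner that keeps proposing an unknown tool cannot loop forever.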
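```python
import os
from datetime import date
from typing import Optional

import requests
from langchain_core.tools import tool


def get_acumatica_token() -> str:
    # Placeholder: swap in your OAuth 2.0 token flow for the Acumatica instance.
    return os.environ["ACUMATICA_TOKEN"]


@tool
def list_overdue_invoices(customer_id: Optional[str] = None) -> list:
    """List overdue AR invoices, optionally filtered by customer."""
    url = "https://acumatica.example.com/entity/Default/24.200.001/ARInvoice"
    # OData $filter: open invoices due before today. The contract-based API filters on
    # display values ('Open'); check the entity name and status values on your endpoint.
    today = date.today().isoformat()
    filter_ = f"Status eq 'Open' and DueDate lt datetimeoffset'{today}T00:00:00+00:00'"
    if customer_id:
        # Double single quotes so the ID cannot break out of the OData string literal.
        filter_ += " and CustomerID eq '{}'".format(customer_id.replace("'", "''"))
    headers = {"Authorization": f"Bearer {get_acumatica_token()}"}
    r = requests.get(url, params={"$filter": filter_}, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()


@tool
def send_payment_reminder(invoice_refnbr: str, customer_email: str) -> str:
    """Send a payment reminder email for a specific invoice."""
    # ... send via your transactional email provider
    return f"Reminder sent to {customer_email} for {invoice_refnbr}"
```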
For the broader API integration patterns, see the REST API definitive guide.
RAG over ERP data
A common agent pattern is "ask questions about your data." The implementation: Retrieval-Augmented Generation (RAG). The flow:
- Index the data. Embed the ERP records (or summaries) into a vector store.
- Retrieve on question. Find the K most similar records to the user's question.
- Augment the prompt. Pass the retrieved records as context to the LLM.
- Generate the answer. The LLM answers the question with the records as context.
```python
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# ar_invoices: a list of LangChain Document objects, one per invoice summary,
# produced by your nightly export job (assumed to exist).

# Step 1: Index (run nightly)
embeddings = OpenAIEmbeddings()
vectorstore = PineconeVectorStore.from_documents(ar_invoices, embeddings, index_name="ar")

# Steps 2-4: Retrieve, augment, generate
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
)
answer = qa.invoke({"query": "Which customers have overdue invoices over $50k?"})["result"]
```
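The chain above hides steps 2 and 3. As a rough, dependency-free sketch of what they do, the code below uses a bag-of-words count vector as a stand-in for a real embedding model (a deliberate simplification, not how `OpenAIEmbeddings` works) and three made-up invoice summaries as the index. `embed`, `retrieve`, and `augment` are illustrative names, not LangChain APIs.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    # Step 2: score every indexed record against the question and keep the top K.
    q = embed(question)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


def augment(question: str, hits: List[Tuple[float, str]]) -> str:
    # Step 3: put the retrieved records into the prompt as context.
    context = "\n".join(f"- {doc}" for _, doc in hits)
    return f"Answer using only these records:\n{context}\n\nQuestion: {question}"


docs = [
    "Invoice INV001 customer ACME overdue balance 62000 USD",
    "Invoice INV002 customer Globex paid balance 0 USD",
    "Invoice INV003 customer Initech overdue balance 8000 USD",
]
question = "Which customers have overdue invoices?"
hits = retrieve(question, docs, k=2)
prompt = augment(question, hits)
print(prompt)
```

One consequence worth keeping in mind: retrieval returns the K most similar records, not every matching one. A question with a hard numeric threshold, such as the $50k query above, may be better served by a filtered tool call like `list_overdue_invoices` than by similarity search alone.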
Prompt patterns that work
The prompt is the agent contract. The patterns that work:
- System prompt sets the persona and the rules. "You are an AR assistant for a mid-sized distributor. You can read invoices, send payment reminders, and create credit memos. You cannot delete records or change prices."
- Few-shot examples for ambiguous cases. Show the LLM 3-5 examples of the right output for the ambiguous cases.
- Structured output for tool calls. Force the LLM to output JSON in the shape the tool expects.
- Re-ask on uncertainty. If the LLM is uncertain, ask it to re-read the context, or to ask the user for clarification.
```
You are an Acumatica AR assistant.
When asked to send a payment reminder, respond with JSON only:
{
  "action": "send_reminder" | "ask_clarification" | "no_action",
  "invoice_refnbr": string | null,
  "customer_email": string | null,
  "message": string | null,
  "reason": string
}
Do not add any text outside the JSON.
```
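A contract like this is only useful if something enforces it. Here is a minimal sketch of the structured-output and re-ask patterns, assuming the model is any callable that takes a prompt string and returns a reply string (`fake_llm` in the example is a scripted stand-in, and `validate` and `ask_with_reask` are illustrative names). It parses the reply, checks it against the contract, and on failure re-asks with the error attached, handing off to a human after a few attempts.

```python
import json
from typing import Callable, Optional

ALLOWED_ACTIONS = {"send_reminder", "ask_clarification", "no_action"}
REQUIRED_KEYS = {"action", "invoice_refnbr", "customer_email", "message", "reason"}


def validate(raw: str) -> dict:
    """Parse the model's reply and enforce the JSON contract; raise ValueError on any violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    if not isinstance(data["reason"], str):
        raise ValueError("reason must be a string")
    if data["action"] == "send_reminder" and not (data["invoice_refnbr"] and data["customer_email"]):
        raise ValueError("send_reminder needs invoice_refnbr and customer_email")
    return data


def ask_with_reask(llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Optional[dict]:
    """Call the model; on a contract violation, re-ask with the error appended."""
    for _ in range(max_attempts):
        reply = llm(prompt)
        try:
            return validate(reply)
        except ValueError as err:
            prompt = f"{prompt}\n\nYour last reply was invalid ({err}). Reply with JSON only."
    return None  # give up and hand the case to a human


# Scripted stand-in for the LLM: first reply breaks the contract, second one complies.
replies = iter([
    "Sure! Here is the JSON you asked for.",
    '{"action": "send_reminder", "invoice_refnbr": "INV001", "customer_email": "ap@acme.example",'
    ' "message": "Friendly reminder", "reason": "Invoice is 30 days overdue"}',
])
fake_llm = lambda prompt: next(replies)
result = ask_with_reask(fake_llm, "Remind ACME about INV001")
```

Returning `None` rather than guessing keeps the failure visible: the caller routes the case to a person instead of acting on a malformed reply.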
Safety, guardrails, and observability
AI agents that touch production data need guardrails. The minimum:
- Allow-list of tools. The agent can only call tools on the allow list. No shell access, no arbitrary file reads.
- Rate limits per agent. The agent cannot make 1000 API calls in a minute.
- Cost limits per conversation. The agent cannot run for 30 minutes and rack up a $500 LLM bill.
- Audit log of every action. Every tool call is logged with the prompt, the result, the timestamp, the user.
- Human-in-the-loop for high-impact actions. The agent cannot release a $1M invoice without human approval.
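Here is a sketch of the first four guardrails as a single gate that runs before every tool call. The `Guardrails` class, the per-call cost estimate, and the injectable clock are assumptions for illustration; a real deployment would persist the audit log, record the prompt and result alongside each decision, and take cost from actual token usage rather than an estimate.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, List


class GuardrailError(Exception):
    pass


class Guardrails:
    def __init__(self, allowed_tools: Iterable[str], max_calls_per_minute: int,
                 max_cost_usd: float, clock: Callable[[], float] = time.monotonic):
        self.allowed_tools = set(allowed_tools)
        self.max_calls = max_calls_per_minute
        self.max_cost = max_cost_usd
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of allowed calls in the window
        self.spent = 0.0                    # running cost for this conversation
        self.audit_log: List[dict] = []

    def check(self, tool: str, est_cost_usd: float, user: str) -> None:
        now = self.clock()
        # Sliding 60-second window for the rate limit.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if tool not in self.allowed_tools:
            verdict = "blocked: tool not on allow-list"
        elif len(self.calls) >= self.max_calls:
            verdict = "blocked: rate limit"
        elif self.spent + est_cost_usd > self.max_cost:
            verdict = "blocked: cost limit"
        else:
            verdict = "allowed"
            self.calls.append(now)
            self.spent += est_cost_usd
        # Every decision, allowed or not, goes to the audit log.
        self.audit_log.append({"ts": now, "user": user, "tool": tool, "verdict": verdict})
        if verdict != "allowed":
            raise GuardrailError(verdict)


guard = Guardrails({"list_overdue_invoices", "send_payment_reminder"},
                   max_calls_per_minute=30, max_cost_usd=5.0)
guard.check("list_overdue_invoices", est_cost_usd=0.02, user="alice")
```

Checking the allow-list before the rate and cost limits means a disallowed tool never consumes budget, and logging blocked calls as well as allowed ones is what makes the audit trail useful after an incident.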
No matter how good the prompt, the LLM will sometimes invent a tool call that does not exist, or return a wrong value. The only protection is a deterministic layer between the LLM and the production system. The agent suggests; the deterministic layer validates and executes.
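The "agent suggests, deterministic layer executes" split can be sketched as follows, under the assumption that each proposal carries a monetary amount the layer can check. `Proposal`, `Executor`, and the `release_invoice` tool in the example are hypothetical. Unknown tools and invented arguments are rejected outright; anything at or above the approval threshold waits in a queue until a named human approves it.

```python
import inspect
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Proposal:
    tool: str
    args: dict
    amount: float = 0.0  # monetary impact; 0 for read-only actions


@dataclass
class Executor:
    tools: Dict[str, Callable[..., str]]
    approval_threshold: float  # at or above this amount, a human must approve
    pending: List[Proposal] = field(default_factory=list)

    def submit(self, p: Proposal) -> str:
        # The LLM only proposes; this deterministic layer validates and decides.
        if p.tool not in self.tools:
            return f"rejected: unknown tool {p.tool!r}"
        try:
            # Reject invented or missing arguments before anything runs.
            inspect.signature(self.tools[p.tool]).bind(**p.args)
        except TypeError as err:
            return f"rejected: bad arguments ({err})"
        if p.amount >= self.approval_threshold:
            self.pending.append(p)
            return "queued for human approval"
        return self.tools[p.tool](**p.args)

    def approve(self, index: int, approver: str) -> str:
        # A named human releases a queued proposal; the result records who approved it.
        p = self.pending.pop(index)
        return f"{self.tools[p.tool](**p.args)} (approved by {approver})"


# Example with a hypothetical tool.
ex = Executor({"release_invoice": lambda refnbr: f"released {refnbr}"}, approval_threshold=10_000)
print(ex.submit(Proposal("release_invoice", {"refnbr": "INV001"}, amount=500)))        # released INV001
print(ex.submit(Proposal("release_invoice", {"refnbr": "INV002"}, amount=1_000_000)))  # queued for human approval
print(ex.approve(0, "finance.manager"))  # released INV002 (approved by finance.manager)
```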
For the broader observability patterns, see the monitoring guide and the distributed tracing guide.
Evaluating the agent
An agent without evaluation is a science experiment. The metrics that matter:
| Metric | What it measures | How to compute |
|---|---|---|
| Task success rate | Did the agent complete the task? | Human eval on 100 sample tasks |
| Tool call accuracy | Did the agent call the right tool? | Compare against ground truth |
| Hallucination rate | Did the agent invent a tool or value? | Schema validation + manual review |
| Latency p50/p95 | How long does a task take? | Traces |
| Cost per task | How much does a task cost? | Token usage × rate |
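The metrics in the table can be computed from per-task records. Here is a minimal sketch, assuming each evaluated task has already been labelled with a success judgement, the expected tool, and a hallucination flag. The `TaskRecord` field names, the per-1k-token prices, and the nearest-rank percentile method are choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    succeeded: bool        # human judgement: did the agent complete the task?
    tool_called: str       # tool the agent actually called
    expected_tool: str     # ground-truth tool for this task
    hallucinated: bool     # flagged by schema validation or manual review
    latency_s: float       # end-to-end time, e.g. taken from traces
    prompt_tokens: int
    completion_tokens: int


def percentile(values: List[float], pct: float) -> float:
    # Nearest-rank percentile: smallest value with at least pct% of values at or below it.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(records: List[TaskRecord], usd_per_1k_prompt: float,
              usd_per_1k_completion: float) -> dict:
    n = len(records)
    latencies = [r.latency_s for r in records]
    # Cost per task = token usage x rate, averaged over the evaluated tasks.
    cost = sum(r.prompt_tokens / 1000 * usd_per_1k_prompt
               + r.completion_tokens / 1000 * usd_per_1k_completion for r in records)
    return {
        "task_success_rate": sum(r.succeeded for r in records) / n,
        "tool_call_accuracy": sum(r.tool_called == r.expected_tool for r in records) / n,
        "hallucination_rate": sum(r.hallucinated for r in records) / n,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p95_s": percentile(latencies, 95),
        "cost_per_task_usd": cost / n,
    }


records = [
    TaskRecord(True, "send_payment_reminder", "send_payment_reminder", False, 1.0, 1000, 0),
    TaskRecord(True, "list_overdue_invoices", "list_overdue_invoices", False, 2.0, 1000, 0),
    TaskRecord(False, "create_credit_memo", "send_payment_reminder", True, 3.0, 1000, 0),
    TaskRecord(True, "list_overdue_invoices", "list_overdue_invoices", False, 10.0, 1000, 0),
]
report = summarise(records, usd_per_1k_prompt=0.01, usd_per_1k_completion=0.03)
```

Tracking these numbers per release, on the same fixed task set, is what turns "the new prompt feels better" into a comparison you can defend.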
Wrapping up
The architecture, the tools, the prompts, the safety, the evaluation. Get all five right and the agent is a product. Skip the safety or the evaluation and the agent is a liability. The discipline is the same as any production system, with the LLM as the variable you cannot fully control.
That is the working approach I use on Acumatica projects. The same patterns show up whether you are in Nairobi, Johannesburg, Kigali, Lusaka or Harare — and they are the things that keep work moving when an upgrade lands at 6 PM on a Friday. If you are stuck on something specific, reach out or keep reading through the rest of the Acumatica blog.
Independent software engineer in Nairobi specialising in Acumatica customisations, Laravel backends, and tax fiscalisation integrations across East and Southern Africa.