Istio Service Mesh — A Field Guide

This field guide covers the work that turns a deploy into a system. The deployment is one moment; the system is the next 18 months of uptime, incidents, and improvements. The sections below set out field-tested patterns for that DevOps practice.

CI/CD

The CI/CD pipeline is the spine of the system. The right pipeline has six stages:

Lint. Catch syntax errors and style violations in seconds.
Test. Run unit tests, integration tests, security scans.
Build. Produce the artefact (container, package, binary).
Deploy to staging. The same way you deploy to production.
Smoke test staging. The top 5 user flows.
Deploy to production. Canary, blue-green, or rolling.
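
The last stage names canary releases without saying how the promote-or-rollback call is made. Here is a minimal sketch of one way to make it, comparing the canary's error rate with the stable version's. The `WindowStats` type, the `canary_decision` name, and both thresholds are assumptions for illustration, not values this guide prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts observed for one version over a comparison window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # An empty window counts as 0% errors; sample size is checked separately.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: WindowStats, canary: WindowStats,
                    min_requests: int = 100, tolerance: float = 0.01) -> str:
    """Return "wait", "promote", or "rollback" for a canary release.

    min_requests and tolerance are illustrative defaults, not recommendations.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"  # canary is measurably worse than the stable version
    return "promote"
```

In practice the same comparison usually covers latency as well as errors; the structure stays the same.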

GITHUB ACTIONS · 6-STAGE PIPELINE

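name: Pipeline
on: [push, pull_request]
jobs:
  # Each job starts on a fresh runner, so every job checks out the repository.
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run lint
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t app:"$GITHUB_SHA" .
  deploy-staging:
    needs: [build]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh staging
  smoke-test:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./smoke-test.sh https://staging.example.com
  deploy-prod:
    needs: [smoke-test]
    # Strings inside GitHub expressions take single quotes.
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production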
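
The pipeline calls a smoke-test script but the guide does not show one. Below is a rough sketch of what such a check could look like in Python, using only the standard library. The paths in `FLOWS` are hypothetical stand-ins for "the top 5 user flows", and treating anything other than HTTP 200 as a failure is an assumption you may want to loosen.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

# Hypothetical paths standing in for the top 5 user flows; replace with your own.
FLOWS = ["/", "/login", "/search?q=test", "/cart", "/checkout"]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for a GET request, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, or timeout


def run_smoke_tests(base_url: str, flows: Iterable[str],
                    fetch: Callable[[str], int] = http_status) -> list[str]:
    """Return the flows that did not answer with HTTP 200."""
    base = base_url.rstrip("/")
    return [path for path in flows if fetch(base + path) != 200]


def main(argv: list[str]) -> int:
    """Exit code for CI: 0 when every flow passes, 1 otherwise."""
    failed = run_smoke_tests(argv[1], FLOWS)
    for path in failed:
        print(f"FAIL {path}")
    return 1 if failed else 0
```

To use it from the pipeline, end the script with `sys.exit(main(sys.argv))` and pass the staging URL as the first argument. The `fetch` parameter exists so the check can be tested without a network.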

For the broader CI/CD patterns, see the CI/CD guide.

Observability

The three pillars: metrics, logs, traces. Modern systems need all three.

Pillar	What it answers	Tool
Metrics	How much? How fast? How many?	Prometheus + Grafana, Datadog, New Relic
Logs	What happened?	Loki, ELK, Datadog Logs
Traces	Where did the request spend its time?	Jaeger, Tempo (instrumented with OpenTelemetry)
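
To make the table concrete, here is a toy sketch of one request handler emitting all three signals. It uses only the Python standard library: the in-memory `REQUEST_COUNT` and `SPANS` are stand-ins for a real metrics client and tracer, and `handle_checkout` is a hypothetical handler, so read it as an illustration of the shape rather than a production setup.

```python
import json
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("checkout")

REQUEST_COUNT = Counter()  # metric: how many requests, by outcome
SPANS = []                 # trace: where each request spent its time


@contextmanager
def span(trace_id: str, name: str):
    """Record how long the wrapped block took, tagged with the request's trace id."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({"trace_id": trace_id, "name": name,
                      "duration_ms": (time.perf_counter() - start) * 1000})


def handle_checkout(order_id: str) -> str:
    """Hypothetical request handler that emits a metric, a log line, and two spans."""
    trace_id = uuid.uuid4().hex
    with span(trace_id, "checkout"):
        with span(trace_id, "charge_card"):
            time.sleep(0.01)  # stand-in for a downstream call
        REQUEST_COUNT["checkout.success"] += 1
        # Log what happened, with the trace id so the line can be joined to the trace.
        logger.info(json.dumps({"event": "order_placed", "order_id": order_id,
                                "trace_id": trace_id}))
    return trace_id
```

The shared `trace_id` is the design point: it lets you move from a metric spike to the slow span to the exact log line.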

Logs are the last resort

If you are reading logs to debug, your metrics and traces are not good enough. The right signal is a metric or a trace; logs are for the deep-dive when neither helps.

For the broader observability patterns, see the monitoring guide and the distributed tracing guide.

Incident response

When the system breaks, follow the runbook:

Detect. The alert fires.
Triage. On-call engineer investigates, identifies severity.
Mitigate. Restore service (rollback, scale up, fail over).
Communicate. Stakeholders, customers, status page.
Resolve. Permanent fix.
Post-mortem. Blameless. What went well, what didn't, what to change.
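
The runbook stages are also the natural points to timestamp. A small sketch, assuming you want to record when each stage was reached and read off figures such as time to mitigate; the `Incident` class and the stage names in `STAGES` are illustrative, not part of any incident tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

STAGES = ("detect", "triage", "mitigate", "communicate", "resolve", "post-mortem")


@dataclass
class Incident:
    """Timestamps for each runbook stage of one incident."""
    title: str
    events: dict[str, datetime] = field(default_factory=dict)

    def mark(self, stage: str, at: datetime) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        # Keep the first timestamp only, so a repeated call cannot rewrite history.
        self.events.setdefault(stage, at)

    def elapsed(self, start: str, end: str) -> Optional[timedelta]:
        """Time between two stages, or None if either has not happened yet."""
        if start in self.events and end in self.events:
            return self.events[end] - self.events[start]
        return None
```

Calling `incident.elapsed("detect", "mitigate")` gives the time to mitigate, and the recorded events can be copied into the post-mortem timeline as they are.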

PYTHON · ALERT ROUTING

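def route_alert(alert):
    """Route an alert to the channel that matches its severity."""
    # page_oncall, notify_incident_channel, notify_team_channel, create_ticket
    # and log are your own integration hooks (pager, chat, ticketing, logging).
    if alert.severity == "critical":
        page_oncall(alert)              # wake the on-call engineer
        notify_incident_channel(alert)
    elif alert.severity == "high":
        notify_team_channel(alert)
    elif alert.severity == "medium":
        create_ticket(alert)
    else:
        log(alert)                      # low or unknown severity: record only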
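
The routing policy above can also be written as a lookup table, which makes it testable without a real pager. This is a sketch under assumptions: the `Alert` type, `make_router`, and the recording stubs are hypothetical, and the stubs only note which action would have fired.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str  # "critical", "high", "medium", or anything else


Handler = Callable[[Alert], None]


def make_router(routes: dict[str, list[Handler]], default: Handler) -> Handler:
    """Build a route_alert function from a severity-to-handlers table."""
    def route_alert(alert: Alert) -> None:
        # Unknown severities fall through to the default handler.
        for handler in routes.get(alert.severity, [default]):
            handler(alert)
    return route_alert


# Recording stubs stand in for the real pager, chat, and ticketing integrations.
sent: list[tuple[str, str]] = []


def record(action: str) -> Handler:
    return lambda alert: sent.append((action, alert.name))


route_alert = make_router(
    {
        "critical": [record("page_oncall"), record("notify_incident_channel")],
        "high": [record("notify_team_channel")],
        "medium": [record("create_ticket")],
    },
    default=record("log"),
)
```

Swapping the stubs for real integrations changes the table, not the routing logic.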

For the broader incident patterns, see the incident response runbook.

The blameless post-mortem

The blameless post-mortem is the most important habit of the four. The template:

Summary. What happened, in 3 sentences.
Timeline. When the alert fired, when the on-call engaged, when the fix deployed.
Root cause. The technical cause (not the human cause).
What went well. The habits that caught the incident early.
What didn't. The habits that let the incident happen.
Action items. The specific changes that will prevent the next one.
Lessons. The systemic insight, if any.
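
The template can be kept as a small data structure so that every post-mortem has the same sections in the same order. A minimal sketch, with field names chosen for illustration and a plain-text `render` method as one possible output format:

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    """One field per section of the template; the names are illustrative."""
    summary: str
    timeline: list[str]
    root_cause: str
    went_well: list[str] = field(default_factory=list)
    went_badly: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    lessons: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the post-mortem as plain text, skipping empty sections."""
        sections = [
            ("Summary", [self.summary]),
            ("Timeline", self.timeline),
            ("Root cause", [self.root_cause]),
            ("What went well", self.went_well),
            ("What didn't", self.went_badly),
            ("Action items", self.action_items),
            ("Lessons", self.lessons),
        ]
        lines = []
        for heading, items in sections:
            if not items:
                continue
            lines.append(heading)
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)
```

Note that no field holds a person's name, which keeps the record about the system.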

Blameless is not "no accountability"

Blameless means the system is accountable, not the person. If the same engineer makes the same mistake twice, the action item is to fix the system so the next engineer cannot repeat it. The person is not named.

Wrapping up

The pipeline, the observability, the incident response, the post-mortem. Get all four right and the system is reliable. The discipline is the same as any production system — fail safely, learn quickly, and improve the system after every incident.


John Kihiu

Acumatica ERP Developer · Laravel Engineer

Independent software engineer in Nairobi specialising in Acumatica customisations, Laravel backends, and tax fiscalisation integrations across East and Southern Africa.
