Helm Charts for Production: A Field Guide is about the work that turns a deploy into a system. The deployment is one moment; the system is the next 18 months of uptime, incidents, and improvements. This guide collects the field-tested patterns for that work.
CI/CD
The CI/CD pipeline is the spine of the system. The right pipeline has six stages:
- Lint. Catch syntax errors and style violations in seconds.
- Test. Run unit tests, integration tests, security scans.
- Build. Produce the artefact (container, package, binary).
- Deploy to staging. The same way you deploy to production.
- Smoke test staging. The top 5 user flows.
- Deploy to production. Canary, blue-green, or rolling.
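name: Pipeline
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest  # assumed runner; every job needs one
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t app:"$GITHUB_SHA" .
  deploy-staging:
    needs: [build]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh staging
  smoke-test:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./smoke-test.sh https://staging.example.com
  deploy-prod:
    needs: [smoke-test]
    if: github.ref == 'refs/heads/main'  # expression strings need single quotes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production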
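The smoke-test stage is the one most often left vague, so here is a minimal Python sketch of what a script in that position might check. It is not the contents of `smoke-test.sh`, which this guide does not show. The names `FlowResult`, `http_status`, `run_smoke_tests` and `all_passed` are hypothetical, and the sketch assumes that "a flow works" can be approximated by "the page answers with HTTP 200".

```python
from dataclasses import dataclass
from typing import Callable, Iterable
from urllib.error import URLError
from urllib.request import urlopen


@dataclass(frozen=True)
class FlowResult:
    path: str
    ok: bool
    detail: str


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def run_smoke_tests(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> list[FlowResult]:
    """Check each path once and record whether it answered with HTTP 200."""
    results = []
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            status = fetch(url)
            results.append(FlowResult(path, status == 200, f"HTTP {status}"))
        except (URLError, OSError) as exc:
            # A connection failure or HTTP error counts as a failed flow, not a crash.
            results.append(FlowResult(path, False, str(exc)))
    return results


def all_passed(results: Iterable[FlowResult]) -> bool:
    """True only if every checked flow succeeded."""
    return all(result.ok for result in results)
```

The `fetch` parameter is injectable so the check can be exercised without a network. In a pipeline, the script would exit non-zero when `all_passed` returns `False`, which is what stops the production deploy.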
For the broader CI/CD patterns, see the CI/CD guide.
Observability
The three pillars: metrics, logs, traces. Modern systems need all three.
| Pillar | What it answers | Tool |
|---|---|---|
| Metrics | How much? How fast? How many? | Prometheus + Grafana, Datadog, New Relic |
| Logs | What happened? | Loki, ELK, Datadog Logs |
| Traces | Where did the request spend its time? | Jaeger, Tempo, OpenTelemetry |
If you are reading logs to debug, your metrics and traces are not good enough. The right signal is a metric or a trace; logs are for the deep-dive when neither helps.
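To make the three pillars concrete, the following sketch emits one metric, one log line and one span for a single request, all tied together by a shared trace ID. It uses only the Python standard library and an in-memory `Telemetry` class as a stand-in. `Telemetry` and `handle_request` are hypothetical names, and a real system would send these signals to the tools in the table rather than keep them in lists.

```python
import json
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """In-memory stand-in for a metrics store, a log sink, and a tracer."""

    counters: dict = field(default_factory=dict)
    log_lines: list = field(default_factory=list)
    spans: list = field(default_factory=list)

    def increment(self, name: str) -> None:
        # Metric: answers "how many?"
        self.counters[name] = self.counters.get(name, 0) + 1

    def log(self, message: str, **fields) -> None:
        # Log: answers "what happened?" as one JSON object per line.
        self.log_lines.append(json.dumps({"message": message, **fields}))

    @contextmanager
    def span(self, name: str, trace_id: str):
        # Trace: answers "where did the request spend its time?"
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.spans.append({"name": name, "trace_id": trace_id, "seconds": elapsed})


def handle_request(telemetry: Telemetry, path: str) -> str:
    """Handle one hypothetical request and emit all three signals for it."""
    trace_id = uuid.uuid4().hex
    with telemetry.span("handle_request", trace_id):
        telemetry.increment("requests_total")
        telemetry.log("request handled", path=path, trace_id=trace_id)
    return trace_id
```

Putting the trace ID in the log line is the design choice that matters here: it lets you move from a slow span to the matching log entries without searching by timestamp.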
For the broader observability patterns, see the monitoring guide and the distributed tracing guide.
Incident response
When the system breaks, the runbook has six steps:
- Detect. The alert fires.
- Triage. On-call engineer investigates, identifies severity.
- Mitigate. Restore service (rollback, scale up, fail over).
- Communicate. Stakeholders, customers, status page.
- Resolve. Permanent fix.
- Post-mortem. Blameless. What went well, what didn't, what to change.
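def route_alert(alert):
    """Route an alert by severity; the handlers are assumed to be defined elsewhere."""
    if alert.severity == "critical":
        # Critical: wake someone up and open the incident channel.
        page_oncall(alert)
        notify_incident_channel(alert)
    elif alert.severity == "high":
        notify_team_channel(alert)
    elif alert.severity == "medium":
        create_ticket(alert)
    else:
        # Low or unknown severity: record it and move on.
        log(alert)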
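The routing function above depends on handlers that are not shown. As a self-contained variant that can be run and tested, the sketch below expresses the same severity rules as a lookup table and swaps the real handlers for a recording stand-in. `Alert`, `RecordingNotifier` and `build_routes` are hypothetical names introduced for illustration, and the table-driven form is an alternative to the `if`/`elif` chain, not a replacement the guide prescribes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str


class RecordingNotifier:
    """Hypothetical stand-in: records each action instead of sending it."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, str]] = []

    def _record(self, action: str, alert: Alert) -> None:
        self.actions.append((action, alert.name))

    def page_oncall(self, alert: Alert) -> None:
        self._record("page_oncall", alert)

    def notify_incident_channel(self, alert: Alert) -> None:
        self._record("notify_incident_channel", alert)

    def notify_team_channel(self, alert: Alert) -> None:
        self._record("notify_team_channel", alert)

    def create_ticket(self, alert: Alert) -> None:
        self._record("create_ticket", alert)

    def log(self, alert: Alert) -> None:
        self._record("log", alert)


def build_routes(notifier: RecordingNotifier) -> dict[str, list[Callable[[Alert], None]]]:
    """Map each severity to the handlers that should run, in order."""
    return {
        "critical": [notifier.page_oncall, notifier.notify_incident_channel],
        "high": [notifier.notify_team_channel],
        "medium": [notifier.create_ticket],
    }


def route_alert(alert: Alert, notifier: RecordingNotifier) -> None:
    # Unknown or low severities fall through to logging, like the else branch.
    handlers = build_routes(notifier).get(alert.severity, [notifier.log])
    for handler in handlers:
        handler(alert)
```

With the rules in a table, adding a severity level is a one-line change, and the recording stand-in makes it cheap to check that a critical alert really does page someone.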
For the broader incident patterns, see the incident response runbook.
The blameless post-mortem
The blameless post-mortem is the most important habit. The template:
- Summary. What happened, in 3 sentences.
- Timeline. When the alert fired, when the on-call engaged, when the fix deployed.
- Root cause. The technical cause (not the human cause).
- What went well. The habits that caught the incident early.
- What didn't. The habits that let the incident happen.
- Action items. The specific changes that will prevent the next one.
- Lessons. The systemic insight, if any.
Blameless means the system is accountable, not the person. If the same engineer made the same mistake twice, the action item is to fix the system so the next engineer cannot make the same mistake. The person is not named.
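The template can also be kept as structured data, so that the timeline yields durations and an incomplete post-mortem is easy to spot. The following is a minimal sketch under that assumption. The `PostMortem` class and its field names are hypothetical, and the lessons section is treated as optional because the template says "if any".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PostMortem:
    summary: str
    alert_fired: datetime
    oncall_engaged: datetime
    fix_deployed: datetime
    root_cause: str  # the technical cause; no person is named
    went_well: list[str] = field(default_factory=list)
    went_badly: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    lessons: list[str] = field(default_factory=list)

    def time_to_engage(self) -> timedelta:
        """Time from the alert firing to the on-call engineer engaging."""
        return self.oncall_engaged - self.alert_fired

    def time_to_fix(self) -> timedelta:
        """Time from the alert firing to the fix being deployed."""
        return self.fix_deployed - self.alert_fired

    def missing_sections(self) -> list[str]:
        """List the required template sections that are still empty."""
        required = {
            "summary": self.summary,
            "root_cause": self.root_cause,
            "went_well": self.went_well,
            "went_badly": self.went_badly,
            "action_items": self.action_items,
        }
        return [name for name, value in required.items() if not value]
```

A review meeting can then start from `missing_sections()`: a post-mortem with no action items is not finished, however good the timeline is.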
Wrapping up
The pipeline, the observability, the incident response, the post-mortem: get all four right and the system is reliable. The discipline is the same as for any production system: fail safely, learn quickly, and improve the system after every incident.
That is the working approach I use on Acumatica projects. The same patterns show up whether you are in Nairobi, Johannesburg, Kigali, Lusaka or Harare — and they are the things that keep work moving when an upgrade lands at 6 PM on a Friday. If you are stuck on something specific, reach out or keep reading through the rest of the Acumatica blog.
Independent software engineer in Nairobi specialising in Acumatica customisations, Laravel backends, and tax fiscalisation integrations across East and Southern Africa.