dbt Fundamentals — A Field Guide

Data engineering is the work that turns raw data into decisions. The pipeline from "we have data" to "we have a model that runs in production" looks much the same in every industry: ingest, store, transform, serve, monitor. This guide collects the field-tested patterns for dbt and the tooling around it in a production data context.

The data pipeline

The canonical data pipeline has five stages:

Ingest. Pull from source (database, API, file, stream).
Store. Land in a warehouse (Snowflake, BigQuery, Postgres) or a lake (S3, GCS).
Transform. Clean, join, aggregate (dbt, Spark), usually scheduled by an orchestrator such as Airflow.
Serve. Expose to consumers (BI tool, API, application).
Monitor. Quality, freshness, lineage.
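To make the five stages concrete, here is a toy end-to-end pipeline using only the Python standard library. It is a sketch, not a production design: ingest parses a CSV extract, store lands it in an in-memory SQLite database standing in for the warehouse, transform builds a modelled table with SQL, serve returns what a BI tool would read, and monitor runs a volume check. The orders data, the raw_orders and daily_revenue tables and all function names are hypothetical.

PYTHON · PIPELINE SKETCH

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this comes from a database, API, file or stream.
RAW_CSV = """order_id,order_date,amount
1,2024-03-01,120.00
2,2024-03-01,80.50
3,2024-03-02,42.25
"""


def ingest(raw: str) -> list[dict]:
    """Ingest: parse the source extract into records."""
    return list(csv.DictReader(io.StringIO(raw)))


def store(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Store: land the raw records unchanged in the 'warehouse'."""
    conn.execute("create table raw_orders (order_id integer, order_date text, amount real)")
    # Named placeholders bind straight from the dicts produced by ingest().
    conn.executemany(
        "insert into raw_orders values (:order_id, :order_date, :amount)", rows
    )


def transform(conn: sqlite3.Connection) -> None:
    """Transform: build a modelled table from the raw layer."""
    conn.execute(
        """
        create table daily_revenue as
        select order_date, count(*) as orders, sum(amount) as revenue
        from raw_orders
        group by order_date
        """
    )


def serve(conn: sqlite3.Connection) -> list[tuple]:
    """Serve: the rows a BI tool, API or application would read."""
    return conn.execute(
        "select order_date, orders, revenue from daily_revenue order by order_date"
    ).fetchall()


def monitor(conn: sqlite3.Connection, expected_min_rows: int) -> bool:
    """Monitor: a simple volume check on the raw layer."""
    (count,) = conn.execute("select count(*) from raw_orders").fetchone()
    return count >= expected_min_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, ingest(RAW_CSV))
    transform(conn)
    # Fail safely: refuse to serve if the volume check fails.
    if not monitor(conn, expected_min_rows=1):
        raise RuntimeError("volume check failed: raw_orders is unexpectedly small")
    print(serve(conn))  # [('2024-03-01', 2, 200.5), ('2024-03-02', 1, 42.25)]
```

The stages are separate functions on purpose: each one can be tested, retried and monitored on its own, which is what "treat the pipeline as a product" means in practice.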

The pipeline is the product

A data pipeline that nobody can trust is useless. A data pipeline that runs but nobody knows what it does is a liability. Treat the pipeline as a product: documented, tested, monitored, owned.

dbt for analytics engineering

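dbt is the de facto standard for SQL-based data transformation. The core patterns are layered models wired together with ref(), a materialization declared in config(), and CTEs that read top to bottom. Here is a finance mart that ages open AR invoices by customer: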

SQL · DBT MODEL

-- models/marts/finance/ar_aging.sql
-- Materialized as a table, not incrementally: balances change as invoices are
-- paid and days_overdue depends on current_date, so an incremental append would
-- duplicate customers and leave stale aggregates. Rebuild it in full each run.
{{ config(materialized="table") }}

with ar as (
    select * from {{ ref("stg_ar_invoices") }}
    where docdate <= current_date
),

customers as (
    select * from {{ ref("dim_customer") }}
)

select
    ar.customer_id,
    c.acctname as customer_name,
    ar.status,
    sum(ar.curyorigdocamt) as total_amount,
    sum(ar.curydocbal) as open_balance,
    -- the oldest due date is the earliest one, so min() rather than max()
    min(ar.duedate) as oldest_due_date,
    -- dbt's cross-database datediff macro (dbt-core 1.2+) keeps this portable
    {{ dbt.datediff("min(ar.duedate)", "current_date", "day") }} as days_overdue
from ar
left join customers c on ar.customer_id = c.baccountid
-- single quotes: double quotes denote identifiers, not strings, in most warehouses
where ar.status = 'O'
group by ar.customer_id, c.acctname, ar.status
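dbt's incremental materialization works on a high-water mark. On the first run dbt builds the whole table; on later runs is_incremental() returns true and a filter such as _loaded_at > (select max(_loaded_at) from {{ this }}) limits the build to rows loaded since the previous run. It fits append-style models, such as an invoice staging or fact table, best. The sketch below mimics that filter in plain Python with sqlite3 so the mechanics are visible; it is an illustration of the idea, not how dbt is implemented, and the raw_invoices and fct_invoices tables and the build_incremental function are hypothetical.

PYTHON · INCREMENTAL LOAD SKETCH

```python
import sqlite3


def build_incremental(conn: sqlite3.Connection) -> int:
    """Load raw_invoices into fct_invoices using a _loaded_at high-water mark.

    Full build when the target does not exist yet (is_incremental() false),
    filtered append afterwards (is_incremental() true). Returns rows added.
    """
    exists = conn.execute(
        "select count(*) from sqlite_master where type = 'table' and name = 'fct_invoices'"
    ).fetchone()[0]
    if not exists:
        # First run: build the target from every source row.
        conn.execute("create table fct_invoices as select * from raw_invoices")
        return conn.execute("select count(*) from fct_invoices").fetchone()[0]
    # Later runs: only rows loaded after the newest row already in the target.
    cur = conn.execute(
        """
        insert into fct_invoices
        select * from raw_invoices
        where _loaded_at > (select max(_loaded_at) from fct_invoices)
        """
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table raw_invoices (refnbr text, amount real, _loaded_at text)")
    conn.executemany(
        "insert into raw_invoices values (?, ?, ?)",
        [("INV001", 100.0, "2024-01-01T08:00"), ("INV002", 250.0, "2024-01-01T09:00")],
    )
    print(build_incremental(conn))  # 2: full build
    conn.execute("insert into raw_invoices values ('INV003', 75.0, '2024-01-02T08:00')")
    print(build_incremental(conn))  # 1: only the newly loaded row
    print(build_incremental(conn))  # 0: nothing new since the last run
```

ISO 8601 timestamps stored as text compare correctly as strings, which is why the filter works here; in a warehouse you would use a proper timestamp column.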

For the broader data engineering patterns, see the data warehouse feed guide.

ML ops in production

The lifecycle of an ML model in production:

Train. Offline training on historical data.
Evaluate. Test on a held-out set; track metrics.
Deploy. Serve the model (REST, batch, embedded).
Monitor. Track drift, accuracy, latency.
Retrain. Trigger retraining when drift exceeds a set threshold.
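The retrain step needs a drift number to compare against the threshold. One common choice, not the only one, is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against the training data. A minimal NumPy sketch follows; the ten quantile bins and the 0.2 retrain threshold are widely used rules of thumb assumed here, and needs_retraining is a hypothetical helper.

PYTHON · DRIFT CHECK

```python
import numpy as np


def psi(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` (production) against `expected` (training)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges at the training data's quantiles, so each bin holds roughly the
    # same share of training rows; np.unique drops duplicate edges.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    # Clip production values into the training range so outliers land in the end bins.
    actual = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins so the log and the division stay finite.
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


def needs_retraining(train_feature, live_feature, threshold: float = 0.2) -> bool:
    """Hypothetical retrain trigger: PSI above the threshold counts as significant drift."""
    return psi(train_feature, live_feature) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, 10_000)
    print(needs_retraining(train, rng.normal(0.0, 1.0, 10_000)))  # False: same distribution
    print(needs_retraining(train, rng.normal(1.0, 1.0, 10_000)))  # True: mean shifted by 1 sd
```

In production you would run this per feature on a schedule and alert or retrain when any feature crosses the threshold.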

PYTHON · MODEL SERVING

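from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI()
# Load the model once at startup rather than per request. Only load pickle
# files you trust: unpickling can execute arbitrary code.
model = joblib.load("model.pkl")

@app.post("/predict")
def predict(features: list[float]):
    # FastAPI reads a list parameter as the JSON request body, e.g. [5.1, 3.5, 1.4, 0.2];
    # the values must be in the same order as the features used in training.
    X = np.array(features).reshape(1, -1)  # one sample, n features
    prediction = model.predict(X)
    # predict_proba is only available on classifiers that implement it.
    probability = model.predict_proba(X).max()
    # int() assumes the model was trained with integer class labels.
    return {"prediction": int(prediction[0]), "confidence": float(probability)}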

Vector databases and embeddings

Vector databases store embeddings, dense numeric vectors that represent text, images or audio, and retrieve the stored items closest to a query vector. The use cases:

Semantic search. "Find me the docs that are about X", even when the docs never use X as a keyword.
Recommendation. "Find me the items similar to this one."
RAG. Retrieve relevant passages to ground a language model's answer, as discussed in the AI agents guide.
Clustering. Group similar items for analysis.
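Semantic search and recommendation both come down to nearest-neighbour search over embeddings. The brute-force NumPy sketch below ranks stored vectors by cosine similarity to a query; the three-dimensional vectors and document strings are made up for illustration, and real vector databases typically use approximate nearest-neighbour indexes (HNSW is a common one) instead of scanning every row.

PYTHON · COSINE SEARCH

```python
import numpy as np


def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> list[int]:
    """Return the row indices of `vectors` most similar to `query`, best first."""
    # Normalise to unit length so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # argsort is ascending; reverse it for descending similarity and keep k.
    return np.argsort(scores)[::-1][:k].tolist()


if __name__ == "__main__":
    docs = ["invoice overdue", "payment received", "customer onboarding"]
    # Made-up 3-d "embeddings"; all-MiniLM-L6-v2, for example, returns 384-d vectors.
    vecs = np.array([[0.9, 0.1, 0.0], [0.7, 0.6, 0.1], [0.0, 0.2, 0.9]])
    query = np.array([1.0, 0.0, 0.0])
    print([docs[i] for i in top_k_cosine(query, vecs, k=2)])
    # ['invoice overdue', 'payment received']
```

Cosine similarity ignores vector length, so a long and a short document about the same topic score alike; that is usually what you want for text embeddings.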

PYTHON · VECTOR SEARCH

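import os

from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional embeddings
# Current Pinecone client style; the index must have been created with dimension 384.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("my-index")

def search(query: str, k: int = 5) -> list[str]:
    embedding = model.encode(query).tolist()
    # query() expects keyword arguments in the current client.
    results = index.query(vector=embedding, top_k=k, include_metadata=True)
    # Assumes each vector was upserted with its source text under metadata["text"].
    return [m["metadata"]["text"] for m in results["matches"]]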

Monitoring the pipeline

The monitoring that matters:

Freshness. Is the data up to date?
Volume. Is the volume what we expect?
Schema. Has the source schema changed?
Quality. Are the values within expected ranges?
Lineage. Where did this number come from?
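The first four checks are easy to express in code. A minimal standard-library sketch, assuming each batch arrives as a list of dicts with a timezone-aware loaded_at timestamp and an amount column; run_checks, the column names and the default thresholds are all hypothetical. Lineage is left out because it is usually captured by tooling, such as the model graph dbt builds from ref(), rather than checked row by row.

PYTHON · DATA CHECKS

```python
from datetime import datetime, timedelta, timezone


def run_checks(
    rows: list[dict],
    expected_columns: set[str],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
    min_rows: int = 1,
    max_rows: int = 1_000_000,
    amount_range: tuple[float, float] = (0.0, 1e9),
) -> dict[str, bool]:
    """Return pass/fail for freshness, volume, schema and quality on one batch."""
    results = {}
    # Freshness: the newest row must be younger than max_age.
    newest = max((r["loaded_at"] for r in rows if "loaded_at" in r), default=None)
    results["freshness"] = newest is not None and now - newest <= max_age
    # Volume: row count inside the expected band.
    results["volume"] = min_rows <= len(rows) <= max_rows
    # Schema: every row has exactly the expected columns.
    results["schema"] = all(set(r) == expected_columns for r in rows)
    # Quality: amounts present and inside the expected range.
    lo, hi = amount_range
    results["quality"] = all(
        r.get("amount") is not None and lo <= r["amount"] <= hi for r in rows
    )
    return results


if __name__ == "__main__":
    now = datetime(2024, 3, 2, 12, 0, tzinfo=timezone.utc)
    batch = [
        {"invoice": "INV001", "amount": 120.0, "loaded_at": now - timedelta(hours=2)},
        {"invoice": "INV002", "amount": -5.0, "loaded_at": now - timedelta(hours=1)},
    ]
    print(run_checks(batch, {"invoice", "amount", "loaded_at"}, now))
    # {'freshness': True, 'volume': True, 'schema': True, 'quality': False}
```

Passing now in explicitly keeps the checks deterministic and testable; a scheduler would pass the current UTC time.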

For the broader monitoring patterns, see the monitoring guide and the Query Store guide.

Wrapping up

Five pieces: the pipeline, the transformation, the model serving, the vector search and the monitoring. Get all five right and the data becomes a product. The discipline is the same as for any production system: fail safely, monitor always, and improve the system after every incident.

That is the working approach I use on Acumatica projects. The same patterns show up whether you are in Nairobi, Johannesburg, Kigali, Lusaka or Harare — and they are the things that keep work moving when an upgrade lands at 6 PM on a Friday. If you are stuck on something specific, reach out or keep reading through the rest of the Acumatica blog.

John Kihiu

Acumatica ERP Developer · Laravel Engineer

Independent software engineer in Nairobi specialising in Acumatica customisations, Laravel backends, and tax fiscalisation integrations across East and Southern Africa.
