MLOps Academy

Data Management in Production

Production data pipelines, feature serving, governance, and data quality at scale

Production Data Pipelines
Running and operating data pipelines in production for training and inference.

Requirements

  • SLAs/SLOs for pipeline completion (e.g., “training data refreshed by 6 a.m.”)
  • Idempotency (re-running a job yields the same outputs without duplicates), plus exactly-once or at-least-once semantics where needed
  • Monitoring: run duration, failure rate, data freshness, row counts
  • Alerting on failures, schema changes, or anomalous volumes
  • Rollback: the ability to re-run from a previous version or revert outputs
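The monitoring and alerting requirements above can be made concrete as a post-run check. This is a minimal sketch under stated assumptions, not a prescribed implementation: `RunStats` and `check_run` are hypothetical names, the output's row count, schema, and write time are assumed to be available from pipeline metadata, and the 24-hour staleness window and 50% volume tolerance (measured against the median of recent runs) are placeholder thresholds to tune per pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import median


@dataclass
class RunStats:
    """Hypothetical summary of one pipeline run's output table."""
    row_count: int
    schema: dict[str, str]   # column name -> type name
    last_updated: datetime   # timezone-aware time the output was written


def check_run(
    stats: RunStats,
    expected_schema: dict[str, str],
    recent_row_counts: list[int],
    now: datetime,
    # Hypothetical thresholds; tune per pipeline and SLO.
    max_staleness: timedelta = timedelta(hours=24),
    max_volume_deviation: float = 0.5,
) -> list[str]:
    """Return alert messages; an empty list means every check passed."""
    alerts: list[str] = []

    # Freshness: the output must have been written within the allowed window.
    age = now - stats.last_updated
    if age > max_staleness:
        alerts.append(f"stale output: last updated {age} ago")

    # Schema: report added, removed, and retyped columns.
    added = stats.schema.keys() - expected_schema.keys()
    removed = expected_schema.keys() - stats.schema.keys()
    retyped = {
        col for col in stats.schema.keys() & expected_schema.keys()
        if stats.schema[col] != expected_schema[col]
    }
    if added:
        alerts.append(f"schema change: added columns {sorted(added)}")
    if removed:
        alerts.append(f"schema change: removed columns {sorted(removed)}")
    if retyped:
        alerts.append(f"schema change: retyped columns {sorted(retyped)}")

    # Volume: compare the row count with the median of recent runs.
    if recent_row_counts:
        baseline = median(recent_row_counts)
        if baseline > 0:
            deviation = abs(stats.row_count - baseline) / baseline
            if deviation > max_volume_deviation:
                alerts.append(
                    f"anomalous volume: {stats.row_count} rows vs baseline {baseline:.0f}"
                )

    return alerts
```

Each returned message could be routed to the team's existing alerting channel, and a non-empty list might also block downstream training from consuming that run.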

Operational Practices

  • Version pipeline code and config; deploy via CI/CD like application code
  • Log lineage (inputs → outputs) for audits and debugging
  • Use the same pipeline code path in dev, staging, and production where possible; environments should differ only in config and scale
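The versioning, lineage, and shared-code-path practices above can be combined in one small record per run. This is a minimal sketch, assuming CI/CD exposes the deployed commit SHA, inputs are small enough to hash in memory, and `transform` is deterministic; `LineageRecord`, `run_pipeline`, `fingerprint`, and the `output_root` config key are hypothetical names for illustration, not an existing library's API. Deriving the output path from a hash of code version, config, and input fingerprints makes re-runs idempotent and keeps environment differences confined to config.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


def fingerprint(data: bytes) -> str:
    """SHA-256 content hash identifying an input or output artifact."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class LineageRecord:
    """Hypothetical lineage entry: which code and config turned which inputs into which outputs."""
    code_version: str        # e.g. the commit SHA that CI/CD deployed
    config: dict             # environment-specific settings only (dev/staging/prod)
    inputs: dict[str, str]   # input URI -> content fingerprint
    outputs: dict[str, str] = field(default_factory=dict)  # output URI -> fingerprint

    def run_id(self) -> str:
        """Same code version, config, and inputs always give the same ID."""
        key = json.dumps(
            {"code": self.code_version, "config": self.config, "inputs": self.inputs},
            sort_keys=True,
        )
        return hashlib.sha256(key.encode()).hexdigest()[:16]

    def to_log_line(self) -> str:
        """One JSON line suitable for appending to a lineage log."""
        return json.dumps({"run_id": self.run_id(), **asdict(self)}, sort_keys=True)


def run_pipeline(
    code_version: str,
    config: dict,
    inputs: dict[str, bytes],
    transform: Callable[[dict[str, bytes]], bytes],
) -> tuple[LineageRecord, dict[str, bytes]]:
    """Run one (assumed deterministic) transform and record lineage.

    Writing the returned outputs to storage is left to the caller.
    """
    record = LineageRecord(
        code_version=code_version,
        config=config,
        inputs={uri: fingerprint(data) for uri, data in inputs.items()},
    )
    # The output location is derived from the run ID, so a re-run with identical
    # code, config, and inputs targets the same path (overwrite, not duplicate).
    output_uri = f"{config['output_root']}/run={record.run_id()}/part-0"
    result = transform(inputs)
    record.outputs[output_uri] = fingerprint(result)
    return record, {output_uri: result}
```

In this sketch, a retry with the same commit, config, and inputs lands on the same run ID and output path, while a changed input, config value, or commit SHA produces a new one. Rolling back could then mean re-running with an earlier commit SHA, and appending each `to_log_line()` result to a log gives the inputs → outputs trail for audits and debugging.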