This pipeline ingests from 6 source systems, applies dbt transformations to a central data warehouse, and publishes 12 Power BI reports on schedule — eliminating 40 hours per week of manual analyst effort and reducing report turnaround from 3 days to under 1 hour.
| Source | Type | Ingestion method | Cadence |
|---|---|---|---|
| Finance ERP | PostgreSQL | Full + incremental CDC | Hourly |
| HR System (HRIS) | REST API (JSON) | Incremental by updated_at | Daily 02:00 |
| CRM | Salesforce API | Bulk API 2.0 delta | Daily 03:00 |
| Field Operations | CSV over SFTP | File watch + parse | On arrival |
| M&E Platform | MySQL | Incremental by record_id | Every 4 hours |
| Budget System | Excel (SharePoint) | Graph API file download | Daily 06:00 |
The transformation layer uses dbt Core running against BigQuery. Models are organised in three layers:
models/
staging/ # 1:1 source cleaning, type casting, renaming
stg_finance/
stg_hr/
stg_crm/
stg_field_ops/
intermediate/ # business logic joins
int_beneficiary_programme.sql
int_expenditure_actuals.sql
marts/ # final fact + dimension tables for BI
fct_disbursements.sql
fct_programme_outcomes.sql
dim_beneficiary.sql
dim_geography.sql
All models have schema tests (not_null, unique, accepted_values) and row-count change tests. A failing test blocks the DAG downstream tasks and fires a Slack alert.
Apache Airflow 2.8 orchestrates the pipeline with the following DAG structure:
| Report | Audience | Refresh cadence |
|---|---|---|
| Executive Dashboard | CEO, Country Director | Hourly |
| Programme Outcomes Scorecard | Programme Managers | Daily |
| Finance vs Budget | Finance Team | Daily |
| Donor Disbursement Tracker | Finance, Donors | Daily |
| Beneficiary Reach Report | M&E Officers | Daily |
| Field Operations Summary | Operations Manager | On arrival |
| HR Headcount & Attrition | HR Manager | Daily |
| Compliance & Audit Log | Compliance Officer | Weekly |
The sandbox page has been completed as a deployment-ready browser simulation. It generates demo source records, optionally injects realistic data quality issues, runs staging and mart transformations, displays dbt-style test results, charts value by programme, explains impact, lists practical use cases and exports both source and mart CSV files.
| Simulator component | Purpose | Portfolio value |
|---|---|---|
| Demo data generator | Creates safe multi-source data without exposing client information. | Demonstrates data engineering logic while respecting confidentiality. |
| Quality gates | Checks duplicate IDs, missing budgets, invalid programmes, over-budget records and date validity. | Shows how poor-quality records are quarantined before reporting. |
| Reporting mart | Builds fct_pipeline_reporting with utilisation, cost-per-case and clean reporting fields. | Connects ETL work directly to Power BI-ready business outputs. |
| Impact panel | Estimates time saved, turnaround improvement and mart quality score. | Links technical automation to measurable business outcomes. |
Packaged sample files are available under data/synthetic/etl_pipeline_synthetic_source.csv and data/synthetic/etl_pipeline_reporting_mart.csv.
Pipeline health is monitored via: