Methodology

A step-by-step walkthrough of the analytical and modelling approach.

01
Problem Framing & Label Definition
Churn was defined as account cancellation or 60+ days of zero usage within a 30-day forward window. Prediction horizon: 30 days before the event. Class imbalance (92% non-churn / 8% churn) was addressed using SMOTE oversampling and class_weight adjustment.

Evaluation metric chosen: AUC-ROC (prioritised over accuracy due to class imbalance). Secondary metrics: Precision@K for top-N ranked customers.
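
As a minimal sketch of the secondary metric, Precision@K can be computed by ranking customers by predicted score and checking how many of the top K actually churned. The function name and the toy labels and scores below are illustrative assumptions, not taken from the project code.

```python
def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored customers who actually churned."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank customer indices by predicted churn score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(y_true[i] for i in top_k) / len(top_k)


# Toy data (illustrative only): 1 = churned, 0 = retained.
y_true = [0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.10, 0.90, 0.20, 0.75, 0.05, 0.60, 0.40, 0.30]

p_at_2 = precision_at_k(y_true, scores, k=2)
p_at_3 = precision_at_k(y_true, scores, k=3)
```

Unlike AUC-ROC, which summarises ranking quality over the whole population, this metric only looks at the head of the list, which is the part a retention team with limited capacity would act on.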
02
Feature Engineering (120+ Features)
Features were generated across five categories:

Behavioural: Call frequency trend (7/14/30d windows), data usage delta, top-up interval regularity.
Account: Tenure, plan type, contract end proximity, number of complaints.
Financial: Average monthly spend, payment delay history, credit score proxy.
Network: Call drop rate, coverage zone, roaming frequency.
Engagement: App login recency, customer service contact frequency.

Rolling window aggregations (7/14/30/90 days) produced the temporal features. A strict time-based split ensured that each feature used only data available before the prediction date, which prevents leakage.
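
One way to keep rolling aggregations leakage-free is to compute every window over a half-open interval that ends strictly before the snapshot date. The sketch below assumes a simple per-customer event list; the function name, feature names, and sample usage log are hypothetical.

```python
from datetime import date, timedelta

WINDOWS = (7, 14, 30, 90)  # window lengths in days, as listed above


def rolling_features(events, snapshot, windows=WINDOWS):
    """Sum usage per trailing window, using only events before `snapshot`.

    `events` is a list of (event_date, value) pairs for one customer.
    """
    features = {}
    for days in windows:
        start = snapshot - timedelta(days=days)
        # Half-open interval [start, snapshot): the snapshot day and anything
        # after it are excluded, so no future information leaks in.
        total = sum(v for d, v in events if start <= d < snapshot)
        features[f"usage_sum_{days}d"] = total
    return features


# Hypothetical usage log (date, MB used) for a single customer.
events = [
    (date(2024, 1, 15), 100),
    (date(2024, 3, 5), 50),
    (date(2024, 3, 25), 20),
    (date(2024, 3, 31), 999),  # on the snapshot date itself: must be ignored
]
feats = rolling_features(events, snapshot=date(2024, 3, 31))
```

The same pattern extends to trend features such as a 7-day versus 30-day delta by subtracting or dividing the window totals.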
03
Model Selection & Hyperparameter Tuning
Four models were benchmarked: Logistic Regression (baseline), Random Forest, LightGBM, and XGBoost. XGBoost achieved the highest AUC (0.893) and was selected for production.

Hyperparameter search used Optuna (Bayesian optimisation, 200 trials):
max_depth: 6, learning_rate: 0.05, n_estimators: 400, subsample: 0.8, colsample_bytree: 0.7, scale_pos_weight: 11.5
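
The scale_pos_weight value is consistent with the common negatives-to-positives heuristic applied to the 92% / 8% class split reported earlier. Whether it was fixed that way or found by the search is not stated, so the derivation below is an assumption; the dictionary simply gathers the tuned values in a form that could be passed to an XGBoost classifier.

```python
# Class balance reported in the problem-framing step: 92% non-churn, 8% churn.
negative_share = 92
positive_share = 8

# Common heuristic for XGBoost's scale_pos_weight: negatives / positives.
scale_pos_weight = negative_share / positive_share  # 11.5

# Tuned values listed above. A dict like this could be unpacked into
# xgboost.XGBClassifier(**params); that call is left out so the snippet
# runs without xgboost installed.
params = {
    "max_depth": 6,
    "learning_rate": 0.05,
    "n_estimators": 400,
    "subsample": 0.8,
    "colsample_bytree": 0.7,
    "scale_pos_weight": scale_pos_weight,
}
```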

Threshold calibration: operating threshold set at 0.38 (optimised for F1 on the validation set, rather than the default 0.5).
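
A threshold of this kind can be found with a simple grid search over candidate cut-offs, keeping the one with the best F1 on held-out data. The sketch below is one plausible way to do it, not the project's actual procedure; the function names, grid step, and toy validation data are assumptions.

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 score when customers with prob >= threshold are flagged as churn."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, probs, step=0.01):
    """Grid-search thresholds in (0, 1) and return (threshold, f1)."""
    n_steps = round(1 / step)
    candidates = [round(i * step, 10) for i in range(1, n_steps)]
    # max() keeps the first candidate on ties, i.e. the lowest threshold.
    return max(
        ((t, f1_at_threshold(y_true, probs, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy validation set (illustrative only): 1 = churned, 0 = retained.
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.80]

threshold, f1 = best_f1_threshold(y_val, p_val)
f1_default = f1_at_threshold(y_val, p_val, 0.5)
```

On imbalanced data the F1-optimal cut-off often sits below 0.5, because lowering the threshold recovers churners who would otherwise be missed at the cost of a few extra false positives.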
04
SHAP Explainability Layer
SHAP (SHapley Additive exPlanations) was integrated to make every prediction interpretable. For each scored customer, a SHAP waterfall chart is generated showing the top 5 features driving their individual churn risk.

This was a business requirement: retention agents needed to understand why a customer was flagged so they could personalise their outreach, rather than simply receive a score.

Global SHAP summary identified: data_usage_30d_delta, complaint_count_90d, and tenure_months as the three most predictive features.
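
Selecting the top 5 drivers for one customer comes down to ranking that customer's SHAP values by absolute magnitude. The sketch below assumes the per-feature values have already been computed (for a tree model, typically with something like shap.TreeExplainer); the numbers are invented, and every feature name other than the three named above is hypothetical.

```python
def top_drivers(shap_values, n=5):
    """Return the n features with the largest absolute SHAP contribution."""
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


# Hypothetical SHAP values for one scored customer. Positive values push the
# prediction towards churn, negative values push it away.
customer_shap = {
    "data_usage_30d_delta": 0.42,
    "complaint_count_90d": 0.31,
    "tenure_months": -0.27,
    "call_drop_rate": 0.08,
    "avg_monthly_spend": -0.05,
    "roaming_frequency": 0.02,
    "app_login_recency": 0.12,
}

drivers = top_drivers(customer_shap)
driver_names = [name for name, _ in drivers]
```

Ranking by absolute value keeps protective factors in view as well as risk factors, which matters when an agent needs to know what is keeping a customer as much as what is pushing them away.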
05
Model Validation & Drift Monitoring
Validation strategy: time-based train/test split (no random shuffle) to prevent leakage. Walk-forward validation across 4 monthly cohorts confirmed stable AUC (0.87–0.91).
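
Walk-forward validation can be sketched as a generator that always trains on months preceding the test cohort. The expanding training window, the function name, and the month labels below are assumptions; the source only states that 4 monthly cohorts were used.

```python
def walk_forward_splits(months, n_test_cohorts=4):
    """Yield (train_months, test_month) pairs with an expanding training window."""
    if n_test_cohorts >= len(months):
        raise ValueError("need at least one training month before the first test cohort")
    first_test = len(months) - n_test_cohorts
    for i in range(first_test, len(months)):
        # Train strictly on months that precede the test cohort.
        yield months[:i], months[i]


# Hypothetical chronologically ordered monthly cohorts.
months = [
    "2023-09", "2023-10", "2023-11", "2023-12",
    "2024-01", "2024-02", "2024-03", "2024-04",
]
splits = list(walk_forward_splits(months))
```

Each split would be fitted and scored independently, and the spread of the resulting AUC values indicates how stable the model is over time.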

Production monitoring: PSI (Population Stability Index) is tracked weekly to detect feature drift. A model retrain is triggered when PSI exceeds 0.20 on more than 3 key features. MLflow is used for experiment tracking and the model registry.
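
For reference, PSI compares the binned distribution of a feature at training time (expected) with its current distribution (actual), summing (actual - expected) * ln(actual / expected) over the bins. The sketch below implements that formula and the retrain rule described above; the function names and the epsilon guard for empty bins are assumptions.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are per-bin proportions that each sum to 1.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def should_retrain(psi_by_feature, psi_limit=0.20, max_drifted=3):
    """True when more than `max_drifted` features exceed the PSI limit."""
    drifted = [f for f, v in psi_by_feature.items() if v > psi_limit]
    return len(drifted) > max_drifted


# Illustrative check: identical distributions give zero PSI,
# a strong shift between two bins gives a value above the limit.
stable = psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
shifted = psi([0.5, 0.5], [0.8, 0.2])
```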
06
Deployment & CRM Integration
The model is serialised with joblib and served via a Flask REST endpoint. A nightly batch scoring job (Airflow DAG) scores the entire customer base and pushes the top 2,000 at-risk customers to the CRM (Salesforce) retention queue.
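
The selection step of the nightly job can be sketched as a top-N pick over the scored customer base. The function name and the sample scores are hypothetical, and the push to Salesforce is deliberately left out because the source gives no details of that integration.

```python
import heapq


def build_retention_queue(scores, top_n=2000):
    """Return the top_n (customer_id, score) pairs, highest risk first."""
    return heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])


# Hypothetical nightly scores keyed by customer ID.
scores = {"C001": 0.12, "C002": 0.91, "C003": 0.47, "C004": 0.66}
queue = build_retention_queue(scores, top_n=2)
```

Using heapq.nlargest avoids sorting the whole customer base when only a small head of the ranking is needed.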

A/B test design: 50% of flagged customers were routed to the retention team, with the remainder held out as a control group. Pilot results: 34% reduction in churn among the treated cohort over 90 days.
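
A split like this is often implemented with a deterministic hash of the customer ID, so that a customer stays in the same arm across nightly runs. The sketch below shows that approach together with a relative-reduction calculation. Both are assumptions: the source does not describe the assignment mechanism or say whether the reported reduction is relative or absolute, and the rates used here are made up.

```python
import hashlib


def assign_arm(customer_id, treatment_share=0.5):
    """Deterministically assign a flagged customer to 'treatment' or 'control'."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def relative_churn_reduction(control_rate, treated_rate):
    """Relative drop in churn rate of the treated arm versus control."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (control_rate - treated_rate) / control_rate


# Illustrative rates only, not the pilot's figures.
reduction = relative_churn_reduction(control_rate=0.10, treated_rate=0.08)
arm = assign_arm("C001")
```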
07
Streamlit-to-Flask Portfolio Adaptation
The original Streamlit application was converted into a Flask-ready backend and a fast portfolio sandbox. The Flask layer exposes API routes for training, prediction, sample data preview, and download, while the browser sandbox mirrors the workflow for instant demos without Streamlit cold-boot delays. The adapted methodology preserves the core business flow: train on historical churn, score current customers, prioritise high-risk customers, explain drivers, and export an action queue for CRM or retention teams.