Methodology

The project follows a six-stage HR analytics pipeline, from data intake to business impact.

01
Data Intake & Profiling
Raw HR data is ingested from employee spreadsheets, HRIS exports, and payroll systems. The profile step identifies column types, missingness, duplicate records, and likely PII fields.
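As a rough illustration of the profile step, the sketch below counts missing values per column, guesses a coarse column type, and counts exact duplicate rows. It assumes the data has already been loaded as a list of dictionaries; the function names `profile` and `infer_type` and the sample column names are hypothetical, not taken from the project.

```python
from collections import Counter


def infer_type(values):
    """Return 'numeric' if every non-missing value parses as a float, else 'text'."""
    if not values:
        return "empty"
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            return "text"
    return "numeric"


def profile(rows):
    """Profile a list of dict records (one dict per employee row)."""
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing (an assumption of this sketch).
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "type": infer_type(present),
        }
    # Count rows that repeat an earlier row exactly.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return report, duplicates


sample_rows = [
    {"employee_id": "E1", "department": "Sales", "salary": "52000"},
    {"employee_id": "E2", "department": "", "salary": "61000"},
    {"employee_id": "E1", "department": "Sales", "salary": "52000"},
]
report, duplicates = profile(sample_rows)
```

Exact-match duplicate detection is the simplest choice; near-duplicates (for example, the same employee with differently formatted names) would need a fuzzier comparison.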
02
PII Detection
The pipeline combines header matching with value pattern scanning to classify columns as PII. High-risk fields, including names, emails, phone numbers, and IDs, are tagged for removal or masking, so analysis can proceed without exposing personal identifiers.
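The two detection passes could look something like the following minimal sketch. The header keywords, regular expressions, the 0.8 match threshold, and the function name `detect_pii` are all illustrative assumptions; the project's actual rules are not given in the source.

```python
import re

# Hypothetical header keywords that suggest a PII column.
HEADER_HINTS = {"name", "email", "phone", "ssn", "id"}

# Deliberately loose value patterns; real rules would need tuning.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}


def detect_pii(columns, threshold=0.8):
    """Flag PII columns. `columns` maps each header to a list of string values."""
    flagged = {}
    for header, values in columns.items():
        # Pass 1: header matching on whole tokens ("Full Name" -> {"full", "name"}).
        tokens = set(re.split(r"[\s_]+", header.strip().lower()))
        if tokens & HEADER_HINTS:
            flagged[header] = "header"
            continue
        # Pass 2: value scanning, flagging when most non-empty values match a pattern.
        present = [v for v in values if v]
        for label, pattern in VALUE_PATTERNS.items():
            if not present:
                break
            hits = sum(1 for v in present if pattern.match(v))
            if hits / len(present) >= threshold:
                flagged[header] = f"value:{label}"
                break
    return flagged


flagged = detect_pii({
    "Full Name": ["Ada Lovelace"],
    "contact": ["ada@example.com", "bob@example.org"],
    "department": ["Sales", "HR"],
})
```

Rules this simple produce false positives (a header such as `department_name` contains the token `name`, and long numeric values can resemble phone numbers), so flagged columns are best treated as candidates for review rather than a final classification.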
03
Cleaning & Standardisation
Missing department values are filled with the placeholder "Unknown", salary strings are parsed and converted to numeric bands, and date fields are normalised to a single format before analysis. Outliers are flagged, but not automatically removed, to preserve signal for HR review.
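A minimal sketch of these cleaning rules follows. The band width, the accepted date formats, and the use of the 1.5 x IQR rule for outliers are assumptions made for illustration; the source does not specify them, and all function names are hypothetical.

```python
from datetime import datetime
from statistics import quantiles

# Assumed input formats; day-first is assumed for slash-separated dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def clean_department(value):
    """Replace missing or blank departments with the placeholder 'Unknown'."""
    return value.strip() if value and value.strip() else "Unknown"


def parse_salary(text):
    """Convert strings such as '$52,000' to a float; return None if unparseable."""
    try:
        return float(str(text).replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def salary_band(amount, width=20000):
    """Map a numeric salary to a fixed-width band label (width is an assumption)."""
    if amount is None:
        return "Unknown"
    low = int(amount // width) * width
    return f"{low}-{low + width - 1}"


def normalise_date(text):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def flag_outliers(amounts, k=1.5):
    """Return one boolean per value: True if outside the k * IQR fences.

    Values are only flagged, never dropped, so HR can review them.
    Needs at least two values.
    """
    q1, _, q3 = quantiles(amounts, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [a < low or a > high for a in amounts]


band = salary_band(parse_salary("$52,000"))
hired = normalise_date("31/01/2024")
flags = flag_outliers([40000, 50000, 52000, 55000, 60000, 250000])
```

Returning a parallel list of flags, rather than a filtered list, keeps every record in the dataset and leaves the decision about each outlier to a reviewer.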
04
Exploratory Data Analysis
The cleaned data is summarised visually: bar charts for headcount, donut charts for gender balance, and heatmaps for attrition hotspots. This step surfaces the most material workforce trends before any predictive modelling is attempted.
05
Insight Generation
The pipeline produces operational metrics such as attrition rate, engagement score, training coverage, and tenure distribution. These metrics are surfaced in executive dashboards and action-oriented HR briefs.
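To make a few of these metrics concrete, here is a hedged sketch. The source does not define the formulas, so common conventions are assumed: attrition rate as leavers divided by average headcount over the period, training coverage as the share of employees with at least one completed training, and tenure grouped into fixed buckets. Field names such as `trainings_completed` and `tenure_years` are hypothetical.

```python
from collections import Counter


def attrition_rate(leavers, headcount_start, headcount_end):
    """Leavers in a period divided by average headcount (assumed definition)."""
    average = (headcount_start + headcount_end) / 2
    return leavers / average if average else 0.0


def training_coverage(employees):
    """Share of employees with at least one completed training."""
    if not employees:
        return 0.0
    trained = sum(1 for e in employees if e["trainings_completed"] > 0)
    return trained / len(employees)


def tenure_bucket(years):
    """Assign a tenure in years to an illustrative bucket."""
    if years < 1:
        return "<1y"
    if years < 3:
        return "1-3y"
    if years < 5:
        return "3-5y"
    return "5y+"


def tenure_distribution(employees):
    """Count employees per tenure bucket."""
    return Counter(tenure_bucket(e["tenure_years"]) for e in employees)


staff = [
    {"trainings_completed": 2, "tenure_years": 0.5},
    {"trainings_completed": 0, "tenure_years": 2.0},
    {"trainings_completed": 1, "tenure_years": 4.0},
    {"trainings_completed": 3, "tenure_years": 7.5},
]
rate = attrition_rate(leavers=12, headcount_start=100, headcount_end=92)
coverage = training_coverage(staff)
distribution = tenure_distribution(staff)
```

Whichever definitions are adopted, stating them alongside the dashboard figures helps keep numbers comparable across reporting periods.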
06
Business Impact & Adoption
By delivering trustworthy HR analytics, the project supports retention planning, compensation fairness reviews, and workforce allocation decisions. The result is faster decision-making, fewer manual spreadsheet errors, and a more confident people analytics practice.