The engine follows a practical applied-ML workflow: train from labelled examples, evaluate performance, apply the model to unlabelled data, and return both labels and confidence for human review.
CSV/Excel files are parsed, the text field is selected, labels are normalised to positive, neutral, or negative, and invalid rows are excluded from model training.
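The cleaning step can be sketched as follows. This is a minimal illustration, not the engine's actual code: the alias table `LABEL_ALIASES`, the column names `text` and `label`, and the helper `clean_rows` are all assumptions made for the example, and only CSV input is shown.

```python
import csv
import io

# Hypothetical alias table; the mapping the engine really uses is not specified.
LABEL_ALIASES = {
    "positive": "positive", "pos": "positive",
    "neutral": "neutral", "neu": "neutral",
    "negative": "negative", "neg": "negative",
}

def clean_rows(rows, text_field="text", label_field="label"):
    """Return (text, label) pairs, skipping rows with empty text or unknown labels."""
    cleaned = []
    for row in rows:
        text = (row.get(text_field) or "").strip()
        raw_label = (row.get(label_field) or "").strip().lower()
        label = LABEL_ALIASES.get(raw_label)
        if text and label:
            cleaned.append((text, label))
    return cleaned

# Small in-memory CSV standing in for an uploaded file.
sample = io.StringIO(
    "text,label\n"
    "Great service,Positive\n"
    ",negative\n"
    "It was fine,NEU\n"
    "Slow response,bad\n"
)
training_rows = clean_rows(csv.DictReader(sample))
print(training_rows)
```

Here the row with empty text and the row labelled `bad` are dropped, so only two rows reach training.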
TF-IDF transforms text into weighted word and phrase features. This keeps the model lightweight and fast enough for routine operational scoring.
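To make the weighting concrete, here is a small pure-Python sketch of TF-IDF over words and two-word phrases. It assumes whitespace tokenisation and one common smoothed IDF formula, ln((1 + N) / (1 + df)) + 1; the engine's actual tokeniser, n-gram range, and normalisation are not stated, so treat `ngrams` and `tfidf` as illustrative names only.

```python
import math
from collections import Counter

def ngrams(text, max_n=2):
    """Lowercased unigrams and bigrams from a whitespace-tokenised string."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

def tfidf(docs):
    """Return one {term: weight} dict per document (no length normalisation)."""
    term_lists = [ngrams(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for terms in term_lists for t in set(terms))
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        })
    return vectors

docs = ["delivery was late", "delivery was fast", "very late refund"]
vectors = tfidf(docs)
print(sorted(vectors[1].items(), key=lambda kv: -kv[1]))
```

Terms that occur in fewer documents, such as `fast`, receive a higher weight than terms shared across documents, such as `delivery`.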
A Logistic Regression classifier learns the relationship between text features and sentiment labels. It is trained with balanced class weights, which reduces bias toward whichever sentiment class dominates the training data.
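The text does not say how the balanced weights are computed. A common heuristic, and the one scikit-learn documents for `class_weight="balanced"`, is n_samples / (n_classes * class_count). The sketch below assumes that formula; `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }

# Imbalanced toy label set: 6 positive, 3 neutral, 1 negative.
labels = ["positive"] * 6 + ["neutral"] * 3 + ["negative"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

Under this assumption the single negative example carries six times the weight of each positive example, so errors on the rare class cost more during training.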
The trained model labels unlabelled feedback and appends sentiment, confidence, and class probability fields to the original dataset for export.
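A sketch of how the output fields might be appended to a row. It assumes that confidence is the probability of the predicted class and that columns are named `sentiment`, `confidence`, and `prob_<label>`; neither the definition nor the column names are confirmed by the text, and the probabilities below are made up for illustration.

```python
def score_row(row, probabilities):
    """Return a copy of row with sentiment, confidence, and per-class probabilities."""
    # Predicted label is the class with the highest probability.
    sentiment = max(probabilities, key=probabilities.get)
    scored = dict(row)  # copy, so the original row is left untouched
    scored["sentiment"] = sentiment
    scored["confidence"] = round(probabilities[sentiment], 3)
    for label, p in probabilities.items():
        scored[f"prob_{label}"] = round(p, 3)
    return scored

row = {"id": "17", "text": "Refund took three weeks"}
probs = {"positive": 0.05, "neutral": 0.15, "negative": 0.80}  # illustrative values
scored = score_row(row, probs)
print(scored)
```

Keeping the per-class probabilities next to the label lets a reviewer prioritise rows where the top two classes are close.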
The most important design choice is retraining on the user's own labelled examples. Customer experience, NGO beneficiary feedback, product reviews, and support comments each use different vocabulary. Retraining adapts the model to the organisation's real language instead of relying only on a generic sentiment dictionary.