Default prediction with ZenML
What we built
A simple ML pipeline for loan default prediction. The pipeline carries out the following tasks:
- dataset preparation
- feature engineering
- model training and retraining
- model serving
The submission also includes a GUI that allows users to:
- run batch inference
- view business reports (prediction outcomes)
- monitor for data drift

Why we chose this project
Our goal in participating was to learn ZenML. We began by exploring public datasets and found the [AMEX Default Prediction dataset](https://www.kaggle.com/competitions/amex-default-prediction/data), which contains 13 months of borrower profile data as learning features and the occurrence of a default event as the learning target. We trained a binary classifier on this dataset.

We decided that this use case would be interesting to work on because we have not seen many MLOps use cases shared publicly from an industry like banking, which requires strict compliance and high transparency (during auditing) in dataset preparation, model development, model evaluation (and interpretation), and more.
How we used ZenML to build our pipelines
The first pipeline we created is the model training pipeline. The pipeline contains the following steps:
- create training config
- fetch training data
- fetch validation data
- fetch labels
- feature engineer training data
- feature engineer validation data
- prepare training data
- train an XGBoost model
- create a prediction service
- evaluate the trained model against previous models to find the winner
- deploy the latest best model (MLflow deployer)
We also built two slightly different inference pipelines to support two use cases:
- a batch inference pipeline that answers “How many borrowers will default on their payment next month?”
- a single-customer inference pipeline that predicts whether a given customer ID will fulfill their payment

Both pipelines contain the following steps:
- fetch inference data
- feature engineer the inference data
- load prediction service
- make prediction with deployed model
- store predictions
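The batch use case ultimately reduces to scoring every borrower and counting the predicted defaults. A hedged sketch of the prediction and aggregation logic (loading the deployed MLflow prediction service is ZenML-specific and omitted here; all names are illustrative):

```python
import numpy as np


def predict_defaults(model, X_inference: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a 0/1 default prediction per borrower from the model's probabilities."""
    proba = model.predict_proba(X_inference)[:, 1]  # probability of the default class
    return (proba >= threshold).astype(int)


def count_defaults(predictions: np.ndarray) -> int:
    """Answer "how many borrowers will default next month?" for the batch report."""
    return int(predictions.sum())
```

The single-customer pipeline reuses the same scoring logic but on one row, keyed by customer ID.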
What stack components we used and how we structured our code
We used the MLflow component in this submission. Although we used Evidently for drift detection, we called the library directly rather than through ZenML’s Evidently integration.
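Evidently automates drift reporting across all features, but conceptually, per-feature drift detection compares the live feature distribution against the training-time reference. A two-sample Kolmogorov–Smirnov test (one of the tests Evidently itself applies to numeric features, to our understanding) captures the idea; this sketch is ours, not the submission's code:

```python
import numpy as np
from scipy.stats import ks_2samp


def feature_drifted(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag drift when the current feature distribution differs
    significantly from the training-time reference distribution."""
    statistic, p_value = ks_2samp(reference, current)
    return bool(p_value < alpha)
```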
All the ZenML pipeline code is kept in the app folder, and we followed the code structure commonly used by ZenML users: the pipelines live in a ‘pipelines’ folder, while the steps are kept in a ‘steps’ folder. Here’s an example of how we created the training pipeline from the individual steps:

Any demo or visual artifacts to showcase what we built
Run the launch.sh script in our project to use the GUI in your browser.