Roughly 4 to 5% of their customers are not renewing contracts. As each contract has a lot of value, this loss is huge. They want to predict in advance who is likely to churn and try to prevent it.
The business needed above 40% precision for a certain recall. Their internal data science team was not able to get any measurable precision (<2%). Despite extremely qualified data scientists using very powerful models (variants of Gradient Boosting Machines), the improvement was minimal.
In most champion, challenger scenarios like this (where we need to beat an existing model hands-down), we have a rule of thumb.
If the data is structured, don’t worry much about improving the algorithm. Focus on other aspects. Here is what we did:
In 3 months, we moved the precision from 2% to 38% with these systematic experiments and with the same algorithm they had been using…
Even today, the basics of machine learning matter. Thinking about data, splitting it correctly, doing visual observations, systematic experimentations never go out of fashion!