
Feature Selection: Greedy Subset Scorer

February 17, 2021

At Elula, our AI software is set up for rapid development, integration, and production. We’re always undertaking R&D and looking for ways to improve our product, and we built our platform (we call it the “Engine”) with this primary goal in mind. Ours is a robust, modular platform that allows any component to be enhanced without affecting the rest of the pipeline, while rigorous automated testing ensures that a change won’t break the end-to-end solution. This is a must-have for us because we continually experiment with new techniques and ideas that improve and future-proof our products.

A recent enhancement to our Sticky product has been the upgrade of its feature selection algorithm, the “Greedy Subset Scorer.” Once we had developed the idea, we prototyped, evaluated, and productionised it in just weeks, delivering rapid, tangible value to our customers.

Why we’re excited about the Greedy Subset Scorer

Sticky predicts which home loan customers are likely to churn, considering 20,000+ features (attributes that may signal churn) to understand who will churn, and then uses explainable AI to understand why. With this many features available, choosing feature sets manually would lead to significant underperformance.

Feature engineering is a critical challenge faced by data scientists. In big data environments like ours, where the number of potential features can exceed the number of observations, scientifically intelligent feature selection is required. Many packages are available for this task, most of which consider individual feature-target correlations (e.g., SelectKBest and SelectPercentile in scikit-learn’s feature_selection module). These methods are fast (they can run in under five minutes for 10,000 features) but have a major drawback: they fail to capture interactions between multiple features.
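As a minimal sketch of the univariate filter approach mentioned above, the snippet below scores features with scikit-learn’s SelectKBest. The data here is synthetic and purely illustrative; when the target depends on individual features additively, a filter like this finds them quickly.

```python
# Univariate filter feature selection with scikit-learn's SelectKBest.
# Synthetic data: only features 3 and 7 actually drive the target.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))          # 1,000 observations, 50 candidate features
y = (X[:, 3] + X[:, 7] > 0).astype(int)  # target driven additively by features 3 and 7

# Score each feature independently against the target and keep the top 5.
selector = SelectKBest(score_func=f_classif, k=5)
selector.fit(X, y)
print(sorted(selector.get_support(indices=True)))  # the truly informative indices 3 and 7 rank highly
```

Because each feature is scored in isolation, this runs in a fraction of a second even for thousands of features, which is exactly the speed advantage (and, as the next example shows, the blind spot) of such methods.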

For example, when predicting home loan customer churn, it might turn out that customers in Melbourne who make additional repayments on their loans are likely to churn due to some new government program—but that doesn’t apply to customers outside of Victoria. A feature selection package that looked only at individual features, rather than feature sets, would see “Victoria” and “additional repayments” as unimportant features; but an algorithm built to consider feature sets would notice that the combination of “Victoria” and “additional repayments” is a strong churn signal.
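The failure mode described above can be sketched with an XOR-style toy example. The binary stand-ins below (illustrative names only, not Sticky’s real features) are individually uninformative, yet their combination fully determines the target: a univariate score sees nothing, while a model that considers the pair recovers the signal perfectly.

```python
# An interaction that univariate feature scoring cannot see.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
victoria = rng.integers(0, 2, size=2000)   # binary stand-in: customer in Victoria
extra_repay = rng.integers(0, 2, size=2000)  # binary stand-in: makes additional repayments
churn = victoria ^ extra_repay             # churn only when exactly one signal is present (XOR)

X = np.column_stack([victoria, extra_repay])

# Each feature alone is uncorrelated with churn, so both univariate F-scores are tiny.
scores, _ = f_classif(X, churn)
print(scores)

# A model that sees the pair together separates the classes perfectly.
tree = DecisionTreeClassifier(max_depth=2).fit(X, churn)
print(tree.score(X, churn))  # 1.0
```

A feature selector ranking features by the scores above would discard both columns, even though together they predict the target exactly.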

At Elula we’ve developed several proprietary algorithms, including the Greedy Subset Scorer, that capture complex feature interactions while still maximising model explainability (using too many features dilutes the impact of each individual one), training speed (we need results in time to ensure they are useful), and accuracy (using too many features often causes overfitting).

The Greedy Subset Scorer Itself

Elula’s feature selection algorithms capture the most important feature interactions and intelligently select the best (i.e., most predictive) subsets of those features. We first compare thousands of machine learning models, each with randomly selected feature subsets; this enables us to determine both how well each model performed and how important each feature was in each model.

After each experiment batch, we find the best-performing model and advance its features to the next batch. This means every experiment in the subsequent batch consists of both important features and additional random ones. By repeating this for each batch, we grow our list of “top” features iteratively – yielding more meaningful and rapid experiments over time.

For example, suppose that, in predicting home loan customer churn, we randomly select 100 features for our first experiment. We then evaluate both the 100-feature model’s predictive performance and each of the 100 features’ importances within that model. Suppose that the most important features in this case are “resides near a suburb experiencing housing price decline” and “has many recent education-related credit card transactions.” In the second experiment, rather than selecting another 100 features randomly, we keep the “suburb” and “education transaction” features and select an additional 98 features randomly. This preservation of key features, instead of completely random iterative selection, is what makes the Greedy Subset Scorer smart and effective.
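The iterative procedure described above can be sketched in a few dozen lines. This is a simplified illustration only, not Elula’s implementation: the function name, the batch sizes, and the use of a random forest with cross-validation as the scoring model are all assumptions chosen to keep the demo self-contained.

```python
# Illustrative sketch of a greedy subset scorer: each batch evaluates several
# random feature subsets, and the winning subset's most important features are
# preserved and carried into the next batch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def greedy_subset_scorer(X, y, subset_size=10, models_per_batch=5,
                         n_batches=3, n_keep=3, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    kept = []  # features preserved from batch to batch
    for _ in range(n_batches):
        best_score, best_subset, best_model = -np.inf, None, None
        for _ in range(models_per_batch):
            # Fill the subset with the kept features plus fresh random ones.
            pool = [f for f in range(n_features) if f not in kept]
            extra = rng.choice(pool, size=subset_size - len(kept), replace=False)
            subset = kept + list(extra)
            model = RandomForestClassifier(n_estimators=50, random_state=0)
            score = cross_val_score(model, X[:, subset], y, cv=3).mean()
            if score > best_score:
                best_score, best_subset = score, subset
                best_model = model.fit(X[:, subset], y)
        # Advance the winning model's most important features to the next batch.
        order = np.argsort(best_model.feature_importances_)[::-1]
        kept = [best_subset[i] for i in order[:n_keep]]
    return kept, best_score

# Synthetic demo: only features 0-2 actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 30))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)
top, score = greedy_subset_scorer(X, y)
print(top, round(score, 3))
```

At production scale the same idea applies with far larger subsets (e.g., 100 of 20,000+ features) and many more models per batch; the key design choice is that only the winning subset’s top features survive, so the “kept” list converges toward a small, strongly interacting core.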

Our robust back-testing framework enables simple comparison between the Greedy Subset Scorer and our earlier feature selection algorithms: over the last 12 months, the Greedy Subset Scorer consistently outperformed the others. Once we verified its superiority, embedding and deploying the Greedy Subset Scorer was easy to do within our modular platform; we only had to change the feature selection module, and nothing else needed reworking. Our customers benefited immediately when we upgraded to the Greedy Subset Scorer.

The journey to improve Sticky doesn’t stop here. Look out for our upcoming articles on Feature Factory (our feature store), DQC (our data observability tool), and an illustrated guide to our unique AI tech stack that enables the modular, rapid-development platform mentioned earlier.