Through the ages, engineers have demonstrated that Mother Nature holds the solution to some of humankind’s most challenging problems. The natural world is full of incredible feats of engineering honed to near perfection through evolution by natural selection.
Engineers have mimicked the shape of the whale fin and applied it to wind turbines; aeronautical engineers at Airbus have taken inspiration from the tips of eagles’ wings; swimming costumes mimic shark skin’s overlapping scales (dermal denticles); and the echolocation power of bats was replicated in military sonar technology to locate objects.
Today, almost everything we do is captured digitally. This has driven exponential growth in the variety of data being captured by enterprises, and companies are keen to use this expanse of data to drive AI solutions to business problems. However, the amount of data, while a blessing, can also be a curse when it comes to deploying production-ready AI.
We know that the sheer quantity of inputs, or features, used in machine learning models can quickly become unmanageable. At Elula we regularly use datasets that involve a thousand or more inputs, and find ourselves with two problems: (1) How do we select which inputs, or features, are relevant? And (2) How do we combine and modify inputs to create relevant new features?
Typically, a data scientist (or team of data scientists) will spend days, weeks and sometimes months manually and painstakingly testing the importance of each feature, using their own judgement and experience to (hopefully) build new features, and then manually running models to evaluate the improvement each new feature may bring.
Not only is this manual process time-consuming, it is also prone to error, not repeatable, and reliant on a data scientist having deep knowledge of the business domain.
Evolutionary algorithms have seen widespread success in many areas of science, and a process known as evolutionary feature engineering has been documented in the literature from a variety of perspectives, owing to its complexity. However, attempts to apply this process to real business problems have not been seen, at least not at scale, until now.
At Elula, one of the methods we are trialling to solve this common problem is evolutionary feature engineering, which we use in our customer engagement and retention product, Sticky (check out Sticky in this video link).
Our proprietary algorithm is simple and intuitive in comparison to the algorithms in the literature and, most importantly, can scale to the degree we require.
So, what is evolutionary feature engineering?
We start with a baseline model trained on all features. Once the baseline model is trained and we know how important each feature is (now possible even for traditionally uninterpretable models, such as a deep neural net), we create a feature ecosystem. We assign each feature a gender: female or male. We then form pairs of features, pairing the best female features with the best male features. This process continues until each feature is paired up.
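To make the pairing step concrete, here is a minimal sketch in Python. It is not Elula's proprietary algorithm: the function name pair_features and the example importance scores are hypothetical, and because the description above does not say how genders are assigned, the sketch simply assumes they alternate down the importance ranking.

```python
from typing import Dict, List, Tuple


def pair_features(importance: Dict[str, float]) -> List[Tuple[str, str]]:
    """Pair the best 'female' feature with the best 'male' feature, and so on."""
    # Rank features from most to least important.
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    # Assumption: genders alternate down the ranking. The article does not
    # specify how genders are assigned, so this is only one possible choice.
    females = ranked[0::2]
    males = ranked[1::2]
    # zip stops at the shorter list, so with an odd number of features the
    # lowest-ranked feature is left without a partner.
    return list(zip(females, males))


# Hypothetical importance scores from a baseline model.
importance = {
    "balance": 0.40,
    "tenure": 0.25,
    "age": 0.20,
    "logins": 0.10,
    "postcode": 0.05,
}
pairs = pair_features(importance)
print(pairs)
```

With an odd number of features, as here, this sketch leaves the weakest feature unpaired; a real implementation would need its own rule for that case.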
Once features are paired, we produce offspring. The more important a feature is, the more offspring it will have. Offspring are produced by combining and transforming the parents. For example, an offspring can be the sum of the two parents, the logarithm of one parent, or one parent divided by the other. More important features will pass on more of their information, while less important features will neither pass on their information nor survive into the next generation.
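As an illustration of how offspring might be produced, the sketch below applies the three transformations mentioned above (sum, ratio and logarithm) to a pair of parent columns. The names candidate_offspring and breed are hypothetical, the rule that offspring count scales linearly with a pair importance score between 0 and 1 is an assumption, and the guards against zero denominators and non-positive logarithms were added only so the sketch always runs.

```python
import math
from typing import Dict, List, Tuple

Column = List[float]


def candidate_offspring(
    name_a: str, a: Column, name_b: str, b: Column
) -> List[Tuple[str, Column]]:
    """Build the three example offspring named in the text."""
    return [
        # Sum of the two parents.
        (f"{name_a}+{name_b}", [x + y for x, y in zip(a, b)]),
        # One parent divided by the other (assumed guard: 0.0 when dividing by zero).
        (f"{name_a}/{name_b}", [x / y if y != 0 else 0.0 for x, y in zip(a, b)]),
        # Logarithm of one parent (assumed guard: 0.0 for non-positive values).
        (f"log({name_a})", [math.log(x) if x > 0 else 0.0 for x in a]),
    ]


def breed(
    name_a: str,
    a: Column,
    name_b: str,
    b: Column,
    pair_importance: float,
    max_offspring: int = 3,
) -> Dict[str, Column]:
    """Return more offspring for more important pairs."""
    candidates = candidate_offspring(name_a, a, name_b, b)
    # Assumption: offspring count grows linearly with importance in [0, 1].
    n_children = min(len(candidates), round(pair_importance * max_offspring))
    return dict(candidates[:n_children])


children = breed("balance", [1.0, 2.0], "tenure", [2.0, 4.0], pair_importance=0.9)
for name, values in children.items():
    print(name, values)
```

A pair with low importance produces no children at all in this sketch, which mirrors the idea that weak features do not pass on their information.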
Just like the evolution of species and natural selection, through this process, the strongest features live and the weakest die.
The evolutionary feature engineering algorithm enables Elula to automatically combine important features and test many combinations, to arrive at even more useful features. This process also allows us to control the number of features we want. For example, we can automatically turn a 3,000-feature dataset into a 150-feature dataset and do so intelligently.
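Controlling the number of features can be as simple as keeping only the top-ranked survivors in each generation. The following sketch assumes that every feature, parent or child, already has an importance score from the model; select_survivors and the example scores are hypothetical names and values chosen for illustration.

```python
from typing import Dict, List


def select_survivors(importance: Dict[str, float], n_keep: int) -> List[str]:
    """Keep the n_keep most important features; the rest do not survive."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return ranked[:n_keep]


# Hypothetical scores for a mix of original features and offspring.
scores = {
    "balance": 0.30,
    "balance/tenure": 0.35,
    "tenure": 0.15,
    "log(balance)": 0.12,
    "postcode": 0.08,
}
survivors = select_survivors(scores, n_keep=3)
print(survivors)
```

The same call with a larger dataset and n_keep=150 would correspond to the 3,000-to-150 reduction mentioned above.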
The algorithm can easily run hundreds of iterations, so that by the end of the process hundreds of generations of features have been produced. However, we keep only the children that were produced by important features and found to be important themselves.
We run hundreds of iterations, and at each iteration we keep track of how the model performs to check if it has plateaued or arrived at an optimal feature set.
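One simple way to check for a plateau is to ask whether recent generations have improved on the best earlier score by a meaningful margin. The sketch below assumes a higher-is-better validation score recorded once per generation; has_plateaued, patience and min_delta are hypothetical names, and the thresholds are illustrative rather than values we use in production.

```python
from typing import Sequence


def has_plateaued(
    scores: Sequence[float], patience: int = 3, min_delta: float = 1e-3
) -> bool:
    """True if the last `patience` generations failed to beat the earlier
    best score by at least `min_delta`."""
    if len(scores) <= patience:
        # Not enough history yet to judge.
        return False
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent < best_before + min_delta


# Hypothetical per-generation validation scores.
history = [0.70, 0.74, 0.76, 0.7602, 0.7601, 0.7603]
print(has_plateaued(history))
```

In a generational loop, this check would run after each iteration, and the loop would stop as soon as it returns True.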
Like everything, there are certainly some drawbacks. For example, if two features are found to be unimportant on their own, but one of their children is important, that child will never exist.
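A small worked example shows how this can happen. Here the target is the product of two features, so each feature alone tells us nothing about the target while their combination predicts it perfectly. Pearson correlation is used purely as a stand-in for feature importance (the importance measure itself is not specified here), and the data and names are made up for illustration.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length, non-constant columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Two made-up parent features and a target equal to their product.
feature_a = [-1.0, -1.0, 1.0, 1.0]
feature_b = [-1.0, 1.0, -1.0, 1.0]
target = [a * b for a, b in zip(feature_a, feature_b)]

# The child that would only exist if the two parents were allowed to breed.
child = [a * b for a, b in zip(feature_a, feature_b)]

print("parent a vs target:", pearson(feature_a, target))
print("parent b vs target:", pearson(feature_b, target))
print("child vs target:   ", pearson(child, target))
```

Both parents score zero and would be discarded before breeding, so the perfectly predictive child is never created.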
However, Elula’s evolutionary feature engineering algorithm can automate and accelerate the process of feature engineering to uncover some important child features which would have otherwise gone unnoticed. This process also frees our data scientists to spend their time in other areas.
The questions for data scientists are: what answers does Mother Nature hold for artificial intelligence? Can theories like evolutionary feature engineering help us answer some of our most complex challenges? What will Darwin’s influence be on modern thought?
“To kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact.” – Charles Darwin