TensorFlow 2.0 — Feature Engineering Complexity continues

Sandeep Srinivasan
Oct 8 · 4 min read

Very low levels of abstraction can inhibit and delay innovation

The first time I looked at TensorFlow, it reminded me of assembly language or perhaps Fortran!
Even though these languages are powerful, their low abstraction level makes them extremely tedious to program in, even for small tasks. This lack of abstraction can pose many challenges and barriers to rapid innovation.

The level of abstraction matters

If researchers and developers today had to write FFT (Fast Fourier Transform) algorithms in Fortran instead of using a software package such as Matlab, Mathematica or Octave, DSP chips (digital signal processors) would not be able to translate even simple words, let alone paragraphs and music. In addition, the DSP chip that is present in every mobile device (plus Alexa, Google Home etc.) would perhaps cost $5000 and not $1.50.

One of the key elements that allowed rapid innovation in the area of DSPs over the last two decades was the level of abstraction that Matlab, Mathematica and similar software packages provided for performing complex computations such as an FFT with a single line of code:

Y = fft(X)
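
To make the abstraction gap concrete, here is a minimal Python sketch of roughly what a one-line call like fft(X) hides. It is a textbook recursive radix-2 Cooley-Tukey FFT, not the implementation used by Matlab or any other package. The names fft_radix2 and dft_naive, the sample signal and the power-of-two length restriction are assumptions made for illustration.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # FFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^(-2*pi*i*k/n) combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only to cross-check the fast version."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


# Hypothetical 8-sample signal.
signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
Y = fft_radix2(signal)
```

Even this small version requires the user to know about twiddle factors, recursion and input-length constraints, which is exactly the detail that the one-line call abstracts away.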

TensorFlow 2.0

TensorFlow, while powerful with all the great algorithms embedded, needs to be augmented with additional layers of abstraction to scale…

Even though TensorFlow 2.0 is a significant leap forward in raising the abstraction level using the Keras APIs, it still does not sufficiently reduce the complexity of the Feature Engineering needed to build accurate Machine Learning models. It is thus very hard to use for domain experts and other professionals who are not Machine Learning specialists, which makes one wonder if non-ML experts are simply ‘not the target audience’ for TensorFlow 2.0.

Automatic Feature Engineering vs. TensorFlow 2.0 Linear Model on the Titanic Dataset

The TensorFlow 2.0 documentation describes a linear classifier example on the Titanic dataset. The goal is to build a linear classifier that predicts whether or not a person survived the journey on the Titanic, based on a set of input features. The dataset contains categorical and numerical features. To deal with this in TensorFlow 2.0, a lot of code needs to be written just to build a simple linear classifier model. This also requires a deep understanding of Machine Learning models, Feature Engineering and TensorFlow.

Titanic Dataset
Feature Engineering on the Titanic Dataset using TensorFlow 2.0

VERIFAI Machine Learning Platform: Automatic Feature Engineering

VerifAI’s Automatic Feature Engineering is a set of algorithms that transform the input data into a form (numerical vectors) that the Machine Learning Algorithms can understand. Our Automatic Feature Engineering hides the complexity and details of data transformation required before feeding the data into a DNN (Deep Neural Network) or other machine learning models.

For instance, the categorical feature columns in the Titanic dataset such as {‘fare’, ‘sex’, ‘n_siblings_spouses’, ‘class’, ‘deck’, ‘embark_town’ and ‘alone’} are automatically encoded into numerical values a machine learning model can understand, using a feature hasher, factorizer, one-hot encoding or other encoding algorithms. The model’s accuracy depends on how the inputs are encoded and interpreted by the machine learning algorithms, so it is important to map, encode, transform and combine the features accurately before feeding them into an ML model.
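
As a rough illustration of two of the encodings named above, the following Python sketch implements a factorizer and a one-hot encoder by hand. It is not VerifAI’s AutoMapper, whose internals are not described here. The helper names factorize and one_hot and the four sample rows are assumptions for illustration only.

```python
def factorize(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes, mapping = [], {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes, mapping


def one_hot(values):
    """Encode each value as a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [[1 if index[v] == i else 0 for i in range(len(categories))]
               for v in values]
    return vectors, categories


# Hypothetical sample rows; the column names follow the Titanic example above.
sex = ["male", "female", "female", "male"]
passenger_class = ["Third", "First", "Third", "Second"]

sex_codes, sex_mapping = factorize(sex)
class_vectors, class_categories = one_hot(passenger_class)
```

A factorizer yields a single integer column, which implies an ordering between categories, while one-hot encoding avoids that ordering at the cost of one column per category. Choosing between them per feature is the kind of decision an automatic mapper has to make.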

ML engineers spend a lot of time mapping features into usable inputs for models. Our Automatic Feature Engineering (AutoMapper) is an important step towards simplifying & democratizing ML, making it available to all.

Automatic mapping and selection of Features that are usable by a DNN and other ML models
Building a Classifier for the Titanic Dataset using VerifAI Machine Learning Platform — 4 lines of code

The VerifAI Machine Learning Platform allows developers to build Classifiers, Regressors and Reinforcement Learning Algorithms with just a few lines of code or no code at all.

VerifAI AutoMapper Produces Feature Analysis plots
Feature Importance Plot produced by the VerifAI AutoMapper


Optimizing Design Verification using Machine Learning: Doing better than Random

William Hughes, Sandeep Srinivasan, Rohit Suvarna, Maithilee Kulkarni

As integrated circuits have become progressively more complex, constrained random stimulus has become ubiquitous as a means of stimulating a design’s functionality and ensuring it fully meets expectations. In theory, random stimulus allows all possible combinations to be exercised given enough time, but in practice, with highly complex designs, a purely random approach will have difficulty exercising all possible combinations in a timely fashion. As a result, it is often necessary to steer the Design Verification (DV) environment to generate hard-to-hit combinations. The resulting constrained-random approach is powerful but often relies on extensive human expertise to guide the DV environment in order to fully exercise the design. As designs become more complex, the guidance aspect becomes progressively more challenging and time-consuming, often resulting in design schedules in which the verification time to hit all possible design coverage points is the dominant schedule limitation. This paper describes an approach which leverages existing constrained-random DV environment tools but which further enhances them using supervised learning and reinforcement learning techniques. This approach provides better-than-random results in a highly automated fashion, thereby ensuring that the DV objective of full design coverage can be achieved on an accelerated timescale and with fewer resources.
Two hardware verification examples are presented, one of a Cache Controller design and one using the open-source RISCV-Ariane design and Google’s RISCV Random Instruction Generator. We demonstrate that a machine-learning based approach can perform significantly better on functional coverage and reaching complex hard-to-hit states than a random or constrained-random approach.


The entire document is available at: http://arxiv.org/abs/1909.13168


Democratizing Machine Learning

The phrase ‘Democratizing Machine Learning’ is being used by many companies small and large, including us @VerifAI and in this article.


de·moc·ra·tize — democratizing:

— introduce a democratic system or democratic principles to.
“public institutions need to be democratized”

— make (something) accessible to everyone.
“mass production has not democratized fashion”

In the context of this article, the second definition is more fitting than the first.

Every company may have a different notion of what they mean by Democratizing Machine Learning and AI.

To make a figurative analogy: A democratic government may give its people the right to vote, but perhaps the voting ballots are hard to decipher.
Such a country would still be a democracy in principle, but in practice this would be a democracy that doesn’t serve all its constituents well.

For example, a country that educates people as to how to use polling booths or makes the ballots very simple, in addition to easy access, would serve all its constituents well.

Democracy and Machine Learning?

Similar figurative analogies apply to democratizing Machine Learning.
While companies may provide access to ML tools and APIs, they may not provide a level of abstraction or ease of use that makes sense for a developer or a professional who is not a Machine Learning expert.

Truly democratizing machine learning would mean that a wide range of professionals (including, but not limited to, developers, engineers, scientists, marketeers, accountants, radiologists, cardiologists, law enforcement officers, judges, lawyers and other domain experts) could use Machine Learning to improve their productivity. To Democratize Machine Learning, we need to make Machine Learning not only accessible to everyone but also easily usable by everyone.

We can broadly segment Machine Learning techniques into branches called Supervised Learning, Unsupervised Learning and Reinforcement Learning. Democratizing each of these branches of Machine Learning presents its own challenges.

Supervised learning is currently the most widely used form of machine learning across all industry segments, and one of the most common applications of Supervised Learning is Classification.

Let’s take a closer look at what it means to Democratize Machine Learning in the context of building a Classification app.

Democratizing Classification — Supervised Learning

Classification is one of the most widely used Supervised Machine Learning techniques. This includes classifying images, text and other kinds of data. According to a McKinsey report, Classification techniques will have a global economic impact of $2T across multiple industry segments. This number may be hard to fathom, but what it says is that Classification of data (images, text, voice) is being used widely across many industry segments, embedded in many ‘killer apps’ and workflows.

In traditional software (Software 1.0), classification is done by writing an algorithm by hand, for instance to classify an image. These algorithms need to be continuously updated to catch ‘unseen’ or ‘different’ conditions. In contrast, Machine Learning uses example data to produce a ‘classification algorithm’ that can generalize to classify unseen conditions (e.g. new images) quite accurately.

Classification use case: An interesting classification problem in the banking/lending industry is to assess the current risk of an active loan, based on historical data and the consumer’s profile and behavior. Given a user profile and data about the loan terms etc., the Machine Learning model can classify a particular customer’s loan as safe, risky or unsafe.

A comprehensive data-set was published by LendingClub, a peer-to-peer lending company. The dataset contained about a million active loans, and each loan was described in terms of around 52 features (variables). These features include information about the loan, such as the term of the loan, interest rate, loan amount, loan purpose etc. The data also contained features about the customer such as address, annual income, last payment, months since last delinquent payment, total late fees etc.

Democratizing Classification for the bank loan analyst would mean providing the analyst with software that produces the highest accuracy machine-learning model and predicts whether a customer’s loan is safe, risky or unsafe.
Choosing the right features to include in the Machine Learning model is critical to the accuracy of the Classifier.

A good effort towards democratizing Classification would be to automate the following steps: Feature Selection, Feature Mapping and Model Creation.

Automatic Feature Selection algorithms are a key part of democratizing machine learning. For the example LendingClub data, the original number of features was 52 (of which 35 were numerical and 18 were categorical).
The final number of features selected to produce the highest accuracy Classifier was 14. A total of 37 features were excluded (of which 24 were numerical and 13 were categorical).

VerifAI’s algorithms exclude redundant features automatically to produce the highest accuracy Classifier for the LendingClub data. The loan analyst (domain expert) should not have to know which features to exclude or include to achieve a high accuracy model. Software algorithms need to do a good job of Automatic Feature Engineering.
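
One simple way to exclude redundant features is a correlation filter, sketched below in Python. This is a generic technique swapped in for illustration, not VerifAI’s selection algorithm, which is not described here. The helper names pearson and drop_redundant, the 0.95 threshold and the toy values are all assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_redundant(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: installment is almost a rescaled copy of loan_amount.
columns = {
    "loan_amount": [5000, 12000, 8000, 20000, 15000],
    "installment": [160, 385, 255, 640, 480],
    "annual_income": [40000, 38000, 90000, 52000, 61000],
}
selected = drop_redundant(columns)
```

Here installment is dropped because it carries almost the same information as loan_amount. A real selector would also have to handle categorical features and measure each feature’s effect on model accuracy, which this sketch does not attempt.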

Automatic Feature Selection using VerifAI-Machine Learning Platform (VerifAI-MLP)
Automatic Feature Engineering helps Democratize Machine Learning

Automatic Feature Mapping is another key set of techniques that transforms the input user data into a form (numerical values) that the Machine Learning Algorithms can understand. Automatic Feature Mapping hides the complexity and details of data transformation required before feeding the data into a DNN (Deep Neural Network) or other machine learning models.

For instance, the loan_term feature column may have values such as “60 months”, “42 months”, “36 months” etc. These string values are automatically encoded into numerical values a machine learning model can understand, using a feature hasher, factorizer, one-hot encoding or other encoding algorithms. The model’s accuracy depends on how the inputs are encoded and interpreted by the machine learning algorithms, so it is important to map and encode the features accurately before feeding them into an ML model. ML engineers spend a lot of time mapping features into usable inputs for models. Automatic Feature Mapping (AutoMapper) is an important step towards democratizing ML.
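
The loan_term strings above can be mapped in more than one way. The Python sketch below shows a feature hasher (the hashing trick) next to a plain numeric parse. The function names, the choice of MD5 as a stable hash and the bucket count of 8 are assumptions for illustration, not a description of AutoMapper.

```python
import hashlib


def hash_feature(value, n_buckets=8):
    """Map a string to a bucket index using a stable hash (the hashing trick)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hashed_one_hot(value, n_buckets=8):
    """Fixed-width 0/1 vector with a single 1 at the hashed bucket."""
    vec = [0] * n_buckets
    vec[hash_feature(value, n_buckets)] = 1
    return vec


def parse_months(value):
    """Alternative encoding: read the leading integer from a string like '60 months'."""
    return int(value.strip().split()[0])


terms = ["60 months", "42 months", "36 months", "60 months"]
hashed = [hashed_one_hot(t) for t in terms]
numeric = [parse_months(t) for t in terms]
```

The hasher needs no vocabulary and gives a fixed-width vector, but distinct values can collide in the same bucket. The numeric parse keeps the natural ordering of loan terms. Which mapping yields the more accurate model depends on the data, which is why automating the choice is valuable.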

Automatic mapping of Features that are usable by a Deep Neural Network (DNN)

Automatic Model Selection and Creation is the next important step towards democratizing Machine Learning. Model selection is a complex process that ML engineers learn over time with experience. This process consists of choosing the best-fit ML model for a given data-set, and then tuning the hyper-parameters of the chosen model to improve accuracy and reduce over-fitting.

For a Classification problem, there are many models we can use, for instance SVM (Support Vector Machines), Decision Trees, Random Forests, DNN etc.

An Example of Types of Classifiers
Classifier Accuracy on two Datasets

Classifier accuracy can vary significantly across different data-sets, making classifiers highly data-dependent. To make Classifiers robust, we need model-selection algorithms that mitigate this data-dependent variance.

Automatic Model Creation Loop

A fully automatic feature engineering and model creation loop is an essential step towards democratizing ML for Classification problems.
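
A model selection loop of this kind can be sketched in a few dozen lines of Python. The candidates below are deliberately tiny stand-ins (a majority-class baseline, a nearest-centroid classifier and k-nearest-neighbours with two values of k) rather than the SVMs, Random Forests or DNNs mentioned above. The class names, the select_model helper and the toy loan-like data are assumptions for illustration.

```python
from collections import Counter


class MajorityClass:
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predict the class whose mean feature vector is closest."""
    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(row, self.centroids[c]))
                for row in X]


class KNearest:
    """k-nearest-neighbours vote; k is the hyper-parameter being tuned."""
    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            ranked = sorted(range(len(self.X)),
                            key=lambda i: sum((p - q) ** 2 for p, q in zip(row, self.X[i])))
            votes = Counter(self.y[i] for i in ranked[:self.k])
            preds.append(votes.most_common(1)[0][0])
        return preds


def accuracy(model, X, y):
    return sum(p == t for p, t in zip(model.predict(X), y)) / len(y)


def select_model(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate on the training split and score it on the validation split."""
    scores = {name: accuracy(make().fit(X_train, y_train), X_val, y_val)
              for name, make in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-feature data with two well-separated classes.
X_train = [[1, 1], [1.5, 0.8], [0.9, 1.2], [5, 5], [5.5, 4.8], [4.7, 5.2], [1.2, 0.9]]
y_train = ["safe", "safe", "safe", "risky", "risky", "risky", "safe"]
X_val = [[1.1, 1.0], [5.1, 5.1], [4.9, 4.6], [0.8, 1.4]]
y_val = ["safe", "risky", "risky", "safe"]

candidates = {
    "majority": MajorityClass,
    "nearest_centroid": NearestCentroid,
    "knn_k1": lambda: KNearest(1),
    "knn_k3": lambda: KNearest(3),
}
best_name, scores = select_model(candidates, X_train, y_train, X_val, y_val)
```

Scoring on a held-out validation split rather than on the training data is what lets the loop detect over-fitting. A production loop would typically use cross-validation and a much larger search space.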

Democratizing Reinforcement Learning

Reinforcement Learning (RL) is a disruptive technology with significant potential to improve many challenging engineering and business problems. In reinforcement learning a software agent interacts with an environment by taking actions and getting back rewards for these actions, with a goal of maximizing future cumulative rewards.

Sequential Decision Making: Any process where a sequence of decisions needs to be made to maximize (or minimize) a future reward can be modeled as a Reinforcement Learning problem. For example, one or more RL agents could make a sequence of decisions that (a) maximize the click-through rate for a customer on your advertising platform, (b) reduce the cost of your home heating and cooling bill over time, or (c) minimize attacks on your enterprise network.

Reinforcement Learning is based on a Markov Decision Process (MDP) model, where the action taken by the agent in an environment impacts the next state of the environment and the reward it generates. The outcomes are partly random and partly controlled by the agent taking the actions.
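
The agent-environment loop of an MDP can be sketched with tabular Q-learning on a toy problem. The five-state chain environment, the reward of 1.0 at the last state and all hyper-parameter values below are assumptions chosen for illustration. The chain is deterministic for simplicity, whereas a general MDP may have random transitions.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the rewarding terminal state
ACTIONS = (-1, +1)    # move left or right along the chain


def step(state, action):
    """Environment: return (next_state, reward, done) for the chosen action."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])  # exploit: best known action, ties broken randomly
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Target is the reward plus the discounted value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])]
          for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right from every state, because the discounted value of reaching the reward sooner is higher. The same update rule underlies larger systems where the table is replaced by a neural network.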

The multi-armed bandit: RL uses the notion of exploration and exploitation to learn an optimal policy that maximizes future reward. A good example of the tradeoff RL makes at each iteration is the multi-armed bandit problem, in which each arm of the bandit (each arm pulls a slot-machine lever) provides rewards according to some unknown probability distribution. In that case, exploration is trying many bandit arms, while exploitation is repeatedly operating the arm that has paid best so far.

By trading off exploration against exploitation, an RL agent is able to sample a very large state space and make decisions that maximize future reward better than any ‘directed random’ technique.
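
The exploration versus exploitation tradeoff can be sketched with an epsilon-greedy agent on a Bernoulli bandit. The payout probabilities, the epsilon value of 0.1 and the function name run_bandit are assumptions for illustration. Epsilon-greedy is only one of several standard bandit strategies.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent on a multi-armed bandit with 0/1 rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # exploration: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploitation
        # The agent never sees true_means, only the sampled reward.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return counts, estimates, total


# Hypothetical payout probabilities for three arms.
counts, estimates, total_reward = run_bandit([0.2, 0.5, 0.8])
```

With epsilon set to 0.1 the agent spends roughly one pull in ten exploring and the rest on the arm it currently believes is best, so most pulls end up on the highest-paying arm.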

Reinforcement Learning Model to Minimize home heating bill
Reinforcement Model implemented as a Deep-Q-Network (DQN)

To democratize reinforcement learning, we need to enable users to express their intent in a manner that can be mapped into an RL problem. There are many current efforts in industry and research towards democratizing RL. Some of these efforts include Facebook’s Horizon, AWS SageMaker-RL, OpenAI’s SpinningUp, Google’s Dopamine and others.

Capturing user Intent: The key issue here is one of capturing the user’s intent and mapping it to an RL algorithm and a DNN architecture. While there has been a lot of effort on optimizing RL algorithms, there needs to be a significant effort to capture user intent and to elevate the abstraction level of the RL-Environment in order to solve many real-world engineering, scientific and business problems.

Democratizing RL — Intent to Implementation

Checks and Balances — Verifying Machine Learning Models

All good democracies have checks and balances that protect the long term sustainability of a nation. Software development based on training neural networks has been termed as Software 2.0, perhaps appropriately so. To sustain this new paradigm, we need to be able to verify Machine Learning models and maintain checks and balances. A large number of decisions are being made by Machine Learning models, and this trend is accelerating into every aspect of our lives.

Courts, Law enforcement, DMV, Insurance Companies, Banks, Hospitals, Schools and many more institutions are using or will be using Machine Learning to make predictions about our daily lives.

Black Box Models: An incorrect prediction may affect individuals in an unintended and undesirable manner. If we project out into the future with the hypothesis that more decisions will be made by machine learning models than by humans, verifying machine learning models is an area that needs attention and innovation.

Explainable and Interpretable Machine Learning are active fields of research, where the algorithms try to explain the decisions made by black-box models such as a DNN or an SVM Classifier. These algorithms also compute the probability of the accuracy of the predictions, and weight them by individual features. Interpretable ML techniques provide insight into the ML models and the bias in the input training data. Some of these algorithms can help us explain and correct mis-predictions and introspect a Machine Learning model. Interpretable and verifiable ML algorithms are a key part of checks and balances required to democratize Machine Learning.
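
One widely used way to attribute a black-box model’s behaviour to individual features is permutation importance, sketched below in Python. It is offered as an example of the family of techniques described above, not as the specific algorithms the text refers to. The predict function, the feature meanings and the data are assumptions for illustration.

```python
import random


def permutation_importance(predict, X, y, n_features, seed=0):
    """Accuracy drop when one feature column is shuffled; a larger drop means more important."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(shuffled))
    return baseline, drops


# Hypothetical black-box model: flags a loan as risky from late fees only.
def predict(row):
    late_fees, zip_digit = row
    return "risky" if late_fees > 50 else "safe"


# Hypothetical rows of [late_fees, zip_digit]; labels come from the model itself.
X = [[0, 3], [120, 7], [10, 1], [300, 4], [5, 9], [80, 2], [0, 6], [200, 8]]
y = [predict(r) for r in X]
baseline, drops = permutation_importance(predict, X, y, n_features=2)
```

Shuffling zip_digit never changes a prediction, so its importance is exactly zero, while shuffling late_fees can only lower accuracy. The technique needs no access to the model’s internals, only the ability to call it.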

With the advent of new ML algorithms such as generative adversarial networks (GANs), it is possible to skew the inputs just enough to get an incorrect prediction from an ML model. There will perhaps be many more advances in Machine Learning like GANs that will make verification of ML models a significant challenge. On the flip side, technologies such as RL and GANs can be used to verify complex systems in a manner that is impossible with traditional methods.
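
The idea of skewing inputs just enough to change a prediction can be sketched on a linear classifier. The example below uses a sign-of-gradient perturbation (in the spirit of the fast gradient sign method) rather than a GAN, because it fits in a few lines. The weights, bias, input and epsilon are assumptions chosen for illustration.

```python
def score(weights, bias, x):
    """Linear decision score; the gradient with respect to x is simply the weights."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def adversarial_example(weights, bias, x, epsilon):
    """Shift each feature by at most epsilon in the direction that pushes the score across zero."""
    direction = -1 if predict(weights, bias, x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model parameters and input.
weights, bias = [2.0, -1.0, 0.5], -0.2
x = [0.3, 0.2, 0.2]
x_adv = adversarial_example(weights, bias, x, epsilon=0.1)

original_label = predict(weights, bias, x)
adversarial_label = predict(weights, bias, x_adv)
```

No feature moves by more than 0.1, yet the predicted class flips. A verification methodology for ML models has to search for exactly this kind of small, decision-changing input.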

Next Generation Verification: To democratize Machine Learning we need a verification methodology to verify Models and Algorithms produced by Software 2.0 and Software 1.0.

Impact of Democratizing Machine Learning

While it may be nontrivial to measure the impact of Machine Learning, many industry luminaries and academics have compared AI/ML to ‘electricity’ that will enable the 4th industrial revolution. Even if we were to discount the hyperbole describing AI and Machine Learning, it is still a technology that will impact a majority of industries and many aspects of society. For instance, we may not have fully autonomous self-driving cars (level 5) on the near horizon, contrary to the hype, but we will still see human-assisted self-driving cars, autonomous package delivery vehicles and fully autonomous golf carts in the near future.

The economic impact of Machine Learning will depend on how accessible, verifiable, interpretable and usable we make these technologies for people to build upon. Machine Learning will have significant implications for many current jobs. Current tasks that require repetitive computer action will be automated. Democratization of ML will lead to an understanding and acceptance of such technologies in our daily lives. This in turn may help alleviate concerns about job displacement and will lead to new jobs that don’t exist today.

Democratizing Machine Learning will lead to solving of NP hard problems in deterministic compute time

An investor recently told me that Machine Learning is already democratized, since his 11-year-old son (a 6th grader) is building ML apps using Python and TensorFlow (this is very encouraging!). While his observation is accurate in multiple dimensions, I argue that we are at the very beginning stages of democratizing Machine Learning.

“mass production has not democratized fashion — yet!”

Democratizing Machine Learning to us means empowering people from all disciplines to solve problems that are currently deemed unsolvable.