VerifAI’s 2022 Verification Challenge was a great success. The response was amazing! The number of participants exceeded our expectations!
We built the Verification Challenge WebApp using VerifAI's Machine Learning Platform, with zero lines of code!
The Verification Challenge App demonstrates how Machine Learning (in particular Reinforcement Learning) can learn to generate inputs to a simulator and perform significantly better than random stimulus.
With VerifAI's platform, you can build your own Verification WebApps for each of your unit/block-level tests or regressions, to improve coverage and reduce simulation time.
The answer to the question above is yes! Using techniques such as Reinforcement Learning, we can do significantly better than random stimulus and human-tuned settings, in less time.
The objective of this Verification Challenge is for participants to use innovative Machine Learning techniques to speed up verification and find bugs faster. The goal in this challenge is to maximize the average FIFO depth in a MESI Cache Controller design. There are 4 FIFO queues in this Cache Controller, one for each CPU, and each FIFO queue can hold up to 16 entries. The goal is to maximize the number of entries in each FIFO: simply put, the higher the average FIFO depth across all 4 queues, the better our chances of finding hard bugs. Participants can tune the Machine Learning hyper-parameters to increase the FIFO depths. Finding bugs faster and speeding up verification reduces costs and improves time to market significantly.
The graphs below show the FIFO depths reached after each iteration. Each iteration in this case consists of 120 simulations. As we can see, each iteration produces higher coverage (higher FIFO depths) with the same number of simulations; VerifAI's ML model learns from each iteration to do better.
The histograms below show the distribution of the average FIFO depths. The goal is to shift the distribution to the right with each iteration, producing larger FIFO depths more often and in a predictable manner.
A Cache Controller makes sure that the data in the cache and main memory are consistent (coherent). MESI (Modified, Exclusive, Shared, Invalid) is a cache coherence protocol widely used in Cache Controllers.
The job of a FIFO Queue is to store instructions or data in a queue, and service them later in the order they were entered into the queue.
Filling up the FIFO queues activates parts of the design that are normally inactive. Bugs in deep states of the MESI Cache Controller design can only be found if the FIFO queues are filled. Therefore, maximizing FIFO depth can find bugs that are normally hard to find.
It is hard to fill FIFO queues with random instructions and knob settings. A typical UVM Design Verification (DV) Methodology generates random instructions, and may set random address ranges to try and fill the FIFO queues.
In this example, a DNN (Deep Neural Network) learns from an initial random (UVM) simulation and then generates the next set of instructions and addresses for the simulations. You can tune the hyper-parameters for the DNN so it learns to find the right knobs (features) that make the biggest difference in increasing the FIFO depth. After you tune the hyper-parameters, just click 'Optimize' and wait for the results. The higher your score, the better.
In each iteration, input knobs are fed into the open-source Verilator simulator as stimulus, the simulator produces an output, and we measure the FIFO depths reached for each CPU. These outputs, together with the inputs that were fed into the simulator, are then fed into the VerifAI Optimizer, a Neural Network that predicts the next set of knobs for the simulator. In each iteration, the VerifAI Neural Network learns which input stimuli produce the best output from the simulator, the highest FIFO depths in this particular case.
VerifAI Optimizer Flow for Cache Controller Design
The design is an open-source MESI Cache Controller with 4 CPUs. It comes from opencores.org and is licensed under the LGPL; the design is shown in Figure 1. The controller supports up to four CPUs, and each CPU has its own FIFO of depth 16. The FIFOs are used to resolve load-store conflicts. For example, if multiple CPUs try to access the same address at the same time, the CPU with higher priority gets the access and the CPU with lower priority inserts the access request into its FIFO queue; these requests are serviced at a later time. It is hard to fill the FIFO queues with random traffic, since only address conflicts cause entries to be inserted into the queues.
In this experiment, we use the open-source simulator Verilator, which drives the VerifAI Machine Learning-based Optimizer. The Optimizer produces the next set of knobs for the simulator to increase the FIFO depths.
The DV knobs that are simulator inputs are shown below. VerifAI's Machine Learning model learns to generate the best input settings to maximize the FIFO depths:
As mentioned above, the goal of this experiment is to tune the initial knobs to generate a weighted-random distribution of stimulus for the simulator. The hyper-parameters are used to tune the DNN (Deep Neural Network) to produce the highest average FIFO depths. Each iteration should move the histogram distributions to the right, so that higher FIFO depths occur more often.
The final score of your results is calculated as a weighted sum over the histogram distribution, which has 16 bins; an example calculation is shown in the Results section below.
The graphs below show the FIFO depths reached after each iteration. Each iteration in this case consists of 120 simulations.
At VerifAI we believe that using Machine Learning in every step of the verification process can speed up software and hardware verification by more than 50%.
For more information, please visit www.verifai.ai or email hello@verifai.ai.
The Problem
The tremendous advances in Integrated Circuit (IC) design have brought us great products over the last decade. These advances in ICs have also increased the complexity of Design Verification significantly. Design Verification (DV), the process of verifying that an IC functions as intended, takes up more than 50% of the time and cost of designing an IC (reference: research study by Siemens). The cost of DV is increasing, and the time-to-market for new IC projects is slipping because of DV. To meet the growing demand for ICs, we need to find innovative ways to speed up verification and reduce the associated costs. Additionally, as the research highlights, DV requires a significant amount of engineering talent, and the demand for DV engineers grew at a 6.8% CAGR. There are not enough DV engineers being produced to meet this demand. Innovative Machine Learning approaches present a significant opportunity to accelerate innovation in DV.
The Objective for the DV Challenge
The objective of this DV Challenge is for participants to use innovative Machine Learning techniques to speed up verification and find bugs faster. The goal in this challenge is to maximize the average FIFO depth in a MESI Cache Controller design. There are 4 FIFO queues in this Cache Controller, one for each CPU, and each FIFO queue can hold up to 16 entries. The goal is to maximize the number of entries in each FIFO: simply put, the higher the average FIFO depth across all 4 queues, the better our chances of finding hard bugs. Participants can tune the Machine Learning hyper-parameters and DV knobs (settings) to increase the FIFO depths. VerifAI's Machine Learning Platform helps DV engineers speed up verification. Finding bugs faster and speeding up DV reduces costs and improves time to market significantly.
What is a MESI Cache Controller
A Cache Controller makes sure that the data in the cache and main memory are consistent (coherent). MESI (Modified, Exclusive, Shared, Invalid) is a cache coherence protocol widely used in Cache Controllers.
What is a FIFO (First In First Out) Queue
The job of a FIFO Queue is to store instructions or data in a queue, and service them later in the order they were entered into the queue.
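As a minimal illustration of first-in-first-out ordering (a sketch only, not the RTL FIFO in the design), a Python deque behaves the same way:

from collections import deque

fifo = deque(maxlen=16)        # each CPU's FIFO in this design holds up to 16 entries

fifo.append("load  0x1000")    # enqueue at the tail
fifo.append("store 0x2000")

first = fifo.popleft()         # dequeue from the head, in insertion order
print(first)                   # -> "load  0x1000"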
Why fill up FIFO Queues
Filling up the FIFO queues activates parts of the design that are normally inactive. Bugs in deep states of the MESI Cache Controller design can only be found if the FIFO queues are filled. Therefore, maximizing FIFO depth can find bugs that are normally hard to find.
How to fill up the FIFO Queues
It is hard to fill FIFO queues with random instructions and knob settings. A typical UVM DV Methodology generates random instructions, and may set random address ranges to try and fill the FIFO queues.
In this example, a DNN (Deep Neural Network) learns from an initial random (UVM) simulation and then generates the next set of instructions and addresses for the simulations. You can tune the hyper-parameters for the DNN so it learns to find the right knobs (features) that make the biggest difference in increasing the FIFO depth. After you tune the hyper-parameters, just click 'Optimize' and wait for the results. The higher your score, the better.
Flow: What are we actually doing?
In each iteration, input knobs are fed into the open-source Verilator simulator as stimulus, the simulator produces an output, and we measure the FIFO depths reached for each CPU. These outputs, together with the inputs that were fed into the simulator, are then fed into the VerifAI Optimizer, a Neural Network that predicts the next set of knobs for the simulator. In each iteration, the VerifAI Neural Network learns which input stimuli produce the best output from the simulator, the highest FIFO depths in this particular case. A minimal sketch of this loop is shown after Figure 1.
Figure-1: VerifAI Optimizer Flow for Cache Controller Design
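Below is a minimal sketch of this iteration loop. It is an illustration only: the run_verilator helper, the number of knobs and the surrogate model are assumptions for this sketch, not VerifAI's actual Optimizer.

import numpy as np
from sklearn.neural_network import MLPRegressor

def run_verilator(knobs):
    """Placeholder for a real Verilator run: returns the average FIFO depth
    reached across the 4 CPU FIFOs. Faked here so the sketch runs end to end."""
    return float(16 * np.clip(knobs.mean() + 0.05 * np.random.randn(), 0, 1))

rng = np.random.default_rng(0)
n_knobs, sims_per_iter = 8, 120

# Iteration 0: purely random (UVM-style) knob settings.
knobs = rng.uniform(0.0, 1.0, size=(sims_per_iter, n_knobs))
depths = np.array([run_verilator(k) for k in knobs])

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)

for iteration in range(5):
    # Learn which knob settings have produced the deepest FIFOs so far.
    model.fit(knobs, depths)

    # Propose many candidate knob vectors and keep the ones the model
    # predicts will push the FIFOs deepest (a simple surrogate-guided search).
    candidates = rng.uniform(0.0, 1.0, size=(10 * sims_per_iter, n_knobs))
    best = candidates[np.argsort(model.predict(candidates))[-sims_per_iter:]]

    new_depths = np.array([run_verilator(k) for k in best])
    knobs = np.vstack([knobs, best])
    depths = np.concatenate([depths, new_depths])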
More about the MESI Cache Controller
The design is an open-source MESI Cache Controller with 4 CPUs. It comes from opencores.org and is licensed under the LGPL; the design is shown in Figure 2. The controller supports up to four CPUs, and each CPU has its own FIFO of depth 16. The FIFOs are used to resolve load-store conflicts. For example, if multiple CPUs try to access the same address at the same time, the CPU with higher priority gets the access and the CPU with lower priority inserts the access request into its FIFO queue; these requests are serviced at a later time. It is hard to fill the FIFO queues with random traffic, since only address conflicts cause entries to be inserted into the queues.
In this experiment, we use the open-source simulator Verilator, which drives the VerifAI Machine Learning-based Optimizer. The Optimizer produces the next set of knobs for the simulator to improve the FIFO depths.
Figure-2: Cache Controller with 4 CPU Ports
DV Knobs to Tune
The DV knobs exposed in this design are shown below. You can set these percentages to create the initial randomized knob settings that are used as inputs to the simulator.
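A sketch of what such knob settings might drive is shown here; the knob names, percentages and address ranges are made-up examples for illustration, not the actual knobs exposed by the challenge.

import random

# Hypothetical knob settings (percentages); the real knob names differ.
knobs = {"read_pct": 40, "write_pct": 40, "nop_pct": 20,
         "addr_conflict_pct": 70}   # chance of reusing a "hot" address

HOT_ADDRESSES = [0x1000, 0x2000, 0x3000, 0x4000]

def next_transaction():
    """Draw one weighted-random transaction for the simulator."""
    op = random.choices(["read", "write", "nop"],
                        weights=[knobs["read_pct"],
                                 knobs["write_pct"],
                                 knobs["nop_pct"]])[0]
    if random.random() < knobs["addr_conflict_pct"] / 100:
        addr = random.choice(HOT_ADDRESSES)      # deliberately collide with other CPUs
    else:
        addr = random.randrange(0, 1 << 20)      # anywhere in memory
    return op, addr

stimulus = [next_transaction() for _ in range(1000)]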
Hyperparameters to Tune
Results: What to expect
As mentioned above, the goal of this experiment is to tune the initial knobs to generate a weighted-random distribution of stimulus for the simulator. The hyper-parameters are used to tune the DNN (Deep Neural Network) to produce the highest average FIFO depths. Each iteration should move the histogram distributions to the right, so that higher FIFO depths occur more often.
The final score of your results is calculated as a weighted sum over the histogram distribution, which has 16 bins. An example calculation:
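Assuming each of the 16 bins is weighted by its bin index (the exact weights used in the challenge may differ), the score would be computed like this:

# Histogram of average FIFO depths over one iteration of simulations
# (counts per bin are made-up numbers for illustration).
counts = [30, 25, 20, 15, 10, 8, 5, 3, 2, 1, 1, 0, 0, 0, 0, 0]

# Weighted sum: deeper bins (to the right) carry larger weights,
# so shifting the distribution right increases the score.
score = sum(weight * count for weight, count in enumerate(counts, start=1))
print(score)   # 396 for the counts above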
The histogram below shows the distribution of the average FIFO depths. The goal is to shift the distribution to the right, to get a higher occurrence of larger FIFO depths.
Feature Importance Plot shows the relative importance of the Knobs and their effect on the FIFO Depths
FIFO Challenge WebPage
The tremendous advances in Integrated Circuit (IC) design have brought us amazing products over the last decade. These products have permeated our daily lives in more ways than we ever imagined and are now an integral part of our day. The advances in ICs have also increased the complexity of Design Verification (DV) significantly. Design Verification, the process of verifying that an IC functions as intended, takes up more than 50% of the time and cost of designing an IC*.
The cost of DV is increasing, and the time-to-market for new IC projects is slipping because of DV. To meet the growing demand for ICs, we need to find innovative ways to speed up verification and reduce the associated costs. Additionally, as the research highlights, DV requires a significant amount of engineering talent, and the demand for DV engineers grew at a 6.8% CAGR*.
Source: *2020 Wilson Research Group Functional Verification Study - Harry Foster, Siemens EDA
There are not enough DV engineers being produced to meet this demand
With the advent of Machine Learning techniques, there have been multiple instances of Machine Learning engineers doing better than domain experts on hard problems. For instance, ML engineers from Google DeepMind (AlphaGo) beat the world Go champion Lee Sedol. This was an unprecedented achievement for ML engineers, who had little domain knowledge of the game of Go.
Another stunning example of ML engineers doing better than domain experts is in the field of protein folding, where Google DeepMind's AlphaFold, an ML program, predicted protein structures better than any domain expert.
Yet another example of ML engineers doing better than domain experts is in the area of chip floorplanning on an IC: a recently published paper in Nature by Google researchers demonstrates how an ML model did better than domain experts.
It is rather unusual that all of these achievements of ML engineers doing better than domain experts come from one company, namely Google!
There are a growing number of examples of domain experts who are using Machine Learning to do better in their own domains. For instance, researchers at the University of Washington developed an ML model that does as well as or better than Google DeepMind's AlphaFold.
Another example of domain experts doing better with ML: researchers at the Center for Computational Imaging and Personalized Diagnostics at Case Western Reserve University have shown that hand-crafted features, derived from a deep understanding of the problem domain and used in conjunction with an ML model, significantly outperform more traditional approaches for lung cancer classification on CT scans and for predicting cancer outcomes from digital pathology images.
Challenging fields such as hardware and software verification will require domain experts to embrace Machine Learning techniques, and Machine Learning engineers to attack verification problems from a vastly different perspective.
DV Engineer 2.0 is a metaphor for domain experts who use Machine Learning to do better in their domains, and, vice versa, for Machine Learning engineers who do better than domain experts in those domains.
Machine Learning and Software 2.0 will play an outsized role in helping the DV Engineer 2.0 take on the challenge of reducing the cost and time of verification, while challenging the traditional Software 1.0 stack.
We have demonstrated the value of using Machine Learning techniques on industrial designs to do significantly better than constrained-random methods. Using RL for DV can save months of verification resources for coverage improvement and early bug finding. DNNs provide a mechanism to fit non-linear functions that can mimic complex behaviors, even those of a simulator.
As integrated circuits have become progressively more complex, constrained-random stimulus has become ubiquitous as a means of stimulating a design's functionality and ensuring it fully meets expectations. In theory, random stimulus allows all possible combinations to be exercised given enough time, but in practice, with highly complex designs, a purely random approach will have difficulty exercising all possible combinations in a timely fashion.
As a result, it is often necessary to steer the Design Verification (DV) environment to generate hard-to-hit combinations. The resulting constrained-random approach is powerful but often relies on extensive human expertise to guide the DV environment in order to fully exercise the design.
As designs become more complex, the guidance aspect becomes progressively more challenging and time consuming, often resulting in design schedules in which the verification time to hit all possible design coverage points is the dominant schedule limitation. This paper describes an approach that leverages existing constrained-random DV environment tools but further enhances them using supervised learning and reinforcement learning techniques.
This approach provides better-than-random results in a highly automated fashion, thereby ensuring that the DV objective of full design coverage can be achieved on an accelerated timescale and with fewer resources.
Two hardware verification examples are presented, one of a Cache Controller design and one using the open-source RISCV-Ariane design and Google’s RISCV Random Instruction Generator.
We demonstrate that a machine-learning based approach can perform significantly better on functional coverage and reaching complex hard-to-hit states than a random or constrained-random approach.
The entire article is below:
The phrase ‘Democratizing Machine Learning’ is being used by many companies small and large, including us @VerifAI and in this article.
Definitions
de·moc·ra·tize — democratizing:
— introduce a democratic system or democratic principles to.
“public institutions need to be democratized”
— make (something) accessible to everyone.
“mass production has not democratized fashion”
In the context of this article, the second definition is more fitting than the first.
Every company may have a different notion of what they mean by Democratizing Machine Learning and AI.
To make a figurative analogy: A democratic government may give its people the right to vote, but perhaps the voting ballots are hard to decipher.
Such a country would still be a democracy in principle, but in practice this would be a democracy that doesn’t serve all its constituents well.
For example, a country that educates people as to how to use polling booths or makes the ballots very simple, in addition to easy access, would serve all its constituents well.
Similar figurative analogies apply to democratizing Machine Learning.
While companies may provide access to ML tools and APIs, they may not provide a level of abstraction or ease of use that makes sense for a developer or a professional who may not be a Machine Learning expert.
Truly democratizing machine learning would mean that a wide range of professionals (including, but not limited to, developers, engineers, scientists, marketers, accountants, radiologists, cardiologists, law enforcement officers, judges, lawyers and other domain experts) could use Machine Learning to improve their productivity. To democratize Machine Learning, we need to make it not only accessible to everyone but also easily usable by everyone.
We can broadly segment Machine Learning techniques into branches called Supervised Learning, Unsupervised Learning and Reinforcement Learning. Democratizing each of these branches of Machine Learning presents its own challenges.
Supervised learning is currently the most widely used form of machine learning across all industry segments, and one of the most common applications of Supervised Learning is Classification.
Let's take a closer look at what it means to democratize Machine Learning in the context of building a Classification app.
Classification is one of the most widely used Supervised Machine Learning techniques. This includes classifying images, text and other kinds of data. According to a McKinsey report, Classification techniques will have a global economic impact of $2T across multiple industry segments. This number may be hard to fathom, but what it says is that classification of data (images, text, voice) is being used widely across many industry segments, embedded in many 'killer apps' and workflows.
In traditional software (Software 1.0), classification is done by writing an algorithm to classify an image, for instance. These algorithms need to be continuously updated to catch 'unseen' or 'different' conditions. In contrast, Machine Learning uses example data to produce a 'classification algorithm' that generalizes to classify unseen conditions quite accurately (e.g. new images).
Classification use case: an interesting classification problem in the banking/lending industry is to assess the current risk of an active loan, based on historical data and the consumer's profile and behavior. Given a user profile and data about the loan terms, the Machine Learning model can classify a particular customer's loan as safe, risky or unsafe.
A comprehensive dataset was published by LendingClub, a peer-to-peer lending company. The dataset contained about a million active loans, and each loan was described in terms of around 52 features (variables). These features include information about the loan, such as the term of the loan, interest rate, loan amount and loan purpose. The data also contained features about the customer, such as address, annual income, last payment, months since last delinquent payment, total late fees, etc.
Democratizing Classification for the bank loan analyst would mean providing the analyst with software that produces the highest-accuracy machine-learning model and predicts whether a customer's loan is safe, risky or unsafe.
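As a minimal sketch of this task (the file name, column names and model choice here are assumptions for illustration, not the actual LendingClub schema or VerifAI's pipeline):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

loans = pd.read_csv("lending_club_loans.csv")           # hypothetical export of the dataset

y = loans["loan_risk"]                                   # assumed label: safe / risky / unsafe
X = pd.get_dummies(loans.drop(columns=["loan_risk"]))    # naive encoding of categorical columns

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))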
Choosing the right features to include in the Machine Learning model is critical to the accuracy of the Classifier.
A good effort towards democratizing Classification would be to automate the following steps: Feature Selection, Feature Mapping and Model Creation.
Automatic Feature Selection algorithms are a key part of democratizing machine learning. For the example LendingClub data, the original number of features was 52 (of which 35 were numerical and 18 were categorical).
The final number of features selected, which produced the highest-accuracy Classifier, was 14. A number of features were excluded: 37 (of which 24 were numerical and 13 were categorical).
VerifAI's algorithms exclude redundant features automatically to produce the highest-accuracy Classifier for the LendingClub data. The loan analyst (domain expert) should not have to know which features to exclude or include to achieve a high-accuracy model; the software needs to do a good job of Automatic Feature Engineering.
Given a set of features that are categorical, numerical and ordinal, VerifAI's Analyzer finds the optimal set that leads to the highest-accuracy ML model.
Automatic Feature Selection using VerifAI-Machine Learning Platform (VerifAI-MLP)
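A common open-source way to approximate this step (not VerifAI's actual algorithm) is recursive feature elimination with cross-validation, which keeps the subset of features that maximizes held-out accuracy:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# X_train / y_train come from the loan sketch above.
selector = RFECV(RandomForestClassifier(n_estimators=100, random_state=0), step=1, cv=5)
selector.fit(X_train, y_train)

selected = X_train.columns[selector.support_]
print(len(selected), "features kept out of", X_train.shape[1])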
Automatic Feature Mapping is another key set of techniques that transforms the input user data into a form (numerical values) that the Machine Learning Algorithms can understand. Automatic Feature Mapping hides the complexity and details of data transformation required before feeding the data into a DNN (Deep Neural Network) or other machine learning models.
For instance, the loan_term feature column may have values such as "60 months", "42 months", "36 months", etc. These string values are automatically encoded into numerical values a machine learning model can understand, using a feature hasher, factorizer, one-hot encoding or other encoding algorithms. The model's accuracy depends on how the inputs are encoded and interpreted by the machine learning algorithms, so it is important to map and encode the features accurately before inputting them into an ML model. ML engineers spend a lot of time mapping features into usable inputs for models. Automatic Feature Mapping (AutoMapper) is an important step towards democratizing ML.
Given a set of features that are categorical, numerical and ordinal, automatically map them to features that are readily usable by a DNN.
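A tiny sketch of two such encodings for the loan_term example (illustrative only; VerifAI's AutoMapper chooses and combines encodings automatically):

import pandas as pd

terms = pd.Series(["60 months", "42 months", "36 months", "60 months"], name="loan_term")

one_hot = pd.get_dummies(terms, prefix="loan_term")       # one column per distinct value
as_number = terms.str.extract(r"(\d+)").astype(int)       # ordinal: keep the number of months

print(one_hot)
print(as_number)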
Automatic Model Selection and Creation is the next important step towards democratizing Machine Learning. Model selection is a complex process that ML engineers learn over time with experience. It consists of choosing the best-fit ML model for a given dataset, and then tuning the hyper-parameters of the chosen model to improve accuracy and reduce over-fitting.
For a Classification problem, there are many models we can use: for instance SVMs (Support Vector Machines), Decision Trees, Random Forests, DNNs, etc.
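A simple sketch of automatic model selection is to cross-validate a few candidate classifiers and keep the best; a real platform would also tune each candidate's hyper-parameters (names and settings below are illustrative):

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# X_train / y_train come from the loan sketch above.
candidates = {
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "dnn": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000),
}

scores = {name: cross_val_score(model, X_train, y_train, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print("best model:", best, "cv accuracy:", round(scores[best], 3))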
An Example of Types of Classifiers
Very low levels of abstraction can inhibit and delay innovation
The first time I looked at TensorFlow, it reminded me of assembly language or C!
Even though these languages are powerful, their low level of abstraction makes them extremely tedious to program, even for small tasks. This lack of abstraction can pose many challenges and barriers to rapid innovation.
The level of abstraction matters
If researchers and developers today had to write FFT (Fast Fourier Transform) algorithms in assembly language or C, rather than use software packages such as Matlab or Mathematica, DSP chips (digital signal processors) would not be able to translate even simple words, let alone paragraphs and music. In addition, the DSP chip that is present in every mobile device (plus Alexa, Google Home, etc.) would perhaps cost $5,000 and not $1.50.
One of the key elements that enabled the rapid innovation in the area of DSPs over the last two decades was the level of abstraction that Matlab, Mathematica and other similar software packages provided to perform complex computations such as an FFT with just one line of code:
Y = fft(X)

TensorFlow 2.0
TensorFlow, while powerful, with all its great embedded algorithms, needs to be augmented with additional layers of abstraction to scale.
Even though TensorFlow 2.0 is a significant leap forward in raising the abstraction level through the Keras APIs, it still does not sufficiently reduce the complexity of the Feature Engineering required to build accurate Machine Learning models. It is thus very hard to use for non-Machine-Learning domain experts and other professionals, which makes one wonder if non-ML experts are simply 'not the target audience' for TensorFlow 2.0.
Automatic Feature Engineering vs. TensorFlow 2.0 Linear Model on the Titanic Dataset
TensorFlow 2.0 describes a linear classifier example on the Titanic dataset. The goal is to build a linear classifier that predicts whether a person survived the voyage on the Titanic, based on a set of input features. The dataset contains categorical and numerical features. To handle this in TensorFlow 2.0, a lot of code needs to be written just to build a simple linear classifier model, and doing so requires a deep understanding of Machine Learning models, Feature Engineering and TensorFlow.
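An abbreviated sketch of what that code looked like in the TensorFlow 2.0 era, using the tf.feature_column and tf.estimator APIs of the time (since deprecated in later releases); data loading is omitted, and dftrain / y_train are assumed to hold the Titanic training frame and labels:

import tensorflow as tf

CATEGORICAL = ['sex', 'n_siblings_spouses', 'parch', 'class',
               'deck', 'embark_town', 'alone']
NUMERIC = ['age', 'fare']

feature_columns = []
for name in CATEGORICAL:
    vocab = dftrain[name].unique()      # build a vocabulary per categorical column
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(name, vocab))
for name in NUMERIC:
    feature_columns.append(tf.feature_column.numeric_column(name, dtype=tf.float32))

def make_input_fn(df, labels, epochs=10, batch_size=32):
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
        return ds.shuffle(1000).repeat(epochs).batch(batch_size)
    return input_fn

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(make_input_fn(dftrain, y_train))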
VerifAI Machine Learning Platform: Automatic Feature Engineering
VerifAI's Automatic Feature Engineering is a set of algorithms that transform the input data into a form (numerical vectors) that the Machine Learning Algorithms can understand. Our Automatic Feature Engineering hides the complexity and details of data transformation required before feeding the data into a DNN (Deep Neural Network) or other machine learning models.
For instance, feature columns in the Titanic dataset such as {'fare', 'sex', 'n_siblings_spouses', 'class', 'deck', 'embark_town' and 'alone'} are automatically encoded into numerical values a machine learning model can understand, using a feature hasher, factorizer, one-hot encoding or other encoding algorithms. The model's accuracy depends on how the inputs are encoded and interpreted by the machine learning algorithms, so it is important to map, encode, transform and combine the features accurately before inputting them into an ML model.
ML engineers spend a lot of time mapping features into usable inputs for models. Our Automatic Feature Engineering (AutoMapper) is an important step towards simplifying & democratizing ML, making it available to all.
The VerifAI Machine Learning Platform allows developers to build Classifiers, Regressors and Reinforcement Learning Algorithms with just a few lines of code or no code at all.
VerifAI's Design Verification Challenge #1 started on Feb-22-2021 and ended on April-10-2021.
The goal of the Challenge is to increase coverage on a Cache Controller design by tuning Machine Learning hyper-parameters and DV settings (knobs) online!
The Cache Controller is a widely used hardware design block that copies data and code from main memory to the cache memory.
The challenge allowed participants to come up with the best knob settings and hyper-parameters that would lead to the highest coverage on this Cache Controller design.
The participants used their knowledge of Machine Learning and/or DV to tune the knobs to get the best scores!
The Numbers
The Challenge was an amazing success, with a large group of participants from a wide range of companies and universities.
We were surprised to see such high engagement numbers for a specialized field such as IC Design Verification.
Gamification of ML and DV
This level of user engagement can be attributed to gamification and the competitive nature of the participants.
Comparing our engagement times to those of large sites is not an apples-to-apples comparison; we are simply highlighting the potential of gamification for hard, esoteric problems such as Design Verification and IC design.
Gamification can be a catalyst for collective innovation, even for complex and hard problems such as Verification.
Cross-Training
Many participants were ML engineers, Data Scientists and DV engineers. Most of them knew either ML or DV, but not both domains.
The challenge enabled the ML engineers to learn about DV, and gave the DV engineers insights into ML.
The benefits of using ML for verification in this challenge went far beyond just improving coverage on the Cache Controller.
In summary, this ML + DV Challenge was an amazing experience for the VerifAI team, and also for the participants!
Stay tuned for VerifAI's upcoming ML and DV Challenges!