Automated Design of Tiny Machine Learning Models: A Practical Guide (Part 1)

Danilo Pietro Pau, Danil Zherebtsov, Dmitriy Proshin, and Andrey Korobitsyn
July 15, 2022


The authors review the current state of tiny machine learning (TinyML) at the community level, introduce the main needs of practitioners who develop ML models, and discuss in detail how the technology driving TinyML is rapidly evolving to reach the community, for example through advanced tools for automated deployment.

Given its length, this article will be split into two parts; this is the first half, and the interested readers are referred to the second half in the next issue of the IEEE IoT newsletter.

The Internet of Things (IoT) is a rapidly growing market that is expected to reach $1.39 trillion by 2026, according to the Mordor Intelligence forecast[1]. The surge in the number of IoT devices drives up both the volume of data to be processed and the demand for low-latency processing. Furthermore, since IoT devices embody several heterogeneous sensors, they spend a great deal of energy transferring sensor data and require many resources for centralized traffic storage and processing. These factors lead to a significant increase in IT infrastructure costs that burdens most organizations. As a result, the IoT infrastructure market size was predicted to grow at more than 25% CAGR from 2017 to 2024[2]. Moreover, such centralization of functions will not scale and therefore will not meet the expectations of the IoT market forecast.

This situation forces companies to search for more efficient solutions, and a growing number of organizations have enthusiastically joined the TinyML community formed by the TinyML Foundation in 2019. As a result, the global community has expanded to more than six thousand practitioners representing 27 countries on all continents. One of the main goals of the TinyML community today is to promote TinyML capabilities to the developer world and to shift the existing mindset of end users by showcasing that it is possible to successfully run a deep learning model on memory-constrained hardware, even using only 8-bit precision or less, without compromising accuracy.

The Gap That Prevents Mass TinyML Adoption

To understand the main obstacles to designing resource-constrained ML solutions productively, let us look at the industry landscape. The Global Developer Population and Demographic Study by Evans Data Corporation[3] revealed that the developer community currently numbers about 26.9 million software developers worldwide and is expected to grow to 28.7 million by 2024. The global community can be divided into four main subgroups: C/C++/C#, Python, Java, and Artificial Intelligence (AI) developers. AI developers, the smallest community of all (around 350,000 specialists), are required for most of the ML model creation process, while the largest subgroup, the C/C++/C# embedded developers, cannot dive deep into AI-powered techniques due to an acute lack of time and expertise. They could invest their time in learning AI skills but, unfortunately, that would take years they do not have. Considering the forecast above, this has created a major gap that hinders the success of ML applications in the market and needs to be addressed urgently.

The traditional methodologies for creating Artificial Neural Networks (ANN) require extensive knowledge and expertise: AI practitioners need to manually choose appropriate layers for their machine learning (ML) models, optimize them by adjusting many hyperparameters, compress the models after training to achieve a tiny size, and evaluate their quality. This is often an iterative process that must be repeated until project requirements are met. Many embedded developers have therefore raised the following questions: how can the expertise gap be reduced so that the whole community is empowered to build TinyML models and run them on low-power, memory-constrained microcontrollers (MCU)? How can productivity be dramatically increased, automation introduced, and development time shortened?

Following from the questions above, there are five main requirements that must be met for embedded developers to implement TinyML projects at scale, briefly introduced below.

  • Interoperability: the community needs to find a common, agreed, and shared way to exchange information between tools and generalize the approach to creating TinyML projects.
  • Scalability: solutions shall be deployed and run on many devices and sensors instead of serving only a few unique devices or projects.
  • Automation: tools that streamline the model creation process and find its optimal structure, including deployment on the target device, are vital for embedded specialists without deep ML expertise.
  • Productivity: it is critical to implement user-friendly tools and methods that do not require months of training, that increase the speed of project deployment, and that eliminate error-prone hand-crafting.
  • Lifecycle model management: when implementing TinyML projects, embedded specialists shall interpret and evaluate the quality of a model profile in real-time on a microcontroller even after implementation.

End-to-End Methodology for AI-Driven Development

The development of AI-driven IoT applications based on the TinyML approach should be quick to adopt and cost-effective. Let us describe the pipeline step by step:

  1. Define the task and collect data. First, a practitioner should define the problem to solve, depending on the target IoT application, and prepare the necessary data for model training. A practitioner who does not have a suitable dataset with historical data can rely on open-source datasets to initiate the ML project. However, if the case is too specific, the practitioner will have to prepare the data independently, even generating it synthetically.
  2. Clean and label data. Data preprocessing is a critical step since inaccurate data can greatly reduce prediction accuracy. A practitioner who does not use an ML-as-a-service platform needs to address missing values, fix or remove incorrect or duplicate data, normalize the data, apply domain transformations as needed, and manually add labels that provide context within the dataset.
  3. Search for a model and train it. The search and the training steps are usually performed on powerful computing platforms with virtually unlimited memory and computational power. The result is an ML model able to recognize certain patterns. However, such models are often not deployable because of their complexity.
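The cleaning operations named in step 2 can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the preprocessing of any particular platform; the function name `clean_and_normalize`, the choice of mean imputation, and the sample readings are all assumptions made for the example.

```python
from statistics import mean

def clean_and_normalize(rows):
    """Deduplicate rows, fill missing values with the column mean, min-max scale.

    `rows` is a list of equal-length lists of floats; None marks a missing value.
    Assumes at least one row and at least one present value per column.
    """
    # Remove exact duplicate rows while preserving their order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))

    for col in range(len(unique[0])):
        # Impute missing values with the mean of the values that are present.
        present = [r[col] for r in unique if r[col] is not None]
        fill = mean(present)
        for r in unique:
            if r[col] is None:
                r[col] = fill
        # Min-max normalize the column to the range [0, 1].
        lo = min(r[col] for r in unique)
        hi = max(r[col] for r in unique)
        span = hi - lo
        for r in unique:
            r[col] = (r[col] - lo) / span if span else 0.0
    return unique

# Hypothetical sensor readings: one duplicate row and one missing value.
readings = [
    [20.0, 0.5],
    [20.0, 0.5],
    [None, 1.5],
    [30.0, 1.0],
]
cleaned = clean_and_normalize(readings)
```

Labeling and domain transformations depend on the application and are left out of the sketch.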

For TinyML to be deployed on MCUs, models shall meet the following requirements:

  • RAM: much less than 1 MB; FLASH: less than 2 MB;
  • Energy: mW scale, battery to last for years;
  • Processor: 10s - 100s MHz, at most;
  • Cost: low to enable massive deployment.

For TinyML to be deployed inside the same sensor package, models shall meet the following requirements:

  • Total available memory: 40 KiB;
  • Energy: µW scale;
  • Processor: 5 - 10 MHz, often much less;
  • Cost: ultra-low to enable massive deployment.
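To make the memory figures above concrete, the following sketch estimates whether a model fits each target. It rests on several assumptions that are not in the text: "MB" is read as MiB, "much less than 1 MB" of RAM is treated as a plain 1 MiB ceiling, weights are assumed to live in flash and activations in RAM on an MCU, and the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Budgets taken from the two requirement lists (MB read as MiB, an assumption).
MCU_FLASH_BYTES = 2 * MIB
MCU_RAM_BYTES = 1 * MIB
SENSOR_TOTAL_BYTES = 40 * KIB

def weight_bytes(n_params, bits_per_weight):
    """Storage needed for the weights alone, rounded up to whole bytes."""
    return (n_params * bits_per_weight + 7) // 8

def fits_mcu(n_params, bits_per_weight, activation_bytes):
    """Assumed split: weights in flash, activations in RAM."""
    return (weight_bytes(n_params, bits_per_weight) < MCU_FLASH_BYTES
            and activation_bytes < MCU_RAM_BYTES)

def fits_sensor(n_params, bits_per_weight, activation_bytes):
    """Assumed: weights and activations share the single in-sensor memory."""
    total = weight_bytes(n_params, bits_per_weight) + activation_bytes
    return total <= SENSOR_TOTAL_BYTES

# A one-million-parameter model needs 4,000,000 bytes at 32 bits per weight,
# which exceeds 2 MiB of flash, but 1,000,000 bytes at 8 bits, which fits.
float_model_fits = fits_mcu(1_000_000, 32, 100 * KIB)
int8_model_fits = fits_mcu(1_000_000, 8, 100 * KIB)
```

The check ignores code size, stack, and runtime overhead, so a real deployment leaves less room than these ceilings suggest.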

In the following, we will present an alternative approach to automatically creating compact neural networks that can be embedded in MCUs and sensors with the characteristics described above.

  4. Convert the Neural Network (NN) into optimized code. Most embedded engineers face the issue of hand-crafting C code and validating it with ad-hoc procedures, which are laborious, time-consuming, and error-prone. To avoid these bottlenecks, ST developed the X-CUBE-AI technology[4], which allows an engineer to accurately and automatically convert pre-trained ML models into optimized C code to be run, validated, and profiled on STM32 MCUs.
  5. Deploy the application in the field. As a final step, embed the tiny model into an MCU or a sensor integrated into the IoT application for test-driving. This approach may require multiple iterations back to step 2 (enlarging the dataset), step 3 (to design and train alternative models), etc. These multiple iterations are why the embedded developer community recommends embracing automation to increase productivity and shorten development time.

Drawbacks of the Traditional Approach to Building TinyML Solutions

The ML domain is rich in terms of cloud-based and open-source solutions, including Neural Architecture Search (NAS) methods, AutoML tools, and NN frameworks, which allow building machine learning models, often resulting in a high level of accuracy. As for TinyML tasks, they usually require additional methods which allow the practitioners to build models of minimal size and without loss of accuracy. However, before considering an alternative approach, let us look at some of the considerations for leveraging existing solutions.

  • Achieving high accuracy often comes at the expense of size. There are many Neural Architecture Search (NAS) methods that help to optimize model metrics, such as:
    • Bayesian Optimization (Sequential Model-Based Optimization) is the de-facto standard used in many tools for global optimization of black-box functions;
    • Reinforcement learning (RL) is an ML method that enables learning in an interactive environment and aims to find a proper action model;
    • Evolutionary methods solve model optimization problems in a stochastic manner;
    • Gradient-based optimization requires a gradient to identify appropriate search directions and ensure better designs during optimization iterations;
    • Random/heuristic search methods do not require gradient optimization.

From an industry perspective, the number of tools that support the aforementioned methods is sizable (e.g., Ray Tune, Scikit-Optimize, Microsoft NNI, Google Vizier, Amazon SageMaker). The most widely supported frameworks are TensorFlow, Keras, MXNet, and PyTorch. The main challenge here is that most tools focus on finding the best metric, while TinyML tasks require building models of minimal size without losing accuracy.
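The difference between optimizing for the best metric and optimizing for minimal size can be shown with a small selection rule. The sketch below is only an illustration of the idea and does not reproduce the behavior of any tool named above; the `Candidate` type, both function names, and the trial numbers are made up for the example.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    n_params: int
    accuracy: float  # validation accuracy in [0, 1]

def best_metric(candidates):
    """Metric-only choice: the most accurate model, whatever its size."""
    return max(candidates, key=lambda c: c.accuracy)

def smallest_acceptable(candidates, min_accuracy):
    """Size-aware choice: the smallest model that still meets the accuracy floor."""
    ok = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(ok, key=lambda c: c.n_params) if ok else None

# Made-up search results, for illustration only.
trials = [
    Candidate("wide", 500_000, 0.95),
    Candidate("medium", 60_000, 0.94),
    Candidate("narrow", 4_000, 0.93),
    Candidate("minimal", 500, 0.80),
]

by_metric = best_metric(trials)
by_size = smallest_acceptable(trials, min_accuracy=0.92)
```

With these illustrative numbers the metric-only rule picks the 500,000-parameter model, while the size-aware rule picks the 4,000-parameter one at a cost of two accuracy points.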

  • Model building requires manual labor and profound expertise. As the traditional process of TinyML model creation is laborious and time-consuming, it often requires vast data science knowledge. From the embedded developers' perspective, a huge productivity gap exists during deployment, caused by the need to hand-craft and validate the solution after its creation. Most AI specialists use ML frameworks such as TensorFlow that build accurate models. However, these models are typically run on virtually unconstrained processors and cannot be deployed on MCUs or, even more challenging, on sensors, which are even more constrained in memory, computational power, and energy consumption. In other words, since this step cannot be overlooked, practitioners have to spend much time finding ways to validate, optimize, and compress models.
  • Models require compression after training. Model compression is an unavoidable step in reducing the size of huge, resource-intensive models so that they can be deployed and run on the target hardware, where computing and power resources are severely constrained. The most widely used compression techniques are pruning, quantization, and distillation. Many users adopt TensorFlow Lite, which deals with pruning and quantization methods, including post-training quantization, in-training quantization, post-training pruning, and post-training clustering. Model size optimization demands profound technical knowledge and can affect inference on tiny devices with optimized 8- and 16-bit support.
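As an illustration of the quantization technique mentioned above, the following sketch maps floating-point weights to 8-bit integers with a scale and a zero point, then maps them back. It is a simplified, per-tensor affine scheme written for this example and is not the TensorFlow Lite implementation; the function names and the sample weights are assumptions.

```python
def quantize_uint8(weights):
    """Affine (asymmetric) quantization of floats to integers in [0, 255]."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if every weight is zero
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit codes."""
    return [(v - zero_point) * scale for v in q]

# Hypothetical weights of a single layer.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now occupies one byte instead of the four bytes of a 32-bit float, and for this example the reconstruction error stays within about half a quantization step (`scale / 2`).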

In the second part of this article, which will be published in the next issue of the IEEE IoT newsletter, the authors will describe the alternative automated pipeline for TinyML design, cover deployment optimization for STM32 MCUs, and illustrate the proposed methodology with real use cases developed with Neuton TinyML and X-CUBE-AI.







Danilo Pau

Danilo Pau graduated in 1992 from Politecnico di Milano, Italy. One year before, he joined STMicroelectronics, where he worked on HDMAC and then MPEG2 video memory reduction, video coding, embedded graphics, and computer vision. Today, his work focuses on developing solutions for deep learning tools and applications. Danilo has been an IEEE Fellow since 2019; he served as Industry Ambassador coordinator for IEEE Region 8 South Europe, was vice-chairman of the "Intelligent Cyber-Physical Systems" Task Force within IEEE CIS, was an IEEE R8 AfI member in charge of the internship initiative and a Member of the Machine Learning, Deep Learning, and AI in the CE (MDA) Technical Stream Committee of the IEEE Consumer Electronics Society (CESoc), and is currently an AE of IEEE TNNLS. In addition, he wrote and obtained, on behalf of ST, the IEEE Milestone on Multiple Silicon Technologies on a chip, 1985, which was granted in 2021. With over 81 patents, 120 publications, 113 MPEG authored documents, and more than 47 invited talks/seminars at various worldwide Universities and Conferences, Danilo's favorite activity remains mentoring undergraduate students, MSc engineers, and Ph.D. students from various universities in Italy, the US, France, and India.


Andrey Korobitsyn

Andrey Korobitsyn is the founder and CEO at Neuton.AI, a San Jose-based provider of tiny machine learning solutions for edge devices. With over 20 years of experience in the IT business, Andrey oversees global development strategy and execution, which has enabled the company to grow from a startup to a steady market player. Andrey holds a Master's degree from the Moscow Engineering Physics Institute and a Ph.D. in Computer Science. He is also an alumnus of Stanford University Graduate School of Business and the American Institute of Business and Economics.


Dmitry Proshin

Dmitry Proshin graduated from the Penza State Technical University in 1997 with a degree in Information Processing and later received a Ph.D. in technical sciences. After graduating, he worked in the software development field to automate production processes at NPF KRUG-SOFT LLC. At Bell Integrator, LLC, he became interested in developing intelligent algorithms in the banking sector. He has developed and implemented more than five unique subsystems and algorithms. In addition, he has been part of the Neuron neural network framework development team. Dmitry is the author of more than 150 works, including 25 articles, 5 monographs, and 6 manuals on topics such as mathematical methods of information processing, modeling, programming, software systems, SCADA, MES, and ERP systems, and educational methodology. He has received 4 state registration certificates for his developments and two patents.


Danil Zherebtsov

Danil Zherebtsov is a full-stack machine learning engineer with over 8 years of experience in the field. Before joining the Neuton team in 2019, he executed various end-to-end complex machine learning projects in multiple domains: telecom, networking, retail, manufacturing, marketing, fraud detection, oil and gas, engineering, and NLP. As a head of Machine Learning & Analytics at Neuton, Danil is working on the development of an automated TinyML platform facilitating sensor and audio data processing. Danil is an active contributor to the open-source community, with over 60,000 active users of tools featured in his repository. As an inspired writer, Danil publishes articles popularizing data science and introducing new methodologies for solving engineering tasks.