🎉 Big news: LightlyTrain now supports DINOv2. Read our announcement.

Embedded COVID mask detection on an Arm Cortex-M7 processor using PyTorch

How we built a visual COVID-19 mask quality inspection prototype running on-device on an OpenMV-H7 board and the challenges on the way.

Share blog post

TL;DR

How we built a visual COVID-19 mask quality inspection prototype running on-device on an OpenMV-H7 board and the challenges on the way.

‍

TLDR; The source code to train and deploy your own image classifier can be found here: https://github.com/ARM-software/EndpointAI/tree/master/ProofOfConcepts/Vision/OpenMvMaskDefaults

‍

In the summer of 2020, we worked with Arm to build an easy-to-use tutorial on how to train and deploy an image classifier on an Arm microcontroller. In this post, we show how we approached and solved the following challenges:

Convert a PyTorch ResNet to TensorFlow and quantize it to use 8-bit integer values
Collect, select, and annotate data of faulty and non-faulty masks
Use self-supervised pre-training to boost model performance when working on fewer images.

‍

The Results to Expect

The goal of this project was to show an end-to-end workflow on how to train and deploy a convolutional neural network to an OpenMV-H7 board.

The video below showcases how our classifier detects faulty masks in real-time.

The OpenMV-H7 Board

The board consists of an STM32H743VI Arm Cortex-M7 processor running at 480MHz, multiple peripherals, and a camera module mounted on it.
The camera module has an OV7725 sensor from OmniVision and can record in VGA resolution (640x480) at 75 FPS.

Since the board has limited computing power and memory, we aimed for a very small deep learning model. We call the variant ResNet-9 since it’s more of a cut in half ResNet-18 variant. Below you can find some numbers about the model configuration, runtime, and other metrics.

Input size: 64x64x3
CPU Freq.: 480 MHz
Operations: 33.4 MOp
Model size: 90 kBytes
Inference Time: 150 ms
Operations/s: 249 MOp/s

Detailed specs can be found on the official website of OpenMV here.

A close-up picture of the OpenMV H7 Board we used.

Data Collection

Neural networks are very data-hungry. In order to efficiently collect enough training data we did the following:

We used the camera on the OpenMV-H7 board to record video sequences. With the USB interface and the OpenMV IDE, we were able to easily record the camera stream and save it as a video file.
To simulate a real production line we mounted the camera on cardboard to make sure the camera is stable. The optics point to the production line which is a metal plate with tall borders. This setup ensures, that the camera sees defect and non-defect masks within the same environment.
Finally, we moved masks through our inspection line using a combination of push and pull.

A picture of our data collection pipeline. We cut a small hole into the cardboard to clamp the USB table holding the board into it.

Data Selection and Annotation

At this stage we have multiple video files, each having captured a few minutes. The next challenge is to extract the frames and annotate the data. We use FFmpeg for the frame extraction and Lightly to select a diverse set of frames. Note that we had more than 20k frames but no time to annotate all of them. Using Lightly we selected a few hundred frames covering all relevant scenarios.
Lightly uses self-supervised learning to get good representations of the images. It then uses these representations to select the most interesting images which should be annotated. The benefit of this method is that we can access the pre-trained model and fine-tune it on only a handful of labeled images.

Example images taken with the OpenMV H7 camera showing the three labels for the data. From left to right: good mask, defect mask, no mask.

Model Fine-Tuning
To prevent the model from overfitting, we simply froze the pre-trained backbone and added a linear classification head to the model. We then trained the classifier for 100 epochs on a total of 500 annotated images.

‍

From PyTorch to Keras to TensorFlow Lite

Moving the pre-trained PyTorch model to TensorFlow Lite turned out to be the most difficult part of our endeavor.

‍

We tried out several tricks with ONNX to export our model. A simple library called pytorch2keras worked fine for a model only consisting of linear layers but not for our conv + linear model.

	import torch
	from pytorch2keras import pytorch_to_keras

	dummy_input = torch.ones(1, 3, 64, 64)

	# pytorch to keras
	keras_model = pytorch_to_keras(
	classifier,
	dummy_input,
	[(3, 64, 64)],
	change_ordering=True
	)

	keras_model.save('keras_model.h5')

view raw pytorch_classifier_to_keras.py hosted with ❤ by GitHub

The main problem we encountered, was that PyTorch uses the CxHxW (channel, height, width) format for tensors whereas TensorFlow uses HxWxC. This meant that, after transforming our model to TensorFlow Lite, the output of the layer just before the classifier was permuted, and hence, the output of the classifier was incorrect. In order to address this problem, we considered manually permuting the weights of the linear classifier.

	# permuting the weights of the last linear layer
	# allows the model to have the same outputs eventhough
	# pytorch and keras perform a different kind of flatten.
	def get_permutation(h, w, c):
	indices = []
	for i in range(h):
	for j in range(w):
	for k in range(c):
	indices.append(ih + j + kh*w)
	return np.array(indices)

	permutation = get_permutation(1, 1, 16)
	# get weights of last layer (linear layer)
	temp_weights, temp_bias = k_model.layers[-1].get_weights()
	# rearrange the weights using the permuatation
	temp_weights = temp_weights[permutation]
	k_model_perm.layers[-1].set_weights((temp_weights, temp_bias))

view raw permute_weights_keras.py hosted with ❤ by GitHub

However, we decided to go for a simpler solution. We pooled the output of the last convolutional layer into a Cx1x1 shape. That way, changing the order of the channels does not affect the output of the neural network.

‍

The final step is to quantize and export the Keras model to TensorFlow Lite. In our case quantization reduces the model size and speeds up running the model in inference at the cost of a few percent lower accuracy.

	import tensorflow.compat.v1.lite as lite

	# initialize converter
	converter = lite.TFLiteConverter.from_keras_model_file('keras_model.h5')
	# we want to use 8-bit quantization
	converter.inference_type = lite.constants.QUANTIZED_UINT8
	input_arrays = converter.get_input_arrays()
	converter.quantized_input_stats = {
	input_arrays[0]: (
	-25, # we experimented with various values and +-25 worked best for our model
	25,
	)
	}
	converter.default_ranges_stats = (
	0,
	255,
	)

	# converte and save checkpoint
	tflite_model = converter.convert()
	open('quantized_model.tflite', 'wb').write(tflite_model)

view raw keras_to_tflite.py hosted with ❤ by GitHub

Special thanks to our collaborators at Arm and Philipp Wirth from Lightly for making this project possible. The full source code is available here. You can easily train your own classifier and run it on an embedded device. Feel free to reach out or leave a comment if you have any questions!

‍

Igor, co-founder
Lightly.ai

‍

See Lightly in Action

Curate data, train foundation models, deploy on edge today.

Book a demo

Get Started with Lightly

Talk to Lightly’s computer vision team about your use case.

Book a Demo

Stay ahead in computer vision

Get exclusive insights, tips, and updates from the Lightly.ai team.

Embedded COVID mask detection on an Arm Cortex-M7 processor using PyTorch

Table of contents

Share blog post

The Results to Expect

The OpenMV-H7 Board

Data Collection

Data Selection and Annotation

From PyTorch to Keras to TensorFlow Lite

See Lightly in Action

Get Started with Lightly

Stay ahead in computer vision

Related Articles

Reinforcement Learning Guide: Algorithms, Applications, and Techniques

Understanding 3D Object Detection And Its Applications

Scanning Luggage for Dangerous Items with LightlyTrain