DEFADAS
FPGA-Accelerated DNNs for Automotive Accelerators Systems
Summary
Autonomous vehicles are already a technical reality that is evolving very fast. Their growing intelligence relies on deep neural networks (DNNs), which require high-performance yet energy-efficient computing platforms. Recently, reconfigurable logic devices such as FPGAs have provided a means to meet these requirements and accelerate DNN execution.
Moreover, these devices offer flexible hardware solutions that can be exploited for system improvement, error correction, or readjustments required by foreseen or unforeseen vehicle operating conditions. However, the ever-growing need to reduce costs and time-to-market in the automotive sector, together with the human inability to deliver complex yet error-free solutions, poses a serious challenge to the provision of safe and secure FPGA-accelerated DNNs.
On the one hand, the goal of DEFADAS is to design and implement adaptive fault tolerance (FT) strategies based on reconfigurable logic. It is not simply a matter of tolerating faults that may affect hardware accelerators, but also of allowing those accelerators to evolve, changing their functional capabilities, their FT mechanisms, or both as needs arise, without reconfiguring the entire system each time. In this way, we will be able to offer higher levels of protection and reduce the impact that increasingly frequent hardware faults, especially those affecting FPGAs, have on DNN processing.
On the other hand, the adoption of novel adaptive FT mechanisms by industry will largely depend on the capacity of DEFADAS to verify, and later certify, their correct operation under changing fault hypotheses and operational conditions. In this context, fault injection techniques are privileged assessment instruments. However, most existing solutions assume that hardware is immutable and are therefore unsuitable for DEFADAS research; they must be revised and renewed. Likewise, the resulting accelerators must not only be more dependable but also offer higher throughput, lower power consumption, and lower cost. Therefore, dependability benchmarking solutions are required to enable the comparison and optimisation of the available implementation alternatives.
All these challenges are aligned with those currently addressed by the ever-changing automotive industry. At the design level, the need for flexible and evolving hardware platforms has recently driven changes of ownership among major reconfigurable logic manufacturers, and some companies, like Tesla, are already integrating FPGA-based chips for autonomous driving into their vehicles. However, the FT mechanisms currently provided are still static and based on redundancy, so DEFADAS has a great opportunity to innovate and offer more effective solutions based on dynamic partial reconfiguration. At the certification level, existing standards such as ISO 26262 need to evolve and take all current innovations in the field into consideration. When this happens, DEFADAS will be in an unbeatable position to point out problems, provide solutions, and propose certification strategies inspired by its research on fault injection and benchmarking.
This work has been supported by the Grant PID2020-120271RB-I00, funded by:
Publications:
- I. Tuzov, D. de Andrés, J. C. Ruiz, “Reversing FPGA architectures for speeding up fault injection: does it pay?”, EDCC 2022, European Dependable Computing Conference, Zaragoza, Spain, September 12–15, 2022, ISBN: 978-1-6654-7402-3, pp. 81-88.
- I. Tuzov, D. de Andrés, J. C. Ruiz, C. Hernández, “BAFFI: a bit-accurate fault injector for improved dependability assessment of FPGA prototypes”, DATE 2023, Design, Automation and Test in Europe Conference, Antwerp, Belgium, April 17–19, 2023, ISBN: 978-3-9819263-7-8.
- J. Gracia-Morán, L.J. Saiz-Adalid, “Análisis del impacto de la inclusión de Códigos Correctores de Errores en un Sistema Empotrado basado en Arduino” [Analysis of the impact of including Error Correcting Codes in an Arduino-based Embedded System], VI Jornadas de Computación Empotrada y Reconfigurable (JCER2022), Jornadas SARTECO, pp. 713-718, ISBN: 978-841302185-0, Alicante, Spain, September 2022.
- J. Gracia-Morán, J.C. Baraza-Calvo, D. Gil-Tomás, P.J. Gil-Vicente, L.J. Saiz-Adalid, “Evaluación de un Microprocesador RISC con capacidad de tolerancia a fallos” [Evaluation of a fault-tolerant RISC microprocessor], VI Jornadas de Computación Empotrada y Reconfigurable (JCER2022), Jornadas SARTECO, pp. 727-734, ISBN: 978-841302185-0, Alicante, Spain, September 2022.
- J. Gracia-Morán, A. Vicente-García, L.J. Saiz-Adalid, “Protección de comunicaciones entre vehículos autónomos mediante el uso de códigos de corrección de errores” [Protecting communications between autonomous vehicles using error correction codes], VII Jornadas de Computación Empotrada y Reconfigurable (JCER2023), Jornadas SARTECO, pp. 683-688, ISBN: 9788409544660, Ciudad Real, Spain, September 2023.
- Juan Carlos Ruiz García, David de Andrés Martínez and Joaquín Gracia Morán, “Evaluación de la robustez de una red neuronal desarrollada para generar un acelerador HW” [Robustness assessment of a neural network developed to generate a HW accelerator], VII Jornadas de Computación Empotrada y Reconfigurable (JCER2023), Jornadas SARTECO, pp. 689-698, ISBN: 9788409544660, Ciudad Real, Spain, September 2023.
- J. Gracia-Morán, J.C. Ruiz-García, L.J. Saiz-Adalid, “Protección de comunicaciones entre vehículos autónomos mediante el uso de códigos de corrección de errores” [Protecting communications between autonomous vehicles using error correction codes], VII Jornadas de Computación Empotrada y Reconfigurable (JCER2023), Jornadas SARTECO, pp. 699-706, ISBN: 9788409544660, Ciudad Real, Spain, September 2023.
- J. Gracia-Morán, P. Martín-Tabares, C. Martínez-Ruiz, L.J. Saiz-Adalid, “Analysis of overheads caused by adding Error Correction Codes in Embedded Systems”, Workshop on Innovation on Information and Communication Technologies (WIICT 2022), pp. 8-15, ISBN: 978-84-09-46075-5, Valencia, Spain, July 2022.
- J. Gracia-Morán, L.J. Saiz-Adalid, J.C. Baraza-Calvo, D. Gil-Tomás, P.J. Gil-Vicente, “Tolerating Double and Triple Random Errors with Low Redundancy Error Correction Codes”, Workshop on Innovation on Information and Communication Technologies (WIICT 2022), pp. 16-30, ISBN: 978-84-09-46075-5, Valencia, Spain, July 2022.
- J. Gracia-Morán, L.J. Saiz-Adalid, J.C. Baraza-Calvo, D. Gil-Tomás, P.J. Gil-Vicente, “Comparison of the overheads provoked by the inclusion of different Error Correction Codes in Embedded Systems”, Workshop on Innovation on Information and Communication Technologies (WIICT 2023), Valencia, Spain, July 2023.
- D. de Andrés, J.-C. Ruiz, “Hardware Accelerating a Convolutional Neural Network Using High-Level Synthesis”, Workshop on Innovation on Information and Communication Technologies (WIICT 2023), Valencia, Spain, July 2023.
- Juan Carlos Ruiz, David de Andrés, Luis José Saiz-Adalid and Joaquín Gracia-Morán, “Zero-Space In-Weight and In-Bias Protection for Floating-Point-based CNNs”, accepted at the 19th European Dependable Computing Conference (EDCC), Leuven, Belgium, April 2024.
- J. Gracia-Morán, L.J. Saiz-Adalid, J.C. Baraza-Calvo, D. Gil-Tomás, P.J. Gil-Vicente, “A Proposal of an ECC-based Adaptive Fault-Tolerant Mechanism for 16-bit data words”, IEEE Latin America Transactions, Vol. 22, no. 5, pp. 418-427, May 2024.
- D. Gil-Tomás, L. J. Saiz-Adalid, J. Gracia-Morán, J. Carlos Baraza-Calvo and P. J. Gil-Vicente, “A Hybrid Technique Based on ECC and Hardened Cells for Tolerating Random Multiple-Bit Upsets in SRAM Arrays”, IEEE Access, vol. 12, pp. 70662-70675, 2024, doi: 10.1109/ACCESS.2024.3402532.
- J. Gracia-Morán, L.J. Saiz-Adalid, “Protección mediante Códigos de Corrección de Errores de los pesos de una Red Neuronal implementada en Arduino” [Protecting the weights of a Neural Network implemented on Arduino with Error Correction Codes], VIII Jornadas de Computación Empotrada y Reconfigurable (JCER2024), Jornadas SARTECO, pp. 813-822, June 2024.
- Juan Carlos Ruiz, David de Andrés, Luis-J. Saiz-Adalid and Joaquín Gracia-Morán, “Tolerancia a fallos múltiples en redes convolucionales en coma flotante de 16 bits utilizando códigos correctores de errores” [Tolerating multiple faults in 16-bit floating-point convolutional networks using error correcting codes], VIII Jornadas de Computación Empotrada y Reconfigurable (JCER2024), Jornadas SARTECO, pp. 823-832, June 2024.
- J. Gracia-Morán, L.J. Saiz-Adalid, J. C. Ruiz-García, D. de Andrés Martínez, “Estudio de la confiabilidad de una red neuronal convolucional cuantizada” [Dependability study of a quantized convolutional neural network], VIII Jornadas de Computación Empotrada y Reconfigurable (JCER2024), Jornadas SARTECO, pp. 925-931, June 2024.
- J. Gracia-Morán, J. Bazán-Andría, Juan Carlos Ruiz, David de Andrés, L.J. Saiz-Adalid, “Analysis of the impact of faults in a Convolutional Neural Network implemented in a Raspberry Pi”, Proceedings of the Workshop on Innovation on Information and Communication Technologies (ITACA-WIICT 2024), pp. 29-36, ISBN: 978-84-09-66630-0, July 2024.
- J. Gracia-Morán, L.J. Saiz-Adalid, J.C. Baraza-Calvo, D. Gil-Tomás, P.J. Gil-Vicente, “Improving the efficiency of Matrix Codes using Hsiao Codes”, Proceedings of the Workshop on Innovation on Information and Communication Technologies (ITACA-WIICT 2024), pp. 12-28, ISBN: 978-84-09-66630-0, July 2024.
- Juan Carlos Ruiz, David de Andrés, Luis-J. Saiz-Adalid, and Joaquín Gracia-Morán, “In-Memory Zero-Space Floating-Point-based CNN Protection Using Non-Significant and Invariant Bits”, in: Ceccarelli, A., Trapp, M., Bondavalli, A., Bitsch, F. (eds.) Computer Safety, Reliability, and Security. SAFECOMP 2024. Lecture Notes in Computer Science, vol. 14988, pp. 3-17, Springer, Cham, September 2024.
- Joaquín Gracia-Morán, Juan Carlos Ruiz, David de Andrés, Luis-J. Saiz-Adalid, “Allocating ECC parity bits into BF16-encoded CNN parameters: A practical experience report”, 13th Latin-American Symposium on Dependable and Secure Computing (LADC 2024), pp. 75-80, ISBN: 979-8-4007-1740-6, 2024.
Tools and models:
All open-source tools, libraries and models developed in the project are publicly available in the DEFADAS GitLab repository (https://git.upv.es/defadas) under an MIT license.
- MiniLenetPython (https://git.upv.es/defadas/MiniLenetPython): Defines the PyTorch model of the LeNet-5 CNN (floating-point and quantized versions), which can be trained and tested. Once the model is trained, its weights and biases, as well as the output of every layer for a set of images (batch_size) and the associated predictions and labels, can be exported to human-readable text files. These files can then be processed automatically to generate C++ header files that ease the implementation and testing of a C++ model of the CNN.
- CnnFaultInjectionLibraryC (https://git.upv.es/defadas/CnnFaultInjectionLibraryC): Library that provides the functions required to study the behaviour of CNNs described in C++, both in the absence and in the presence of faults in the parameters that characterize their layers (weights, biases, zero points, …). The accompanying README file describes how to create the C++ models for target CNNs. Models already exist for the following CNNs:
- LeNet-5 (float version): https://git.upv.es/defadas/Lenet5Float
- LeNet-5 (quantized version, int8).
- LeNet-5 protected with a SEC(32, 26) ECC.
- LeNet-5 protected with a SEC(23, 18) ECC.
- LeNet-5 protected with a SEC(13, 9) ECC.
- LeNet-5 protected with a DEC(32, 21) ECC.
- LeNet-5 protected with a DEC(28, 18) ECC.
- LeNet-5 protected with a DEC(17, 9) ECC.
- The C++ CNN models assessed with CnnFaultInjectionLibraryC have been implemented using High-Level Synthesis (HLS) tools on the Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC of the Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit (https://www.xilinx.com/products/boards-and-kits/zcu104.html).
- LeNet-5 (float version) for HLS: https://git.upv.es/defadas/Lenet5FloatHLS
- LeNet-5 (quantized version, int8) for HLS.
- LeNet-5 protected with a SEC(32, 26) ECC for HLS.
- LeNet-5 protected with a SEC(23, 18) ECC for HLS.
- LeNet-5 protected with a SEC(13, 9) ECC for HLS.
- LeNet-5 protected with a DEC(32, 21) ECC for HLS.
- LeNet-5 protected with a DEC(28, 18) ECC for HLS.
- LeNet-5 protected with a DEC(17, 9) ECC for HLS.
- PyTorchFI (https://git.upv.es/defadas/PyTorchFI): Utility scripts for testing how to insert ECCs into pre-trained PyTorch models and how to manage fault injection campaigns on them. Still under development, so it may be unstable.
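The SEC and DEC codes listed above add check bits to each protected data word. To illustrate how a single-error-correcting code works, the following Python sketch implements a textbook shortened Hamming code. It is not taken from the DEFADAS repositories, and the function names (`num_check_bits`, `hamming_encode`, `hamming_decode`) are hypothetical. Check bits occupy the power-of-two positions of the codeword, and the syndrome computed on decoding equals the position of a single flipped bit. With 9 and 18 data bits this construction needs 4 and 5 check bits, which matches the SEC(13, 9) and SEC(23, 18) sizes, but the codes actually used in DEFADAS, and in particular the DEC codes, may be built differently.

```python
def num_check_bits(k: int) -> int:
    """Smallest r such that a Hamming SEC code covers k data bits: 2**r >= k + r + 1."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def hamming_encode(data: list[int]) -> list[int]:
    """Encode data bits into a (k + r)-bit codeword.

    Codeword positions are 1-based: powers of two hold check bits,
    all other positions hold the data bits in order.
    """
    k = len(data)
    r = num_check_bits(k)
    n = k + r
    code = [0] * (n + 1)  # index 0 unused so indices equal positions
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so it is a data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Check bit p makes the XOR over all positions with bit i set equal 0.
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
    return code[1:]


def hamming_decode(codeword: list[int]) -> tuple[list[int], int]:
    """Correct at most one flipped bit and return (data bits, syndrome)."""
    n = len(codeword)
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    if 0 < syndrome <= n:
        code[syndrome] ^= 1  # a single error sits at position `syndrome`
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, syndrome


# Usage: protect 9 data bits, flip one codeword bit, and recover the data.
data = [1, 0, 1, 1, 0, 0, 1, 0, 1]
codeword = hamming_encode(data)
corrupted = list(codeword)
corrupted[6] ^= 1  # fault at codeword position 7 (0-based index 6)
decoded, syndrome = hamming_decode(corrupted)
print(len(codeword), decoded == data, syndrome)  # 13 True 7
```

Any single flipped bit in the 13-bit codeword is corrected, and the nonzero syndrome identifies the faulty position. Two or more flipped bits can be miscorrected by such a code, which is the gap that DEC codes are meant to close.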
Datasets:
All collected data are available in the Zenodo repository, following the FAIR principles.
- Robustness assessment of a C++ implementation of the LeNet-5 convolutional neural network (https://doi.org/10.5281/zenodo.10200323): Raw data obtained from running exhaustive fault injection campaigns on a C++ description of the LeNet-5 architecture for all considered fault models (single bit-flips, and single, double-adjacent, and triple-adjacent stuck-at-0 and stuck-at-1), targeting all considered locations (all bits from all weights and biases from all LeNet-5 layers) and for all the images in the workload (images 200-249 from the MNIST dataset).
- Robustness assessment of a C++ implementation of a quantized (int8) version of the LeNet-5 convolutional neural network (https://doi.org/10.5281/zenodo.10196616): Raw data obtained from running exhaustive fault injection campaigns on a C++ description of the quantized LeNet-5 architecture for all considered fault models (single, double-adjacent, and triple-adjacent bit-flips, stuck-at-0, and stuck-at-1), targeting all considered locations (all bits of all weights, biases, zero points, and m values from all LeNet-5 layers) and for all the images in the workload (images 200-249 from the MNIST dataset).
- Determining non-significant bits on a C++ implementation of the LeNet-5 convolutional neural network to be used for storing error correcting codes to protect weights and biases. Robustness assessment of the network after integrating the proposed codes (https://doi.org/10.5281/zenodo.10201431): Raw data obtained from running directed fault injection campaigns on a C++ description of the LeNet-5 architecture to determine the least significant bits of weights and biases that could be used to store the proposed error correcting codes, and raw data obtained from statistical fault injection campaigns on the same C++ description after deploying six different error correcting codes to protect weights and biases under different policies derived from the previously obtained results.
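The fault models in these datasets act on the bit pattern of each stored parameter. Assuming that the float version of LeNet-5 stores weights and biases as IEEE 754 single-precision values, the Python sketch below shows how a bit-flip, stuck-at-0 or stuck-at-1 fault on one bit, or on two or three adjacent bits, alters a value. It is a hypothetical helper written for illustration, not the API of CnnFaultInjectionLibraryC, and the model names `"bitflip"`, `"stuck0"` and `"stuck1"` are invented.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def inject(x: float, bit: int, model: str, width: int = 1) -> float:
    """Apply a fault to `width` adjacent bits of x, starting at `bit` (0 = LSB).

    width = 1, 2 or 3 corresponds to single, double-adjacent or
    triple-adjacent faults. `model` is one of the hypothetical names
    "bitflip", "stuck0" or "stuck1".
    """
    if width < 1 or not 0 <= bit <= 32 - width:
        raise ValueError("fault mask must fit inside the 32-bit word")
    mask = ((1 << width) - 1) << bit
    b = float_to_bits(x)
    if model == "bitflip":
        b ^= mask   # invert the target bits
    elif model == "stuck0":
        b &= ~mask  # force the target bits to 0
    elif model == "stuck1":
        b |= mask   # force the target bits to 1
    else:
        raise ValueError(f"unknown fault model: {model}")
    return bits_to_float(b)


print(inject(1.0, 31, "bitflip"))  # -1.0 (sign bit flipped)
print(inject(1.0, 30, "bitflip"))  # inf (top exponent bit flipped)
print(inject(1.0, 23, "stuck0"))   # 0.5 (lowest exponent bit forced to 0)
print(inject(1.0, 23, "stuck1"))   # 1.0 (bit already 1, so the fault is latent)
```

A fault in the sign or exponent bits can change a weight drastically, whereas a stuck-at fault has no effect when the affected bits already hold the stuck value, and flips in the lowest mantissa bits only perturb the value slightly. That difference between bit positions is what makes some low-order bits candidates for storing check bits, as studied in the last dataset above.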