20th IEEE International Conference on Dependable, Autonomic and Secure Computing, 20th IEEE International Conference on Pervasive Intelligence and Computing, 7th IEEE International Conference on Cloud and Big Data Computing, 2022 IEEE International Conference on Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2022, Falerna, Italy, 12-15 September 2022
The vulnerability of modern computing systems to soft errors has motivated a variety of fault tolerance techniques, which can be implemented at the hardware and software layers. While these techniques improve reliability, they introduce additional costs that some systems cannot tolerate, and several studies in the literature aim to reduce these costs. In this study, we monitor soft error propagation throughout execution and propose simple, relatively inexpensive methods to slow down the error propagation curves. We consider matrix multiplication as the target multi-threaded application and use parallelization-based versions of it that vary the number of threads and the loop parallelization options. Fault injection experiments show that these methods effectively reshape the error propagation curves. Because the reshaping can be applied at runtime, switching between versions during operation helps balance reliability and performance while using limited resources more efficiently.
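As a rough illustration of the kind of parallelization-based versions described above, the following Python sketch multiplies two square matrices with a configurable number of threads (`num_threads`) and a choice of which loop is split among those threads (`parallel_loop`: rows `"i"` or columns `"j"`). It also includes a single-bit-flip helper and a corrupted-element counter of the sort a fault injection experiment might use. All function names and parameters are hypothetical and chosen for illustration; this is not the authors' implementation or fault injection framework. Because of CPython's global interpreter lock, it shows the structure of the parallelization rather than its performance.

```python
import struct
import threading


def matmul_threaded(A, B, num_threads=2, parallel_loop="i"):
    """Compute C = A x B for square matrices (lists of lists of floats).

    num_threads   -- number of worker threads (hypothetical knob)
    parallel_loop -- "i": threads split the row loop of C
                     "j": threads split the column loop of C
    """
    if parallel_loop not in ("i", "j"):
        raise ValueError("parallel_loop must be 'i' or 'j'")
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    n = len(A)
    C = [[0.0] * n for _ in range(n)]

    def dot(i, j):
        # Inner k-loop: always summed in the same order, so every version
        # produces bit-identical results in a fault-free run.
        return sum(A[i][k] * B[k][j] for k in range(n))

    def worker(start, end):
        # Each thread owns the half-open index range [start, end) of the
        # parallelized loop, so no two threads write the same element of C.
        for x in range(start, end):
            for y in range(n):
                if parallel_loop == "i":
                    C[x][y] = dot(x, y)
                else:
                    C[y][x] = dot(y, x)

    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    threads = [
        threading.Thread(
            target=worker,
            args=(min(n, t * chunk), min(n, (t + 1) * chunk)),
        )
        for t in range(num_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return C


def flip_bit(value, bit):
    """Return `value` with one bit of its IEEE 754 binary64 encoding flipped."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def count_corrupted(golden, observed):
    """Number of output elements that differ from the fault-free (golden) run."""
    return sum(
        1
        for g_row, o_row in zip(golden, observed)
        for g, o in zip(g_row, o_row)
        if g != o
    )


if __name__ == "__main__":
    n = 4
    A = [[float(i + j + 1) for j in range(n)] for i in range(n)]
    B = [[1.0] * n for _ in range(n)]
    golden = matmul_threaded(A, B, num_threads=1)

    faulty_A = [row[:] for row in A]
    faulty_A[0][0] = flip_bit(faulty_A[0][0], 52)  # 1.0 -> 0.5
    faulty = matmul_threaded(faulty_A, B, num_threads=2, parallel_loop="j")
    print("corrupted elements:", count_corrupted(golden, faulty))
```

In this toy run, flipping bit 52 (the lowest exponent bit) of `A[0][0] = 1.0` turns it into `0.5`, so all four elements of row 0 differ from the golden run and the script prints `corrupted elements: 4`. In the sketch, switching versions at runtime simply means calling `matmul_threaded` with different `num_threads` and `parallel_loop` arguments on successive invocations. Which configuration actually slows error propagation is the empirical question that the fault injection experiments address.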