Soft Error Characterization on Scientific Applications

Ozturk Z., TOPCUOĞLU H. R., Arslan S., Kandemir M. T.

16th IEEE Int Conf on Dependable, Autonom and Secure Comp/16th IEEE Int Conf on Pervas Intelligence and Comp/4th IEEE Int Conf on Big Data Intelligence and Comp/3rd IEEE Cyber Sci and Technol Congress (DASC/PiCom/DataCom/CyberSciTech), Athens, Greece, 12 - 15 August 2018, pp.592-599


Decreasing transistor sizes, aggressive power optimization techniques, and higher operating frequencies lead to increased error rates. While researchers have addressed reliable computing, there is still a lack of studies that provide a fundamental understanding of error propagation. In this work, we characterize error propagation at the software level using an error propagation speed metric. It is validated on a set of commonly used iterative solvers, where the speed of error propagation is modeled for different input-output pairs. Additionally, we study two methods for slowing down error propagation. First, we examine whether the algorithmic choice of sorting method has an impact on error propagation profiles; experimental results show that sorting algorithms differ in their error propagation patterns. Second, we consider different loop transformation techniques for slowing down error propagation. Specifically, while loop tiling causes a significant change in error propagation, the impact of loop unrolling is negligible for the given applications.
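
To make the idea of software-level error propagation concrete, the following is a minimal illustrative sketch, not the metric or experimental setup used in the paper. It assumes a single bit flip injected into one element of a Jacobi iteration on a small tridiagonal system, and it counts how many solution elements differ from a fault-free ("golden") run after each iteration. The helper names (flip_bit, jacobi_step, corrupted_counts), the test system, and the choice of "number of corrupted elements per iteration" as a proxy for propagation speed are all assumptions made for illustration.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of a 64-bit float."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def jacobi_step(x, b):
    """One Jacobi sweep for a tridiagonal system: 4 on the diagonal, -1 off it."""
    n = len(x)
    new = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        new.append((b[i] + left + right) / 4.0)
    return new


def corrupted_counts(n=16, iterations=6, fault_iter=1, fault_index=8, bit=52):
    """Run golden and faulty solves side by side (hypothetical setup).

    Returns, per iteration, the number of elements whose faulty value
    differs from the golden value.
    """
    b = [1.0] * n
    golden = [0.0] * n
    faulty = [0.0] * n
    counts = []
    for it in range(iterations):
        golden = jacobi_step(golden, b)
        faulty = jacobi_step(faulty, b)
        if it == fault_iter:
            # Inject a single soft error into one element of the faulty run.
            faulty[fault_index] = flip_bit(faulty[fault_index], bit)
        counts.append(sum(1 for g, f in zip(golden, faulty) if g != f))
    return counts


if __name__ == "__main__":
    print(corrupted_counts())
```

In this toy setting the corrupted region widens by one neighbor per sweep, because each Jacobi update reads only the two adjacent elements. A different data access pattern (for example, one produced by a loop transformation) would change how quickly the count grows, which is the kind of effect the abstract describes.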