Analyzing fault behavior of shared data in parallel applications


Oz I.

MICROPROCESSORS AND MICROSYSTEMS, vol.45, pp.67-80, 2016 (Journal Indexed in SCI) identifier identifier

  • Publication Type: Article / Article
  • Volume: 45
  • Publication Date: 2016
  • Doi Number: 10.1016/j.micpro.2016.03.014
  • Title of Journal : MICROPROCESSORS AND MICROSYSTEMS
  • Page Numbers: pp.67-80

Abstract

Multicore architectures are becoming the most promising computing platforms thanks to their high performance. The soft error rate in multicore systems increases by the trend in the transistor sizes and the reduction of the voltage of the transistors. Evaluating the impact of soft errors on parallel applications is critical to understand the fault characteristics and to decide the fault tolerance strategies for the reliable execution. In this paper, we examine the soft error vulnerabilities of shared data in parallel Java applications. To analyze fault behavior of shared data in parallel programs, we design and implement a bytecode instrumentation based analysis and fault injection framework. We evaluate the fault behavior of shared data fields on a set of parallel applications from NAS benchmark suite. Our experimental evaluation demonstrates data type and access characteristics of the shared fields, and shows that shared data structures of parallel applications are more vulnerable to soft errors. While error rates for unshared local data stay around 20% in our target applications, the rate for shared data exceeds above 30% for some applications. We further discuss potential directions of our results and how shared data analysis can be employed to apply partial fault tolerance techniques. (C) 2016 Elsevier B.V. All rights reserved.