Paper Details
Reference:
Alex Depoutovitch and Michael Stumm,
"Software error early detection system based on run-time statistical analysis of function return values",
In Proceedings of the First Workshop on Hot Topics in Autonomous Computing, Dublin, Ireland, June, 2006, pp. 17–21.
Abstract:
Large software systems are extremely complex and based on code that is constantly changing with bug fixes and new features. As a result, these systems will likely never be free of bugs. The bugs typically don't expose themselves until they are triggered by a new workload, and when triggered, they are rarely immediately fatal, but result in a system that continues to run with corrupt internal state, deteriorating over time to the point where it becomes inoperable. Having a method to identify corrupt state early would allow the initiation of defensive actions such as flushing page caches or redirecting external requests to another service in the cluster.
In this paper, we propose a statistical method of detecting problems in software at run-time based on analyzing function return values. The methodology, at this time, requires the availability of source code, but does not require understanding the source code. Our experimental results indicate that our method can be effective in identifying problems early on, potentially allowing for defensive measures. The overhead is negligible at less than 1%.
Keywords:
fault prediction, fault tolerance, autonomic computing
Reference Info:
ACMid: 1973396
BibTeX:
@inproceedings(Depoutovitch-HotACI06,
  author    = {Alex Depoutovitch and Michael Stumm},
  title     = {Software error early detection system based on run-time statistical analysis of function return values},
  booktitle = {Proceedings of the First Workshop on Hot Topics in Autonomous Computing},
  location  = {Dublin, Ireland},
  month     = {June},
  year      = {2006},
  pages     = {17--21},
  keywords  = {fault prediction, fault tolerance, autonomic computing}
)