Fine Grained Statistical Debugging for the Identification of Multiple Bugs

Download Full-Text PDF Cite this Publication

Text Only Version

Fine Grained Statistical Debugging for the Identification of Multiple Bugs

Pruthviraj P

Electrical And Electronics Enginerring PES Univesity

Bengaluru

Abstract:- Commercial software ships with undetected bugs despite the combined efforts of programmers, sophisticated bug detection tools and extensive testing. So, the identification and localization of the bugs in the software becomes essential issues in program debugging. Traditional software debugging is a difficult task to accomplish which requires a lot of time, effort and very good understanding of the source code. Given the scale and complexity of the job, automating the process of program debugging is very essential. Our approach aims at automating the process of program debugging. The earlier proposed approaches namely, statistical debugging and decision tree were able to identify only the most frequently occurring bugs and they failed to identify masked, simultaneously and non-frequently occurring bugs. We propose two approaches: one is decision tree based and the other uses bi-clustering for the task. The results obtained by our proposed approaches showed great improvements in the results in terms of purity, mis-classification rate and over splitting. Our proposed approaches were able to identify all the bugs present in the software including the masked and non- frequently occurring bugs.

Keywords Predicates, Predictors, Instrumentation, Purity, Mis- classification Rate, Oversplitting, Faulty Control Paths, Bug Localization, Bug Correction Introduction

  1. INTRODUCTION

    In a software cycle, more than 50% of the total cost may be spent on testing and debugging phases to ensure the quality of the software. So, developing effective and efficient debugging modules has become one of the important research topics. In standard terminology, program errors are defined as inappropriate actions committed by a programmer or a designer and bugs are the manifestations and results of errors during coding of the program. A program failure occurs when an unexpected result is obtained while executing the program due to the presence of bugs. In our setting, for the purpose of debugging, the programs are instrumented with predicates. A predicate is a boolean-valued expression over program variables. It can be a simple predicate or a compound predicate which is a boolean combination of simple predicates. We have used biclustering technique. The biclustering, co-clustering, or two-mode clustering is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix. Two major steps involved in program debugging process are: localizing and correcting bugs in the software.

      1. BUG LOCALIZATION AND CORRECTION

        Bug localization is a step towards automated debugging. Code which is not related with the bugs is filtered out and the remaining code is checked for the presence of bugs. Consider a huge software like Rhythmbox which is a graphical open source music player on Linux operating system which has

        approximately 60,000 lines of code. In such a huge software, if some error occurs, manual debugging becomes a tedious and very difficult job. It requires a lot of programmers time, effort and very good understanding of the source code. Effective bug localization techniques will save much of developers time and also they give insight on what caused the bug. In our setting, the bug localization module correctly finds bug predicting predicates. Then, the bug correcting module investigates the source code of the software to return the set of all faulty paths covered by these set of predicates. The set of faulty paths are then investigated to find out fault inducing transitions in the program and appropriate changes are made in the source code of the program to eliminate all of the bugs. The final bug correction is a manual process. The process of localizing the bugs is the most difficult among the two which needs to be automated. The focus of our work is to develop methods that automatically localize the bugs and thus help in debugging..

      2. Overview of the Framework

    Figure 1.1 shows the framework which we have used for program debugging. The debugging framework has several phases which are explained below. Instrumentation: This phase starts with a source-to-source transformation of a given buggy program. This transformation creates a lot of instrumentation sites in the program and instruments the program with predicates to collect data about the truth values of these predicates at particular program points. There are five categories of predicates that are looked for Avoid combining SI and CGS units, such as current in amperes and magnetic field in oersteds. This often leads to confusion because equations do not balance dimensionally. If you must use mixed units, clearly state the units for each quantity that you use in an equation.

    Figure 1.1: Program Debugging Framework

    • Branches: For every conditional statement two predicates indicating whether true or false branch was taken are tracked.

    • Loops: The predicate information related to while(condition), for(; ; ), do{}while(condition) loops is collected.

    • Returns: At each scalar returning function call site, six predicates are tracked indicating whether the return value is: 0, 0,=,, 0 or >0

    • Scalar Pairs: At each assignment statement x=< some_value>, identify each same-type in- scope variable yi and each constant expression cj . For each yi and cj , six predicates on the new value of x: <,

      0,=,, 0 or >0 are tracked. Each (x, yi) and (x, cj ) pair is treated as a new instrumentation site i.e. a single assignment statement is associated with multiple distinct instrumentation sites.

    • Pointer Null Test: Has two predicates and checks whether a pointer variable p_var==NULL or p_varNULL.

    During run time the truth values of these predicates are recorded. A feedback report is generated for each run of the program. Each feedback report is a binary vector with a bit reserved for each predicate indicating whether the predicate was observed or not. Each feedback report also has a bit indicating whether the run was successful or failed. Since the predicates instrumented are automatically generated, most of them do not provide any useful information regarding the bugs. So, the primary task is to select the most useful bug predicting predicates from the set of instrumented predicates.

    Shipping: The application is shipped to the end users who will then install the software on their systems based on the requirements. Feedback reports are collected and sent from the end users to the developers for both successful runs as well as for the failed runs. The feedback reports sent from the end users are stored in feedback database.

    Program Debugging: This is the heart of whole program debugging framework. It makes use of the feedback reports collected from thousands of users and uses data mining and statistical techniques to identify and localize the bugs in the software. Finally, it returns a set of predicates which are directly associated with bugs.

    Bug Localization: Program debugging phase will return a set of predicates which correctly identify the bugs. If the source code is huge, then the number of predicates instrumented will be high as well and the set of predicates returned by the program debugging phase will also be high. This again requires a lot of manual work to actually localize bugs. The bug localization phase uses data mining techniques to give a set of faulty paths coveredby these set of predicates containing fault inducing transitions in the program. This further reduces the amount of manual work needed for program debugging.

    Source Modification: Once the set of faulty control paths are found, appropriate changes are made in the source code to eliminate all the bugs and the software is redistributed. Now,

    new predicates may also be instrumented in the source code for observing changed behavior of the program.

  2. PROBLEM STATEMENT

    In the presence of multiple and simultaneously occurring bugs, the most frequently occurring bugs will dominate and they will mask other non frequently occurring bugs. Recent work in the field of program debugging aims at finding out the most frequently occurring bugs. The earlier approaches, namely statistical debugging and decision tree fail to identify multiple, simultaneously and non-frequently occurring bugs. Our current work aims at finding out all the bugs present in the software including multiple, simultaneously and non-frequently occurring bugs. Our approach makes use of data mining and statistical techniques in achieving the goal.

    We now introduce some terminology. P – Set of instrumented predicates

    R – Set of feedback reports where each report is a binary vector with a bit reserved for each predicate indicating whether the predicate was observed or not during the run.

    S/F – Each feedback report has a bit indicating whether the run was successful or failed. The final outcome of the debugging algorithm is a subset of the powerset of failed runs. Table 2.1 gives a sample set of feedback reports. Here, rows represent feedback reports, columns represent predicates and the last column represents the exit status of the program.

    We propose two algorithms: one is based on decision tree method and the other uses biclustering. The evaluation of the algorithms are done based on the following measures

    Table 2.1: Debugging Information

    Figure 2.1: Purity Checking

    Purity: Purity with respect to a set of failed runs tells whether the set has bugs from only one bug class. If the set has bugs from only one bug class, the set is treated as pure. Otherwise, it is treated as impure. In figure 2.1, Leaf2 is pure as it has reports only from F2, whereas Leaf3 is impure as it has reports from two bug classes namely, F1 and F3.

    Mis-classification Rate: Mis-classification rate with respect to a bug class gives the number of bug reports of the bug class wrongly classified as bug reports belonging to other bug classes. In figure 2.2, 70 reports from F1 and 30 reports from F2 are split across Leaf2 and Leaf3. There is a mis- classification of 20 reports of F1 and 10 reports of F2. Overall, there is a mis-classification of 30 reports out of 100 i.e. the mis-classification rate is 30%.

    Figure 2.2: Mis-classification Calculation

    Oversplitting: Oversplitting with respect to a bug class tells whether the bug reports of the bug class are distributed across several sets of failed runs or a single set. If the bug reports are distributed across two or more sets, then the bug class is said to be oversplit. Oversplitting measure (OM) is evaluated using the following simple formula.

    where, numSetsx denotes the number of sets of failed runs containing bug reports from the bug class x.

    If all the reports of the bug class are present in only one set, then its OM is zero which tells that the bug class is not oversplit. Otherwise, the bug class is said to be oversplit. In figure 2.2, each of bug classes F1 and F2 have an oversplitting measure of 1 which tells us that the bug classes have been oversplit, whereas in figure 2.1, each of the bug classes F1, F2, F3 have an oversplitting measure of 0 which tells us that the bug classes are not oversplit.

    2.3 Summary Of Results

    We carried out our experiments on Nokia Siemens suite which is available at Software Infrastructure Repository (SIR) and compared our results with earlier approaches namely statistical debugging proposed by Liblit and decision tree methods. The results obtained from these two approaches clearly showed the need of postprocessing so as to increase the purity of partitions, reduce the rate of mis-classification of bugs and to avoid oversplitting.

    The two proposed approaches: one based on decision tree method and the other using biclustering showed a great

    reports from a particular bug class can get distributed across several sets of failed runs signifying that they were bug reports from a single bug class but possibly triggered by different conditions in the program. The predicates returned by our approach were able to localize all the bugs present in the sofware including masked and non-frequently occuring bugs which statistical debugging and decision tree failed to find out.

    Table 2.2 summarizes the results obtained using statistical debugging and one of our proposed approach namely postprocessing using decision tree based method. Clearly, we can see that there is a drastic reduction in the mis-classification rate and most of the partitions are pure. Oversplitting measure signifies us that a bug can get triggered by different conditions in the program. So, its bug reports gets distributed across multiple bug partitions. SD in Table 2.2 represents statistical debugging technique and FPR represents one of our proposed approach for postprocessing, which is a decision tree based method. We have also conducted experiments using decision tree, postprocessing using biclustering. The detailed results of all the experiments are shown in results section.

    Table 2.2: SD Importance(P) vs Phase2 FPR

  3. PROPOSED ALGORITHMS

    1. Terminology

      Input to the debugging framework consists of a set of instrumented predicates and feedback reports from the end users. The notations used in the description of our approach are summarized in Table 3.1.

      Output from the debugging module is of the form: A = {(P1, F1),(P2, F2), …,(Pl , Fl)}

      s.t i, j, i j, Pi Pj = and Fi Fj = and i, Pi P and Fi RF . i, i 1 l,

      Fi represents the bug i and Pi represents the set of predicates which can identify and localize the bug i correctly. For every i, Pi represents a set of paths in the programs control flow graph where the bug Fi can be found.

      Figure 3.1 represents the final output in a decision tree format.

      improvement in the results. There was a drastic reduction in the

      mis-classification rate and most of the bug partitions were pure. The results showed a slight oversplitting. As each bug can get triggered by different control flow paths in the program, bug

    2. Algorithms

Phase 1, Statistical Debugging:

Algorithm 1 is the statistical debugging technique used for program debugging. It tries to associate each failed run with a predicate, which can identify the particular failed run. The predicate is said to be the predictor of the failed run.

Table 3.1: Notations and their meaning

Figure 3.1: Final output in Decision tree format.

Statistical debugging uses 3 different measures to rank predicates. They are listed below. p P, Rank(p)=

The statistical debugging approach proposed by Liblit is a decision list based technique. A decision list is a list of nodes, where each node is formed based on the decision taken at the earlier node.

In algorithm 1, a simple iterative elimination approach is used. This forms the phase 1 of our proposed approach.

The steps of the algorithm are explained below.

  1. The predicates are ranked based on Importance(p) measure. The predicates with

    Figure 3.2: The way reports are split based on p.

    Increase(p) 0 have no bug predicting power. So, all such predicates are eliminated. The top ranked predicate is then picked. Assume p as the top ranked predicate.

  2. Split sets Fi and Si (set of failed and successful runs in ith iteration of the algorithm), each into two sets LFi , RFi and LSi RSi respectively based on the top ranked predicate p. LFi and LSi are the failed and successful runs respectively having predicate p to be true. They form left child of the partition i. RFi and RSi are the failed and successful runs respectively having predicate p to be false. They form right child of the partition i. The top ranked predicate p is the bug predictor for the partition Fi . Figure 3.2 shows how the set i is split based on predicate p.

  3. Repeat steps 1-2 on right child of the partition i until the set of failed runs becomes empty or the set of predicates becomes empty.

  4. All the leaves are marked for postprocessing during phase 2.

Phase 1, Decision Tree:

. It uses information gain measure to rank the predicates. The goal of the decision tree is to get the maximum utility. In our case, the utility is to get as many pure bug partitions as possible. The steps of the algorithm are explained below.

  1. The predicates are ranked based on information gain measure. Assume p as the top ranked predicate.

  2. This step is same as step 2 of Algorithm 1.

  3. Repeat steps 1-2 on both left as well as the right child of partition i until the set of predicates becomes empty or the stopping criterion is met.

The above algorithms fail to capture multiple, simultaneously and non-frequently occurring bugs. They try to find only the most frequently occurring bugs. As already mentioned, in the presence of multiple and simultaneously occurring bugs, the most frequently occurring bugs will dominate and they will mask other non frequently occurring bugs. So, there is a need for postprocessing to find out multiple, simultaneously and non-frequently occurring as well as the masked bugs.

We propose two postprocessing approaches. They are described below.

Phase 2, Frequent Predicate Ranking:

Algorithm 2 is the first proposed approach for postprocessing. The following algorithm is a decision tree based technique which uses F(p) measure as the ranking function to capture all the masked and non-frequently occurring bugs

All the leaves marked for phase 2 in Algorithm 1 are processed here. The steps of the algorithm are explained below. For every leaf,

  1. If the users choice is to retain the predicates, to allow predicates used on the other subtree to be picked again as the top predicates in this subtree, then the procedure mark predicates is called. It marks the predicates on the path from leaf to root as used and all other predicates as unused. If the users choice is to discard predicates, a predicate once used is permanently eliminated from the set of predicates.

  2. The predicates are ranked based on F(p) measure. Assume p as the top ranked predicate.

  3. Split partitions Fi and Si each into two partitions LFi , RFi and LSi , RSi respectively based on the top ranked predicate p. LFi and LSi are the failed and successful runs respectively having predicate p to be true. They form left child of the partition i. RFi and RSi are the failed and successful runs respectively having predicate p to be false. They form right child of the partition i. The top ranked predicate p is the bug

    predictor for the partition Fi . Both, the left as well as the right child of the partition i are marked for further postprocessing.

  4. Repeat steps 1-2 on both left as well as the right child of partition i until the set of predicates becomes empty or the stopping criterion is met. In this approach height is used as the stopping criterion. The maximum height allowed is maximum height of statistical debugging approach plus (3 to 4 additional height). We chose 3 to 4 as the additional height because, the case study which we have used for conducting our experiments had multiple bugs, the number varying from 4 to 5 and three to four iterations of our proposed approach at each leaf node was able to identify the bugs.

  5. Finally, at each leaf partition i we will obtain the form (Pi , Fi) using sets Pi and Fi , where Pi contains all the predicates on the path from partition i to the root and Fi contains the set of failed reports at the leaf partition i which usually are bug reports from a single bug class. Predicates in the set Pi identify and localize the bug Fi correctly.

Phase 2, Bi-clustering:

Algorithm 3 is the second proposed approach for postprocessing. It tries to associate the set of failed runs with the set of predicates which can identify those runs correctly. It employs biclustering technique to simultaneously cluster the set of failed runs and the set of predicates. The final output is a set of clusters where each of the cluster has bug reports from a single bug class and the cluster also has predicates identifying the reports of the bug class correctly

All the leaves marked for phase 2 in Algorithm 1 are processed here. The steps of the algorithm are explained below. For every leaf,

  1. Use biclustering to cluster the set of failed runs and the set of predicates simultaneously. It uses only Fi , the set of failed runs at leaf partition i and only those predicates which have Increase(p) > 0 and are not used anywhere before as the top ranked predicates. As the information regarding the number of bugs and their types is not available, assuming there can be atmost two bug classes at each of the leaf partition, the number of clusters is restricted to two.

  2. The two clusters returned will form the left and right child of the partition i namely LFi and RFi . The set of failed reports in Fi which does not belong to any of the two newly formed clusters will form the partition WFi .

  3. For each cluster, pick the predicate with maximum Increase value as the representative of the cluster.

  4. Finally, at each leaf partition i we will obtain the form (Pi , Fi) using sets Pi and Fi , where Pi contains all the predicates on the path from partition i to the root and Fi contains the set of failed reports at the leaf partition i which usually are bug reports from a single bug class. Predicates in the set Pi identify and localize the bug Fi correctly.

Biclustering algorithm takes binary matrix Mmn and N, the number of clusters desired and returns N different clusters. The algorithm is described below.

  1. Using first row as a template, first the set of columns of input matrix M i.e. the set of predicates P is divided into two subsets PU and PV . The predicates having value 1 in the first row are moved to set PU and those predicates having the value 0 are moved to set PV .

  2. Now, the rows of matrix M i.e. set of runs R are rearranged: first come all the runs that have predicates only from set PU , then those runs that have predicates from sets PU and PV , and finally the runs that have predicates from set PV only. The corresponding sets of runs RU , RW and RV in combination with PU and PV forms submatrices U and V

  3. The idea behind the algorithm is to divide the input matrix into three matrices, one of which contain only 0-cells. So, it is discarded. The algorithm is then recursively applied to the remaining two submatrices U and V; the recursion ends if the current matrix represents a bicluster, i.e., contains only 1s. If U and V do not share any rows and columns of M, i.e., RW is empty, the two matrices can be processed independently from each other. However, if U and V have a set RW of rows in common, special care is necessary to only generate those clusters in V that share at least one common column with PV .

  4. The parameter Z serves this goal. It contains sets of columns that restrict the number of admissible clusters. A bicluster (R, P) is admissible, if (R, P) share one or more columns with each column set P + in Z, i.e., p+ Z : P P+ .

    Table 4.2: Dataset

    Table 4.3: Importance(P) output

    Table 4.4: Decision Tree output

  5. Finally, the program returns N number of clusters

4. NUERICAL EXPERIMENTS AND RESULTS Table 4.1 provides the implementation details. The process

of instrumentation and collecting feedback reports is fully automated. Statistical Debugging is the bug isolation project proposed by Liblit & Co. Decision Tree is the standard decision tree method which uses information gain measure for ranking. Frequent Predicate Ranking and Postprocessing using Bi-Clustering are our proposed approaches.

Table 4.1: Implementation Details

Table 4.5: Phase2 FPR output

Table 4.6: Phase2 BC output

Table 4.7 summarizes the results obtained using all the four approaches

6. REFERENCES

  1. ANSI/IEEE. IEEE Standard Glossary of Software Enginnering Terminology. IEEE Std 729-1983. IEEE, New York, 1983.

  2. Rhythmbox: http://developer.gnome.org/rhythmbox/.

  3. SIR: Software-artifact Infrastructure Repository. http://sir.unl.edu/portal/ index.html.

  4. T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause: localizing errors in counterexample traces. In POPL: Principles of Programming Languages, pages 97-105, 2003.

  5. R. Caruana and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Machine Learning, Proceedings of the Twenty-Third Interna tional Conference (eds. W.W. Cohen and

    Table 4.7: Summary of Results

    5. CONCLUSIONS

  6. A. Moore) 161-168 (ACM, New York, 2003).

    Y. Cheng and G. M. Church. Biclustering of expression data. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB00), pages 93-103, 2000.

    We have described a suite of instrumentation and analysis techniques for diagnosing bugs in widely deployed software. The earlier approaches namely, statistical debugging and decision tree failed to capture the masked and non-frequently occurring bugs. We have proposed two

  7. T. M. Chilimbi, A. V. Nori, and K. Vaswani. Quantifying the effectiveness of testing via efficient residual path profiling. In FSE: Foundations of Software En gineering, pages 545-548, 2007.

  8. T. M. Chilimbi, B. Liblit, K. Mehra, A. V. Nori, and K. Vaswani. HOLMES: Effective statistical debugging via efficient path profiling. In Proceedings of the International Conference on Software

    postprocessing approaches for the task: one is based on decision tree method and the other uses biclustering . The results obtained showed great improvements in the results

  9. Engineering (ICSE), May 2009.

    H. Cleve and A. Zeller. Locating causes of program failures. In ICSE: International Conference on Software Engineering, pages 342-351, 2005.

    in terms of purity, mis-classification rate and oversplitting. Our proposed approaches were able to identify all the bugs present in the software including the masked and non- frequently occurring bugs. Our proposed approaches uses decision tree and biclustering methods. For biclustering , methods like Non-negative matrix factorization can be thought as a good alternative.

  10. L. Jiang and Z. Su. Context-aware statistical debugging: From bug

    predictors to faulty control flow paths. In ASE08: Automated Software Engineering, pages 184-193, 2007.

  11. A. Klivans and R. Servedio. Toward attribute-efficient learning of decision lists and parities. In Seventeenth Annual Conference on Computational Learning Theory (COLT), 2004, pp. 234-248.

  12. D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, volume 13, pages 556-562, 2001.

  13. B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In J. James B. Fenwick and C. Norris, editors, Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI-03), volume 38, 5 of ACM SIGPLAN Notices, pages 141- 154. ACM Press, 2003.

  14. B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistiscal bug isolation. In PLDI: Programming Language Design and Implementation, pages 15-26, 2005.

  15. B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Instrumentation and sampling strategies for cooperative concurrency bug isolation. In OOPSLA 2010.

Leave a Reply

Your email address will not be published. Required fields are marked *