Reliability Estimation and Analysis of DDL MYSQL Server by using Generalized Gamma and Weibull Distribution

DOI : 10.17577/IJERTV3IS21496


M. Y. Haggag, Department of Mathematics, Faculty of Science, Al-Azhar University

Heba Nagaty Mohamed, Department of Mathematics, Faculty of Science, El Fayoum University

Abstract – The times between failures of the DDL MySQL open source database server on two operating systems (Windows and Linux) are analyzed and compared. The purpose of this study is to estimate and compare the reliability of the DDL MySQL server on the two operating systems by using the Generalized Gamma and Weibull distributions, which rank best among the candidate distributions. As a result, the reliability estimates for the two operating systems are evaluated and compared theoretically and graphically.

  1. INTRODUCTION

    Software reliability [1] is one of the important parameters of software quality. It is defined as the probability of failure-free software operation in a specified environment for a specified period of time. Earlier work includes [3], which studied reliability estimation and analysis of the Linux kernel, and [4], which studied estimation and analysis of MySQL database server reliability using the Beta and Generalized Gamma distributions.

    The MySQL database [5] has become the most popular open source database in the world because of its high performance, high reliability and ease of use. The purpose of this study is to compare the DDL database server on two operating systems (Windows and Linux), where DDL (Data Definition Language) [2] is a language used by a database management system (such as MySQL) that allows users to define the database and specify data types, structures and constraints on the data.

    To find reliability, two types of data can be used: time between failures and fault count. In the case of time between failures, the input parameter of the study is the intervals of successful operation. A probability distribution model whose parameters are estimated by an appropriate mathematical technique reflects the pattern of these intervals. In the case of fault count, the input parameter of the study is the number of faults in a specified period of time rather than the times between failures.

    The rest of this paper is organized as follows. Section 2 provides some mathematical background on reliability estimation. Section 3 concentrates on bug collection, bug pre-processing and bug analysis. Section 4 discusses the goodness of fit test and parameter estimation using Maximum Likelihood Estimation. Section 5 presents the proposed methodology, which uses the Weibull++ tool [7] for mathematical and statistical calculation. Section 6 presents the reliability evaluation of the two operating systems. Finally, the conclusions and references of this paper are given.

  2. BACKGROUND

    Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. There are two approaches to predicting software reliability: an early-stage approach, where reliability is estimated during the design phase, and a later-stage approach, where reliability is estimated during the operational stage.

    Software reliability depends on the failure data of the software. Failure behaviour can be represented in various ways, such as the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF), which is derived from the PDF and is given by equation (1):

    $$F(t) = P(x \le t) = \int_{-\infty}^{t} f(x)\,dx \qquad (1)$$

    Therefore f(t) is the rate of change of F(t). If the random variable T denotes the failure time, then F(t) is the probability that the system will fail by time t. F(t) is thus the unreliability function, and the reliability function R(t) is given by equation (2):

    $$R(t) = 1 - \int_{-\infty}^{t} f(x)\,dx \qquad (2)$$

    Another function that can be derived from the PDF is the failure rate function (hazard rate function), which is defined by equation (3):

    $$\lambda(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - \int_{-\infty}^{t} f(x)\,dx} \qquad (3)$$
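    The relationship between the PDF, the CDF, the reliability function and the hazard rate in equations (1) to (3) can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes failure times are non-negative (so the integral starts at 0 rather than at minus infinity), uses a simple trapezoidal rule, and takes an exponential PDF with a mean of 2 time units purely as an illustration. The function names `cdf`, `reliability` and `hazard` are hypothetical.

```python
import math

def cdf(pdf, t, lower=0.0, steps=10_000):
    """Approximate F(t), the integral of pdf from `lower` to t (trapezoidal rule)."""
    if t <= lower:
        return 0.0
    h = (t - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(t))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

def reliability(pdf, t, lower=0.0):
    """R(t) = 1 - F(t), as in equation (2)."""
    return 1.0 - cdf(pdf, t, lower)

def hazard(pdf, t, lower=0.0):
    """lambda(t) = f(t) / R(t), as in equation (3)."""
    return pdf(t) / reliability(pdf, t, lower)

# Illustrative PDF only (an assumption): exponential with a mean of 2 time units.
MEAN = 2.0

def exp_pdf(x):
    return math.exp(-x / MEAN) / MEAN

r1 = reliability(exp_pdf, 1.0)
h1 = hazard(exp_pdf, 1.0)
print(round(r1, 4), round(h1, 4))
```

    For the exponential case the hazard rate is constant and equal to the reciprocal of the mean, which gives a quick sanity check on the numerical integration.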

  3. BUG COLLECTION, BUG PRE-PROCESSING AND BUG ANALYSIS

The approach to reliability estimation for the two operating systems, Linux and Windows, consists of three steps:

  1. Bug collection: failure data are extracted directly from the web site http://www.mysql.Bugs.org. Bugs for the Linux operating system were collected from 5/7/2004 to 19/8/2013, and bugs for the Windows operating system were collected from 14/4/2004 to 11/8/2013.

  2. Bug pre-processing: in this step, noise in the collected data is removed.

  3. Bug analysis: the pre-processed data are stored in a MySQL database, where MySQL is an open source database system.

    Before the goodness of fit test is applied to the data collected for each operating system, the bug frequency corresponding to the time to failure in months is plotted, as shown in Figure 1 and Figure 2. The total number of bugs recorded is 39 for Windows and 75 for Linux.

    [Figure 1. Monthly Bug Frequency of Windows (x-axis: month; y-axis: bug frequency)]

    [Figure 2. Monthly Bug Frequency of Linux (x-axis: month; y-axis: bug frequency)]

  4. GOODNESS OF FIT TEST AND PARAMETER ESTIMATION

    The goodness of fit test is used to identify which of the following distributions, the most commonly used life distributions, is suitable for the collected data.

    • 1 and 2 parameter exponential distributions.

    • 1, 2 and 3 parameter Weibull distributions.

    • Normal distribution.

    • Lognormal distribution.

    • Generalized Gamma (G-Gamma) distribution.

    • Logistic distribution.

    • Log Logistic distribution.

    • Gumbel distribution.

    There are many methods for the goodness of fit test, but Maximum Likelihood Estimation is considered the best method of parameter estimation.

    Maximum Likelihood Estimation

    Maximum Likelihood Estimation [8] is used to estimate distribution parameters by maximizing the value of the likelihood function. The likelihood function is based on the probability density function (PDF) of a given distribution. If the PDF is $f(x_i; \theta_1, \theta_2, \ldots, \theta_k)$, where x represents the data (times-to-failure) and $\theta_1, \theta_2, \ldots, \theta_k$ are the parameters to be estimated, then the likelihood function is given by equation (4):

    $$L = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_k) \qquad (4)$$

    where n is the number of failure data points. Taking the logarithm gives the log-likelihood function, which is defined by equation (5):

    $$\ln L = \sum_{i=1}^{n} \ln f(x_i; \theta_1, \theta_2, \ldots, \theta_k) \qquad (5)$$

    Finally, the parameters are estimated by setting the partial derivatives to zero, as given by equation (6):

    $$\frac{\partial \ln L}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, k \qquad (6)$$
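    Maximum likelihood estimation can be sketched for the simplest case, the one-parameter exponential distribution, where setting the derivative of the log-likelihood to zero gives the sample mean as the estimate. The code below is only an illustration under stated assumptions: the `sample` values are hypothetical times to failure, not the data of this study, and the helper names `log_likelihood` and `exp_pdf` are invented for the example.

```python
import math

def log_likelihood(pdf, data, params):
    """Sum of ln f(x_i; params) over the sample (the log-likelihood)."""
    return sum(math.log(pdf(x, *params)) for x in data)

def exp_pdf(x, mean):
    """One-parameter exponential PDF, parameterized by its mean."""
    return math.exp(-x / mean) / mean

# Hypothetical times to failure in months (illustrative, not the study's data).
sample = [12.0, 30.0, 45.0, 51.0, 62.0]

# For the exponential, d(ln L)/d(mean) = 0 gives mean = sample average.
mle_mean = sum(sample) / len(sample)

ll_at_mle = log_likelihood(exp_pdf, sample, (mle_mean,))
ll_lower = log_likelihood(exp_pdf, sample, (mle_mean * 0.8,))
ll_higher = log_likelihood(exp_pdf, sample, (mle_mean * 1.2,))
print(mle_mean, ll_at_mle > ll_lower, ll_at_mle > ll_higher)
```

    Moving the parameter away from the estimate in either direction lowers the log-likelihood, which is the property the partial-derivative condition expresses. Distributions with several parameters generally need a numerical optimizer instead of a closed form.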

  5. THE PROPOSED METHODOLOGY

    In this research the Weibull++ tool is used for mathematical and statistical calculation.

    First: the parameters of every life data distribution are estimated by maximum likelihood estimation and presented in Table 1 and Table 2 for the two operating systems, Linux and Windows respectively.

    Second: the log-likelihood value (LKV) is calculated from the estimated parameters and presented in Table 3 and Table 4 for the two operating systems, Linux and Windows respectively.

    Third: the distribution with the maximum LKV is considered the best fit for the given data.
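    The third step, selecting the distribution with the maximum LKV, amounts to a sort. As a small sketch, the dictionary below copies the Linux log-likelihood values reported in Table 3; the function name `rank_by_lkv` is hypothetical.

```python
# Log-likelihood values (LKV) for Linux, copied from Table 3.
linux_lkv = {
    "G-Gamma": -332.51, "Weibull 3": -335.93, "Gumbel": -336.9,
    "Normal": -337.7, "Logistic": -339.9, "Weibull 2": -340.0,
    "Gamma": -347.1, "Log-Logistic": -353.0, "Lognormal": -359.08,
    "Exponential 2": -363.4, "Exponential 1": -366.4,
}

def rank_by_lkv(lkv):
    """Return distribution names ordered from highest to lowest LKV."""
    return sorted(lkv, key=lkv.get, reverse=True)

ranking = rank_by_lkv(linux_lkv)
for rank, name in enumerate(ranking, start=1):
    print(rank, name, linux_lkv[name])
```

    Note that comparing raw log-likelihoods does not penalize the number of parameters, so a three-parameter distribution has some built-in advantage over a one-parameter one.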

    Table 1. Parameter Estimation for Linux

    | Distribution | Parameters |
    |---|---|
    | Exponential 1 | mean = 48.742639 |
    | Exponential 2 | mean = 46.8096, γ = 1.933 |
    | Weibull 2 | β = 2.32103, η = 54.57108 |
    | Weibull 3 | β = 6.167074, η = 121.856165, γ = -64.2574 |
    | Normal | mean = 48.74263, std = 21.99186 |
    | Lognormal | mean = 3.71399, std = 0.71288 |
    | G-Gamma | μ = 4.222010, σ = 0.255851, λ = 2.583114 |
    | Gamma | μ = 2.77007, k = 3.05409 |
    | Logistic | μ = 49.90922, σ = 12.8883 |
    | Log-Logistic | μ = 3.8324, σ = 0.351556 |
    | Gumbel | μ = 59.25009, σ = 18.792409 |

    Table 2. Parameter Estimation for Windows

    | Distribution | Parameters |
    |---|---|
    | Exponential 1 | mean = 43.56212 |
    | Exponential 2 | mean = 38.16212, γ = 5.400 |
    | Weibull 2 | β = 1.96948, η = 49.16702 |
    | Weibull 3 | β = 1.89676, η = 47.79413, γ = 1.305 |
    | Normal | mean = 43.56212, std = 23.41969 |
    | Lognormal | mean = 3.58866, std = 0.67598 |
    | G-Gamma | μ = 4.28027, σ = 0.226218, λ = 3.640866 |
    | Gamma | μ = 2.72656, k = 2.85087 |
    | Logistic | μ = 43.05427, σ = 14.122369 |
    | Log-Logistic | μ = 3.647028, σ = 0.386143 |
    | Gumbel | μ = 55.23509, σ = 21.667332 |

    Table 3. Log-Likelihood Value for Linux

    | Distribution | LKV | Rank |
    |---|---|---|
    | G-Gamma | -332.51 | 1 |
    | Weibull 3 | -335.93 | 2 |
    | Gumbel | -336.9 | 3 |
    | Normal | -337.7 | 4 |
    | Logistic | -339.9 | 5 |
    | Weibull 2 | -340 | 6 |
    | Gamma | -347.1 | 7 |
    | Log-Logistic | -353 | 8 |
    | Lognormal | -359.08 | 9 |
    | Exponential 2 | -363.4 | 10 |
    | Exponential 1 | -366.4 | 11 |

    Table 4. Log-Likelihood Value for Windows

    | Distribution | LKV | Rank |
    |---|---|---|
    | G-Gamma | -178.6 | 1 |
    | Weibull 3 | -180.5 | 2 |
    | Weibull 2 | -180.79 | 3 |
    | Gamma | -181.67 | 4 |
    | Normal | -182.4 | 5 |
    | Lognormal | -184.14 | 6 |
    | Logistic | -184.28 | 7 |
    | Gumbel | -184.58 | 8 |
    | Log-Logistic | -184.68 | 9 |
    | Exponential 2 | -185.6 | 10 |
    | Exponential 1 | -190.9 | 11 |

    From Tables 3 and 4, it is clear that the Generalized Gamma and Weibull distributions are best suited and may be considered for reliability estimation.

    We used the web site in [8] to determine the three-parameter Generalized Gamma distribution, and the web site in [9] to determine the three-parameter Weibull distribution.

    A. Construction of Reliability Model using the Weibull Distribution

    The Probability Density Function of the Weibull distribution is given by:

    $$f(T) = \frac{\beta}{\eta}\left(\frac{T-\gamma}{\eta}\right)^{\beta-1} e^{-\left(\frac{T-\gamma}{\eta}\right)^{\beta}}, \quad \beta > 0,\ \eta > 0 \qquad (7)$$

    where

    β is the shape parameter, also known as the Weibull slope,

    η is the scale parameter,

    γ is the location parameter.

    The Cumulative Distribution Function of the Weibull distribution is given by:

    $$F(T) = 1 - e^{-\left(\frac{T-\gamma}{\eta}\right)^{\beta}} \qquad (8)$$

    The Reliability Function of the Weibull distribution is given by:

    $$R(T) = 1 - F(T) = e^{-\left(\frac{T-\gamma}{\eta}\right)^{\beta}} \qquad (9)$$

    By substituting the parameter estimates from Table 1 and Table 2 into equations (7) and (9), we get:

    $$f(T) = \frac{1.89676}{47.79413}\left(\frac{T-1.305}{47.79413}\right)^{0.89676} e^{-\left(\frac{T-1.305}{47.79413}\right)^{1.89676}} \quad \text{(Windows)}$$

    $$f(T) = \frac{6.167074}{121.85616}\left(\frac{T+64.2574}{121.85616}\right)^{5.167074} e^{-\left(\frac{T+64.2574}{121.85616}\right)^{6.167074}} \quad \text{(Linux)} \qquad (10)$$

    and

    $$R(T) = e^{-\left(\frac{T-1.305}{47.79413}\right)^{1.89676}} \quad \text{(Windows)}$$

    $$R(T) = e^{-\left(\frac{T+64.2574}{121.85616}\right)^{6.167074}} \quad \text{(Linux)} \qquad (11)$$
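    The three-parameter Weibull PDF and reliability function can be evaluated directly from the fitted parameters. The following is a minimal sketch rather than the Weibull++ computation itself; it assumes the Weibull 3 rows of Tables 1 and 2 list the shape, scale and location parameters in that order, and the function names are hypothetical. It is only valid for T greater than the location parameter.

```python
import math

def weibull_pdf(t, beta, eta, gamma):
    """Three-parameter Weibull PDF; requires t > gamma."""
    z = (t - gamma) / eta
    return (beta / eta) * z ** (beta - 1) * math.exp(-(z ** beta))

def weibull_reliability(t, beta, eta, gamma):
    """Three-parameter Weibull reliability, exp(-((t - gamma) / eta) ** beta)."""
    z = (t - gamma) / eta
    return math.exp(-(z ** beta))

# Weibull 3 estimates (shape, scale, location), assumed order,
# taken from Table 2 (Windows) and Table 1 (Linux).
WINDOWS = (1.89676, 47.79413, 1.305)
LINUX = (6.167074, 121.856165, -64.2574)

for month in (5, 10, 15):
    print(month,
          round(weibull_reliability(month, *WINDOWS), 6),
          round(weibull_reliability(month, *LINUX), 6))
```

    The values printed for month 5 can be compared with the Weibull columns of Tables 5 and 6 as a consistency check on the parameter ordering.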

    B. Construction of Reliability Model using Generalized Gamma Distribution:

    The Probability Density Function of Generalized Gamma Distribution is given by:

    $$f(T) = \frac{\beta}{\Gamma(k)\,\theta}\left(\frac{T}{\theta}\right)^{k\beta-1} e^{-\left(\frac{T}{\theta}\right)^{\beta}}, \quad \theta > 0,\ \beta > 0,\ k > 0 \qquad (12)$$

    where β and k are the shape parameters and θ is the scale parameter. Γ(x) is the Gamma function, which has the formula:

    $$\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\,dt$$

    However, Weibull++ reparameterizes k, β and θ in terms of the parameters μ, σ and λ, as shown in the following:

  • $\mu = \ln\theta + \frac{1}{\beta}\ln\frac{1}{\lambda^{2}}$ is the location parameter.

  • $\sigma = \frac{1}{\beta\sqrt{k}}$, $\sigma > 0$, is the scale parameter.

  • $\lambda = \frac{1}{\sqrt{k}}$ is the shape parameter. (13)

    The Cumulative Distribution Function of the Generalized Gamma distribution is given by:

    $$F(T) = \frac{\gamma\left(k, \left(\frac{T}{\theta}\right)^{\beta}\right)}{\Gamma(k)} \qquad (14)$$

    where γ(·,·) denotes the lower incomplete gamma function. The Reliability Function of the Generalized Gamma distribution is given by:

    $$R(T) = 1 - \frac{\gamma\left(k, \left(\frac{T}{\theta}\right)^{\beta}\right)}{\Gamma(k)} \qquad (15)$$

    By substituting the parameter estimates from Table 1 and Table 2 into equations (13), we get the following values for the two operating systems:

    k = 0.0754383, β = 16.094475, θ = 84.8469 (Windows)

    k = 0.149869, β = 10.096149, θ = 82.269615 (Linux)

    We then get equations (16):

    $$f(T) = \frac{16.094475}{\Gamma(0.0754383) \cdot 84.8469}\left(\frac{T}{84.8469}\right)^{0.0754383 \cdot 16.094475 - 1} e^{-\left(\frac{T}{84.8469}\right)^{16.094475}} \quad \text{(Windows)}$$

    $$f(T) = \frac{10.096149}{\Gamma(0.149869) \cdot 82.269615}\left(\frac{T}{82.269615}\right)^{0.149869 \cdot 10.096149 - 1} e^{-\left(\frac{T}{82.269615}\right)^{10.096149}} \quad \text{(Linux)} \qquad (16)$$

    and

    $$R(T) = 1 - \frac{\gamma\left(0.0754383, \left(\frac{T}{84.8469}\right)^{16.094475}\right)}{\Gamma(0.0754383)} \quad \text{(Windows)}$$

    $$R(T) = 1 - \frac{\gamma\left(0.149869, \left(\frac{T}{82.269615}\right)^{10.096149}\right)}{\Gamma(0.149869)} \quad \text{(Linux)} \qquad (17)$$

  6. RELIABILITY EVALUATION:

It is clear from the goodness of fit section that the best distributions for the collected sample are the three-parameter Generalized Gamma distribution and the three-parameter Weibull distribution.

In this section the PDF and reliability of the three-parameter Generalized Gamma distribution and the Weibull distribution are evaluated in Tables 5 and 6 by using equations (10), (11), (16) and (17), and the corresponding graphs of the PDF and reliability for each distribution are shown in Figures 3, 4, 5 and 6.

Table 5. The PDF of the Weibull and G-Gamma Distributions for Windows and Linux

| Month No. | Windows f(t), G-Gamma | Windows f(t), Weibull | Linux f(t), G-Gamma | Linux f(t), Weibull |
|---|---|---|---|---|
| 5 | 0.008114516 | 0.003965289 | 0.0046843 | 0.002649 |
| 10 | 0.009412931 | 0.008275655 | 0.006685 | 0.003735 |
| 15 | 0.010266748 | 0.011784083 | 0.008231 | 0.00511 |
| 20 | 0.010919108 | 0.014449878 | 0.0095402 | 0.006786 |
| 25 | 0.01145353 | 0.016241065 | 0.0106974 | 0.008749 |
| 30 | 0.011909544 | 0.0171763 | 0.0117461 | 0.010935 |
| 35 | 0.012309227 | 0.017326563 | 0.0127111 | 0.013224 |
| 40 | 0.012666219 | 0.016805063 | 0.0136055 | 0.015426 |
| 45 | 0.012989341 | 0.015752378 | 0.0144304 | 0.017289 |
| 50 | 0.013283552 | 0.01432049 | 0.015166 | 0.018526 |
| 55 | 0.013547534 | 0.012658195 | 0.015759 | 0.018864 |
| 60 | 0.01376302 | 0.010899465 | 0.0160854 | 0.018121 |
| 65 | 0.013862474 | 0.009155594 | 0.0159207 | 0.016285 |
| 70 | 0.013647303 | 0.007511239 | 0.0149169 | 0.013558 |
| 75 | 0.012631968 | 0.006023932 | 0.012689 | 0.010341 |
| 80 | 0.009967771 | 0.004726287 | 0.009142 | 0.007134 |
| 85 | 0.005317035 | 0.00362999 | 0.0049902 | 0.004388 |
| 88 | 0.002481168 | 0.003067554 | 0.0028354 | 0.003082 |
| Average | 0.011030166 | 0.010342088 | 0.011102 | 0.010346 |

[Figure 3. PDF of Weibull Distribution for Windows and Linux (x-axis: month no.; y-axis: PDF f(t))]

[Figure 4. PDF of G-Gamma Distribution for Windows and Linux (x-axis: month no.; y-axis: PDF f(t))]
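The conversion in equations (13) between the Weibull++ parameters (μ, σ, λ) and the Generalized Gamma parameters (k, β, θ) can be checked numerically. The sketch below inverts equations (13) under the assumption that the three G-Gamma values in Tables 1 and 2 are listed in the order μ, σ, λ; the function name `ggamma_from_weibullpp` is hypothetical and is not part of Weibull++.

```python
import math

def ggamma_from_weibullpp(mu, sigma, lam):
    """Invert equations (13): map (mu, sigma, lambda) to (k, beta, theta)."""
    k = 1.0 / lam ** 2                      # from lambda = 1 / sqrt(k)
    beta = lam / sigma                      # from sigma = 1 / (beta * sqrt(k))
    theta = math.exp(mu - math.log(k) / beta)  # from mu = ln(theta) + ln(1/lambda^2) / beta
    return k, beta, theta

# G-Gamma estimates (mu, sigma, lambda), assumed order,
# taken from Table 2 (Windows) and Table 1 (Linux).
k_w, beta_w, theta_w = ggamma_from_weibullpp(4.28027, 0.226218, 3.640866)
k_l, beta_l, theta_l = ggamma_from_weibullpp(4.222010, 0.255851, 2.583114)
print(k_w, beta_w, theta_w)
print(k_l, beta_l, theta_l)
```

The results agree with the reported k, β and θ values for Windows and Linux up to rounding, which supports the assumed parameter ordering. Evaluating the reliability itself additionally needs an incomplete gamma function, which the Python standard library does not provide.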

Table 6. The Reliability of the Weibull and G-Gamma Distributions for Windows and Linux

| Month No. | Windows R(t), G-Gamma | Windows R(t), Weibull | Linux R(t), G-Gamma | Linux R(t), Weibull |
|---|---|---|---|---|
| 5 | 0.966583 | 0.99224524 | 0.984521 | 0.969795 |
| 10 | 0.922473 | 0.96130479 | 0.95582 | 0.953951 |
| 15 | 0.87316 | 0.91081577 | 0.918403 | 0.931965 |
| 20 | 0.820134 | 0.84486975 | 0.873899 | 0.90235 |
| 25 | 0.764164 | 0.76777766 | 0.823253 | 0.863623 |
| 30 | 0.705729 | 0.68388866 | 0.767105 | 0.814486 |
| 35 | 0.645162 | 0.59732532 | 0.705931 | 0.754098 |
| 40 | 0.582708 | 0.51174416 | 0.640111 | 0.68239 |
| 45 | 0.518556 | 0.43016062 | 0.56999 | 0.600403 |
| 50 | 0.452862 | 0.35485219 | 0.495953 | 0.510544 |
| 55 | 0.38577 | 0.28733889 | 0.418559 | 0.416648 |
| 60 | 0.317465 | 0.22842969 | 0.3388 | 0.323717 |
| 65 | 0.248327 | 0.17831797 | 0.258517 | 0.237271 |
| 70 | 0.179349 | 0.13670636 | 0.180988 | 0.162362 |
| 75 | 0.113162 | 0.10294244 | 0.111402 | 0.102512 |
| 80 | 0.0557885 | 0.07614977 | 0.0563443 | 0.058935 |
| 85 | 0.0169778 | 0.0553431 | 0.0210462 | 0.030398 |
| 88 | 0.00544105 | 0.04531424 | 0.00943495 | 0.019264 |
| Average | 0.476322853 | 0.43184461 | 0.507226525 | 0.492029 |

[Figure 6. Comparison of the reliabilities of Windows and Linux using the G-Gamma distribution (x-axis: month no.; y-axis: reliability)]

7. CONCLUSIONS:

In this study, a detailed methodology for estimating the reliability of two operating systems (Windows and Linux) was discussed, and the reliability was analyzed using two fitted distributions: the Weibull distribution and the G-Gamma distribution.

The average values of the probability density function and of the reliability under the Weibull and G-Gamma distributions were calculated for each operating system, and they show that:

Linux is more reliable than Windows under both distributions: the average reliability in Table 6 is 0.507 for Linux against 0.476 for Windows under the G-Gamma distribution, and 0.492 against 0.432 under the Weibull distribution.

[Figure 5. Comparison of the reliabilities of Windows and Linux using the Weibull distribution (x-axis: month no.; y-axis: reliability)]

REFERENCES:

  1. Lyu, Michael R. "Handbook of Software Reliability Engineering." (1996).

  2. IEEE Reliability Society, IEEE Recommended Practice on Software Reliability, IEEE Std 1633-2008, June 2008.

  3. Sanjeev Kumar Jha, A. K. D. Dwivedi, Amod Tiwari. Reliability Estimation and Analysis of Linux Kernel. IJCST Vol. 2, Issue 2 (2011).

  4. Jha, Sanjeev Kumar, Pankaj Kumar, and A. K. D. Dwivedi. Estimation and Analysis of MYSQL Database Server Reliability using Beta and Generalized Gamma Distribution. (IJCET) Journal Impact Factor 3.2 (2012): pp. 354-371.

  5. Operating Systems Source of Bugs: http://bugs.mysql.com

  6. F. S. G. Richards, A Method of Maximum-likelihood Estimation, http://www.jstor.org/pss/2984037

  7. Weibull++, Reliability Function, [Online] Available: http://www.weibull.com and http://www.reliasoft.com

  8. en.wikipedia.org/wiki/Generalized_gamma_distribution

  9. http://reliawiki.org/index.php/The_Weibull_Distribution

  10. H. Pham, Software Reliability. Springer Verlag, 2000.

  11. Gavin E. Crooks (2010), The Amoroso Distribution, Technical Note, Lawrence Berkeley National Laboratory.

  12. Guo, Huairui, Jin, Tongdan, and Mettas, Adamantios. "Design Reliability Demonstration Tests for One-Shot Systems Under Zero Component Failure," IEEE Transactions on Reliability, Vol. 60, No. 1, pp. 286-294, (March 2011).