 Authors : Desai S. S
 Paper ID : IJERTV2IS90310
 Volume & Issue : Volume 02, Issue 09 (September 2013)
Published (First Online): 11-09-2013
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
On Identification of Relevant Variables to Predict Output of Oilfield Using Support Vector Regression
Desai S. S. Department of Statistics,
Gopal Krishna Gokhale College, Kolhapur.
Maharashtra State, India.
Pin code : 416012
Abstract:
The production of oil is of great significance as a world energy source. Broadly speaking, the factors affecting the output of an oilfield can be classified into two groups, namely human factors and geological factors, and each group consists of a number of factors. Identifying a prediction model with the relevant factors (predictors) is difficult in the absence of prior knowledge; this can be done by using subset selection techniques in regression. Most such techniques are based on the least squares (LS) method, and the regression model is fitted under certain assumptions, such as independence of the predictors and normally distributed errors with constant variance. Oilfield output data may not satisfy some of these assumptions, in which case model selection techniques based on LS fail to select a parsimonious model. As an alternative, we use support vector regression. In this article, we study the performance of different model selection techniques for oilfield data when the predictors are linearly related.
Keywords: Oilfield output prediction; Least squares method; Multiple linear regression; Support vector regression; Subset selection; Mallows' Cp; Prediction risk.

Introduction:
Nowadays, petroleum products have become necessary commodities in day-to-day life, and oilfields are the source of these products. Production in oilfields plays a significant role in the economy of a nation. An oilfield is an area of sedimentary rock with an abundance of petroleum or crude oil; it typically extends over a large area, encompassing hundreds of kilometers, with a large number of oil wells. Therefore, predicting oilfield output from the factors that affect it is essential for the oil industry, and identifying the factors relevant to accurate prediction is a serious problem. Regression analysis, usually multiple linear regression, is a widely used tool for this purpose. A multiple linear regression model is defined as
$$Y = X\beta + e \qquad (1.1)$$
where $Y$ is the $n \times 1$ vector of observations on the response variable, $\beta = (\beta_0, \beta_1, \ldots, \beta_{k-1})'$ is a $k \times 1$ vector of unknown regression coefficients, $X$ is an $n \times k$ matrix of observations on the $k-1$ predictors (regressors) $X_1, X_2, \ldots, X_{k-1}$ with 1s in the first column, and $e$ is an $n \times 1$ vector of errors satisfying the following assumptions.
Assumptions:

Observations on the response variable are independently distributed.

$E(e) = 0$ and $V(e) = \sigma^2 I_n$, where $I_n$ is the identity matrix of order $n$.

$e \sim N(0, \sigma^2 I_n)$.

In regression, the least squares method is the most commonly used method of parameter estimation. The least squares (LS) estimator of $\beta$ (Montgomery et al., 2006) is given by
$$\hat{\beta} = (X'X)^{-1} X'Y. \qquad (1.2)$$
This estimator performs well under the above assumptions on the errors; if these assumptions are violated, the estimator in (1.2) may not perform well. Moreover, even when the assumptions are satisfied, some of the predictors may be linearly related, in which case the data exhibit multicollinearity. In such a situation, the estimator in (1.2) has large standard errors and inference based on it can be misleading. Oilfield output generally depends on both previous and current values of several variables, so these variables may be highly correlated with each other.
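A minimal sketch of the LS estimator (1.2) follows, assuming NumPy and a design matrix X whose first column is all 1s; the helper name `ls_estimate` and the data are hypothetical, for illustration only. It solves the normal equations rather than inverting X'X explicitly.

```python
import numpy as np


def ls_estimate(X, y):
    """Return the LS estimate beta_hat = (X'X)^{-1} X'y of equation (1.2).

    X is the n x k design matrix whose first column is all 1s.
    The normal equations (X'X) beta = X'y are solved directly instead of
    forming the inverse explicitly, which is numerically safer.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical noise-free data generated from y = 2 + 3*x1 - x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 2 + 3 * x1 - x2
beta_hat = ls_estimate(X, y)
```

On the noise-free data the estimate recovers the generating coefficients (2, 3, -1). For strongly collinear designs like the oilfield data, a solver that works on X itself, such as `np.linalg.lstsq`, is usually the safer choice, because forming X'X squares the condition number.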
Oilfield Data :
Consider the oilfield output data analyzed by Mustafar et al. (2011). The data contain the oilfield output (Y) as the response variable and the following eight predictor variables:
X1: the total number of wells
X2: the startup number of wells
X3: the number of newly added wells
X4: the injected water volume of the previous year
X5: the oil moisture content of the previous year
X6: the oil production rate of the previous year
X7: the recovery percent of the previous year
X8: the oil output of the previous year
The multiple linear regression equation fitted to the above data by the LS method is
$$\hat{Y} = 2019687 + 178X_1 + 218X_2 + 194X_3 + 0.0768X_4 - 54502X_5 - 983461X_6 + 271927X_7 + 0.026X_8 \qquad (1.3)$$
The predictor X8 (the oil output of the previous year) may be linearly related to the other predictors, and X1 and X2 also appear to be related. To investigate these relationships, we obtained the correlation matrix of the predictors.
Correlation Matrix:
      X1      X2      X3      X4      X5      X6      X7      X8
X1    1
X2    0.9844  1
X3    0.9286  0.9345  1
X4    0.9887  0.9552  0.9204  1
X5    0.8391  0.7578  0.7170  0.8831  1
X6    0.8646  0.7995  0.7171  0.8989  0.9724  1
X7    0.9014  0.8357  0.7674  0.9221  0.9613  0.9422  1
X8    0.9946  0.9714  0.9253  0.9916  0.8481  0.8689  0.9169  1
The correlation matrix reveals a strong linear relationship between every pair of predictors. Pairwise correlations alone do not confirm the presence of multicollinearity, so we also obtained the condition indices and the variance inflation factor (VIF) for each predictor. The VIFs for X1, X2, ..., X8 are 386.1, 112.0, 20.6, 245.0, 114.6, 41.4, 85.7 and 391.6 respectively. The condition indices are 1, 4.1321, 7230.6, $6.6945 \times 10^{5}$, $1.9345 \times 10^{6}$, $3.3209 \times 10^{7}$, $4.2189 \times 10^{12}$ and $8.9033 \times 10^{15}$, so the condition number is $8.9033 \times 10^{15}$.
We observe that severe multicollinearity is present in the data, as indicated by the high VIFs (e.g. 391.6, 386.1, 245.0) and condition indices (e.g. $8.9033 \times 10^{15}$, $4.2189 \times 10^{12}$). Many researchers have pointed out that the LS estimator performs poorly in the presence of multicollinearity; its effects on the LS estimator are discussed in standard texts such as Montgomery et al. (2006) and Draper and Smith (2003).
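The diagnostics quoted above can be computed along the following lines. This is a sketch assuming NumPy, with hypothetical function names: VIFs are taken as the diagonal of the inverse correlation matrix of the predictors, which equals 1/(1 - R_j^2), and condition indices as lambda_max/lambda_j for the eigenvalues of X'X, the definition used by Montgomery et al. (some authors take square roots of these ratios). The values depend strongly on how the columns of X are scaled, so the sketch is not claimed to reproduce the figures reported here.

```python
import numpy as np


def vif(Z):
    """VIF_j = 1 / (1 - R_j^2) for each column of Z (predictors only, no intercept).

    Uses the identity that the VIFs are the diagonal of the inverse of the
    predictors' correlation matrix.
    """
    R = np.corrcoef(np.asarray(Z, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


def condition_indices(X):
    """Condition indices lambda_max / lambda_j of X'X, in ascending order.

    The last (largest) value is the condition number lambda_max / lambda_min.
    """
    X = np.asarray(X, dtype=float)
    eigvals = np.linalg.eigvalsh(X.T @ X)  # X'X is symmetric, so eigvalsh applies
    return np.sort(eigvals.max() / eigvals)


# Hypothetical orthogonal predictors: no collinearity at all.
z1 = np.array([1.0, -1.0, 1.0, -1.0])
z2 = np.array([1.0, 1.0, -1.0, -1.0])
Z = np.column_stack([z1, z2])
vifs = vif(Z)
cis = condition_indices(Z)
```

With orthogonal predictors every VIF and condition index equals 1; values in the hundreds, as for X1 and X8 above, signal near-linear dependence among the predictors.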
Many techniques are available in the literature for dealing with the problems caused by multicollinearity. Ridge regression (Hoerl and Kennard, 1970) and principal component regression (Marquardt, 1970) have been suggested for estimation; of these, the ridge estimator is the most widely used for estimating parameters in the presence of multicollinearity. Support vector regression can also be used as an independent alternative.
The rest of the paper is organized as follows. Section 2 explains subset selection and describes some methods for it. Section 3 evaluates the performance of these methods on the oilfield data, and Section 4 gives a discussion.

Variable selection in regression:
One of the main objectives of regression analysis is to predict future values of the response variable from given values of the regressors X1, X2, ..., Xk-1. In practice, data sets such as rainfall data, oilfield data, microarray data and socio-economic data contain a large number of variables. A model based on a smaller subset of variables can give more accurate predictions than a model based on a large set (Miller, 2002). A large number of variables is often introduced at an early stage of the analysis, and some of them are then deleted by a variable selection technique to enhance the predictive ability of the model. Hence, variable selection plays a vital role in regression analysis.
The problem of subset selection is that of searching, among all possible subsets, for the best subset of size p, i.e. the one that gives accurate prediction. The literature on variable selection in regression is very rich, and an appropriate technique should be chosen for good results. When the data satisfy all the assumptions mentioned in Section 1, they are said to be clean data. Mallows' Cp (Mallows, 1973), R², adjusted R² and sequential procedures (stepwise selection, forward selection and backward elimination) are some of the methods used for variable selection with clean data; these methods are based on the LS estimation procedure. A few methods based on the ridge estimator are available for variable selection in the presence of multicollinearity, such as Rp (Dorugade and Kashid, 2010a) and RGp (Dorugade and Kashid, 2010b).
In this study, we consider model (1.1) as the full model. The fitted equation (in vector notation) is
$$\hat{Y} = X\hat{\beta} \qquad (2.1)$$
and the residual sum of squares is defined as
$$RSS_k = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \qquad (2.2)$$
We can write model (1.1) as
$$Y = X\beta + e = X_1\beta_1 + X_2\beta_2 + e,$$
where $X_1$ is an $n \times p$ matrix of the observations on $p\ (< k)$ predictors and $\beta_1$ is a $p \times 1$ vector of the regression coefficients. Here, we consider the subset model
$$Y = X_1\beta_1 + e. \qquad (2.3)$$
The fitted equation (in vector notation) for the subset model is
$$\hat{Y}_p = X_1\hat{\beta}_1 \qquad (2.4)$$
and the residual sum of squares for the subset model is defined as
$$RSS_p = \sum_{i=1}^{n} (y_i - \hat{y}_{pi})^2. \qquad (2.5)$$
In this article, we consider the following variable selection techniques, which suit different scenarios. The first two methods are used for clean data and the remaining methods in the presence of multicollinearity. We discuss them briefly below.

Mallows Cp criterion:
Mallows' Cp (Mallows, 1973) is one of the most popular variable selection methods. It is defined as
$$C_p = \frac{RSS_p}{\hat{\sigma}^2} - (n - 2p), \qquad (2.6)$$
where $RSS_p$ is the residual sum of squares of the subset model, $\sigma^2$ is the error variance, replaced by a suitable estimate such as $\hat{\sigma}^2 = RSS_k/(n-k)$, $n$ is the number of observations and $p$ is the number of parameters in the subset model. This method is based on the LS estimation method and, as stated earlier, its performance is poor for collinear data; for clean data, however, it performs well.
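The Cp search over all subsets can be sketched as follows, assuming NumPy, an intercept in every model and the estimate RSS_k/(n - k) from the full model; the function `mallows_cp` and the data are hypothetical.

```python
import itertools

import numpy as np


def rss(X, y):
    """Residual sum of squares of the LS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def mallows_cp(Z, y):
    """Mallows' Cp (2.6) for every non-empty subset of the predictor columns of Z.

    An intercept is always included, so a subset of q predictors has p = q + 1
    parameters. sigma^2 is estimated by RSS_k / (n - k) from the full model.
    Returns {tuple_of_column_indices: (Cp, p)}.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = Z.shape
    k = m + 1
    ones = np.ones((n, 1))
    sigma2 = rss(np.hstack([ones, Z]), y) / (n - k)
    results = {}
    for q in range(1, m + 1):
        for cols in itertools.combinations(range(m), q):
            p = q + 1
            rss_p = rss(np.hstack([ones, Z[:, list(cols)]]), y)
            results[cols] = (rss_p / sigma2 - (n - 2 * p), p)
    return results


# Hypothetical data with two candidate predictors (2^2 - 1 = 3 subsets).
z1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z2 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
y = 1 + 2 * z1 + np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.05])
cp_table = mallows_cp(np.column_stack([z1, z2]), y)
```

For the full model Cp equals k exactly, because RSS_k divided by the estimated variance is n - k; with the eight oilfield predictors the same loop visits 2^8 - 1 = 255 subsets.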

Method based on significance of regression coefficients:
Another approach to variable selection is to choose the variables to be included in the model on the basis of the p-values of the tests for significance of the individual coefficients. This approach is suitable for clean data; in the presence of multicollinearity, the p-values may point in the wrong direction.
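A sketch of backward elimination on p-values, as applied later to the oilfield data, is given below. It assumes NumPy and SciPy (`scipy.stats.t.sf` for two-sided t-test p-values) and an intercept that is never removed; the function name and data are hypothetical.

```python
import numpy as np
from scipy import stats


def backward_eliminate(Z, y, alpha=0.05):
    """Backward elimination on individual t-test p-values; the intercept is always kept.

    At each stage the LS model is refitted on the current predictors and the
    predictor with the largest p-value is dropped if that p-value exceeds alpha.
    Returns the column indices of Z that remain.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    keep = list(range(Z.shape[1]))
    while keep:
        X = np.column_stack([np.ones(n), Z[:, keep]])
        k = X.shape[1]
        xtx_inv = np.linalg.inv(X.T @ X)
        beta = xtx_inv @ X.T @ y
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - k)
        se = np.sqrt(sigma2 * np.diag(xtx_inv))
        pvals = 2 * stats.t.sf(np.abs(beta / se), df=n - k)  # two-sided p-values
        pred_p = pvals[1:]  # ignore the intercept
        worst = int(np.argmax(pred_p))
        if pred_p[worst] <= alpha:
            break
        keep.pop(worst)
    return keep


# Hypothetical data: y depends on z1 only; z2 is orthogonal to 1, z1 and the noise.
z1 = np.arange(1.0, 9.0)
z2 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
noise = 0.1 * np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
y = 1 + 3 * z1 + noise
selected = backward_eliminate(np.column_stack([z1, z2]), y)
```

Because z2 is constructed to be orthogonal to the intercept, z1 and the noise, its estimated coefficient is zero and it is removed first, while z1 remains highly significant.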
Variable selection with collinear data
Multicollinearity in the data introduces serious distortions into the analysis, so such data should be handled carefully. There are two approaches to variable selection in this case.

Variable selection after removing multicollinearity:
This method is explained in Chatterjee and Hadi (2006). We judiciously delete the variables responsible for multicollinearity, so that the remaining set is free from it. These variables are identified from the variance inflation factors: the VIF for predictor $X_j$ is defined as $1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination (squared multiple correlation coefficient) obtained by regressing $X_j$ on all the remaining predictors.
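The deletion procedure can be sketched as a loop that removes the predictor with the largest VIF until every remaining VIF is at or below a threshold. This assumes NumPy; the threshold of 10 is a common rule of thumb and an assumption here, since the stopping value is not stated in this paper, and the names and data are hypothetical.

```python
import numpy as np


def vif(Z):
    """VIFs as the diagonal of the inverse correlation matrix of the columns of Z."""
    return np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))


def drop_by_vif(Z, threshold=10.0):
    """Repeatedly delete the predictor with the largest VIF until all VIFs <= threshold.

    Returns the column indices of Z that are kept. The threshold is an assumed
    rule of thumb, not a value taken from the paper.
    """
    Z = np.asarray(Z, dtype=float)
    keep = list(range(Z.shape[1]))
    while len(keep) > 1:
        v = vif(Z[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        keep.pop(worst)
    return keep


# Hypothetical data: z3 is almost a copy of z1; z2 is only weakly related to both.
z1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z2 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
z3 = z1 + np.array([0.01, 0.01, -0.01, -0.01, 0.0, 0.0])
kept = drop_by_vif(np.column_stack([z1, z2, z3]))
```

Either z1 or z3 is dropped (both have very large VIFs), after which the remaining VIFs are close to 1 and the loop stops.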
The other approach is to use a method based on ridge regression (Hoerl and Kennard, 1970) for variable selection in the presence of multicollinearity, as discussed below.

Rp criterion:
The problem of multicollinearity has attracted several researchers, some of whom have developed alternative estimators for use when multicollinearity is severe. Hoerl and Kennard (1970) proposed the ridge estimator, which is widely used because of its optimality properties (see Vinod and Ullah, 1981). It is defined as
$$\hat{\beta}_R = (X'X + rI)^{-1} X'Y, \qquad (2.7)$$
where $r$ is the ridge constant or ridge parameter. Hoerl, Kennard and Baldwin (1975) recommended
$$r = \frac{(k-1)\hat{\sigma}^2}{\hat{\beta}'\hat{\beta}}, \qquad (2.8)$$
where $\hat{\beta}$ is the LS estimator of $\beta$ and
$$\hat{\sigma}^2 = \frac{Y'Y - \hat{\beta}'X'Y}{n - k}. \qquad (2.9)$$
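A sketch of the ridge estimator (2.7) with the constant (2.8), assuming NumPy, is shown below. Standardizing the predictors and centering y are assumptions made here so that the intercept is not penalized; the equations above are written with the raw design matrix X. The function name and data are hypothetical.

```python
import numpy as np


def ridge_hkb(Z, y):
    """Ridge estimate (2.7) with the Hoerl-Kennard-Baldwin constant (2.8).

    Assumptions of this sketch: the predictors in Z (no intercept column) are
    standardized and y is centered, so the intercept is not penalized, and the
    LS estimate b_ls used in (2.8) is computed on that same scale.
    Returns (ridge coefficients, LS coefficients, ridge constant r).
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = Z.shape           # m = k - 1 predictors
    k = m + 1                # number of parameters including the intercept
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    yc = y - y.mean()
    ztz = Zs.T @ Zs
    b_ls = np.linalg.solve(ztz, Zs.T @ yc)
    resid = yc - Zs @ b_ls
    sigma2 = resid @ resid / (n - k)      # (2.9): same RSS as the LS fit with intercept
    r = m * sigma2 / (b_ls @ b_ls)        # (2.8) with k - 1 = m
    b_ridge = np.linalg.solve(ztz + r * np.eye(m), Zs.T @ yc)  # (2.7)
    return b_ridge, b_ls, r


# Hypothetical, strongly collinear predictors.
z1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z2 = z1 + np.array([0.3, -0.2, 0.1, -0.3, 0.2, -0.1])
y = 1 + z1 + z2 + np.array([0.5, -0.4, 0.3, -0.2, 0.1, 0.0])
b_ridge, b_ls, r = ridge_hkb(np.column_stack([z1, z2]), y)
```

For any r > 0 the ridge coefficients have a smaller Euclidean norm than the LS coefficients; this shrinkage is what stabilizes the estimates under multicollinearity.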
Dorugade and Kashid (2010a) recently proposed a statistic, Rp, for subset selection based on the ridge estimator of β in the presence of multicollinearity. It is defined as
=
=1
2
2 + 1 + (2.10)
1
where $p$ is the number of parameters in the subset model; $\sigma^2$ is the error variance, replaced by its suitable estimate $\hat{\sigma}^2 = (Y'Y - \hat{\beta}'X'Y)/(n-k)$; $H_R = X(X'X + rI)^{-1}X'$; $H_{R_p} = X_1(X_1'X_1 + r_p I)^{-1}X_1'$; and $r_p$ is the ridge constant (ridge parameter) for the subset model. Note that $H_R$ and $H_{R_p}$ reduce to the usual hat matrices when the LS estimator is used ($r = r_p = 0$).

Support Vector Regression and Sp criterion :
An alternative to the above methods is to use a data-dependent method such as the support vector machine (SVM), a fast-growing area of machine learning. The SVM was introduced by Boser et al. (1992) at COLT. The basic task of an SVM is to learn from data (input-output pairs) and provide accurate predictions for unseen data. A version of the SVM for regression was proposed in 1997 by Vapnik, Golowich and Smola; this method is called support vector regression (SVR).
In SVR, the goal is to estimate an unknown function from data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, consisting of input vectors $x_i \in \mathbb{R}^{k-1}$ and associated targets $y_i \in \mathbb{R}$, of the form
$$y_i = f(x_i) + e_i, \qquad (2.11)$$
where $f(x_i)$ is the unknown regression function and $e_i$ is the error term.
In the case of linear regression, the function $f(x_i)$ is described as
$$f(x_i) = b + x_i w, \qquad (2.12)$$
where $w = (w_1, w_2, \ldots, w_{k-1})' \in \mathbb{R}^{k-1}$, $b \in \mathbb{R}$ is the bias and $x_i = (x_{i1}, x_{i2}, \ldots, x_{i,k-1})$. Therefore, equation (2.11) becomes
$$y_i = b + x_i w + e_i, \quad i = 1, 2, \ldots, n. \qquad (2.13)$$
In matrix notation, we write
$$Y = X\beta + e,$$
where $\beta = (b, w_1, w_2, \ldots, w_{k-1})'$ and $Y$, $X$ and $e$ are as defined in Section 1. This equation is equivalent to equation (1.1).
To formulate the SVR optimization problem, we use the following ε-insensitive loss function proposed by Vapnik (1995):
$$L_\varepsilon(y_i, f(x_i)) = \max\{|f(x_i) - y_i| - \varepsilon,\ 0\}, \qquad (2.14)$$
where $\varepsilon > 0$ is a predefined constant that controls the noise tolerance. The goal of SVR is to find a function $f(x)$ that deviates by at most $\varepsilon$ from the observed targets $y_i$ for all the training data and, at the same time, is as flat as possible.
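The ε-insensitive loss (2.14) is simple to evaluate; a NumPy sketch with a hypothetical function name and data:

```python
import numpy as np


def eps_insensitive_loss(y, f, eps):
    """Vapnik's epsilon-insensitive loss max(|f(x) - y| - eps, 0), elementwise."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return np.maximum(np.abs(f - y) - eps, 0.0)


# Hypothetical targets and predictions with a tube half-width of 0.1.
losses = eps_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.3, 0.6], eps=0.1)
```

The first prediction lies inside the ε-tube and costs nothing; the other two are charged only for the part of the deviation beyond ε (0.2 and 0.3).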
Using the ε-insensitive loss function, the regression problem can be written as the following convex optimization problem (Smola and Schölkopf, 2004):
$$\text{Minimize } \tfrac{1}{2}\|w\|^2 \qquad (2.15)$$
$$\text{subject to } y_i - (x_i w + b) \le \varepsilon, \quad i = 1, 2, \ldots, n, \qquad (2.16)$$
$$(x_i w + b) - y_i \le \varepsilon, \quad i = 1, 2, \ldots, n. \qquad (2.17)$$
This optimization problem is feasible only if a function $f$ exists that approximates all pairs $(x_i, y_i)$ with precision $\varepsilon$.
To cope with otherwise infeasible constraints, we introduce non-negative slack variables $\xi_i$ and $\xi_i^*$, which measure the deviations of training samples outside the ε-insensitive zone. The optimization problem becomes (Vapnik, 1995)
$$\text{Minimize } \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \qquad (2.18)$$
$$\text{subject to } y_i - (x_i w + b) \le \varepsilon + \xi_i, \quad i = 1, 2, \ldots, n, \qquad (2.19)$$
$$(x_i w + b) - y_i \le \varepsilon + \xi_i^*, \quad i = 1, 2, \ldots, n, \qquad (2.20)$$
$$\xi_i,\ \xi_i^* \ge 0, \quad i = 1, 2, \ldots, n.$$
The constant $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated.
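For a small data set, the primal problem (2.18)-(2.20) can be solved directly with a general-purpose optimizer. The sketch below assumes NumPy and SciPy and uses `scipy.optimize.minimize` with the SLSQP method; this is a swapped-in generic solver chosen for illustration, not the algorithm used in the paper, and dedicated SVR solvers usually work on the dual instead. The function name and data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def linear_svr_primal(X, y, C=10.0, eps=0.1):
    """Solve the soft-margin primal (2.18)-(2.20) of linear SVR; returns (w, b).

    The decision vector is z = [w (d entries), b, xi (n entries), xi_star (n entries)].
    SciPy's general-purpose SLSQP solver is used only because the problem is tiny.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    ones, eye, zero = np.ones((n, 1)), np.eye(n), np.zeros((n, n))
    # Rows of J1 @ z equal x_i w + b + xi_i; rows of J2 @ z equal -(x_i w + b) + xi*_i.
    J1 = np.hstack([X, ones, eye, zero])
    J2 = np.hstack([-X, -ones, zero, eye])

    def objective(z):
        w, slack = z[:d], z[d + 1:]
        return 0.5 * w @ w + C * slack.sum()

    def gradient(z):
        g = np.zeros_like(z)
        g[:d] = z[:d]
        g[d + 1:] = C
        return g

    constraints = [
        # (2.19): y_i - (x_i w + b) <= eps + xi_i
        {"type": "ineq", "fun": lambda z: eps - y + J1 @ z, "jac": lambda z: J1},
        # (2.20): (x_i w + b) - y_i <= eps + xi*_i
        {"type": "ineq", "fun": lambda z: eps + y + J2 @ z, "jac": lambda z: J2},
    ]
    bounds = [(None, None)] * (d + 1) + [(0.0, None)] * (2 * n)  # xi, xi* >= 0
    # Feasible start: w = 0, b = 0 and slacks large enough to cover every residual.
    z0 = np.concatenate([np.zeros(d + 1), np.abs(y), np.abs(y)])
    res = minimize(objective, z0, jac=gradient, method="SLSQP",
                   bounds=bounds, constraints=constraints,
                   options={"maxiter": 500, "ftol": 1e-10})
    return res.x[:d], res.x[d]


# Hypothetical noise-free data on y = 1 + 2x with an eps-tube of half-width 0.1.
x = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel()
w, b = linear_svr_primal(x, y, C=10.0, eps=0.1)
```

On the noise-free line y = 1 + 2x over [0, 1] with ε = 0.1, the flattest function that stays inside the tube has slope 1.8 and intercept 1.1, so the solver should return w close to 1.8 and b close to 1.1. This illustrates how SVR trades exact fit for flatness.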
Using the method of Lagrange multipliers and exploiting the optimality constraints, the weight vector is given by (Vapnik, 1997; Gunn, 1998)
$$w = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^*)\, x_i' \qquad (2.21)$$
and the regression function by
$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^*)\, x\, x_i' + b, \qquad (2.22)$$
where $\alpha_i$ and $\alpha_i^*$, $i = 1, 2, \ldots, n$, are Lagrange multipliers and $n_{sv}$ is the number of support vectors. The value of the bias $b$ is given by (Gunn, 1998)
$$b = -\tfrac{1}{2}(x_r + x_s)\, w, \qquad (2.23)$$
where $x_r$ and $x_s$ are support vectors, i.e. input vectors with a nonzero value of $\alpha_i$ or $\alpha_i^*$ respectively.
The role of meta parameters C and ε:
The performance (estimation accuracy) of SVR depends strongly on a proper setting of the regularization parameter C and the width ε of the insensitive zone; such parameters are called meta parameters. The parameter ε controls the width of the ε-insensitive zone used to fit the training data and thus determines the accuracy of the regression function through the number of support vectors: to achieve a given accuracy, a smaller value of ε must be chosen, which yields a larger number of support vectors. The parameter C determines the trade-off between model complexity (flatness) and the degree to which deviations larger than ε are tolerated in the optimization formulation. Existing methods for selecting the meta parameter C include:
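a priori knowledge and/or user expertise;
C = Range (Mattera and Haykin, 1999);
$C = \max(|\bar{y} + 3\sigma_y|, |\bar{y} - 3\sigma_y|)$ (Cherkassky and Ma, 2004);
$C = PR$ (percentile range) $= P_{(100+\alpha)/2} - P_{(100-\alpha)/2}$ (Desai and Kashid, 2013); and
$C = \max(|Me - 3\,Q.D.|, |Me + 3\,Q.D.|)$ (Desai and Kashid, 2013),
where $\bar{y}$ and $\sigma_y$ are the mean and standard deviation of the training responses, $P_j$ is the $j$-th percentile of the responses for a chosen percentage $\alpha$, and $Me$ and $Q.D.$ are the median and quartile deviation of the responses.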
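Three of the data-driven rules for C listed above can be sketched as follows, assuming NumPy. The percentile-range rule is omitted, and two details are assumptions: the standard deviation uses the population form (ddof = 0), and the quartile deviation is taken as (Q3 - Q1)/2 with NumPy's default percentile interpolation. Function names and data are hypothetical.

```python
import numpy as np


def c_range(y):
    """C = range of the responses (Mattera and Haykin, 1999)."""
    y = np.asarray(y, dtype=float)
    return float(y.max() - y.min())


def c_cherkassky_ma(y):
    """C = max(|mean + 3 sd|, |mean - 3 sd|) (Cherkassky and Ma, 2004); sd uses ddof = 0."""
    y = np.asarray(y, dtype=float)
    m, s = y.mean(), y.std()
    return float(max(abs(m + 3 * s), abs(m - 3 * s)))


def c_median_qd(y):
    """C = max(|Me - 3 QD|, |Me + 3 QD|), with QD = (Q3 - Q1) / 2 assumed."""
    y = np.asarray(y, dtype=float)
    q1, me, q3 = np.percentile(y, [25, 50, 75])
    qd = (q3 - q1) / 2.0
    return float(max(abs(me - 3 * qd), abs(me + 3 * qd)))


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical responses
c_values = (c_range(y), c_cherkassky_ma(y), c_median_qd(y))
```

For the responses 1, ..., 5 these rules give 4.0, 3 + 3√2 (about 7.24) and 6.0 respectively; the median-based rule is less sensitive to outlying responses than the mean-based one.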
Sp criterion:
Kashid and Kulkarni (2002) proposed the more general Sp criterion, based on the M-estimator (Montgomery et al., 2006, Chap. 11), for data with outliers. It is defined as
$$S_p = \frac{\sum_{i=1}^{n}(\hat{y}_{ik} - \hat{y}_{ip})^2}{\hat{\sigma}^2} - (k - 2p), \qquad (2.24)$$
where $\hat{y}_{ik}$ and $\hat{y}_{ip}$ are the predicted values based on the full model and the subset model respectively, and $k$ and $p$ are the numbers of parameters of the full and subset models. Note that $\sigma^2$ is usually unknown, so it has to be replaced by a suitable estimate.
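A direct sketch of (2.24), assuming NumPy; the fitted values can come from any estimator (LS, an M-estimator or SVR), and the function name and numbers are hypothetical.

```python
import numpy as np


def sp_statistic(yhat_full, yhat_sub, sigma2, k, p):
    """Sp of equation (2.24): sum (yhat_ik - yhat_ip)^2 / sigma^2 - (k - 2p).

    yhat_full and yhat_sub are fitted values of the full model (k parameters)
    and the subset model (p parameters); sigma2 estimates the error variance.
    """
    diff = np.asarray(yhat_full, dtype=float) - np.asarray(yhat_sub, dtype=float)
    return float(diff @ diff / sigma2 - (k - 2 * p))


# Hypothetical fitted values with k = 9 parameters in the full model.
s_full = sp_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], sigma2=0.5, k=9, p=9)
s_sub = sp_statistic([1.0, 2.0, 3.0], [0.0, 1.0, 3.0], sigma2=0.5, k=9, p=4)
```

When the subset model is the full model (p = k), the first term vanishes and Sp = k, which is why the last row of Table 5 equals 9.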

Comparison of subset selection methods:
To compare the performance of the various subset selection methods, we compute the mean absolute percentage error (MAPE), defined as
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100. \qquad (3.1)$$
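MAPE (3.1) as a minimal NumPy sketch with a hypothetical function name; it assumes no observed y_i is zero.

```python
import numpy as np


def mape(y, yhat):
    """Mean absolute percentage error (3.1); assumes no y_i equals zero."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100)


err = mape([100.0, 200.0], [110.0, 180.0])  # hypothetical observed and predicted values
```

Both predictions are off by 10% of the observed value, so the MAPE is 10.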
In this section, we demonstrate numerically how variable selection methods give misleading results when they are applied without considering the nature of the data. We reconsider the oilfield data discussed in Section 1 and analyze it using the methods described in Section 2.

Variable selection using Mallows Cp:
We obtain the values of the Cp statistic for all possible ($2^{k-1} - 1 = 255$) subsets. Table 1 lists, for each subset size, the two smallest values of Cp.
Table 1: Values of the Cp statistic.
Predictors in the model     Cp      p
X1                          60.4    2
X8                          65.3    2
X1X7                        27.7    3
X1X8                        33.4    3
X2X7X8                      8.3     4
X1X5X7                      18.8    4
X2X4X5X7                    5.7     5
X1X2X7X8                    8.1     5
X2X4X5X6X7                  5.7     6
X1X2X4X5X7                  6.3     6
X1X2X4X5X6X7                6.1     7
X2X4X5X6X7X8                7.0     7
X1X2X3X4X5X6X7              7.0     8
X1X2X4X5X6X7X8              7.8     8
X1X2X3X4X5X6X7X8            9.0     9
Fig. 1 Plot of Cp vs p
The value of Cp for the predictors {X2, X4, X5, X7} is 5.7, which is close to p = 5. Hence, according to this method, {X2, X4, X5, X7} is the proper subset. This is also shown graphically in Fig. 1, where the dotted line represents Cp = p and each point denotes a value of Cp; subsets whose Cp is close to p lie close to this line and are proper subsets. Among these, {X2, X4, X5, X7} is the smallest.

Variable selection using the method based on p-values:
In this method, we remove predictors one at a time, each time removing the predictor with the largest p-value (of the test for significance of the individual predictor). The same method was used by Mustafar et al. (2011). We fix the significance level at 0.05 and apply this method to the oilfield data. The largest p-value is 0.901, corresponding to X8, so we remove X8 from the model and regress Y on the remaining predictors. In the same way, X3 (p-value = 0.307), X1 (p-value = 0.210) and X6 (p-value = 0.613) are removed at subsequent stages. Finally, the predictors X2, X4, X5 and X7 remain in the regression model, all with significant p-values, so this method selects {X2, X4, X5, X7} as the proper subset. The same subset is selected at significance levels 0.1 and 0.01. The regression coefficients, p-values and VIF values for the subset model are presented in Table 2.
Table 2: Coefficients, p-values and VIF values for the significant predictors.
Predictor    Coeff.     p        VIF
Constant     259910     0.313    -
X2           352.21     0        19.5
X4           0.12302    0        35.7
X5           36606      0.001    19
X7           277024     0        20.5
The p-values in the above table indicate that the individual predictors are significant, but the VIF values show that multicollinearity is still present among the predictors selected by this method. Mallows' Cp and the p-value based method select the same subset because both are based on the LS estimator.

Variable selection after removing multicollinearity:
We obtained the VIF values for all predictors. The VIF for X8, 319.6, is the largest, so we remove X8 from the model. We then obtained the VIF values for the remaining predictors Xi, i = 1, 2, ..., 7; the maximum VIF, 337.2, corresponds to X1, so we remove X1 at the second stage. In the same way, we remove X5 (VIF = 57.7) at the third stage and X4 (VIF = 47.6) at the fourth stage, and regress Y on the remaining variables X2, X3, X6 and X7. The fitted regression equation is
$$\hat{Y} = 1430935 + 535X_2 + 398X_3 - 206908X_6 + 259257X_7 \qquad (3.2)$$
The VIF values for the remaining variables X2, X3, X6 and X7 are 11.1, 8.1, 9.1 and 10.8 respectively, which indicates that the data no longer contain severe multicollinearity. We now treat model (3.2) as the fitted full model and apply Mallows' Cp for variable selection. Table 3 presents the values of the Cp statistic for all possible subset models when the full model contains the predictors X2, X3, X6 and X7.
Table 3: Values of Cp for the non-collinear data.
Predictors in the model   Cp      p
X2                        115.5   2
X3                        471.4   2
X6                        703.8   2
X7                        453.9   2
X2X3                      116.9   3
X2X6                      37      3
X2X7                      3.4     3
X3X6                      161.4   3
X3X7                      112.3   3
X6X7                      453.4   3
X2X3X6                    35      4
X2X3X7                    3.2     4
X2X6X7                    5.3     4
X3X6X7                    109.9   4
X2X3X6X7                  5       5
From the above table, Cp selects {X2, X7} as the proper subset. This method selects a smaller subset model than the methods in Sections 3.1 and 3.2.

Variable selection using Rp criterion:
We apply the method based on the Rp statistic to the oilfield data and obtain the values of Rp for all possible ($2^{k-1} - 1 = 255$) subsets. The two smallest values of Rp for each subset size are presented in Table 4.
Table 4: Values of the Rp statistic.
Predictors in the model     Rp          p
X1                          56.23302    2
X8                          58.52541    2
X1X7                        25.60112    3
X1X8                        28.60801    3
X2X7X8                      6.491387    4
X2X5X8                      16.46874    4
X1X2X7X8                    5.62053     5
X2X4X7X8                    6.702801    5
X2X4X5X7X8                  6.58359     6
X1X2X5X7X8                  7.125915    6
X1X2X4X5X7X8                6.848504    7
X1X2X3X4X6X7                7.227894    7
X1X2X4X5X6X7X8              7.388682    8
X1X2X3X4X5X6X7              8.458443    8
X1X2X3X4X5X6X7X8            9.0         9
Fig. 2 Plot of Rp vs p
The value of Rp for the predictors {X1, X2, X7, X8} is 5.62053, which is close to p = 5. Hence {X1, X2, X7, X8} is the proper subset, as Fig. 2 also shows. The VIF values for the selected variables are 229.9, 62.0, 10.6 and 120.5, which indicates that multicollinearity is present in the selected subset. Since ridge regression is used, this does not affect the prediction ability of the model.

Variable selection using support vector regression and the Sp statistic:
In this method, we use support vector regression to estimate the regression coefficients for the oilfield data. To perform SVR, we use the meta parameter C = max(|Me − 3Q.D.|, |Me + 3Q.D.|) suggested by Desai and Kashid (2013) and $\varepsilon = C \times 10^{-6}$ (see Gunn, 1998). For subset selection, we use the more general Sp criterion (Kashid and Kulkarni, 2002). We obtained the values of the Sp statistic for all possible subsets using SVR; the two smallest values of Sp for each subset size are presented in Table 5.
Table 5: Values of the Sp statistic.
Predictors in the model     Sp             p
X1                          58.72623939    2
X8                          61.10392123    2
X2X8                        26.50206833    3
X1X8                        46.70830141    3
X2X4X8                      22.19079591    4
X1X2X8                      22.43180781    4
X2X5X6X8                    11.54141709    5
X2X3X5X8                    11.76939773    5
X2X3X4X5X7                  7.886057118    6
X2X4X5X6X8                  13.65096426    6
X2X3X4X5X6X7                7.485941498    7
X2X3X4X5X7X8                9.534545572    7
X2X3X4X5X6X7X8              7.077579478    8
X1X2X3X4X5X6X7              9.950798094    8
X1X2X3X4X5X6X7X8            9              9
The value of Sp for the predictors {X2, X3, X4, X5, X6, X7} is 7.485941498, which is close to p = 7; among the smaller subsets, it is the closest to its p. Hence {X2, X3, X4, X5, X6, X7} is the proper subset.
Interestingly, the different methods selected different subsets. To compare these subsets, and consequently the methods that select them, we assess the mean absolute percentage error. We generated 10000 bootstrap samples of each of the sizes 5, 10, 15, 20 and 24 from the oilfield data. The MAPEs for the selected subsets are reported in Table 6.
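The bootstrap comparison can be sketched as below, assuming NumPy. How resampling and fitting are combined is not stated here, so this sketch assumes that each subset model is fitted once on the full data and that its fitted values are resampled together with the observed responses. The function name, seed and data are hypothetical.

```python
import numpy as np


def bootstrap_mape(y, yhat, size, n_boot=10000, seed=0):
    """Average MAPE over bootstrap resamples of (y, yhat) pairs.

    Assumption: the subset model was fitted once on the full data and its
    fitted values yhat are resampled together with y (no refitting).
    """
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx is one bootstrap sample of `size` indices drawn with replacement.
    idx = rng.integers(0, len(y), size=(n_boot, size))
    ape = np.abs((y[idx] - yhat[idx]) / y[idx]) * 100  # absolute percentage errors
    # All resamples have equal size, so the grand mean equals the mean of per-sample MAPEs.
    return float(ape.mean())


# Hypothetical checks: perfect predictions, and predictions uniformly 10% too high.
perfect = bootstrap_mape([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], size=5, n_boot=100)
ten_pct = bootstrap_mape([100.0, 200.0], [110.0, 220.0], size=5, n_boot=100)
```

Called once per selected subset with size in {5, 10, 15, 20, 24} and n_boot = 10000, this produces a table shaped like Table 6.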
Table 6: Mean absolute percentage error (%).
                                                       Bootstrap sample size
Method               Proper subset               n = 5      n = 10     n = 15     n = 20     n = 24
P-value              {X2, X4, X5, X7}            33.08442   33.07843   33.10258   33.09600   33.12569
Cp                   {X2, X4, X5, X7}            33.08442   33.07843   33.10258   33.09600   33.12569
Rp                   {X1, X2, X7, X8}            3.493844   3.484232   3.476870   3.484011   3.465856
Chat. (VIF deletion) {X2, X7}                    4.815408   4.793653   4.764784   4.774153   4.757813
SVR                  {X2, X3, X4, X5, X6, X7}    2.777376   2.762925   2.770807   2.775243   2.750963
The subsets {X1, X2, X7, X8} and {X2, X3, X4, X5, X6, X7} give the smallest MAPEs among those considered; they were selected by the Rp criterion and the Sp criterion respectively. Thus, the Rp criterion and the Sp criterion with SVR estimates perform better than the other criteria in the presence of multicollinearity.

Discussion:
In this article, we discussed the use of subset selection methods to build a less complex model with higher prediction accuracy in the context of oilfield data. As discussed earlier, many subset selection methods are available in the literature, and a user of statistics may find it difficult to choose among them. If a subset selection method is used without knowing the nature of the data, the results may be misleading. It is therefore important to understand the nature of the data and the problems associated with it, and to use a subset selection method appropriate to those problems.
References:

Boser, B. E., Guyon, I. M. and Vapnik, V. N. (1992): A Training Algorithm for Optimal Margin Classifiers, ACM COLT '92, Pittsburgh, PA, pp. 144-152.

Chatterjee S. and Hadi A. S. (2006): Regression Analysis by Example, Fourth Edition, John Wiley and Sons Inc, New York.

Cherkassky, V. and Ma, Y. (2004): Practical Selection of SVM Parameters and Noise Estimation for SVM Regression, Neural Networks, Vol. 17, No. 1, 113-126.

Desai S. S. and Kashid D. N. (2013): Estimation of Regression Parameters Using SVM with New Methods for Meta Parameter, accepted for publication in International Journal of Data Mining, Modeling and Management.

Dorugade A. V. and Kashid D. N. (2010a): Variable Selection in Linear Regression Based on Ridge Estimator, Journal of Statistical Computation and Simulation, 80(11), 1211-1224.

Dorugade A. V. and Kashid D. N. (2010b): Subset Selection in Linear Regression Using Generalized Ridge Estimator, Journal of Statistical Theory and Practice, 4, 375-389.

Draper N. R. and Smith H. (2003): Applied Regression Analysis, Third Edition, John Wiley and Sons Inc, New York.

Gunn S. R. (1998): Support Vector Machines for Classification and Regression, Technical Report, School of Electronics and Computer Science, University of Southampton.

Hoerl A. E. and Kennard R. W. (1970): Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, 12, 55-67.

Hoerl A. E., Kennard R. W. and Baldwin K. F. (1975): Ridge Regression: Some Simulations, Communications in Statistics, 4, 105-123.

Kashid, D. N. and Kulkarni, S. R. (2002): A More General Criterion for Subset Selection in Multiple Linear Regression, Communications in Statistics - Theory and Methods, 31(5), 795-811.

Mallows C. L. (1973): Some Comments on Cp, Technometrics, 15, 661-675.

Marquardt D. W. (1970): Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation, Technometrics, 12, 591-612.

Mattera, D. and Haykin, S. (1999): Support Vector Machines for Dynamic Reconstruction of a Chaotic System, in: B. Schölkopf, J. Burges and A. Smola (Eds.), Advances in Kernel Methods: Support Vector Learning, pp. 211-242, MIT Press, Cambridge, MA.

Miller A. J. (2002): Subset Selection in Regression, Chapman and Hall.

Montgomery D. C., Peck E. A. and Vining G. G. (2006): Introduction to Linear Regression Analysis. Third edition – John Wiley and Sons Inc.

Mustafar I. B. and Razali R. (2011): A Study on Prediction of Output in Oilfield Using Linear Regression, International Journal of Applied Science and Technology, Vol. 1, No. 4, 107-113.

Smola, A. J. and Schölkopf, B. (2004): A Tutorial on Support Vector Regression, Statistics and Computing, 14, pp. 199-222.

Vapnik, V., Golowich, S. and Smola, A. (1997): Support Vector Method for Function Approximation, Regression Estimation and Signal Processing, in: Mozer M., Jordan M. and Petsche T. (Eds.), NIPS, Vol. 9, pp. 281-287, MIT Press, Cambridge, MA.

Vapnik V. (1998): The Nature of Statistical Learning Theory. Second Edition, Springer, New York.

Vinod H. D. and Ullah A. (1981): Recent Advances in Regression Methods, Marcel Dekker, New York.