Robust Estimation in Stratification Sampling

The estimation of any variable of interest, such as the monthly allowance and expenditure of university students considered in this study, depends on the sampling scheme used. In this research, estimation using simple random sampling and stratification sampling is considered. The Statistical Package for the Social Sciences (SPSS) was used to compute the required estimates. The stratification sampling scheme was found to give a smaller variance, and it is therefore recommended.


INTRODUCTION
Sample design has two aspects: a selection process, the rules and operations by which some members of the population are included in the sample; and an estimation process (or estimator) for computing the sample statistics, which are sample estimates of population values (Hidiroglou and Rao, 1983). Kish (1965) opined that survey objectives should determine the sample design; but the determination is actually a two-way process, because the problems of sample design often influence and change the survey objectives. We shall encounter examples of the ways in which survey objectives and sample design interact to produce overall survey designs. A dialogue between the researcher and the sampler must occur before any aspect of survey design is "frozen," because a change in one aspect may dictate a change in others. Instead of a dialogue, the decisions may involve a larger cast: sampler, researcher, and consumer; and the last, perhaps the grantor of the project, may feel behind him the silent pressure of the "ultimate consumers" of the data: the members of a profession or, perhaps, a wider public. The dialogue may occur silently within one head, if the researcher and sampler are one; but the dialogue should nevertheless take place (Kuk, 1988 and Okafor, 2002). Most samples are prepared by statisticians and other researchers who are not primarily sampling specialists. Nevertheless, it is helpful, although sometimes difficult, to separate sampling design from the related activities involved in survey research. The sample design covers the tasks of selection and estimation for making inferences from sample values to the population values. Beyond this are the problems of making inferences from the survey population to another and generally broader population, with measurements free from error (Rao, 1994).

AIM AND OBJECTIVES
The aim of this study is to compare variances in linear combination using simple random sampling and stratification sampling. The objectives are to:
(i) Estimate the mean monthly income and expenses of students in the university.
(ii) Present a better procedure for estimation when the mean monthly incomes and expenses of students in the university are involved.

Characteristics of population elements are transformed to variables $Y_i$ by the survey operations of measurement. Some literature deals directly with the statistical populations of the variables $Y_i$. Here we prefer to say that the $i$th element has the variable $Y_i$. This permits us to talk of the many variables ($Y_i$, $X_i$, $Z_i$, $W_i$, $P_i$ and so on) of the same element (Rao, Kovar and Mantel, 1990). We can also consider relationships between variables of an element, changes of variables, and accuracy of measurements of variables. A statistic based on the variables found in a sample is a random variable, which we call a variate (Kendall and Buckland, 1957).

The simple mean of a simple random sample (SRS) of $n$ elements is

$$\bar{y}_0 = \frac{1}{n}\sum_{j=1}^{n} y_j = \frac{1}{n}\left[\, y_1 + y_2 + \dots + y_n \,\right] \tag{4}$$

The results of an SRS selection may be used for other estimators also, for example, with post-stratification or with a ratio estimator (Sarndal and Wright, 1992). But we treat those separately as other designs. Simple random sampling is a sample design specifying both the SRS selection and the simple mean estimate. The variance of the SRS mean $\bar{y}_0$ is computed as

$$\mathrm{var}(\bar{y}_0) = (1 - f)\,\frac{s^2}{n}, \qquad s^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(y_j - \bar{y}_0\right)^2, \qquad f = \frac{n}{N}.$$

The standard error of $\bar{y}_0$ is the square root of its variance:

$$\mathrm{se}(\bar{y}_0) = \sqrt{\frac{1 - f}{n}}\; s.$$

Sometimes we may want to estimate $Y = N\bar{Y}$, the aggregate or total of the $Y_i$ variable in the population. A simple estimator of $Y$ is $N\bar{y}_0$, and its standard error is estimated by

$$\mathrm{se}(N\bar{y}_0) = N\,\mathrm{se}(\bar{y}_0) = N\sqrt{\frac{1 - f}{n}}\; s.$$

We can also point out that the expected value of $s^2$ in SRS is

$$E(s^2) = S^2, \qquad S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2.$$

Hence the expected value of the sample estimate of the variance of the mean is

$$E\left[\mathrm{var}(\bar{y}_0)\right] = (1 - f)\,\frac{S^2}{n} = \mathrm{Var}(\bar{y}_0).$$

For the difference $(\bar{y} - \bar{x})$ of two means, the variance is simply the sum of the two variances if the two samples are independent (Thompson, 1992):

$$\mathrm{var}(\bar{y} - \bar{x}) = \mathrm{var}(\bar{y}) + \mathrm{var}(\bar{x}).$$
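But if the two means are not independent, a covariance term must be subtracted from the sum of the variances:

$$\mathrm{var}(\bar{y} - \bar{x}) = \mathrm{var}(\bar{y}) + \mathrm{var}(\bar{x}) - 2\,\mathrm{cov}(\bar{y}, \bar{x}).$$

For $n$ pairs of values, each pair selected with SRS, the difference has the variance

$$\mathrm{var}(\bar{y} - \bar{x}) = \frac{1 - f}{n}\left(s_y^2 + s_x^2 - 2 s_{yx}\right).$$

Note also the use of the covariance of the two variables,

$$s_{yx} = \frac{1}{n-1}\sum_{j=1}^{n}\left(y_j - \bar{y}\right)\left(x_j - \bar{x}\right).$$

This statistic resembles the variance, but contains cross-product terms instead of the squared terms of the variance. Note also that for the pairs of elements

$$\bar{y} - \bar{x} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - x_j\right) = \frac{1}{n}\sum_{j=1}^{n} d_j = \bar{d}.$$

Hence we may treat this as the mean of a sample of $n$ elements $(y_j - x_j) = d_j$. The variance can also be computed as

$$\mathrm{var}(\bar{d}) = \frac{1 - f}{n}\, s_d^2, \qquad s_d^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(d_j - \bar{d}\right)^2,$$

which is numerically equal to (15), the variance for $n$ pairs given above. The covariance is absent for two independent samples, but present for two overlapping samples. The variance of the difference becomes more complicated if the two samples are neither completely independent nor completely overlapping.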
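The equality of the two ways of computing the variance of a paired difference can be checked numerically. The sketch below assumes the usual SRS forms, var(ȳ − x̄) = ((1 − f)/n)(s_y² + s_x² − 2 s_yx) and var(d̄) = ((1 − f)/n) s_d² with d_j = y_j − x_j. The function name, the five pairs of values and the population size N = 200 are hypothetical and are not taken from the study's data.

```python
from statistics import mean, variance

def paired_difference_variance(y, x, N):
    """Return var(ybar - xbar) computed two ways for n SRS pairs.

    Assumes sampling without replacement from a population of size N,
    so the factor (1 - f) with f = n/N is applied to both forms.
    """
    n = len(y)
    f = n / N
    y_bar, x_bar = mean(y), mean(x)

    # Sample covariance: cross-products instead of squared deviations.
    s_yx = sum((yj - y_bar) * (xj - x_bar) for yj, xj in zip(y, x)) / (n - 1)

    # Form 1: sum of the variances minus twice the covariance.
    var_cov_form = (1 - f) / n * (variance(y) + variance(x) - 2 * s_yx)

    # Form 2: variance of the mean of the differences d_j = y_j - x_j.
    d = [yj - xj for yj, xj in zip(y, x)]
    var_diff_form = (1 - f) / n * variance(d)

    return var_cov_form, var_diff_form

# Hypothetical allowance (y) and expenditure (x) values for five students.
y = [20000.0, 15000.0, 12000.0, 30000.0, 18000.0]
x = [18500.0, 14000.0, 11500.0, 27000.0, 17000.0]
var_a, var_b = paired_difference_variance(y, x, N=200)
print(var_a, var_b)
```

Both values agree up to floating-point rounding, which is why the paired differences can be treated as a single variate.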
The subclass mean $\bar{y}_m = \sum_{j}^{m} y_j / m$ from an SRS of $n$ elements can be treated as an SRS of $m$ elements. That is, we consider the variance of the sample conditional on obtaining a sample of $m$ elements:

$$\mathrm{var}(\bar{y}_m) = (1 - f)\,\frac{s_m^2}{m}, \qquad s_m^2 = \frac{1}{m-1}\sum_{j}^{m}\left(y_j - \bar{y}_m\right)^2.$$

These formulas have the familiar form, with the possible exception of the factor $(1 - f) = (1 - n/N)$. This factor is usually called the finite population correction, briefly fpc. When sampling without replacement, it appears as a correction factor to the main portion of the variance terms, which is $s^2/n$ for SRS. If we think of a fixed sample size $n$ being applied to larger and larger populations, the sampling fraction $f = n/N$ tends to zero, and the factor $(1 - f)$ approaches 1. Multiplication by one has no effect, and the fpc can be omitted when the population is much larger than the sample. For an "infinite population" the factor disappears from the variance formula; hence its name. Also, when selecting with replacement, the factor $(1 - f)$ becomes 1 and disappears. The effect is similar to selection from an infinite population.
The sampling fraction is usually small, because the population is large. The aims of research generally concern inferences about large populations; or, if confined to a small population, this is often hopefully considered a "sample" for making inferences about some much larger actual population or theoretical universe. But censuses aimed specifically at small populations do occur, and sometimes these run into large fractions of 10 percent and more. In these rare cases the fpc is needed. Note that the variance can be written as $\mathrm{var}(\bar{y}) = s^2/n'$, where $n' = n/(1 - n/N) = nN/(N - n)$. From this we easily note that $n = n'/(1 + n'/N)$. In other words, the effect of $(1 - n/N)$ is to increase the "effective sample size" from $n$ to $n'$. It might be convenient to write all the variance formulas with this convention.
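As an illustration of the SRS estimators and the finite population correction discussed above, the following sketch computes the mean, its variance with the factor (1 − f), the standard errors of the mean and of the estimated total, and the effective sample size n′ = n/(1 − n/N). The function name, the sample values and N = 50 are assumptions made for the example only.

```python
from math import sqrt
from statistics import mean, variance

def srs_estimates(sample, N):
    """SRS estimates for a sample drawn without replacement from N units."""
    n = len(sample)
    f = n / N                      # sampling fraction
    y_bar = mean(sample)           # simple mean
    s2 = variance(sample)          # element variance, divisor n - 1
    var_mean = (1 - f) * s2 / n    # variance of the mean with the fpc
    se_mean = sqrt(var_mean)
    return {
        "mean": y_bar,
        "s2": s2,
        "var_mean": var_mean,
        "se_mean": se_mean,
        "total": N * y_bar,        # estimate of the population total
        "se_total": N * se_mean,
        "n_eff": n / (1 - f),      # effective sample size n'
    }

# Hypothetical sample of n = 5 from a population of N = 50 (f = 0.1).
est = srs_estimates([10.0, 12.0, 14.0, 16.0, 18.0], N=50)
print(est)
```

With f = 0.1 the fpc reduces the variance of the mean by 10 percent, and dividing s² by the effective sample size n′ gives the same value as applying (1 − f) to s²/n.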

RELATIVE ERROR
In some situations it is useful to consider relative measures instead of absolute measures of variation. The absolute measures, the standard deviation and the standard error, appear in the units of measurement of the variable, and this causes difficulties in some comparisons. Common relative measures are the coefficients of variation, in which the unit of measurement is cancelled by dividing by the mean. The element coefficient of variation is derived from the standard deviation:

$$C_v = \frac{S}{\bar{Y}}.$$

The coefficient of variation of the mean $\bar{y}$ is derived similarly from the standard error:

$$\mathrm{cv}(\bar{y}) = \frac{\mathrm{se}(\bar{y})}{\bar{y}}.$$

The squares of these quantities correspond, respectively, to the variances of the element and of the mean: $C_v^2 = S^2/\bar{Y}^2$ is the relative variance of the element, and $\mathrm{cv}^2(\bar{y}) = \mathrm{var}(\bar{y})/\bar{y}^2$ is the relative variance of the mean $\bar{y}$.
Coefficients of variation are useful for variables that are always or mostly positive; these occur frequently in surveys, especially as "count data". Comparison of the variability of these items often becomes more meaningful when expressed in relative terms. For example, in comparing the "income spread" in two countries, the use of the two standard deviations would be confused by the different monetary units as well as by different standards of living; but coefficients of variation may provide a reasonable comparison in terms of average income.
For the estimated total $N\bar{y}$, the factor $N$ cancels, so the coefficient of variation of the total equals that of the mean:

$$\mathrm{cv}(N\bar{y}) = \frac{N\,\mathrm{se}(\bar{y})}{N\bar{y}} = \mathrm{cv}(\bar{y}).$$

The general expression holds for different sample designs. Specifically, for SRS samples with $f = n/N$ we can use

$$\mathrm{cv}(\bar{y}) = \sqrt{\frac{1 - f}{n}}\;\frac{s}{\bar{y}}.$$

In some situations the coefficients of variation should be used only with caution, or not at all, for the following reasons: (1) if the mean of the variable is close to zero, the coefficients of variation are large and unusable; and (2) for a binomial variable, the element variance is the same $P(1 - P)$ for both $P$ and $1 - P$, but the coefficients of variation differ, depending on the arbitrary decision of which side of the binomial is regarded as $P$ and which as $Q$. That is, the element relative variance is $C_v^2 = P(1 - P)/P^2 = (1 - P)/P$, which equals 1 for $P = 0.5$. It increases rapidly for small values of $P$.
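The behaviour of these relative measures can be illustrated with a short sketch. It assumes the element coefficient of variation is estimated as s/ȳ, the SRS coefficient of variation of the mean as sqrt((1 − f)/n) · s/ȳ, and the binomial element relative variance as (1 − P)/P. All function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cv_element(sample):
    """Element coefficient of variation: standard deviation over mean."""
    return stdev(sample) / mean(sample)

def cv_mean_srs(sample, N):
    """Coefficient of variation of the SRS mean, including the fpc."""
    n = len(sample)
    f = n / N
    return sqrt((1 - f) / n) * cv_element(sample)

def binomial_relvariance(p):
    """Element relative variance P(1 - P) / P**2 = (1 - P) / P."""
    return (1 - p) / p

# The unit of measurement cancels: rescaling the data leaves cv unchanged.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
rescaled = [100 * v for v in sample]
cv_original = cv_element(sample)
cv_rescaled = cv_element(rescaled)

# The relative variance depends on which side of the binomial is called P.
relvar_half = binomial_relvariance(0.5)
relvar_small = binomial_relvariance(0.05)
relvar_large = binomial_relvariance(0.95)
print(cv_original, cv_rescaled, relvar_half, relvar_small, relvar_large)
```

The last three values show the asymmetry noted in reason (2): the same binomial split gives a relative variance of about 19 when the rarer side is labelled P and about 0.05 when the common side is.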

Mean of Linear Combinations Using Stratification Sampling
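In stratified sampling, the population of $N$ units is first divided into subpopulations of $N_1, N_2, \dots, N_L$ units respectively. These subpopulations are non-overlapping and together they comprise the whole of the population, so that $N_1 + N_2 + \dots + N_L = N$. Then, with $\bar{y}_h$ denoting the sample mean in stratum $h$, the stratified mean

$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h, \qquad W_h = \frac{N_h}{N},$$

is a linear function of the $\bar{y}_h$ with fixed weights $W_h$.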
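A minimal sketch of the stratified mean as a weighted linear combination follows. It assumes stratum weights W_h = N_h/N, an SRS drawn independently within each stratum, and the resulting variance Σ W_h² (1 − f_h) s_h²/n_h with f_h = n_h/N_h; the independence assumption is the one used in the variance section that follows. The function name, the stratum sizes and the sample values are hypothetical and do not reproduce the study's data.

```python
from statistics import mean, variance

def stratified_mean(strata, stratum_sizes):
    """Stratified mean and its estimated variance.

    strata        : list of samples, one list of values per stratum
    stratum_sizes : population sizes N_h, in the same order

    Assumes an independent SRS without replacement in each stratum.
    """
    N = sum(stratum_sizes)
    y_st = 0.0
    var_st = 0.0
    for sample, N_h in zip(strata, stratum_sizes):
        W_h = N_h / N                  # fixed stratum weight
        n_h = len(sample)
        f_h = n_h / N_h                # within-stratum sampling fraction
        y_st += W_h * mean(sample)
        var_st += W_h ** 2 * (1 - f_h) * variance(sample) / n_h
    return y_st, var_st

# Two hypothetical strata, e.g. male (N_1 = 30) and female (N_2 = 70).
male = [10.0, 12.0, 14.0]
female = [20.0, 22.0, 24.0, 26.0]
y_st, var_st = stratified_mean([male, female], [30, 70])
print(y_st, var_st)
```

Because the weights are fixed, only the within-stratum variances enter the result, which is the source of the gain when strata are internally homogeneous.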

Variances for Linear Combinations Using Stratification Sampling
We obtain variances for some more complicated linear combinations that we shall need later. For a linear combination of stratum means $\bar{y}_h$ with fixed weights $W_h$,

$$\mathrm{Var}\left(\sum_{h} W_h \bar{y}_h\right) = \sum_{h} W_h^2\,\mathrm{Var}(\bar{y}_h) + 2\sum_{h<k} W_h W_k\,\mathrm{Cov}(\bar{y}_h, \bar{y}_k). \tag{22}$$

The formulas for variances and covariances of linear combinations were developed for population values, written with capital letters as Var and Cov. But they apply also to their sample estimates, which we write with lower case letters as var and cov. Summation of the estimated variances and covariances for sample totals within strata is simple and frequently needed. Therefore, we employ the brief notation $dy_h^2$, $dx_h^2$ and $dy_h\,dx_h$. When the $y_h$ and $x_h$ represent two variates for selections that are independent between strata, the covariances between strata vanish and we have

$$\mathrm{var}\left(\sum_h y_h\right) = \sum_h dy_h^2, \qquad \mathrm{var}\left(\sum_h x_h\right) = \sum_h dx_h^2, \qquad \mathrm{cov}\left(\sum_h y_h, \sum_h x_h\right) = \sum_h dy_h\,dx_h.$$

Here, the estimation of the mean monthly allowance of students in the Department of Mathematics and Statistics, Federal University of Technology, Minna is considered. The data gathered are stratified into two strata: male (stratum 1) and female (stratum 2). We present the summary of results for simple random sampling and stratification sampling in Tables 1 and 2 respectively:

DISCUSSION OF RESULTS
From Tables 1 and 2, the variance and the corresponding standard error under stratification sampling are smaller than those under simple random sampling. For simple random sampling, the means of x and y are ₦19,750.00 and ₦18,170.00, with standard errors of 6,357.58 and 5,848.97 respectively. That is, the average monthly allowance for a student is ₦19,750.00, while the average monthly expenditure for a student is ₦18,170.00. For stratification sampling, the means of x and y are ₦12,263.16 and ₦11,282.11, with standard errors of 1,282.16 and 1,179.59 respectively. That is, the average monthly allowance for a student is ₦12,263.16, while the average monthly expenditure for a student is ₦11,282.11.

CONCLUSIONS AND RECOMMENDATIONS
From the findings above, the estimation procedure using stratification sampling performs better than the linear combination under simple random sampling. The better sampling approach should be adopted whenever computation involving such variables of interest is needed. The study shows that, of the two schemes compared, estimation from the stratification sampling scheme gives the smaller variance and standard error of the mean. Hence, estimation using the stratification sampling scheme is recommended.