Summary
The jackknife method is often used for variance estimation in sample surveys but has only been developed for a limited class of sampling designs. We propose a jackknife variance estimator which is defined for any without-replacement unequal probability sampling design. We demonstrate design consistency of this estimator for a broad class of point estimators. A Monte Carlo study shows how the proposed estimator may improve on existing estimators.
Keywords: Inclusion probabilities, Linearization, Pseudovalues, Smooth function of means, Stratification
1. Introduction
Jackknife methods are widely used for standard error estimation in sample surveys (e.g. Wolter (1985) and Shao and Tu (1995)). Tukey's (1958) original idea of jackknife variance estimation has been developed to handle stratified multistage sampling by Lee (1973), Jones (1974), Kish and Frankel (1974) and Krewski and Rao (1981), among others, and the properties of various forms of the jackknife estimator for this case have been studied both theoretically and empirically (e.g. Krewski and Rao (1981), Rao and Wu (1985), Kovar et al. (1988), Rao et al. (1992) and Shao and Tu (1995)). The restriction of the jackknife method to stratified multistage designs constrains its applicability compared, for example, with linearization estimators, which have been defined for any unequal probability sampling design without replacement (Särndal et al. (1992), section 5.5). In this paper we address this constraint by proposing, in Section 3, a jackknife variance estimator that is applicable to the same general class of sampling designs.
Our approach is based on the analogy between the jackknife and linearization methods, in which the analytic derivative in linearization is replaced by a numerical approximation (Davison and Hinkley (1997), page 50). The estimator that is proposed is a jackknife analogue of a standard linearization variance estimator for unequal probability designs. The same estimator was effectively also proposed by Campbell (1980) in an impressively general paper, which seems unfortunately to have received little attention in the subsequent survey sampling literature. This paper goes beyond Campbell (1980) by investigating the properties of this estimator both theoretically and numerically.
The class of point estimators, for which the variance estimator proposed is defined, is set out in Section 2. We demonstrate in Section 4 that the estimator is consistent for the same asymptotic variance as the linearization estimator. We support this result with a small simulation study in Section 6 comparing the sampling properties of our estimator with three existing jackknife variance estimators that are described in Section 5.
2. The class of point estimators
Before considering variance estimation, it is necessary to define the point estimator, the variance of which is to be estimated. We consider a finite population 𝒰={1,…,i,…,N} containing N units and suppose that values yqi, q=1,…,Q, for Q survey variables are associated with the unit that is labelled i. We assume that a sample 𝒮⊂𝒰 is selected according to a probability sampling design and that there is no non-response.
We motivate the class of point estimators by first defining a class of population parameters θ of interest. We assume that this parameter can be expressed as a function of means, θ=g(μ1,…,μQ), where g(·) is a smooth function (see Appendix A) from ℝQ to ℝ and μq is the finite population mean, μq=N−1Σi ∈ 𝒰yqi. This definition of θ includes most parameters of interest arising in common survey applications, such as ratios, subpopulation means and correlation and regression coefficients. We assume that θ is a scalar for simplicity although the approach could be generalized to multivariate θ.
We now define the point estimator as the substitution estimator θ̂=g(μ̂1,…,μ̂Q), where

μ̂q = Σi ∈ 𝒮 wi yqi

is the Hájek (1981) ratio estimator of μq, the weight wi is given by

wi = (N̂πi)−1, (1)

N̂ = Σi ∈ 𝒮 πi−1 is an unbiased estimator of N and πi denotes the first-order inclusion probability of unit i. Many parameters of interest in surveys, e.g. ratios and correlation coefficients, are invariant to multiplication of each μq in g(μ1,…,μQ) by a common constant; in such cases the specification of N̂ in equation (1) is arbitrary and θ̂ could be viewed alternatively as a function of estimated totals.
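The Hájek ratio estimator and the substitution estimator of Section 2 can be sketched in a few lines of Python. This is an illustrative implementation under the notation above; the function names are our own, not from the paper.

```python
# Sketch of the Hajek ratio estimator mu_hat_q = sum_{i in S} w_i y_qi,
# with weights w_i = 1/(N_hat pi_i) and N_hat = sum_{i in S} 1/pi_i
# (equation (1)).  Function names are illustrative.

def hajek_weights(pi):
    """Weights of equation (1); they sum to 1 over the sample."""
    n_hat = sum(1.0 / p for p in pi)   # unbiased estimator of N
    return [1.0 / (n_hat * p) for p in pi]

def hajek_mean(y, pi):
    """Hajek ratio estimator of a single population mean mu_q."""
    return sum(w * yi for w, yi in zip(hajek_weights(pi), y))

def substitution_estimator(g, ys, pi):
    """theta_hat = g(mu_hat_1, ..., mu_hat_Q) for Q parallel y-lists."""
    return g(*[hajek_mean(y, pi) for y in ys])

# Example: a ratio of two means, a typical smooth function g.
theta_hat = substitution_estimator(lambda m1, m2: m1 / m2,
                                   [[2.0, 4.0], [1.0, 1.0]],
                                   [0.25, 0.5])
```

Note that the weights are self-normalizing, so the Hájek mean reduces to the ordinary sample mean when all the πi are equal.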
3. The proposed jackknife variance estimator
We adopt a design-based approach and consider the estimation of the variance of θ̂ with respect to the sampling design. We propose to estimate this variance by

v̂(θ̂) = Σi ∈ 𝒮 Σj ∈ 𝒮 {(πij−πiπj)/πij} ε(i)ε(j), (2)

where πij denotes the probability that both units i and j are selected (with the convention πii=πi),

ε(j) = (1−wj){θ̂−θ̂(j)}, (3)

θ̂(j)=g{μ̂1(j),…,μ̂Q(j)}, μ̂q(j)=Σi ∈ 𝒮−j wi(j)yqi with wi(j)={N̂(j)πi}−1 and N̂(j)=Σi ∈ 𝒮−j πi−1, 𝒮−j consists of 𝒮 with the jth unit deleted and n is the size of the sample 𝒮.
The estimator in equation (2) takes the form of the variance estimator of Horvitz and Thompson (1952) for the sample sum of empirical influence values (Davison and Hinkley (1997), chapter 2), where these empirical influence values are numerically approximated by the jackknife pseudovalues. This is analogous to the linearization variance estimator (Särndal et al. (1992), page 175) which takes the same form but with the empirical influence values obtained by analytic differentiation. This perspective was first set out by Campbell (1980), who noted how both these estimators could be constructed but did not evaluate their properties in detail.
The factor 1−wi is a correction for unequal πi, reducing the contribution of observations which have higher πi-values and thus make smaller contributions to the variance. The inclusion of this factor ensures that equation (2) reduces to the usual linearization variance estimator (Särndal et al. (1992), page 182) when θ̂ is the Hájek estimator μ̂1, say, in which case ε(i) reduces to wi(y1i−μ̂1). The (1−wi)-correction was suggested by Campbell (1980), who noted an algebraic equivalence with the weighted jackknife method of Hinkley (1977).
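The construction above can be sketched directly: compute the delete-one pseudovalues with the (1−wj) correction, then apply the Horvitz–Thompson variance form to them. This is a minimal sketch assuming the reconstructed form of equations (2) and (3); all names are illustrative.

```python
# Sketch of the proposed jackknife variance estimator:
#   v = sum_i sum_j {(pi_ij - pi_i pi_j)/pi_ij} eps_(i) eps_(j),
# with pseudovalues eps_(j) = (1 - w_j)(theta_hat - theta_hat_(j)).

def hajek_mean(y, pi):
    """Hajek ratio estimator of the population mean (Section 2)."""
    n_hat = sum(1.0 / p for p in pi)
    return sum(yi / (p * n_hat) for yi, p in zip(y, pi))

def jackknife_variance(theta, y, pi, pi2):
    """theta(y, pi): point estimator; pi2[i][j]: joint inclusion
    probabilities, with the convention pi2[i][i] = pi[i]."""
    n = len(y)
    n_hat = sum(1.0 / p for p in pi)
    w = [1.0 / (n_hat * p) for p in pi]          # weights of equation (1)
    full = theta(y, pi)
    # delete-one pseudovalues with the (1 - w_j) correction
    eps = [(1.0 - w[j]) * (full - theta(y[:j] + y[j+1:], pi[:j] + pi[j+1:]))
           for j in range(n)]
    # Horvitz-Thompson variance form applied to the pseudovalues
    return sum((pi2[i][j] - pi[i] * pi[j]) / pi2[i][j] * eps[i] * eps[j]
               for i in range(n) for j in range(n))

# Example: simple random sampling without replacement, n = 4 from N = 10,
# so pi_i = 0.4 and pi_ij = 4*3/(10*9) for i != j.
pi = [0.4] * 4
pi2 = [[0.4 if i == j else 4 * 3 / (10 * 9) for j in range(4)]
       for i in range(4)]
v = jackknife_variance(hajek_mean, [1.0, 2.0, 3.0, 4.0], pi, pi2)
```

The double loop is O(n²) in the sample size, on top of the n re-evaluations of θ̂; for large samples the pseudovalues would be computed once and cached, as here.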
4. Consistency
In this section we consider the design consistency of the variance estimator proposed. Building on the analogy between linearization and jackknife variance estimation, we follow the approach of Särndal et al. (1992), who treated the linearization variance estimator under an unequal probability design as an estimator of an approximate linearized variance and then referred to other evidence that this approximate variance agrees well with the actual variance in large samples (Särndal et al. (1992), page 175). The approximate linearized variance (Robinson and Särndal, 1983) varL(θ̂) in our case (using expressions (5.5.10) and (5.7.4) in Särndal et al. (1992)) is given by

varL(θ̂) = Σi ∈ 𝒰 Σj ∈ 𝒰 (πij−πiπj)(zi/πi)(zj/πj), (4)

where

zi = N−1 ∇(μ)T(yi−μ),

yi=(y1i,…,yQi)T, ∇(x) denotes the gradient of g(·) at x ∈ ℝQ and it is assumed that g(·) is continuous and differentiable at μ=(μ1,…,μQ)T.
To demonstrate the consistency of the proposed variance estimator for the approximate linearized variance, we first define our asymptotic framework. Let {𝒮t} be a sequence of samples selected from the sequence of nested finite populations {𝒰t} of sizes Nt by a sequence of sampling designs, such that 𝒮t is composed of a fixed number nt of distinct elements selected from 𝒰t (nt<Nt) for t=1,2,…. For simplicity of notation, the index t will be suppressed in what follows and all limiting processes will be understood to be as t→∞. We shall denote by →p and →d respectively convergence in probability and convergence in distribution as t→∞.
Theorem 1
Provided that the linearization variance estimator (11) is design consistent and under regularity assumptions that are given in Appendix A, the proposed variance estimator (2) is also design consistent, i.e.

v̂(θ̂)/varL(θ̂) →p 1. (5)
The proof of theorem 1 is given in Appendix A.
It follows as a corollary of theorem 1 that if

{θ̂−θ}/√varL(θ̂) →d N(0,1), (6)

i.e. if appropriate conditions hold for the linearization variance estimator to generate asymptotically valid confidence intervals, then by Slutsky's lemma

{θ̂−θ}/√v̂(θ̂) →d N(0,1).

Confidence intervals based on v̂(θ̂) will then be asymptotically valid.
The key requirement for condition (6) to hold is that the Horvitz–Thompson estimators underlying the definition of θ̂ are asymptotically normal. Sufficient conditions for asymptotic normality have been investigated to a limited extent in the survey sampling literature, but some examples of conditions are given by Hájek (1964) and Rosén (1972).
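The corollary licenses the usual Wald construction: a nominal (1−α) interval is θ̂ ± z(α/2)√v̂(θ̂). A minimal sketch, with the critical value z passed in (1.96 for a nominal 95% interval):

```python
# Asymptotically valid Wald interval implied by condition (6) plus
# Slutsky's lemma: theta_hat +/- z * sqrt(v_hat).
import math

def wald_interval(theta_hat, v_hat, z=1.96):
    half = z * math.sqrt(v_hat)
    return theta_hat - half, theta_hat + half
```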
5. Alternative jackknife variance estimators
For comparison with the variance estimator proposed, we now consider some alternative jackknife estimators that have been proposed in the literature. The standard jackknife variance estimator of θ̂ (Tukey, 1958) is defined by

{(n−1)/n} Σj ∈ 𝒮 {θ̂(j)−θ̃}2, (7)

where θ̃ = n−1 Σj ∈ 𝒮 θ̂(j). If we ignore the finite population correction and if we assume that the sample is selected by simple random sampling without replacement, equation (2) reduces to equation (7). The variance estimator in equation (7) has been shown to be consistent for independent and identically distributed observations (e.g. Shao (1989, 1993) and Shao and Tu (1995)).
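The standard delete-one jackknife of equation (7) is short enough to sketch in full; for θ̂ the sample mean it reproduces the classical estimate s²/n, which makes a convenient check.

```python
# Sketch of the standard delete-one jackknife (equation (7)):
#   v_JK = (n-1)/n * sum_j (theta_(j) - theta_tilde)^2,
# where theta_tilde is the average of the n leave-one-out estimates.

def tukey_jackknife(theta, y):
    n = len(y)
    loo = [theta(y[:j] + y[j+1:]) for j in range(n)]  # theta_(j)
    tilde = sum(loo) / n                              # theta_tilde
    return (n - 1) / n * sum((t - tilde) ** 2 for t in loo)

# For theta = sample mean this equals s^2/n, with s^2 the unbiased
# sample variance.
v = tukey_jackknife(lambda v_: sum(v_) / len(v_), [1.0, 2.0, 3.0, 4.0])
```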
For the case of stratified simple random sampling without replacement, Lee (1973) (see also Kish and Frankel (1974)) proposed the variance estimator

Σh {(nh−1)/nh} Σj ∈ 𝒮h {θ̂(hj)−θ̃h}2, (8)

where 𝒮h is the sample of size nh in the hth stratum 𝒰h and θ̂(hj) is the point estimator computed with unit j of stratum h deleted. For comparison, equation (2) reduces under this design to

Σh (1−fh)(1−wh)2 {nh/(nh−1)} Σj ∈ 𝒮h {θ̂(hj)−θ̃h}2, (9)

where θ̃h = nh−1 Σj ∈ 𝒮h θ̂(hj), fh=nh/Nh and wh is the common value of the wi in stratum h. Ignoring the finite population correction, equation (9) is the jackknife estimator that was proposed by Jones (1974). Thus, when the nh are large and the finite population correction is negligible, equation (8) is close to equation (9). It is worth noting that equation (9) naturally includes a finite population correction which is absent in equation (8).
Rao et al. (1992) described a customary ‘delete cluster’ jackknife variance estimator for a general weighted point estimator in stratified multistage designs. For the case when the clusters are single units and the weights are the Horvitz–Thompson weights πi−1, their estimator reduces to

Σh {(nh−1)/nh} Σi ∈ 𝒮h {θ̂(hi)−θ̂}2, (10)

where θ̂(hi) is computed by omitting unit i ∈ 𝒮h and by modifying the weights so that πj−1 is replaced by {nh/(nh−1)}πj−1 for all j ∈ 𝒮h and the weight πj−1 stays unaltered for all other j.
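The reweighting step is the essential part of this estimator: deleting a unit inflates the remaining Horvitz–Thompson weights in its stratum by nh/(nh−1). The sketch below assumes the common delete-one form with squared deviations from the full-sample estimate; names are illustrative.

```python
# Sketch of a delete-one jackknife with stratum reweighting, assuming
# the form v = sum_h (n_h-1)/n_h * sum_{i in S_h} (theta_(hi) - theta)^2.

def weighted_mean(y, w):
    """A simple weighted point estimator used for illustration."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def delete_one_jackknife(theta, y, pi, strata):
    """theta(y, w): weighted estimator; strata[i]: stratum label of i."""
    base_w = [1.0 / p for p in pi]          # Horvitz-Thompson weights
    full = theta(y, base_w)
    v = 0.0
    for h in sorted(set(strata)):
        idx_h = [i for i, s in enumerate(strata) if s == h]
        n_h = len(idx_h)
        ssq = 0.0
        for i in idx_h:
            y_i, w_i = [], []
            for j in range(len(y)):
                if j == i:
                    continue                 # unit i is deleted
                # inflate same-stratum weights by n_h/(n_h - 1)
                scale = n_h / (n_h - 1) if strata[j] == h else 1.0
                y_i.append(y[j])
                w_i.append(scale * base_w[j])
            ssq += (theta(y_i, w_i) - full) ** 2
        v += (n_h - 1) / n_h * ssq
    return v
```

Each stratum needs nh ≥ 2 for the reweighting factor to be defined, mirroring the "at least two units per stratum" requirement of replication methods.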
6. Monte Carlo study
In this section, the proposed variance estimator (2) is compared numerically with the alternative jackknife estimators (7), (8) and (10). We use a population frame given in Valliant et al. (2000), appendix B, and available at the John Wiley World Wide Web site ftp://ftp.wiley.com/public/sci_tech_med/finite_populations. This population frame is extracted from the September 1976 Current Population Survey in the USA. We duplicate this population frame five times to create an artificial population of N=2390 individuals from which samples will be selected. This population is stratified into H=3 strata. The variables that are of interest are the number of hours worked per week (y1i) and the weekly wages (y2i). The population parameter that is considered is the finite population correlation coefficient between these two variables,

ρ = σ12/(σ1σ2),

where σ12=Σi ∈ 𝒰(y1i−μ1)(y2i−μ2) and σk2=Σi ∈ 𝒰(yki−μk)2 (k=1,2). The population value is ρ=0.49. We propose to estimate ρ by the substitution estimator

ρ̂ = σ̂12/(σ̂1σ̂2),

where σ̂12=Σi ∈ 𝒮 wi(y1i−μ̂1)(y2i−μ̂2) and σ̂k2=Σi ∈ 𝒮 wi(yki−μ̂k)2 (k=1,2).
We consider a stratified sampling design with proportional allocation with at least two units selected per stratum, using the Chao (1982) sampling design for selection within each stratum. The πi are proportional to a skewed size variable correlated with the y2i, with a correlation coefficient of 0.83. The size variable has a coefficient of variation of 1.22, a Fisher coefficient of skewness of 3.13 and a kurtosis of 14.7. The πij are computed exactly by using an expression given by Chao (1982).
For each simulation, 10 000 samples were selected to compute the empirical relative bias

RB(v̂) = bias(v̂)/var(θ̂),

where bias(v̂) is the difference between the empirical mean of v̂ over the 10 000 samples and var(θ̂), and the empirical relative root-mean-square error

RRMSE(v̂) = [Ê{v̂−var(θ̂)}2]1/2/var(θ̂)

of equations (2), (7), (8) and (10), where Ê denotes the empirical mean. The variance var(θ̂) is the empirical variance of the 10 000 observed values of θ̂.
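The two Monte Carlo summaries are straightforward to compute; a minimal sketch, with V standing for the empirical variance of the point estimates:

```python
# Empirical relative bias RB = {mean(v_hat) - V} / V and relative
# root-mean-square error RRMSE = sqrt(mean((v_hat - V)^2)) / V,
# where V is the empirical variance of theta_hat over the replicates.
import math

def relative_bias(v_hats, V):
    return (sum(v_hats) / len(v_hats) - V) / V

def rrmse(v_hats, V):
    return math.sqrt(sum((v - V) ** 2 for v in v_hats) / len(v_hats)) / V
```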
The relative bias for the various estimators is given in Table 1 for several sampling fractions f=n/N. The second column gives the relative bias of θ̂, RB(θ̂). Estimators (7), (8) and (10) seriously overestimate the variance. For all the sampling fractions that were considered, the proposed estimator (2) has negligible bias. Table 2 gives the RRMSE for equations (2), (7), (8) and (10). We see that the proposed estimator (2) has the smallest RRMSE for almost every value of f.
Table 1. Relative bias (%) with and without finite population correction (FPC)

| f | RB(θ̂) | Equation (2) | Equation (7) | Equation (7) with FPC | Equation (8) | Equation (8) with FPC | Equation (10) | Equation (10) with FPC |
|---|---|---|---|---|---|---|---|---|
| 0.03 | −6.16 | 1.18 | 20.18 | 16.56 | 16.86 | 13.34 | 18.66 | 15.09 |
| 0.05 | −4.30 | −1.08 | 12.85 | 7.23 | 11.05 | 5.52 | 12.22 | 6.63 |
| 0.07 | −2.76 | −2.34 | 9.33 | 1.65 | 8.12 | 0.52 | 8.99 | 1.33 |
| 0.10 | −2.08 | 0.43 | 11.39 | 0.25 | 10.53 | −0.52 | 11.20 | 0.08 |
| 0.12 | −1.93 | −0.01 | 10.58 | −2.69 | 9.88 | −3.31 | 10.46 | −2.81 |
| 0.15 | −1.30 | 1.70 | 12.69 | −4.24 | 12.11 | −4.73 | 12.60 | −4.31 |
| 0.20 | −0.88 | 0.77 | 12.96 | −9.63 | 12.53 | −9.98 | 12.91 | −9.67 |
| 0.40 | −0.45 | −1.16 | 22.68 | −26.39 | 22.44 | −26.53 | 22.66 | −26.40 |
Table 2. Relative root-mean-square error (%) with and without finite population correction (FPC)

| f | Equation (2) | Equation (7) | Equation (7) with FPC | Equation (8) | Equation (8) with FPC | Equation (10) | Equation (10) with FPC |
|---|---|---|---|---|---|---|---|
| 0.03 | 91.13 | 126.78 | 122.52 | 123.71 | 119.61 | 124.46 | 120.29 |
| 0.05 | 74.95 | 97.67 | 92.28 | 96.44 | 91.20 | 96.86 | 91.55 |
| 0.07 | 66.67 | 81.56 | 75.34 | 80.84 | 74.78 | 81.10 | 74.95 |
| 0.10 | 59.25 | 71.24 | 63.29 | 70.74 | 62.96 | 71.00 | 63.10 |
| 0.12 | 55.35 | 64.88 | 56.39 | 64.50 | 56.18 | 64.74 | 56.29 |
| 0.15 | 50.08 | 58.15 | 48.41 | 57.83 | 48.29 | 58.03 | 48.33 |
| 0.20 | 43.24 | 50.36 | 40.11 | 50.13 | 40.09 | 50.30 | 40.07 |
| 0.40 | 28.67 | 40.17 | 33.05 | 40.00 | 33.15 | 40.14 | 33.05 |
To see whether the difference between the bias of equations (2), (7), (8) and (10) is due to the finite population correction, we have multiplied the variance estimators (7), (8) and (10) by 1−f. The RB- and the RRMSE-values are given in the columns that are headed by ‘with FPC’ in Tables 1 and 2. We see that, for large sampling fractions, this correction tends to lead to underestimation of the variance. For small sampling fractions, the finite population correction cannot eliminate the large positive bias. This may be caused by the skewness of the πi and the small sample size.
7. Discussion
The jackknife variance estimator that is proposed in equation (2) is applicable to general unequal probability designs and is design consistent in circumstances where the linearization variance estimator is consistent. A Monte Carlo study shows that the estimator proposed can demonstrate clear improvements compared with existing jackknife estimators. It naturally includes a finite population correction, which is usually absent in the standard jackknife methods, and may be of particular use for surveys with large sampling fractions.
The jackknife method proposed may be extended in various ways. Point estimators, such as calibration estimators (e.g. Deville and Särndal (1992)), which employ auxiliary population information may often be expressible as functions of means if the function g(·) may be specified in terms of this auxiliary finite population information. The method may in principle be extended to other point estimators which may be expressed as differentiable functionals (Hampel, 1974; Campbell, 1980), although it is well known that the consistency result will not extend to all non-smooth functions of means, such as quantiles.
The practical advantage of the method proposed is its breadth of applicability. A potential disadvantage is that it is constructed by deleting one sample element at a time in contrast with the usual deletion of clusters and this may lead to a major increase in computation. Furthermore, the method assumes that joint inclusion probabilities πij for sample units are available. If not, then various approximations to these joint inclusion probabilities may be used (e.g. Hájek (1964) and Berger (1998)). Multistage sampling with unequal probability sampling without replacement at each stage merits particular further research. The application of the method proposed when the first- and second-order inclusion probabilities are available for each stage of sampling and the potential use of equation (2) at each stage could be considered and compared with standard jackknife methods which delete primary sampling units.
Acknowledgements
The authors are grateful to J. N. K. Rao (Carleton University, Canada) and to two referees for helpful comments.
References
1. Berger, Y. G. (1998) Rate of convergence to asymptotic variance for the Horvitz–Thompson estimator. J. Statist. Planng Inf., 74, 149–168.
2. Campbell, C. (1980) A different view of finite population estimation. Proc. Surv. Res. Meth. Sect. Am. Statist. Ass., 319–324.
3. Chao, M. T. (1982) A general purpose unequal probability sampling plan. Biometrika, 69, 653–656.
4. Davison, A. C. and Hinkley, D. V. (1997) Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
5. Deville, J. C. and Särndal, C. E. (1992) Calibration estimators in survey sampling. J. Am. Statist. Ass., 87, 376–382.
6. Hájek, J. (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Statist., 35, 1491–1523.
7. Hájek, J. (1981) Sampling from a Finite Population. New York: Dekker.
8. Hampel, F. R. (1974) The influence curve and its role in robust estimation. J. Am. Statist. Ass., 69, 383–393.
9. Harville, D. A. (1997) Matrix Algebra from a Statistician's Perspective. New York: Springer.
10. Hinkley, D. V. (1977) Jackknife in unbalanced situations. Technometrics, 19, 285–292.
11. Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement from a finite universe. J. Am. Statist. Ass., 47, 663–685.
12. Isaki, C. T. and Fuller, W. A. (1982) Survey design under the regression superpopulation model. J. Am. Statist. Ass., 77, 89–96.
13. Jones, H. L. (1974) Jackknife estimation of functions of stratum means. Biometrika, 61, 343–348.
14. Kish, L. and Frankel, M. R. (1974) Inference from complex samples (with discussion). J. R. Statist. Soc. B, 36, 1–37.
15. Kovar, J. G., Rao, J. N. K. and Wu, C. F. J. (1988) Bootstrap and other methods to measure errors in survey estimates. Can. J. Statist., 16, 25–45.
16. Krewski, D. and Rao, J. N. K. (1981) Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods. Ann. Statist., 9, 1010–1019.
17. Lee, K. (1973) Variance estimation in stratified sampling. J. Am. Statist. Ass., 68, 336–342.
18. Rao, J. N. K. and Wu, C. F. J. (1985) Inference from stratified samples: second-order analysis of three methods for nonlinear statistics. J. Am. Statist. Ass., 80, 620–630.
19. Rao, J. N. K., Wu, C. F. J. and Yue, K. (1992) Some recent work on resampling methods for complex surveys. Surv. Methodol., 18, 209–217.
20. Robinson, P. M. and Särndal, C. E. (1983) Asymptotic properties of the generalized regression estimator in probability sampling. Sankhya B, 45, 240–248.
21. Rosén, B. (1972) Asymptotic theory for successive sampling with varying probabilities without replacement, I. Ann. Math. Statist., 43, 373–397.
22. Särndal, C. E., Swensson, B. and Wretman, J. H. (1992) Model Assisted Survey Sampling. New York: Springer.
23. Shao, J. (1989) The efficiency and consistency of approximation to the jackknife variance estimator. J. Am. Statist. Ass., 84, 114–119.
24. Shao, J. (1993) Differentiability of statistical functionals and consistency of the jackknife. Ann. Statist., 21, 61–75.
25. Shao, J. and Tu, D. (1995) The Jackknife and Bootstrap. New York: Springer.
26. Tukey, J. W. (1958) Bias and confidence in not-quite large samples (abstract). Ann. Math. Statist., 29, 614.
27. Valliant, R., Dorfman, A. H. and Royall, R. M. (2000) Finite Population Sampling and Inference: a Prediction Approach. New York: Wiley.
28. Wolter, K. M. (1985) Introduction to Variance Estimation. New York: Springer.
29. Yates, F. and Grundy, P. M. (1953) Selection without replacement from within strata with probability proportional to size. J. R. Statist. Soc. B, 15, 253–261.
Appendix A: Assumptions and proof of theorem 1
The following assumptions will be made.
- (a)
v̂L(θ̂)/varL(θ̂) →p 1, where v̂L(θ̂) is the linearization variance estimator that is given by

v̂L(θ̂) = Σi ∈ 𝒮 Σj ∈ 𝒮 {(πij−πiπj)/πij}(ẑi/πi)(ẑj/πj), (11)

where

ẑi = N̂−1 ∇(μ̂)T(yi−μ̂),

with

μ̂ = (μ̂1,…,μ̂Q)T.
- (b)
|1−wi| ⩾ α > 0 for all i ∈ 𝒰, where α is a constant (free of t).
- (c)

{n varL(θ̂)}−1 = O(1).
- (d)
, for all τ2, where ║·║ denotes the Euclidean norm defined by ║A║=tr(ATA)1/2.
- (e)
, where
(12)
- (f)
, where
(13)
- (g)
∇(x) is Lipschitz continuous of order δ>0 (e.g. Shao and Tu (1995), page 43) in the sense that

║∇(x1)−∇(x2)║ ⩽ λ║x1−x2║δ

for a constant λ>0, where x1 and x2 are in the neighbourhood of μ.
- (h)
.
Assumption (a) states that the linearization variance estimator is consistent. An example of sufficient conditions for this assumption to hold can be found in Krewski and Rao (1981). Assumption (b) ensures that none of the weights (1) can approach 1, which would represent a degenerate design. Assumption (c) holds in the standard circumstances where the linearized variance decreases with rate n−1 (Shao and Tu (1995), page 260). It holds when varL(θ̂) ⩾ ν n−1, where ν is a positive constant. This inequality is similar to the Cramér–Rao lower bound. Assumption (d) is an assumption about the behaviour of the weights and the existence of moments of the yi, which would hold, for example, if the nwi and the yi were bounded. Assumptions (e) and (f) are mild assumptions on the design, similar to ones in Isaki and Fuller (1982). For example, with simple random sampling without replacement, Gs = 1−n/N = Op(1) and Hs=0. Moreover, if the condition of Yates and Grundy (1953) holds, Dij<0 for all i and j, implying that Hs=0. Assumptions (g) and (h) are smoothness requirements of the function g(·).
A.1. Proof of theorem 1
From the mean value theorem, we have
where ξi is a point between μ̂ and μ̂(i) and the remainder is given by
Thus,
where
(14)
It can be shown that
(15)
implying that
(16)
Thus, by substituting equation (16) into equation (2), we obtain
with
(17)
(18)
Hence, theorem 1 follows if we may show
(19)
(20)
(21)
Assumption (a) implies expression (19). It is therefore only necessary to show expressions (20) and (21). We start by showing expression (20). From equation (17),
Furthermore, by definition of and in expressions (12) and (13), we have
(22)
where
and
By the Cauchy inequality,
Now, as
with , we have
(23)
where
(24)
Moreover,
(25)
(26)
Thus, assumption (e) and inequality (23) imply that , if and . The Cauchy inequality (e.g. Harville (1997), page 62) further implies that
Combining this last inequality with equation (15), we obtain
(27)
Assumption (g) implies that there are constants λ>0 and δ>0 such that
(28)
As ξi is a point between μ̂ and μ̂(i), we have ║ξi−μ̂║ ⩽ ║μ̂(i)−μ̂║. Combining this last inequality with equation (15), we obtain
which combined with inequality (28) gives
Now, using assumption (b), we have
(29)
Thus, inequalities (27) and (29) imply that
(30)
First, we show that . Combining inequalities (25) and (30), we obtain
(31)
Assumption (c) implies that
(32)
Now assumption (d) and expressions (31) and (32) imply that , i.e.
(33)
Secondly, we show that . Combining inequalities (26) and (30), we obtain
(34)
Now assumption (d) and expressions (34) and (32) imply that , i.e.
(35)
Thirdly, assumption (e) and expressions (23), (33) and (35) imply that
(36)
Now, we show that . We have by the Cauchy inequality
Thus, assumption (f) and expressions (33) and (35) imply that
(37)
Consequently, expression (20) follows from expressions (36) and (37). To complete the proof we need to show expression (21). By the triangle inequality, equation (18) implies that
with , where
and
By the Cauchy inequality, and , with
(38)
Thus, expression (21) follows from assumptions (e) and (f), if we can show that . The Cauchy inequality implies that . By substituting the last inequality and inequality (30) into equation (38), we obtain
(39)
Now, from assumptions (c) and (d) and expressions (32) and (39), we have
which implies expression (21), completing the proof.
© 2005 Royal Statistical Society