Article

Confidence Regions for Multivariate Quantiles

by Maximilian Coblenz 1,*, Rainer Dyckerhoff 2 and Oliver Grothe 1

1 Karlsruhe Institute of Technology (KIT), Institute of Operations Research, 76131 Karlsruhe, Germany
2 Institute of Econometrics and Statistics, University of Cologne, 50923 Cologne, Germany
* Author to whom correspondence should be addressed.
Water 2018, 10(8), 996; https://doi.org/10.3390/w10080996
Submission received: 26 June 2018 / Revised: 20 July 2018 / Accepted: 25 July 2018 / Published: 27 July 2018

Abstract

Multivariate quantiles are of increasing importance in applications of hydrology. This calls for reliable methods to evaluate the precision of the estimated quantile sets. Therefore, we focus on two recently developed approaches to estimate confidence regions for level sets and extend them to provide confidence regions for multivariate quantiles based on copulas. In a simulation study, we check coverage probabilities of the employed approaches. In particular, we focus on small sample sizes. One approach shows reasonable coverage probabilities and the second one obtains mixed results. Not only the bounded copula domain but also the additional estimation of the quantile level pose some problems. A small sample application gives further insight into the employed techniques.

1. Introduction

The track record of multivariate quantiles in hydrology is long and started with the papers [1,2,3]. A growing body of application-focused literature on this topic quickly arose (see, e.g., [4,5,6]). A thorough overview of the current state of the art can be found in [7]. The notion of multivariate quantile we use in this paper is based on copulas. It has the nice feature that a $100\% \cdot p$ multivariate quantile separates the copula domain into two sets, one comprising $p$ and the other comprising $1 - p$ of the total probability mass. Some theoretical aspects can be found in, e.g., [8,9,10].
Not only is the estimation of multivariate quantiles important, but so is an assessment of the estimation uncertainty. Confidence regions are an essential tool for this. In contrast to pointwise confidence bands, confidence regions provide a holistic precision analysis of multivariate quantiles. For example, Refs. [11,12] construct confidence regions for multivariate quantiles based on highest density regions [13]. However, in principle, any approach for constructing confidence regions of level sets is applicable since the multivariate quantiles considered are specific level sets.
We attempt to fill this research gap and contribute to the existing literature on multivariate quantiles in several ways. First, we extend two recently developed approaches for construction of level set confidence regions by Mammen and Polonik [14] and Chen et al. [15] to the estimation problem at hand. Note that the multivariate quantiles considered here are level sets at specific levels of the copula. However, in contrast to the cited works, where the levels are known and fixed in advance, the level of the multivariate quantile has to be estimated. Second, we check the coverage probabilities of the extended methods by a simulation study in order to investigate their reliability. Finally, we apply the methods on a small sample of flood data to gain further insights.
The paper is structured as follows: The next section introduces copulas and the notion of multivariate quantiles used here. The confidence region approaches by Mammen and Polonik [14] and Chen et al. [15] are discussed in Section 3. Moreover, they are extended to multivariate quantile estimation. In Section 4, a simulation study is conducted in order to explore the strengths and weaknesses of the considered methods. The paper is concluded by an application on a small sample of flood data and a discussion of some further aspects.

2. Copulas and Multivariate Quantiles

This section introduces both copulas and the notion of multivariate quantiles we use throughout the paper. Additionally, the notation and some preliminaries are covered. According to Sklar’s theorem [16], every distribution function $F$ of a continuous $d$-variate random variable $X$ can be decomposed into a copula $C$ and its univariate marginal distributions $F_1, \ldots, F_d$ by
$$F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)).$$
This allows for separating the marginal distributions and the overall dependence structure of $X$. The copula itself is a distribution function of the random variable $U = (U_1, \ldots, U_d) = (F_1(x_1), \ldots, F_d(x_d))$. Note that the univariate components of $U$ are uniformly distributed. A good theoretical introduction to copulas can be found, e.g., in [17,18,19]. An introduction for practical purposes can be found, e.g., in [8,20].
Let $X_1, \ldots, X_n$ be an i.i.d. sample of a random vector $X$ and let $x_{ij}$ be the $i$th component of the vector $X_j$. The copula $C$ of $X$ may be estimated based on the so called pseudo observations $\hat{U}_j = (\hat{u}_{1j}, \ldots, \hat{u}_{dj})$, $j = 1, \ldots, n$. These can be obtained either by estimation of the marginal distributions $\hat{F}_i(x)$, i.e.,
$$\hat{U}_j = \left(\hat{F}_1(x_{1j}), \ldots, \hat{F}_d(x_{dj})\right),$$
or by rank transformation of the data, i.e.,
$$\hat{U}_j = \frac{1}{n+1}\,\bigl(\text{vector of component-wise ranks of } X_j \text{ in } X_1, \ldots, X_n\bigr).$$
Note that estimation of the marginal distributions is prone to model misspecification [20]. Hence, a rank transformation is often preferable.
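As a minimal sketch of the rank transformation (not the accompanying MATLAB code), the pseudo observations can be computed as follows. The function name `pseudo_observations` and the toy data are assumptions made for illustration; ties are broken by order of appearance, which does not matter for continuous data.

```python
import numpy as np

def pseudo_observations(x):
    """Rank-transform an (n, d) data array to pseudo observations in (0, 1)^d."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort gives 0-based ranks within each column
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    # dividing by n + 1 keeps all values strictly inside the unit cube
    return ranks / (n + 1.0)

# toy data (hypothetical values): three observations of two variables
data = np.array([[2.3, 10.0], [0.7, 30.0], [5.1, 20.0]])
u_hat = pseudo_observations(data)
print(u_hat)
```

Dividing by n + 1 rather than n avoids pseudo observations on the boundary of the copula domain.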
Using the pseudo observations $\hat{U}_j$, $j = 1, \ldots, n$, the copula can be estimated in different fashions. The estimator $\hat{C}$ is called empirical copula and obtained by the empirical distribution of the pseudo observations
$$\hat{C}(u) = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{\hat{U}_j \le u\}. \tag{1}$$
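The empirical copula can be evaluated with a few lines of NumPy. This is only an illustrative sketch; the name `empirical_copula` and the toy pseudo observations are assumptions, and the inequality is understood component-wise.

```python
import numpy as np

def empirical_copula(u_eval, u_hat):
    """Evaluate the empirical copula at the rows of u_eval.

    u_hat is an (n, d) array of pseudo observations.
    """
    u_eval = np.atleast_2d(np.asarray(u_eval, dtype=float))
    u_hat = np.asarray(u_hat, dtype=float)
    # below[k, j] is True when observation j is component-wise <= point k
    below = np.all(u_hat[None, :, :] <= u_eval[:, None, :], axis=2)
    return below.mean(axis=1)

# toy pseudo observations (hypothetical values)
u_hat = np.array([[0.5, 0.25], [0.25, 0.75], [0.75, 0.5]])
values = empirical_copula([[0.5, 0.5], [1.0, 1.0]], u_hat)
print(values)
```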
A second estimator $\hat{C}_h$ that we use later on is based on kernel estimation. It is obtained by
$$\hat{C}_h(u) = \frac{1}{n} \sum_{i=1}^{n} K_h\!\left(\Phi^{-1}(u) - \Phi^{-1}(\hat{U}_i)\right), \tag{2}$$
where $K_h(x) = K(x/h)$ is the scaled version of a suitable multivariate kernel $K$ and $\Phi^{-1}$ is the inverse standard normal cumulative distribution function (CDF) applied component-wise. Using a multiplicative kernel, this estimator is investigated in [21]. The transformation $\Phi^{-1}$ circumvents potential boundary issues that can arise in the copula domain $[0,1]^d$. It is also recommended in [18]. Apart from choosing a kernel, the estimator also requires a bandwidth parameter $h$. In this paper, we choose $K_h$ to be a multiplicative multivariate Gaussian kernel and $h = \left(\frac{4}{(d+2)\,n}\right)^{\frac{1}{d+4}}$, which is Silverman’s rule of thumb [22]. As will become clear later, these choices are particularly easy to work with and let us generate confidence regions in the original copula domain.
The Kendall distribution function $K_C : [0,1] \to [0,1]$ [23,24,25] gives the probability that the copula $C$ stays at or below a given level $p$, i.e.,
$$K_C(p) = P\bigl(C(u_1, \ldots, u_d) \le p\bigr).$$
Barbe et al. [23] show that the Kendall distribution function can be estimated non-parametrically from a sample of size $n$ by
$$\hat{K}_C(p) = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{V_j \le p\},$$
where $V_j = \frac{\#\{k \ne j \mid X_k \le X_j\}}{n-1}$. In addition to that, we need an estimator of the inverse of $K_C(p)$. For $0 < p < 1$, this is obtained by
$$\hat{K}_C^{-1}(p) = \inf\{t \mid \hat{K}_C(t) \ge p\}. \tag{4}$$
Furthermore, we define $\hat{K}_C^{-1}(0) = 0$ and $\hat{K}_C^{-1}(1) = 1$. Let $\operatorname{plim}$ denote convergence in probability. It can be shown that not only $\operatorname{plim} \hat{K}_C(p) = K_C(p)$ for $p \in [0,1]$ [23], but also $\operatorname{plim} \hat{K}_C^{-1}(p) = K_C^{-1}(p)$ for $p < 1$ [26]. Moreover, $\hat{K}_C$ is strongly consistent [27].
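The empirical Kendall distribution function and its generalized inverse can be sketched as follows. The function names are assumptions, the ordering of observations is taken component-wise with a non-strict inequality, and the toy sample is perfectly comonotone so that the values are easy to verify by hand.

```python
import numpy as np

def kendall_v(x):
    """V_j = #{k != j : X_k <= X_j component-wise} / (n - 1)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # dominated[j, k] is True when X_k <= X_j in every component
    dominated = np.all(x[None, :, :] <= x[:, None, :], axis=2)
    # subtract 1 to exclude the trivial case k == j
    return (dominated.sum(axis=1) - 1) / (n - 1.0)

def kendall_cdf(p, v):
    """Empirical Kendall distribution function at level p."""
    return float(np.mean(v <= p))

def kendall_inverse(p, v):
    """Generalized inverse inf{t : K_hat(t) >= p}, with the boundary conventions."""
    if p <= 0:
        return 0.0
    if p >= 1:
        return 1.0
    v_sorted = np.sort(v)
    n = len(v_sorted)
    # K_hat equals k/n at the k-th order statistic; take the first k with k/n >= p
    k = int(np.argmax(np.arange(1, n + 1) / n >= p))
    return float(v_sorted[k])

# toy comonotone sample (hypothetical values)
x = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
v = kendall_v(x)
print(v, kendall_cdf(0.5, v), kendall_inverse(0.5, v))
```

Because the V values depend on the data only through their ranks, the same result is obtained from the raw data or from the pseudo observations.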
Using the previous concepts of copulas and the Kendall distribution function, we can now define the notion of multivariate quantiles we use in this paper. It has been used previously in, e.g., [7,9] and in a similar fashion in [10]. In the following, let $\mathcal{C}$ denote the class of copulas for which $K_C$ is strictly increasing and continuous.
Definition 1
([9]). For a copula $C \in \mathcal{C}$ and $p \in [0,1]$ a multivariate quantile is defined as
$$S_p(C) := \{u \in [0,1]^d : C(u) \ge K_C^{-1}(p)\}.$$
We can now write $P(S_p(C)) = P(C(u) \ge K_C^{-1}(p)) = 1 - P(C(u) \le K_C^{-1}(p)) = 1 - K_C(K_C^{-1}(p)) = 1 - p$. Hence, the boundary of the $p \cdot 100\%$ multivariate quantile partitions the copula domain into a set comprising probability mass $p$ and a set comprising probability mass $1 - p$, which is a nice feature of this particular definition. Furthermore, the shape of the boundary is determined by the shape of the level curve of the copula. The level curve reflects the distribution of the probability mass and the strength of dependence between the involved variables (see, e.g., [28]), which transfers to the quantile definition here. For further motivation and theoretical considerations of this approach, see [7,10]. Note that, because $\mathbb{R}^d$, $d > 1$, has no total ordering, there are many other notions of multivariate quantiles (see, e.g., [4,29,30,31,32]). However, we do not consider these further here.
$S_p$ can be estimated either by
$$\hat{S}_p(\hat{C}) = \{u \in \mathbb{R}^d \mid \hat{C}(u) \ge \hat{K}_C^{-1}(p)\}, \tag{5}$$
or by
$$\hat{S}_p(\hat{C}_h) = \{u \in \mathbb{R}^d \mid \hat{C}_h(u) \ge \hat{K}_C^{-1}(p)\}, \tag{6}$$
where $\hat{K}_C^{-1}$ is as defined in Equation (4), $\hat{C}$ is the empirical copula (1), and $\hat{C}_h$ is the kernel estimated copula (2). The estimator $\hat{S}_p(\hat{C})$ is consistent [10]. An algorithm to construct the estimator on a given bivariate copula sample can be found in [10].
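To make the estimator based on the empirical copula concrete, the following sketch tests whether given points belong to the estimated quantile set, i.e., whether the empirical copula at a point reaches the estimated level. It is not the boundary-construction algorithm of [10]; all function names and the toy sample are assumptions.

```python
import numpy as np

def empirical_copula(u_eval, u_hat):
    """Empirical copula of the sample u_hat, evaluated at the rows of u_eval."""
    u_eval = np.atleast_2d(np.asarray(u_eval, dtype=float))
    return np.all(u_hat[None, :, :] <= u_eval[:, None, :], axis=2).mean(axis=1)

def kendall_inverse(p, u_hat):
    """Generalized inverse of the empirical Kendall distribution function."""
    n = u_hat.shape[0]
    dominated = np.all(u_hat[None, :, :] <= u_hat[:, None, :], axis=2)
    v = np.sort((dominated.sum(axis=1) - 1) / (n - 1.0))
    return float(v[np.argmax(np.arange(1, n + 1) / n >= p)])

def in_quantile_set(u_eval, u_hat, p):
    """True where the empirical copula is at least the estimated level."""
    level = kendall_inverse(p, u_hat)
    return empirical_copula(u_eval, u_hat) >= level

# toy comonotone pseudo observations (hypothetical values)
u_hat = np.array([[0.2, 0.2], [0.4, 0.4], [0.6, 0.6], [0.8, 0.8]])
member = in_quantile_set(u_hat, u_hat, p=0.5)
print(member)
```

Evaluating the membership on a fine grid of the unit square instead of the sample points gives a simple picture of the estimated boundary.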
We want to point out that the estimators (5) and (6) can be used for cases three and four in [7]. In addition, the estimators cover the multivariate quantiles used in [10]. Furthermore, note that we use a non-parametric approach for multivariate quantile estimation here. Parametric and semi-parametric estimators can be found, e.g., in [8,9].
A further concept we employ is the Hausdorff distance $\delta_H$. It plays a key role in one of the approaches to construct confidence regions (see Section 3.2). Let the Euclidean distance between two points $x, y \in \mathbb{R}^d$ be denoted by $\delta(x, y) = \|x - y\|$. We can then define the distance between a point $x \in \mathbb{R}^d$ and a set $A \subset \mathbb{R}^d$ as $\delta(x, A) = \inf_{y \in A} \delta(x, y)$. If the set $A$ is closed, $\min$ instead of $\inf$ can be used. The Hausdorff distance $\delta_H$ can then be defined as follows.
Definition 2.
For non-empty subsets $A, B \subset \mathbb{R}^d$ the Hausdorff distance $\delta_H(A, B)$ is defined by
$$\delta_H(A, B) = \max\left\{\sup_{x \in A} \delta(x, B),\ \sup_{x \in B} \delta(x, A)\right\}.$$
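For finite point sets, such as discretized level curves, the suprema and infima in the definition become maxima and minima over a pairwise distance matrix. The following is a small sketch under that assumption; the function name `hausdorff` is illustrative.

```python
import numpy as np

def hausdorff(a, b):
    """Hausdorff distance between two finite point sets given as (m, d) and (k, d) arrays."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    # dist[i, j] is the Euclidean distance between a[i] and b[j]
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # largest distance from a point in a to the set b, and vice versa
    a_to_b = dist.min(axis=1).max()
    b_to_a = dist.min(axis=0).max()
    return float(max(a_to_b, b_to_a))

d1 = hausdorff([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0]])
d2 = hausdorff([[0.0, 0.0]], [[3.0, 4.0]])
print(d1, d2)
```

The first example shows why both directed distances are needed: every point of the second set is close to the first set, but not the other way round.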
In general, the Hausdorff distance may be infinite. However, since we consider only subsets of the compact set $[0,1]^d$, the Hausdorff distance is always finite. In the next section, we introduce two approaches to construct confidence regions for level set estimation and extend them to multivariate quantiles.

3. Confidence Regions for Multivariate Quantiles

In this section, we introduce the approaches by Mammen and Polonik [14] and Chen et al. [15]. These construct confidence regions for estimated level sets. The Mammen and Polonik [14] method is applicable to level sets of arbitrary functions. On the other hand, the method by Chen et al. [15] is developed for densities. Both approaches assume a fixed level for which the level set is estimated. This is different from what we need in the context of multivariate quantiles. Thus, we make necessary extensions to the approaches in order to make them applicable to estimated multivariate quantiles. We introduce each method in turn and extend it. Moreover, we address some computational aspects.
We want to point out that there are other methods to construct confidence regions for multivariate quantiles. For example, Refs. [11,12] follow a quite different approach based on highest density regions [13]. This method constructs confidence regions which are centered at the distribution of points on a level set. In contrast, the approaches by Mammen and Polonik [14] and Chen et al. [15] yield confidence regions that bound the multivariate quantile. Thus, the two kinds of techniques are not directly comparable. Therefore, we do not consider approaches based on highest density regions further.

3.1. Approach by Mammen and Polonik (2013)

The approach by Mammen and Polonik [14] is based on the supremum distance between a function and its estimate on a specific set of points. It can be used to construct confidence regions for level sets of the form $L = \{x \in \mathbb{R}^d : f(x) \ge 0\}$ of an arbitrary function $f : \mathbb{R}^d \to \mathbb{R}$. Note that, by using the function $h(x) = f(x) - \lambda$ instead, one can construct confidence regions for level sets of $f$ at any level $\lambda$. In the following, let $L^{-} = \{x \in \mathbb{R}^d : f(x) > 0\}$ and let $n$ denote the sample size. The approach seeks to find sets $\hat{L}_\ell$ and $\hat{L}_u$ such that
$$P\bigl(\hat{L}_\ell \subset L^{-} \text{ and } L \subset \hat{L}_u\bigr) \xrightarrow{\,n \to \infty\,} 1 - \alpha,$$
where $1 - \alpha$ is the confidence level. The sets $\hat{L}_\ell$ and $\hat{L}_u$ are estimated by
$$\hat{L}_\ell = \{x \in \mathbb{R}^d : \hat{f}(x) > \hat{b}_n\} \quad \text{and} \quad \hat{L}_u = \{x \in \mathbb{R}^d : \hat{f}(x) \ge -\hat{b}_n\},$$
where $\hat{f}$ is an estimator of $f$ and $\hat{b}_n$ is an estimator of the $1 - \alpha$ quantile of $Z = \sup_{x \in \mathbb{R}^d : |f(x)| \le \beta} |\hat{f}(x) - f(x)|$. Since the distribution of $Z$ is unknown, Mammen and Polonik [14] suggest using a bootstrap.
The approach above is not directly applicable to the multivariate quantiles $S_p$ since it assumes the level $\lambda$ to be fixed. In contrast, estimation of $S_p$ requires estimation of the level $K_C^{-1}(p)$. Thus, we have to extend the method. Let $U_1, \ldots, U_n$ be a $d$-dimensional copula sample. Then, the approach by Mammen and Polonik [14] is extended and applied in Algorithm 1.
Note that, by incorporating the estimation of $K_C^{-1}(p)$ into Step 5, we account for the estimation uncertainty of $\hat{S}_p(\hat{C})$ and $\hat{K}_C^{-1}(p)$ simultaneously. Thus, we propose to use $h(x) = C(x) - K_C^{-1}(p)$ and we can write
$$\begin{aligned}
\sup_{x \in \mathbb{R}^d} \bigl|\hat{h}(x) - h(x)\bigr|
&= \sup_{x \in \mathbb{R}^d} \Bigl|\hat{C}(x) - \hat{K}_C^{-1}(p) - \bigl(C(x) - K_C^{-1}(p)\bigr)\Bigr| \\
&= \sup_{x \in \mathbb{R}^d} \Bigl|\hat{C}(x) - C(x) - \bigl(\hat{K}_C^{-1}(p) - K_C^{-1}(p)\bigr)\Bigr| \\
&\le \sup_{x \in \mathbb{R}^d} \bigl|\hat{C}(x) - C(x)\bigr| + \bigl|\hat{K}_C^{-1}(p) - K_C^{-1}(p)\bigr|.
\end{aligned}$$
Recall that both the empirical copula and the empirical Kendall distribution function are strongly consistent [27,33,34] and that $\hat{K}_C^{-1}(p)$ is strongly consistent when $K_C$ is continuous and strictly monotone [10]. Hence, the term above converges to 0 for $n \to \infty$ and we expect the approach to be a valid extension of [14]. Furthermore, the approach is easy to implement and is computationally very efficient. An accompanying MATLAB implementation of Algorithm 1 can be found in the Supplementary Material.
Algorithm 1 Extension of Mammen and Polonik (2013).
1: Choose the level $p$ and the confidence level $1 - \alpha$.
2: Estimate $K_C^{-1}(p)$ and $S_p$ on $U_1, \ldots, U_n$ according to Equation (4) and Equation (5), respectively.
3: Determine $\Delta_n = \{u \in \{U_1, \ldots, U_n\} : -\beta_n \le \hat{C}(u) - \hat{K}_C^{-1} \le \beta_n\}$, where $\beta_n = n^{-1/2}$ and $\hat{C}$ is the empirical copula.
4: Draw $n_{bs}$ bootstrap samples $U_1^{*}, \ldots, U_n^{*}$. Repeat Step 2 on each of these.
5: Let $\hat{C}_i$ and $\hat{K}_{C,i}^{-1}$ be the empirical copula and estimated inverse Kendall function of the $i$th bootstrap sample. Determine $Z_i = \max_{u \in \Delta_n} |\hat{C}_i(u) - \hat{K}_{C,i}^{-1} - \hat{C}(u) + \hat{K}_C^{-1}|$ for each $i = 1, \ldots, n_{bs}$.
6: Estimate $b_n$ as the empirical $1 - \alpha$ quantile of $Z = (Z_1, \ldots, Z_{n_{bs}})$.
7: The confidence region of $\hat{S}_p(\hat{C})$ is determined by the two sets $\hat{S}_\ell = \{v \in [0,1]^d : \hat{C}(v) > \hat{K}_C^{-1}(p) - \hat{b}_n\}$ and $\hat{S}_u = \{v \in [0,1]^d : \hat{C}(v) \ge \hat{K}_C^{-1}(p) + \hat{b}_n\}$.
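The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the MATLAB implementation from the Supplementary Material. All function names are assumptions, bootstrap samples are drawn by resampling rows with replacement and are not re-ranked, `np.quantile` (with its default interpolation) stands in for the empirical quantile, and the fallback for an empty set in Step 3 is an addition that is not part of the algorithm.

```python
import numpy as np

def emp_copula(u_eval, u):
    """Empirical copula of sample u (n, d), evaluated at the rows of u_eval."""
    return np.all(u[None, :, :] <= u_eval[:, None, :], axis=2).mean(axis=1)

def kendall_inverse(p, u):
    """Generalized inverse of the empirical Kendall distribution function."""
    n = u.shape[0]
    dominated = np.all(u[None, :, :] <= u[:, None, :], axis=2)
    v = np.sort((dominated.sum(axis=1) - 1) / (n - 1.0))
    return v[np.argmax(np.arange(1, n + 1) / n >= p)]

def mp_critical_value(u, p, alpha, n_bs=200, seed=0):
    """Steps 2 to 6: return the estimated level and the bootstrap quantile b_n."""
    rng = np.random.default_rng(seed)
    n = u.shape[0]
    level = kendall_inverse(p, u)                     # Step 2
    c_hat = emp_copula(u, u)
    delta = u[np.abs(c_hat - level) <= n ** -0.5]     # Step 3, beta_n = n^(-1/2)
    if len(delta) == 0:                               # fallback, not in the algorithm
        delta = u
    c_delta = emp_copula(delta, u)
    z = np.empty(n_bs)
    for i in range(n_bs):                             # Steps 4 and 5
        boot = u[rng.integers(0, n, size=n)]          # resample rows with replacement
        z[i] = np.max(np.abs(emp_copula(delta, boot) - kendall_inverse(p, boot)
                             - c_delta + level))
    return level, np.quantile(z, 1.0 - alpha)         # Step 6

def region_sets(v, u, level, b_n):
    """Step 7: membership of the points v in the two bounding sets."""
    c = emp_copula(v, u)
    return c > level - b_n, c >= level + b_n

# simulated independent sample, rank-transformed to pseudo observations
rng = np.random.default_rng(1)
x = rng.random((100, 2))
u = (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / 101.0
level, b90 = mp_critical_value(u, p=0.9, alpha=0.10)
_, b95 = mp_critical_value(u, p=0.9, alpha=0.05)
in_lower, in_upper = region_sets(u, u, level, b90)
print(level, b90, b95, in_lower.sum(), in_upper.sum())
```

Using the same seed for both confidence levels means both quantiles are taken from the same bootstrap values, so the 95% value cannot be smaller than the 90% value.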
Figure 1 shows an exemplary application of the approach on a bivariate Clayton copula sample (left panel) and a bivariate Gumbel copula sample (right panel) of size 100 each, where we have bootstrapped 1000 times. The blue line depicts the boundary of the estimated multivariate quantile $\hat{S}_p(\hat{C})$, whereas the gray line depicts the theoretical boundary of $S_p$ with $p = 0.9$. The orange and green lines are the boundaries of the sets $\hat{S}_\ell$ and $\hat{S}_u$ for $\alpha = 0.1$ and $\alpha = 0.05$, respectively.

3.2. Approach by Chen et al. (2017)

The approach by Chen et al. [15] is based on the Hausdorff distance δ H between an estimated level set and its theoretical counterpart. Note that, in the following, we present Method 1 in [15]. The second method in [15] is very similar to the approach in the previous section. Additionally, Chen et al. [15] state that the approach by Mammen and Polonik [14] should yield better results compared to their second method.
Chen et al. [15] focus on confidence regions for density level sets of the form $L = \{x \in \mathbb{R}^d : f_h(x) = \lambda\}$, where $f_h$ is the convolution of a density $f$ and a kernel $K$. Given a sample, $L$ can be estimated with a kernel density estimator $\hat{f}_h$ of $f_h$ as $\hat{L} = \{x \in \mathbb{R}^d : \hat{f}_h(x) = \lambda\}$. Let $W$ be the Hausdorff distance between $L$ and $\hat{L}$, i.e., $W = \delta_H(L, \hat{L})$. The confidence region of $\hat{L}$ is then
$$\hat{R} = \bigcup_{x \in \hat{L}} \{y : \|x - y\| \le w_n\},$$
where $w_n$ is the $1 - \alpha$ quantile of $W$. This amounts to drawing a ball of radius $w_n$ around each point in $\hat{L}$. It can be shown that
$$P(L \subset \hat{R}) \ge 1 - \alpha,$$
where $1 - \alpha$ is the confidence level. Since the distribution of $W$ is unknown, bootstrapping is suggested by [15].
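As a small illustration of the construction, the sketch below estimates the radius from a set of bootstrapped Hausdorff distances and checks whether query points fall into the union of balls around a discretized level set. The distances, the level-set points, and the function name are hypothetical values chosen for illustration only, and `np.quantile` with its default interpolation stands in for the empirical quantile.

```python
import numpy as np

def in_ball_region(points, level_set_pts, radius):
    """True where a point lies within `radius` of at least one level-set point."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    level_set_pts = np.atleast_2d(np.asarray(level_set_pts, dtype=float))
    dist = np.linalg.norm(points[:, None, :] - level_set_pts[None, :, :], axis=2)
    # membership in the union of balls equals: nearest level-set point is close enough
    return dist.min(axis=1) <= radius

# hypothetical bootstrapped Hausdorff distances (illustration only)
boot_distances = np.array([0.02, 0.05, 0.03, 0.08, 0.04])
alpha = 0.10
w_n = np.quantile(boot_distances, 1.0 - alpha)

# hypothetical discretized level set and two query points
level_set = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
inside = in_ball_region([[0.52, 0.5], [0.8, 0.8]], level_set, w_n)
print(w_n, inside)
```

With a discretized level set, the region is only as accurate as the spacing of the points, so the discretization should be fine relative to the radius.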
Similar to the approach in the previous section, the method by Chen et al. [15] is not directly applicable to multivariate quantiles. However, not only the estimation of $K_C^{-1}$ has to be considered, but also that copulas are distribution functions and not densities. Additionally, the method of Chen et al. [15] assumes an unbounded domain which is not the case in a copula context. Again, let $U_1, \ldots, U_n$ be a $d$-dimensional copula sample. We extend the approach in Algorithm 2.
Algorithm 2 Extension of Chen et al. (2017).
1: Choose the level $p$ and the confidence level $1 - \alpha$.
2: Estimate $K_C^{-1}(p)$ on $U_1, \ldots, U_n$ according to Equation (4).
3: Estimate $S_p$ based on the kernel density estimate $\hat{C}_h$ on $U_1, \ldots, U_n$ using Silverman’s rule of thumb [22] and a Gaussian kernel according to Equation (6).
4: Draw $n_{bs}$ bootstrap samples $U_1^{*}, \ldots, U_n^{*}$. Repeat Step 2 and Step 3 on each of these.
5: Determine the Hausdorff distance $\delta_H$ between $\hat{S}_p(\hat{C}_h)$ of the original sample and each bootstrapped $\hat{S}_p^{i}(\hat{C}_h^{i})$, $i = 1, \ldots, n_{bs}$, where $\hat{C}_h^{i}$ is the kernel density estimated copula on bootstrap sample $i$.
6: Estimate $w_n$ as the empirical $1 - \alpha$ quantile of $\hat{W} = \bigl(\delta_H(\hat{S}_p, \hat{S}_p^{1}), \ldots, \delta_H(\hat{S}_p, \hat{S}_p^{n_{bs}})\bigr)$.
7: The confidence region is $\bigcup_{x \in \hat{S}_p(\hat{C}_h)} B(x, \hat{w}_n)$, where $B(x, \hat{w}_n) = \{y : \|x - y\| \le \hat{w}_n\}$.
This method is computationally more demanding than the approach by Mammen and Polonik [14]. Note that issues caused by the bounded copula domain are circumvented by using the probit transformation in $\hat{C}_h$ (cf. Equation (2)). Thus, standard kernel density estimation can be used, which is readily available in pertinent statistical software. The result of Step 3 in Algorithm 2 is a set of finitely many points $x \in \mathbb{R}^d$ which make up the boundary of the multivariate quantile $\hat{S}_p(\hat{C}_h)$ in the space $\mathbb{R}^d$. By using a Gaussian kernel and Silverman’s rule of thumb [22] for the bandwidth $h$, a point $x$ on the boundary can be transformed back to the copula domain $[0,1]^d$ by
$$u = \Phi\!\left(\frac{x}{\sqrt{1 + h^2}}\right),$$
where $\Phi$ is the standard normal CDF applied component-wise. This allows us not only to compute the Hausdorff distance in Step 5 on the bounded copula domain but also to subsequently construct the confidence regions in $[0,1]^d$. Note that we interpolate the points on the multivariate quantile boundary of the kernel density estimation linearly, which introduces a small numerical imprecision to the Hausdorff distance calculation. By incorporating the estimation of $K_C^{-1}(p)$ in Step 4, we account for its estimation uncertainty simultaneously. An accompanying MATLAB implementation of Algorithm 2 can be found in the Supplementary Material.
Figure 2 shows exemplary confidence region estimation results on a bivariate Clayton copula sample (left panel) and bivariate Gumbel copula sample (right panel) of size 100 each. We have used 1000 bootstrap samples for each plot. The color coding is as in Figure 1 above: The blue line depicts the boundary of the estimated multivariate quantile $\hat{S}_p(\hat{C}_h)$, whereas the gray line depicts the theoretical boundary of $S_p$ for $p = 0.9$. The orange and green lines are the boundaries of the confidence region $\bigcup_{x \in \hat{S}_p(\hat{C}_h)} B(x, \hat{w}_n)$ for $\alpha = 0.1$ and $\alpha = 0.05$, respectively.
Note that we have extended the approaches by Mammen and Polonik [14] and Chen et al. [15] in several aspects to make them applicable for the estimation of multivariate quantiles. It is not quite clear whether they retain their statistical properties and how they behave on small sample sizes. In particular, it is interesting to investigate the proposed confidence level $1 - \alpha$ via coverage probabilities. We do this with a simulation study in the next section.

4. Simulation Study

We investigate in a simulation study whether the extended approaches introduced in Section 3.1 and Section 3.2 hold their proposed confidence level $1 - \alpha$ via coverage probabilities. In particular, we focus on small sample sizes as they are found in hydrology applications. For both approaches, we consider the same simulation settings. We simulate samples of sizes $n = 100, 200$ from Gauss, Clayton, and Gumbel copulas, where we restrict ourselves to the bivariate case. The Gauss copula has a parameter $\rho$ corresponding to a Kendall’s $\tau$ of $-0.8, -0.5, 0, 0.5, 0.8$, whereas, for the Clayton and the Gumbel copula settings, the parameters correspond to a Kendall’s $\tau$ of $0.3, 0.5, 0.8$. Note that in the Gauss case a Kendall’s $\tau$ of 0 corresponds to independence.
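The copula parameters that correspond to a given Kendall’s tau follow from the standard closed-form relationships for these three families. The sketch below is only a convenience for reproducing such settings; the function names are assumptions.

```python
import math

def gauss_rho(tau):
    """Gauss copula correlation for a given Kendall's tau: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)

def clayton_theta(tau):
    """Clayton parameter for a given Kendall's tau in (0, 1): theta = 2 tau / (1 - tau)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    """Gumbel parameter for a given Kendall's tau in [0, 1): theta = 1 / (1 - tau)."""
    return 1.0 / (1.0 - tau)

for tau in (-0.8, -0.5, 0.0, 0.5, 0.8):
    print("Gauss", tau, round(gauss_rho(tau), 4))
for tau in (0.3, 0.5, 0.8):
    print("Clayton", tau, round(clayton_theta(tau), 4),
          "Gumbel", tau, round(gumbel_theta(tau), 4))
```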
For each setting, we estimate confidence regions for the $p = 0.1, 0.5, 0.9$ multivariate quantile to get a better picture of the performance on the whole copula domain. Confidence regions are estimated at the 90% and 95% confidence levels. For this, we use 1000 bootstraps for the Mammen and Polonik [14] approach and 200 bootstraps for the Chen et al. [15] approach, due to the high computation times of the latter. Each simulation setting is repeated 1000 times to obtain reliable results. The coverage probability is calculated by checking whether the theoretical multivariate quantile boundary lies within the estimated confidence region in each simulation run. For example, Figure 1 and Figure 2 show cases where the theoretical multivariate quantile is covered by the confidence region.
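To judge how close an estimated coverage probability can be expected to lie to the nominal level, the Monte Carlo standard error of a proportion is a useful reference. The following sketch assumes independent simulation runs and uses the binomial standard error; the function name and the toy list of hits are illustrative.

```python
import math

def coverage_summary(hits):
    """Estimated coverage and its binomial standard error from per-run indicators."""
    n_runs = len(hits)
    coverage = sum(hits) / n_runs
    std_err = math.sqrt(coverage * (1.0 - coverage) / n_runs)
    return coverage, std_err

# toy indicators (hypothetical): the true boundary was covered in 9 of 10 runs
cov, se = coverage_summary([True] * 9 + [False])

# standard error at the nominal levels for 1000 simulation runs
se_90 = math.sqrt(0.90 * 0.10 / 1000)
se_95 = math.sqrt(0.95 * 0.05 / 1000)
print(cov, se, se_90, se_95)
```

With 1000 repetitions, the standard error is below 0.01 at both nominal levels, so deviations of several percentage points are unlikely to be simulation noise alone.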
The coverage probabilities for the extended Mammen and Polonik [14] approach can be found in Table 1. The first sanity check which can be made is that the 95% confidence region exhibits higher coverage probabilities than the 90% confidence region, which is the case throughout. Most of the settings for the 10% and 50% multivariate quantiles show more conservative coverage probabilities than the respective confidence level would suggest. In contrast, particularly the negative dependence settings for $p = 0.9$ exhibit too low coverage probabilities. Too high and too low coverage probabilities could be due to the estimation uncertainty of $K_C^{-1}$ and the bounded copula domain. The results over the different sample sizes are very similar. We conclude from this that the estimator works quite well for small sample sizes. Overall, the results for the Mammen and Polonik [14] approach are reasonable.
The results of the extended Chen et al. [15] approach can be found in Table 2. Most of the coverage probabilities are too low. In particular, confidence regions for high dependence seem to be problematic. In contrast, the results are reasonable for low to medium strong dependence, i.e., $\tau \in [-0.5, 0.5]$. This could be due to several effects. First, the bounded copula domain could be an issue. Second, the original approach by Chen et al. [15] was developed for densities and not for copulas, which are distribution functions. In addition, the estimation of $K_C^{-1}$ is present in the approach. However, we do not think that the latter plays an important role since the results for the Mammen and Polonik [14] approach are good, where estimation of $K_C^{-1}$ is also necessary. Finally, we calculate the coverage probabilities by checking whether the level curve at level $K_C^{-1}(p)$ of the underlying copula $C$ is within the boundaries of the constructed confidence set since we are actually interested in the level curves of the copula $C$. In contrast, the approach of Chen et al. [15] aims to estimate confidence regions for the level curves of a convolution of the copula $C$ and the kernel $K_h$, whereby a certain smoothness and limit behavior of the results is ensured. This could lead to the biased coverage probabilities in our case.
In conclusion, the simulation study shows a reasonable performance of the extended Mammen and Polonik [14] method. On the other hand, results for the extended Chen et al. [15] method are mixed. They are, however, reasonably precise for low to medium strong dependence. In summary, we advise practitioners to use the Mammen and Polonik [14] approach for construction of multivariate quantile confidence regions. In the next section, we apply the introduced methods on a small hydrology related data set to gain further insights.

5. Application

We apply the two confidence region approaches to a small data set with a hydrology context. The data can be found in [35]. It comprises 33 yearly maximum values of flood peak and flood volume of the Ashuapmushuan basin in Quebec, Canada. The observations were collected in the period 1963–1995. In a first step, we rank-transform the data to obtain the pseudo observations in the copula domain $[0,1]^2$. Figure 3 shows a scatterplot of the original data and the rank-transformed data. The data exhibit positive dependence with a Kendall’s $\tau$ of approximately 0.41.
In a second step, we estimate the 90% (i.e., $p = 0.9$) multivariate quantile with the two estimators $\hat{S}_p(\hat{C})$ and $\hat{S}_p(\hat{C}_h)$. The estimation results are shown in Figure 4. As can be seen, the two estimated boundaries nicely overlap. For comparison purposes, we additionally estimate a parametric copula model. A Clayton copula with parameter θ ≈ 1.4 fits the data best among Gumbel, Frank, Gauss-, and t-copulas. The estimated boundary is shown in Figure 4 as a red line and is close to the non-parametric estimates.
In a third step, we apply the extended method of Mammen and Polonik [14] as introduced in Section 3.1 to the data. The result of this can be seen in Figure 4. The orange and green step curves depict the confidence region boundaries for confidence levels 90% and 95%, respectively. Recall that the boundary of the 90% multivariate quantile partitions the copula domain into a set comprising 10% of the probability mass, which lies to the upper right of the boundary, and a set comprising 90% of the probability mass, which lies to the lower left of the boundary. Counting the points within the confidence region boundaries, we obtain between 33% and 3% of the points for the 90% confidence region and between 36% and 3% of the points for the 95% confidence region. Thus, the confidence regions seem wide, which is, however, to be expected given the small sample size.
Next, we also apply the extended method of Chen et al. [15] as introduced in Section 3.2. Figure 4 shows the results. The orange and green smooth lines depict the confidence region boundaries for confidence levels 90% and 95%, respectively. With the same calculations as above, both the 90% confidence region and the 95% confidence region enclose between 21% and 3% of the points. Thus, the confidence regions are tighter than those of the Mammen and Polonik [14] method. This can also be seen in Figure 4. Clearly, the approach of Chen et al. [15] gives a tighter confidence region on the lower end, whereas the two approaches give similar results on the upper end. This has to be put in light of the simulation study, which shows too liberal coverage probabilities for the method of Chen et al. [15] in the considered case $p = 0.9$ and moderate positive dependence.
Furthermore, we analyze the secondary return period as defined in [36]. The estimated secondary return period given by the multivariate quantile is $\frac{1}{1 - \hat{K}_C(P)} = 10$ years, which corresponds to $1/(1 - p)$ for $p = 0.9$. For the Mammen and Polonik [14] approach, the confidence regions suggest a secondary return period between 3 and 33 years and between 2.75 and 33 years for the 90% and 95% confidence levels, respectively. The confidence regions of the Chen et al. [15] approach suggest a secondary return period between 4.7 and 33 years and between 4.1 and 33 years for the 90% and 95% confidence levels, respectively. Thus, the confidence regions can also be used to assess the precision of the implied secondary return period of the multivariate quantile.
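The arithmetic behind these return periods can be sketched as follows. The function name and the parameter `mu` (the mean interarrival time, one year for yearly maxima) are assumptions. The mapping of the reported percentages to 11, 12, and 1 of the 33 observations is also an assumption; it reproduces the Mammen and Polonik figures of 3, 2.75, and 33 years.

```python
def secondary_return_period(kendall_value, mu=1.0):
    """Return period mu / (1 - K), where K is the Kendall distribution function value."""
    return mu / (1.0 - kendall_value)

n_obs = 33

# point estimate at the quantile level p = 0.9
t_point = secondary_return_period(0.9)

# assumed counts of observations above the respective boundary
t_low_90 = secondary_return_period(1.0 - 11 / n_obs)  # about 33% of the points
t_low_95 = secondary_return_period(1.0 - 12 / n_obs)  # about 36% of the points
t_high = secondary_return_period(1.0 - 1 / n_obs)     # about 3% of the points
print(round(t_point, 2), round(t_low_90, 2), round(t_low_95, 2), round(t_high, 2))
```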
Finally, we want to stress again the advantages of having confidence regions for multivariate quantiles in a hydrology context. First, confidence regions give statistical insight into the estimation uncertainty present. For example, Figure 4 shows that the regions are very wide and that more data would be needed for a reliable estimate of the multivariate quantile. Second, they are helpful for the design of infrastructure. Since the true multivariate quantile boundary lies within the confidence region boundaries at the specified confidence level, the points within the confidence region should be considered when planning, e.g., new dams. In particular, a point from within the region between the lower boundary of the confidence region and the multivariate quantile boundary could actually be a point with a (true) secondary return period of 10 years and thus would be rarer than the estimated multivariate quantile suggests. Conversely, a point from within the region between the upper boundary of the confidence regions and the multivariate quantile boundary could have a lower (true) secondary return period and thus would occur more often than might be expected from considering the estimated multivariate quantile boundary only.

6. Conclusions

We extend the two approaches by Mammen and Polonik [14] and Chen et al. [15] for construction of confidence regions for level sets to make them applicable in a multivariate quantile context. This involves incorporating the estimation of the quantile level via the inverse Kendall distribution function $K_C^{-1}$ and also adjusting for the bounded copula domain. Accompanying MATLAB code can be found in the Supplementary Material.
The simulation study shows reasonable coverage probabilities for the extended Mammen and Polonik [14] method. Some of the coverage probabilities are too conservative. However, in particular for negative dependence and high quantile levels, the approach yields too low coverage probabilities. On the other hand, the extended Chen et al. [15] method shows mixed results. Overall, the coverage probabilities are too liberal. However, they show a reasonable precision for low to medium strong dependence. An application on a small hydrology-related data set illustrated some further aspects of the approaches.
On a final note, we want to point out that we tried to keep the extension of the methods as simple as possible. The approaches could be extended in several further ways. For example, a smoothed bootstrap along the lines of [10] can be incorporated into the analysis. However, we will leave this for future research. We hope that practitioners in hydrology and other fields find the considered approaches helpful and easy to apply to their problems at hand.

Supplementary Materials

The following are available online at https://www.mdpi.com/2073-4441/10/8/996/s1.

Author Contributions

The paper was conceptualized by all three authors. M.C. wrote the paper as well as the software for the simulation study and the application. R.D. conducted the formal analysis. R.D. and O.G. reviewed the paper and supervised all steps.

Funding

This research received no external funding.

Acknowledgments

We thank the editor and two anonymous referees for helpful comments, which improved the paper. We acknowledge support by Deutsche Forschungsgemeinschaft and the Open Access Publishing Fund of Karlsruhe Institute of Technology.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yue, S.; Rasmussen, P. Bivariate frequency analysis: Discussion of some useful concepts in hydrological application. Hydrol. Process. 2002, 16, 2881–2898.
  2. Salvadori, G. Bivariate return periods via 2-Copulas. Stat. Methodol. 2004, 1, 129–144.
  3. Salvadori, G.; De Michele, C. Frequency analysis via copulas: Theoretical aspects and applications to hydrological events. Water Resour. Res. 2004, 40, 1–17.
  4. Chebana, F.; Ouarda, T. Multivariate quantiles in hydrological frequency analysis. Environmetrics 2011, 22, 63–78.
  5. Salvadori, G.; Tomasicchio, G.R.; D’Alessandro, F. Practical guidelines for multivariate analysis and design in coastal and off-shore engineering. Coast. Eng. 2014, 88, 1–14.
  6. Salvadori, G.; Durante, F.; Tomasicchio, G.R.; D’Alessandro, F. Practical guidelines for the multivariate assessment of the structural risk in coastal and off-shore engineering. Coast. Eng. 2015, 95, 77–83.
  7. Salvadori, G.; Durante, F.; De Michele, C.; Bernardi, M.; Petrella, L. A multivariate copula-based framework for dealing with hazard scenarios and failure probabilities. Water Resour. Res. 2016, 52, 3701–3721.
  8. Salvadori, G.; De Michele, C. On the Use of Copulas in Hydrology: Theory and Practice. J. Hydrol. Eng. 2007, 12, 369–380.
  9. Salvadori, G.; Durante, F.; Perrone, E. Semi-parametric approximation of Kendall’s distribution function and multivariate Return. J. Soc. Fr. Stat. 2013, 154, 151–173.
  10. Coblenz, M.; Dyckerhoff, R.; Grothe, O. Nonparametric estimation of multivariate quantiles. Environmetrics 2018, 29, e2488.
  11. Serinaldi, F. An uncertain journey around the tails of multivariate hydrological distributions. Water Resour. Res. 2013, 49, 6527–6547.
  12. Serinaldi, F. Can we tell more than we can know? The limits of bivariate drought analyses in the United States. Stoch. Environ. Res. Risk Assess. 2016, 30, 1691–1704.
  13. Hyndman, R.J. Computing and Graphing Highest Density Regions. Am. Stat. 1996, 50, 120–126.
  14. Mammen, E.; Polonik, W. Confidence regions for level sets. J. Multivar. Anal. 2013, 122, 202–214.
  15. Chen, Y.C.; Genovese, C.R.; Wasserman, L. Density Level Sets: Asymptotics, Inference, and Visualization. J. Am. Stat. Assoc. 2017, 112, 1684–1696.
  16. Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 1959, 8, 229–231.
  17. Nelsen, R.B. An Introduction to Copulas; Springer Series in Statistics; Springer: New York, NY, USA, 2006.
  18. Joe, H. Dependence Modeling with Copulas; Chapman & Hall: Boca Raton, FL, USA, 2015.
  19. Durante, F.; Sempi, C. Principles of Copula Theory; CRC/Chapman & Hall: Boca Raton, FL, USA, 2016.
  20. Genest, C.; Favre, A.-C. Everything You Always Wanted to Know about Copula Modeling but Were Afraid to Ask. J. Hydrol. Eng. 2007, 12, 347–368.
  21. Omelka, M.; Gijbels, I.; Veraverbeke, N. Improved kernel estimation of copulas: Weak convergence and goodness-of-fit testing. Ann. Stat. 2009, 37, 3023–3058.
  22. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman & Hall: London, UK, 1986.
  23. Barbe, P.; Genest, C.; Ghoudi, K.; Remillard, B. On Kendall’s Process. J. Multivar. Anal. 1996, 58, 197–229.
  24. Genest, C.; Rivest, L.-P. On the multivariate probability integral transform. Stat. Prob. Lett. 2001, 53, 391–399.
  25. Nelsen, R.B.; Quesada-Molina, J.J.; Rodríguez-Lallena, J.A.; Úbeda-Flores, M. Kendall distribution functions. Stat. Prob. Lett. 2003, 65, 263–268.
  26. Serfling, R. Approximation Theorems of Mathematical Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1980.
  27. Ghoudi, K.; Rémillard, B. Empirical processes based on pseudo-observations. In Asymptotic Methods in Probability and Statistics, A Volume in Honour of Miklós Csörgö; Szyskowicz, B., Ed.; Elsevier: New York, NY, USA, 1998; pp. 171–197.
  28. Coblenz, M.; Grothe, O.; Schreyer, M.; Trutschnig, W. On the length of copula level curves. J. Multivar. Anal. 2018, 167, 347–365.
  29. Tibiletti, L. On a new notion of multidimensional quantile. Metron 1993, 51, 77–83.
  30. Chaudhuri, P. On a Geometric Notion of Quantiles for Multivariate Data. J. Am. Stat. Assoc. 1996, 91, 862–872.
  31. Serfling, R. Quantile functions for multivariate analysis: Approaches and applications. Stat. Neerl. 2002, 56, 214–232.
  32. Di Bernardino, E.; Laloë, T.; Maume-Deschamps, V.; Prieur, C. Plug-in estimation of level sets in a non-compact setting with applications in multivariate risk theory. ESAIM Prob. Stat. 2013, 17, 236–256.
  33. Deheuvels, P. La fonction de dépendance empirique et ses propriétés. un test non paramétrique d’indépendance. Académie Royale de Belgique. Bulletin de la Classe des Sciences 1979, 65, 274–292.
  34. Deheuvels, P. Nonparametric test of independence. In Statistique non Paramétrique Asymptotique; Lecture Notes in Mathematics; Raoult, J.-P., Ed.; Springer: Berlin, Germany, 1980; Volume 821, pp. 95–107.
  35. Yue, S.; Ouarda, T.; Bobée, B.; Legendre, P.; Bruneau, P. The Gumbel mixed model for flood frequency analysis. J. Hydrol. 1999, 226, 88–100.
  36. Salvadori, G.; De Michele, C.; Durante, F. On the return period and design in a multivariate framework. Hydrol. Earth Syst. Sci. 2011, 88, 1–14.
Figure 1. Example of the extended Mammen and Polonik [14] approach. The blue line shows the estimated boundary of the multivariate quantile, the gray line shows the theoretical multivariate quantile boundary for $p = 0.9$. The orange and green lines depict the confidence regions for $\alpha = 0.1$ and $\alpha = 0.05$, respectively. Left Panel: Clayton copula sample with $\theta = 3$ (i.e., Kendall’s $\tau = 0.6$) and $n = 100$; Right Panel: Gumbel copula sample with $\theta = 2.5$ (i.e., Kendall’s $\tau = 0.6$) and $n = 100$.
Figure 2. Example of the extended Chen et al. [15] approach. The blue line shows the estimated boundary of the multivariate quantile based on a kernel density estimated copula, the gray line shows the theoretical multivariate quantile boundary for $p = 0.9$. The orange and green lines depict the boundaries of the confidence region for $\alpha = 0.1$ and $\alpha = 0.05$, respectively. Left Panel: Clayton copula sample with $\theta = 3$ (i.e., Kendall’s $\tau = 0.6$) and $n = 100$. Right Panel: Gumbel copula sample with $\theta = 2.5$ (i.e., Kendall’s $\tau = 0.6$) and $n = 100$.
Figure 3. Left Panel: 33 flood peak and flood volume observations from the Ashuapmushuan basin in Quebec, Canada. Right Panel: The same data, but rank-transformed to the copula domain $[0, 1]^2$.
Figure 4. Combined estimation results of the multivariate quantile with both confidence region methods. Boundaries of the estimated multivariate quantiles $\hat{S}_p(\hat{C})$ and $\hat{S}_p(\hat{C}_h)$ are shown as a blue step curve and a blue smooth curve, respectively. The red curve refers to the boundary of the multivariate quantile of a Clayton copula that is parametrically estimated on the data. Confidence regions of $\hat{S}_p(\hat{C})$ for confidence levels 90% and 95% are depicted as orange and green step curves, whereas the confidence regions of $\hat{S}_p(\hat{C}_h)$ are shown as orange and green lines for the respective confidence levels.
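Table 1. Simulation results for the extended Mammen and Polonik [14] approach. The overall coverage probabilities are reasonable. Column headers give the confidence level $1 - \alpha$ / sample size $n$ / quantile level $p$; entries are coverage probabilities in %.

| Copula | $\tau$ | 90% / 100 / 0.1 | 90% / 100 / 0.5 | 90% / 100 / 0.9 | 90% / 200 / 0.1 | 90% / 200 / 0.5 | 90% / 200 / 0.9 | 95% / 100 / 0.1 | 95% / 100 / 0.5 | 95% / 100 / 0.9 | 95% / 200 / 0.1 | 95% / 200 / 0.5 | 95% / 200 / 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gauss | -0.8 | 100 | 97.7 | 82.6 | 100 | 97.8 | 80.7 | 100 | 99.2 | 91.9 | 100 | 98.8 | 89.1 |
| Gauss | -0.5 | 100 | 93.3 | 82.0 | 100 | 93.1 | 84.3 | 100 | 96.3 | 90.8 | 100 | 97.2 | 92.3 |
| Gauss | 0 | 99.4 | 89.6 | 85.2 | 99.4 | 89.8 | 86.0 | 99.8 | 94.4 | 93.1 | 100 | 95.1 | 93.0 |
| Gauss | 0.5 | 97.1 | 92.0 | 91.3 | 98.1 | 89.6 | 89.6 | 98.7 | 96.1 | 96.1 | 99.5 | 94.9 | 94.7 |
| Gauss | 0.8 | 96.5 | 92.7 | 94.5 | 96.7 | 90.6 | 93.0 | 98.0 | 96.6 | 97.2 | 98.8 | 95.5 | 97.4 |
| Clayton | 0.3 | 98.5 | 91.6 | 89.6 | 98.1 | 91.9 | 86.4 | 99.5 | 96.0 | 95.2 | 99.1 | 95.8 | 93.8 |
| Clayton | 0.5 | 97.1 | 91.1 | 90.2 | 97.4 | 88.9 | 88.7 | 98.7 | 94.9 | 95.6 | 98.9 | 94.2 | 95.5 |
| Clayton | 0.8 | 97.2 | 93.7 | 92.9 | 97.2 | 93.6 | 92.7 | 98.2 | 96.5 | 96.7 | 98.4 | 97.2 | 97.2 |
| Gumbel | 0.3 | 98.5 | 89.4 | 88.8 | 98.3 | 89.9 | 87.8 | 99.7 | 93.7 | 95.3 | 99.7 | 95.1 | 94.6 |
| Gumbel | 0.5 | 98.1 | 91.6 | 91.7 | 98.2 | 89.7 | 90.3 | 99.4 | 96.0 | 96.6 | 99.1 | 94.3 | 95.4 |
| Gumbel | 0.8 | 96.8 | 93.1 | 94.9 | 96.5 | 89.9 | 94.1 | 98.7 | 97.1 | 97.9 | 98.4 | 95.2 | 97.2 |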
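Table 2. Simulation results for the extended Chen et al. [15] approach. The overall results are mixed. Column headers give the confidence level $1 - \alpha$ / sample size $n$ / quantile level $p$; entries are coverage probabilities in %.

| Copula | $\tau$ | 90% / 100 / 0.1 | 90% / 100 / 0.5 | 90% / 100 / 0.9 | 90% / 200 / 0.1 | 90% / 200 / 0.5 | 90% / 200 / 0.9 | 95% / 100 / 0.1 | 95% / 100 / 0.5 | 95% / 100 / 0.9 | 95% / 200 / 0.1 | 95% / 200 / 0.5 | 95% / 200 / 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gauss | -0.8 | 0.0 | 0.5 | 93.0 | 0.0 | 0.0 | 76.8 | 0.2 | 2.7 | 96.8 | 0.0 | 0.0 | 93.2 |
| Gauss | -0.5 | 96.1 | 87.0 | 93.9 | 89.2 | 69.0 | 93.0 | 98.1 | 94.8 | 96.9 | 95.4 | 81.9 | 97.0 |
| Gauss | 0 | 86.1 | 91.5 | 94.7 | 89.2 | 91.9 | 93.1 | 91.5 | 95.9 | 97.8 | 94.3 | 96.2 | 97.1 |
| Gauss | 0.5 | 79.5 | 83.1 | 90.0 | 78.5 | 80.2 | 89.6 | 87.9 | 90.0 | 95.9 | 86.7 | 86.6 | 95.6 |
| Gauss | 0.8 | 71.3 | 63.0 | 79.7 | 62.8 | 51.7 | 70.9 | 80.1 | 74.6 | 89.0 | 74.3 | 65.5 | 81.6 |
| Clayton | 0.3 | 77.2 | 87.8 | 93.4 | 75.5 | 84.4 | 93.3 | 84.4 | 93.0 | 97.4 | 85.6 | 91.7 | 96.7 |
| Clayton | 0.5 | 69.8 | 77.5 | 94.0 | 65.3 | 73.4 | 91.9 | 79.5 | 85.0 | 97.3 | 76.3 | 82.9 | 96.3 |
| Clayton | 0.8 | 58.1 | 51.7 | 87.6 | 43.8 | 37.0 | 87.8 | 71.0 | 65.4 | 94.0 | 57.7 | 50.5 | 92.9 |
| Gumbel | 0.3 | 84.7 | 89.3 | 91.4 | 88.4 | 88.1 | 90.2 | 91.0 | 94.5 | 96.7 | 93.5 | 93.8 | 95.2 |
| Gumbel | 0.5 | 83.8 | 82.8 | 88.3 | 82.8 | 80.6 | 86.4 | 90.1 | 90.2 | 94.7 | 90.5 | 89.4 | 92.9 |
| Gumbel | 0.8 | 74.1 | 67.6 | 71.5 | 67.5 | 53.8 | 66.3 | 84.0 | 77.4 | 82.2 | 77.9 | 67.3 | 78.3 |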
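The simulation settings in the tables are indexed by Kendall's $\tau$, whereas sampling from a copula requires its parameter. The following Python sketch converts between the two using the standard closed-form relations for the bivariate Gauss, Clayton, and Gumbel families. The function names are hypothetical, and it is an assumption that the authors used these parametrizations, although they agree with the values quoted in the figure captions ($\theta = 3$ and $\theta = 2.5$ for $\tau = 0.6$).

```python
import math


def gauss_rho_from_tau(tau):
    """Correlation parameter of a bivariate Gauss copula: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)


def clayton_theta_from_tau(tau):
    """Clayton parameter for 0 < tau < 1: theta = 2 * tau / (1 - tau)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch covers 0 < tau < 1 only")
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta_from_tau(tau):
    """Gumbel parameter for 0 <= tau < 1: theta = 1 / (1 - tau)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("the Gumbel family requires 0 <= tau < 1")
    return 1.0 / (1.0 - tau)


if __name__ == "__main__":
    # tau values as used in the tables (positive dependence only for
    # the Clayton and Gumbel families)
    for tau in (0.3, 0.5, 0.8):
        print(tau,
              gauss_rho_from_tau(tau),
              clayton_theta_from_tau(tau),
              gumbel_theta_from_tau(tau))
```

Negative values of $\tau$ are passed only to the Gauss conversion here, which matches the tables, where negative dependence appears for the Gauss copula alone.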

Share and Cite

MDPI and ACS Style

Coblenz, M.; Dyckerhoff, R.; Grothe, O. Confidence Regions for Multivariate Quantiles. Water 2018, 10, 996. https://doi.org/10.3390/w10080996
