Improving initial condition perturbations in a convection‐permitting ensemble prediction system

One of the main challenges presented by a limited‐area model ensemble prediction system (LAMEPS) concerns the limited capacity for its initial condition perturbations to correctly represent large‐scale flow uncertainties due to its limited‐size domain and deficiencies in formulating lateral boundary conditions. In addition, a mismatch between LAMEPS (initial condition) and host EPS lateral boundary perturbations can form spurious gravity waves at the boundaries. In the present work, an ensemble Jk blending method is proposed for improving representation of large‐scale uncertainties and for addressing consistent initial conditions and lateral boundary perturbations. Our approach involves employing Jk blending within a framework of three‐dimensional variation (3D‐Var) ensemble data assimilation (EDA). In such a system, small‐scale perturbations are generated from 3D‐Var EDA, while large‐scale perturbations are generated from the host ensemble via Jk blending.

systems (EPSs). At first, only uncertainties in initial conditions (ICs) were considered. The first two methods used to generate IC perturbations operationally were the breeding method (Toth and Kalnay, 1993;, developed by the National Centers for Environmental Prediction (NCEP), and singular vectors (Buizza and Palmer, 1995;Molteni et al., 1996), developed by the European Centre for Medium-range Weather Forecasts (ECMWF). These methods, although different, are based on the same main principle -perturbations should be added only in directions of phase space in which errors amplify the most. Later, additional methods were introduced (e.g. the Ensemble Transform (ET: Wei et al., 2008), Ensemble Transform Kalman Filter (ETKF: Wang and Bishop, 2003) and Ensemble Kalman Filter (EnKF: Evensen, 2003;Houtekamer and Mitchell, 2005). These methods are advantageous in that the perturbations that they produce are more consistent with the initial (analysis) probability distribution function (PDF) (Wei et al., 2008). All of the above methods are similar in that they only focus on IC perturbations. However, over the years it has been recognized that, to develop an optimal EPS, all sources of uncertainty must be perturbed (e.g. Houtekamer et al., 1996;Buizza et al., 2005;Vié et al., 2011;Wang et al., 2011;Bouttier et al., 2012;Nuissier et al., 2012;Romine et al., 2014), including IC, model and boundary condition errors.
In recent years, limited-area model ensemble prediction systems (LAMEPSs) have been developed to benefit from ensemble approach applied at the mesoscale and convective scale (Xue et al., 2007;Bowler et al., 2008;Clark et al., 2009;Vié et al., 2011;Wang et al., 2011;Peralta et al., 2012;Schellander-Gorgas et al., 2017;among others). Such ensembles are of particular importance due to a rapid loss of predictability observed at these scales (Hohenegger and Schär, 2007;Zhang et al., 2007;Judt, 2018). However, when using a limited-area model (LAM), one must address a new source of uncertainty -the specification of lateral boundary conditions (LBCs). This is typically done by nesting each LAMEPS member to a different host 1 EPS member. Saito et al. (2011) discusses the importance of LBC perturbation in LAMEPSs. However, applying this procedure can lead to the following issue: when the method used to generate perturbations in a host EPS is different and independent from the method used in an LAMEPS, we observe a conflict between such perturbations at the lateral boundaries. This problem has been recognized by many authors (e.g. Bowler and Mylne, 2009;Brousseau et al., 2011;Wang et al., 2011;2014;Caron, 2013;Kühnlein et al., 2014). Caron (2013) showed that correlations between the host EPS and LAMEPS tend to decrease with time and that the amplitude of LAMEPS perturbations can significantly differ from the amplitudes of the host model perturbations. As a consequence of this conflict, spurious waves are generated at the boundaries and quickly spread to the rest of the domain, resulting in excessive surface pressure spread in an LAMEPS. Kühnlein et al. (2014) demonstrated improved precipitation forecasts when IC perturbations were more consistent with LBC perturbations.
In addition to the above issue, as discussed in Guidard and Fischer (2008;GF08 hereafter) another challenge of limited-area modelling concerns the fact that LAMs are less effective at representing large-scale (e.g. synoptic) flow than global models due to their typically small domains, LBC formulation limitations, less effective LAM data assimilation (a fraction of the available observations are assimilated; see Gustafsson et al., 2018), etc. This problem can partially be addressed by applying a blending technique (Brožkovà et al., 2001;Yang, 2005;Wang et al., 2014;Hsiao et al., 2015). Different authors have developed various approaches to blending problem. Brožkovà et al. (2001) and Wang et al. (2014) proposed the use of a digital filter (DF) in the LAM and host model fields to obtain a blended analysis. This approach can be further extended to involve variational data assimilation where blending is performed before or after variational data assimilation. Caron (2013) and Wang et al. (2014) showed that blending techniques (by DF) can also be used to address conflicting perturbations. Caron (2013) used the so-called selective ETKF method, where ETKF acts only at small scales, while large scales are kept from the host model, thus resulting in blending. Wang et al. (2014) used the DF-blending method under the framework of their ALADIN-LAEF (Aire Limitée Adaptation Dynamique développement International-Limited-Area Ensemble Forecasting) system (Wang et al., 2011) to combine IC perturbations generated through the breeding method with the global ECMWF Ensemble Prediction System (ECMWF-EPS) IC perturbations. DF-blending was used to blend small-scale ALADIN-LAEF and large-scale ECMWF-EPS IC perturbations to simultaneously introduce them into perturbed ICs. Wang et al. (2014) also achieved better upper-air scores by applying the blending method rather than the breeding method without blending.
However, as GF08 discusses, the DF-blending approach presents unique problems. Most notably, it has no relation to data assimilation theory, and there have been concerns regarding its optimality in the sense of minimum variance estimate. Thus, they proposed a method which includes global model information directly into a limited-area variational analysis (section 2.1) and implemented it in the ALADIN/Action de Recherche Petite Echelle et Grand Echelle (ARPEGE) framework within ALADIN three-dimensional variational assimilation (3D-Var). We refer to this as the Jk blending method. Dahlgren and Gustafsson (2012) explored this method further within a HIgh-Resolution Limited-Area Model (HIRLAM)/ECMWF framework of the HIRLAM 3D-Var and 4D-Var. The method was found to improve on forecast scores in both the former and latter cases.
Large-scale information can also be included through the use of dual-resolution "hybrid" variational-ensemble data assimilation methods (e.g. Schwartz, 2016;Hu et al., 2017;Wu et al., 2017). Such methods incorporate ensemble-derived, flow-dependent background error covariances drawn from a coarser-resolution model into a high-resolution deterministic analysis. As opposed to the Jk method, which includes large-scale information as a new member in the 3D-Var (section 2.1), hybrid variational-ensemble data assimilation methods execute this through a background error covariance matrix. Schwartz (2016) demonstrated improved forecasts started from a hybrid analysis.
Issues of large-scale representation also reflect LAMEPSs in the sense that their IC perturbations, which are produced through techniques such as ensemble data assimilation (EDA), cannot correctly account for large-scale uncertainties. The simultaneous presence of small-and large-scale perturbations is beneficial for EPS (Wang et al., 2014;Johnson and Wang, 2016;Raynaud and Bouttier, 2016). To alleviate this problem, at Zentralanstalt für Meteorologie und Geodynamik (ZAMG), the LAMEPS ALADIN-LAEF has been developed within the framework of RC-LACE (Wang et al., 2018) and has been used in operations since 2009. It is based on the ALADIN model (ALADIN International Team, 1997;Termonia et al., 2018) and is currently run at 11 km horizontal grid spacing. However, with ever-increasing levels of computer power, it is now possible to run LAMEPSs at convection-permitting resolutions corresponding to a grid spacing of 1-4 km (e.g. Vié et al., 2011;Bouttier et al., 2012;Nuissier et al., 2012;Kühnlein et al., 2014;Romine et al., 2014;Schwartz et al., 2015;Johnson and Wang, 2016;Hagelin et al., 2017). For this reason, at ZAMG, a new convection-permitting LAMEPS named C-LAEF (Convection-permitting Limited-Area Ensemble Forecasting) based on the Applications of Research To Operations at MEsoscale model (AROME: Seity et al., 2011;Termonia et al., 2018) is being developed.
The goal of this article is to present and discuss an initial condition perturbation method that we plan to use in a first operational configuration of C-LAEF. We propose employing the Jk blending method within the C-LAEF system, for example, within the framework of its 3D-Var ensemble data assimilation system (Bouttier et al., 2012) to generate an ensemble of analyses. We refer to this as an ensemble Jk method. In such a system, small-scale perturbations are generated through 3D-Var EDA, while large-scale perturbations are generated from the host model via Jk blending. In turn, we hypothesize that final analyses are optimal, contain perturbed small and large scales which are, at the same time, consistent with perturbations coming from lateral boundaries. In this way, one can alleviate problems of large-scale forcing and inconsistencies between ICs and LBCs. This article is organized as follows. In section 2, we present formalisms of the Jk blending method and introduce the ensemble Jk method. Section 3 describes the C-LAEF configuration, experiments and verification procedures performed. Sections 4-6 present our results, and a discussion and conclusions follow in section 7.

Jk blending method
We now present basic formalisms of the Jk blending method. For a more detailed description, the reader is referred to GF08. 3D-Var is a variational method where, in order to obtain optimal analysis x a , a so-called cost function J that measures the distance of the LAM state vector x to observations y, and to the LAM background x b , is minimized 2 : where B and R are error covariance matrices of the background and observations, respectively, and where H is a nonlinear observation operator. In Equation 1, it is assumed that background and observation errors are uncorrelated. In addition, when we assume that host model errors are uncorrelated with observation and background errors, an extra term J k (x) measuring the distance from large-scale information in x to the host model large-scale information x k can be added to Equation 1: where V is the host model large-scale error covariance matrix, H 1 is the operator which fetched x from a nominal high-resolution to a low-resolution LAM space, and where H k is the operator which fetches x k from the host model low-resolution space to the same low-resolution LAM space as H 1 . This is done to include only large scales from the host model (e.g. large-scale constraint) as discussed in the introduction. H k consists of two steps: (a) the interpolation of host model fields to a high-resolution LAM geometry, and (b) its truncation to a lower resolution (see section 3.3), such that only large scales are affected during analysis. We applied the Jk blending method to the AROME 3D-Var system (Fischer et al., 2005), in which x is the state vector of five analysis variables (as is x k ): temperature, vorticity, divergence, specific humidity and the logarithm of surface pressure. A large-scale constraint can be applied to all of these or to only some. Our choices regarding this option are discussed in section 3.2. As stated in GF08, correlations between host model and background errors are different from zero as a result of lateral boundary coupling. However, when one accepts inaccuracies of roughly 10%, such correlations can be disregarded, and the total cost function can be represented as the sum of three parts as shown in Equation 2. In this work, it is assumed that this assumption also holds for our model configuration.

Ensemble Jk method
For the IC perturbation method of C-LAEF, we use the ensemble Jk method which combines the ensemble of AROME 3D-Var with the Jk blending method. As discussed in the introduction, one must obtain consistent IC and LBC, which is especially crucial if LAMEPS is integrated over a small domain. To combine large-scale perturbations drawn from the host EPS (the global EPS in this work), which provides the LBC for the LAMEPS, with the small-scale perturbations from LAMEPS, we propose an ensemble data assimilation blending technique. The approach is described as follows: 1 3D-Var EDA is used to generate perturbed analyses by: a Random observation perturbations designed to simulate observation errors. For each observation, its error is taken from an observation database and is multiplied by a normally distributed random number with a zero mean and a unit standard deviation. This value is then added to the original observation value to obtain a perturbed observation. b Background perturbations from short-range forecasts of the LAMEPS. For the time being, model errors are not considered. Perturbed-observation EDA perturbations are advantageous in that they are designed to estimate analysis uncertainty (e.g. Wei et al., 2008) which is a desirable property of initial condition perturbations. A theoretical justification of this approach is given in Žagar et al. (2005).
2 To generate correct large-scale information in perturbed analyses that are consistent with perturbed LBCs, we introduce a large-scale constraint by applying Jk blending to 3D-Var EDA. In other words, perturbed global analyses were incorporated as a new term of the 3D-Var cost function and were assimilated with observations described in the last two paragraphs. In brief, each EDA member assimilates the following three components: a Perturbed observations. b Perturbed background. c Perturbed global analysis.
Mathematically, the cost function of one ensemble member of the ensemble Jk method can be written as where subscript i denotes the ensemble member. Final perturbed analyses are then found by minimizing these cost functions as in the standard 3D-Var. The resulting blended analyses include small-scale perturbations (LAMEPS) and large-scale perturbations (global EPS) while being consistent with LBC perturbations. We believe that the ensemble Jk method can alleviate the problem of IC and LBC perturbation mismatch and improve the representation of large-scale uncertainties in ensemble systems.

C-LAEF configuration
The C-LAEF system is based on the convection-permitting AROME model as it was operationally configured at ZAMG. AROME is a high-resolution limited-area spectral non-hydrostatic model. It takes most of the ALADIN code on the adiabatic part while its physics package is mainly an adaptation of that used in Mesoscale Non-Hydrostatic research model MESO-NH (Lafore et al., 1998), and it is well adapted to a small grid spacing of roughly 1 or 2 km (Termonia et al., 2018). In addition, the AROME 3D-Var system is almost identical to that developed for the ALADIN system (Fischer et al., 2005). A more detailed description of AROME dynamics, physics and assimilation properties can be found in Seity et al. (2011), Vié et al. (2011 and Brousseau et al. (2016). In the present study, C-LAEF is configured by the AROME model with a horizontal grid spacing of 2.5 km and 90 vertical levels with 17 ensemble members (16 perturbed plus control). The integration domain used is shown in Figure 1. Due to high computational costs related to C-LAEF, the domain is relatively small at 1,080 km in the north/south direction and 1,500 km in the east/west direction. LBCs are provided by the ECMWF-EPS, and lateral boundary coupling is conducted using the Davies method (Davies, 1976). IC perturbations are described in the next subsection, and for LBC perturbations, 17 members of the ECMWF-EPS are used. Due to technical limitations, the same ECMWF-EPS members used for ALADIN-LAEF (the first 16 plus control) are also used for C-LAEF. As this work's emphasis is on IC uncertainties, model errors are not considered, and no model error representation scheme is used.
3D-Var is used for data assimilation. Conventional observations including Aircraft Meteorological DAta Relay (AMDAR), SYNOP, PILOT, TEMP, SHIP and EUROPRO-FILERS plus two satellite products, GEOWIND atmospheric motion vectors and 25 km Advanced Scatterometer (ASCAT) ocean winds, are assimilated. In this testing phase, 6 h continuous assimilation cycles applied at 0000, 0600, 1200 and 1800 UTC are performed.

Experiments
The main goal of this work is to evaluate the ensemble Jk method and to assess its added value to standard perturbed-observation EDA (e.g. Houtekamer et al., 1996;Bouttier et al., 2012), that is, EDA without the J k term. For this reason, two experiments are conducted: (a) a reference (REF) experiment in which C-LAEF is using perturbed-observation EDA to generate an ensemble of analyses, and (b) a JK experiment whereby C-LAEF uses the ensemble Jk method for the same purpose. With the exception of this difference, both experiments use the same configuration as that described in section 3.1. Assimilation is performed at a full model resolution for 17 different ensemble members. As the ensemble Jk method and perturbed-observation EDA act only on upper-air variables, to allow for a direct comparison, surface variables perturbation (e.g. land albedo, soil moisture, soil temperature, etc.) is not applied. An additional experiment (DOWN) is performed for the period studied in section 6. In DOWN, perturbed analyses were not generated by EDA or by the ensemble Jk but by the downscaling of ECMWF-EPS analyses. This makes them fully consistent with LBC perturbations and DOWN serves as a reference experiment in section 6. Comparison of AROME EDA against the downscale experiment in terms of traditional scores is presented in Raynaud and Bouttier (2016) and will not be repeated here. They concluded that EDA "significantly improves the AROME EPS performance for surface weather variables at early ranges, namely up to 12 h depending on the variable, as measured by the spread/skill relationship and the CRPS over a one-month period." In the JK experiment, global ECMWF-EPS analyses are used as a source of large-scale information in the ensemble Jk method -term x k as described in section 2.1. Horizontal resolution of the ECMWF-EPS in our experiments is spectral cubic T CO 640 with 91 vertical levels with LBCs updated every 3 h. The ECMWF-EPS uses singular vectors in combination with EDA for IC perturbation generation (Buizza et al., 2008) and resulting perturbations are completely independent from ours.
In the Jk blending method, we test several means of tuning the impact of large-scale constraint. First, we evaluate the selection of variables included in x k . Multiple tests are performed and model behaviour studied from the use of different variables. We ultimately use all analysis variables with the exception of surface pressure, as this is the best performing option (not shown). Second, a scaling factor (for each analysis variable) controls the relative weight of the J k term relative to J b and J o terms of the cost function, that is, it can be used to fine-tune the impact of all variables separately. This is a value that multiplies standard deviations within the V matrix. A similar scaling factor for background error standard deviations is standardly used in ALADIN/AROME variational assimilation. A probabilistic approach is used in this case. A scaling factor for each variable is randomly selected from between two reasonable empirically predetermined values, so that each member presents a slightly different combination. Each random number is determined from a seed, and our experiments can be reproduced exactly. Third, the inflation of ECMWF-EPS analyses perturbations (calculated by subtracting control analysis from each ensemble member) by a factor of 2 (0) at the lowest (highest) model level with exponential change between is performed before their use in ensemble Jk. After this procedure, newly calculated perturbations are again added to the ECMWF control to create inflated analyses. When this inflation procedure is not applied, the ensemble spread is reduced in JK relative to REF. Inflation restores the spread to similar values as in REF while not affecting other scores such as the RMSE.

V matrix and truncation selection
As illustrated by Equations 2 and 3, in addition to B and R covariance matrices, a V covariance matrix must also be provided. This is an important step because V controls the impact of large-scale constraint on the final analysed state. In calculating V, we used the same procedure normally applied to B, for example, the ensemble simulation method (ESM: Berre et al., 2006), meaning that V is climatological and does not contain any flow-dependent information. To obtain climatological host model large-scale error covariances using the ESM, forecast differences between 16 ECMWF-EPS members at analysis time need to be calculated. This calculation is done for 0000 and 1200 UTC runs over two-week periods in January, April, July and October of 1 year to include seasonal variability of error covariances, as this can have a significant impact on forecast quality (Storto and Randriamampianina, 2010). As in GF08, V is formulated in a univariate manner. The J k term needs to be truncated such that only large scales are kept from the host model. To do so, we adopted the same procedure as that applied in GF08. Figure 2 shows the horizontal error variance spectrum (Berre, 2000) of V and B matrices for two levels (roughly 500 and 920 hPa) and variables (divergence and specific humidity). It can be observed that at scales of above the total wave number k* (as defined in Berre, 2000) roughly 10, two spectra begin to diverge, and little energy is found in mesoscale spectra of the host model. We thus did not expect to derive any useful information from the host model at these scales, and Jk truncation was set at k* = 8 which corresponds to wavelengths of roughly 135 km.

Verification
JK and REF experiments described above are verified in 3 h intervals for surface variables, that is, 2 m temperature (T2M), 2 m relative humidity (RH2M), 10 m wind speed (W10M) and mean-sea-level pressure (MSLP) against observations drawn from 832 available surface stations within the verification domain (Figure 1). Model fields are interpolated via bilinear interpolation to observation locations. Verifying precipitation forecasts in convection-permitting models has its difficulties and traditional verification methods may not be appropriate (Ebert, 2008;Mittermaier, 2014). In this work, we use Fractions Skill Score (FSS: Roberts and Lean, 2008), which is a form of a neighbourhood approach (Gilleland et al., 2009), and it is computed following the ensemble formulation proposed by Duc et al. (2013). Integrated Nowcasting through Comprehensive Analyses (INCA: Haiden et al., 2011;Wang et al., 2017) analyses are used as a reference for precipitation verification. The INCA system is currently being developed at ZAMG, and it uses data drawn from automatic weather stations, remote-sensing data (radar, satellite), forecast fields of NWP models, and high-resolution topographic data (Haiden et al., 2011) to generate a high-resolution precipitation analysis field for a domain covering Austria and surroundings ( Figure 1). For upper-air variables, verification is done every 6 h against high-resolution deterministic ECMWF (0.1 • × 0.1 • ) and deterministic NCEP analyses (0.5 • × 0.5 • ). This approach is applied because the ensemble Jk method uses perturbed ECMWF analyses and, thus verification against deterministic ECWMF analyses can be biased toward the JK experiment. Verifying variables include temperature, relative humidity and wind speed at 500 and 850 hPa pressure levels (T500, T850, RH500, RH850, W500 and W850, respectively).
In general, probabilistic forecast verification is more difficult to preform than deterministic forecast verification because probabilistic forecast values are not always measured as 0 or 1. In addition, good ensemble prediction systems should generate superior ensemble mean forecasts over the control, high spread-skill relations and reliable probability forecasts (Du, 2007). Furthermore, Murphy (1993) described aspects or attributes that contribute to forecasting quality (e.g. accuracy, bias, reliability, resolution, discrimination, etc.). Given these sources of complexity, no single score can assess all attributes of probabilistic forecasts at once. For this reason, various scores are used to assess different aspects of forecasts and to draw conclusions about the EPSs under consideration. These include (a) the root-mean-square error (RMSE) of ensemble mean for assessing ensemble mean accuracy, (b) the continuous rank probability score (CRPS) for assessing overall EPS skill, (c) ensemble RMSE/spread relation and outlier statistics for assessing reliability, and (d) relative operational characteristics (ROC) for assessing discrimination. Detailed descriptions of these scores can be found in Talagrand et al. (1997) and Wilks (2006).
To determine if the difference in scores between the experiments is statistically significant, the moving-block bootstrap technique, following the procedure of Wilks (1997) and using 5,000 re-samples at a confidence level of 90%, was applied. Block length (four in our case) was defined as a closest integer to a cube root of the sample length (Hall et al., 1995). Observation errors are not considered, as this extends beyond the scope of our verification package. Nevertheless, as observation errors can have a significant effect on final scores (e.g. Bowler, 2008;Candille and Talagrand, 2008;Duc and Saito, 2018), we discuss this limitation in section 7.

RESULTS
In this section, we present the results of a two-month integration of REF and JK experiments to illustrate differences in average model behaviour. Experiments were performed for 62 days from 1 July until 31 August 2016 with 1200 UTC runs with a 24 h forecast range. These months involved several episodes of strong convection occurring over central Europe, rendering them appropriate periods for the verification of a convection-permitting EPS. Due to the limited domain used, a forecast range of 24 h proved more than sufficient for our IC perturbation impact study, as LBC perturbations dominate even before this time is reached. For upper-air variables, only verification results done against independent NCEP analyses are presented, but conclusions remain the same if verification is done against ECMWF analyses. However, verifying against high-resolution ECMWF analyses yields even better JK results with an increased percentage of statistically significant results relative to those presented in the following subsections (not shown).

RMSE of ensemble mean and spread
In a statistically consistent or reliable EPS, the RMSE of ensemble mean should match its spread (Fortin et al., 2014). This follows from the fact that in a perfect ensemble (i.e. a perfect sampling of true state uncertainty), truth should be statistically indistinguishable from any ensemble member ( Buizza et al., 2005). Thus, the RMSE/spread relation is a good measure of reliability. Figure 3 shows RMSE and spread for T500, T850, RH500, RH850, W500 and W850. Positive effects of the ensemble Jk method on the accuracy of the ensemble mean can be clearly identified because RMSE is reduced for almost all variables and at both levels in JK. However, differences observed are mostly statistically significant within early forecast ranges (up to 12 h). Spread remains similar in JK and REF at 850 hPa but is reduced at 500 hPa in JK, which is a favourable result because JK would be over-dispersive otherwise. As REF is under-dispersive, reducing RMSE more than spread enhances the reliability of JK.
RMSE and spread for surface variables are shown in Figure 4, in which results are more neutral. There is a slight improvement in RMSE of JK for T2M, RH2M and W10M, mostly over the first 3-9 h and for MSLP from 9 to 18 h. However, none of the differences observed are significant. Spread remains almost the same for T2M and RH2M, while it is slightly higher in the case of MSLP and lower for W10M in JK.
Overall, both REF and JK are under-dispersive for both upper-air variables and especially for surface variables. These results are consistent with those of numerous other studies (Wang et al., 2011;Bouttier et al., 2015;Harnisch and Keil, 2015;Johnson and Wang, 2016;among others). This is likely the case due to an absence of model and surface perturbations and because observation errors are not accounted for.

Continuous ranked probability score (CRPS)
The CRPS is related to the rank probability score (RPS) but for an infinite number of classes; thus, it compares the full predicted distribution with observations. The CRPS is a good measure of overall EPS skill. It is dependent on a predicted distribution's first (ensemble mean) and second moment (spread) and is negatively oriented, meaning that lower values are better. The CRPSs for upper-air and surface variables are shown in Figures 5 and 6, respectively. Better CRPS is obtained for all upper-air variables for all levels for JK. Significance test results are similar to RMSE results -significant within early forecast ranges for upper-air (up to 12 h) and up to 18 h for RH850. For the surface, significant improvements in CRPS are obtained at 9 and 12 h for MSLP in JK. Insignificant improvements observed in the first forecast hours for T2M in JK are also visible here. Thus, the CRPS supports

Outlier statistics
Outliers are defined as the number of cases in which verifying observations lie outside of the whole ensemble and are typically expressed as percentages. The perfect outlier percentage is calculated as 2/(N + 1) * 100% where N is the ensemble size (thus, a perfect score is 11% in our experiments). The percentage of outliers serves as a good measure of reliability. Figure 7 shows the percentage of outliers for upper-air variables. The number of outliers is greatly reduced in JK for T500 and T850, with weaker effect found for RH500 and RH850 and with almost neutral effect found for W850 and W500. This means that reliability is mostly improved for temperature and relative humidity. Significance test results support these conclusions.
Regarding the surface variables and results given in Figure 8, the greatest reduction in the number of outliers in JK is achieved for MSLP, which is also significant at 9 and 12 h forecast range, resulting in a more reliable ensemble. Slight and insignificant reductions are also observed for T2M.

Relative operating characteristics (ROC)
The ROC curve plots the hit rate against false alarm rate for a set of predefined probability thresholds (in our case, the difference between two consecutive thresholds is 1/N). It compares two conditional distributions where the hit rate corresponds to occurrences and the false alarm to non-occurrences of an event measuring the attribute of discrimination.
The ROC curve of upper-air variables for the 6 h forecast range is shown in Figure 9. The figure shows a slight improvement in the discrimination of humidity and wind for all levels in JK, showing that JK can better distinguish between the occurrences and non-occurrences of an event. For the surface variables, the situation is more neutral with no clear improvement for JK (not shown).
More neutral results for surface variables and for all scores presented in this section support the conclusions of Dahlgren and Gustafsson (2012) and are as expected for a number of reasons. First, lower boundary conditions have a considerable impact on surface variables. Second, surface variables are sensitive to small-scale information while the ensemble Jk method acts only on upper-air variables and at the largest scales. At 6 h forecast range, precipitation forecast is improved for all thresholds (scales above 195 km) and for thresholds up to 5 mm (scales above 45 km). For 12 and 18 h forecast ranges, neutral results are observed, while for 24 h forecast range, precipitation forecast is improved for all scales (threshold of 1 mm) and for higher thresholds (scales above 195 km).

Precipitation
Results presented here indicate that, at early forecast ranges, mostly larger scales are affected (this is in agreement with the fact that only large scales are affected in the Jk method), while at later forecast ranges, nonlinear model integration spreads the influence on all scales.

CASE-STUDY
We now illustrate benefits of including large-scale perturbations from the host model through a case-study. Figure 11 shows a synoptic situation observed on 11 July 2016 at 0000 and 1200 UTC. As is shown, a cold front related to a cyclone over the North Sea is moving over the C-LAEF domain. At the same time, on the ground strong advection involving warm and moist air from the Mediterranean area is observed. This synoptic set-up has a great potential to trigger convection in areas preceding the front. We expect to observe benefits of including large-scale perturbations in cases such as the one described here (i.e. strong synoptic forcing located near the borders of the computational domain). Any information outside of the model domain is propagated inside through LBCs. However, due to well-known deficiencies related to lateral boundary formulation in LAMs (e.g. Warner et al., 1997;Nutter et al., 2004;Termonia et al., 2009) and due to LAM orientation toward the mesoscale, this information can be deficient. We believe that introducing large-scale information in an additional way -through Jk -can alleviate such problems. Figure 12 (left column) illustrates ensemble mean MSLP for REF and JK experiments started at 0000 UTC on 11 July 2016 from 3 and 9 h forecasts. The overall distribution in pressure observed is similar across the whole domain. The biggest difference, considering all forecast hours, is visible over the upper half of the domain where lower pressure, related to an incoming frontal system, propagates further to the south in JK (note how 1,011 and 1,014 hPa contours are displaced more to the south in the left column of Figure 12).
The right column of Figure 12 shows the difference between REF and JK ensemble mean absolute error (forecast -observation) plotted on surface station locations. As the front enters and moves through the domain, an interesting phenomenon is observed: improvements in MSLP forecast for JK follow the progression of the frontal system. Smaller errors of up to 1.5 hPa are observed in the northern part of the domain from which the cold front progresses towards inland Europe. Pressure distribution is clearly better represented in JK. Although surface pressure large-scale information was not used in the Jk method directly, improvements in surface pressure forecast come from the other variables as surface pressure reflects the entire column.
In addition, Figure 13 (Supporting Information Figure  S1) shows the probability of 3 h precipitation above 1 mm (10 mm) calculated from 3-6 to 6-9 h forecasts for REF and JK experiments started at 0000 UTC on 11 July 2016. Given that no precipitation is observed (Supporting Information Figure S2) for these hours, our REF experiment generates excessively high probability values (up to 60%) for precipitation over southern Germany and western Austria. For the JK, the area affected and probabilities are reduced with the latter remaining at below 20% for 3-6 h and at 30% for 6-9 h forecasts, thus minimizing the possibility of a false alarm. Later during that day, convection developed as the front moved through the domain. Figure 14 (Supporting Information Figure S3) shows the probability of 3 h precipitation above 1 mm (10 mm) calculated from 9 to 12 h forecasts for REF and JK experiments started at 1200 UTC on 11 July 2016 and INCA analysis. The JK experiment better captured overall precipitation distribution, especially over northern Austria and southern Czech Republic and a dry area over southeastern Germany which are both missed in REF. Intense precipitation in the north of Italy was better forecasted in JK with above 90% probability for rain above 10 mm compared to around 40% in REF (Supporting Information Figure S3). However, both experiments missed the intense precipitation over central Austria. The area over southern Switzerland is not considered in the analysis because, in INCA, observation data are scarce there and the quality of the radar data is suboptimal. Overall, the results presented here demonstrate the benefits of applying large-scale constraints on C-LAEF perturbations.

PERTURBATION MISMATCHING
We now turn to the second main focus of this work -mismatched perturbations. An additional experiment (DOWN) was performed whereby IC perturbations are fully consistent with LBC perturbations. For this reason, the DOWN experiment serves as a new reference experiment. In addition, all experiments are performed using 1 h LBC update frequency. As stated by Warner et al. (1997), when solutions of the LAM and host model for areas close to lateral boundaries are different, such differences may generate spurious gradients and feedback between the two grids, which can influence the interior of the LAM domain. Given that EDA IC perturbations observed in the REF experiment are completely independent of LBC perturbations originating from ECMWF-EPS, such differences can arise.
One such event was observed for the REF run started on 17 July 2016 at 0000 UTC. Figure 15 shows the MSLP spread for the first hour of model integration. After only 5 min (Figure 15b), a large spread (with maximum value of roughly 3.5 hPa) can be observed close to the northeastern and along the southern domain boundary. As integration continues (Figure 15c,d), the anomaly in spread advances further into the domain at a velocity magnitude close to the speed of sound. After 1 h (Figure 15d), excessive spread (relative to the DOWN result shown in Figure 16a) has not left the domain and approaches the western and eastern lateral boundaries. Given that anomalies form at the boundaries and not in DOWN or the host model (not shown), it is clear that they are unrealistic and likely formed as a result of unrealistic waves generated by an IC and LBC perturbation mismatch. The fact that excessive spread is still present after 1 h of integration can create serious issues for 1 h assimilation cycles (e.g. Caron, 2013).
A potential solution to this problem could come from so-called space consistent coupling, whereby LBCs at initialization time do not come from the host model but from the LAM analysis (Brousseau et al., 2016). In turn, LBC perturbations are not imposed with full weight from the start and are instead slowly introduced over the first forecast hour to reduce the risk of perturbation conflict. Figure 16b shows the results of such integration observed after 1 h. Excessive spread is still observed close to the northwestern and eastern lateral boundaries and does not seem to be reduced at all (compared to Figure 15d), leading us to conclude that space consistent coupling is unable to solve this issue for such an event.
However, by using 3 h LBC update frequency, space consistent coupling can solve this problem (not shown) because in this case, LBC perturbations are introduced even more slowly. Nevertheless, given the benefits of higher LBC update frequency (e.g. Nutter et al., 2004;Termonia et al., 2009), the availability of global EPS 1 h files (e.g. ECMWF) and continuing efforts made to improve LAMEPSs in general, we assume that results derived from a 1 h LBC update frequency are more significant.
An alternate approach involves using the ensemble Jk method, for which LAM IC perturbations are more consistent with LBC perturbations by design. The results of integration using the ensemble Jk method for the same period are shown in Figure 16c. Excessive spread is largely reduced after 1 h of integration, and overall spread distribution resembles the DOWN pattern most consistently (although the spread is still slightly larger than in DOWN). Increased spread to the east of the domain (also found in later forecast ranges -not shown) is also found in DOWN and can be explained by the presence of a cyclone over this part of the domain. From these results, we conclude that the ensemble Jk method can indeed be used to alleviate the problem of mismatching perturbations and that constraining only the largest scales (above 135 km as discussed in section 3.3) can achieve this goal.

SUMMARY AND CONCLUDING REMARKS
Probabilistic approaches to weather prediction are widely recognized as superior to deterministic and ensemble prediction systems and are becoming more widely favoured forecast tools. This is especially true for LAMEPSs, which have been used more frequently over the last 10 years. Thus, motivations to resolve problems related to LAMEPSs are strong. Due to the typically small computational domains and deficiencies of LBC formulations, LAMEPSs are unable to correctly account for uncertainties of large-scale flow. Furthermore, mismatches between regional EPS (ICs) and host EPS lateral boundary perturbations can result when they are obtained independently.
In this article, we introduce an ensemble Jk method as a new idea for improving LAMEPS forecasts, whereby host model large-scale perturbations are introduced into LAMEPS IC perturbations using the Jk blending method. Jk blending is applied to the 3D-Var EDA system so that perturbed analyses are optimal as opposed to using the blending method developed by Wang et al. (2014). As a result, analyses contain perturbed small scales (LAM EDA) and large scales (host EPS). Furthermore, when including host model perturbations in LAM initial perturbations, final perturbations are more consistent with perturbations coming from lateral boundaries, thus minimizing the risk of potential mismatch.
The ensemble Jk method was implemented into convectionpermitting ensemble system C-LAEF with ECMWF-EPS providing lateral boundary conditions and large-scale perturbations. To assess the effect of large-scale constraint on IC perturbations, two experiments were conducted,   one using an ensemble Jk and the other using standard perturbed-observation EDA, that is, EDA without Jk blending. Impacts on average model performance were measured over 62 convectively active days of the summer of 2016.
Results show that using the ensemble Jk method provides a more accurate and reliable EPS. This is mostly the case due to a reduction of RMSE and thus due to increased accuracy. RMSE of ensemble mean is strongly reduced for upper-air variables, while spread is maintained on similar or slightly lower values. The percentage of outliers supports these conclusions as does the CRPS, which is reduced for all variables. The ROC score also benefits ensemble Jk results, meaning that discrimination is improved.
Performance in terms of surface variables is more neutral, as was expected due to its high sensitivity to small scales and the forcing effects of lower boundary conditions. Nevertheless, slight but insignificant improvements resulting from using ensemble Jk can be observed for early forecast ranges (up to 9 h) for temperature and humidity, and for surface pressure, improvements are visible up to 18 h. The strongest effect of ensemble Jk is observed for surface pressure forecasts, which are more reliable and accurate. Discrimination remains similar for all variables. Precipitation forecasts were verified by a different methodology and by using the FSS. Improvements for thresholds up to 20 mm and for scales ranging from the highest (325 km) down to the smallest (5 km) are observed when using the ensemble Jk method.
To further illustrate benefits of considering large-scale perturbations from the host model, a case-study of strong synoptic forcing observed near the borders of the computational domain is presented. It is demonstrated that using the ensemble Jk method improves surface pressure distribution in areas positioned close to an incoming front, in turn minimizing the possibility of the false detection of precipitation and improving the precipitation forecast during a period of convective activity over Austria.
Finally, a case of strong perturbation mismatch was presented. Strong anomalies in surface pressure spread were observed close to the northeastern and southern domain boundaries after only 5 min of model integration. After 1 h, excessive spread was still present close to the edges of the domain. Introducing LBC perturbations gradually over the first hour of integration, for example, space consistent coupling, did not remedy this problem. However, when ensemble Jk method was used and when large-scale IC perturbations were made more consistent with LBC perturbations, even only for the largest scales, excessive spread was mitigated after 1 h. From these findings, it is clear that future implementations of C-LAEF with hourly assimilation cycles must address the problem of perturbation mismatch, which was the leading cause of excessive spread observed here. The ensemble Jk method can be used to alleviate such problems.
Overall, using the ensemble Jk method benefits the ensemble performance of upper-air variables, precipitation and mean-sea-level pressure, while effects on the other surface variables are less significant and mostly reserved for special cases, such as those discussed in section 5. Observed impacts are strongest within the early forecast range, which is to be expected from an IC perturbation study (e.g. Vié et al., 2011). Furthermore, both of our experiments are under-dispersive, particularly for lower model levels. This is the case due to an absence of model and surface perturbations and to a lack of consideration of observation errors. In addition, due to the relatively large difference in grid spacing between C-LAEF and ECMWF-EPS, LBC "sweeping" effect (Nutter et al., 2004) is enhanced and plays a significant role in later forecast ranges, placing an additional constraint on ensemble spread growth. As noted above, we did not conduct observation error simulations of any kind. Though impacts of observation error can change the outcome of forecast verification, we believe that this was not the case in our study. First, we were only interested in relative performance, that is, differences observed between experiments, and including the observation error would have had an effect of the same direction for both experiments. Second, RMSE/spread relation and outlier statistics are scores for which the absolute performance of each system is also important. As shown by other works (e.g. Bouttier et al., 2012;Romine et al., 2014), accounting for observation error would have increased ensemble spread given in Figures 3 and 4. This could mean that an EPS using the ensemble Jk method could be declared over-dispersive in such a case. However, a slight modification of the ensemble Jk method can be done to account for such over-dispersion. The inflation factor discussed in section 3.2 can be reduced to decrease the spread while keeping RMSE unchanged (not shown).