A stochastic weather model for generating daily precipitation series at ungauged locations in the Catskill Mountain region of New York state

Information on the variability of precipitation in time and space is critical for many water resource projects. However, precipitation records at the location of interest are often either limited or unavailable due to an inadequate network of rainfall measurements. To address this need, regionalization methods have been employed to characterize spatial patterns of precipitation and to transfer precipitation information from one location to another where records are scarce. The overall objective of the present paper is to propose a stochastic weather model for generating daily precipitation at ungauged locations. The proposed approach consists of two components: (a) a regionalization approach for identifying homogeneous groups of observed daily precipitation series, and (b) a stochastic model for constructing daily precipitation events at ungauged locations within homogeneous groups. This statistical approach identifies groups of precipitation stations with similar statistical characteristics based on the combination of two multivariate statistical techniques: principal component analysis (PCA) and ordinal factor analysis (OFA). While the application of PCA in climatological regionalization studies based on precipitation amount is common, the application of OFA to include precipitation occurrence in the identification of regions is unusual. The feasibility of the approach is assessed using daily precipitation data from a network of precipitation stations in the Catskill Mountain region of New York State, United States.


| INTRODUCTION
The management and allocation of water resources are among the most significant endeavours in human society because water plays a vital role in all natural and environmental systems. In addition, according to the Intergovernmental Panel on Climate Change (IPCC) Synthesis Report (IPCC, 2014), climate change has had significant impacts on precipitation processes in many regions of the world, including increasing trends in precipitation amounts in North America, northern Europe, and northern and central Asia. The northeastern United States, where our study region is located, has experienced a larger increase in extreme precipitation than any other region of the United States (Rosenzweig, 2011; Melillo et al., 2014). It is thus essential to consider the potential impacts of climate change on water resources. Information on the temporal and spatial variation of precipitation and the resulting runoff is essential for such climate change impact assessments. However, many watersheds lack a precipitation gauge network dense enough to provide a sufficient record of observed data.
Because the Catskill Mountains provide most of the water supply for New York City (NYC) and other regional municipalities, changes to the hydrological cycle can affect, in addition to several hundred thousand local residents, over 9 million residents of New York State (DEP, 2017). The geographical and meteorological features of the region result in complex seasonal and spatial patterns of precipitation (see section 3). In the context of regional impact studies, information on the spatial distribution of precipitation at a sufficiently high resolution is essential for accurate calculation of local values of hydrologic variables such as streamflow (Baeriswyl and Rebetez, 1997). Due to the high spatial variability of precipitation and the low density of measurement stations in mountainous areas, the estimation and prediction of hydrologic variables under climate change conditions remains a challenge (Sivapalan, 2003; Yeo, 2014).
To address this need, regionalization methods have been employed to understand the spatial behaviour of precipitation in order to effectively transfer precipitation information from a location with sufficient observations to another where available records are scarce (Baeriswyl and Rebetez, 1997; Nguyen et al., 2002). More specifically, these methods have been developed and employed with two main objectives: characterizing spatial dependency (homogeneity) and reducing uncertainty in the modelling of precipitation at different locations. Spatial dependency implies that the precipitation records from two or more stations in a delineated homogeneous group have similar statistical characteristics. González and Valdés (2008) stress that regionalization is a powerful tool for improving the accuracy of precipitation estimation at a location of interest. Consequently, evaluating the similarity of observed precipitation series at different locations is a first step in reducing uncertainty, leading to reliable and accurate estimation of precipitation over an area or at multiple sites (González and Valdés, 2008).
Because it may be problematic in hydroclimatological studies to delineate regions over data-sparse areas using only in-situ precipitation records, some studies include large-scale circulation patterns to help construct meaningful homogeneous regions (Satyanarayana and Srinivas, 2008; Satyanarayana and Srinivas, 2011; Asong et al., 2015). To estimate the effects of large-scale patterns on local weather conditions, it is essential to use long-term rather than short-term weather information in order to avoid noise resulting from local weather variability. These studies use monthly time series of historical observations and atmospheric reanalysis data from the National Centers for Environmental Prediction (NCEP) to describe the effect of large-scale atmospheric circulation on local weather conditions. However, at relatively small spatial scales, such as that of our study area, local factors such as elevation, wind direction, and aspect play the most significant roles in forming spatial patterns of weather conditions (Burns, 1953; Daly et al., 1994; Daly et al., 2008). Therefore, in this study we evaluate a regionalization method that uses statistics derived from in-situ daily precipitation measurements but no large-scale circulation features.
Many regionalization methods determine the statistical similarity of precipitation at different locations based on the correlation of precipitation amount. A limitation of these conventional approaches is that the resulting regions do not include information on spatial variation in precipitation occurrence (wet/dry day). In this study we propose a regionalization technique based on the similarities of both precipitation amount and occurrence at different locations using the combination of principal component analysis (PCA) and ordinal factor analysis (OFA). These statistical techniques identify groups of stations with similar precipitation statistics based exclusively on the station records: the spatial location of stations is not included in the analysis. They are nevertheless referred to as "regionalization" techniques because, owing to the physics of precipitation processes, stations with similar statistical properties often naturally fall into spatially contiguous regions.
In addition to regionalization, stochastic weather generators (SWGs) have been used to generate synthetic hydrometeorological information for agricultural, hydrological, and climate change impact studies (Wilks and Wilby, 1999; Semenov and Barrow, 2002). Schuurmans and Bierkens (2006) identify the spatial variability of daily precipitation as a major influence on discharge, groundwater level and soil moisture. A number of multi-site (or distributed) SWGs have also been proposed (Wilks, 1998; Wilby et al., 2003; Khalili et al., 2009; Asong et al., 2016; Peleg et al., 2017; Evin et al., 2018). The three main components of multi-site SWGs are the representation of precipitation amount, of precipitation occurrence, and of spatial variability. In models of precipitation amount, a major source of uncertainty is the selection of an appropriate distribution to represent the statistics of the observed variable, because the data pool is a sample rather than the population. Once a suitable distribution is chosen, the choice of parameterization method adds further uncertainty. For precipitation occurrence, Markov chain models of different orders have been used to represent the random wet/dry process. Although these SWGs provide synthetic information at daily time scales and their statistics match the observed statistics (Wilks and Wilby, 1999), the simulated daily values do not correspond day by day to the observed data. Our study therefore proposes a stochastic model that transfers daily precipitation information from one station to another based on the actual observed daily data rather than on the long-term observed data pool. Moreover, applying precipitation signals from outside a homogeneous regime to an SWG is likely to add noise or uncertainty (Schuurmans and Bierkens, 2006). This implies that delineating statistically homogeneous weather regions is an essential step prior to weather generation if spatial variability and local precipitation characteristics are to be represented accurately.
Hence, the weather modelling in this study is conducted using the rain gauges within the identified homogeneous regions.
The remainder of this paper is organized as follows: section 2 describes the regionalization approach using OFA. Section 3 describes our study region and data and the proposed stochastic model for estimating daily precipitation series at ungauged locations. In section 4, we present our calibration and validation analyses. Finally, our conclusions are presented in section 5.

| REGIONALIZATION USING OFA
Regionalization is more complicated for daily precipitation than for most other meteorological variables because daily precipitation has two components: a discrete component (occurrence) and a highly skewed continuous component (amount). While there is no agreement about which of these properties is more important for the regionalization of daily precipitation, most existing regionalization methods (see Figures S1 and S2 and Tables S1 and S2) consider only the correlation of precipitation amounts. Precipitation occurrence may, however, be important in regions such as the Catskill Mountains, where orography modulates precipitation patterns.
To explicitly include precipitation occurrence in a regionalization analysis, Yeo (2014) took the novel approach of applying a combination of PCA and OFA to precipitation observations from the southern part of the Korean Peninsula, where two complex mountain systems play an important role in modulating precipitation occurrence and amount. His study indicated that including precipitation occurrence through OFA adds spatial information that is unavailable from PCA alone and improves the simulation of daily precipitation series at ungauged locations. As an extension of the unpublished study of Yeo (2014), here we apply a modified version of his approach to historical precipitation observations from the Catskill Mountains region and evaluate whether the inclusion of occurrence information through OFA improves simulation results. Table 1 summarizes the statistical techniques that have been used to identify factors for categorical and interval data (Bartholomew et al., 2011).
OFA, a type of latent trait model (see Table 1), has been proposed for analysing categorical observations (Moustaki, 2000; Moustaki and Knott, 2000; Jöreskog and Moustaki, 2001; Bartholomew et al., 2011). OFA employs a tetrachoric correlation matrix (described in Figures S1 and S2 and Tables S1 and S2), which is defined for categorical observations (Moustaki, 2000; Jöreskog and Moustaki, 2001), rather than the covariance or Pearson correlation matrices more commonly used in PCA.

| Study area
The Catskill Mountains, located in southeastern New York State between latitudes 41.78° and 42.35° N and longitudes 75.12° and 74.13° W (Figure 1), are home to six reservoirs and associated watersheds that historically provide approximately 90% of the drinking water for NYC. This regionalization study encompasses these watersheds and the surrounding region of about 31,000 km² in New York and Pennsylvania. As part of the Appalachian Mountains, the region has numerous peaks over 900 m above sea level. The complex topography includes ridges oriented from southwest to northeast, as well as a southeast-to-northwest oriented escarpment defining the northeastern boundary of the region. Mean seasonal precipitation (Figure 2) is greater along the southeastern escarpment, where coastal storms laden with tropical moisture provide much of the precipitation, while mid-latitude storms with various tracks are also important in this region (Towey et al., 2018).

| Data used for calibration and validation of the regionalization technique
Historical precipitation data from 106 rain gauge stations (Table 2) within this region were obtained from the Northeast Regional Climate Center (NRCC) and the National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA). As shown in Figure 3, the number of stations measuring daily precipitation over the study area has decreased significantly since 1960. This decrease is particularly evident within the water supply region (bold boundary in Figure 1), where the number of stations dropped from 80 in the 1950s to 38 in the 1980s. Given these changes in data availability, the period 1949-1959 was chosen to calibrate the regionalization analysis so that it includes as much spatial detail and as many stations as possible. Daily precipitation time series from 80 stations that have no period of missing data longer than 2 months are used for calibration. To validate the regionalization results, the same analysis is performed for the interval 1981-1991, when fewer stations (38) are available. A comparison of the regionalization analyses conducted for the calibration (1950s) and validation (1980s) periods is used to indicate whether the delineated climatic regions are time-dependent. The validation methodologies for the stochastic weather model in combination with the regionalization results are described in sections 3.5 and 3.6.
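The station screening rule (no period of missing data longer than 2 months) can be expressed as a filter on the longest run of missing values. The sketch below assumes each station's record is a list of daily values with `None` marking missing days, and reads "2 months" as 61 days; both the data layout and the 61-day figure are assumptions, and the function names are hypothetical.

```python
def longest_missing_run(series):
    """Length of the longest run of consecutive missing (None) daily values."""
    longest = current = 0
    for value in series:
        if value is None:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # an observed value ends the current gap
    return longest


def select_stations(records, max_gap_days=61):
    """IDs of stations whose longest data gap does not exceed max_gap_days.

    records: dict mapping a station ID to its daily series (None = missing).
    max_gap_days=61 is an assumed reading of "2 months".
    """
    return sorted(
        station for station, series in records.items()
        if longest_missing_run(series) <= max_gap_days
    )
```

Counting consecutive missing days, rather than the total fraction missing, matches the wording of the criterion: a station with many short gaps is kept, while one long outage disqualifies it.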

| Regionalization
In the implementation described here, PCA and OFA are applied sequentially to identify homogeneous groups of daily precipitation stations. PCA is performed first to identify "groups" of stations with correlated precipitation amounts. The number of principal components (PCs) retained for further analysis is based on Kaiser's rule (Kaiser, 1958), which retains only PCs with eigenvalues greater than 1. OFA is then performed on each PCA group independently, resulting in a set of OFA factors and loadings. The factors are then rotated using PROMAX (Hendrickson and White, 1964), an oblique rotation that seeks a simple loading structure without requiring the rotated factors to remain orthogonal, and which provides more robust results than orthogonal rotation methods (White et al., 1991). As a result of the combined PCA and OFA analyses, each station is placed in one group with other stations that have similar statistical properties of precipitation occurrence and amount.

| A stochastic precipitation model for Ungauged locations
Data from stations within each region are used to estimate precipitation at other stations, or at ungauged locations, within the same region, with a stochastic model providing probabilistic information about the estimates. The stochastic model of the precipitation process combines two components: a model of precipitation occurrence and a model of precipitation amount.

| Precipitation occurrence model
The modelling of precipitation occurrence is based on the homogeneous grouping given by the application of the PCA/OFA regionalization method. Let $F_j$ be the factor score for a given day $j$, defined by

$$F_j = \frac{1}{S}\sum_{i=1}^{S} e_i\,O_{i,j} \tag{1}$$

where $e_i$ is the factor loading associated with gauged station $i$ within an identified homogeneous region of $s$ stations, $O_{i,j}$ is the precipitation occurrence (1 for wet, 0 for dry) at station $i$ on day $j$, and $S$ is the number of stations used in the calculation procedure. A minimum threshold of 1 mm/day is used to define the series $O_{i,j}$. Within a homogeneous region identified by the OFA regionalization method, the factor score $F_j$ represents the ratio of the number of stations in the region where precipitation occurs on day $j$ to the total number of stations in the region: in other words, the probability of precipitation occurrence. A random number $r^{o}_{j}$ drawn from the uniform distribution ($0 \le r^{o}_{j} \le 1$) is then compared with $F_j$ to determine precipitation occurrence on each day $j$, so that day $j$ is wet with probability $F_j$:

$$\text{day } j \text{ is } \begin{cases} \text{wet}, & r^{o}_{j} < F_j \\ \text{dry}, & r^{o}_{j} \ge F_j \end{cases}$$
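The occurrence model can be sketched as follows, assuming the factor score is the loading-weighted fraction of wet stations, $F_j = (1/S)\sum_i e_i O_{i,j}$, and that a day is wet when a uniform draw falls below $F_j$ (so that the probability of a wet day equals $F_j$). The 1 mm/day threshold comes from the text; the function names and list-based data layout are hypothetical.

```python
import random

WET_THRESHOLD_MM = 1.0  # daily amount defining a wet day, from the text


def occurrence(amount_mm):
    """O_ij: 1 if the day is wet (amount >= 1 mm), else 0."""
    return 1 if amount_mm >= WET_THRESHOLD_MM else 0


def factor_scores(loadings, occ):
    """Assumed form of the factor score: F_j = (1/S) * sum_i e_i * O_ij.

    loadings: S factor loadings e_i; occ: S equal-length 0/1 series O_i.
    """
    S = len(loadings)
    n_days = len(occ[0])
    return [sum(e * o[j] for e, o in zip(loadings, occ)) / S for j in range(n_days)]


def simulate_occurrence(scores, rng):
    """Day j is wet when a uniform draw r_j in [0, 1) is below F_j, so P(wet) = F_j."""
    return [1 if rng.random() < f else 0 for f in scores]
```

With loadings close to 1, as expected within a homogeneous region, `factor_scores` reduces to the fraction of stations that are wet on each day; passing a seeded `random.Random` makes each realization reproducible.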

| Precipitation amount model
Regarding the modelling of precipitation amount, a log-transformation is applied to the highly skewed precipitation distribution so that it more closely approximates a normal distribution (Hay et al., 1991):

$$Y_{i,j} = \begin{cases} \ln P_{i,j}, & P_{i,j} \ge 1\ \text{mm} \\ 0, & P_{i,j} < 1\ \text{mm} \end{cases} \tag{2}$$

where $Y_{i,j}$ represents the log-transformed precipitation amount and $P_{i,j}$ denotes the precipitation amount at station $i$ on day $j$. When a day is determined to be dry by comparison with the threshold (1 mm/day), the log-transformed amount for that day is set to zero in order to avoid negative values and undefined values (for $P_{i,j} = 0$) in the transformed variable. Under the log-transformation, the mean $\bar{P}$ of $P_{i,j}$ can be written in terms of the mean ($\mu_Y$) and variance ($\sigma^2_Y$) of $Y_{i,j}$ as follows:

$$\bar{P} = \exp\left(\mu_Y + \frac{\sigma^2_Y}{2}\right) \tag{3}$$

Thus, the regional expected value of the daily precipitation amount can be estimated by

$$\bar{P}^{R}_{j} = W_j \exp\left(\mu^{RY}_{j}\right) \tag{4}$$

in which $\bar{P}^{R}_{j}$ is the regional mean of the daily precipitation for day $j$, $\mu^{RY}_{j}$ is the regional mean of the log-transformed daily precipitation for day $j$, and $W_j$ is the correction factor for matching the expected values of the original and log-transformed daily precipitation (under the lognormal relation of Equation (3), $W_j$ plays the role of $\exp(\sigma^2_Y/2)$).
After the daily regional mean precipitation ($\bar{P}^{R}_{j}$) has been estimated, a daily precipitation amount is stochastically generated by multiplying the regional mean by a random scaling factor ($\lambda_j$):

$$\hat{P}_{j} = \lambda_j\,\bar{P}^{R}_{j} \tag{5}$$

The random scaling factor introduces day-to-day variance into the generated precipitation amounts and is computed from a random number $r^{a}_{j}$ sampled from a uniform distribution ($0 < r^{a}_{j} \le 1$) (Hay et al., 1991; Wilby et al., 1994). Equations (1) and (5) can be used to generate missing precipitation series at a given site within a coherent region.
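The amount model can be sketched as follows. Three choices here are assumptions rather than details given in the text: the regional log-mean is taken over the stations that are wet on day $j$; the correction factor is estimated as $W_j = \exp(\sigma^2/2)$ from the spread of those log-amounts, following the lognormal relation; and the random scaling factor is modelled as $-\ln r^{a}_{j}$, an exponential variate with mean 1 that leaves the expected amount unchanged. The functional form of the scaling factor used by Hay et al. (1991) may differ, and all names are hypothetical.

```python
import math
import random
import statistics

WET_THRESHOLD_MM = 1.0  # wet-day threshold, from the text


def log_transform(amount_mm):
    """Y = ln(P) on wet days (P >= 1 mm, so Y >= 0); Y = 0 on dry days."""
    return math.log(amount_mm) if amount_mm >= WET_THRESHOLD_MM else 0.0


def regional_mean(day_amounts):
    """Regional mean W_j * exp(mu_j) for one day, from the wet stations only.

    day_amounts: amounts (mm) at the gauged stations of the region on day j.
    W_j = exp(variance / 2) is an assumed lognormal correction factor.
    """
    logs = [log_transform(p) for p in day_amounts if p >= WET_THRESHOLD_MM]
    if not logs:
        return 0.0
    mu = statistics.fmean(logs)
    w = math.exp(statistics.pvariance(logs) / 2.0)
    return w * math.exp(mu)


def simulate_amount(day_amounts, rng):
    """Hypothetical scaling: lambda = -ln(r_a) with r_a uniform in (0, 1]."""
    r_a = 1.0 - rng.random()  # maps [0, 1) onto (0, 1]
    return -math.log(r_a) * regional_mean(day_amounts)
```

Because the assumed scaling factor has mean 1, the average of many simulated amounts for a given day converges to the regional mean, while individual draws supply the day-to-day variance that a deterministic regional mean would lack.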

| Cross-validation of the ungauged precipitation model during the calibration period
To cross-validate the stochastic model, daily precipitation data from stations in the identified homogeneous region during the calibration period (1950s) are used. Each station is withheld in turn, and the remaining stations are used to estimate precipitation at the withheld station. After each station is excluded, the PCA/OFA model is recalibrated, and the daily precipitation series are estimated from the data available at the remaining stations in region OR2. The computational procedure for estimating the daily precipitation series at an ungauged site can be summarized as follows:
1. Apply the PCA regionalization approach to the daily precipitation amount series.
2. Apply OFA to each group identified by PCA to delineate the homogeneous regions.
3. Treat one rain gauge station in a homogeneous region as an ungauged site. After excluding the data from the selected station, calculate the tetrachoric correlation matrix using the daily precipitation occurrence data from the remaining rain gauge stations in the region.
4. Generate an ensemble of 20 stochastic realizations of the daily precipitation occurrence series for the ungauged station based on the computed factor scores ($F_j$) (Equation (1)) and random numbers drawn from the uniform distribution ($r^{o}_{j}$).
5. Generate an ensemble of 20 realizations of the daily precipitation amount series using the regional average precipitation amount and the calculated weights (Equation (5)).
6. Multiply the occurrence series by the amount series so that precipitation is retained only on wet days.
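Steps 4 to 6, wrapped in the leave-one-out loop, can be sketched compactly. This sketch makes several simplifying assumptions: steps 1 to 3 (recalibrating PCA/OFA) are skipped and every loading is set to 1, so the factor score is simply the fraction of remaining stations that are wet; the regional mean uses the wet stations' log-amounts with a lognormal correction; and the random scaling factor is the hypothetical $-\ln r^{a}_{j}$ form. The function names are illustrative only.

```python
import math
import random
import statistics

WET_MM = 1.0     # wet-day threshold, from the text
N_MEMBERS = 20   # ensemble size, from the text


def simulate_station(others, rng):
    """One realization at a withheld site from the S remaining stations.

    others: list of S equal-length daily amount series (mm). Equal loadings
    (e_i = 1) stand in for OFA loadings; this is a simplifying assumption.
    """
    S, n_days = len(others), len(others[0])
    series = []
    for j in range(n_days):
        wet = [st[j] for st in others if st[j] >= WET_MM]
        f_j = len(wet) / S                              # step 4: factor score
        occ = 1 if rng.random() < f_j else 0            # wet with probability F_j
        if wet:
            logs = [math.log(p) for p in wet]
            p_r = math.exp(statistics.pvariance(logs) / 2) * math.exp(statistics.fmean(logs))
        else:
            p_r = 0.0
        amount = -math.log(1.0 - rng.random()) * p_r   # step 5: hypothetical scaling
        series.append(occ * amount)                     # step 6: keep wet days only
    return series


def leave_one_out(stations, seed=0):
    """Map each station index to N_MEMBERS simulated series built without it."""
    rng = random.Random(seed)
    results = {}
    for k in range(len(stations)):
        others = stations[:k] + stations[k + 1:]
        results[k] = [simulate_station(others, rng) for _ in range(N_MEMBERS)]
    return results
```

Each withheld station's ensemble can then be compared against its observed series using the bias, RMSE, and contingency-table metrics described below.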
To evaluate the contribution of OFA to this weather generation procedure, three variations of the procedure were tested and compared. In the first variation, referred to as Model-1, regions are defined by PCA alone (i.e., OFA is completely excluded from the analysis): instead of applying OFA to the stations in Group I, PCA is applied a second time to those stations, and a stochastic weather model without the daily occurrence component is then used. Model-2 omits the second step (OFA) entirely and calibrates the SWG on all stations in Group I, in order to investigate the effect of heterogeneity on the weather generator. Model-3 is the full model described in the previous section, including both PCA and OFA.

| Validation of the ungauged precipitation model
To validate the proposed weather model, we focus on one station, Grahamsville, because historical daily precipitation series at this station exist for both the calibration and validation periods. The computational procedure was carried out for both periods. The simulated precipitation series at Grahamsville are then statistically analysed and compared with the observed precipitation data using a number of metrics. First, precipitation occurrence is evaluated based on several indices: the annual and monthly number of wet days; the Fraction Correct (FC); the Proportion of Wet-day Correct (PWC), also known as the hit rate; and the Critical Success Index (CSI) (Wilks, 2011). The last three metrics are calculated from the contingency table (Table 3) of simulated and observed occurrences. Precipitation amount is evaluated based on annual and monthly means and the root-mean-square error (RMSE).
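The occurrence metrics follow directly from the 2 × 2 contingency table. The sketch below uses the common convention (Wilks, 2011) in which a counts hits (simulated wet, observed wet), b false alarms (simulated wet, observed dry), c misses (simulated dry, observed wet), and d correct negatives; the labelling in Table 3 may differ. Then FC = (a + d)/n, PWC (hit rate) = a/(a + c), and CSI = a/(a + b + c). Function names are hypothetical.

```python
import math
import statistics


def contingency(observed, simulated):
    """Counts (a, b, c, d) for paired 0/1 occurrence series."""
    a = b = c = d = 0
    for o, s in zip(observed, simulated):
        if s and o:
            a += 1  # hit
        elif s and not o:
            b += 1  # false alarm
        elif o and not s:
            c += 1  # miss
        else:
            d += 1  # correct negative
    return a, b, c, d


def occurrence_scores(observed, simulated):
    """Fraction Correct, hit rate (PWC), and Critical Success Index."""
    a, b, c, d = contingency(observed, simulated)
    fc = (a + d) / (a + b + c + d)
    pwc = a / (a + c) if a + c else float("nan")
    csi = a / (a + b + c) if a + b + c else float("nan")
    return fc, pwc, csi


def rmse(observed, simulated):
    """Root-mean-square error between paired amount series."""
    return math.sqrt(statistics.fmean([(s - o) ** 2 for o, s in zip(observed, simulated)]))
```

Unlike FC, CSI ignores correct negatives, so it is not inflated by the many dry days on which both series agree; this is why CSI values are typically lower than FC values for the same simulation.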

| RESULTS
The regionalization analysis confirmed that, although the statistical techniques do not include any information about geographic location, the factor loadings are concentrated in particular regions. Such spatial clustering is reasonable based on the physics of precipitation: the prevailing winds and storm tracks combined with orography result in particular regional patterns of precipitation.

| Calibration of the regionalization model
Five PCs with eigenvalues greater than 1 explain 77.55% of the temporal variance in precipitation during the calibration period (1949-1959) and are therefore retained for further analysis (Table 4). In other words, the original time series from 80 stations have been reduced to five components. PC5, which does not load onto a spatially coherent region and accounts for only ~1% of the total variance, is excluded from the remainder of the analysis. Figure 4 shows the rotated loading patterns corresponding to the remaining four extracted PCs. These loading patterns indicate that, while each component has some effect throughout the study area, each component is associated with a particular region. The first component has the largest loadings over the southeastern and eastern portions of the study area; component 2 loads most heavily over the northwestern corner of our region; component 3 loads most heavily over the northeastern escarpment; and component 4 loads most heavily over the southeastern and northeastern corners of the study region. Based on these loading patterns, we identify groups of stations that load most heavily on each component and fall within well-defined regions. Figure 5 shows the four groupings of stations identified by PCA. Group I, which includes stations that have their highest loadings on PC1, represents a large part of the Catskill Mountains, including four of the six watersheds of the NYC water supply system (i.e., the Ashokan, Neversink, Pepacton, and Rondout watersheds). The Cannonsville watershed and the western Catskill Mountains region fall in Group II, while the Schoharie watershed in the northeast is classified into Group III. Group IV includes stations located to the north and south of the NYC watersheds.
OFA was then applied within each of the four PCA groups. The results of the OFA analyses demonstrate that the spatial groupings identified by PCA based on precipitation amount can be further subdivided by OFA based on precipitation occurrence. OFA results are summarized in Figure 6. Eleven climatic regions (OR1 to OR11, where OR denotes a region defined by OFA) are identified. PCA Group I was further divided into three regions: OR1, covering the Pepacton watershed area and its higher elevation eastern ridges; OR2, covering the eastern, lower elevation Ashokan and Rondout basins; and OR3, covering the Neversink basin. OFA identifies two climatic regions within PCA Group II: within (OR4) and outside (OR5) the Cannonsville watershed. Three distinct regions are found within PCA Group III: OR6 covers most of the Schoharie watershed; OR7 lies mostly outside the NYC watersheds; and OR8 covers the lower elevation area of the Schoharie basin (Grand Gorge and Manor Kill stations) and the higher elevation portion of the Cannonsville basin (Relay and Stamford stations). Finally, PCA Group IV is divided into three sub-regions (OR9, OR10, and OR11) that lie outside the watersheds. These results indicate that geographical features, including elevation and the ridgelines (or divides) between watersheds, play a role in demarcating regions and have a considerable influence on climatic patterns. Figure 7 shows validation results based on application of the same procedure to daily precipitation data from the smaller group of 38 stations for 1981-1991 (methodology described in section 3.2). The similarity of the regionalization results, despite being based on fewer stations and on data from a different decade, suggests that the spatial pattern of regions with similar precipitation statistics is fairly constant through time, consistent with strong topographic control (Baeriswyl and Rebetez, 1997; Comrie and Glenn, 1998).

| Cross-validation of the Ungauged precipitation model during the calibration period
As a further test of the model, the stochastic weather models were employed to generate daily precipitation time series for "ungauged" locations (methodology described in section 3.5). Figures 8 and 9 compare the models using two metrics (bias and RMSE) for the annual total precipitation amount, the annual total number of wet-days, and the maximum numbers of consecutive wet and dry days. Each boxplot shows the range of the statistics across all stations. Bias values closer to zero and lower RMSE values indicate that a model simulates the precipitation time series more accurately. The results shown in these figures therefore suggest that Model-3 provides the most accurate simulation of annual precipitation amounts and occurrences. Because the average annual total amounts over the region lie between 1,000 mm and 1,500 mm, the error of Model-3 is less than 10% of the annual total. Figure 9 shows that Model-3 is also superior for simulating sequences of precipitation occurrence (e.g., dry spells and wet spells): it has the smallest biases and RMSEs for both the annual mean number of wet days and the maximum number of consecutive wet days. Table 5 shows the numerical evaluation results using three metrics (FC, PWC, and CSI) for validation at the Grahamsville station (methodology described in section 3.6).
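The spell statistics and error metrics used in this comparison can be sketched as below, assuming that, for each station, bias and RMSE are computed over the years of the period by comparing the observed annual statistic with the median of the simulated ensemble for each year. That pairing is an assumption, and the function names are hypothetical.

```python
import math
import statistics


def max_spell(occ, state):
    """Longest run of consecutive days equal to state (1 = wet, 0 = dry)."""
    longest = run = 0
    for o in occ:
        run = run + 1 if o == state else 0
        longest = max(longest, run)
    return longest


def ensemble_median(members):
    """Per-year median across ensemble members (equal-length lists)."""
    return [statistics.median(values) for values in zip(*members)]


def bias_and_rmse(observed, simulated):
    """Mean error and root-mean-square error of simulated minus observed."""
    errors = [s - o for o, s in zip(observed, simulated)]
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return bias, rmse
```

Bias and RMSE are complementary: errors of opposite sign in different years cancel in the bias but not in the RMSE, so a model can have near-zero bias and still a sizeable RMSE. Relative to annual totals of 1,000 mm to 1,500 mm, an RMSE of 100 mm corresponds to 7% to 10%.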

| Performance of the ungauged precipitation model
The highest values of FC and CSI are 87.3% and 64.5%, respectively, and the lowest values are 83.9% and 55.7%. Harpham and Wilby (2005) also employed these metrics to assess the performance of their models; in their applications, FC values ranged between 70% and 82%, and CSI values between 30% and 50%. The proposed stochastic model in this study yielded higher values of these metrics than those reported by Harpham and Wilby (2005), although the two studies involve different regions and data. These values indicate that the proposed approach provides an accurate description of precipitation occurrence at this station. Figure 10 shows observed and simulated time series of annual and monthly means of precipitation amount and number of wet-days during the calibration period. In each panel, the solid line represents the observed values, the blue dotted line denotes the median of the 20 ensemble members, and the shaded blue range represents the upper and lower limits of the simulations. In all panels, the observed time series falls within the simulated range for most data points, and the model captures the interannual variability, interannual trends, and the shape of the annual cycles of both precipitation amount and number of wet days. Figure 11 shows results similar to Figure 10, but for the validation period. Note that the stochastic model used for the validation period uses parameters estimated during the calibration period only: no data from the validation period were used to calibrate the parameters.
The graphical results show, on average, good agreement between the observed statistics and the medians of the 20 ensemble members over both the calibration and validation periods. The simulated values for the calibration period are closer to the observed values, probably because more stations, and hence more information, were available to the weather model during the calibration period than during the validation period. These results suggest that the proposed stochastic weather model for ungauged daily precipitation sequences is able to describe accurately the annual and monthly statistics of observed data, including during periods outside the calibration period. Figure 12 shows monthly mean precipitation amount and monthly mean number of wet-days during the calibration period for each region. The statistics shown in Figure 12 are calculated from observations, not from the stochastic model; their purpose is to demonstrate that the regionalization identified by the model is indeed able to separate regions with different statistical characteristics.

Figure 7. Regions identified by OFA for hydrologically homogeneous daily precipitation for the period 1981-1991 in the Catskill Mountains region.
Figure 8. Boxplots of bias and RMSE of annual total precipitation amounts used to evaluate the performance of the different models.

| Climatic differences between regions
OR1 (near the Pepacton Basin), OR2 (near the Ashokan basin and the southeastern section of our study area), and OR3 (near the Rondout and Neversink Basins) receive more precipitation than the other regions during all seasons except summer. However, these are not exactly the regions with the most frequent wet days: the most wet-days are observed in OR1 (the Pepacton Basin), OR4 (the Cannonsville Basin), and OR5 (the northern section of our study area).
In the lower elevations of the Ashokan and Rondout basins (OR2), winter-through-spring mean precipitation is higher than in other regions while the number of wet-days is lower, indicating more frequent high-intensity precipitation events. In general, the western regions (OR1, OR4, and OR5) have more precipitation days than the eastern regions (OR6, OR8, and OR11). The two sub-regions of the Schoharie watershed (OR6 and OR8) experience similar precipitation amounts but differ in precipitation occurrence. Here, OR6 includes the higher elevation northeastern escarpment of the Catskill Mountains, where larger, orographically modulated storms tend to occur.
The regional differences described above result from the spatial distribution of two types of events: large precipitation events associated with coastal storms, which mostly affect the southern and southeastern portions of our region; and more frequent events associated with mid-latitude systems moving in from the west or northwest, sometimes picking up moisture over the Great Lakes, which predominantly affect the western and northwestern portions of the study area. These patterns have been documented descriptively by Thaler (1996) and analytically by Towey et al. (2018).

| CONCLUSION
In this study, a novel statistical approach based on the combined application of PCA and OFA is used to identify regions of "homogeneous" precipitation, in the sense that stations within a region have similar statistics of precipitation occurrence and amount. PCA is employed to identify stations with correlated precipitation amounts, which allows regional patterns to be identified. PCA-based regions are referred to as "groups" to distinguish them from the OFA-based regions. OFA is then applied within each PCA-based group, providing further regionalization at finer spatial scales based on precipitation occurrence. The method was calibrated using daily precipitation data from a network of 80 gauged stations in the Catskill Mountains region for the period 1949-1959. This period was selected in order to maximize the number and density of active stations. Overall, 11 regions with coherent precipitation variations are delineated. Validation results from a more recent period (1981-1991), with fewer stations, confirm that the regions have not changed over recent decades. These results suggest that physical geography plays an important role in demarcating not only hydrological regimes (watersheds) but also regions with distinct climatic characteristics. Differences in the average number of wet-days among the delineated regions are more marked than differences in monthly average precipitation amount.

Figure 10. Observed and estimated annual/monthly means of precipitation and the number of wet-days for the calibration period (1949-1959).
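The first (PCA) stage of the regionalization can be sketched as follows. This is a simplified illustration under stated assumptions: unrotated PCA on the station correlation matrix of daily amounts, a user-chosen number of components, complete records with no missing days, and each station assigned to the component on which it loads most strongly. The function name `pca_groups` is hypothetical, and the OFA stage applied within each group is not shown.

```python
import numpy as np

def pca_groups(amounts, n_components):
    """Assign stations to groups by their largest absolute PCA loading.

    amounts: array (n_days, n_stations) of daily precipitation amounts.
    Assumptions: unrotated PCA on the correlation matrix, no missing data.
    """
    amounts = np.asarray(amounts, dtype=float)
    corr = np.corrcoef(amounts, rowvar=False)         # station-by-station correlations
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the leading components
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return np.argmax(np.abs(loadings), axis=1)        # group label for each station
```

Applied to synthetic data in which three stations share one underlying signal and two stations share another, the function places the first three stations in one group and the last two in another. A rotated solution (for example varimax) may separate groups more cleanly in practice; it is omitted here for brevity.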
Furthermore, these regions closely match the authors' understanding of spatial climatic variations in this area, based on experience working with the data as well as on the published literature. These results are consistent with the hypothesis that storm tracks and orographic effects, in combination, are the primary factors determining the spatial pattern of storm precipitation.
We validate the regionalization approach based on its accuracy in simulating daily precipitation series at ungauged locations. After withholding one station for which data were available during both the calibration (1950s) and validation (1980s) periods, we estimated daily precipitation series at the location of the withheld station using the proposed stochastic model. The model successfully reproduced precipitation characteristics, including both precipitation occurrence and precipitation amount, during both periods. Moreover, a comparative analysis indicates that the model including both PCA and OFA provides a statistically significant improvement in the estimates of daily precipitation over the model with PCA only.

Figure 11. Observed and estimated annual/monthly means of precipitation and the number of wet-days for the validation period (1981-1991).
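The leave-one-out design can be outlined in code. The generator inside it is a deliberately simple, hypothetical stand-in (each day's value for the withheld station is copied from a randomly chosen remaining station in the same region) and is not the stochastic model proposed in this study; the function names, the default ensemble size of 20, and the 0.1 mm wet-day threshold are assumptions.

```python
import numpy as np

def leave_one_out_ensemble(region_data, target, n_members=20, seed=None):
    """Withhold station `target` and build an ensemble of daily series for it.

    region_data: array (n_days, n_stations) for the stations in one region.
    HYPOTHETICAL generator: each day's value is copied from a randomly chosen
    remaining station; this is a placeholder, not the paper's stochastic model.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(region_data, dtype=float)
    donors = np.delete(data, target, axis=1)       # remove the withheld station
    n_days, n_donors = donors.shape
    picks = rng.integers(0, n_donors, size=(n_members, n_days))
    # Element [m, d] is donors[d, picks[m, d]]: member m's value on day d.
    return donors[np.arange(n_days), picks]

def summary(series, threshold=0.1):
    """Mean daily amount and wet-day count (0.1 mm threshold is an assumption)."""
    series = np.asarray(series, dtype=float)
    return series.mean(), int(np.sum(series >= threshold))
```

A typical use is `ens = leave_one_out_ensemble(region_data, target)`, followed by comparing `summary(region_data[:, target])` with the median of `summary` over the rows of `ens`. Because the withheld station never contributes to its own ensemble, the comparison mimics an ungauged site.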
An important application of regionalization is the transfer of hydrological information from one station to another, or to a region where data are absent. Conventional approaches, based only on precipitation amount, do not consider spatial variation in wet/dry-day occurrence at a specific location. The approach presented here addresses this limitation by including precipitation occurrence in addition to amount. These results demonstrate that this multivariate statistical approach provides information on the spatial patterns of precipitation that is unavailable from precipitation amount alone. They also demonstrate that a stochastic precipitation model can be used in conjunction with the multivariate methods to estimate daily precipitation series at an ungauged site. Moreover, applications to the current study region (the Catskill Mountain region) and to a previous study region (the Korean Peninsula; Yeo, 2014) indicate that the proposed stochastic precipitation model could be useful for generating unmeasured daily precipitation series.
Despite the importance of accurate precipitation information for watersheds in mountainous regions, the application of SWGs in such regions is relatively uncommon, because the greater heterogeneity of climatic conditions in topographically complex areas is likely to reduce model accuracy (Breinl et al., 2017). Acharya et al. (2017) applied an SWG (WeaGET; Chen et al., 2012) to the same study region without considering a spatial component. Because the model proposed in this study generates daily information from observed daily precipitation at nearby stations, it can also be applied to other regions.
While most SWGs, such as LARS-WG (Semenov and Barrow, 2002), WGEN (Richardson and Wright, 1984), and WeaGET (Chen et al., 2012), synthesize daily hydrometeorological information using the parameters of a selected distribution and a Markov-chain transition probability matrix, the proposed model generates daily information from observed daily precipitation at nearby stations, resulting in improved simulations of both precipitation amount and occurrence. However, we acknowledge that the accuracy of this model may be sensitive to the homogeneity of the region and the number of observations within it. In subsequent work, we will further refine the stochastic model by including geographical information, and evaluate its ability to provide improved estimates of precipitation, and improved modelling of streamflow, over basins of varying spatial scales, from ungauged sub-basins with areas on the order of 10 km² to the entire watershed region.

Figure 12. Monthly means of (a) precipitation amounts and (b) the number of wet-days for each OR for the calibration period (1949-1959).
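For contrast with the proposed model, which draws on observed daily precipitation at nearby stations, the following is a minimal sketch of the Markov-chain occurrence approach that the paragraph above attributes to conventional SWGs. It is a generic first-order, two-state chain; the 0.1 mm wet-day threshold, the shifted-exponential amount distribution, and the function names are assumptions for illustration, not the parameterization of LARS-WG, WGEN, or WeaGET.

```python
import numpy as np

def fit_markov(daily, threshold=0.1):
    """Estimate P(wet | dry yesterday), P(wet | wet yesterday) and mean wet-day amount."""
    daily = np.asarray(daily, dtype=float)
    wet = daily >= threshold                  # assumed wet-day threshold in mm
    prev, curr = wet[:-1], wet[1:]
    p01 = curr[~prev].mean()                  # fraction of dry days followed by a wet day
    p11 = curr[prev].mean()                   # fraction of wet days followed by a wet day
    return p01, p11, daily[wet].mean()

def simulate_markov(p01, p11, mean_amount, n_days, threshold=0.1, seed=None):
    """Generate a daily series from a first-order two-state Markov chain.

    Wet-day amounts are threshold + exponential(mean_amount - threshold), a
    simplifying assumption that keeps every wet-day amount at or above the
    threshold and the mean wet-day amount equal to mean_amount.
    """
    rng = np.random.default_rng(seed)
    out = np.zeros(n_days)
    wet = False                               # assume the series starts after a dry day
    for t in range(n_days):
        wet = rng.random() < (p11 if wet else p01)
        if wet:
            out[t] = threshold + rng.exponential(mean_amount - threshold)
    return out
```

Fitting `fit_markov` to a long series from `simulate_markov` recovers the transition probabilities and mean wet-day amount it was given. The generated sequence depends only on these fitted parameters, whereas the proposed model takes its daily values from observations at nearby stations.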