Article

Generalized Beta Distribution of the Second Kind for Flood Frequency Analysis

1
College of Hydropower & Information Engineering, Huazhong University of Science & Technology, Wuhan 430074, China
2
Department of Biological and Agricultural Engineering & Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 77843, USA
*
Author to whom correspondence should be addressed.
Entropy 2017, 19(6), 254; https://doi.org/10.3390/e19060254
Submission received: 18 April 2017 / Revised: 18 May 2017 / Accepted: 26 May 2017 / Published: 12 June 2017
(This article belongs to the Special Issue Entropy Applications in Environmental and Water Engineering)

Abstract

Estimation of flood magnitude for a given recurrence interval T (T-year flood) at a specific location is needed for the design of hydraulic and civil infrastructure facilities. A key step in this estimation, known as flood frequency analysis (FFA), is the selection of a suitable distribution. More than one distribution is often found to be adequate for FFA on a given watershed, and choosing the best one is often less than objective. In this study, the generalized beta distribution of the second kind (GB2) was introduced for FFA. The principle of maximum entropy (POME) method was proposed to estimate the GB2 parameters. The performance of the GB2 distribution was evaluated using flood data from gauging stations on the Colorado River, USA. Frequency estimates from the GB2 distribution were also compared with those of commonly used distributions, and the evolution of the frequency distribution along the stream from upstream to downstream was investigated. The GB2 distribution is found to be appealing for FFA, since it has four parameters and includes some well-known distributions as special cases. Results of the case study demonstrate that the parameters estimated by the POME method are reasonable. According to the RMSD and AIC values, the GB2 distribution performs better than the distributions widely used in hydrology. Different distributions yield significantly different design flood values. For a given return period, the design flood value at the downstream gauging stations is larger than that at the upstream gauging station. In addition, the distribution evolves along the river: along the Yampa River, the distribution for FFA changes from the four-parameter GB2 distribution to the three-parameter Burr XII distribution.

1. Introduction

Estimation of flood magnitude for a given recurrence interval T (T-year flood) at a given location is essential for the design of hydraulic and civil infrastructure facilities, such as dams, spillways, levees, urban drainage, culverts, road embankments, and parking lots. A key step in flood frequency estimation or analysis (FFA) is the selection of a suitable frequency distribution [1]. Commonly used distributions for flood frequency analysis include Gumbel, gamma, generalized extreme value (GEV), Pearson type III (P-III), log-Pearson type III (LP-III), Weibull, and log-normal (LN). Some of these distributions have been adopted in different countries. For example, the P-III distribution has been adopted in China and Australia as a standard method for hydrologic frequency analysis [2,3,4]. The LP-III distribution has been adopted in the United States and the GEV distribution in Europe.
Mielke and Johnson investigated the use of two special cases of the generalized beta distribution of the second kind, namely gamma and log normal distributions, for flood frequency analysis [5]. Wilks investigated the performance of eight three-parameter probability distributions for precipitation extremes using annual and partial duration data from stations in the northeastern and southeastern United States [6]. He found that the beta-κ distribution best described the extreme right tail of annual extreme series, and the beta-P distribution was best for the partial duration data.
Recently, some generalized frequency distributions have been used for hydrologic frequency analysis. For example, Perreault et al. presented a family of distributions, named Halphen distributions, for frequency analysis of hydrometeorological extremes [7]. Papalexiou and Koutsoyiannis used the generalized gamma distribution and generalized beta distribution of the second kind (GB2) for rainfall frequency analysis across the world and showed that these distributions were appropriate for worldwide rainfall data [8]. The greatest advantage of these generalized distributions is that they provide sufficient flexibility to fit a large variety of data sets, which facilitates the selection and comparison of different distributions. For instance, the GB2 distribution includes the exponential, Weibull, and gamma distributions as special cases. Since the GB2 distribution has four parameters, logically it should perform better than 3-parameter distributions, such as GEV, P-III, LP-III or LN-III. Papalexiou and Koutsoyiannis concluded that the GB2 distribution was a suitable model for rainfall frequency analysis because of its ability to describe both J-shaped and bell-shaped data [8]. The other advantages of the GB2 distribution can be summarized as: (1) the GB2 distribution can model positive or negative skewness which is an advantage over distributions, such as lognormal, with only positive skew; (2) it can jointly estimate both location and shape parameters, while many other distributions, such as exponential, logistic, normal, etc., usually focus on location only; and (3) it can better capture the long right or left tail. Because of these advantages, the GB2 distribution was employed in this study.
The second step in flood frequency analysis is to estimate the parameters of the selected distribution. There are several standard parameter estimation methods, such as moments, maximum likelihood, L-moments, probability weighted moments, and least squares. Among these methods, the maximum likelihood (ML) and L-moment methods are widely used in hydrology. In addition, the principle of maximum entropy (POME) has been applied to parameter estimation [9,10]. Singh and Guo indicated that the POME method was comparable to the ML and L-moment methods, and in certain situations superior to them [11]. Therefore, the POME method was considered in this study for parameter estimation.
Another aspect of FFA that is of interest is how the flood frequency distribution evolves from upstream to downstream along a river. The drainage area along the river increases from upstream to downstream. It is interesting to investigate if the same frequency distribution applies at all gauging stations along the stream.
The objective of this study therefore is to employ the GB2 distribution for flood frequency analysis (FFA). The specific objectives are to: (1) estimate the GB2 distribution parameters using the principle of maximum entropy; (2) evaluate the performance of the GB2 distribution and compare it with commonly used distributions in hydrology; (3) select the best distribution; and (4) discuss the evolution of frequency distribution and its parameters along the river.

2. GB2 Distribution

The generalized beta distribution of the second kind, denoted as GB2, is a four-parameter distribution and can be expressed as:
$$f(x) = \frac{r_3}{\beta B(r_1, r_2)} \left(\frac{x}{\beta}\right)^{r_1 r_3 - 1} \left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)^{-(r_2 + r_1)} \tag{1}$$
where B(·) is the beta function; β is the scale parameter, β > 0; and r1 > 0, r2 > 0, and r3 > 0 are the shape parameters. Parameter r3 represents the overall shape; parameter r1 governs the left tail; parameter r2 controls the right tail; and β depends on the unit of measurement. These parameters allow the distribution to fit data having very different histogram shapes, including both J-shaped and bell-shaped distributions. Parameters r1 and r2 together determine the skewness of the distribution. The general shapes of the GB2 probability density function are shown in Figure 1.
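As a minimal sketch, the GB2 density defined above can be evaluated with the Python standard library alone. The function name `gb2_pdf` and the sample parameter values are illustrative assumptions, not taken from the study; the beta function is computed through log-gamma for numerical stability.

```python
import math

def gb2_pdf(x, beta, r1, r2, r3):
    """GB2 density: beta is the scale, r1, r2 and r3 are the shape parameters."""
    if x <= 0:
        return 0.0  # the distribution is supported on x > 0
    z = x / beta
    # log B(r1, r2) via log-gamma, which avoids overflow for large shapes
    log_beta_fn = math.lgamma(r1) + math.lgamma(r2) - math.lgamma(r1 + r2)
    log_pdf = (math.log(r3) - math.log(beta) - log_beta_fn
               + (r1 * r3 - 1.0) * math.log(z)
               - (r1 + r2) * math.log1p(z ** r3))
    return math.exp(log_pdf)

# Hypothetical parameters chosen only for illustration
print(gb2_pdf(1.0, 1.0, 2.0, 3.0, 2.0))  # approximately 0.75
```

Working in log space is a design choice for robustness; a direct product of powers would give the same value for moderate parameters.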
When analyzing extreme rainfall, Papalexiou and Koutsoyiannis showed that the GB2 distribution is a very flexible four-parameter distribution [8]. By fixing certain parameters, the GB2 distribution can yield some well-known distributions, such as the beta distribution of the second kind (B2), the Burr type XII, generalized gamma (GG), and so on. These distributions can be treated as special or limiting cases of the GB2 distribution, as shown in Figure 2. Some of these special cases have been applied in hydrological frequency analysis. For example, Shao et al. employed the Burr type XII distribution for flood frequency analysis [2].

3. Estimation of Parameters of GB2 Distribution by POME Method

The GB2 distribution parameters were determined using the principle of maximum entropy (POME). The POME method involves the following steps: (1) specification of constraints; (2) maximization of entropy using the method of Lagrange multipliers; (3) derivation of the relation between Lagrange multipliers and constraints; (4) derivation of the relation between Lagrange multipliers and distribution parameters; and (5) derivation of the relation between distribution parameters and constraints. These steps are discussed in Appendix A. Here only steps (1) and (5) are outlined.
Flood discharge is considered as a random variable X, which ranges from 0 to infinity. Its probability density function (PDF) and cumulative distribution function (CDF) are denoted as f(x) and F(x), respectively, where x is a specific value of X. Since constraints encode the information that can be given for the random variable, following Singh (1998), the constraints for the GB2 distribution can be expressed as:
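$$\int_0^\infty f(x)\,dx = 1 \tag{2a}$$
$$\int_0^\infty f(x) \ln x\,dx = E(\ln x) \tag{2b}$$
$$\int_0^\infty f(x) \ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right) dx = E\left(\ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right) \tag{2c}$$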
The first constraint is the total probability law, the second constraint is the mean of log values or the geometric mean, and the third constraint is the mean of log of scaled values raised to a power and then shifted by unity.
Following the derivation in Appendix A, the relation between parameters and constraints can be expressed as:
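$$\begin{aligned}
-\ln\beta - \frac{1}{r_3}\phi(r_1) + \frac{1}{r_3}\phi(r_2) &= -E(\ln x) \\
\beta^{r_3}\phi(r_2) - \beta^{r_3}\phi(r_1 + r_2) &= -E\left(\beta^{r_3}\ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right) \\
\frac{1}{r_3^2}\phi'(r_1) + \frac{1}{r_3^2}\phi'(r_2) &= \operatorname{var}(\ln x) \\
\phi'(r_2) - \phi'(r_1 + r_2) &= \operatorname{var}\left(\ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right)
\end{aligned} \tag{3}$$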
where ϕ(.) is the digamma function; and ϕ′(.) is the trigamma function. Detailed information for deriving these relationships can be found in Appendix A.
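The sketch below evaluates the right-hand quantities of these relations for given parameters, written in the rearranged form E(ln x) = ln β + [ϕ(r1) − ϕ(r2)]/r3 and E(ln(1 + (x/β)^r3)) = ϕ(r1 + r2) − ϕ(r2). Estimation would then search for the parameters that match the sample values; that solver is not shown. The helper names are hypothetical, and the digamma and trigamma routines are simple hand-written approximations (recurrence plus asymptotic series) used only because the Python standard library has none; SciPy's `scipy.special.digamma` and `scipy.special.polygamma` are the usual choice in practice.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift the argument above 10, then use the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def trigamma(x):
    """Trigamma for x > 0, by the same shift-and-series approach."""
    result = 0.0
    while x < 10.0:
        result += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)))

def gb2_log_moments(beta, r1, r2, r3):
    """Theoretical values of the four constraint quantities for a GB2 distribution."""
    mean_ln_x = math.log(beta) + (digamma(r1) - digamma(r2)) / r3
    var_ln_x = (trigamma(r1) + trigamma(r2)) / r3 ** 2
    mean_ln_1pt = digamma(r1 + r2) - digamma(r2)   # t = (x / beta)^r3
    var_ln_1pt = trigamma(r2) - trigamma(r1 + r2)
    return mean_ln_x, var_ln_x, mean_ln_1pt, var_ln_1pt

# Hypothetical parameter values, for illustration only
print(gb2_log_moments(beta=10.0, r1=2.0, r2=3.0, r3=4.0))
```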

4. Flood Frequency Analysis

For FFA, three problems were addressed. First, the GB2 distribution was tested using observed flood data, and was compared with commonly used distributions in hydrology. Second, a method for selecting the best distribution was discussed. Third, flood frequency analysis was carried out at several gauging stations from upstream to downstream, and the evolution of frequency distribution along the stream was investigated.

4.1. Flood Data

Flood data from eight gauging stations on the Colorado River and its tributaries, as shown in Figure 3, were considered to test the performance of the GB2 distribution and to discuss the evolution of the frequency distribution along the river. The Colorado River is the principal river of the southwestern United States and northwest Mexico. It rises in the central Rocky Mountains and flows generally southwest across the Colorado Plateau and through the Grand Canyon. The basin boundary consists of mountains that are 13,000 to 14,000 feet (3962.4 m to 4267.2 m) high in Wyoming, Colorado, and Utah; the boundary drops to elevations of less than 1000 feet (304.8 m) at Hoover Dam. The northern part of the river basin in Colorado and Wyoming is a mountainous plateau that ranges from 5000 to 8000 feet (1524 m to 2438 m) in elevation and encompasses deep canyons, rolling valleys, and intersecting mountain ranges. The central and southern portions of the basin in eastern Utah, northwestern New Mexico, and northern Arizona consist of rugged mountain ranges interspersed with rolling plateaus and broad valleys. In general, the mountains in the southern part of the basin are much lower than those in the northern part. Of the eight gauging stations considered in this study, sites 1, 2 and 3 are on the Yampa River, which is a secondary tributary of the Colorado River. Sites 4, 5, 6, 7 and 8 are on the mainstream of the Colorado River. Site 8 is near the Hoover Dam. The data for these gauging stations were downloaded directly from the USGS (United States Geological Survey) website. The characteristics of the flow data at these gauging stations, including record length, mean, standard deviation, skewness, and kurtosis, were calculated and are shown in Table 1.
Because the Glen Canyon Dam regulates the river flow past Lees Ferry (shown in Figure 3), the characteristics of the flow at the Hoover Dam (site 8) are quite different from those at sites 4, 5, 6 and 7 upstream. It can be seen from Table 1 that for sites 1 to 7 the mean values increase from upstream to downstream, as more rainfall or water flows into the river. Since the standard deviation is related to the flood magnitude, it also increases with the mean value. At site 8, reservoir operation stores part of the streamflow, so the streamflow at site 8 is reduced. The skewness is positive for all gauging stations, indicating that the right tail is longer or fatter than the left tail and the mass of the distribution is concentrated on the left side. Kurtosis is a measure of the peakedness of the probability distribution. The skewness and kurtosis values in the mainstream are generally lower than those in the tributaries.

4.2. Performance Measures

For evaluating the performance of the GB2 distribution, two measures were employed: (1) the root mean square deviation (RMSD); and (2) the Akaike information criterion (AIC). These methods assess the fitted distribution at a site by summarizing the deviations between observed discharges and computed discharges.
A frequently used method for assessing the goodness-of-fit of a function is the RMSD [12]. This method was used by NERC (1975) for ranking candidate distributions [13]. RMSD can be expressed as:
$$RMSD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{Q_{the}(i) - Q_{emp}(i)}{Q_{emp}(i)}\right)^{2}} \tag{4}$$
where n is the sample size; Qthe(i) is the computed discharge at the ith plotting position; and Qemp(i) denotes the observed ith smallest discharge. RMSD is non-negative and typically lies between 0 and 1 for a reasonable fit. The smaller the RMSD, the better the distribution fits.
AIC is a measure of the relative quality of statistical models for a given set of data. It also includes a penalty that is an increasing function of the number of estimated parameters. The AIC value was calculated as [14]:
$$AIC = n \ln(MSE) + 2K \tag{5}$$
where K is the number of parameters of the distribution, and MSE was calculated by
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Q_{the}(i) - Q_{emp}(i)\right)^{2} \tag{6}$$
Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value.
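The two measures in this section can be sketched in a few lines of Python. The function names and the three-point data set are hypothetical and serve only to show the arithmetic; both sequences are assumed to be sorted in ascending order so that the ith computed discharge pairs with the ith smallest observation.

```python
import math

def rmsd(q_the, q_emp):
    """Relative root mean square deviation between computed and observed discharges."""
    n = len(q_emp)
    return math.sqrt(sum(((t - e) / e) ** 2 for t, e in zip(q_the, q_emp)) / n)

def aic(q_the, q_emp, k):
    """AIC = n ln(MSE) + 2K, where k is the number of distribution parameters."""
    n = len(q_emp)
    mse = sum((t - e) ** 2 for t, e in zip(q_the, q_emp)) / n
    return n * math.log(mse) + 2 * k

# Hypothetical values, not from the study
observed = [100.0, 200.0, 400.0]
computed = [110.0, 190.0, 400.0]
print(rmsd(computed, observed), aic(computed, observed, k=4))
```

Because the AIC penalty grows with k, a four-parameter model must reduce the MSE enough to offset the extra 2 units per parameter relative to a three-parameter model.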

4.3. Evaluation of GB2 Distribution

Annual maximum flood peak data from four gauging stations, namely sites 2, 6, 7 and 8 in Figure 3, were selected. The empirical frequencies were calculated first. The purpose of defining the empirical distribution is to compare it with selected theoretical distributions in order to verify whether they fit sample data.
Many plotting-position formulas have been proposed, most of which can be expressed in the general form:
$$P_i = \frac{i - a}{n + 1 - 2a} \tag{7}$$
where a is a constant taking values from 0 to 0.5 in the different formulas: 0.5 for Hazen’s formula, 0.3 for Chegadayev’s formula, zero for Weibull’s formula, 3/8 for Blom’s formula, 1/3 for Tukey’s formula, and 0.44 for Gringorten’s formula.
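A short sketch of the general plotting-position formula follows. The function name is hypothetical, and the rank i is assumed to count from the smallest observation, so that P_i is read as an empirical non-exceedance probability.

```python
def plotting_positions(n, a):
    """Return P_i = (i - a) / (n + 1 - 2a) for ranks i = 1..n (ascending order)."""
    if not 0.0 <= a <= 0.5:
        raise ValueError("a is expected to lie between 0 and 0.5")
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

weibull = plotting_positions(4, 0.0)        # Weibull's formula, a = 0
gringorten = plotting_positions(10, 0.44)   # Gringorten's formula, a = 0.44
print(weibull)
print(gringorten[0], gringorten[-1])
```

For any admissible a the positions are symmetric about 0.5, that is, P_i + P_(n+1-i) = 1.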
Among these formulas, Gringorten’s formula is recognized by many researchers, especially for the GEV, Gumbel, exponential, and generalized Pareto distributions, which have been widely used for flood frequency analysis [15,16,17,18,19,20]. The Gringorten formula was also used for the GB2 distribution. For the normal, generalized normal, and gamma distributions, Blom’s formula is recommended [21,22]. For the Pearson type III and log-Pearson type III distributions, Weibull’s formula is recommended [18,21]. The GB2 distribution was employed to fit the annual maximum (AM) series of the four sites. The distribution parameters were estimated using Equation (3) and are given in Table 2. The fitted GB2 distributions and the empirical frequencies of each AM series are shown in Figure 4. In the left panels of Figure 4, the line represents the fitted distribution and the circles represent the empirical frequencies of the observations. Results show that the fitted marginal distributions agree well with the empirical data. Histograms of the AM flood peak series fitted by the GB2 distribution for the gauging stations on the Colorado River are shown in the right panels of Figure 4. They also indicate that the GB2 distribution can be fitted successfully to the empirical histograms.
Several distributions that are commonly used in hydrology, including the normal, exponential, gamma, Gumbel, generalized normal, Pearson type III, log-Pearson type III, generalized Pareto, and generalized extreme-value distributions, were fitted to the AM series at each site. The L-moment method was used to estimate the parameters of these distributions.
Singh and Guo compared the POME method with the L-moment method, and indicated that the two methods are comparable [11,23,24]. Therefore, the choice of estimation method has little influence on the value of the T-year design discharge. The Kolmogorov-Smirnov test was used to compare each sample with the reference probability distribution. The p-values were calculated and are given in Table 3 as well. The closer the p-value is to 1, the more similar the theoretical and empirical distributions are. Table 3 indicates that the p-value of the GB2 distribution is 1 or close to 1, which demonstrates that the GB2 distribution fits the data better. Table 3 also lists the RMSD and AIC values computed for the fitted distributions using Equations (4)–(7). The smaller the RMSD and AIC values are, the better the distribution fits. For the site at Steamboat Springs, the GB2 and generalized normal distributions have the smallest RMSD value, equal to 0.025. For the site near Cisco, the GB2 distribution has the smallest RMSD value, equal to 0.061. For the site near Colorado-Utah, the GB2 and gamma distributions have the smallest RMSD value. For the site at Hoover Dam, the GB2 distribution has the smallest RMSD value. Because the GB2 distribution has more parameters, its AIC values are larger than those of the generalized normal, gamma, and GEV distributions. Overall, the GB2 distribution generally gives a better fit.
In order to compare the POME method with a currently used method, the maximum likelihood (ML) method was also employed for parameter estimation of the GB2 distribution. Taking the site near Colorado-Utah as an example, the parameters estimated by the POME and ML methods are given in Table 4. The p-value, RMSD, and AIC values are also given in Table 4. The parameters obtained by the two methods are more or less the same, and the RMSD and AIC values based on the POME method are smaller.

4.4. Flood Frequency Analysis

The Hoover Dam is a multi-purpose dam, serving the needs of flood control, irrigation, water supply, and hydropower generation. Therefore, it was desired to determine the most appropriate distribution for FFA at the dam site. The T-year design flood at Hoover Dam was calculated using each distribution, as given in Table 5, and it can be seen that different distributions yielded significantly different values. For example, the 1000-year design flood values calculated by the GB2 and gamma distributions were 76,702 and 50,485 ft³/s, respectively. The RMSD and AIC values were 0.036 and 1098.8 for the GB2 distribution and 0.057 and 1192.9 for the gamma distribution, which indicates that the performance of the GB2 distribution is much better than that of the gamma distribution. It can be concluded that if the gamma distribution were used, the design flood would be underestimated and the potential flood risk would be higher.

4.5. Change in Flood Frequency Distribution with Change in Drainage Area

The GB2 distribution was applied for FFA along the main stem of the Colorado River. Four gauging stations (sites 4, 5, 6 and 7) from upstream to downstream were used, as shown in Figure 3 and Table 6. These gauging stations were selected because they are all on the mainstream and no dam has been built on this reach. The drainage area and statistical characteristics (including mean, skewness and kurtosis of the annual maximum data) of these stations were calculated, as given in Table 1. The T-year design floods at these gauging stations were calculated, as shown in Figure 5, in which the x-axis represents the return period and the y-axis represents the design flood value. Figure 5 shows that for a given return period, the design flood value at the downstream gauging stations is larger than that at the upstream gauging stations. The rates of increase of drainage area and T-year design flood values between adjacent gauging stations were computed, as given in Table 6, which indicates that the percentage increase of the drainage area was nearly the same as that of the design flood values. For instance, as the drainage area increased by 45% from the gauging station near Dotsero to that near Cameo, the flood value increased by 43% on average. From upstream to downstream, when the drainage area increased by 45%, 55% and 26%, the flood value increased by 43%, 42%, and 16%, respectively. It seems that in a mountainous watershed, the farther upstream the reach is, the greater the impact the drainage area has on floods. This may be because the runoff coefficient is generally larger in steep areas.

4.6. Evolution of Frequency Distribution along Stream

In order to determine the evolution of the frequency distribution and its parameters along the river, data from the Yampa River were used, because this river is regarded as one of the West’s last wild rivers and has only a few small dams and diversions. The Yampa River, with a length of 402 km, is located in northwestern Colorado and is a tributary of the Green River and a secondary tributary of the Colorado River. Data from three gauging stations along this river, designated as sites 1, 2 and 3 in Figure 6, were used. The GB2 distribution was used to fit the AM series of each of the three gauging stations, as shown in Table 7. It can be seen that shape parameters r1 and r2 decreased along the river, and the value of r1 approached 1. When r1 equals 1, the GB2 distribution becomes the Burr XII distribution [25]. This distribution has been shown to reasonably fit income distribution data [20,26,27] and has recently been used in hydrology [2,28]. The PDF of the Burr XII distribution can be written as:
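$$f(x) = \frac{r_3}{b\,B(1, r_2)} \left(\frac{x}{b}\right)^{1 \times r_3 - 1} \left(1 + \left(\frac{x}{b}\right)^{r_3}\right)^{-(r_2 + 1)} = \frac{r_3 r_2}{b} \left(\frac{x}{b}\right)^{r_3 - 1} \left(1 + \left(\frac{x}{b}\right)^{r_3}\right)^{-(r_2 + 1)} \tag{8}$$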
where b is the scale parameter. The Burr XII distribution was also used to fit the data at the gauging station near Maybell on the Yampa River. The estimated parameters of the Burr XII distribution were r2 = 1.94, r3 = 4.19, and b = 12.33. The fitting results of the GB2 and Burr XII distributions for the gauging station near Maybell are shown in Figure 7. For this station, the parameters of the GB2 distribution estimated by the POME method are nearly the same as the parameters of the Burr XII distribution estimated by the ML method. Thus, the Burr XII distribution can be used instead of the GB2 distribution for FFA at that station. In other words, the distribution for FFA changes from the four-parameter GB2 distribution to the three-parameter Burr XII distribution along the Yampa River; the distribution evolves along this river. From Equation (1), the value of the scale parameter β increases with the mean value, because more water flows into the stream. Parameters r1 and r2 govern the left and right tails, respectively. The smaller the value of r1, the fatter the left tail; the smaller the value of r2, the fatter the right tail. It can be seen from Table 7 that both r1 and r2 decrease along the stream, which demonstrates that both the left and right tails become fatter, and the PDF values become larger in the tails and lower in the central area.
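One practical consequence of the Burr XII form is that its CDF integrates in closed form, F(x) = 1 − (1 + (x/b)^r3)^(−r2), so a T-year quantile can be written explicitly. The sketch below follows from integrating the Burr XII density; it is not a procedure described in the study, the function names are hypothetical, and the units of b are not stated in this section, so the printed magnitudes should not be read as discharges in any particular unit.

```python
def burr12_cdf(x, b, r2, r3):
    """Burr XII CDF: F(x) = 1 - (1 + (x/b)^r3)^(-r2) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + (x / b) ** r3) ** (-r2)

def burr12_design_flood(T, b, r2, r3):
    """Quantile with annual exceedance probability 1/T (requires T > 1)."""
    if T <= 1:
        raise ValueError("return period T must exceed 1 year")
    exceedance = 1.0 / T
    # Solve (1 + (x/b)^r3)^(-r2) = 1/T for x
    return b * (exceedance ** (-1.0 / r2) - 1.0) ** (1.0 / r3)

# Parameters reported for the gauging station near Maybell
b, r2, r3 = 12.33, 1.94, 4.19
for T in (10, 100, 1000):
    print(T, burr12_design_flood(T, b, r2, r3))
```

The four-parameter GB2 distribution has no such elementary inverse; its quantiles would need the inverse of the regularized incomplete beta function or a numerical root search.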

5. Conclusions

The GB2 distribution provides sufficient flexibility to fit a large variety of data sets. Papalexiou and Koutsoyiannis introduced this distribution in hydrology and used it for rainfall frequency analysis [8]. In this study, the generalized beta distribution of the second kind (GB2) is introduced for FFA for the first time. The POME method was proposed to estimate the parameters of the GB2 distribution. The equations of the POME method were derived by the authors and are given in Appendix A. The Colorado River basin was selected as a case study to test the performance of the GB2 distribution. Frequency estimates from the GB2 distribution were also compared with those of commonly used distributions in hydrology. In addition, some characteristics of FFA in mountainous areas are discussed. The conclusions can be summarized as follows:
(1)
Results demonstrate that the GB2 distribution is appealing for FFA, since its four parameters allow it to fit data having very different histogram shapes, such as J-shaped and bell-shaped distributions. By fixing certain parameters, the GB2 distribution can yield some well-known distributions, such as the beta distribution of the second kind (B2), the Burr type XII, generalized gamma (GG), and so on.
(2)
The parameters estimated by the POME method are found to be reasonable. Both the marginal distributions and the histograms indicate that the GB2 distribution can be fitted successfully to empirical values using the POME method.
(3)
The performance of the GB2 distribution is better than that of the widely used distributions in hydrology. For the site at Steamboat Springs, the GB2 and generalized normal distributions have the smallest RMSD value. For the site near Cisco, the GB2 distribution has the smallest RMSD value. For the site near Colorado-Utah, the GB2 and gamma distributions have the smallest RMSD value. For the site at Hoover Dam, the GB2 distribution has the smallest RMSD value. Because the GB2 distribution has more parameters, its AIC values are larger than those of the generalized normal, gamma, and GEV distributions. Overall, the GB2 distribution generally gives a better fit.
(4)
When different distributions are used for FFA, significantly different design flood values are obtained. It can be concluded that if the wrong distribution were used, the design flood would be underestimated and the potential flood risk would be higher.
(5)
The design flood value increases with the drainage area. For a given return period, the design flood value at the downstream gauging stations is larger than that at the upstream gauging stations. In this study, the percentage increase of the drainage area was nearly the same as that of the design flood values. It seems that in a mountainous watershed, the farther upstream the reach is, the greater the impact the drainage area has on floods. This may be because the runoff coefficient is generally larger in steep areas.
(6)
The distribution evolves along the river. Along the Yampa River, the distribution for FFA changes from the four-parameter GB2 distribution to the three-parameter Burr XII distribution. Both r1 and r2 decrease along the stream, which demonstrates that both the left and right tails become fatter, and the PDF values become larger in the tails and lower in the central area. This means that as the drainage area becomes larger, the flood magnitudes show more significant variation.

Acknowledgments

The project was financially supported by the National Natural Science Foundation of China (51679094, 51509273, 91547208 and 41401018) and the Fundamental Research Funds for the Central Universities (2017KFYXJJ194, 2016YXZD048).

Author Contributions

Vijay P. Singh conceived and designed the experiments; Lu Chen performed the experiments and analyzed the data; Lu Chen wrote the draft of the paper and Vijay P. Singh revised it. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Estimation of Parameters of GB2 Distribution

The GB2 distribution parameters can be estimated by maximizing the Shannon entropy H(X) which, for a random variable X, can be expressed as:
$$H(X) = -\int_0^\infty f(x) \log f(x)\,dx \tag{A1}$$
where f(x) is the probability density function (PDF). The principle of maximum entropy (POME) indicates that the most appropriate PDF is the one that maximizes the value of entropy, given available data and a set of known constraints [29].
Specification of Constraints: Following Singh, the constraints for the GB2 distribution can be expressed as
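$$\int_0^\infty f(x)\,dx = 1 \tag{A2a}$$
$$\int_0^\infty f(x) \ln x\,dx = E(\ln x) \tag{A2b}$$
$$\int_0^\infty f(x) \ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right) dx = E\left(\ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right) \tag{A2c}$$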
Method of Lagrange Multipliers for Maximizing Entropy: In the search for an appropriate probability distribution for a given random variable, entropy should be maximized. In other words, the best fitted distribution is the one with the highest entropy. The method of Lagrange multipliers was used to obtain the appropriate probability distribution with the maximum entropy. Finally, the form of this distribution is given as:
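$$f(x) = \exp\left(-\lambda_0 - \lambda_1 \ln(x) - \lambda_2 \ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right) \tag{A3a}$$
in which λ0, λ1, and λ2 are the Lagrange multipliers. Let $p = \beta^{-r_3}$. Then, Equation (A3a) can be written as
$$f(x) = \exp\left(-\lambda_0 - \lambda_1 \ln(x) - \lambda_2 \ln\left(1 + p x^{r_3}\right)\right) \tag{A3b}$$
Replacing λ2 with λ2/p and letting q = r3, Papalexiou and Koutsoyiannis defined the entropy-based PDF as:
$$f(x) = \exp\left(-\lambda_0 - \lambda_1 \ln(x) - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) \tag{A4}$$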
Substitution of Equation (A4) in Equation (A2a) yields:
$$\int_0^\infty f(x)\,dx = \int_0^\infty \exp\left(-\lambda_0 - \lambda_1 \ln(x) - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) dx = 1 \tag{A5}$$
From Equation (A5):
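$$\exp(\lambda_0) = \int_0^\infty \exp\left(-\lambda_1 \ln x - \frac{\lambda_2 \ln(1 + p x^q)}{p}\right) dx = \int_0^\infty \exp(-\lambda_1 \ln x) \exp\left(-\frac{\lambda_2}{p} \ln(1 + p x^q)\right) dx = \int_0^\infty x^{-\lambda_1} (1 + p x^q)^{-\frac{\lambda_2}{p}}\,dx \tag{A6}$$
Let $t = p x^q$. Then $x = \left(\frac{t}{p}\right)^{\frac{1}{q}}$, and $dx = \frac{1}{pq}\left(\frac{t}{p}\right)^{\frac{1}{q} - 1} dt$. Thus, Equation (A6) can be expressed as:
$$\exp(\lambda_0) = \int_0^\infty x^{-\lambda_1} (1 + p x^q)^{-\frac{\lambda_2}{p}}\,dx = \int_0^\infty \left(\frac{t}{p}\right)^{-\frac{\lambda_1}{q}} (1 + t)^{-\frac{\lambda_2}{p}} \frac{1}{pq} \left(\frac{t}{p}\right)^{\frac{1}{q} - 1} dt = \int_0^\infty \frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}}\, t^{-\frac{\lambda_1}{q}} (1 + t)^{-\frac{\lambda_2}{p}}\, t^{\frac{1}{q} - 1}\,dt \tag{A7}$$
Let $y = \frac{t}{1 + t}$. Then $t = \frac{y}{1 - y}$, and $dt = \frac{1}{(1 - y)^2}\,dy$.
Since $y(0) = 0$ and $y(\infty) = 1$, $y \in [0, 1]$, so Equation (A7) becomes:
$$\begin{aligned}
\exp(\lambda_0) &= \int_0^1 \frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}} \left(\frac{y}{1 - y}\right)^{-\frac{\lambda_1}{q}} \left(1 + \frac{y}{1 - y}\right)^{-\frac{\lambda_2}{p}} \left(\frac{y}{1 - y}\right)^{\frac{1}{q} - 1} \frac{1}{(1 - y)^2}\,dy \\
&= \int_0^1 \frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}} \left(\frac{y}{1 - y}\right)^{\frac{-\lambda_1 + 1}{q} - 1} \left(1 + \frac{y}{1 - y}\right)^{-\frac{\lambda_2}{p}} \frac{1}{(1 - y)^2}\,dy \\
&= \int_0^1 \frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}} \left(\frac{y}{1 - y}\right)^{\frac{-\lambda_1 + 1}{q} - 1} \left(\frac{1}{1 - y}\right)^{-\frac{\lambda_2}{p} + 2} dy \\
&= \int_0^1 \frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}}\, y^{\frac{1 - \lambda_1}{q} - 1} (1 - y)^{-\frac{1 - \lambda_1}{q} + \frac{\lambda_2}{p} - 1}\,dy \\
&= \frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}}\, B\left(\frac{1 - \lambda_1}{q},\; -\frac{1 - \lambda_1}{q} + \frac{\lambda_2}{p}\right)
\end{aligned} \tag{A8}$$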
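The closed form for exp(λ0), namely (1/q) p^((λ1 − 1)/q) B(a, b) with a = (1 − λ1)/q and b = λ2/p − a, can be checked numerically. The sketch below compares it with a midpoint-rule approximation of the integral of x^(−λ1) (1 + p x^q)^(−λ2/p). The function names, the truncation at x = 200, and the test parameters are assumptions made for illustration; the multipliers are set from hypothetical GB2 parameters using the mapping λ1 = 1 − r1 q, λ2 = p (r1 + r2) derived later in this appendix.

```python
import math

def normalizer_closed_form(lam1, lam2, p, q):
    """(1/q) p^((lam1 - 1)/q) B(a, b), with a = (1 - lam1)/q and b = lam2/p - a."""
    a = (1.0 - lam1) / q
    b = lam2 / p - a
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (1.0 / q) * p ** ((lam1 - 1.0) / q) * math.exp(log_beta_fn)

def normalizer_numeric(lam1, lam2, p, q, upper=200.0, steps=200_000):
    """Midpoint rule for the integral of x^(-lam1) (1 + p x^q)^(-lam2/p) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h  # midpoints avoid evaluating the integrand at x = 0
        total += x ** (-lam1) * (1.0 + p * x ** q) ** (-lam2 / p)
    return total * h

# Hypothetical GB2 parameters: beta = 1.5, r1 = 2, r2 = 3, r3 = 2
beta, r1, r2, r3 = 1.5, 2.0, 3.0, 2.0
p, q = beta ** (-r3), r3
lam1, lam2 = 1.0 - r1 * q, p * (r1 + r2)
print(normalizer_closed_form(lam1, lam2, p, q))
print(normalizer_numeric(lam1, lam2, p, q))
```

For these values the integrand decays like x^(−7), so truncating the upper limit at 200 loses a negligible amount; heavier-tailed parameter choices would need a larger limit or a change of variable.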
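The Lagrange multiplier λ0 can be calculated from Equation (A8) as:
$$\lambda_0 = -\ln q + \frac{\lambda_1 - 1}{q} \ln(p) + \ln\Gamma\left(\frac{1 - \lambda_1}{q}\right) + \ln\Gamma\left(-\frac{1 - \lambda_1}{q} + \frac{\lambda_2}{p}\right) - \ln\Gamma\left(\frac{\lambda_2}{p}\right) \tag{A9}$$
From Equation (A4), the other equation for calculating λ0 can be defined as:
$$\lambda_0 = \ln\left(\int_0^\infty \exp\left(-\lambda_1 \ln x - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) dx\right) \tag{A10}$$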
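Relation between Lagrange multipliers and constraints: Defining $a = \frac{1-\lambda_1}{q}$ and $b = -\frac{1-\lambda_1}{q} + \frac{\lambda_2}{p}$, so that $a + b = \frac{\lambda_2}{p}$, differentiate Equation (A9) with respect to $\lambda_1$ and $\lambda_2$:
$$\frac{\partial \lambda_0}{\partial \lambda_1} = \frac{\ln p}{q} + \frac{\partial \ln \Gamma(a)}{\partial a}\frac{\partial a}{\partial \lambda_1} + \frac{\partial \ln \Gamma(b)}{\partial b}\frac{\partial b}{\partial \lambda_1} - \frac{\partial \ln \Gamma(a+b)}{\partial (a+b)}\frac{\partial (a+b)}{\partial \lambda_1} = \frac{\ln p}{q} - \frac{1}{q}\varphi(a) + \frac{1}{q}\varphi(b) \tag{A11a}$$
$$\frac{\partial \lambda_0}{\partial \lambda_2} = \frac{\partial \ln \Gamma(b)}{\partial b}\frac{\partial b}{\partial \lambda_2} - \frac{\partial \ln \Gamma(a+b)}{\partial (a+b)}\frac{\partial (a+b)}{\partial \lambda_2} = \frac{1}{p}\varphi(b) - \frac{1}{p}\varphi(a+b) \tag{A11b}$$
where $\varphi(\cdot)$ is the digamma function. Differentiate Equation (A10) with respect to $\lambda_1$ and $\lambda_2$:
$$\frac{\partial \lambda_0}{\partial \lambda_1} = -\frac{\int_0^\infty \ln x \, \exp\left(-\lambda_1 \ln x - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) dx}{\int_0^\infty \exp\left(-\lambda_1 \ln x - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) dx} = -E(\ln x) \tag{A12a}$$
$$\frac{\partial \lambda_0}{\partial \lambda_2} = -\frac{\int_0^\infty \frac{\ln\left(1 + p x^{q}\right)}{p} \exp\left(-\lambda_1 \ln x - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) dx}{\int_0^\infty \exp\left(-\lambda_1 \ln x - \frac{\lambda_2}{p} \ln\left(1 + p x^{q}\right)\right) dx} = -E\left(\frac{\ln\left(1 + p x^{q}\right)}{p}\right) \tag{A12b}$$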
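Based on Equations (A11) and (A12), the relation between Lagrange multipliers and constraints can be expressed as:
$$\frac{\ln p}{q} - \frac{1}{q}\varphi(a) + \frac{1}{q}\varphi(b) = -E(\ln x) \tag{A13a}$$
$$\frac{1}{p}\varphi(b) - \frac{1}{p}\varphi(a+b) = -E\left(\frac{\ln\left(1 + p x^{q}\right)}{p}\right) \tag{A13b}$$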
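Equations (A13a) and (A13b) can be verified numerically for a chosen parameter set. The sketch below is a minimal standard-library Python check, assuming $a = r_1$, $b = r_2$, $p = \beta^{-r_3}$ and $q = r_3$, under which the two relations reduce to $E(\ln x) = \ln \beta + [\varphi(r_1) - \varphi(r_2)]/r_3$ and $E(\ln(1 + (x/\beta)^{r_3})) = \varphi(r_1 + r_2) - \varphi(r_2)$. The helper names (`digamma`, `beta_expectation`) and the parameter values are assumptions made for illustration; the digamma routine is a simple recurrence plus asymptotic series, not a library call.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence up to x >= 6, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def beta_expectation(g, r1, r2, n=20_000):
    """Midpoint-rule estimate of E[g(Y)] for Y ~ Beta(r1, r2)."""
    log_b = math.lgamma(r1) + math.lgamma(r2) - math.lgamma(r1 + r2)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        weight = math.exp((r1 - 1.0) * math.log(y)
                          + (r2 - 1.0) * math.log(1.0 - y) - log_b)
        total += g(y) * weight
    return total * h

# Assumed illustrative parameter values (not taken from the study).
r1, r2, r3, beta = 2.0, 3.0, 1.5, 2.0

# With y = t / (1 + t) and t = (x / beta)^r3, x = beta * (y / (1 - y))^(1 / r3).
mean_ln_x = beta_expectation(
    lambda y: math.log(beta) + math.log(y / (1.0 - y)) / r3, r1, r2)
mean_ln_1pt = beta_expectation(lambda y: -math.log(1.0 - y), r1, r2)

rhs_a = math.log(beta) + (digamma(r1) - digamma(r2)) / r3
rhs_b = digamma(r1 + r2) - digamma(r2)
print(mean_ln_x, rhs_a)
print(mean_ln_1pt, rhs_b)
```

Each printed pair should agree closely, which is a useful sanity check on the signs in the two relations before they are used for parameter estimation.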
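Since there are four parameters, Equations (A13a) and (A13b) are not sufficient for calculating the parameters, and two additional equations are needed. These follow from the second derivatives of $\lambda_0$:
$$\frac{\partial^2 \lambda_0}{\partial \lambda_1^2} = \frac{1}{q^2}\varphi'(a) + \frac{1}{q^2}\varphi'(b) = \operatorname{var}(\ln x) \tag{A14a}$$
$$\frac{\partial^2 \lambda_0}{\partial \lambda_2^2} = \frac{1}{p^2}\left[\varphi'(b) - \varphi'(a+b)\right] = \operatorname{var}\left(\frac{\ln\left(1 + p x^{q}\right)}{p}\right) \tag{A14b}$$
where $\varphi'(\cdot)$ is the trigamma function. With $a = r_1$, $b = r_2$ and $p = \beta^{-q}$, the factor $1/p^2$ cancels on both sides of Equation (A14b), which is therefore equivalent to $\varphi'(r_2) - \varphi'(r_1 + r_2) = \operatorname{var}\left(\ln\left(1 + \left(\frac{x}{\beta}\right)^{q}\right)\right)$.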
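Relation between Lagrange multipliers and parameters: Substituting Equation (A8) into Equation (A4) gives:
$$f(x) = \frac{1}{\frac{1}{q}\, p^{\frac{\lambda_1 - 1}{q}}\, B\left(\frac{1-\lambda_1}{q},\; -\frac{1-\lambda_1}{q}+\frac{\lambda_2}{p}\right)}\; x^{-\lambda_1} \left(1 + p x^{q}\right)^{-\lambda_2/p} \tag{A15}$$
Equation (A15) is the GB2 distribution. Comparing Equation (1) with Equation (A15), the following equations can be obtained:
$$\lambda_1 = 1 - r_1 q, \qquad \lambda_2 = p\left(r_2 + \frac{1-\lambda_1}{q}\right), \qquad p = \left(\frac{1}{\beta}\right)^{r_3}, \qquad q = r_3 \tag{A16}$$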
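The mapping between the GB2 parameters and the Lagrange multipliers can be illustrated with a short standard-library Python sketch. It evaluates the entropy-based form of Equation (A4), with $\lambda_0$ from Equation (A9), and compares it with the standard GB2 density $f(x) = \frac{r_3}{\beta B(r_1, r_2)} (x/\beta)^{r_1 r_3 - 1} [1 + (x/\beta)^{r_3}]^{-(r_1 + r_2)}$, which is assumed here to be the form given in Equation (1). The function names are hypothetical; the parameter values are those reported in Table 2 for the Near Dotsero station, and the evaluation points are arbitrary.

```python
import math

def multipliers_from_parameters(r1, r2, r3, beta):
    """Map GB2 parameters to (lam0, lam1, lam2, p, q) following Equation (A16)."""
    q = r3
    p = (1.0 / beta) ** r3
    lam1 = 1.0 - r1 * q
    lam2 = p * (r2 + (1.0 - lam1) / q)
    # lam0 from Equation (A9), where a = (1 - lam1) / q and b = lam2 / p - a.
    a = (1.0 - lam1) / q
    b = lam2 / p - a
    lam0 = (-math.log(q) + (lam1 - 1.0) / q * math.log(p)
            + math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return lam0, lam1, lam2, p, q

def pdf_entropy_form(x, lam0, lam1, lam2, p, q):
    """Equation (A4): exp(-lam0 - lam1 ln x - (lam2 / p) ln(1 + p x^q))."""
    return math.exp(-lam0 - lam1 * math.log(x)
                    - lam2 / p * math.log1p(p * x ** q))

def pdf_gb2(x, r1, r2, r3, beta):
    """Standard GB2 density (assumed to correspond to Equation (1))."""
    log_b = math.lgamma(r1) + math.lgamma(r2) - math.lgamma(r1 + r2)
    t = (x / beta) ** r3
    return math.exp(math.log(r3) - math.log(beta) - log_b
                    + (r1 * r3 - 1.0) * math.log(x / beta)
                    - (r1 + r2) * math.log1p(t))

# Parameters reported in Table 2 for the Near Dotsero station.
r1, r2, r3, beta = 1.58, 60.30, 1.75, 85.11
multipliers = multipliers_from_parameters(r1, r2, r3, beta)

points = [1.0, 5.0, 10.0, 20.0, 50.0]  # arbitrary evaluation points
pairs = [(pdf_entropy_form(x, *multipliers), pdf_gb2(x, r1, r2, r3, beta))
         for x in points]
for x, (f_entropy, f_gb2) in zip(points, pairs):
    print(x, f_entropy, f_gb2)
```

The two columns should coincide up to rounding, which confirms that Equation (A15) with the substitutions of Equation (A16) reproduces the GB2 density under the stated assumption about Equation (1).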
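Relation between parameters and constraints: Combining the relation between Lagrange multipliers and constraints with the relation between Lagrange multipliers and parameters, the relation between parameters and constraints can be expressed as:
$$\begin{aligned}
-\ln \beta - \frac{1}{r_3}\varphi(r_1) + \frac{1}{r_3}\varphi(r_2) &= -E(\ln x) \\
\beta^{r_3}\varphi(r_2) - \beta^{r_3}\varphi(r_1 + r_2) &= -E\left(\beta^{r_3} \ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right) \\
\frac{1}{r_3^2}\varphi'(r_1) + \frac{1}{r_3^2}\varphi'(r_2) &= \operatorname{var}(\ln x) \\
\varphi'(r_2) - \varphi'(r_1 + r_2) &= \operatorname{var}\left(\ln\left(1 + \left(\frac{x}{\beta}\right)^{r_3}\right)\right)
\end{aligned} \tag{A17}$$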
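In practice, the four relations between parameters and constraints are solved numerically for $r_1$, $r_2$, $r_3$ and $\beta$ after replacing the expectations and variances by their sample counterparts. The Python sketch below does not implement a solver; it only evaluates the four residuals (sample value minus theoretical value, with common factors such as $\beta^{r_3}$ cancelled) that a root-finding routine would drive to zero. The function names, the synthetic sample, and the parameter values are assumptions for illustration, and the digamma and trigamma routines are simple series approximations rather than library calls.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0 (recurrence up to x >= 6, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def trigamma(x):
    """Trigamma for x > 0 (recurrence up to x >= 6, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv2 * inv * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + tail

def mean_and_variance(values):
    """Sample mean and (population) variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((u - m) ** 2 for u in values) / n

def pome_residuals(sample, r1, r2, r3, beta):
    """Sample constraint minus theoretical value for each of the four relations."""
    m1, v1 = mean_and_variance([math.log(x) for x in sample])
    m2, v2 = mean_and_variance([math.log1p((x / beta) ** r3) for x in sample])
    return (
        m1 - (math.log(beta) + (digamma(r1) - digamma(r2)) / r3),
        m2 - (digamma(r1 + r2) - digamma(r2)),
        v1 - (trigamma(r1) + trigamma(r2)) / r3 ** 2,
        v2 - (trigamma(r2) - trigamma(r1 + r2)),
    )

def sample_gb2(n, r1, r2, r3, beta, rng):
    """Synthetic GB2 variates: x = beta * (y / (1 - y))^(1 / r3), y ~ Beta(r1, r2)."""
    draws = []
    for _ in range(n):
        y = rng.betavariate(r1, r2)
        draws.append(beta * (y / (1.0 - y)) ** (1.0 / r3))
    return draws

rng = random.Random(0)                   # fixed seed for reproducibility
true_params = (2.0, 3.0, 1.5, 2.0)       # assumed values, not from the study
sample = sample_gb2(50_000, *true_params, rng)

res_true = pome_residuals(sample, *true_params)
res_wrong = pome_residuals(sample, 2.0, 3.0, 1.5, 4.0)  # beta deliberately doubled
print(res_true)
print(res_wrong)
```

At the generating parameters all four residuals should be close to zero, up to sampling error, whereas a wrong scale parameter produces clearly non-zero residuals. A nonlinear solver would be wrapped around `pome_residuals` to obtain the POME estimates.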

References

  1. Beven, K.J.; Hornberger, G.M. Assessing the effect of spatial pattern of precipitation in modeling streamflow hydrographs. J. Am. Water Resour. Assoc. 1982, 18, 823–829.
  2. Shao, Q.; Wong, H.; Xia, J.; Ip, W. Models for extremes using the extended three-parameter Burr XII system with application to flood frequency analysis. Hydrol. Sci. J. 2004, 49, 685–702.
  3. Chen, L.; Guo, S.L.; Yan, B.W.; Liu, P.; Fang, B. A new seasonal design flood method based on bivariate joint distribution of flood magnitude and date of occurrence. Hydrol. Sci. J. 2010, 55, 1264–1280.
  4. Chen, L.; Singh, V.P.; Guo, S.; Hao, Z.; Li, T. Flood coincidence risk analysis using multivariate copula functions. J. Hydrol. Eng. 2012, 17, 742–755.
  5. Mielke, P.W., Jr.; Johnson, E.S. Some generalized beta distributions of the second kind having desirable application features in hydrology and meteorology. Water Resour. Res. 1974, 10, 223–226.
  6. Wilks, D.S. Comparison of three-parameter probability distributions for representing annual extreme and partial duration precipitation series. Water Resour. Res. 1993, 29, 3543–3549.
  7. Perreault, L.; Bobée, B.; Rasmussen, P. Halphen distribution system. I: Mathematical and statistical properties. J. Hydrol. Eng. 1999, 4, 189–199.
  8. Papalexiou, S.M.; Koutsoyiannis, D. Entropy based derivation of probability distributions: A case study to daily rainfall. Adv. Water Resour. 2012, 45, 51–57.
  9. Singh, V.P. Entropy Based Parameter Estimation in Hydrology; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1998.
  10. Singh, V.P. Entropy-Based Parameter Estimation Hydrology; Springer: Dordrecht, The Netherlands, 1998.
  11. Singh, V.P.; Guo, H. Parameter estimation for 3-parameter generalized Pareto distribution by the principle of maximum entropy (POME). Hydrol. Sci. J. 1995, 40, 165–181.
  12. Karim, A.; Chowdhury, J.U. A comparison of four distributions used in flood frequency analysis in Bangladesh. Hydrol. Sci. J. 1995, 40, 55–66.
  13. Natural Environment Research Council. Flood Studies Report; Natural Environment Research Council: London, UK, 1975; Volumes 1–5.
  14. Zhang, L.; Singh, V.P. Bivariate flood frequency analysis using the copula method. J. Hydrol. Eng. 2006, 11, 150–164.
  15. Ross, R. Graphical method for plotting and evaluating Weibull distribution data. In Proceedings of the 4th International Conference on Properties and Application of Dielectric Materials, Brisbane, Australia, 3–8 July 1994; pp. 250–253.
  16. Cunnane, C. Unbiased plotting positions—A review. J. Hydrol. 1978, 37, 205–222.
  17. Makkonen, L. Notes and correspondence plotting positions in extreme value analysis. J. Appl. Meteorol. Clim. 2006, 45, 334–340.
  18. Shabri, A. A Comparison of plotting formulas for the pearson type III distribution. J. Technol. 2002, 36, 61–74.
  19. Gringorten, I.I. A plotting rule for extreme probability paper. J. Geophys. Res. 1963, 68, 813–814.
  20. Dagum, C. A New Model of Personal Income Distribution: Specification and Estimation. In Modeling Income Distributions and Lorenz Curves; Springer: New York, NY, USA, 2008; pp. 3–25.
  21. Mehdi, F.; Mehdi, J. Determination of plotting position formula for the normal, log-normal, pearson(III), log-pearson(III) and gumble distribution hypotheses using the probability plot correlation coefficient test. World Appl. Sci. J. 2011, 15, 1181–1185.
  22. Kim, S.; Shin, H.; Kim, T.; Taesoon, K.; Heo, J. Derivation of the probability plot correlation coefficient test statistics for the generalized logistic distribution. In Proceedings of the International Workshop Advances in Statistical Hydrology, Taormina, Italy, 23–25 May 2010; pp. 1–8.
  23. Singh, V.P.; Guo, H. Parameter estimation for 2-parameter log-logistic distribution (LLD2) by maximum entropy. Civ. Eng. Syst. 1995, 12, 343–357.
  24. Singh, V.P.; Guo, H. Parameter estimations for 2-parameter Pareto distribution by pome. Stoch. Hydrol. Hydraul. 1980, 9, 81–93.
  25. Burr, I.W. Cumulative Frequency Functions. Ann. Math. Stat. 1942, 13, 215–232.
  26. Kleiber, C.; Kotz, S. Statistical Size Distributions in Economics and Actuarial Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2003.
  27. Singh, S.K.; Maddala, G.S. A function for size distribution of incomes. Econometrica 1976, 44, 963–970.
  28. Hao, Z.; Singh, V.P. Entropy-based parameter estimation for extended Burr XII distribution. Stoch. Environ. Res. Risk Assess. 2008, 23, 1113–1122.
  29. Singh, V.P. Hydrologic synthesis using entropy theory: Review. J. Hydrol. Eng. 2011, 16, 421–433.
Figure 1. Shapes of PDF of GB2 distribution.
Figure 2. The GB2 distribution and its special cases (where BR12 means the Burr XII distribution; BR3 means the Burr III distribution; B2 means the beta distribution of second kind; Fisk means log-logistic distribution; L means the Lomax distribution; IL means inverse Lomax distribution; GA distribution means the gamma distribution; GN means the generalized normal distribution; W means the Weibull distribution and EXP means the exponential distribution).
Figure 3. Locations of gauging stations on the Colorado River.
Figure 4. Marginal distributions and histograms of AM flood peak series fitted by the GB2 distribution for the gauging stations on the Colorado River. (a) Steamboat Springs; (b) Near Colorado-Utah; (c) Near Cisco; (d) Hoover Dam.
Figure 5. Flood values along the mainstream of the upper Colorado River.
Figure 6. Evaluations of PDF of sites along the Yampa River.
Figure 7. Marginal distribution and histograms of AM flood peak series fitted by the GB2 and Burr XII distributions for the gauging station near Maybell on the Yampa River.
Table 1. Characteristics of the gauging stations used in the study.

| River | No. | Gauging Station | Drainage Area (Square Miles) | Length of Data | Mean Value (ft³/s) | Standard Deviation (ft³/s) | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| Yampa River | 1 | Below Stagecoach Reservoir | 228 | 1957–2014 | 315 | 189 | 0.91 | 3.49 |
| Yampa River | 2 | Steamboat Springs | 567 | 1904–2013 | 3630 | 1115 | 0.26 | 2.94 |
| Yampa River | 3 | Near Maybell | 3383 | 1904–2013 | 10,419 | 3657 | 0.90 | 4.88 |
| Colorado River | 4 | Near Dotsero | 4390 | 1941–2013 | 9870 | 4450 | 0.39 | 2.59 |
| Colorado River | 5 | Near Cameo | 7986 | 1934–2013 | 19,049 | 7687 | 0.26 | 2.68 |
| Colorado River | 6 | Near Colorado-Utah | 17,847 | 1951–2013 | 26,714 | 13,936 | 0.84 | 3.53 |
| Colorado River | 7 | Near Cisco | 24,100 | 1884–2013 | 34,329 | 16,520 | 0.36 | 2.31 |
| Colorado River | 8 | Hoover Dam | 171,700 | 1934–2013 | 26,131 | 6831 | 1.37 | 5.83 |
Table 2. Parameters of the GB2 distribution for the gauging stations along the Colorado River.

| Number | Location | r1 | r2 | r3 | β |
|---|---|---|---|---|---|
| 4 | Near Dotsero | 1.58 | 60.30 | 1.75 | 85.11 |
| 5 | Near Cameo | 1.12 | 77.57 | 2.53 | 112.93 |
| 6 | Near Colorado-Utah | 3.94 | 83.08 | 0.94 | 69.05 |
| 7 | Near Cisco | 2.73 | 76.82 | 1.07 | 80.90 |
| 8 | Hoover Dam | 10.59 | 434.72 | 1.31 | 43.62 |
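Table 3. RMSE and AIC values of different distributions.

| Number | Distribution | Steamboat Springs p-Value | Steamboat Springs RMSE | Steamboat Springs AIC | Near Cisco p-Value | Near Cisco RMSE | Near Cisco AIC | Near Colorado-Utah p-Value | Near Colorado-Utah RMSE | Near Colorado-Utah AIC | Hoover Dam p-Value | Hoover Dam RMSE | Hoover Dam AIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GB2 | 0.976 | 0.025 | 924.1 | 0.991 | 0.061 | 1384.1 | 1 | 0.047 | 852.7 | 1 | 0.036 | 1098.8 |
| 2 | Normal | 0.926 | 0.043 | 992.9 | 0.787 | 0.194 | 1502 | 0.839 | 0.221 | 1031 | 0.436 | 0.081 | 1236.2 |
| 3 | Exponential | 0.409 | 0.122 | 1306.8 | 0.336 | 0.171 | 1669.6 | 0.839 | 0.145 | 1036.4 | 0.919 | 0.055 | 1152.6 |
| 4 | Gamma | 1 | 0.045 | 1005.1 | 0.959 | 0.064 | 1512.8 | 1 | 0.047 | 842.5 | 0.692 | 0.057 | 1192.9 |
| 5 | Gumbel | 0.976 | 0.066 | 1143.5 | 0.959 | 0.088 | 1546.2 | 1 | 0.107 | 869.1 | 0.978 | 0.039 | 1122.3 |
| 6 | Generalized normal | 0.844 | 0.025 | 922.9 | 0.991 | 0.137 | 1455.5 | 1 | 0.083 | 852.8 | 0.978 | 0.039 | 1106.7 |
| 7 | Pearson type III | 0.976 | 0.035 | 953.2 | 0.991 | 0.114 | 25 | 1 | 0.054 | 895.6 | 1 | 0.058 | 1146.8 |
| 8 | Log Pearson type III | 0.976 | 0.034 | 951.3 | 0.991 | 0.106 | 1431.5 | 1 | 0.054 | 893.1 | 1 | 0.052 | 1133 |
| 9 | Generalized Pareto | 0.976 | 0.078 | 1158 | 0.991 | 0.062 | 1386.1 | 1 | 0.09 | 960.2 | 1 | 0.054 | 1169 |
| 10 | GEV | 0.976 | 0.027 | 929.6 | 0.991 | 0.138 | 1450.1 | 1 | 0.128 | 865.5 | 1 | 0.036 | 1096.8 |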
Table 4. Parameters estimated by POME and ML methods for site Near Colorado-Utah.

| Method | r1 | r2 | r3 | β | p-Value | RMSE | AIC |
|---|---|---|---|---|---|---|---|
| POME | 2.14 | 24.78 | 1.40 | 157.18 | 1 | 0.0169 | −357.76 |
| ML | 2.26 | 30.85 | 1.35 | 158.55 | 1 | 0.0170 | −357.80 |
Table 5. Comparison of T-year design flood discharges (10³ ft³/s) calculated by different distributions for the Hoover Dam site.

| Number | Distribution | 1000-Year | 500-Year | 100-Year | 50-Year | 10-Year |
|---|---|---|---|---|---|---|
| 1 | GB2 | 76.702 | 67.914 | 51.198 | 34.138 | 30.125 |
| 2 | Normal | 45.800 | 44.451 | 40.938 | 34.288 | 31.488 |
| 3 | Exponential | 68.561 | 63.583 | 52.024 | 35.486 | 30.508 |
| 4 | Gamma | 50.485 | 48.424 | 43.314 | 34.613 | 31.320 |
| 5 | Gumbel | 58.926 | 55.332 | 46.973 | 34.799 | 30.912 |
| 6 | Generalized normal | 50.513 | 49.271 | 45.325 | 35.732 | 31.485 |
| 7 | Pearson type III | 60.025 | 56.451 | 47.985 | 35.145 | 30.926 |
| 8 | Log Pearson type III | 69.568 | 64.494 | 52.713 | 35.639 | 31.858 |
| 9 | Generalized Pareto | 64.809 | 59.870 | 49.084 | 34.893 | 30.695 |
| 10 | GEV | 57.809 | 54.766 | 47.324 | 35.270 | 31.072 |
Table 6. Statistical characteristics of the four gauging stations, the increasing rate of drainage area and flood discharge between adjacent gauging stations.

| Number | Location | Drainage Area (km²) | Increase in Drainage Area (%) | Increase in Flood Value (%), 1000-Year | Increase in Flood Value (%), 500-Year | Increase in Flood Value (%), 100-Year | Increase in Flood Value (%), 50-Year | Increase in Flood Value (%), 10-Year | Increase in Flood Value (%), Mean |
|---|---|---|---|---|---|---|---|---|---|
| 4 | Near Dotsero | 11370 | 45 | 40 | 41 | 42 | 47 | 46 | 43 |
| 5 | Near Cameo | 20683 | 55 | 50 | 48 | 44 | 32 | 35 | 42 |
| 6 | Near Colorado-Utah | 46228 | 26 | 11 | 12 | 15 | 22 | 20 | 16 |
| 7 | Near Cisco | 62419 | | | | | | | |
Table 7. Parameters of the GB2 distribution for four gauging stations along the Yampa River.

| Number | Location | r1 | r2 | r3 | β |
|---|---|---|---|---|---|
| 1 | Below Stagecoach Reservoir | 17.44 | 15.25 | 0.55 | 2.10 |
| 2 | Steamboat Springs | 1.20 | 5.49 | 3.59 | 5.81 |
| 3 | Near Maybell | 1.14 | 2.07 | 3.92 | 12.11 |

Share and Cite

MDPI and ACS Style

Chen, L.; Singh, V.P. Generalized Beta Distribution of the Second Kind for Flood Frequency Analysis. Entropy 2017, 19, 254. https://doi.org/10.3390/e19060254

