Introduction

These study notes are based on three Exam 7 syllabus readings related to stochastic methods for loss reserving: Using the ODP Bootstrap Model: A Practitioner’s Guide by Mark Shapland, Obtaining Predictive Distributions for Reserves Which Incorporate Expert Opinion by R.J. Verrall, and Stochastic Loss Reserving Using Bayesian MCMC Models by Glenn Meyers.

easypackages::packages("data.table", "DT", "ggplot2", "statmod")
## Loading required package: data.table
## Loading required package: DT
## Loading required package: ggplot2
## Loading required package: statmod
## All packages loaded successfully

Notation

  • \(w\) denotes accident periods
  • \(d\) denotes development age
  • The data triangle has \(n\) rows and \(n\) columns
  • \(c(w,d)\) is the cumulative loss for accident year \(w\) as of development age \(d\)
  • \(U(w) = c(w,n)\) is the ultimate claim value for accident year \(w\) (assuming no tail development)
  • \(q(w,d)\) is the incremental loss for accident year \(w\) from age \(d-1\) to age \(d\)
  • \(m_{w,d}\) is a GLM estimate of \(q(w,d)\), and \(r_{w,d}\) is the corresponding unscaled Pearson residual
  • \(R(w) = U(w) - c(w,d)\) is the future development (reserve) after age \(d\) for accident year \(w\).
  • \(f(d)\) is a factor applied to cumulative losses to estimate the next incremental loss, i.e. \(q(w,d+1) = f(d)c(w,d)\).
  • \(F(d)\) is a factor applied to cumulative losses to estimate the next cumulative loss, i.e. \(c(w, d+1) = F(d)c(w,d)\).
  • \(h(k)\) is a factor relating to the diagonal \(k\) along which \(w+d\) is constant.
  • \(e(w,d)\) is the error in the estimate for accident year \(w\) and development age \(d\).
  • For a random variable \(x\), the notation \(x^*\) denotes a sample of that random variable.

Model Specifications

ODP formulation of the Chain Ladder Method

This approach involves specifying a Generalized Linear Model whose results match those from the traditional chain ladder method. The correspondence relies on identifying an implicit assumption behind the chain ladder method: that each accident year has a parameter representing its relative level, namely, the cumulative loss on the most recent diagonal. A variation on the chain ladder method assumes that accident years are homogeneous, and the level parameter for each accident year can be estimated using the column averages: \[ \frac{\sum_{1\leq w \leq n-d+1} c(w,d)}{n-d+1} \] These estimates would replace the actual observed values along the diagonal when calculating estimates.

The Over-Dispersed Poisson Model models incremental claims using a GLM with a log link function and overdispersed Poisson error distribution. In other words, \[ E[q(w,d)] = m_{w,d} \] \[ \mathrm{Var}(q(w,d)) = \phi m_{w,d}^z \] and \[ \log(m_{w,d}) = \eta_{w,d} = c + \alpha_w + \beta_d \] where the parameters \(\alpha\) denote the relative level of each accident year, and \(\beta\) denotes the development pattern. (Note that in this formulation, \(\beta_d\) is treated as a cumulative factor. Shapland’s model matrices suggest that it is actually being treated as an incremental factor, but the resulting models should be the same.)
  • In this formulation, we assume that \(w=1\) and \(d=1\) are the base levels, so \(\alpha_1=0\) and \(\beta_1=0\). In an alternate formulation, the constant \(c\) can be removed and \(\alpha_1\) is no longer required to be 0.
  • The ODP model corresponds to \(z=1\). Alternative options include \(z=0\) (Normal), \(z=2\) (Gamma), and \(z=3\) (inverse Gaussian)
  • The model is generally only applied to paid losses, because incurred losses can have negative incremental values which require special treatment.

Note that due to the log link function, the ODP bootstrap model is equivalent to a model in which \(q(w,d)\) has mean \(x_wy_d\), where by normalization we can assume \(\sum_{1\leq d \leq n} y_d = 1\). In this formulation, \(x_w\) represents the expected ultimate cumulative losses for accident year \(w\), and \(y_d\) is the proportion of losses that emerge in each development period. The overdispersed Poisson assumption can also be replaced with an assumption of an overdispersed negative binomial distribution.

The modelling approach can be illustrated using the following example from one of Shapland’s spreadsheets:

shapland.table.6 = fread("Stochastic Loss Reserving/shapland-6_year_table.csv")
shapland.table.6[, d_factor := as.factor(d)]
shapland.table.6[, w_factor := as.factor(w)]
datatable(shapland.table.6)

Fitting the model without an intercept, and assuming that \(\beta_d\) are cumulative factors, the model is as follows:

glm.CL.cumulative = glm(incremental_loss ~ w_factor + d_factor + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.CL.cumulative)
## 
## Call:
## glm(formula = incremental_loss ~ w_factor + d_factor + 0, family = quasipoisson(link = "log"), 
##     data = shapland.table.6)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0493  -1.3869   0.3249   0.7592   1.6734  
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## w_factor1   4.6929     0.1249  37.576 4.25e-12 ***
## w_factor2   4.6693     0.1261  37.033 4.91e-12 ***
## w_factor3   4.7175     0.1248  37.792 4.01e-12 ***
## w_factor4   4.7301     0.1291  36.636 5.46e-12 ***
## w_factor5   4.7791     0.1320  36.204 6.14e-12 ***
## w_factor6   4.8283     0.1474  32.761 1.66e-11 ***
## d_factor2  -0.8473     0.1271  -6.665 5.60e-05 ***
## d_factor3  -1.5892     0.1896  -8.384 7.79e-06 ***
## d_factor4  -1.4100     0.2025  -6.963 3.89e-05 ***
## d_factor5  -2.3786     0.3809  -6.244 9.58e-05 ***
## d_factor6  -3.0834     0.7474  -4.125  0.00206 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 2.715116)
## 
##     Null deviance: 7375.114  on 21  degrees of freedom
## Residual deviance:   27.969  on 10  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

In this case, model predictions can be produced by exponentiating the sum of the corresponding \(w\) factor and \(d\) factor:

shapland.table.6[, GLM_prediction := predict(glm.CL.cumulative, type = "response")]
datatable(shapland.table.6[, .(w, d, incremental_loss, GLM_prediction)])

For example, the prediction for accident year 2 and development age 3 can be calculated as follows:

alpha_2 = glm.CL.cumulative$coefficients['w_factor2']
beta_3 = glm.CL.cumulative$coefficients['d_factor3']
q_2_3_estimate = exp(alpha_2 + beta_3)
print(paste0("The estimate for q(2,3) is ", q_2_3_estimate))
## [1] "The estimate for q(2,3) is 21.7607604017224"

The formulation involving incremental factors requires an indicator variable for each development age that is less than or equal to the age of the observation – in other words, each indicator is equal to one when the corresponding incremental development factor contributes to the cumulative factor.

shapland.table.6[, beta_2_ind := ifelse(d >= 2, 1, 0)]
shapland.table.6[, beta_3_ind := ifelse(d >= 3, 1, 0)]
shapland.table.6[, beta_4_ind := ifelse(d >= 4, 1, 0)]
shapland.table.6[, beta_5_ind := ifelse(d >= 5, 1, 0)]
shapland.table.6[, beta_6_ind := ifelse(d >= 6, 1, 0)]
datatable(shapland.table.6[,.(w, d, incremental_loss, beta_2_ind, beta_3_ind, beta_4_ind, beta_5_ind, beta_6_ind)])

The model using this specification is:

glm.CL.incremental = glm(incremental_loss ~ w_factor + beta_2_ind + beta_3_ind + beta_4_ind + beta_5_ind + beta_6_ind + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.CL.incremental)
## 
## Call:
## glm(formula = incremental_loss ~ w_factor + beta_2_ind + beta_3_ind + 
##     beta_4_ind + beta_5_ind + beta_6_ind + 0, family = quasipoisson(link = "log"), 
##     data = shapland.table.6)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0493  -1.3869   0.3249   0.7592   1.6734  
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## w_factor1    4.6929     0.1249  37.576 4.25e-12 ***
## w_factor2    4.6693     0.1261  37.033 4.91e-12 ***
## w_factor3    4.7175     0.1248  37.792 4.01e-12 ***
## w_factor4    4.7301     0.1291  36.636 5.46e-12 ***
## w_factor5    4.7791     0.1320  36.204 6.14e-12 ***
## w_factor6    4.8283     0.1474  32.761 1.66e-11 ***
## beta_2_ind  -0.8473     0.1271  -6.665 5.60e-05 ***
## beta_3_ind  -0.7419     0.2059  -3.603  0.00482 ** 
## beta_4_ind   0.1792     0.2558   0.701  0.49948    
## beta_5_ind  -0.9686     0.4147  -2.336  0.04166 *  
## beta_6_ind  -0.7048     0.8277  -0.852  0.41436    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 2.715116)
## 
##     Null deviance: 7375.114  on 21  degrees of freedom
## Residual deviance:   27.969  on 10  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

Predictions from this model are as follows. (The “hat values” are explained later, but are produced now for convenience.)

shapland.table.6[, GLM_prediction_v2 := predict(glm.CL.incremental, type = "response")]
shapland.table.6[, GLM_hat := hatvalues(glm.CL.incremental)]
datatable(shapland.table.6[, .(w, d, incremental_loss, GLM_prediction, GLM_prediction_v2)])

Note that, rounding differences aside, both formulations produce the same predictions.

Equivalence of GLM formulation and Chain Ladder

The ODP formulation produces estimates that align with the Chain Ladder estimates of historical observations produced using the following approach:
  1. Calculate volume-weighted age-to-age and age-to-ultimate factors.
  2. Using the latest cumulative diagonal, multiply by the ratio of the age-to-ultimate factor for that diagonal, to the age-to-ultimate factor for each previous diagonal. (Alternately: divide by previous age-to-age factors, going backward.) This produces estimates of the past cumulative values.
  3. Convert the cumulative estimates to incremental estimates.

To illustrate this approach, the age-to-ultimate factors calculated by Shapland for this table are:

age.to.ultimate.6 = data.table(d = 1:6, age_to_ult = c(2.016, 1.411, 1.234, 1.073, 1.023, 1.00))
datatable(age.to.ultimate.6)

Join these to the table and identify the values corresponding to the most recent diagonal:

shapland.table.6 = merge(shapland.table.6, age.to.ultimate.6, by = "d", all.x = TRUE)
latest.diagonal.6 = shapland.table.6[w + d == 7, .(w, latest_diagonal_loss = cumulative_loss, latest_diagonal_LDF = age_to_ult)]
shapland.table.6 = merge(shapland.table.6, latest.diagonal.6, by = "w")
shapland.table.6[, cl_cumulative_estimate := latest_diagonal_loss * latest_diagonal_LDF / age_to_ult]
shapland.table.6[, cl_prev_cumulative_estimate := shift(cl_cumulative_estimate, type = "lag"), by = "w"]
shapland.table.6[is.na(cl_prev_cumulative_estimate), cl_prev_cumulative_estimate := 0]
shapland.table.6[, cl_incremental_estimate := cl_cumulative_estimate - cl_prev_cumulative_estimate]
datatable(shapland.table.6[, .(w, d, cumulative_loss, age_to_ult, latest_diagonal_loss, latest_diagonal_LDF, cl_cumulative_estimate, cl_incremental_estimate, GLM_prediction)])
The main advantages of using this approach, referred to as the “simplified GLM” or “ODP bootstrap”, are that:
  • It is simpler to calculate, rather than needing to fit a GLM
  • It is more easily explainable to others
  • The GLM framework, with a log link function, does not allow for negative incremental values; this is not a restriction of the development factor approach

Alternative Model Formulations

Incurred Loss Triangles

The method can also be applied to incurred loss triangles, with the following considerations:
  • The resulting distribution will be possible outcomes of IBNR, not a distribution of unpaid claims
  • To convert to an unpaid loss distribution, one option is to run a paid data model concurrently, and use the random payment pattern from this model to convert the ultimate values from the incurred model to a payment pattern. This allows us to add value by incorporating information from the case reserves in the estimates of ultimate, while still focusing on the payment stream to measure risk.
  • An alternate approach is to model paid and case reserves separately.

Bornhuetter-Ferguson and Cape Cod models

These approaches can be used to address the weakness in the paid chain ladder method: that an abnormally high or low initial observation can produce volatile ultimate estimates. Options include:
  • For the Bornhuetter-Ferguson approach, sample a priori ultimate loss ratios from a distribution (a rough sketch follows this list)
  • For the Cape Cod approach, apply the algorithm to the sample triangle in each ODP bootstrap iteration to produce a stochastic Cape Cod projection
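
As a rough sketch of the stochastic Bornhuetter-Ferguson idea, each iteration could draw an a priori loss ratio from a prior distribution and apply it to the unpaid portion of the sampled payment pattern. All values and distributions below are illustrative assumptions, not from Shapland’s spreadsheets:

set.seed(42)
premium = 1000
pct.unpaid = 0.25                                           # treated here as a fixed value from the sampled payment pattern
prior.loss.ratio = rlnorm(10000, meanlog = log(0.65), sdlog = 0.1)
bf.unpaid = premium * prior.loss.ratio * pct.unpaid         # BF unpaid estimate in each iteration
quantile(bf.unpaid, c(0.05, 0.5, 0.95))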

GLM Bootstrap approach

The GLM framework allows for more flexibility than the chain ladder method does. Examples include:
  • Parameters can be added to measure and adjust for calendar-year effects. Because a model will not have a solution if there are more parameters than observations, it is recommended to start with a small number of parameters (e.g. one CY effect) and add more as needed.
  • Parameters can be reduced (e.g. by setting groups of parameters to be equal) in order to reduce overfitting
  • A GLM can be used to model non-triangular data. A common example is when only the 5 most recent diagonals are used, resulting in a “trapezoid” shape.

As an example, we can produce a model in which all accident years are at the same level, letting the intercept represent this common parameter.

glm.homogeneous.AY =  glm(incremental_loss ~ beta_2_ind + beta_3_ind + beta_4_ind + beta_5_ind + beta_6_ind, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.homogeneous.AY)
## 
## Call:
## glm(formula = incremental_loss ~ beta_2_ind + beta_3_ind + beta_4_ind + 
##     beta_5_ind + beta_6_ind, family = quasipoisson(link = "log"), 
##     data = shapland.table.6)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9724  -1.1892   0.2867   0.9872   1.6665  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.73766    0.05371  88.207  < 2e-16 ***
## beta_2_ind  -0.86646    0.10545  -8.217 6.18e-07 ***
## beta_3_ind  -0.75769    0.17375  -4.361 0.000559 ***
## beta_4_ind   0.16990    0.21601   0.787 0.443795    
## beta_5_ind  -0.98083    0.35143  -2.791 0.013708 *  
## beta_6_ind  -0.69315    0.70287  -0.986 0.339692    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.976119)
## 
##     Null deviance: 665.616  on 20  degrees of freedom
## Residual deviance:  30.367  on 15  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

Another approach is to use a single parameter for all development periods, which is equivalent to fitting \(d - 1\) as a continuous variable:

shapland.table.6[, d_shifted := d - 1]
glm.one.dev.parameter = glm(incremental_loss ~ w_factor + d_shifted + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.one.dev.parameter)
## 
## Call:
## glm(formula = incremental_loss ~ w_factor + d_shifted + 0, family = quasipoisson(link = "log"), 
##     data = shapland.table.6)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9705  -1.0970   0.0000   0.8852   3.1063  
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## w_factor1  4.63924    0.16102  28.811 7.29e-14 ***
## w_factor2  4.61487    0.16153  28.570 8.18e-14 ***
## w_factor3  4.65711    0.15810  29.457 5.37e-14 ***
## w_factor4  4.61339    0.16331  28.249 9.56e-14 ***
## w_factor5  4.70300    0.16629  28.282 9.40e-14 ***
## w_factor6  4.82831    0.19185  25.167 4.68e-13 ***
## d_shifted -0.61329    0.06903  -8.885 3.96e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 4.600753)
## 
##     Null deviance: 7375.114  on 21  degrees of freedom
## Residual deviance:   62.894  on 14  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4
Drawbacks of this approach include:
  • By departing from the assumptions that result in an equivalence between the chain ladder method and the GLM, the chain ladder calculations can no longer be substituted. The need to fit a GLM at each iteration may slow down the simulation process
  • The model may not be as easily explained in terms of development factors

Practical challenges

Some of the practical challenges (with additional ones elaborated on below) associated with the ODP bootstrap approach include:
  1. Residuals should sum to zero if the model is a good fit, but in practice, they may not be. One way to address this would be to add a single constant to all non-zero residuals to ensure that the re-sampled triangles are not biased.
  2. When \(N\)-year weighted averages are used, the GLM approach can be used as normal. For the ODP bootstrap method, the entire triangle will have to be re-sampled so that cumulative values can be created. However, for consistency, only the \(N\)-year averages should be used for projecting from the resampled triangles.
  3. Options for dealing with missing values include:
    • Estimating them from surrounding values
    • LDFs can be calculated without missing values; in this case, the missing values will need to be resampled (because they are needed to calculate cumulative values) but should be excluded from projections
    • Use a GLM
  4. Outliers may be removed and treated similarly to missing values, though there is no need to remove the resampled points from projections – the intention is to remove the extreme impact, not delete the cell entirely. A large number of outliers suggests that the model is not a good fit to the data, and alternatives should be considered (new parameters, different error term)
    • The inter-quartile range is defined to be the range between the 25th and 75th percentiles, typically depicted as the box on a box-whisker plot. One way to define outliers is with respect to this range, e.g. beyond 3 times the range.
    • Extreme caution should be exercised when removing outliers, because they may represent realistic extreme values that we still want to include in the analysis.
    • Residual outliers could be an indication that residuals are not normally distributed, and if so, we want to keep them in so that the simulation can replicate the shape of the distribution.
  5. Exposures may have changed dramatically over the years, e.g. due to rapid growth or runoff. This can be addressed by dividing claim data by exposures, and modelling on a pure premium basis rather than a total loss basis. At the end of the simulation, results would be multiplied by exposure to get the total loss estimate. In a GLM context, the exposure-adjusted losses would be the target variable, with exposures as weights (see the sketch after this list).
  6. Development beyond the triangle can be modelled by incorporating a distribution for the tail factor in the simulations. In a GLM approach, the final development parameter could be taken to extend indefinitely to obtain a tail factor.
  7. The observed residual distribution may not reflect the extent of extreme outcomes. In this case, a distribution can be fitted to the residuals to allow for the possibility that a sampled value could be more extreme than any of the residuals observed so far.
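
The exposure adjustment in item 5 can be sketched as follows. The Shapland table used above does not include exposures, so the exposure column here is a hypothetical placeholder:

exposure.example = copy(shapland.table.6)
exposure.example[, exposure := 100]                         # placeholder exposure for each cell
glm.pure.premium = glm(incremental_loss / exposure ~ w_factor + d_factor + 0, weights = exposure, data = exposure.example, family = quasipoisson(link = "log"))
exposure.example[, pp_prediction := predict(glm.pure.premium, type = "response") * exposure]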

Negative incremental values

A log link GLM cannot handle negative response values, so one approach would be to redefine the link function for negative values as \[ -\log(|q(w,d)|) \] and leave it unchanged for positive values. This approach works when negative values are relatively rare; in particular, it won’t work if the sum of a column is negative. (For an incurred triangle, it is entirely possible for entire columns of incremental values to be negative, in which case the GLM won’t be able to fit even with this transformation.) In this case, one approach would be to define \(\Psi\) to be the largest negative value, and adjust the response variable to \[ q^+(w,d) = q(w,d) - \Psi \] This transformation should be undone after predictions are made: \[ m_{w,d} = m_{w,d}^+ + \Psi \] Since fitted values can also be negative, the standard deviation portion of the residuals will need to be replaced with the square root of the absolute value: \[ r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{|m_{w,d}^z|}} \] The same adjustment must be made to the sampled triangle: \[ q^*(w,d) = r^* \sqrt{|m_{w,d}^z|} + m_{w,d} \]
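
A minimal sketch of the adjusted residual calculation described above, with \(z=1\) and hypothetical vectors of actual and fitted incremental values:

q.actual = c(120, -15, 40)                                  # hypothetical actual incrementals
m.fitted = c(110, -5, 45)                                   # hypothetical fitted values, one negative
pearson.residual = (q.actual - m.fitted) / sqrt(abs(m.fitted))            # square root of the absolute value of the mean
r.star = sample(pearson.residual, length(m.fitted), replace = TRUE)
q.star = r.star * sqrt(abs(m.fitted)) + m.fitted            # same adjustment applied to the sampled triangle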

Negatives during simulation

Negative fitted values can also lead to challenges when simulating from a Gamma distribution to add process variance.
  1. One option is to flip the Gamma distribution about the \(y\)-axis: use the absolute value of the mean, and take the negative of the sampled result. A drawback of this is that it turns a right-skewed distribution into a left-skewed one.
  2. A second option is to shift the Gamma distribution to the left: use the absolute value of the mean, and add twice the (negative) mean to the sampled result. This maintains the skew of the Gamma distribution, while keeping the mean the same.
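
A brief sketch of both options for a hypothetical negative fitted value, using a Gamma parameterization with mean \(|m|\) and variance \(\phi |m|\) (the values below are assumptions):

m.neg = -50; phi.example = 2.7                              # hypothetical negative fitted value and scale parameter
gamma.sample = rgamma(10000, shape = abs(m.neg) / phi.example, scale = phi.example)   # mean |m|, variance phi * |m|
option.1 = -gamma.sample                                    # flip: negate the sampled result (skew is reversed)
option.2 = gamma.sample + 2 * m.neg                         # shift: add twice the (negative) mean (skew preserved)
c(mean(option.1), mean(option.2))                           # both have mean approximately equal to m.neg
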
Negative incremental values can also result in extreme outcomes during simulations, especially if negative incremental losses occur in the first few periods and result in highly leveraged development factors. Options for addressing this include:
  • Remove the “bad” iterations from the simulation, and replace them with new ones. Caution should be exercised here, to ensure that the simulation does not understate the probability of extreme outcomes.
  • Recalibrate the model to address the cause of the negative values: examples include modelling losses gross of salvage and subrogation separately from salvage and subrogation, or removing rows with sparse data (e.g. the first year a product was written).
  • Limit incremental values to a minimum value of zero

Heteroscedasticity

A key assumption is that the standardized Pearson residuals are independent and identically distributed; in particular, they have the same variance. Without this assumption, it’s not reasonable to take a residual from one development or accident period and apply it to another. Options for dealing with heteroscedasticity include:
  • In a GLM context, try different parameters (e.g. account for calendar year trends), or change the error term.
  • Stratified sampling involves grouping development periods with homogeneous variances, and sampling only within these groups. A drawback of this approach is that it reduces the variability of the sample.
  • Hetero-adjustment parameters: group residuals based on variance, then multiply the residuals by the ratio of the overall standard deviation to the group standard deviation. As a result, all groups should now have the same standard deviation.
    • After sampling residuals, they should be divided by the adjustment factor for the group corresponding to the location for which an observation is being simulated to restore the original variance before calculating the resampled triangle, to ensure that the heteroscedastic variances are replicated.
    • The combined effect is that we multiply by the standard deviation of the group corresponding to the location for which the observation is being simulated, and divide by the standard deviation of the group from which the residual was drawn.
    • This allows for free sampling with replacement from the adjusted residuals
    • The adjustment factors are new parameters, which should be accounted for when calculating degrees of freedom
    • The adjustment factors should also be applied to the variance when simulating process variance
  • A third approach is to calculate a different scale parameter for each hetero group, using the formula below. These are also new parameters, so degree of freedom calculations will be affected. These can be used as-is when simulating process variance; for residual sampling, they provide a hetero-adjustment factor of \(h_i = \sqrt{\phi}/ \sqrt{\phi_i}\).

The scale parameter for a hetero group \(G_i\) of size \(n_i\) can be calculated as the average of the squared degree-of-freedom-adjusted residuals in that group: \[ \phi_i = \frac{\sum_{(w,d) \in G_i} \left( \sqrt{\frac{N}{N-p}} r_{w,d}\right)^2}{n_i} \]
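
For a hypothetical set of residuals split into two hetero groups, the group scale parameters and the corresponding adjustment factors \(h_i = \sqrt{\phi}/\sqrt{\phi_i}\) could be computed along these lines (grouping, residual values, and parameter counts are all assumptions for the sketch):

hetero.example = data.table(group = c(1, 1, 1, 2, 2, 2, 2), residual = c(-0.5, 0.3, 0.4, -2.1, 1.8, -1.5, 2.2))
N = nrow(hetero.example); p = 2                             # hypothetical counts of observations and parameters
phi.overall = hetero.example[, sum((sqrt(N / (N - p)) * residual)^2) / .N]
group.phi = hetero.example[, .(phi_i = sum((sqrt(N / (N - p)) * residual)^2) / .N), by = group]
group.phi[, h_i := sqrt(phi.overall) / sqrt(phi_i)]         # hetero-adjustment factor for each group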

Heteroecthesious Data

Data is homoecthesious if each accident year has similar exposures; data with differing exposures by accident year is heteroecthesious. Common examples of heteroecthesious data include:
  • Partial first development period: A typical example is a triangle with valuations at 6 months, 18 months, 30 months, etc. In this case, the only concern is the most recent accident year, which only has half a year of exposure. The 6-18 factor will extrapolate the first 6 months of exposure to a full accident year, so it is common practice to reduce the projected future payments by half to remove the extra exposure.
  • Partial last calendar year data: a typical example is when every data point in the triangle has 1 year of exposure, with the exception of the most recent diagonal, which only has 6 months. An approach here is to annualize the exposures in the last diagonal, determine development factors, then de-annualize the sampled triangle.

Bootstrap Method

The ODP bootstrap method samples from the residuals to generate a new table, and calculates a new estimate of the ultimate claim amount. Repeated sampling allows us to approximate the distribution of ultimate claim amounts:
  1. Calculate the residuals and scale parameter – several options are available at this step
  2. Sample the non-zero residuals with replacement
  3. Calculate a revised incremental triangle based on the residuals and the model predicted values: \(q^*(w,d) = r^*\sqrt{m_{w,d}^z} + m_{w,d}\)
  4. Cumulate the triangle and apply the chain ladder method to produce a new point estimate
  5. Repeat the above process to generate a distribution of point estimates from a large number of samples

For the ODP bootstrap method, the residuals used are the unscaled Pearson residuals, which are the same as the usual chain ladder residuals: \[ r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^z}} \] The scale parameter can be calculated as in Clark’s paper, using a \(\chi^2\) formula: \[ \phi = \frac{\sum r_{w,d}^2}{N - p} \] where \(N\) is the number of data points, and \(p\) is the number of parameters in the model, \(2(n-1) + 1\) in the case of the chain ladder approach.

The method can be applied to Shapland’s 6-year table as follows:

shapland.table.6[, unscaled_pearson_residual := (incremental_loss - cl_incremental_estimate) / sqrt(cl_incremental_estimate)]
shapland.table.6[, GLM_residual_pearson := residuals(glm.CL.cumulative, type = "pearson")]
datatable(shapland.table.6[,.(w, d, incremental_loss, cl_incremental_estimate, unscaled_pearson_residual, GLM_residual_pearson)])

The scale parameter can be calculated as follows:

N = 21
p = 11
phi = shapland.table.6[, sum(unscaled_pearson_residual^2)] / (N-p)
print(paste0("The scale parameter is ", phi))
## [1] "The scale parameter is 2.7179846116338"

For illustrative purposes, fix the following random sequence of residuals:

residual.sample = c(15, 12, 16, 8, 8, 18, 9, 10, 7, 9, 12, 14, 14, 11, 10, 4, 14, 7, 13, 11, 11)
shapland.table.6 = shapland.table.6[order(d, w)]
shapland.table.6$random_residual = shapland.table.6[residual.sample, 'unscaled_pearson_residual']
shapland.table.6[, sampled_incremental_loss := random_residual * sqrt(cl_incremental_estimate) + cl_incremental_estimate]
datatable(shapland.table.6[, .(w, d, incremental_loss, unscaled_pearson_residual, random_residual, sampled_incremental_loss)])

The framework can be improved by using different types of residuals. Scaled Pearson residuals are multiplied by a degree-of-freedom factor \[ f^{DoF} = \sqrt{\frac{N}{N-p}} \] so we obtain \[ r_{w,d}^S = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^z}} f^{DoF} \] However, these residuals are not standardized, meaning that they do not all have the same variance. To achieve this, we multiply the residuals by \[ f^H_{w,d} = \sqrt{\frac{1}{1 - H_{i,i}}} \] where \(H_{i,i}\) are the diagonals of the “hat” matrix from the GLM. This produces the standardized Pearson residuals: \[ r_{w,d}^H = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^z}} f^H_{w,d} \] The scale parameter can be approximated as the average square of these residuals: \[ \phi^H = \frac{\sum (r_{w,d}^H)^2}{N} \] The above method can be repeated using the “hat” residuals:

shapland.table.6[, hat_residual := unscaled_pearson_residual * sqrt(1 / (1 - GLM_hat))]
datatable(shapland.table.6[,.(w, d, incremental_loss, cl_incremental_estimate, unscaled_pearson_residual, hat_residual)])
shapland.table.6 = shapland.table.6[order(d, w)]
shapland.table.6$random_residual_hat = shapland.table.6[residual.sample, 'hat_residual']
shapland.table.6[, sampled_incremental_loss_hat := random_residual_hat * sqrt(cl_incremental_estimate) + cl_incremental_estimate]
datatable(shapland.table.6[, .(w, d, incremental_loss, unscaled_pearson_residual, random_residual_hat, sampled_incremental_loss_hat)])

The chain ladder method, or equivalent GLM, can now be applied to the sampled data to produce a new point estimate:

glm.CL.sample = glm(sampled_incremental_loss_hat ~ w_factor + d_factor + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.CL.sample)
## 
## Call:
## glm(formula = sampled_incremental_loss_hat ~ w_factor + d_factor + 
##     0, family = quasipoisson(link = "log"), data = shapland.table.6)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2757  -0.4511   0.0000   0.5570   2.2085  
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## w_factor1   4.8870     0.1028  47.522 4.11e-13 ***
## w_factor2   4.5816     0.1167  39.256 2.75e-12 ***
## w_factor3   4.8037     0.1067  45.001 7.06e-13 ***
## w_factor4   4.8200     0.1108  43.488 9.93e-13 ***
## w_factor5   4.8007     0.1157  41.498 1.58e-12 ***
## w_factor6   4.6456     0.1435  32.366 1.87e-11 ***
## d_factor2  -0.8332     0.1088  -7.656 1.73e-05 ***
## d_factor3  -1.8728     0.1831 -10.226 1.29e-06 ***
## d_factor4  -1.2899     0.1654  -7.800 1.47e-05 ***
## d_factor5  -3.4007     0.5351  -6.355 8.29e-05 ***
## d_factor6  -6.0079     2.5672  -2.340   0.0413 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 2.144985)
## 
##     Null deviance: 7653.714  on 21  degrees of freedom
## Residual deviance:   21.798  on 10  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

Estimates of future incremental losses can be produced, and summed to determine an overall reserve estimate:

d = 1:6
w = 1:6
unpaid.estimate = as.data.table(merge(d, w, by = NULL))
colnames(unpaid.estimate) = c("d", "w")
unpaid.estimate = unpaid.estimate[d+w > 7]
unpaid.estimate[, d_factor := as.factor(d)]
unpaid.estimate[, w_factor := as.factor(w)]
unpaid.estimate$GLM_prediction = predict(glm.CL.sample, newdata = unpaid.estimate, type = "response")
datatable(unpaid.estimate[,.(d, w, GLM_prediction)])

Reserves for each accident year are as follows:

simulated.reserves = unpaid.estimate[, .(reserve_estimate = sum(GLM_prediction)), by = w]
datatable(simulated.reserves[order(w)])

The process can be further improved by adding process variance, e.g. sampling from a Gamma distribution.
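
As a sketch of this step, each future incremental cell could be drawn from a Gamma distribution with mean equal to the model prediction \(m\) and variance \(\phi m\), using the scale parameter calculated earlier (the specific Gamma parameterization below is an assumption):

unpaid.estimate[, process_sim := rgamma(.N, shape = GLM_prediction / phi, scale = phi)]   # mean m, variance phi * m
simulated.reserves.process = unpaid.estimate[, .(reserve_estimate = sum(process_sim)), by = w]
datatable(simulated.reserves.process[order(w)])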

Assumptions underlying this method include:
  1. The residuals are independent and identically distributed. A normality assumption is not required, which is an advantage of this method.

Incorporating correlation into simulations

When simulations are performed by business segment, correlation between the segments must be considered.
  • Using location mapping, also known as synchronized bootstrapping: when a residual is sampled for the first business segment, its location in the residual triangle is noted. For subsequent business segments, the residual at the same location is used.
    • This preserves the degree of correlation in the original residuals.
    • Requires all business segments to use data triangles that are exactly the same size, with no missing values or outliers.
    • No other correlation assumptions can be used for stress testing the aggregate results.
  • Re-sorting involves using copulas to combine the results (a rank re-sorting sketch follows this list).
    • This allows triangles to have different sizes and shapes
    • Different correlation assumptions, e.g. ones not in the data, can be used
    • Selection of copulas may have other beneficial effects on the simulation, e.g. strengthening tail correlation
    • A correlation matrix may be selected by calculating Spearman’s rank-order correlation, based on the ranks of total unpaid claims for all accident years combined, then adjusting the selected matrix judgmentally.
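
A rank re-sorting sketch under a Gaussian copula, with two hypothetical segments and an assumed correlation of 0.5 (the segment distributions and the correlation are illustrative assumptions):

set.seed(1)
n.sims = 10000
reserves.A = rgamma(n.sims, shape = 4, scale = 250)         # hypothetical simulated reserves, segment A
reserves.B = rgamma(n.sims, shape = 2, scale = 600)         # hypothetical simulated reserves, segment B
rho = 0.5
z1 = rnorm(n.sims)
z2 = rho * z1 + sqrt(1 - rho^2) * rnorm(n.sims)             # correlated normals define the Gaussian copula
resorted.A = sort(reserves.A)[rank(z1)]                     # re-sort each margin to follow the copula ranks
resorted.B = sort(reserves.B)[rank(z2)]
aggregate.reserves = resorted.A + resorted.B
cor(resorted.A, resorted.B, method = "spearman")            # induced rank correlation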

Bayesian Simulations

Bayesian methods provide a way of incorporating subjective expert opinion into a stochastic reserving model; in particular, they provide an approach to assessing the variation of the resulting distribution, which is not available when using judgment to adjust the results of a deterministic method. Examples of incorporating expert opinion include:
  • Changes in payment pattern due to company policy – intervention in a development factor in a particular row
  • Selection of how many years of data to use in the estimation
  • Legislative changes limiting benefits
  • Prior loss ratio assumption in Bornhuetter-Ferguson method – notably, this provides a way of generating the predictive distribution, which is not normally part of the BF method.
A key challenge in making these adjustments is that they generally disrupt the assumptions underlying stochastic frameworks, and it is not clear what effect these adjustments have on the prediction errors. Advantages of the Bayesian approach include:
  • Prediction variance is estimated directly, rather than requiring separate estimates of process variance and parameter variance.
  • The full predictive distribution is provided, not just the moments.
  • Bayesian methods can assess the effect of using different sets of data on the uncertainty of the outcome.

Chain Ladder approach

Verrall considers a generalization of the chain ladder model in which there is a factor \(\lambda_{w,d}\) for each accident year and development age. (The standard chain ladder method uses \(\lambda_{w,d} = \lambda_d\) for all \(w\).) The form of this model is that \(q(w,d) | c(w,d-1), \lambda_{w,d}, \varphi\) has an overdispersed negative binomial distribution, with \[ E[q(w,d)] = (\lambda_{w,d}-1) c(w,d-1) \] and \[ \mathrm{Var}(q(w,d)) = \varphi\lambda_{w,d}(\lambda_{w,d}-1) c(w, d-1) = \varphi \lambda_{w,d} E[q(w,d)] \] The general method of incorporating expert judgment into this process is:
  1. Set certain values of \(\lambda_{w,d}\) to be equal to each other. Examples include:
    • Set specific entries in a column equal to each other to represent an assumption that the payment pattern is the same in certain accident years.
    • To estimate using only the last \(k\) years, divide the parameters in each column into two groups, with one representing the last \(k\) years. Set all parameters within each group to be equal, and use a high variance on the prior mean.
  2. Specify a prior mean for each \(\lambda_{w,d}\). If no prior mean is desired, use a vague prior distribution.
  3. Set the variances of the prior distributions based on the level of confidence in the subjective selection. Large variances will result in the result being largely estimated from the data; small variances will tend to stay toward the prior mean.

The Gamma distribution, with mean \(\alpha/\beta\) and variance \(\alpha/\beta^2\) is a popular prior distribution, because it is easy to keep the mean constant while changing the variance: decrease \(\beta\) to increase variance, while decreasing \(\alpha\) to offset its effect on the mean.
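
For instance, the following parameter choices (arbitrary values) keep the prior mean at 2 while increasing the variance:

gamma.priors = data.table(alpha = c(4, 2, 1), beta = c(2, 1, 0.5))
gamma.priors[, prior_mean := alpha / beta]                  # constant at 2
gamma.priors[, prior_variance := alpha / beta^2]            # increases from 1 to 4
datatable(gamma.priors)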

A variation that allows for negative incremental values is to assume a normal distribution, and a variance assumption similar to Mack’s approach (i.e. proportional to prior cumulative losses, with a different constant of proportionality for each development age). Specifically, \[ E[q(w,d)] = (\lambda_j - 1) c(w,d-1) \] and \[ \mathrm{Var}(q(w,d)) = \varphi_j c(w, d-1) \]

Bornhuetter-Ferguson Approach

The Bornhuetter-Ferguson approach can be implemented by assuming that \(q(w,d) | x, y, \varphi\) is overdispersed Poisson with \[ E[q(w,d)] = x_w y_d \] and \(\sum_{1\leq d \leq n} y_d = 1\), and assuming that \(x_w \sim \Gamma(\alpha_w, \beta_w)\). The mean \(M_w = \alpha_w / \beta_w\) is selected to represent the prior expectation of loss, with \(\beta_w\) adjusted to represent the level of confidence in this assumption. (Smaller \(\beta_w\) correspond to lower confidence.)

It can be shown that the result is a credibility-weighted average of the chain ladder estimate \[ (\lambda_d - 1) c(w, d-1) \] against the prior estimate of the incremental loss \[ (\lambda_d - 1) \frac{M_w}{\lambda_d \lambda_{d+1} \cdots \lambda_n} \] applying a credibility factor to the chain ladder estimate equal to \[ Z_{w,d} = \frac{\sum_{1 \leq k \leq d-1} y_k}{\beta_w\varphi + \sum_{1 \leq k \leq d-1} y_k} \] For the column parameters, improper prior distributions are used. Note that because we have freedom to select \(\beta_w\), the result is a range of models that transition continuously between the chain ladder method and the Bornhuetter-Ferguson technique.
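
A quick numeric illustration of the credibility factor, with a hypothetical payment pattern and hypothetical prior and scale parameters:

y.pattern = c(0.40, 0.25, 0.15, 0.10, 0.06, 0.04)           # hypothetical payment pattern summing to 1
beta.w = 2; varphi.example = 1.5; d.current = 4             # hypothetical prior and scale parameters
Z = sum(y.pattern[1:(d.current - 1)]) / (beta.w * varphi.example + sum(y.pattern[1:(d.current - 1)]))
print(paste0("The credibility given to the chain ladder estimate is ", round(Z, 3)))

Decreasing beta.w (lower confidence in the prior) moves \(Z_{w,d}\) toward 1, i.e. toward the chain ladder estimate.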

Correlated Chain Ladder

This model assumes the following, given a premium \(P_w\) in accident year \(w\).
  1. The cumulative loss \(C_{w,d}\) has a lognormal distribution with log mean \(\mu_{w,d}\) and log standard deviation \(\sigma_d\), where \(\sigma_d\) decreases for later development periods. This can be enforced by specifying that \(\sigma_d = \sum_{d \leq i \leq K} a_i\) where \(a_i \sim U(0,1)\).
  2. The log of the expected loss ratio is \(\ell \sim U(-1, 0.5)\)
  3. \(\alpha_w \sim N(\log(P_w) + \ell, \sqrt{10})\)
  4. \(\beta_d \sim U(-5,5)\)
  5. \(\mu_{1,d} = \alpha_1 + \beta_d\)
  6. \(\mu_{w,d} = \alpha_w + \beta_d + \rho(\log(C_{w-1, d}) - \mu_{w-1,d})\) when \(w> 1\). Here, \(\rho \sim U(-1, 1)\).

The sixth point allows for correlation between the accident years: the mean for a given year is adjusted upward or downward based on whether the previous year was above or below expectations. \(\rho\) is the correlation coefficient between the logs of the incremental losses in the adjacent accident years. When \(\rho=0\), this reduces to the Leveled Chain Ladder model. Note that the prior distributions can be adjusted to account for prior knowledge as well; for example, a Bornhuetter-Ferguson style approach would provide narrow distributions for \(\alpha_w\) and \(\ell\).

The overall process for implementing this model is:
  1. Generate a large sample of parameter sets from the posterior distribution using MCMC
  2. For each parameter set, run a large number of simulations by sampling losses from the lognormal distribution with the given parameters. Note that this has to be done starting from accident year 1 and working down in order to account for the correlation between accident years (see the sketch after this list).
  3. For each parameter set, calculate summary statistics (mean, standard deviation) by accident year.
  4. Percentiles can be calculated by counting the number of simulations that produced results that were less than or equal to the observed amount.
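
A sketch of step 2 for a single parameter set is below, filling in the lower triangle row by row so that each accident year’s mean reflects the previous year’s deviation. The function and its arguments are hypothetical, not drawn from a fitted MCMC sample:

simulate.ccl.once = function(alpha, beta, sigma, rho, C.observed) {
  n = length(alpha)
  mu = matrix(NA_real_, n, n)
  C.sim = matrix(NA_real_, n, n)
  for (w in 1:n) {
    for (d in 1:n) {
      prior.deviation = if (w > 1) rho * (log(C.sim[w - 1, d]) - mu[w - 1, d]) else 0
      mu[w, d] = alpha[w] + beta[d] + prior.deviation
      if (!is.na(C.observed[w, d])) {
        C.sim[w, d] = C.observed[w, d]                      # keep observed upper-triangle values
      } else {
        C.sim[w, d] = rlnorm(1, meanlog = mu[w, d], sdlog = sigma[d])
      }
    }
  }
  C.sim
}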

Due to the presence of accident year correlations, this method generally produces higher estimates of standard deviation than the traditional chain ladder method. This can be validated by looking at the posterior distribution of \(\rho\) to validate that it is generally positive. It generally performs better on incurred loss data than on paid loss data, indicating that the case reserves are providing useful information.

Correlated Incremental Trend

This model is based on incremental paid values \(I_{w,d}\). Since these are both skewed, and potentially negative, a mixed lognormal-normal distribution is assumed. This model incorporates a calendar year trend into the CCL model. The specifications of this model are:
  1. \(\mu_{w,d} = \alpha_w + \beta_d + \tau(w+d-1)\)
  2. \(Z_{w,d} \sim \mathrm{lognormal}(\mu_{w,d}, \sigma_d)\) with increasing \(\sigma_d\). The reason is that the losses are incremental, and the more volatile losses are settled later.
  3. \(I_{1,d} \sim \mathrm{normal}(Z_{1,d}, \delta)\)
  4. \(I_{w,d} \sim \mathrm{normal}(Z_{w,d} + \rho(I_{w-1, d} - Z_{w-1,d})e^{\tau}, \delta)\). The correlation must be applied outside the “log” space in order to allow for negative values, and the \(e^{\tau}\) factor ensures that the calendar year trend is applied to this difference.

Setting \(\rho=0\) produces the levelled incremental trend model. Prior distributions may need to be set so that \(\beta_d \sim U(0, \beta_{d-1})\) for \(d\) above a certain value in order to ensure that they decrease.

Changing Settlement Rate

This model reflects the idea that claims settlement may speed up over time due to use of technology. It introduces a parameter \(\gamma\), for which positive values represent a speedup in claim settlement: \[ \mu_{w,d} = \alpha_w + \beta_d (1 - \gamma)^{w-1} \] (This is a speedup since \(\beta_d\) is usually negative.)

Bayesian MCMC

The general approach to simulating posterior distributions is to use a Markov Chain Monte Carlo (MCMC) algorithm. This is based on establishing a Markov chain whose limiting distribution is the posterior distribution, and letting the chain run long enough that it begins to visit states with a frequency proportional to the posterior probabilities.
  • Given a vector of parameters \(y\) and a set of observations \(x\), we specify the prior distribution \(p(y)\) and the conditional distribution \(f(x|y)\); the model produces \(f(y|x)\).
  • Given a current state \(y_{t-1}\), a proposed vector of parameters \(y^*\) is selected from the proposal distribution \(J(y_t | y_{t-1})\).
  • The proposal is either rejected (\(y_t = y_{t-1}\)) or accepted (\(y_t = y^*\)) based on the ratio of the posterior probability of the proposal to the posterior probability of \(y_{t-1}\); higher ratios are more likely to be accepted.
  • Initial iterations of the algorithm are the adaptive phase in which the proposal distribution is optimized.
  • The next phase is the burn-in phase in which the chain is run long enough for it to approach the limiting distribution.
  • After the burn-in phase, the actual parameter samples can be taken.
Appropriateness of the proposal distribution can be assessed by looking at trace plots which give the value of the parameter at each iteration.
  • If the acceptance rate is too high, there will be a high degree of autocorrelation between subsequent observations because \(y_t\) is very close to \(y_{t-1}\).
  • If the acceptance rate is too low, then there will be large “flat” areas in the graph (which again produces autocorrelation).
  • Appropriate acceptance rates are between the two extremes, avoiding autocorrelation.
A proxy for appropriate levels of autocorrelation is the acceptance rate; optimal values range from 25% for multiparameter models to 50% for single-parameter models. Approaches to reducing autocorrelation include:
  • Thinning involves taking only a sample of the states visited, e.g. every \(n\)th sample.
  • Running multiple chains concurrently — an indicator of convergence is that all the chains “bounce around” in the same area.
  • When multiple chains are used, the within-chain variability \(W\) and the between-chain variability \(B\) can be assessed. The potential scale reduction factor (PSRF) is \(\sqrt{(W+B)/W}\); typical rules involve accepting convergence if this falls below a threshold such as 1.05.
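
An illustrative PSRF calculation under the simplified formula above, using synthetic chains in place of actual MCMC output:

chains = list(rnorm(1000, 0, 1), rnorm(1000, 0.05, 1), rnorm(1000, -0.02, 1))   # synthetic chains
W = mean(sapply(chains, var))                               # within-chain variability
B = var(sapply(chains, mean))                               # between-chain variability
psrf = sqrt((W + B) / W)
print(paste0("The PSRF is ", round(psrf, 4)))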

Evaluation of Model Performance

Evaluation of Underlying Models

Residual Analysis

A key assumption of the ODP bootstrap model is that the residuals are independent and identically distributed. This can be tested by graphing residuals against:
  • Calendar year
  • Development age
  • Accident year
  • Fitted incremental loss
shapland.table.6[, calendar_year := w + d]
shapland.cy.summary = shapland.table.6[, .(avg_residual = mean(unscaled_pearson_residual, na.rm = FALSE)), by = calendar_year]
ggplot(data = shapland.table.6, aes(x = calendar_year, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.cy.summary, aes(x = calendar_year, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals Residuals by Calendar Year")

shapland.ay.summary = shapland.table.6[, .(avg_residual = mean(unscaled_pearson_residual, na.rm = FALSE)), by = w]
ggplot(data = shapland.table.6, aes(x = w, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.ay.summary, aes(x = w, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals Residuals by Accident Year")

shapland.dy.summary = shapland.table.6[, .(avg_residual = mean(unscaled_pearson_residual, na.rm = FALSE)), by = d]
ggplot(data = shapland.table.6, aes(x = d, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.dy.summary, aes(x = d, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals Residuals by Development Year")

ggplot(data = shapland.table.6, aes(x = GLM_prediction, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + labs(title = "Unscaled Pearson Residuals Residuals by Predicted Value")
Visual checks can be used to detect:
  • Unmodelled trends by accident year, calendar year, or development period: look for patterns / trends in the residuals. A possible adjustment would be to add a parameter to the model.
  • Heteroscedasticity: look for widening / narrowing bands of residuals. This indicates that a hetero-adjustment should be applied.

In either case, the graphs should be re-checked after the adjustment to validate that the issue has been addressed.

A possible concern is that the relatively flat green lines for development year and accident year may indicate that the model is overparametrized. One approach to addressing this is to start with a model containing a single parameter for each of accident year and development year, then add parameters only as needed. (Note that the reading suggests adding a calendar year parameter, but since there is a linear dependency between these three variables, calendar year would be aliased.)

simple.glm = glm(incremental_loss ~ w + d, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(simple.glm)
## 
## Call:
## glm(formula = incremental_loss ~ w + d, family = quasipoisson(link = "log"), 
##     data = shapland.table.6)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.3334  -1.0733   0.1436   0.9471   3.1994  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.19406    0.19157  27.113 4.77e-16 ***
## w            0.02753    0.03750   0.734    0.472    
## d           -0.61684    0.06186  -9.971 9.34e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 3.700131)
## 
##     Null deviance: 665.616  on 20  degrees of freedom
## Residual deviance:  65.156  on 18  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4
shapland.table.6$simple_prediction = predict(simple.glm, type = "response")
shapland.table.6[, simple_residual := (incremental_loss - simple_prediction) / sqrt(simple_prediction)]

Now, see how the plots change:

shapland.ay.summary = shapland.table.6[, .(avg_residual = mean(simple_residual, na.rm = FALSE)), by = w]
ggplot(data = shapland.table.6, aes(x = w, y = simple_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.ay.summary, aes(x = w, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals Residuals by Accident Year")

shapland.dy.summary = shapland.table.6[, .(avg_residual = mean(simple_residual, na.rm = FALSE)), by = d]
ggplot(data = shapland.table.6, aes(x = d, y = simple_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.dy.summary, aes(x = d, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals Residuals by Development Year")

The accident year plot suggests that we could group accident years 2 and 3 together, as well as 4, 5, and 6. The development year plot suggests that periods 2 and 3 should be grouped together, as well as 4, 5, and 6. This can result in a model with fewer parameters than the original.
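
A sketch of the reduced model suggested by these plots, with the grouping choices treated as judgmental assumptions:

shapland.table.6[, w_group := factor(ifelse(w == 1, "1", ifelse(w <= 3, "2-3", "4-6")))]
shapland.table.6[, d_group := factor(ifelse(d == 1, "1", ifelse(d <= 3, "2-3", "4-6")))]
glm.grouped = glm(incremental_loss ~ w_group + d_group, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.grouped)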

Heteroscedasticity

This test involves calculating the relative standard deviation by time period, and looking for possible groupings that could provide a basis for a hetero-adjustment:

overall.residual.sd = shapland.table.6[, sd(unscaled_pearson_residual)]
dev.period.residual.sd = shapland.table.6[, .(relative_sd = sd(unscaled_pearson_residual / overall.residual.sd)), by = d]
ggplot(data = dev.period.residual.sd, aes(x = d, y = relative_sd)) + geom_point() + labs(title = "Relative Standard Deviation of Residuals by Development Period")
## Warning: Removed 1 rows containing missing values (geom_point).

This suggests that the standard deviation in development age 1 is lower than it is in other periods, so one option is to use development period 1 as a hetero group on its own, with the rest combined into a single group. (Note that although 5 shows a high standard deviation, it’s based on only 2 data points.)

shapland.table.6[, hetero_group := ifelse(d == 1, 1, 2)]
hetero.adjustment = shapland.table.6[, .(hetero_adjustment = overall.residual.sd / sd(unscaled_pearson_residual)), by = hetero_group]
shapland.table.6 = merge(shapland.table.6, hetero.adjustment, by = "hetero_group", all.x = TRUE)
shapland.table.6[, adjusted_residuals := unscaled_pearson_residual * hetero_adjustment]
dev.period.adj.residual.sd = shapland.table.6[, .(relative_sd = sd(adjusted_residuals / overall.residual.sd)), by = d]
ggplot(data = dev.period.adj.residual.sd, aes(x = d, y = relative_sd)) + geom_point() + labs(title = "Relative Standard Deviation of Residuals by Development Period")
## Warning: Removed 1 rows containing missing values (geom_point).

This adjustment has brought the standard deviations more in line.

shapland.dy.summary = shapland.table.6[, .(avg_residual = mean(adjusted_residuals, na.rm = FALSE)), by = d]
ggplot(data = shapland.table.6, aes(x = d, y = adjusted_residuals)) + geom_point(colour = "blue") + geom_line(data = shapland.dy.summary, aes(x = d, y = avg_residual), colour = "green") + labs(title = "Adjusted Unscaled Pearson Residuals Residuals by Development Year")

Note that the distribution in the first development period is now much less narrow than it was before.

Normality Plots

Although normality is not a requirement of the ODP residuals, tests for normality can still be used to compare two models by assessing the degree of skewness of the residuals. A visual test, the QQ plot against the normal distribution, is a plot in which the residuals are on the \(y\)-axis, and the corresponding normal quantile is on the \(x\)-axis:

residual.QQ.data = shapland.table.6[order(unscaled_pearson_residual)]
residual.QQ.data$residual_percentile = (1:(nrow(residual.QQ.data)) - 0.5) / (nrow(residual.QQ.data))
residual.QQ.data[, normal_quantile := qnorm(residual_percentile)]
ggplot(data = residual.QQ.data, aes(x = normal_quantile, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_abline(intercept = 0, slope = 1, colour = "red")

Compare to the results after the hetero adjustment:

adj.residual.QQ.data = shapland.table.6[order(adjusted_residuals)]
adj.residual.QQ.data$residual_percentile = (1:(nrow(residual.QQ.data)) - 0.5) / (nrow(residual.QQ.data))
adj.residual.QQ.data[, normal_quantile := qnorm(residual_percentile)]
ggplot(data = adj.residual.QQ.data, aes(x = normal_quantile, y = adjusted_residuals)) + geom_point(colour = "blue") + geom_abline(intercept = 0, slope = 1, colour = "red")

The adjustment has brought the residuals slightly closer to normality in the left tail, but it is not a big improvement. The Kolmogorov-Smirnov or Anderson-Darling tests can also provide formal statistical tests of normality.

The AIC and BIC can also be used to compare models. In this context, they are defined in terms of the sum of squared differences between each residual and its normal counterpart, denoted \(\mathrm{RSS}\). (Note that this assesses how close the residuals are to normal, rather than how close the predictions are to actuals.) These include a penalty for the number of parameters, \(p\). The AIC is \[ \mathrm{AIC} = n(\log(2\pi \mathrm{RSS}/n) + 1) + 2p \] and the BIC is \[ \mathrm{BIC} = n \log(\mathrm{RSS}/n) + p\log(n) \]
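
These quantities can be sketched from the QQ data constructed above for the unadjusted residuals; the use of the unscaled residuals and p = 11 parameters here is an assumption for the illustration:

n = nrow(residual.QQ.data)
p = 11
RSS = residual.QQ.data[, sum((unscaled_pearson_residual - normal_quantile)^2)]
AIC.value = n * (log(2 * pi * RSS / n) + 1) + 2 * p
BIC.value = n * log(RSS / n) + p * log(n)
print(paste0("AIC: ", round(AIC.value, 2), "; BIC: ", round(BIC.value, 2)))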

Implied Development Patterns

Models can be compared by looking at their implied development patterns, both incremental and cumulative. Here, we compare the basic ODP model with the two-parameter GLM that uses a single continuous parameter for each of accident year and development year.

implied.development = shapland.table.6[w == 1, .(d, GLM_prediction, simple_prediction)]
odp.total = implied.development[, sum(GLM_prediction)]
glm.total = implied.development[, sum(simple_prediction)]
implied.development[, odp_inc_pct := GLM_prediction / odp.total]
implied.development[, glm_inc_pct := simple_prediction / glm.total]
implied.development[, odp_cumulative_pct := cumsum(odp_inc_pct)]
implied.development[, glm_cumulative_pct := cumsum(glm_inc_pct)]
datatable(implied.development[, .(d, odp_inc_pct, odp_cumulative_pct, glm_inc_pct, glm_cumulative_pct)])
ggplot(data = implied.development, aes(x = d, y = odp_inc_pct)) + geom_line(colour = "green") + geom_line(aes(y = odp_cumulative_pct), colour = "blue") + labs(title = "Implied Development Pattern - ODP Model")

ggplot(data = implied.development, aes(x = d, y = glm_inc_pct)) + geom_line(colour = "green") + geom_line(aes(y = glm_cumulative_pct), colour = "blue") + labs(title = "Implied Development Pattern - Two-parameter GLM")

Visually, we can see how the two-parameter GLM has smoothed out the development pattern relative to the ODP / chain ladder method.

Evaluation of Simulation Results

Review of Summary statistics

Simulation summary statistics, such as mean unpaid, standard errors, coefficients of variation, maxima / minima, and percentiles should be reviewed for reasonability. Items to check include:
  • Standard error of unpaid claims should increase when moving from the older years to the more recent years, because the unpaid amount should be greater in recent years.
  • Standard error for all years should be greater than any individual year.
  • Coefficients of variation should decrease when moving to more recent years, and the CV for all years should be smaller than any individual year. This is due to the independence of individual claims: in recent accident years, random variations are more likely to offset each other. (This is a decrease in process variation.)
  • The above trend is reversed when looking at cash flows on a calendar year basis, since in later calendar years the cash flows are smaller but more uncertain.
  • An exception to the above is that CV may start to increase in the most recent years, because parameter uncertainty begins to overwhelm process uncertainty. If this is the case, a Cape Cod or Bornhuetter-Ferguson model may be warranted.
  • Maxima / minima should be checked to ensure they are not implausible.
  • Mean incremental values and coefficients of variation should be consistent across development periods. Pay close attention to differences between historical and future simulated values.
  • Parameters of smooth distributions fit to the simulation results (normal, lognormal, gamma). These can be used to assess the quality of fit or to parametrize a dynamic financial analysis model.

p-p plots

Similar to the tests of normality on the individual models, the distribution simulation as a whole can be tested against the assumption that the percentiles of observed data, under the distribution produced by the model, are uniformly distributed. The expected percentiles are \[ \frac{1}{n+1}, \frac{2}{n+1}, \ldots, \frac{n}{n+1} \] and are plotted on the horizontal axis. The sorted predicted percentiles are plotted on the vertical axis. A more formal statistical test, the Kolmogorov-Smirnov test, can be performed. The test statistic is \[ D = \max|p_i - f_i| \] where \(p_i\) are the model percentiles and \(f_i\) are the expected percentiles. The critical value is \(1.36 / \sqrt{n}\), where \(n\) is the number of observations.

The simulated data below illustrate the relationship between the qualitative shape of the graph and how the model distribution compares to the observed distribution. I’ll assume the observed values follow a Gamma distribution with mean 30 and variance 180, then assess the performance of the following models:
  • A gamma distribution with the same mean and variance – the “correct” model
  • A gamma distribution with the same variance, but a mean of 40 – a model that is biased high
  • A gamma distribution with the same variance, but a mean of 20 – a model that is biased low
  • A gamma distribution with the same mean but a variance that is too small – a model that has a lighter tail than the observed
  • A gamma distribution with the same mean but a variance that is too large – a model that has a heavier tail than the observed
In the plots below, the black line indicates the 45-degree reference line, and the red lines indicate the Kolmogorov-Smirnov bounds.

set.seed(123)
pp.plot.data = data.table(simulation_number = 1:1000)
pp.plot.data[, observed := rgamma(1000, shape = 5, scale = 6)]
pp.plot.data[, small_gamma_percentile := pgamma(observed, shape = 20/9, scale = 9)]
pp.plot.data[, medium_gamma_percentile := pgamma(observed, shape = 5, scale = 6)]
pp.plot.data[, large_gamma_percentile := pgamma(observed, shape = 80/9, scale = 9/2)]
pp.plot.data[, light_tail_gamma_percentile := pgamma(observed, shape = 10, scale = 3)]
pp.plot.data[, heavy_tail_gamma_percentile := pgamma(observed, shape = 30/12, scale = 12)]
datatable(pp.plot.data)
pp.plot = function(percentiles) {
  n = length(percentiles)
  graph.data = data.table(percentile = percentiles)[order(percentiles)]
  graph.data$percentile_rank = 1:n
  graph.data[, expected_percentile := percentile_rank / (n + 1)]
  pp.plot = ggplot(data = graph.data, aes(x = expected_percentile, y = percentile)) + geom_point() + geom_abline(intercept = 0, slope = 1) + geom_abline(intercept = 1.36 / sqrt(n), slope = 1, colour = "red") + geom_abline(intercept = -1.36 / sqrt(n), colour = "red")
  histogram = ggplot(data = graph.data, aes(x = percentile)) + geom_histogram(binwidth = 0.1, center = 0.05)
  ks_value = graph.data[, max(abs(expected_percentile - percentile))]
  return(list("pp.plot" = pp.plot, "histogram" = histogram, "ks_value" = ks_value, "ks_crit" = 1.36 / sqrt(n)))
}

For a well-fitting model, the percentiles should be uniformly distributed:

result = (pp.plot(pp.plot.data$medium_gamma_percentile))
plot(result$histogram)

plot(result$pp.plot)

print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.0408437644594914 and the critical value is 0.04300697617829"

If the model is biased high, then the observed values will be in low percentiles too often:

result = (pp.plot(pp.plot.data$large_gamma_percentile))
plot(result$histogram)

plot(result$pp.plot)

print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.354580925461175 and the critical value is 0.04300697617829"

If the model is biased low, then the observed values will appear in large percentiles too often:

result = (pp.plot(pp.plot.data$small_gamma_percentile))
plot(result$histogram)

plot(result$pp.plot)

print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.331435986121301 and the critical value is 0.04300697617829"

If the model is too light in the tails, then the observed values will occur in large and small percentiles too often. In this case, the confidence intervals produced by the model will be too small.

result = (pp.plot(pp.plot.data$light_tail_gamma_percentile))
plot(result$histogram)

plot(result$pp.plot)

print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.129961155261367 and the critical value is 0.04300697617829"

If the model is too heavy in the tails, then the observed values will occur in middle percentiles too often. In this case, the confidence intervals produced by the model will be too wide.

result = (pp.plot(pp.plot.data$heavy_tail_gamma_percentile))
plot(result$histogram)

plot(result$pp.plot)

print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.105180487856205 and the critical value is 0.04300697617829"

Use of Multiple Models

Multiple stochastic models can be combined by assigning a weight to the results of each model. Approaches to selecting the weights include:
  • Based on a qualitative assessment of the merits of each model
  • Bayesian methods may be used to weight the models based on the results of their forecasts, i.e. assign weights to each candidate model and view the weights as parameters. If the posterior distribution shows significant variability, this is an indicator of model risk.
Broadly speaking, there are two approaches to running the simulations when multiple models are used:
  • Run each model with exactly the same random variables
  • Run models with independent random variables

The weights are used to randomly sample the specified percentage of iterations from each simulation.

Results may need to be “shifted” if case reserves are not consistent with the calculated IBNR. (For example, positive case reserves but negative calculated IBNR means we should consider whether or not the case reserves are likely to be redundant.) This can be done additively (maintain the shape and width of the distribution) or multiplicatively (maintain shape, but change the width).
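
As a simple illustration of the two shifting approaches, using a hypothetical simulated distribution and a hypothetical selected mean:

simulated.unpaid = rgamma(10000, shape = 4, scale = 250)    # hypothetical simulated unpaid claims, mean 1000
selected.mean = 1200
shift.additive = simulated.unpaid + (selected.mean - mean(simulated.unpaid))       # same shape and width
shift.multiplicative = simulated.unpaid * selected.mean / mean(simulated.unpaid)   # same shape, width scales with the mean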