These study notes are based on three Exam 7 syllabus readings related to stochastic methods for loss reserving: Using the ODP Bootstrap Model: A Practitioner’s Guide by Mark Shapland, Obtaining Predictive Distributions for Reserves Which Incorporate Expert Opinion by R.J. Verrall, and Stochastic Loss Reserving Using Bayesian MCMC Models by Glenn Meyers.
easypackages::packages("data.table", "DT", "ggplot2", "statmod")
## Loading required package: data.table
## Loading required package: DT
## Loading required package: ggplot2
## Loading required package: statmod
## All packages loaded successfully
This approach involves specifying a generalized linear model (GLM) whose results match those from the traditional chain ladder method. The correspondence relies on identifying an implicit assumption behind the chain ladder method: that each accident year has a parameter representing its relative level, namely, the cumulative loss on the most recent diagonal. A variation on the chain ladder method assumes that accident years are homogeneous, in which case the level for each accident year can be estimated using the column averages: \[ \frac{\sum_{1\leq w \leq n-d+1} c(w,d)}{n-d+1} \] These averages would replace the actual observed values along the most recent diagonal when calculating projections.
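As a minimal sketch of this variation (using a small hypothetical cumulative triangle in long format, not Shapland’s data), the column averages can be computed as follows:
# Hypothetical cumulative triangle in long format: accident year w, development age d, cumulative loss
hypothetical.triangle = data.table(w = c(1, 1, 1, 2, 2, 3), d = c(1, 2, 3, 1, 2, 1), cumulative_loss = c(100, 150, 165, 110, 160, 120))
# Under the homogeneity assumption, the average cumulative loss in each column replaces the observed value on the latest diagonal when projecting to ultimate
column.averages = hypothetical.triangle[, .(avg_cumulative = mean(cumulative_loss)), by = d]
datatable(column.averages)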
The over-dispersed Poisson (ODP) model describes incremental claims using a GLM with a log link function and an over-dispersed Poisson error distribution. In other words, \[ E[q(w,d)] = m_{w,d} \] \[ \mathrm{Var}(q(w,d)) = \phi m_{w,d}^z \] and \[ \log(m_{w,d}) = \eta_{w,d} = c + \alpha_w + \beta_d \] where the parameters \(\alpha_w\) denote the relative level of each accident year, and the \(\beta_d\) describe the development pattern. (Note that in this formulation, \(\beta_d\) is treated as a cumulative factor. Shapland’s model matrices suggest that it is actually being treated as an incremental factor, but the resulting models should be the same.) Note that due to the log link function, the ODP bootstrap model is equivalent to a model in which \(q(w,d)\) has mean \(x_w y_d\), where by normalization we can assume \(\sum_{1\leq d \leq n} y_d = 1\). In this formulation, \(x_w\) represents the expected ultimate cumulative losses for accident year \(w\), and \(y_d\) is the proportion of losses that emerge in each development period. The over-dispersed Poisson assumption can also be replaced with an assumption of an over-dispersed negative binomial distribution.
The modelling approach can be illustrated using the following example from one of Shapland’s spreadsheets:
shapland.table.6 = fread("Stochastic Loss Reserving/shapland-6_year_table.csv")
shapland.table.6[, d_factor := as.factor(d)]
shapland.table.6[, w_factor := as.factor(w)]
datatable(shapland.table.6)
Fitting the model without an intercept, and assuming that \(\beta_d\) are cumulative factors, the model is as follows:
glm.CL.cumulative = glm(incremental_loss ~ w_factor + d_factor + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.CL.cumulative)
##
## Call:
## glm(formula = incremental_loss ~ w_factor + d_factor + 0, family = quasipoisson(link = "log"),
## data = shapland.table.6)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0493 -1.3869 0.3249 0.7592 1.6734
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## w_factor1 4.6929 0.1249 37.576 4.25e-12 ***
## w_factor2 4.6693 0.1261 37.033 4.91e-12 ***
## w_factor3 4.7175 0.1248 37.792 4.01e-12 ***
## w_factor4 4.7301 0.1291 36.636 5.46e-12 ***
## w_factor5 4.7791 0.1320 36.204 6.14e-12 ***
## w_factor6 4.8283 0.1474 32.761 1.66e-11 ***
## d_factor2 -0.8473 0.1271 -6.665 5.60e-05 ***
## d_factor3 -1.5892 0.1896 -8.384 7.79e-06 ***
## d_factor4 -1.4100 0.2025 -6.963 3.89e-05 ***
## d_factor5 -2.3786 0.3809 -6.244 9.58e-05 ***
## d_factor6 -3.0834 0.7474 -4.125 0.00206 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2.715116)
##
## Null deviance: 7375.114 on 21 degrees of freedom
## Residual deviance: 27.969 on 10 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
In this case, model predictions can be produced by exponentiating the sum of the corresponding \(w\) factor and \(d\) factor:
shapland.table.6[, GLM_prediction := predict(glm.CL.cumulative, type = "response")]
datatable(shapland.table.6[, .(w, d, incremental_loss, GLM_prediction)])
For example, the prediction for accident year 2 and development age 3 can be calculated as follows:
alpha_2 = glm.CL.cumulative$coefficients['w_factor2']
beta_3 = glm.CL.cumulative$coefficients['d_factor3']
q_2_3_estimate = exp(alpha_2 + beta_3)
print(paste0("The estimate for q(2,3) is ", q_2_3_estimate))
## [1] "The estimate for q(2,3) is 21.7607604017224"
The formulation involving incremental factors requires an indicator variable for each development age, equal to one when that age is less than or equal to the age of the observation – in other words, the indicator is one for each incremental development factor that contributes to the cumulative factor.
shapland.table.6[, beta_2_ind := ifelse(d >= 2, 1, 0)]
shapland.table.6[, beta_3_ind := ifelse(d >= 3, 1, 0)]
shapland.table.6[, beta_4_ind := ifelse(d >= 4, 1, 0)]
shapland.table.6[, beta_5_ind := ifelse(d >= 5, 1, 0)]
shapland.table.6[, beta_6_ind := ifelse(d >= 6, 1, 0)]
datatable(shapland.table.6[,.(w, d, incremental_loss, beta_2_ind, beta_3_ind, beta_4_ind, beta_5_ind, beta_6_ind)])
The model using this specification is:
glm.CL.incremental = glm(incremental_loss ~ w_factor + beta_2_ind + beta_3_ind + beta_4_ind + beta_5_ind + beta_6_ind + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.CL.incremental)
##
## Call:
## glm(formula = incremental_loss ~ w_factor + beta_2_ind + beta_3_ind +
## beta_4_ind + beta_5_ind + beta_6_ind + 0, family = quasipoisson(link = "log"),
## data = shapland.table.6)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0493 -1.3869 0.3249 0.7592 1.6734
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## w_factor1 4.6929 0.1249 37.576 4.25e-12 ***
## w_factor2 4.6693 0.1261 37.033 4.91e-12 ***
## w_factor3 4.7175 0.1248 37.792 4.01e-12 ***
## w_factor4 4.7301 0.1291 36.636 5.46e-12 ***
## w_factor5 4.7791 0.1320 36.204 6.14e-12 ***
## w_factor6 4.8283 0.1474 32.761 1.66e-11 ***
## beta_2_ind -0.8473 0.1271 -6.665 5.60e-05 ***
## beta_3_ind -0.7419 0.2059 -3.603 0.00482 **
## beta_4_ind 0.1792 0.2558 0.701 0.49948
## beta_5_ind -0.9686 0.4147 -2.336 0.04166 *
## beta_6_ind -0.7048 0.8277 -0.852 0.41436
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2.715116)
##
## Null deviance: 7375.114 on 21 degrees of freedom
## Residual deviance: 27.969 on 10 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
Predictions from this model are as follows. (The “hat values” are explained later, but are produced now for convenience.)
shapland.table.6[, GLM_prediction_v2 := predict(glm.CL.incremental, type = "response")]
shapland.table.6[, GLM_hat := hatvalues(glm.CL.incremental)]
datatable(shapland.table.6[, .(w, d, incremental_loss, GLM_prediction, GLM_prediction_v2)])
Note that, rounding differences aside, both formulations produce the same predictions.
To illustrate this approach, the age-to-ultimate factors calculated by Shapland for this table are:
age.to.ultimate.6 = data.table(d = 1:6, age_to_ult = c(2.016, 1.411, 1.234, 1.073, 1.023, 1.00))
datatable(age.to.ultimate.6)
Join these to the table and identify the values corresponding to the most recent diagonal:
shapland.table.6 = merge(shapland.table.6, age.to.ultimate.6, by = "d", all.x = TRUE)
latest.diagonal.6 = shapland.table.6[w + d == 7, .(w, latest_diagonal_loss = cumulative_loss, latest_diagonal_LDF = age_to_ult)]
shapland.table.6 = merge(shapland.table.6, latest.diagonal.6, by = "w")
shapland.table.6[, cl_cumulative_estimate := latest_diagonal_loss * latest_diagonal_LDF / age_to_ult]
shapland.table.6[, cl_prev_cumulative_estimate := shift(cl_cumulative_estimate, type = "lag"), by = "w"]
shapland.table.6[is.na(cl_prev_cumulative_estimate), cl_prev_cumulative_estimate := 0]
shapland.table.6[, cl_incremental_estimate := cl_cumulative_estimate - cl_prev_cumulative_estimate]
datatable(shapland.table.6[, .(w, d, cumulative_loss, age_to_ult, latest_diagonal_loss, latest_diagonal_LDF, cl_cumulative_estimate, cl_incremental_estimate, GLM_prediction)])
The main advantages of this approach, referred to as the “simplified GLM” or “ODP bootstrap,” are that the GLM fitting procedure can be replaced by the simpler link-ratio (development factor) algorithm, which speeds up the simulations, and that the use of age-to-age factors serves as a bridge to the familiar deterministic framework, making the model easier to explain to others. Retaining the full GLM framework, on the other hand, provides more flexibility in specifying the model.
As an example, we can produce a model in which all accident years are at the same level, letting the intercept represent this common parameter.
glm.homogeneous.AY = glm(incremental_loss ~ beta_2_ind + beta_3_ind + beta_4_ind + beta_5_ind + beta_6_ind, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.homogeneous.AY)
##
## Call:
## glm(formula = incremental_loss ~ beta_2_ind + beta_3_ind + beta_4_ind +
## beta_5_ind + beta_6_ind, family = quasipoisson(link = "log"),
## data = shapland.table.6)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9724 -1.1892 0.2867 0.9872 1.6665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.73766 0.05371 88.207 < 2e-16 ***
## beta_2_ind -0.86646 0.10545 -8.217 6.18e-07 ***
## beta_3_ind -0.75769 0.17375 -4.361 0.000559 ***
## beta_4_ind 0.16990 0.21601 0.787 0.443795
## beta_5_ind -0.98083 0.35143 -2.791 0.013708 *
## beta_6_ind -0.69315 0.70287 -0.986 0.339692
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.976119)
##
## Null deviance: 665.616 on 20 degrees of freedom
## Residual deviance: 30.367 on 15 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
Another approach is to use a single parameter for all development periods, which is equivalent to fitting \(d - 1\) as a continuous variable:
shapland.table.6[, d_shifted := d - 1]
glm.one.dev.parameter = glm(incremental_loss ~ w_factor + d_shifted + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.one.dev.parameter)
##
## Call:
## glm(formula = incremental_loss ~ w_factor + d_shifted + 0, family = quasipoisson(link = "log"),
## data = shapland.table.6)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9705 -1.0970 0.0000 0.8852 3.1063
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## w_factor1 4.63924 0.16102 28.811 7.29e-14 ***
## w_factor2 4.61487 0.16153 28.570 8.18e-14 ***
## w_factor3 4.65711 0.15810 29.457 5.37e-14 ***
## w_factor4 4.61339 0.16331 28.249 9.56e-14 ***
## w_factor5 4.70300 0.16629 28.282 9.40e-14 ***
## w_factor6 4.82831 0.19185 25.167 4.68e-13 ***
## d_shifted -0.61329 0.06903 -8.885 3.96e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 4.600753)
##
## Null deviance: 7375.114 on 21 degrees of freedom
## Residual deviance: 62.894 on 14 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
Drawbacks of this approach include the need to re-fit the GLM in each bootstrap iteration, which slows down the simulation, and the loss of the direct link to age-to-age factors, which makes the model harder to explain in terms of the familiar chain ladder method.
A log link GLM cannot handle negative response values, so one approach would be to redefine the link function for negative values as \[ -\log(|q(w,d)|) \] while leaving it unchanged for positive values. This approach works when negative values are relatively rare; in particular, it won’t work if the sum of a column is negative. (For an incurred triangle, it is entirely possible for entire columns of incremental values to be negative, in which case the GLM won’t be able to fit even with this transformation.) In this case, one approach would be to define \(\Psi\) to be the largest negative value, and adjust the response variable to \[ q^+(w,d) = q(w,d) - \Psi \] This transformation should be undone after predictions are made: \[ m_{w,d} = m_{w,d}^+ + \Psi \] Since fitted values can also be negative, the standard deviation portion of the residuals needs to be replaced with the square root of the absolute value: \[ r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{|m_{w,d}^z|}} \] The same adjustment must be made to the sampled triangle: \[ q^*(w,d) = r^* \sqrt{|m_{w,d}^z|} + m_{w,d} \]
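As a minimal sketch (using a small hypothetical incremental triangle, not Shapland’s data), the shift adjustment can be applied as follows:
# Hypothetical incremental triangle containing a negative value
negative.example = data.table(w = c(1, 1, 2), d = c(1, 2, 1), incremental_loss = c(100, -40, 90))
# Psi is the largest negative (most negative) incremental value
Psi = negative.example[, min(incremental_loss)]
# Shift the response so that all values are non-negative before fitting the log-link GLM
negative.example[, incremental_loss_adj := incremental_loss - Psi]
# After the GLM is fit to incremental_loss_adj, the shift is undone on the fitted values: m = m_plus + Psi
datatable(negative.example)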
The scale parameter for a group \(G_i\) of size \(n_i\) can be calculated as the average of the squared degree-of-freedom-adjusted residuals in the group: \[ \phi_i = \frac{\sum_{(w,d) \in G_i} \left( \sqrt{\frac{N}{N-p}}\, r_{w,d}\right)^2}{n_i} \]
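A minimal sketch of this calculation, using hypothetical residuals assigned to two illustrative hetero groups:
# Hypothetical residuals and group assignments; N.example and p.example are the number of data points and parameters
N.example = 21
p.example = 11
hetero.example = data.table(hetero_group = c(1, 1, 1, 2, 2), residual = c(0.5, -1.2, 0.8, 2.1, -1.9))
# Apply the degree-of-freedom adjustment, then average the squared adjusted residuals within each group
hetero.example[, dof_adjusted_residual := sqrt(N.example / (N.example - p.example)) * residual]
group.scale.parameters = hetero.example[, .(phi = mean(dof_adjusted_residual^2)), by = hetero_group]
datatable(group.scale.parameters)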
For the ODP bootstrap method, the residuals used are the unscaled Pearson residuals, which are the same as the usual chain ladder residuals: \[ r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^z}} \] The scale parameter can be calculated as in Clark’s paper, using a \(\chi^2\) formula: \[ \phi = \frac{\sum r_{w,d}^2}{N - p} \] where \(N\) is the number of data points, and \(p\) is the number of parameters in the model, \(2(n-1) + 1\) in the case of the chain ladder approach.
The method can be applied to Shapland’s 6-year table as follows:
shapland.table.6[, unscaled_pearson_residual := (incremental_loss - cl_incremental_estimate) / sqrt(cl_incremental_estimate)]
shapland.table.6[, GLM_residual_pearson := residuals(glm.CL.cumulative, type = "pearson")]
datatable(shapland.table.6[,.(w, d, incremental_loss, cl_incremental_estimate, unscaled_pearson_residual, GLM_residual_pearson)])
The scale parameter can be calculated as follows:
N = 21
p = 11
phi = shapland.table.6[, sum(unscaled_pearson_residual^2)] / (N-p)
print(paste0("The scale parameter is ", phi))
## [1] "The scale parameter is 2.7179846116338"
For illustrative purposes, fix the following random sequence of residuals:
residual.sample = c(15, 12, 16, 8, 8, 18, 9, 10, 7, 9, 12, 14, 14, 11, 10, 4, 14, 7, 13, 11, 11)
shapland.table.6 = shapland.table.6[order(d, w)]
shapland.table.6[, random_residual := unscaled_pearson_residual[residual.sample]]
shapland.table.6[, sampled_incremental_loss := random_residual * sqrt(cl_incremental_estimate) + cl_incremental_estimate]
datatable(shapland.table.6[, .(w, d, incremental_loss, unscaled_pearson_residual, random_residual, sampled_incremental_loss)])
The framework can be improved by using different types of residuals. Scaled Pearson residuals are multiplied by a degree-of-freedom factor \[ f^{DoF} = \sqrt{\frac{N}{N-p}} \] so we obtain \[ r_{w,d}^S = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^z}} f^{DoF} \] However, these residuals are not standardized, meaning that they do not all have the same variance. To achieve this, we multiply the residuals by \[ f^H_{w,d} = \sqrt{\frac{1}{1 - H_{i,i}}} \] where \(H_{i,i}\) are the diagonals of the “hat” matrix from the GLM. This produces the standardized Pearson residuals: \[ r_{w,d}^H = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^z}} f^H_{w,d} \] The scale parameter can be approximated as the average square of these residuals: \[ \phi^H = \frac{\sum (r_{w,d}^H)^2}{N} \] The above method can be repeated using the “hat” residuals:
shapland.table.6[, hat_residual := unscaled_pearson_residual * sqrt(1 / (1 - GLM_hat))]
datatable(shapland.table.6[,.(w, d, incremental_loss, cl_incremental_estimate, unscaled_pearson_residual, hat_residual)])
shapland.table.6 = shapland.table.6[order(d, w)]
shapland.table.6[, random_residual_hat := hat_residual[residual.sample]]
shapland.table.6[, sampled_incremental_loss_hat := random_residual_hat * sqrt(cl_incremental_estimate) + cl_incremental_estimate]
datatable(shapland.table.6[, .(w, d, incremental_loss, unscaled_pearson_residual, random_residual_hat, sampled_incremental_loss_hat)])
The chain ladder method, or equivalent GLM, can now be applied to the sampled data to produce a new point estimate:
glm.CL.sample = glm(sampled_incremental_loss_hat ~ w_factor + d_factor + 0, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.CL.sample)
##
## Call:
## glm(formula = sampled_incremental_loss_hat ~ w_factor + d_factor +
## 0, family = quasipoisson(link = "log"), data = shapland.table.6)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2757 -0.4511 0.0000 0.5570 2.2085
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## w_factor1 4.8870 0.1028 47.522 4.11e-13 ***
## w_factor2 4.5816 0.1167 39.256 2.75e-12 ***
## w_factor3 4.8037 0.1067 45.001 7.06e-13 ***
## w_factor4 4.8200 0.1108 43.488 9.93e-13 ***
## w_factor5 4.8007 0.1157 41.498 1.58e-12 ***
## w_factor6 4.6456 0.1435 32.366 1.87e-11 ***
## d_factor2 -0.8332 0.1088 -7.656 1.73e-05 ***
## d_factor3 -1.8728 0.1831 -10.226 1.29e-06 ***
## d_factor4 -1.2899 0.1654 -7.800 1.47e-05 ***
## d_factor5 -3.4007 0.5351 -6.355 8.29e-05 ***
## d_factor6 -6.0079 2.5672 -2.340 0.0413 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2.144985)
##
## Null deviance: 7653.714 on 21 degrees of freedom
## Residual deviance: 21.798 on 10 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
Estimates of future incremental losses can be produced, and summed to determine an overall reserve estimate:
d = 1:6
w = 1:6
unpaid.estimate = as.data.table(merge(d, w, by = NULL))
colnames(unpaid.estimate) = c("d", "w")
unpaid.estimate = unpaid.estimate[d+w > 7]
unpaid.estimate[, d_factor := as.factor(d)]
unpaid.estimate[, w_factor := as.factor(w)]
unpaid.estimate$GLM_prediction = predict(glm.CL.sample, newdata = unpaid.estimate, type = "response")
datatable(unpaid.estimate[,.(d, w, GLM_prediction)])
Reserves for each accident year are as follows:
simulated.reserves = unpaid.estimate[, .(reserve_estimate = sum(GLM_prediction)), by = w]
datatable(simulated.reserves[order(w)])
The process can be further improved by adding process variance, e.g., by sampling each future incremental value from a Gamma distribution, as sketched below.
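A minimal sketch of this step, applied to the unpaid.estimate table and the scale parameter \(\phi\) computed earlier; the Gamma parameterization (mean equal to the estimate, variance equal to \(\phi\) times the estimate) is an assumption chosen to be consistent with the ODP variance structure:
set.seed(42)
# Replace each future point estimate with a Gamma draw whose mean is the estimate and whose variance is phi times the estimate
unpaid.estimate[, process_simulation := rgamma(.N, shape = GLM_prediction / phi, scale = phi)]
simulated.reserves.with.process = unpaid.estimate[, .(reserve_estimate = sum(process_simulation)), by = w]
datatable(simulated.reserves.with.process[order(w)])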
Assumptions underlying this method include:
The Gamma distribution, with mean \(\alpha/\beta\) and variance \(\alpha/\beta^2\), is a popular prior distribution because it is easy to keep the mean constant while changing the variance: decrease \(\beta\) to increase the variance, while decreasing \(\alpha\) to offset its effect on the mean.
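A quick numeric check of this property, with illustrative parameter values:
# Gamma(alpha, beta) has mean alpha / beta and variance alpha / beta^2
alpha.prior = 10
beta.prior = 2
c(mean = alpha.prior / beta.prior, variance = alpha.prior / beta.prior^2)
# Scaling both parameters down by the same factor leaves the mean unchanged but increases the variance
c(mean = (alpha.prior / 4) / (beta.prior / 4), variance = (alpha.prior / 4) / (beta.prior / 4)^2)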
A variation that allows for negative incremental values is to assume a normal distribution, with a variance assumption similar to Mack’s approach (i.e. proportional to prior cumulative losses, with a different constant of proportionality for each development age). Specifically, \[ E[q(w,d)] = (\lambda_j - 1) c(w,d-1) \] and \[ \mathrm{Var}(q(w,d)) = \varphi_j c(w, d-1) \] where \(\lambda_j\) is the development factor for the corresponding development period.
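A minimal sketch of simulating a single incremental value under this assumption, with hypothetical values for the development factor, the variance constant, and the prior cumulative loss:
set.seed(1)
lambda.j = 1.4      # hypothetical development factor for this development period
varphi.j = 25       # hypothetical variance constant for this development period
c.prev = 1000       # hypothetical prior cumulative loss c(w, d-1)
# Incremental loss is normal with mean (lambda.j - 1) * c.prev and variance varphi.j * c.prev
q.simulated = rnorm(1, mean = (lambda.j - 1) * c.prev, sd = sqrt(varphi.j * c.prev))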
The Bornhuetter-Ferguson approach can be implemented by assuming that \(q(w,d) | x, y, \varphi\) is overdispersed Poisson with \[ E[q(w,d)] = x_w y_d \] and \(\sum_{1\leq d \leq n} y_d = 1\), and assuming that \(x_w \sim \Gamma(\alpha_w, \beta_w)\). The mean \(M_w = \alpha_w / \beta_w\) is selected to represent the prior expectation of loss, with \(\beta_w\) adjusted to represent the level of confidence in this assumption. (Smaller \(\beta_w\) correspond to lower confidence.)
It can be shown that the result is a credibility-weighted average of the chain ladder estimate \[ (\lambda_j - 1) c(w, j-1) \] against the prior estimate of the incremental loss \[ (\lambda_j - 1) \frac{M_w}{\lambda_j \lambda_{j+1} \cdots \lambda_n} \] applying a credibility factor to the chain ladder estimate equal to \[ Z_{w,j} = \frac{\sum_{1 \leq k \leq j-1} y_k}{\beta_w\varphi + \sum_{1 \leq k \leq j-1} y_k} \] For the column parameters, improper prior distributions are used. Note that because we have freedom to select \(\beta_w\), the result is a range of models that transition continuously between the chain ladder method and the Bornhuetter-Ferguson technique.
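A minimal sketch of the credibility weighting for a single cell, with hypothetical values for the prior mean, \(\beta_w\), \(\varphi\), the payout proportions, and the development factors:
M.w = 5000                      # hypothetical prior expected ultimate loss for accident year w
beta.w = 0.002                  # hypothetical prior parameter; smaller values mean less confidence in M.w
varphi = 1.5                    # hypothetical overdispersion parameter
y = c(0.50, 0.30, 0.15, 0.05)   # hypothetical payout proportions y_d, summing to 1
j = 3                           # development period being estimated
lambda.j = 1.30                 # hypothetical development factor from age j - 1 to age j
lambda.j.to.n = 1.30 * 1.053    # hypothetical product lambda_j * lambda_{j+1} * ... * lambda_n
c.prev = 2600                   # hypothetical observed cumulative loss c(w, j - 1)
cl.estimate = (lambda.j - 1) * c.prev                 # chain ladder estimate of the incremental loss
bf.estimate = (lambda.j - 1) * M.w / lambda.j.to.n    # prior-based (Bornhuetter-Ferguson) estimate
Z = sum(y[1:(j - 1)]) / (beta.w * varphi + sum(y[1:(j - 1)]))
blended.estimate = Z * cl.estimate + (1 - Z) * bf.estimate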
This model reflects the idea that claims settlement may speed up over time due to use of technology. It introduces a parameter \(\gamma\), for which positive values represent a speedup in claim settlement: \[ \mu_{w,d} = \alpha_w + \beta_d (1 - \gamma)^{w-1} \] (This is a speedup since \(\beta_d\) is usually negative.)
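A minimal sketch of the linear predictor under this model, with hypothetical parameter values (none of these come from the reading):
alpha.speedup = c(4.7, 4.7, 4.75)    # hypothetical accident year levels alpha_w
beta.speedup = c(0, -0.85, -1.6)     # hypothetical development parameters beta_d
gamma.speedup = 0.10                 # positive gamma represents a speedup in settlement
speedup.grid = CJ(w = 1:3, d = 1:3)  # all combinations of accident year and development age
speedup.grid[, mu := alpha.speedup[w] + beta.speedup[d] * (1 - gamma.speedup)^(w - 1)]
speedup.grid[, fitted_incremental := exp(mu)]
datatable(speedup.grid)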
shapland.table.6[, calendar_year := w + d]
shapland.cy.summary = shapland.table.6[, .(avg_residual = mean(unscaled_pearson_residual, na.rm = FALSE)), by = calendar_year]
ggplot(data = shapland.table.6, aes(x = calendar_year, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.cy.summary, aes(x = calendar_year, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals by Calendar Year")
## Warning in plyr::split_indices(scale_id, n): '.Random.seed' is not an
## integer vector but of type 'NULL', so ignored
shapland.ay.summary = shapland.table.6[, .(avg_residual = mean(unscaled_pearson_residual, na.rm = FALSE)), by = w]
ggplot(data = shapland.table.6, aes(x = w, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.ay.summary, aes(x = w, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals by Accident Year")
shapland.dy.summary = shapland.table.6[, .(avg_residual = mean(unscaled_pearson_residual, na.rm = FALSE)), by = d]
ggplot(data = shapland.table.6, aes(x = d, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.dy.summary, aes(x = d, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals by Development Year")
ggplot(data = shapland.table.6, aes(x = GLM_prediction, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + labs(title = "Unscaled Pearson Residuals by Predicted Value")
If an adjustment is made, the graphs should be re-checked afterward to validate that the issue has been addressed.
A possible concern is that the relatively flat green lines for development year and accident year may indicate that the model is overparameterized. One approach to addressing this is to start with a model containing a single parameter for each of accident year and development year, then add parameters only as needed. (Note that the reading suggests adding a calendar year parameter, but since there is a linear dependency between these three variables, calendar year would be aliased.)
simple.glm = glm(incremental_loss ~ w + d, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(simple.glm)
##
## Call:
## glm(formula = incremental_loss ~ w + d, family = quasipoisson(link = "log"),
## data = shapland.table.6)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3334 -1.0733 0.1436 0.9471 3.1994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.19406 0.19157 27.113 4.77e-16 ***
## w 0.02753 0.03750 0.734 0.472
## d -0.61684 0.06186 -9.971 9.34e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 3.700131)
##
## Null deviance: 665.616 on 20 degrees of freedom
## Residual deviance: 65.156 on 18 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
shapland.table.6$simple_prediction = predict(simple.glm, type = "response")
shapland.table.6[, simple_residual := (incremental_loss - simple_prediction) / sqrt(simple_prediction)]
Now, see how the plots change:
shapland.ay.summary = shapland.table.6[, .(avg_residual = mean(simple_residual, na.rm = FALSE)), by = w]
ggplot(data = shapland.table.6, aes(x = w, y = simple_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.ay.summary, aes(x = w, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals by Accident Year")
shapland.dy.summary = shapland.table.6[, .(avg_residual = mean(simple_residual, na.rm = FALSE)), by = d]
ggplot(data = shapland.table.6, aes(x = d, y = simple_residual)) + geom_point(colour = "blue") + geom_line(data = shapland.dy.summary, aes(x = d, y = avg_residual), colour = "green") + labs(title = "Unscaled Pearson Residuals by Development Year")
The accident year plot suggests that we could group accident years 2 and 3 together, as well as 4, 5, and 6. The development year plot suggests that periods 2 and 3 should be grouped together, as well as 4, 5, and 6. This can result in a model with fewer parameters than the original.
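As a sketch of such a grouped model (the specific groupings are judgment calls based on the plots above, not prescribed by the reading):
# Group accident years {1}, {2, 3}, {4, 5, 6} and development ages {1}, {2, 3}, {4, 5, 6}
shapland.table.6[, w_group := factor(ifelse(w == 1, "1", ifelse(w %in% 2:3, "2-3", "4-6")))]
shapland.table.6[, d_group := factor(ifelse(d == 1, "1", ifelse(d %in% 2:3, "2-3", "4-6")))]
glm.grouped = glm(incremental_loss ~ w_group + d_group, data = shapland.table.6, family = quasipoisson(link = "log"))
summary(glm.grouped)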
This test involves calculating the relative standard deviation by time period, and looking for possible groupings that could provide a basis for a hetero-adjustment:
overall.residual.sd = shapland.table.6[, sd(unscaled_pearson_residual)]
dev.period.residual.sd = shapland.table.6[, .(relative_sd = sd(unscaled_pearson_residual / overall.residual.sd)), by = d]
ggplot(data = dev.period.residual.sd, aes(x = d, y = relative_sd)) + geom_point() + labs(title = "Relative Standard Deviation of Residuals by Development Period")
## Warning: Removed 1 rows containing missing values (geom_point).
This suggests that the standard deviation in development age 1 is lower than it is in other periods, so one option is to use development period 1 as a hetero group on its own, with the rest combined into a single group. (Note that although development period 5 shows a high standard deviation, it is based on only 2 data points.)
shapland.table.6[, hetero_group := ifelse(d == 1, 1, 2)]
hetero.adjustment = shapland.table.6[, .(hetero_adjustment = overall.residual.sd / sd(unscaled_pearson_residual)), by = hetero_group]
shapland.table.6 = merge(shapland.table.6, hetero.adjustment, by = "hetero_group", all.x = TRUE)
shapland.table.6[, adjusted_residuals := unscaled_pearson_residual * hetero_adjustment]
dev.period.adj.residual.sd = shapland.table.6[, .(relative_sd = sd(adjusted_residuals / overall.residual.sd)), by = d]
ggplot(data = dev.period.adj.residual.sd, aes(x = d, y = relative_sd)) + geom_point() + labs(title = "Relative Standard Deviation of Residuals by Development Period")
## Warning: Removed 1 rows containing missing values (geom_point).
This adjustment has brought the standard deviations more in line.
shapland.dy.summary = shapland.table.6[, .(avg_residual = mean(adjusted_residuals, na.rm = FALSE)), by = d]
ggplot(data = shapland.table.6, aes(x = d, y = adjusted_residuals)) + geom_point(colour = "blue") + geom_line(data = shapland.dy.summary, aes(x = d, y = avg_residual), colour = "green") + labs(title = "Adjusted Unscaled Pearson Residuals by Development Year")
Note that the distribution in the first development period is now much wider than it was before.
Although the ODP model does not require the residuals to be normally distributed, tests for normality can still be used to compare two models by assessing the degree of skewness of the residuals. A visual test, the QQ plot against the normal distribution, is a plot in which the residuals are on the \(y\)-axis, and the corresponding normal quantile is on the \(x\)-axis:
residual.QQ.data = shapland.table.6[order(unscaled_pearson_residual)]
residual.QQ.data$residual_percentile = (1:(nrow(residual.QQ.data)) - 0.5) / (nrow(residual.QQ.data))
residual.QQ.data[, normal_quantile := qnorm(residual_percentile)]
ggplot(data = residual.QQ.data, aes(x = normal_quantile, y = unscaled_pearson_residual)) + geom_point(colour = "blue") + geom_abline(intercept = 0, slope = 1, colour = "red")
Compare to the results after the hetero adjustment:
adj.residual.QQ.data = shapland.table.6[order(adjusted_residuals)]
adj.residual.QQ.data$residual_percentile = (1:(nrow(adj.residual.QQ.data)) - 0.5) / (nrow(adj.residual.QQ.data))
adj.residual.QQ.data[, normal_quantile := qnorm(residual_percentile)]
ggplot(data = adj.residual.QQ.data, aes(x = normal_quantile, y = adjusted_residuals)) + geom_point(colour = "blue") + geom_abline(intercept = 0, slope = 1, colour = "red")
The adjustment has brought the residuals slightly closer to normality in the left tail, but it is not a big improvement. The Kolmogorov-Smirnov or Anderson-Darling tests can also provide formal statistical tests of normality.
The AIC and BIC can also be used to compare models. In this context, they are defined in terms of the sum of squared differences between each residual and its normal counterpart, denoted \(\mathrm{RSS}\). (Note that this assesses how close the residuals are to normal, rather than how close the predictions are to actuals.) These include a penalty for the number of parameters, \(p\). The AIC is \[ \mathrm{AIC} = n(\log(2\pi \mathrm{RSS}/n) + 1) + 2p \] and the BIC is \[ \mathrm{BIC} = n \log(\mathrm{RSS}/n) + p\log(n) \]
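A minimal sketch of these calculations, using the standard normal quantiles from the QQ-plot data above as the normal counterparts of the residuals:
# AIC and BIC based on the residual sum of squares against the normal quantiles
normality.aic = function(rss, n, p) { n * (log(2 * pi * rss / n) + 1) + 2 * p }
normality.bic = function(rss, n, p) { n * log(rss / n) + p * log(n) }
rss.example = residual.QQ.data[, sum((unscaled_pearson_residual - normal_quantile)^2)]
normality.aic(rss.example, n = 21, p = 11)   # N = 21 data points, p = 11 parameters, as above
normality.bic(rss.example, n = 21, p = 11)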
Models can be compared by looking at their implied development patterns, both incremental and cumulative. Compare the basic ODP model with the simpler GLM fit above (simple.glm), which uses only a single accident year parameter and a single development year parameter.
implied.development = shapland.table.6[w == 1, .(d, GLM_prediction, simple_prediction)]
odp.total = implied.development[, sum(GLM_prediction)]
glm.total = implied.development[, sum(simple_prediction)]
implied.development[, odp_inc_pct := GLM_prediction / odp.total]
implied.development[, glm_inc_pct := simple_prediction / glm.total]
implied.development[, odp_cumulative_pct := cumsum(odp_inc_pct)]
implied.development[, glm_cumulative_pct := cumsum(glm_inc_pct)]
datatable(implied.development[, .(d, odp_inc_pct, odp_cumulative_pct, glm_inc_pct, glm_cumulative_pct)])
ggplot(data = implied.development, aes(x = d, y = odp_inc_pct)) + geom_line(colour = "green") + geom_line(aes(y = odp_cumulative_pct), colour = "blue") + labs(title = "Implied Development Pattern - ODP Model")
ggplot(data = implied.development, aes(x = d, y = glm_inc_pct)) + geom_line(colour = "green") + geom_line(aes(y = glm_cumulative_pct), colour = "blue") + labs(title = "Implied Development Pattern - Two-parameter GLM")
Visually, we can see how the two-parameter GLM has smoothed out the development pattern relative to the ODP / chain ladder method.
Similar to the tests of normality on the individual models, the simulated distribution as a whole can be tested against the assumption that the percentiles of the observed data, under the distribution produced by the model, are uniformly distributed. The expected percentiles are \[ \frac{1}{n+1}, \frac{2}{n+1}, \ldots, \frac{n}{n+1} \] and are plotted on the horizontal axis. The sorted predicted percentiles are plotted on the vertical axis. A more formal statistical test, Kolmogorov-Smirnov, can be performed. The test statistic is \[ D = \max|p_i - f_i| \] where \(p_i\) are the model percentiles and \(f_i\) are the expected percentiles. The critical value is \(1.36 / \sqrt{n}\), where \(n\) is the number of observations.
The simulated data below illustrate the relationship between the qualitative shape of the graph and the distribution. I’ll assume the observed values follow a Gamma distribution with mean 30 and variance 180, then assess the performance of the following models: one with the correct mean and variance, one with a smaller mean, one with a larger mean, one with a lighter tail (smaller variance), and one with a heavier tail (larger variance). In each plot, the black line indicates a 45-degree angle, and the red lines indicate the Kolmogorov-Smirnov bounds. (In the biased-low example, the simulated gamma has a larger mean than the gamma to which it is being compared.)
set.seed(123)
pp.plot.data = data.table(simulation_number = 1:1000)
pp.plot.data[, observed := rgamma(1000, shape = 5, scale = 6)]
pp.plot.data[, small_gamma_percentile := pgamma(observed, shape = 20/9, scale = 9)]
pp.plot.data[, medium_gamma_percentile := pgamma(observed, shape = 5, scale = 6)]
pp.plot.data[, large_gamma_percentile := pgamma(observed, shape = 80/9, scale = 9/2)]
pp.plot.data[, light_tail_gamma_percentile := pgamma(observed, shape = 10, scale = 3)]
pp.plot.data[, heavy_tail_gamma_percentile := pgamma(observed, shape = 30/12, scale = 12)]
datatable(pp.plot.data)
pp.plot = function(percentiles) {
n = length(percentiles)
graph.data = data.table(percentile = percentiles)[order(percentiles)]
graph.data$percentile_rank = 1:n
graph.data[, expected_percentile := percentile_rank / (n + 1)]
pp.plot = ggplot(data = graph.data, aes(x = expected_percentile, y = percentile)) + geom_point() + geom_abline(intercept = 0, slope = 1) + geom_abline(intercept = 1.36 / sqrt(n), slope = 1, colour = "red") + geom_abline(intercept = -1.36 / sqrt(n), colour = "red")
histogram = ggplot(data = graph.data, aes(x = percentile)) + geom_histogram(binwidth = 0.1, center = 0.05)
ks_value = graph.data[, max(abs(expected_percentile - percentile))]
return(list("pp.plot" = pp.plot, "histogram" = histogram, "ks_value" = ks_value, "ks_crit" = 1.36 / sqrt(n)))
}
For a well-fitting model, the percentiles should be uniformly distributed:
result = (pp.plot(pp.plot.data$medium_gamma_percentile))
plot(result$histogram)
plot(result$pp.plot)
print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.0408437644594914 and the critical value is 0.04300697617829"
If the model is biased high, then the observed values will be in low percentiles too often:
result = (pp.plot(pp.plot.data$large_gamma_percentile))
plot(result$histogram)
plot(result$pp.plot)
print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.354580925461175 and the critical value is 0.04300697617829"
If the model is biased low, then the observed values will appear in large percentiles too often:
result = (pp.plot(pp.plot.data$small_gamma_percentile))
plot(result$histogram)
plot(result$pp.plot)
print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.331435986121301 and the critical value is 0.04300697617829"
If the model is too light in the tails, then the observed values will occur in large and small percentiles too often. In this case, the confidence intervals produced by the model will be too small.
result = (pp.plot(pp.plot.data$light_tail_gamma_percentile))
plot(result$histogram)
plot(result$pp.plot)
print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.129961155261367 and the critical value is 0.04300697617829"
If the model is too heavy in the tails, then the observed values will occur in middle percentiles too often. In this case, the confidence intervals produced by the model will be too wide.
result = (pp.plot(pp.plot.data$heavy_tail_gamma_percentile))
plot(result$histogram)
plot(result$pp.plot)
print(paste0("The Kolmogorov-Statistic is ", result$ks_value, " and the critical value is ", result$ks_crit))
## [1] "The Kolmogorov-Statistic is 0.105180487856205 and the critical value is 0.04300697617829"
When the results of multiple models are combined, each model is assigned a weight, and the weights are used to randomly sample the specified percentage of iterations from each model’s simulation.
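A minimal sketch of this weighting process, using two hypothetical models with 1,000 simulated reserve iterations each and illustrative weights of 25% and 75%:
set.seed(99)
n.iterations = 1000
model.A.reserves = rgamma(n.iterations, shape = 100, scale = 5)   # hypothetical simulated reserves from model A
model.B.reserves = rgamma(n.iterations, shape = 120, scale = 5)   # hypothetical simulated reserves from model B
model.weights = c(A = 0.25, B = 0.75)
# Randomly sample the specified percentage of iterations from each model's simulation
sampled.A = sample(model.A.reserves, size = round(model.weights["A"] * n.iterations))
sampled.B = sample(model.B.reserves, size = round(model.weights["B"] * n.iterations))
weighted.reserve.distribution = c(sampled.A, sampled.B)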
Results may need to be “shifted” if case reserves are not consistent with the calculated IBNR. (For example, positive case reserves combined with negative calculated IBNR means we should consider whether the case reserves are likely to be redundant.) This can be done additively (maintaining the shape and width of the distribution) or multiplicatively (maintaining the shape, but changing the width).
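A minimal sketch of the two types of shift, applied to a hypothetical vector of simulated reserves and an illustrative target mean:
set.seed(7)
simulated.reserves.example = rgamma(1000, shape = 50, scale = 10)   # hypothetical simulated reserves
target.mean = 550                                                   # illustrative shifted mean
# Additive shift: add a constant, preserving the shape and width of the distribution
additive.shift = simulated.reserves.example + (target.mean - mean(simulated.reserves.example))
# Multiplicative shift: scale by a constant, preserving the shape but changing the width
multiplicative.shift = simulated.reserves.example * (target.mean / mean(simulated.reserves.example))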