This notebook illustrates the concepts in the Exam 7 readings “Measuring the Variability of Chain Ladder Reserve Estimates” by Thomas Mack and “Testing the Assumptions of Age-to-Age Factors” by Gary Venter. The concepts are illustrated by reproducing the examples in the papers. The following packages are used in this notebook:
easypackages::packages("data.table", "DT", "ggplot2", "dplyr")
options(scipen=999)
Mack illustrates his ideas using the following data (provided here in tabular format):
mack.table = fread("./mack_table_1993.csv")
datatable(mack.table)
The above format may be easier for performing calculations, but for visualization purposes we can also convert it to a traditional triangle format, with rows representing accident years and columns representing development years.
mack.triangle = data.table("AY" = 1:10)
for(j in 1:10) {
current.column = mack.table[k == j, .(i, loss)]
current.column = current.column[order(i)]
current.column = current.column[, loss]
length(current.column) = 10
mack.triangle = cbind(mack.triangle, current.column)
names(mack.triangle)[j+1] = paste0("DY_", j)
}
mack.triangle
## AY DY_1 DY_2 DY_3 DY_4 DY_5 DY_6 DY_7 DY_8 DY_9 DY_10
## 1: 1 5012 8269 10907 11805 13539 16181 18009 18608 18662 18834
## 2: 2 106 4285 5396 10666 13782 15599 15496 16169 16704 NA
## 3: 3 3410 8992 13873 16141 18735 22214 22863 23466 NA NA
## 4: 4 5655 11555 15766 21266 23425 26083 27067 NA NA NA
## 5: 5 1092 9565 15836 22169 25955 26180 NA NA NA NA
## 6: 6 1531 6445 11702 12935 15852 NA NA NA NA NA
## 7: 7 557 4020 10946 12314 NA NA NA NA NA NA
## 8: 8 1351 6947 13112 NA NA NA NA NA NA NA
## 9: 9 3133 5395 NA NA NA NA NA NA NA NA
## 10: 10 2063 NA NA NA NA NA NA NA NA NA
Notation used by Mack is as follows:
In this notation, the age-to-age factors are defined as \(c_{i,k}\)-weighted averages of the accident year observed development factors: \[ f_{k,1} = \frac{\sum_{1\leq i \leq I-k} c_{i, k+1}}{\sum_{1\leq i \leq I-k} c_{i, k}} \] The unweighted averages of the development factors are denoted by \[ f_{k,2} = \frac{1}{I-k} \sum_{1\leq i \leq I-k} \frac{c_{i, k+1}}{c_{i,k}} \] Averages of development factors weighted by \(c_{i,k}^2\) are denoted by \[ f_{k,0} = \frac{\sum_{1\leq i \leq I-k} c_{i,k} c_{i,k+1}}{\sum_{1\leq i \leq I-k} c_{i,k}^2} \]
These factors can be calculated from the example above as follows:
mack.age.to.age = mack.table[, next_dev := k+1]
mack.age.to.age = merge(mack.age.to.age, mack.age.to.age[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
mack.age.to.age[, obs_dev_factor := next_loss / loss]
mack.factors = mack.age.to.age[, .(f_k_0 = sum(loss * next_loss) / sum(loss^2), f_k_1 = sum(next_loss) / sum(loss), f_k_2 = mean(obs_dev_factor)), by = k]
datatable(mack.factors)
Venter uses the following notation:
In this notebook, I’ll use Mack’s notation.
More generally, Venter considers the variance to be a function of the development period and the cumulative losses to date: \[ \mathrm{Var}(c_{j, k+1} \mid c_{j,1},\ldots, c_{j,k}) = a(k, c_{j,k}) \] where \(a\) is a function that does not vary by accident year. Under this assumption, the minimum-variance unbiased estimator of \(f_k\) is the one that minimizes \[ \sum_{j} \left[f_k c_{j,k} - c_{j, k+1}\right]^2 \frac{g(k)}{a(k, c_{j,k})} \] for some function \(g(k)\); since \(g(k)\) is constant within a development period it does not affect the minimizing value of \(f_k\), so it is typically chosen to simplify the minimization.
If this test fails, one approach is to test an alternate emergence pattern. (See next section.)
Calculating standard errors for the chain ladder method requires an assumption about the variance of the losses. Mack points out that since \(c_{j, k+1} / c_{j,k}\) is an unbiased estimator of \(f_k\), then each of \(f_{k,0}\), \(f_{k,1}\), and \(f_{k,2}\) are also unbiased estimators of \(f_k\); more generally, any weighted average of \(c_{j,k+1}/c_{j,k}\) will be unbiased. Mack explains that the variance of the estimator is minimized when the weights are inversely proportional to \(\mathrm{Var}(c_{j, k+1} / c_{j,k} | c_{j,1}, \ldots, c_{j,k})\). Moreover, these weights can also be interpreted as the weights in a least-squares regression, because if we want to select \(f_k\) in order to minimize the quantity \[ \sum_i w_i (c_{i, k+1} - c_{i,k} f_k)^2 \] then, setting the derivative with respect to \(f_k\) equal to 0, we obtain \[ 0 = \sum_i w_ic_{i,k}(c_{i,k} f_k - c_{i,k+1}) \] Rearranging, we obtain \[ f_k = \frac{\sum_i w_i c_{i,k}^2 \frac{c_{i, k+1}}{c_{i,k}}}{\sum_i w_i c_{i,k}^2} \] Therefore, the least-squares regression fit is the average of \(c_{i,k+1} / c_{i,k}\), weighted by \(w_i c_{i,k}^2\).
A key point to take away from the above is that least squares regression implicitly assumes the variance is constant for all observations. If we want to assume the variance changes based on the size of the loss, as we do in the chain ladder method, we need to use weighted regression, with weights inversely proportional to the variance.
Standard error estimates can be obtained by performing linear regression. Note that under Mack’s assumptions, there is no intercept, which is indicated by “+ 0” in the model formula. To obtain meaningful \(p\)-values, incremental loss is used as the response variable – the fitted coefficient is then \(f_k - 1\), so the default significance test in the lm function (that the coefficient is zero) corresponds to testing whether the development factor equals 1, rather than the uninformative hypothesis that the factor itself is zero. Here are the details of the models for each of the three weighting options discussed by Mack, for the first development period.
mack.age.to.age[, weight := 1]
mack.age.to.age[, incremental_loss := next_loss - loss]
f_0.model = lm(incremental_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
summary(f_0.model)
##
## Call:
## lm(formula = incremental_loss ~ loss + 0, data = mack.age.to.age[k ==
## 1], weights = weight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2843.5 -983.1 2785.0 3951.6 7143.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## loss 1.2172 0.4106 2.964 0.018 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3769 on 8 degrees of freedom
## Multiple R-squared: 0.5234, Adjusted R-squared: 0.4638
## F-statistic: 8.786 on 1 and 8 DF, p-value: 0.01803
mack.age.to.age[, weight := 1 / loss]
f_1.model = lm(incremental_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
summary(f_1.model)
##
## Call:
## lm(formula = incremental_loss ~ loss + 0, data = mack.age.to.age[k ==
## 1], weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -95.36 -71.36 47.45 99.60 385.34
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## loss 1.997 1.129 1.768 0.115
##
## Residual standard error: 166.9 on 8 degrees of freedom
## Multiple R-squared: 0.281, Adjusted R-squared: 0.1911
## F-statistic: 3.127 on 1 and 8 DF, p-value: 0.115
mack.age.to.age[, weight := 1 / loss^2]
f_2.model = lm(incremental_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
summary(f_2.model)
##
## Call:
## lm(formula = incremental_loss ~ loss + 0, data = mack.age.to.age[k ==
## 1], weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -6.551 -6.157 -3.991 -0.983 32.224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## loss 7.201 4.114 1.75 0.118
##
## Residual standard error: 12.34 on 8 degrees of freedom
## Multiple R-squared: 0.2769, Adjusted R-squared: 0.1865
## F-statistic: 3.063 on 1 and 8 DF, p-value: 0.1182
Notice that \(f_0\) is statistically significant, but \(f_1\) and \(f_2\) are borderline. This is not surprising, since \(f_2\) corresponds to a higher variance assumption. As a result, if the variance assumption corresponding to \(f_2\) is considered reasonable, then we might question whether Mack’s first assumption is valid in this case.
To quickly compare development periods, use the following function:
cl.significance = function(development.period = 1, weight.power = 1) {
temporary.copy = copy(mack.age.to.age)
temporary.copy[, weight := 1 / loss^weight.power]
temporary.model = lm(incremental_loss ~ loss + 0, data = temporary.copy[k == development.period], weights = weight)
return(summary(temporary.model)$coefficients)
}
For unweighted regression / square-weighted averages:
constant.variance.summary = lapply(1:8, function(r) cl.significance(development.period = r, weight.power = 0))
constant.variance.summary
## [[1]]
## Estimate Std. Error t value Pr(>|t|)
## loss 1.217176 0.4106412 2.964087 0.01803326
##
## [[2]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.5689516 0.1087864 5.229989 0.001212464
##
## [[3]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.2608889 0.07063776 3.693336 0.01016841
##
## [[4]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.1619717 0.02307658 7.018879 0.0009054525
##
## [[5]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.09970741 0.03610088 2.761911 0.05075004
##
## [[6]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.04053438 0.01984237 2.042819 0.1336836
##
## [[7]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.03219615 0.00471755 6.82476 0.02080209
##
## [[8]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.01588833 0.01494527 1.063101 0.4805347
For the usual chain ladder estimates:
linear.variance.summary = lapply(1:8, function(r) cl.significance(development.period = r, weight.power = 1))
linear.variance.summary
## [[1]]
## Estimate Std. Error t value Pr(>|t|)
## loss 1.996887 1.12933 1.768206 0.1150002
##
## [[2]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.6235228 0.1358361 4.590257 0.002513007
##
## [[3]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.2708881 0.09049822 2.993298 0.02421683
##
## [[4]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.1716746 0.02538993 6.761525 0.00107481
##
## [[5]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.1133849 0.03537668 3.205074 0.03274211
##
## [[6]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.04193464 0.02257781 1.857338 0.1602526
##
## [[7]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.03326355 0.004881918 6.813624 0.02086803
##
## [[8]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.01693648 0.01505585 1.12491 0.4626201
For the unweighted average method:
square.variance.summary = lapply(1:8, function(r) cl.significance(development.period = r, weight.power = 2))
square.variance.summary
## [[1]]
## Estimate Std. Error t value Pr(>|t|)
## loss 7.200535 4.114158 1.750184 0.1181997
##
## [[2]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.6958945 0.1676164 4.151708 0.004285933
##
## [[3]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.3145103 0.1198492 2.624218 0.0393632
##
## [[4]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.1829256 0.02726923 6.708134 0.001114491
##
## [[5]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.1269622 0.03338933 3.802479 0.01906297
##
## [[6]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.04332764 0.02512291 1.724626 0.1830604
##
## [[7]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.0343554 0.004953969 6.934924 0.02016614
##
## [[8]]
## Estimate Std. Error t value Pr(>|t|)
## loss 0.01799499 0.01509302 1.192273 0.4443079
Note that the SSE-based approximations used below for model comparison are not the exact AIC or BIC, but they will produce the same relative rankings of models.
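For reference, with \(n\) observations and \(p\) estimated parameters, the quantities computed in the code below are \[ \text{adjusted SSE} = \frac{\text{SSE}}{(n-p)^2}, \qquad \text{AIC} \approx \text{SSE} \cdot e^{2p/n}, \qquad \text{BIC} \approx \text{SSE} \cdot n^{p/n} \]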
An easy check of the linearity assumption is to plot \(c_{i, k+1}\) against \(c_{i,k}\) for each development period, along with the slope from the chain ladder method and a line fit to the points:
linearity.check = function(dev_period) {
slope = mack.age.to.age[k == dev_period, sum(next_loss)] / mack.age.to.age[k == dev_period, sum(loss)]
return(ggplot(data = mack.age.to.age[k == dev_period], aes(x = loss, y = next_loss)) + geom_point() + geom_abline(intercept = 0, slope = slope, color = "red") + geom_smooth(method = 'lm') + coord_cartesian(ylim = c(0, 25000), xlim = c(0,25000)) + labs(title = paste0("Visual Linearity Check for Development Period ", dev_period)))
}
plot(linearity.check(dev_period = 1))
## Warning in plyr::split_indices(scale_id, n): '.Random.seed' is not an
## integer vector but of type 'NULL', so ignored
plot(linearity.check(dev_period = 2))
plot(linearity.check(dev_period = 3))
The departure of the points from the line through the origin for development periods 1 and 2 suggests that strict proportionality (a line through the origin) is not appropriate, and that a linear-with-constant approach may provide a better fit. Moreover, given that the earlier development periods have more data points, it is more reasonable to add an extra parameter here than it is in later periods. An additional consideration is where the value of the loss in the year being projected lies relative to the other points – if it is in an area where the proportionality assumption produces reasonable estimates, then an extra parameter may not be needed.
The linear-with-constant emergence pattern corresponds to the assumption \[ E(c_{i, k+1} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = c_{i,k}f_k + g_k \] A constant term often tends to be significant in slowly-reporting lines. The following example shows a linear fit to the first development period, using weights corresponding to the usual chain ladder variance assumption:
mack.age.to.age[, weight := 1 / loss]
lin.with.constant = lm(next_loss ~ loss, data = mack.age.to.age[k == 1], weights = weight)
summary(lin.with.constant)
##
## Call:
## lm(formula = next_loss ~ loss, data = mack.age.to.age[k == 1],
## weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -48.883 -30.266 4.825 8.985 118.355
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4328.519 516.315 8.383 0.0000675 ***
## loss 1.214 0.421 2.883 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 53.71 on 7 degrees of freedom
## Multiple R-squared: 0.5428, Adjusted R-squared: 0.4775
## F-statistic: 8.31 on 1 and 7 DF, p-value: 0.02356
The intercept is statistically significant, as suggested by the visual check. Metrics for evaluating this model include:
lwc.prediction = predict(lin.with.constant)
observed = mack.age.to.age[k == 1, next_loss]
lwc.SSE = sum((lwc.prediction - observed)^2)
lwc.adj.SSE = lwc.SSE / (9-2)^2
lwc.AIC = lwc.SSE * exp(2 * 2 / 9)
lwc.BIC = lwc.SSE * 9^(2/9)
print(paste0("The SSE is ", lwc.SSE))
## [1] "The SSE is 29804878.0451142"
print(paste0("The adjusted SSE is ", lwc.adj.SSE))
## [1] "The adjusted SSE is 608262.817247228"
print(paste0("The AIC approximation is ", lwc.AIC))
## [1] "The AIC approximation is 46484388.1424645"
print(paste0("The BIC approximation is ", lwc.BIC))
## [1] "The BIC approximation is 48566995.7879631"
Compare this to the classic chain ladder approach:
lin.without.constant = lm(next_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
lnc.prediction = predict(lin.without.constant)
lnc.SSE = sum((lnc.prediction - observed)^2)
lnc.adj.SSE = lnc.SSE / (9-1)^2
lnc.AIC = lnc.SSE * exp(2 * 1 / 9)
lnc.BIC = lnc.SSE * 9^(1/9)
print(paste0("The SSE is ", lnc.SSE))
## [1] "The SSE is 164826968.411835"
print(paste0("The adjusted SSE is ", lnc.adj.SSE))
## [1] "The adjusted SSE is 2575421.38143493"
print(paste0("The AIC approximation is ", lnc.AIC))
## [1] "The AIC approximation is 205843973.082097"
print(paste0("The BIC approximation is ", lnc.BIC))
## [1] "The BIC approximation is 210404593.218451"
On all metrics, the linear-with-constant assumption performs better in the first development period.
The significance of the parameters across all development periods can be compared quickly using the following function:
lwc.significance = function(development.period = 1, weight.power = 1) {
temporary.copy = copy(mack.age.to.age)
temporary.copy[, weight := 1 / loss^weight.power]
temporary.model = lm(incremental_loss ~ loss , data = temporary.copy[k == development.period], weights = weight)
return(summary(temporary.model)$coefficients)
}
Using the same variance assumption underlying the chain ladder method:
lwc.significance.summary= lapply(1:8, function(r) lwc.significance(development.period = r, weight.power = 1))
lwc.significance.summary
## [[1]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4328.5185518 516.3149822 8.383484 0.00006751437
## loss 0.2137288 0.4210268 0.507637 0.62730766555
##
## [[2]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4159.69011653 2531.3746645 1.6432534 0.1514361
## loss 0.06961748 0.3584235 0.1942325 0.8524032
##
## [[3]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4235.91787539 2814.5177702 1.5050244 0.1926563
## loss -0.08032389 0.2474278 -0.3246357 0.7585990
##
## [[4]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2188.78926055 1133.1054409 1.9316731 0.1255735
## loss 0.03340911 0.0744334 0.4488457 0.6767858
##
## [[5]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3562.27352834 2031.4095457 1.7535969 0.1777793
## loss -0.07324665 0.1102312 -0.6644822 0.5538935
##
## [[6]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 589.27570754 2510.3455766 0.23473888 0.8362549
## loss 0.01249918 0.1283294 0.09739922 0.9312911
##
## [[7]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 792.282542040 148.928492579 5.319886 0.1182876
## loss -0.008903059 0.008028189 -1.108975 0.4671339
##
## [[8]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3723.720377 NaN NaN NaN
## loss -0.197212 NaN NaN NaN
These results differ from Table 2 in Venter’s paper; it appears that Venter is performing unweighted regression, which corresponds to assuming a constant loss variance across accident years:
lwc.significance.summary= lapply(1:8, function(r) lwc.significance(development.period = r, weight.power = 0))
lwc.significance.summary
## [[1]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5110.4943268 1067.0808198 4.7892289 0.00199091
## loss -0.1084107 0.3488151 -0.3107971 0.76500523
##
## [[2]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4311.47097581 2440.121266 1.7669085 0.1276690
## loss 0.04940631 0.309099 0.1598397 0.8782525
##
## [[3]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1687.1787584 3543.1408684 0.4761817 0.6540251
## loss 0.1309993 0.2830779 0.4627677 0.6629728
##
## [[4]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2061.0688312 1164.74158341 1.7695503 0.1515190
## loss 0.0414772 0.07078186 0.5859863 0.5893551
##
## [[5]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4064.45972680 2241.921036 1.8129362 0.1674982
## loss -0.09955676 0.113622 -0.8762103 0.4454157
##
## [[6]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 620.43279829 2300.8713491 0.26965123 0.8127021
## loss 0.01094283 0.1123065 0.09743715 0.9312645
##
## [[7]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 777.333754530 144.682631190 5.372682 0.1171514
## loss -0.008107459 0.007600268 -1.066733 0.4794512
##
## [[8]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3723.720377 NaN NaN NaN
## loss -0.197212 NaN NaN NaN
In either case, the factors are generally not significant, though in some cases the intercepts are. This suggests that the assumptions underlying the chain ladder method are not supported by the data, and a pattern of constant emergence in each age appears to be more suitable.
To fit this model, Venter recommends using an iterative process to minimize \[ \sum_{i,k} \left((c_{i,k+1} - c_{i,k}) - h_i f_k\right)^2 \] The approach is to fix some initial set of values for \(f_k\), such as the values coming from the chain ladder method. Then, select the best \(h_i\) for these values: \[ h_i = \frac{\sum_k f_k (c_{i,k+1} - c_{i,k})}{\sum_k f_k^2} \] Then, keeping \(h_i\) fixed, optimize \(f_k\): \[ f_k = \frac{\sum_i h_i (c_{i,k+1} - c_{i, k})}{\sum_i h_i^2} \] Iterate until convergence is reached. A simpler approach to fitting a multiplicative model is to use a log-link GLM, though a drawback of this approach is that it does not work with negative incremental losses (and there is one in this data set), so I fit the model to the data without this point. Venter’s model, in which there is a different parameter for each accident year and each development period, corresponds to treating accident year and development period as categorical variables. I have also slightly modified the data prep to include the most recent diagonal as well, which can be used for this approach.
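Before moving to the GLM, here is a rough sketch of the iteration just described (my own illustration, not code from Venter’s paper), written against the incremental losses already computed in mack.age.to.age; the starting values and the fixed number of iterations are arbitrary choices:
# Alternating least squares for the parameterized BF model q_{i,k} = h_i * f_k (a sketch)
bf.iteration = mack.age.to.age[, .(i, k, q = incremental_loss)]
f = bf.iteration[, .(f = mean(q)), by = k][order(k), f]  # crude starting values for f_k
for (iter in 1:50) {
  # given f_k, the least-squares h_i for each accident year
  h = bf.iteration[, .(h = sum(f[k] * q) / sum(f[k]^2)), by = i][order(i), h]
  # given h_i, the least-squares f_k for each development period
  f = bf.iteration[, .(f = sum(h[i] * q) / sum(h[i]^2)), by = k][order(k), f]
}
bf.iteration[, fitted := h[i] * f[k]]  # h and f are only identified up to a common scale factor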
venter.regression = mack.table[, prev_dev := k-1]
venter.regression = merge(venter.regression, venter.regression[, .(i, k, prev_loss = loss)], by.x = c("i", "prev_dev"), by.y = c("i", "k"), all.x = TRUE)
venter.regression[is.na(prev_loss), prev_loss := 0]
venter.regression[, incremental_loss := loss - prev_loss]
venter.regression[, i_factor := as.factor(i)]
venter.regression[, k_factor := as.factor(k)]
bf.model = glm(incremental_loss ~ i_factor + k_factor, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
summary(bf.model)
##
## Call:
## glm(formula = incremental_loss ~ i_factor + k_factor, family = gaussian(link = "log"),
## data = venter.regression[incremental_loss > 0])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2389.4 -770.7 -9.0 916.9 3311.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.43868 0.34199 21.751 < 0.0000000000000002 ***
## i_factor2 0.04172 0.35845 0.116 0.90800
## i_factor3 0.38377 0.31202 1.230 0.22692
## i_factor4 0.53040 0.29961 1.770 0.08539 .
## i_factor5 0.67987 0.29012 2.343 0.02491 *
## i_factor6 0.22597 0.33508 0.674 0.50450
## i_factor7 0.17310 0.35088 0.493 0.62485
## i_factor8 0.45242 0.32809 1.379 0.17665
## i_factor9 -0.19098 0.55464 -0.344 0.73265
## i_factor10 0.19324 0.86107 0.224 0.82374
## k_factor2 0.77729 0.25466 3.052 0.00432 **
## k_factor3 0.67434 0.26154 2.578 0.01429 *
## k_factor4 0.37717 0.29293 1.288 0.20634
## k_factor5 0.09639 0.34023 0.283 0.77861
## k_factor6 -0.24978 0.44018 -0.567 0.57403
## k_factor7 -0.83626 0.93348 -0.896 0.37645
## k_factor8 -1.19388 1.55748 -0.767 0.44849
## k_factor9 -1.75805 3.86074 -0.455 0.65166
## k_factor10 -2.29119 9.48449 -0.242 0.81052
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2657788)
##
## Null deviance: 239165641 on 53 degrees of freedom
## Residual deviance: 93018000 on 35 degrees of freedom
## AIC: 968.65
##
## Number of Fisher Scoring iterations: 9
To determine the percentage reported in each development period, exponentiate the development period parameters and normalize so that they sum to 1. Note that the first development period is the base level, so its factor prior to normalization is implicitly equal to 1.
dev.parameters = bf.model$coefficients[11:19]
dev.factors = c(1, exp(dev.parameters))
dev.normalization = sum(dev.factors)
dev.factors = dev.factors / dev.normalization
dev.factors
## k_factor2 k_factor3 k_factor4 k_factor5 k_factor6
## 0.10541262 0.22933229 0.20689807 0.15370840 0.11607924 0.08211337
## k_factor7 k_factor8 k_factor9 k_factor10
## 0.04567810 0.03194463 0.01817117 0.01066211
In a similar manner, we can determine the accident year parameters, grouping the intercept with these parameters. The normalization constant from the previous step is re-incorporated so that the rescaling of the development period factors does not change the overall prediction.
ay.parameters = c(0, bf.model$coefficients[2:10])
intercept = bf.model$coefficients[1]
ay.parameters = ay.parameters + intercept
ay.factors = exp(ay.parameters) * dev.normalization
ay.factors
## i_factor2 i_factor3 i_factor4 i_factor5 i_factor6
## 16131.89 16819.18 23678.63 27418.03 31838.35 20221.86
## i_factor7 i_factor8 i_factor9 i_factor10
## 19180.65 25361.24 13327.32 19570.71
The numbers don’t match Venter’s exactly, due to the removal of the negative incremental loss and the differences in fitting methodology, but the overall trend in the numbers is similar. Venter points out that alternative normalizations may be performed; for example, rather than normalizing so that the development period factors sum to 1, we could normalize so that the accident year parameter for the oldest accident year is equal to the loss for that year on the most recent diagonal. (Note the unintuitive estimate of an ultimate loss of 16K on an accident year that already has an incurred loss above 18K.)
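As a small sketch of this alternative normalization (my own illustration, not code from the paper), the factors above can be rescaled so that the oldest accident year’s parameter equals its loss on the most recent diagonal; because the development period factors are scaled inversely, the fitted values are unchanged:
# rescale so that the accident year 1 parameter equals its latest diagonal value (18,834)
rescale = mack.triangle[AY == 1, DY_10] / ay.factors[1]
alt.ay.factors = ay.factors * rescale
alt.dev.factors = dev.factors / rescale
alt.ay.factors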
To compare these results to the chain ladder approach, we should delete the predictions in the first development period from the calculation, since the chain ladder method does not provide predictions for these values.
venter.regression[incremental_loss > 0 , prediction := predict(bf.model, type = "response")]
venter.SSE = venter.regression[k != 1 & incremental_loss > 0, sum((prediction - incremental_loss)^2)]
venter.adjusted.SSE = venter.SSE / (44 - 18)^2
venter.AIC = venter.SSE * exp(2 * 18 / 44)
venter.BIC = venter.SSE * 44 ^ ( 18 / 44)
print(paste0("The adjusted SSE is ", venter.adjusted.SSE))
## [1] "The adjusted SSE is 86428.3749247407"
print(paste0("The AIC approximation is ", venter.AIC))
## [1] "The AIC approximation is 132414300.914265"
print(paste0("The BIC approximation is ", venter.BIC))
## [1] "The BIC approximation is 274741707.25359"
In this framework, the Cape Cod method, in which there is a single parameter for all accident years, corresponds to removing accident year from the model and allowing the intercept to play the role of the single “h” parameter:
cape.cod.model = glm(incremental_loss ~ k_factor, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
summary(cape.cod.model)
##
## Call:
## glm(formula = incremental_loss ~ k_factor, family = gaussian(link = "log"),
## data = venter.regression[incremental_loss > 0])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3571.5 -995.3 -11.0 747.0 3625.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.77947 0.22545 34.507 < 0.0000000000000002 ***
## k_factor2 0.70672 0.25410 2.781 0.00794 **
## k_factor3 0.67212 0.25960 2.589 0.01300 *
## k_factor4 0.31220 0.29953 1.042 0.30295
## k_factor5 0.12806 0.34118 0.375 0.70920
## k_factor6 -0.09966 0.41822 -0.238 0.81276
## k_factor7 -0.72877 0.88238 -0.826 0.41331
## k_factor8 -1.34172 1.59075 -0.843 0.40354
## k_factor9 -2.09419 4.09900 -0.511 0.61197
## k_factor10 -2.63197 9.91337 -0.265 0.79187
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2905864)
##
## Null deviance: 239165641 on 53 degrees of freedom
## Residual deviance: 127855554 on 44 degrees of freedom
## AIC: 967.83
##
## Number of Fisher Scoring iterations: 5
As before, we can normalize the development period factors:
cc.dev.parameters = cape.cod.model$coefficients[2:10]
cc.dev.factors = c(1, exp(cc.dev.parameters))
cc.dev.normalization = sum(cc.dev.factors)
cc.dev.factors = cc.dev.factors / cc.dev.normalization
cc.dev.factors
## k_factor2 k_factor3 k_factor4 k_factor5 k_factor6
## 0.107147613 0.217223001 0.209836343 0.146410104 0.121786489 0.096984050
## k_factor7 k_factor8 k_factor9 k_factor10
## 0.051699134 0.028008054 0.013197395 0.007707817
The single parameter \(h\) is:
h = exp(cape.cod.model$coefficients[1]) * cc.dev.normalization
h
## (Intercept)
## 22315.01
Evaluating this model as before:
venter.regression[incremental_loss > 0 , cc.prediction := predict(cape.cod.model, type = "response")]
cc.SSE = venter.regression[k != 1 & incremental_loss > 0, sum((cc.prediction - incremental_loss)^2)]
cc.adjusted.SSE = cc.SSE / (44 - 10)^2
cc.AIC = cc.SSE * exp(2 * 10 / 44)
cc.BIC = cc.SSE * 44 ^ ( 10 / 44)
print(paste0("The adjusted SSE is ", cc.adjusted.SSE))
## [1] "The adjusted SSE is 83514.122973307"
print(paste0("The AIC approximation is ", cc.AIC))
## [1] "The AIC approximation is 152098293.522096"
print(paste0("The BIC approximation is ", cc.BIC))
## [1] "The BIC approximation is 228154863.400844"
The adjusted SSE and BIC indicate that the Cape Cod approach is superior to the BF method, but the AIC does not.
Intermediate approaches would generally involve simplifying the “i_factor” and “k_factor” variables prior to fitting, such as by grouping levels with similar parameter estimates, or by fitting them continuously rather than discretely. To illustrate this idea, Venter proposes a grouping of the development period and accident year levels, which is implemented in the code below. In the log-link GLM framework, it’s tricky to enforce Venter’s condition on Accident Year 3 on the predicted-value scale, but it is easy to enforce it on the linear-predictor scale, which should still be a reasonable approximation for factors close to 0.
venter.regression[, grouped_k_factor := as.factor(ifelse(k == 2 | k == 3, "2-3",
ifelse(k == 7 | k == 8, "7-8",
ifelse(k == 9 | k == 10, "9-10", as.character(k)))))]
venter.regression[, i_continuous_capped := ifelse(i < 3, 2,
ifelse(i > 5, 6, i))] # Linear extrapolation between levels 3, 4, 5
venter.regression[, grouped_i_factor := as.factor(ifelse(i < 3, "<3",
ifelse(i > 5, ">5", "3-5")))]
revised.bf.model = glm(incremental_loss ~ grouped_k_factor + i_continuous_capped + grouped_i_factor, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
summary(revised.bf.model)
##
## Call:
## glm(formula = incremental_loss ~ grouped_k_factor + i_continuous_capped +
## grouped_i_factor, family = gaussian(link = "log"), data = venter.regression[incremental_loss >
## 0])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2453.3 -905.8 -80.2 919.3 3257.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.17985 0.32599 22.024 < 0.0000000000000002 ***
## grouped_k_factor2-3 0.70870 0.21280 3.330 0.00176 **
## grouped_k_factor4 0.34512 0.25883 1.333 0.18926
## grouped_k_factor5 0.07793 0.30414 0.256 0.79897
## grouped_k_factor6 -0.26326 0.39900 -0.660 0.51282
## grouped_k_factor7-8 -0.98515 0.75602 -1.303 0.19933
## grouped_k_factor9-10 -1.93400 3.43072 -0.564 0.57580
## i_continuous_capped 0.14508 0.10054 1.443 0.15610
## grouped_i_factor>5 -0.35623 0.44901 -0.793 0.43183
## grouped_i_factor3-5 0.22708 0.28684 0.792 0.43279
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2259412)
##
## Null deviance: 239165641 on 53 degrees of freedom
## Residual deviance: 99412810 on 44 degrees of freedom
## AIC: 954.24
##
## Number of Fisher Scoring iterations: 8
Performance of this model is as follows:
venter.regression[incremental_loss > 0 , rev_prediction := predict(revised.bf.model, type = "response")]
rev.bf.SSE = venter.regression[k != 1 & incremental_loss > 0, sum((rev_prediction - incremental_loss)^2)]
rev.bf.adjusted.SSE = rev.bf.SSE / (44 - 9)^2
rev.bf.AIC = rev.bf.SSE * exp(2 * 9 / 44)
rev.bf.BIC = rev.bf.SSE * 44 ^ ( 9 / 44)
print(paste0("The adjusted SSE is ", rev.bf.adjusted.SSE))
## [1] "The adjusted SSE is 55435.7383395955"
print(paste0("The AIC approximation is ", rev.bf.AIC))
## [1] "The AIC approximation is 102233175.156621"
print(paste0("The BIC approximation is ", rev.bf.BIC))
## [1] "The BIC approximation is 147260645.436505"
Based on these statistics, this is the best-performing model so far.
In the additive chain ladder approach, calculate the average loss emerged in each development period:
additive.cl = venter.regression[, .(additive_cl_prediction = mean(incremental_loss)), by = k]
datatable(additive.cl)
The additive CL model can be assessed as follows. Note that even though it’s not necessary to remove the record where incremental_loss is negative, doing so allows for fairer comparisons with earlier results.
additive.cl.assessment = merge(venter.regression, additive.cl, by = "k")
add.cl.SSE = additive.cl.assessment[k != 1 & incremental_loss > 0, sum((additive_cl_prediction - incremental_loss)^2)]
add.cl.adjusted.SSE = add.cl.SSE / (44 - 9)^2
add.cl.AIC = add.cl.SSE * exp(2 * 9 / 44)
add.cl.BIC = add.cl.SSE * 44 ^ ( 9 / 44)
print(paste0("The adjusted SSE is ", add.cl.adjusted.SSE))
## [1] "The adjusted SSE is 79051.7781554908"
print(paste0("The AIC approximation is ", add.cl.AIC))
## [1] "The AIC approximation is 145785273.628082"
print(paste0("The BIC approximation is ", add.cl.BIC))
## [1] "The BIC approximation is 209994783.559439"
The performance is similar to (and better than) the Cape Cod approach.
This is similar to the above approach, except that we add a factor for each calendar year: \[ E(c_{i, k+1} - c_{i,k} \mid c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = h_i f_k g_{i+k} \]
mack.residuals = merge(mack.age.to.age, mack.factors, by = "k")
mack.residuals[, residual_0 := next_loss - loss * f_k_0]
mack.residuals[, residual_1 := (next_loss - loss * f_k_1) / sqrt(loss)]
mack.residuals[, residual_2 := (next_loss - loss * f_k_2) / loss]
datatable(mack.residuals)
We can plot the residuals in each development period, for each residual type:
plot.mack.residuals = function(dev_period, residual_number = 1) {
residual_name = paste0("residual_", residual_number)
return(ggplot(data = mack.residuals[k == dev_period], aes(x = loss, y = eval(parse(text = residual_name)))) + geom_point() + labs(title = paste0("Residuals for f_", dev_period, "_", residual_number)))
}
For the first development period:
plot(plot.mack.residuals(dev_period = 1, residual_number = 0))
plot(plot.mack.residuals(dev_period = 1, residual_number = 1))
plot(plot.mack.residuals(dev_period = 1, residual_number = 2))
In all cases, we see a decreasing trend in the residuals, suggesting that the model fit is not appropriate. For the second development period:
plot(plot.mack.residuals(dev_period = 2, residual_number = 0))
plot(plot.mack.residuals(dev_period = 2, residual_number = 1))
plot(plot.mack.residuals(dev_period = 2, residual_number = 2))
In the second development period, there is now little evidence of a pattern in the residuals.
Mack does not provide plots of residuals over time, but they can be produced using the same data as above:
plot.mack.residuals.over.time = function(dev_period, residual_number = 1) {
residual_name = paste0("residual_", residual_number)
return(ggplot(data = mack.residuals[k == dev_period], aes(x = i, y = eval(parse(text = residual_name)))) + geom_point() + labs(title = paste0("Residuals for f_", dev_period, "_", residual_number)))
}
For the first development period:
plot.mack.residuals.over.time(dev_period = 1, residual_number = 0)
plot.mack.residuals.over.time(dev_period = 1, residual_number = 1)
plot.mack.residuals.over.time(dev_period = 1, residual_number = 2)
Compare these plots to the residuals for the linear-with-constant model fit in the previous section:
first.period = mack.residuals[k == 1]
first.period[, lwc_residuals := resid(lin.with.constant)]
ggplot(data = first.period, aes(x = i, y = lwc_residuals)) + geom_point() + labs(title = "Residuals for the Linear With Constant model, first development period")
The residuals for the linear-with-constant model are a bit more random, and don’t show the increasing trend exhibited by the chain ladder method.
For the second development period:
plot.mack.residuals.over.time(dev_period = 2, residual_number = 0)
plot.mack.residuals.over.time(dev_period = 2, residual_number = 1)
plot.mack.residuals.over.time(dev_period = 2, residual_number = 2)
The residuals tend to be higher in the more recent accident years, suggesting limiting estimation to the most recent diagonals.
A non-residual approach to testing stability is to compare plots of the factors by year against an \(n\)-year moving average (i.e., the average of the most recent \(n\) diagonals). A three-year moving average for the first development period in the Mack triangle can be calculated as follows:
mack.ma = mack.age.to.age[k == 1]
mack.ma[, factor_shift_1 := shift(obs_dev_factor, type = "lag")]
mack.ma[, factor_shift_2 := shift(factor_shift_1, type = "lag")]
mack.ma[, moving_average := (obs_dev_factor + factor_shift_1 + factor_shift_2) / 3]
ggplot(data = mack.ma, aes(x = i, y = moving_average)) + geom_point(color = "red") + geom_line(color = "red") + geom_point(aes(y = obs_dev_factor), color = "blue") + geom_line(aes(y = obs_dev_factor), color = "blue")
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_path).
This suggests that using only the most recent data may be preferable; however, the high values of the moving average early on are driven by an unusually large spike in the second accident year, so this may be the only problematic data point.
One implication of the assumption that \[ E(c_{i, k+1} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = c_{i,k}f_k \] is that adjacent observed development factors \(c_{i,k} / c_{i, k-1}\) and \(c_{i,k+1} / c_{i,k}\) are uncorrelated. To demonstrate this, it suffices to show that \[ E\left(\frac{c_{i,k}}{c_{i, k-1}} \frac{c_{i,k+1}}{c_{i,k}}\right) = E\left(\frac{c_{i,k}}{c_{i, k-1}}\right) E\left(\frac{c_{i,k+1}}{c_{i,k}}\right) \] This follows from the law of total expectation: for \(j\leq k\), \[ E\left(\frac{c_{i, k+1}}{c_{i,j}}\right) = E\left[E[c_{i,k+1} | c_{i,1}, \ldots, c_{i,k}] / c_{i,j}\right] = E[c_{i,k} / c_{i,j}]\, f_k \] Specializing to \(j = k-1\) (for the left side) and \(j=k\) (for the right side), we obtain \[ E\left(\frac{c_{i,k}}{c_{i, k-1}} \frac{c_{i,k+1}}{c_{i,k}}\right) = E\left(\frac{c_{i, k+1}}{c_{i,k-1}}\right) = E\left(\frac{c_{i,k}}{c_{i, k-1}}\right)f_k = E\left(\frac{c_{i,k}}{c_{i, k-1}}\right) E\left(\frac{c_{i,k+1}}{c_{i,k}}\right) \]
Mack proposes a test for correlation based on Spearman’s rank correlation coefficient, which has the benefit of not requiring a distributional assumption. The test is applied to adjacent columns, with the test statistic summed over the whole table. The first step is to calculate the observed development factors \(c_{i,k+1} / c_{i,k}\):
rank.correlation.table = mack.table[, next_dev := k+1]
rank.correlation.table = merge(rank.correlation.table, rank.correlation.table[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
rank.correlation.table[, obs_dev_factor := next_loss / loss]
datatable(rank.correlation.table[order(k)])
Next, within each development period, calculate the rank of each observed development factor. This is denoted by \(r_{i,k}\).
rank.correlation.table = rank.correlation.table %>% group_by(k) %>% mutate(r = order(order(obs_dev_factor)))
Due to the triangular shape of the table, adjacent columns have different lengths, so to compare them we also need the ranks computed with the most recent diagonal deleted, which ensures that the two columns being compared have the same length. This rank is denoted by \(s_{i,k}\).
shortened.table = rank.correlation.table %>% group_by(k) %>% mutate(diagonal_ay = max(i))
shortened.table = shortened.table %>% filter(i != diagonal_ay)
shortened.table = shortened.table %>% group_by(k) %>% mutate(s = order(order(obs_dev_factor)))
shortened.table$j = shortened.table$k + 1
# pair the ranks r in column k with the shortened ranks s from the preceding column (matched via j = k)
rank.correlation.table = merge(rank.correlation.table, shortened.table %>% ungroup() %>% select(i, j, s), by.x = c("i", "k"), by.y = c("i", "j"))
datatable(as.data.table(rank.correlation.table)[order(k)])
The formula for Spearman’s rank correlation coefficient on column \(k\) is \[
T_k = 1 - 6\sum_{1\leq i \leq n_k} \frac{(r_{i,k} - s_{i,k})^2}{n_k^3 - n_k}
\] where \(n_k = I-k\) is the number of pairs of ranks in column \(k\). We can calculate this as follows:
rank.correlation.table = as.data.table(rank.correlation.table)
rank.correlation.summary = rank.correlation.table[, .(T = 1 - 6 * sum((r - s)^2) / (.N^3 - .N), number_of_pairs = .N), by = k]
Note that the final column does not provide a result due to the lack of a subsequent column, so we will delete this result:
rank.correlation.summary = rank.correlation.summary[!is.na(T)]
Under the assumption that the columns are not correlated, the correlation coefficients for the individual columns have the properties \[ E[T_k] = 0 \] and \[ \mathrm{Var}(T_k) = \frac{1}{n_k -1} \] Therefore, if \(T_k\) is far from 0, then we would conclude that adjacent columns are correlated.
Mack recommends combining the columns into a single test statistic for the entire triangle, rather than testing each pair of columns separately. To extend the test to the whole triangle, we take a weighted average of \(T_k\), using \(n_k-1\) as the weight. (Note that this is the inverse of \(\mathrm{Var}(T_k)\).) Define \[ T = \frac{\sum (n_k-1) T_k}{\sum (n_k-1)} \] It’s straightforward to verify that \(E[T] = 0\) and that \[ \mathrm{Var}(T) = \left(\sum (n_k-1)\right)^{-1} = \binom{I-2}{2}^{-1} \] In our example, this test statistic is:
T.statistic = rank.correlation.summary[, sum((number_of_pairs - 1) * T) / sum(number_of_pairs - 1)]
T.statistic
## [1] 0.8156463
The variance of the test statistic is:
T.sd = sqrt(1/(rank.correlation.summary[, sum(number_of_pairs - 1)]))
T.sd
## [1] 0.1889822
Mack recommends a test at the 50% confidence level, corresponding to a Z value of 0.67, on the grounds that the test is only approximate in nature. In this case, we reject the hypothesis that the columns are uncorrelated if the absolute value of the \(T\) statistic exceeds
0.67 * T.sd
## [1] 0.1266181
Since \(T\) lies within the 50% confidence interval, we do not reject the hypothesis that the columns are uncorrelated.
Venter’s approach is to calculate the Pearson correlation coefficient for pairs of subsequent columns, and then to look at the number of pairs of columns for which there is a statistically significant correlation. (Note that Venter uses ratios of incremental to cumulative, rather than cumulative to cumulative; however, these ratios differ by exactly 1, so correlation calculations will be identical using either approach.) We begin by calculating observed development factors, pairing up subsequent columns, and deleting the last element from the longer column:
pearson.correlation.table = mack.table[, next_dev := k+1]
pearson.correlation.table = merge(pearson.correlation.table, pearson.correlation.table[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
pearson.correlation.table[, obs_dev_factor := next_loss / loss]
pearson.correlation.table = merge(pearson.correlation.table, pearson.correlation.table[, .(i, k, next_dev_factor = obs_dev_factor)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
datatable(pearson.correlation.table[order(k)])
Using \(X\) to denote the earlier column and \(Y\) to denote the later column, we obtain the following summary statistics:
pearson.correlation.summary = pearson.correlation.table[, .(mean_X = mean(obs_dev_factor),
mean_Y = mean(next_dev_factor),
mean_X_squared = mean(obs_dev_factor^2),
mean_Y_squared = mean(next_dev_factor^2),
mean_XY = mean(obs_dev_factor * next_dev_factor),
number_of_obs = .N), by = k]
pearson.correlation.summary[, var_X := mean_X_squared - mean_X^2]
pearson.correlation.summary[, var_Y := mean_Y_squared - mean_Y^2]
pearson.correlation.summary[, cov_XY := mean_XY - mean_X * mean_Y]
pearson.correlation.summary[, cor_XY := cov_XY / sqrt(var_X * var_Y)]
datatable(pearson.correlation.summary[, .(k, var_X, var_Y, cov_XY, cor_XY, number_of_obs)])
The test described by Venter is based on the statistic \[
T = r \sqrt{(n-2) / (1-r^2)}
\] where \(r\) is the sample correlation coefficient, and \(n\) is the number of rows in the shorter column. Calculating this value in the above table:
pearson.correlation.summary = pearson.correlation.summary[number_of_obs > 1]
pearson.correlation.summary[, T := cor_XY * sqrt((number_of_obs - 2)/(1 - cor_XY^2))]
datatable(pearson.correlation.summary[,.(k, cor_XY, number_of_obs, T)])
The test for correlation is based on considering \(T\) to have a \(t\)-distribution with \(n-2\) degrees of freedom. In the following example, we identify a pair of columns as having significant correlation if the observed \(T\) statistic is significant at the 10% level:
pearson.correlation.summary = pearson.correlation.summary[number_of_obs > 2]
pearson.correlation.summary[, p_value := 1 - pt(abs(T), number_of_obs - 2)]
pearson.correlation.summary[, significant_10 := ifelse(p_value < 0.1, 1, 0)]
datatable(pearson.correlation.summary[, .(k, number_of_obs, T, p_value, significant_10)])
One pair of columns is significant at the 10% level. To extend this test to the whole table, we can model the number of significant pairs as a binomial random variable with success probability \(0.1\) and number of trials equal to the number of pairs of columns tested. In this case, the mean and standard deviation of this distribution are:
n = nrow(pearson.correlation.summary)
p = 0.1
binomial.mean = p * n
binomial.sd = sqrt(n * p * (1-p))
As an example, Venter uses a threshold of 3.33 standard deviations above the mean; in this case, the threshold is:
threshold = binomial.mean + 3.33 * binomial.sd
print(paste0("If more than ", threshold, " pairs of columns have statistically significant correlations, then the table does not satisfy the independence assumption."))
## [1] "If more than 3.0470402530404 pairs of columns have statistically significant correlations, then the table does not satisfy the independence assumption."
The above implicitly assumes a normal approximation. For such a small number of observations, we can more explicitly calculate the probability of seeing 1 or more significant columns:
binomial.p = 1 - (1-p)^n
print(paste0("The probability that 1 or more pairs of columns are significant is ", binomial.p))
## [1] "The probability that 1 or more pairs of columns are significant is 0.468559"
Given that this probability is large, we would conclude that there is no reason to reject the assumption that the columns in the table are independent. When columns are correlated, Venter suggests using the relationship \[ E[XY] = E[X]E[Y] + \mathrm{Cov}(X,Y) \] to correct the results.
The \(j\)th diagonal is the set of all entries where the sum of the row and column indices is \(j+1\): \[ D_j = \{C_{j,1}, C_{j-1, 2}, \ldots, C_{1, j}\} \] These effects will affect two sets of adjacent development factors. If the entries of \(D_j\) are larger than usual, then the factors \[ A_j = \{C_{j,2} / C_{j,1}, C_{j-1, 3} / C_{j-1, 2}, \ldots, C_{1, j+1} / C_{1,j}\} \] will be smaller than usual, since the elements of \(D_j\) appear in the denominator. The factors \[ A_{j-1} = \{C_{j-1, 2 } /C_{j-1, 1}, \ldots, C_{1, j} / C_{1, j-1}\} \] will be larger than usual, since the elements of \(D_j\) appear in the numerator.
This approach is based on subdividing the development factors in each column based on whether they are “large” or “small”, and then checking whether any diagonals have predominantly large or small factors. The first step is to calculate the observed development factors, along with a new column to indicate which diagonal the factor belongs to.
diagonal.test.table = mack.table[, next_dev := k+1]
diagonal.test.table = merge(diagonal.test.table, diagonal.test.table[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
diagonal.test.table[, obs_dev_factor := next_loss / loss]
diagonal.test.table[, diagonal := i + k - 1]
datatable(diagonal.test.table[order(k)])
Factors are considered to be large or small based on whether they are above or below the median for the column. If there is an odd number of entries in the column, then the element equal to the median is eliminated from the analysis.
diagonal.test.table = as.data.table(diagonal.test.table %>% group_by(k) %>% mutate(median_factor = median(obs_dev_factor)))
diagonal.test.table[, large_or_small := ifelse(obs_dev_factor > median_factor, "L",
ifelse(obs_dev_factor < median_factor, "S", "N"))]
datatable(diagonal.test.table[order(k)])
Under this construction, each factor that is not eliminated has a 50% probability of being “large” and 50% probability of being “small”. Now, for each diagonal other than the first, we count the number of “large” and “small” elements on the diagonal, denoted by \(L_j\) and \(S_j\), respectively. (More generally, any diagonal that contains only one non-excluded factor should be removed, since the value of the test statistic is not random – it is always 0.)
diagonal.test.table[, large_indicator := ifelse(large_or_small == "L", 1, 0)]
diagonal.test.table[, small_indicator := ifelse(large_or_small == "S", 1, 0)]
diagonal.test.summary = diagonal.test.table[diagonal != 1, .(large_count = sum(large_indicator), small_count = sum(small_indicator), factor_count = sum(large_indicator) + sum(small_indicator)), by = diagonal]
datatable(diagonal.test.summary[order(diagonal)])
Under the hypothesis that there are no calendar year effects, then both \(L_j\) and \(S_j\) should be close to \((L_j + S_j)/2\), aside from random fluctuations. This can be tested by calculating \[
Z_j = \min(L_j, S_j)
\] and testing whether it is significantly smaller than \((L_j + S_j)/2\).
diagonal.test.summary[, Z_j := pmin(large_count, small_count)]
datatable(diagonal.test.summary)
The expected value and variance of \(Z_j\) depends on the total number of non-excluded factors on the diagonal, denoted by \(n\). Since \(S_j + L_j = n\), for small values of \(n\) it is easy to calculate the mean and variance from first principles. This example does it for \(n=5\):
Z_5.analysis = data.table("L" = 0:5)
Z_5.analysis[, S := 5 - L]
Z_5.analysis[, Z := pmin(S, L)]
Z_5.analysis[, Z_squared := Z^2]
Z_5.analysis[, probability := dbinom(x = L, size = 5, prob = 0.5)]
datatable(Z_5.analysis)
E_Z = Z_5.analysis[, sum(Z * probability)]
E_Z_squared = Z_5.analysis[, sum(Z_squared * probability)]
var_Z = E_Z_squared - E_Z^2
print(paste0("The expected value of Z_5 is ", E_Z))
## [1] "The expected value of Z_5 is 1.5625"
print(paste0("The variance of Z_5 is ", var_Z))
## [1] "The variance of Z_5 is 0.371093750000001"
There are also formulas for calculating these moments: \[ E(Z_j) = \frac{n}{2} - \binom{n-1}{m}\frac{n}{2^n} \] and \[ \mathrm{Var}(Z_j) = \frac{n(n-1)}{4} - \binom{n-1}{m} \frac{n(n-1)}{2^n} + E(Z_j) - E(Z_j)^2 \] where \(m = \lfloor \frac{n-1}{2}\rfloor\). In this example, we obtain the following values:
diagonal.test.summary[, m := floor((factor_count - 1) / 2)]
diagonal.test.summary[, E_Z := factor_count / 2 - choose(factor_count - 1, m) * factor_count / 2^factor_count]
diagonal.test.summary[, var_Z := factor_count * (factor_count - 1) / 4 - choose(factor_count - 1, m) * factor_count * (factor_count - 1) / 2^factor_count + E_Z - E_Z^2]
datatable(diagonal.test.summary)
The test statistic is \[
Z = \sum_j Z_j
\] This statistic, along with its mean and variance, can be calculated as follows:
Z = diagonal.test.summary[, sum(Z_j)]
E_Z = diagonal.test.summary[, sum(E_Z)]
var_Z = diagonal.test.summary[, sum(var_Z)]
print(paste0("The test statistic Z is ", Z))
## [1] "The test statistic Z is 14"
print(paste0("The expected value of Z is ", E_Z))
## [1] "The expected value of Z is 12.875"
print(paste0("The variance of Z is ", var_Z))
## [1] "The variance of Z is 3.978515625"
Assuming that \(Z\) is normally distributed, we would reject the hypothesis that there are no calendar year effects if the test statistic falls outside a 95% confidence interval:
lower_CI = E_Z - 2 * sqrt(var_Z)
upper_CI = E_Z + 2 * sqrt(var_Z)
print(paste0("The 95% confidence interval is [", lower_CI, ", ", upper_CI, "]"))
## [1] "The 95% confidence interval is [8.88575665069176, 16.8642433493082]"
Since the test statistic lies within this interval, we do not reject the hypothesis that there are no calendar year effects, and the chain ladder method is not rejected on these grounds.
Venter uses the following incremental loss table:
venter.table = fread("./venter_table.csv")
datatable(venter.table)
To set up the regression, we need to define a diagonal variable, and calculate the cumulative values in each development age:
venter.table[, diagonal := as.factor(i + k)]
venter.cumulative.0 = venter.table[k <= 0, .(cumulative_0 = sum(incremental_loss)), by = i][i <= 3]
venter.cumulative.1 = venter.table[k <= 1, .(cumulative_1 = sum(incremental_loss)), by = i][i <= 2]
venter.cumulative.2 = venter.table[k <= 2, .(cumulative_2 = sum(incremental_loss)), by = i][i <= 1]
venter.CY = merge(venter.table, venter.cumulative.0, by = "i", all.x = TRUE)
venter.CY = merge(venter.CY, venter.cumulative.1, by = "i", all.x = TRUE)
venter.CY = merge(venter.CY, venter.cumulative.2, by = "i", all.x = TRUE)
venter.CY = venter.CY[k >= 1]
venter.CY[, cumulative_0 := ifelse(k == 1, cumulative_0, 0)]
venter.CY[, cumulative_1 := ifelse(k == 2, cumulative_1, 0)]
venter.CY[, cumulative_2 := ifelse(k == 3, cumulative_2, 0)]
datatable(venter.CY[order(k)])
We can now fit a linear regression model to this data:
venter.CY.test = lm(incremental_loss ~ cumulative_0 + cumulative_1 + cumulative_2 + diagonal, data = venter.CY)
summary(venter.CY.test)
##
## Call:
## lm(formula = incremental_loss ~ cumulative_0 + cumulative_1 +
## cumulative_2 + diagonal, data = venter.CY)
##
## Residuals:
## ALL 6 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.500 NA NA NA
## cumulative_0 2.500 NA NA NA
## cumulative_1 1.500 NA NA NA
## cumulative_2 1.438 NA NA NA
## diagonal3 1.000 NA NA NA
## diagonal4 -7.000 NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 5 and 0 DF, p-value: NA
Unfortunately, this model is not very interesting: since the number of parameters is equal to the number of observations, the model is saturated. For a more interesting example, we can apply the technique to the larger Mack table, which will not result in a saturated model. Note that this table contains cumulative values, not incremental ones, so the transformation on this dataset is different:
mack.CY.regression = mack.table[, diagonal := as.factor(i+k)]
mack.CY.regression = merge(mack.CY.regression, mack.table[,.(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
mack.CY.regression[, incremental_loss := next_loss - loss]
for (n in 1:9) {
mack.CY.regression[, new_variable := ifelse(k == n, loss, 0)]
names(mack.CY.regression)[8 + n] = paste0("cumulative_", n)
}
datatable(mack.CY.regression[order(k)])
Fit the model to the data:
mack.CY.model = lm(incremental_loss ~ cumulative_1 + cumulative_2 + cumulative_3 + cumulative_4 + cumulative_5 + cumulative_6 + cumulative_7 + cumulative_8 + cumulative_9 + diagonal, data = mack.CY.regression)
summary(mack.CY.model)
##
## Call:
## lm(formula = incremental_loss ~ cumulative_1 + cumulative_2 +
## cumulative_3 + cumulative_4 + cumulative_5 + cumulative_6 +
## cumulative_7 + cumulative_8 + cumulative_9 + diagonal, data = mack.CY.regression)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2753.6 -1051.7 -263.3 1053.8 3470.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1462.12865 2425.98218 0.603 0.55174
## cumulative_1 0.35811 0.33440 1.071 0.29368
## cumulative_2 0.07311 0.12119 0.603 0.55137
## cumulative_3 -0.06446 0.08063 -0.799 0.43102
## cumulative_4 -0.10135 0.06435 -1.575 0.12691
## cumulative_5 -0.11895 0.05675 -2.096 0.04557 *
## cumulative_6 -0.17099 0.05906 -2.895 0.00742 **
## cumulative_7 -0.19269 0.07020 -2.745 0.01063 *
## cumulative_8 -0.23195 0.08907 -2.604 0.01479 *
## cumulative_9 -0.19942 0.11009 -1.812 0.08120 .
## diagonal3 1625.12139 2599.11715 0.625 0.53705
## diagonal4 791.07533 2309.09165 0.343 0.73456
## diagonal5 2699.54739 2197.40779 1.229 0.22985
## diagonal6 3149.85090 2325.48953 1.354 0.18681
## diagonal7 3326.99484 2290.37207 1.453 0.15786
## diagonal8 3117.65253 2302.72728 1.354 0.18699
## diagonal9 3248.54529 2279.29915 1.425 0.16555
## diagonal10 2431.49739 2240.00494 1.085 0.28730
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1754 on 27 degrees of freedom
## Multiple R-squared: 0.6113, Adjusted R-squared: 0.3665
## F-statistic: 2.498 on 17 and 27 DF, p-value: 0.01631
Note that none of the diagonal terms are statistically significant. Alternate approaches include incorporating diagonal effects (either multiplicative or additive) into the additive chain ladder approach, as sketched below.
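As one possible sketch of the additive variant (my own illustration, not a formulation taken directly from the papers), a calendar-period factor can be added to the additive chain ladder fit from earlier using the venter.regression table; the calendar_period column below is created for this purpose:
# additive chain ladder with an additive calendar-period (diagonal) effect
venter.regression[, calendar_period := as.factor(i + k - 1)]
additive.cl.cy = lm(incremental_loss ~ k_factor + calendar_period, data = venter.regression)
summary(additive.cl.cy)$coefficients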
When the diagonal terms are significant, a natural question is how to treat them in forecasting. The general form of an emergence model with a multiplicative diagonal effect \(g\) and accident year effect \(h\) is \[ E[q(w, d+1) | \text{data to } w+d] = f(d)h(w)g(w+d) \] A typical interpretation is that \(g\) represents cumulative claims inflation to that diagonal since the first calendar year. If this is the case, then the model can be simplified by setting \[ g(w+d) = (1 + r)^{w+d} \] so that only the rate of inflation \(r\) needs to be estimated. Further reductions in the number of parameters can be made by replacing \(f(d) = (1 + s)^d\) and \(h(w) = h(1+t)^w\) with trends, and then grouping terms with the same exponent: \[ h(1+s)^d(1+r)^{w+d}(1+t)^w = h[(1+s)(1+r)]^d[(1+r)(1+t)]^w \] noting that \((1+s)(1+r) = 1 + s + r + rs\), so \(s+r+rs\) can be treated as a single parameter, and likewise with \((1+r)(1+t)\).
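As a rough sketch of this grouped-trend form (my own illustration, not code from Venter’s paper), a log-link GLM with numeric accident year and development period as predictors estimates the two grouped factors; note that \(r\), \(s\), and \(t\) are not separately identifiable from such a fit:
# log E[incremental] = constant + k * log[(1+s)(1+r)] + i * log[(1+r)(1+t)]
trend.model = glm(incremental_loss ~ i + k, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
exp(coef(trend.model)[c("k", "i")])  # estimates of (1+s)(1+r) and (1+r)(1+t)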
Mack provides an estimator of the proportionality constant \(\alpha_k\) from the chain ladder variance assumption: \[ \alpha_k^2 = \frac{1}{I-k-1} \sum_{1 \leq j \leq I-k} c_{j,k}\left(\frac{c_{j, k+1}}{c_{j,k}} - f_k \right)^2 \] This can be calculated using the estimated and observed development factors determined earlier:
mack.alpha.table = mack.residuals[, .(alpha_k_squared = sum(loss * (obs_dev_factor - f_k_1)^2) / (.N - 1)), by = k]
datatable(mack.alpha.table)
It’s not possible to estimate \(\alpha_9^2\) because there is only a single data point at this development age. Mack recommends two approaches. The first is linear extrapolation of the logarithm of the estimate. As a check of the reasonableness of this approach, plot the logarithm of the estimate against \(k\) to assess whether a linear fit is appropriate:
mack.alpha.table[, log_alpha_k_squared := log(alpha_k_squared)]
ggplot(data = mack.alpha.table, aes(x = k, y = log_alpha_k_squared)) + geom_point() + geom_smooth(method = "lm")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
Overall, the linear fit looks quite good. The extrapolated value corresponding to this method can be calculated as follows:
alpha.extrapolation = lm(log_alpha_k_squared ~ k, data = mack.alpha.table)
alpha.predictions = predict(alpha.extrapolation, newdata = mack.alpha.table)
print(paste0("The extrapolated value for alpha_9 squared is ", exp(alpha.predictions[9])))
## [1] "The extrapolated value for alpha_9 squared is 0.645486792551869"
A simplified approach involves selecting \(\alpha_9^2\) such that two assumptions are satisfied: the exponential decay observed in the preceding values continues, so that \(\alpha_{I-1}^2 / \alpha_{I-2}^2 = \alpha_{I-2}^2 / \alpha_{I-3}^2\), and the estimate does not exceed either of the two preceding values.
Combined, this means that the projection for \(\alpha_{I-1}^2\) is the minimum of \(\alpha_{I-2}^4 / \alpha_{I-3}^2\), \(\alpha_{I-2}^2\), and \(\alpha_{I-3}^2\). In this example:
alpha.8.squared = mack.alpha.table[k == 8, .(alpha_k_squared)]
alpha.7.squared = mack.alpha.table[k == 7, .(alpha_k_squared)]
alpha.9.squared.projection = pmin(alpha.8.squared^2 / alpha.7.squared, alpha.7.squared, alpha.8.squared)
print(paste0("The extrapolated value for alpha_9 squared is ", alpha.9.squared.projection))
## [1] "The extrapolated value for alpha_9 squared is 1.34342546562448"
mack.alpha.table[k == 9, alpha_k_squared := alpha.9.squared.projection]
Once \(\alpha_k^2\) has been estimated, the standard error of the ultimate loss estimate for accident year \(i\) can be calculated as \[
\mathrm{se}(c_{i, I})^2 = c_{i, I}^2 \sum_{I+1-i \leq k \leq I-1} \frac{\alpha_k^2}{f_k^2}\left(\frac{1}{c_{i,k}} + \frac{1}{\sum_{1\leq j \leq I-k} c_{j,k}} \right)
\] Considerations when assessing the variance of the reserve estimate include:
In order to calculate the ultimate claim amount and standard error, bring everything together in one table:
mack.reserve = merge(mack.factors, mack.alpha.table, by = "k")
mack.reserve = mack.reserve[order(-k)]
mack.reserve[, LDF := cumprod(f_k_1)]
mack.final.diagonal = mack.table[i + k == 11, .(k, i, loss)]
mack.reserve = merge(mack.reserve, mack.final.diagonal, by = "k")
mack.reserve[, ultimate := loss * LDF]
mack.reserve[, reserve := ultimate - loss]
datatable(mack.reserve)
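The standard errors themselves are not computed from this formula elsewhere in the notebook (Mack’s published values are used below), so here is a rough sketch of how the calculation could be carried out (my own illustration, not code from the paper), using the mack.triangle, mack.factors, and mack.alpha.table objects created above:
nyears = 10
f = mack.factors[order(k), f_k_1]                      # chain ladder factors f_1, ..., f_9
alpha2 = mack.alpha.table[order(k), alpha_k_squared]   # alpha_k^2, including the projected alpha_9^2
full.triangle = as.matrix(mack.triangle)[, -1]         # drop the AY column
for (dev in 1:(nyears - 1)) {
  # complete the lower triangle with chain ladder projections
  missing.rows = which(is.na(full.triangle[, dev + 1]))
  full.triangle[missing.rows, dev + 1] = full.triangle[missing.rows, dev] * f[dev]
}
se.ultimate = sapply(1:nyears, function(i) {
  if (i == 1) return(0)  # the oldest accident year is treated as fully developed
  terms = sapply((nyears + 1 - i):(nyears - 1), function(k) {
    alpha2[k] / f[k]^2 * (1 / full.triangle[i, k] + 1 / sum(full.triangle[1:(nyears - k), k]))
  })
  sqrt(full.triangle[i, nyears]^2 * sum(terms))
})
se.ultimate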
Parameters for the lognormal distribution can be determined by matching moments: \[ \exp(\mu_i + \sigma_i^2/2) = R_i \] and \[ \exp(2\mu_i + \sigma_i^2) (\exp(\sigma_i^2)-1) = \mathrm{se}(R_i)^2 \] Solving these equations for the lognormal distribution parameters gives \[ \sigma_i^2 = \log(1 + \mathrm{se}(R_i)^2 / R_i^2) \] and \[ \mu_i = \log(R_i) - \sigma_i^2/2 \] Percentiles can be calculated by determining the value \(z\) corresponding to the same percentile on the standard normal distribution, then calculating \[ \exp(\mu_i + z \sigma_i) \] Confidence intervals resulting from the lognormal distribution are not symmetric, and business policy may be used to determine the upper and lower percentiles.
Using the example from the previous section, and the given value of \(\mathrm{se}(R) / R = 0.52\), we have:
cv.R = 0.52
R = mack.reserve[, sum(reserve)]
sigma.R = sqrt(log(1 + cv.R^2))
mu.R = log(R) - sigma.R^2 / 2
R.percentile.90 = exp(mu.R + 1.28 * sigma.R)
R.percentile.10 = exp(mu.R - 1.28 * sigma.R)
print(paste0("The 90th percentile of the reserve estimate is ", R.percentile.90))
## [1] "The 90th percentile of the reserve estimate is 86494.4254313207"
print(paste0("The 10th percentile of the reserve estimate is ", R.percentile.10))
## [1] "The 10th percentile of the reserve estimate is 24721.8698299094"
To determine confidence intervals for the individual accident years, one approach is to select a common value of \(z\) across all accident years such that the sum of results across accident years equals the value of the desired percentile for the whole reserve. In Mack’s example, values of \(z = 1.13208\) (87th percentile) and \(z = -0.8211\) (21st percentile) in individual accident years produce results that sum to the 90th and 10th percentiles given above. Thus, this 80% confidence interval for the whole reserve corresponds to 66% confidence intervals on the individual accident years. This is illustrated below, using the standard errors for \(R_i\) calculated by Mack:
z.lower = -0.8211
z.upper = 1.13208
mack.reserve$R_cv = c(1.5, 0.59, 0.49, 0.41, 0.55, 0.53, 0.46, 1.01, 1.34)
mack.reserve[, sigma_R := sqrt(log(1 + R_cv^2))]
mack.reserve[, mu_r := log(reserve) - sigma_R^2 / 2]
mack.reserve[, lower_percentile_R := exp(mu_r + z.lower * sigma_R)]
mack.reserve[, upper_percentile_R := exp(mu_r + z.upper * sigma_R)]
mack.reserve[, lower_percentile_U := lower_percentile_R + loss]
mack.reserve[, upper_percentile_U := upper_percentile_R + loss]
datatable(mack.reserve[, .(i, lower_percentile_R, reserve, upper_percentile_R, lower_percentile_U, ultimate, upper_percentile_U)])
Validate that the reserve percentiles sum up to the corresponding total reserve percentiles:
R.total.lower = mack.reserve[, sum(lower_percentile_R)]
R.total.upper = mack.reserve[, sum(upper_percentile_R)]
print(paste0("The sum of the lower percentiles is ", R.total.lower))
## [1] "The sum of the lower percentiles is 24904.6333877776"
print(paste0("The sum of the upper percentiles is ", R.total.upper))
## [1] "The sum of the upper percentiles is 86234.9057646017"
The term empirical limits refers to the estimates that would be produced if we used the maximum and minimum observed age-to-age factors to develop claims to ultimate. Using the Mack triangle, these values are:
limit.factors = mack.age.to.age[, .(max_factor = max(obs_dev_factor), min_factor = min(obs_dev_factor)), by = "k"][order(-k)]
limit.factors[, max_LDF := cumprod(max_factor)]
limit.factors[, min_LDF := cumprod(min_factor)]
mack.empirical = merge(mack.reserve, limit.factors, by = "k")
mack.empirical[, empirical_limit_upper := loss * max_LDF]
mack.empirical[, empirical_limit_lower := loss * min_LDF]
datatable(mack.empirical[, .(k, LDF, loss, ultimate, min_LDF, max_LDF, empirical_limit_lower, empirical_limit_upper)])