Introduction

This notebook illustrates the concepts in the Exam 7 readings "Measuring the Variability of Chain Ladder Reserve Estimates" by Thomas Mack and "Testing the Assumptions of Age-to-Age Factors" by Gary Venter, by reproducing the examples in the papers. The following packages are used in this notebook:

easypackages::packages("data.table", "DT", "ggplot2", "dplyr")
options(scipen=999)

Mack illustrates his ideas using the following data (provided here in tabular format):

mack.table = fread("./mack_table_1993.csv")
datatable(mack.table)

The above format may be easier for performing calculations, but for visualization purposes we can also convert it to a traditional triangle format, with rows representing accident years and columns representing development years.

mack.triangle = data.table("AY" = 1:10)
for(j in 1:10) {
  current.column = mack.table[k == j, .(i, loss)]
  current.column = current.column[order(i)]
  current.column = current.column[, loss]
  length(current.column) = 10
  mack.triangle = cbind(mack.triangle, current.column)
  names(mack.triangle)[j+1] = paste0("DY_", j)
}
mack.triangle
##     AY DY_1  DY_2  DY_3  DY_4  DY_5  DY_6  DY_7  DY_8  DY_9 DY_10
##  1:  1 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
##  2:  2  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
##  3:  3 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
##  4:  4 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
##  5:  5 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
##  6:  6 1531  6445 11702 12935 15852    NA    NA    NA    NA    NA
##  7:  7  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
##  8:  8 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
##  9:  9 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
## 10: 10 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
Notation used by Mack is as follows: \(c_{i,k}\) denotes the cumulative losses for accident year \(i\) as of development year \(k\), where \(1 \leq i, k \leq I\) and \(I\) is the number of accident years (here, \(I = 10\)).

In this notation, the age-to-age factors are defined as \(c_{i,k}\)-weighted averages of the accident year observed development factors: \[ f_{k,1} = \frac{\sum_{1\leq i \leq I-k} c_{i, k+1}}{\sum_{1\leq i \leq I-k} c_{i, k}} \] The unweighted averages of the development factors are denoted by \[ f_{k,2} = \frac{1}{I-k} \sum_{1\leq i \leq I-k} \frac{c_{i, k+1}}{c_{i,k}} \] Averages of development factors weighted by \(c_{i,k}^2\) are denoted by \[ f_{k,0} = \frac{\sum_{1\leq i \leq I-k} c_{i,k} c_{i,k+1}}{\sum_{1\leq i \leq I-k} c_{i,k}^2} \]

These factors can be calculated from the example above as follows:

mack.age.to.age = mack.table[, next_dev := k+1]
mack.age.to.age = merge(mack.age.to.age, mack.age.to.age[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
mack.age.to.age[, obs_dev_factor := next_loss / loss]
mack.factors = mack.age.to.age[, .(f_k_0 = sum(loss * next_loss) / sum(loss^2), f_k_1 = sum(next_loss) / sum(loss), f_k_2 = mean(obs_dev_factor)), by = k]
datatable(mack.factors)
Venter uses a slightly different notation, indexing accident years by \(w\) and development periods by \(d\).

In this notebook, I’ll use Mack’s notation.

Underlying Assumptions

  1. There exist factors \(f_1, \ldots, f_{I-1}\) such that \[ E(c_{i, k+1} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = c_{i,k}f_k \] In particular, the expected value of \(c_{i,k+1}\) depends only on the most recent valuation \(c_{i,k}\), and not on the earlier emergence pattern or the emergence in other accident years. Moreover, the same value of \(f_k\) is used for all accident years. Since \(c_{i,k}\) is fixed under the conditioning, this can also be written as \[ E(c_{i, k+1}/c_{i,k} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = f_k \] This assumption has several implications:
    • The factor \(f_k\) is constant across accident years – there are no accident year effects.
    • There is no constant term in the formula for expected losses in the next development period.
    • The factor applies to the cumulative losses to date, not to an estimated parameter (as it does in the Bornhuetter-Ferguson method).
  2. Reported losses in different accident years are independent.
  3. There exist constants \(\alpha_k\) such that \[ \mathrm{Var}(c_{j, k+1} | c_{j,1},\ldots, c_{j,k}) = c_{j,k} \alpha_k^2 \] Note that \(\alpha_k\) is constant across all accident years. Different assumptions here will lead to different methods being optimal from a least-squares standpoint.

More generally, Venter considers the variance to be a function of the development period and the cumulative losses to date: \[ \mathrm{Var}(c_{j, k+1} | c_{j,1},\ldots, c_{j,k}) = a(k, c_{j,k}) \] where \(a\) is a function that does not vary by accident year. Under this assumption, the minimum variance unbiased estimator of \(f_k\) is the one that minimizes \[ \sum_j \left[c_{j, k+1} - f_k c_{j,k}\right]^2 g(k)/ a(k, c_{j,k}) \] for any function \(g(k)\) that does not vary by accident year; since such a factor does not change the minimizer, it is typically chosen to simplify the calculation.

Testing for Significance of Factors

This test checks whether the cumulative-to-cumulative factors \(f_k\) are statistically significant:
  1. Calculate the standard error of \(f_k\).
  2. \(|1 - f_k|\) should be at least twice the standard error in order for the factor to be regarded as significantly different from 1. (This corresponds to a 95.5% normal confidence interval. A less stringent test, based on a 90% confidence interval, would compare \(|1 - f_k|\) to 1.65 times the standard error.)

If this test fails, one approach is to test an alternate emergence pattern. (See next section.)
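
As a concrete illustration of this rule, the check can be written directly against weighted regression output (a sketch of my own, using the mack.age.to.age table built above and the usual chain ladder weighting; the helper name is hypothetical):

f.significance.check = function(development.period = 1) {
  # Weighted regression of next cumulative loss on current cumulative loss, no intercept
  column.data = mack.age.to.age[k == development.period]
  factor.model = lm(next_loss ~ loss + 0, data = column.data, weights = 1 / loss)
  f.hat = coef(factor.model)[["loss"]]
  f.se = summary(factor.model)$coefficients["loss", "Std. Error"]
  # The rule above: the factor is significantly different from 1 if |f - 1| > 2 * SE
  data.table(k = development.period, f = f.hat, std_error = f.se, significant = abs(f.hat - 1) > 2 * f.se)
}
rbindlist(lapply(1:8, f.significance.check))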

Calculating standard errors for the chain ladder method requires an assumption about the variance of the losses. Mack points out that since \(c_{j, k+1} / c_{j,k}\) is an unbiased estimator of \(f_k\), each of \(f_{k,0}\), \(f_{k,1}\), and \(f_{k,2}\) is also an unbiased estimator of \(f_k\); more generally, any weighted average of \(c_{j,k+1}/c_{j,k}\) will be unbiased. Mack explains that the variance of the estimator is minimized when the weights are inversely proportional to \(\mathrm{Var}(c_{j, k+1} / c_{j,k} | c_{j,1}, \ldots, c_{j,k})\). Moreover, these weights can also be interpreted as the weights in a least-squares regression: if we want to select \(f_k\) in order to minimize the quantity \[ \sum_i w_i (c_{i, k+1} - c_{i,k} f_k)^2 \] then, setting the derivative with respect to \(f_k\) equal to 0, we obtain \[ 0 = \sum_i w_ic_{i,k}(c_{i,k} f_k - c_{i,k+1}) \] Rearranging, we obtain \[ f_k = \frac{\sum_i w_i c_{i,k}^2 \frac{c_{i, k+1}}{c_{i,k}}}{\sum_i w_i c_{i,k}^2} \] Therefore, the least-squares regression fit is the average of \(c_{i,k+1} / c_{i,k}\), weighted by \(w_i c_{i,k}^2\).

This means that:
  1. Suppose we assume that \[ \mathrm{Var}(c_{j, k+1} / c_{j,k} | c_{j,1}, \ldots, c_{j,k}) = \alpha_k^2 / c_{j,k}^2 \] for some constant \(\alpha_k\), which is equivalent to \[ \mathrm{Var}(c_{j, k+1} | c_{j,1}, \ldots, c_{j,k}) = \alpha_k^2 \] In this case, the regression weights are \(w_i = 1\), so we obtain the square-weighted average \(f_{k,0}\).
  2. Suppose we assume that \[ \mathrm{Var}(c_{j, k+1} / c_{j,k} | c_{j,1}, \ldots, c_{j,k}) = \alpha_k^2 / c_{j,k} \] for some constant \(\alpha_k\), which is equivalent to \[ \mathrm{Var}(c_{j, k+1} | c_{j,1}, \ldots, c_{j,k}) = \alpha_k^2 c_{j,k} \] Using weights \(w_i = 1 / c_{i,k}\), the regression formula reduces to \(f_{k,1}\), so the weighted-average chain ladder method is optimal under this variance assumption.
  3. Suppose we assume that \[ \mathrm{Var}(c_{j, k+1} / c_{j,k} | c_{j,1}, \ldots, c_{j,k}) = \alpha_k^2 \] for some constant \(\alpha_k\), which is equivalent to \[ \mathrm{Var}(c_{j, k+1} | c_{j,1}, \ldots, c_{j,k}) = \alpha_k^2 c_{j,k}^2 \] In this case, the regression weights become \(w_i = 1 / c_{i,k}^2\), so the regression formula reduces to the unweighted average \(f_{k,2}\).

A key point to take away from the above is that least squares regression implicitly assumes the variance is constant for all observations, so we need to use weighted regression, with weights inversely proportional to the variance, if we want to assume the variance changes based on the size of the loss, as we do in the chain ladder method.
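
This correspondence can be checked numerically: averaging the observed development factors with weights \(w_i c_{i,k}^2 = c_{i,k}^{2-p}\), where \(p\) is the assumed variance power, reproduces \(f_{k,0}\), \(f_{k,1}\), and \(f_{k,2}\) in turn. The helper below is a small sketch of my own, not code from either paper.

weighted.factor = function(development.period, variance.power) {
  column.data = mack.age.to.age[k == development.period]
  # Regression weight w_i = 1 / c^variance.power, so the averaging weight is w_i * c^2 = c^(2 - variance.power)
  column.data[, sum(loss^(2 - variance.power) * obs_dev_factor) / sum(loss^(2 - variance.power))]
}
sapply(0:2, function(p) weighted.factor(development.period = 1, variance.power = p))
mack.factors[k == 1]  # compare to f_k_0, f_k_1, f_k_2 for the first development period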

Standard error estimates can be obtained by performing linear regression. Note that under Mack’s assumptions, there is no intercept, which is indicated by “+ 0” in the model formula. In order to correctly calculate \(p\)-values, incremental loss is used as the response variable – the reason is that in this case, the significance test will be against the hypothesis that the factor is zero (the default in R's lm function), rather than against the hypothesis that the factor is 1. Here are the details of the models for each of the three weighting options discussed by Mack, for the first development period.

mack.age.to.age[, weight := 1]
mack.age.to.age[, incremental_loss := next_loss - loss]
f_0.model = lm(incremental_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
summary(f_0.model)
## 
## Call:
## lm(formula = incremental_loss ~ loss + 0, data = mack.age.to.age[k == 
##     1], weights = weight)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2843.5  -983.1  2785.0  3951.6  7143.8 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)  
## loss   1.2172     0.4106   2.964    0.018 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3769 on 8 degrees of freedom
## Multiple R-squared:  0.5234, Adjusted R-squared:  0.4638 
## F-statistic: 8.786 on 1 and 8 DF,  p-value: 0.01803
mack.age.to.age[, weight := 1 / loss]
f_1.model = lm(incremental_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
summary(f_1.model)
## 
## Call:
## lm(formula = incremental_loss ~ loss + 0, data = mack.age.to.age[k == 
##     1], weights = weight)
## 
## Weighted Residuals:
##    Min     1Q Median     3Q    Max 
## -95.36 -71.36  47.45  99.60 385.34 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)
## loss    1.997      1.129   1.768    0.115
## 
## Residual standard error: 166.9 on 8 degrees of freedom
## Multiple R-squared:  0.281,  Adjusted R-squared:  0.1911 
## F-statistic: 3.127 on 1 and 8 DF,  p-value: 0.115
mack.age.to.age[, weight := 1 / loss^2]
f_2.model = lm(incremental_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
summary(f_2.model)
## 
## Call:
## lm(formula = incremental_loss ~ loss + 0, data = mack.age.to.age[k == 
##     1], weights = weight)
## 
## Weighted Residuals:
##    Min     1Q Median     3Q    Max 
## -6.551 -6.157 -3.991 -0.983 32.224 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)
## loss    7.201      4.114    1.75    0.118
## 
## Residual standard error: 12.34 on 8 degrees of freedom
## Multiple R-squared:  0.2769, Adjusted R-squared:  0.1865 
## F-statistic: 3.063 on 1 and 8 DF,  p-value: 0.1182
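
Since the response variable is the incremental loss, each fitted coefficient estimates \(f_k - 1\); as a quick consistency check (using the objects created above), adding 1 to the coefficients recovers the factors calculated earlier:

c(f_0 = 1 + coef(f_0.model)[["loss"]],
  f_1 = 1 + coef(f_1.model)[["loss"]],
  f_2 = 1 + coef(f_2.model)[["loss"]])
mack.factors[k == 1, .(f_k_0, f_k_1, f_k_2)]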

Notice that \(f_0\) is statistically significant, but \(f_1\) and \(f_2\) are borderline. This is not surprising, since \(f_1\) and \(f_2\) correspond to higher variance assumptions, which produce larger standard errors. As a result, if the variance assumption corresponding to \(f_2\) is considered reasonable, then we might question whether Mack’s first assumption is valid in this case.

To quickly compare development periods, use the following function:

cl.significance = function(development.period = 1, weight.power = 1) {
  temporary.copy = copy(mack.age.to.age)
  temporary.copy[, weight := 1 / loss^weight.power]
  temporary.model = lm(incremental_loss ~ loss + 0, data = temporary.copy[k == development.period], weights = weight)
  return(summary(temporary.model)$coefficients)
}

For unweighted regression / square-weighted averages:

constant.variance.summary = lapply(1:8, function(r) cl.significance(development.period = r, weight.power = 0))
constant.variance.summary
## [[1]]
##      Estimate Std. Error  t value   Pr(>|t|)
## loss 1.217176  0.4106412 2.964087 0.01803326
## 
## [[2]]
##       Estimate Std. Error  t value    Pr(>|t|)
## loss 0.5689516  0.1087864 5.229989 0.001212464
## 
## [[3]]
##       Estimate Std. Error  t value   Pr(>|t|)
## loss 0.2608889 0.07063776 3.693336 0.01016841
## 
## [[4]]
##       Estimate Std. Error  t value     Pr(>|t|)
## loss 0.1619717 0.02307658 7.018879 0.0009054525
## 
## [[5]]
##        Estimate Std. Error  t value   Pr(>|t|)
## loss 0.09970741 0.03610088 2.761911 0.05075004
## 
## [[6]]
##        Estimate Std. Error  t value  Pr(>|t|)
## loss 0.04053438 0.01984237 2.042819 0.1336836
## 
## [[7]]
##        Estimate Std. Error t value   Pr(>|t|)
## loss 0.03219615 0.00471755 6.82476 0.02080209
## 
## [[8]]
##        Estimate Std. Error  t value  Pr(>|t|)
## loss 0.01588833 0.01494527 1.063101 0.4805347

For the usual chain ladder estimates:

linear.variance.summary = lapply(1:8, function(r) cl.significance(development.period = r, weight.power = 1))
linear.variance.summary
## [[1]]
##      Estimate Std. Error  t value  Pr(>|t|)
## loss 1.996887    1.12933 1.768206 0.1150002
## 
## [[2]]
##       Estimate Std. Error  t value    Pr(>|t|)
## loss 0.6235228  0.1358361 4.590257 0.002513007
## 
## [[3]]
##       Estimate Std. Error  t value   Pr(>|t|)
## loss 0.2708881 0.09049822 2.993298 0.02421683
## 
## [[4]]
##       Estimate Std. Error  t value   Pr(>|t|)
## loss 0.1716746 0.02538993 6.761525 0.00107481
## 
## [[5]]
##       Estimate Std. Error  t value   Pr(>|t|)
## loss 0.1133849 0.03537668 3.205074 0.03274211
## 
## [[6]]
##        Estimate Std. Error  t value  Pr(>|t|)
## loss 0.04193464 0.02257781 1.857338 0.1602526
## 
## [[7]]
##        Estimate  Std. Error  t value   Pr(>|t|)
## loss 0.03326355 0.004881918 6.813624 0.02086803
## 
## [[8]]
##        Estimate Std. Error t value  Pr(>|t|)
## loss 0.01693648 0.01505585 1.12491 0.4626201

For the unweighted average method:

square.variance.summary = lapply(1:8, function(r) cl.significance(development.period = r, weight.power = 2))
square.variance.summary
## [[1]]
##      Estimate Std. Error  t value  Pr(>|t|)
## loss 7.200535   4.114158 1.750184 0.1181997
## 
## [[2]]
##       Estimate Std. Error  t value    Pr(>|t|)
## loss 0.6958945  0.1676164 4.151708 0.004285933
## 
## [[3]]
##       Estimate Std. Error  t value  Pr(>|t|)
## loss 0.3145103  0.1198492 2.624218 0.0393632
## 
## [[4]]
##       Estimate Std. Error  t value    Pr(>|t|)
## loss 0.1829256 0.02726923 6.708134 0.001114491
## 
## [[5]]
##       Estimate Std. Error  t value   Pr(>|t|)
## loss 0.1269622 0.03338933 3.802479 0.01906297
## 
## [[6]]
##        Estimate Std. Error  t value  Pr(>|t|)
## loss 0.04332764 0.02512291 1.724626 0.1830604
## 
## [[7]]
##       Estimate  Std. Error  t value   Pr(>|t|)
## loss 0.0343554 0.004953969 6.934924 0.02016614
## 
## [[8]]
##        Estimate Std. Error  t value  Pr(>|t|)
## loss 0.01799499 0.01509302 1.192273 0.4443079

Testing against Alternate Emergence Patterns

Metrics for assessing fit

The assumption of proportionality can be tested by evaluating whether alternative assumptions provide a better fit. The sum of squared errors (SSE) is a natural way to compare models, with a lower SSE being better. However, a key weakness of this metric is that it does not consider the complexity of the model – it is always possible to improve the SSE by adding more parameters, so it does not provide a way of determining whether additional parameters are warranted. Ways to adjust the SSE to account for the number of parameters \(p\), fit on \(n\) observations, include:
  • Dividing the SSE by \((n-p)^2\).
  • Approximating AIC by multiplying SSE by \(\exp(2p/n)\).
  • Approximating BIC by multiplying SSE by \(n^{p/n}\). This is preferred for larger datasets, since AIC tends to be too permissive of overparameterization on large datasets.

Note that the above approximations are not the AIC or BIC, but they will produce the same relative rankings of models.
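
Since these adjustments are used repeatedly below, they can be wrapped in a small helper (a sketch; the name fit.metrics is my own, not from Venter):

fit.metrics = function(sse, n, p) {
  # n = number of observations, p = number of parameters
  data.table(SSE = sse,
             adjusted_SSE = sse / (n - p)^2,
             AIC_approximation = sse * exp(2 * p / n),
             BIC_approximation = sse * n^(p / n))
}
# e.g. fit.metrics(sse = lwc.SSE, n = 9, p = 2) reproduces the linear-with-constant metrics computed below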

Visual Checks

An easy check of the linearity assumption is to plot \(c_{i, k+1}\) against \(c_{i,k}\) for each development period, along with a line through the origin with the chain ladder slope and a line fit to the points:

linearity.check = function(dev_period) {
  slope = mack.age.to.age[k == dev_period, sum(next_loss)] / mack.age.to.age[k == dev_period, sum(loss)]
  return(ggplot(data = mack.age.to.age[k == dev_period], aes(x = loss, y = next_loss)) + geom_point() + geom_abline(intercept = 0, slope = slope, color = "red") + geom_smooth(method = 'lm') + coord_cartesian(ylim = c(0, 25000), xlim = c(0,25000)) + labs(title = paste0("Visual Linearity Check for Development Period", dev_period)))
}
plot(linearity.check(dev_period = 1))

plot(linearity.check(dev_period = 2))

plot(linearity.check(dev_period = 3))

The departure of the points from the line through the origin for development periods 1 and 2 suggests that the strict proportionality assumption is not appropriate, and that a linear-with-constant approach may provide a better fit. Moreover, given that the earlier development periods have more data points, it is more reasonable to add an extra parameter here than it is in later years. An additional consideration is where the value of the loss in the year being projected lies relative to the other points – if it is in a region where the proportionality assumption produces reasonable estimates, then an extra parameter may not be needed.

Linear with constant

This corresponds to the assumption \[ E(c_{i, k+1} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = c_{i,k}f_k + g_k \] A constant term often tends to be significant in slowly-reporting lines. The following example shows a linear fit to the first development period, using weights corresponding to the usual chain ladder variance assumption:

mack.age.to.age[, weight := 1 / loss]
lin.with.constant = lm(next_loss ~ loss, data = mack.age.to.age[k == 1], weights = weight)
summary(lin.with.constant)
## 
## Call:
## lm(formula = next_loss ~ loss, data = mack.age.to.age[k == 1], 
##     weights = weight)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.883 -30.266   4.825   8.985 118.355 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 4328.519    516.315   8.383 0.0000675 ***
## loss           1.214      0.421   2.883    0.0236 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 53.71 on 7 degrees of freedom
## Multiple R-squared:  0.5428, Adjusted R-squared:  0.4775 
## F-statistic:  8.31 on 1 and 7 DF,  p-value: 0.02356

The intercept is statistically significant, as suggested by the visual check. Metrics for evaluating this model include:

lwc.prediction = predict(lin.with.constant)
observed = mack.age.to.age[k == 1, next_loss]
lwc.SSE = sum((lwc.prediction - observed)^2)
lwc.adj.SSE = lwc.SSE / (9-2)^2
lwc.AIC = lwc.SSE * exp(2 * 2 / 9)
lwc.BIC = lwc.SSE * 9^(2/9)
print(paste0("The SSE is ", lwc.SSE))
## [1] "The SSE is 29804878.0451142"
print(paste0("The adjusted SSE is ", lwc.adj.SSE))
## [1] "The adjusted SSE is 608262.817247228"
print(paste0("The AIC approximation is ", lwc.AIC))
## [1] "The AIC approximation is 46484388.1424645"
print(paste0("The BIC approximation is ", lwc.BIC))
## [1] "The BIC approximation is 48566995.7879631"

Compare this to the classic chain ladder approach:

lin.without.constant = lm(next_loss ~ loss + 0, data = mack.age.to.age[k == 1], weights = weight)
lnc.prediction = predict(lin.without.constant)
lnc.SSE = sum((lnc.prediction - observed)^2)
lnc.adj.SSE = lnc.SSE / (9-1)^2
lnc.AIC = lnc.SSE * exp(2 * 1 / 9)
lnc.BIC = lnc.SSE * 9^(1/9)
print(paste0("The SSE is ", lnc.SSE))
## [1] "The SSE is 164826968.411835"
print(paste0("The adjusted SSE is ", lnc.adj.SSE))
## [1] "The adjusted SSE is 2575421.38143493"
print(paste0("The AIC approximation is ", lnc.AIC))
## [1] "The AIC approximation is 205843973.082097"
print(paste0("The BIC approximation is ", lnc.BIC))
## [1] "The BIC approximation is 210404593.218451"

On all metrics, the linear-with-constant assumption performs better in the first development period.

The significance of the parameters across all development periods can be compared using the following function:

lwc.significance = function(development.period = 1, weight.power = 1) {
  temporary.copy = copy(mack.age.to.age)
  temporary.copy[, weight := 1 / loss^weight.power]
  temporary.model = lm(incremental_loss ~ loss , data = temporary.copy[k == development.period], weights = weight)
  return(summary(temporary.model)$coefficients)
}

Using the same variance assumption underlying the chain ladder method:

lwc.significance.summary= lapply(1:8, function(r) lwc.significance(development.period = r, weight.power = 1))
lwc.significance.summary
## [[1]]
##                 Estimate  Std. Error  t value      Pr(>|t|)
## (Intercept) 4328.5185518 516.3149822 8.383484 0.00006751437
## loss           0.2137288   0.4210268 0.507637 0.62730766555
## 
## [[2]]
##                  Estimate   Std. Error   t value  Pr(>|t|)
## (Intercept) 4159.69011653 2531.3746645 1.6432534 0.1514361
## loss           0.06961748    0.3584235 0.1942325 0.8524032
## 
## [[3]]
##                  Estimate   Std. Error    t value  Pr(>|t|)
## (Intercept) 4235.91787539 2814.5177702  1.5050244 0.1926563
## loss          -0.08032389    0.2474278 -0.3246357 0.7585990
## 
## [[4]]
##                  Estimate   Std. Error   t value  Pr(>|t|)
## (Intercept) 2188.78926055 1133.1054409 1.9316731 0.1255735
## loss           0.03340911    0.0744334 0.4488457 0.6767858
## 
## [[5]]
##                  Estimate   Std. Error    t value  Pr(>|t|)
## (Intercept) 3562.27352834 2031.4095457  1.7535969 0.1777793
## loss          -0.07324665    0.1102312 -0.6644822 0.5538935
## 
## [[6]]
##                 Estimate   Std. Error    t value  Pr(>|t|)
## (Intercept) 589.27570754 2510.3455766 0.23473888 0.8362549
## loss          0.01249918    0.1283294 0.09739922 0.9312911
## 
## [[7]]
##                  Estimate    Std. Error   t value  Pr(>|t|)
## (Intercept) 792.282542040 148.928492579  5.319886 0.1182876
## loss         -0.008903059   0.008028189 -1.108975 0.4671339
## 
## [[8]]
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3723.720377        NaN     NaN      NaN
## loss          -0.197212        NaN     NaN      NaN

These results differ from Table 2 in Venter’s paper; it appears as though Venter is performing unweighted regression, which corresponds to assuming that the variance is constant across accident years rather than proportional to the prior cumulative loss:

lwc.significance.summary= lapply(1:8, function(r) lwc.significance(development.period = r, weight.power = 0))
lwc.significance.summary
## [[1]]
##                 Estimate   Std. Error    t value   Pr(>|t|)
## (Intercept) 5110.4943268 1067.0808198  4.7892289 0.00199091
## loss          -0.1084107    0.3488151 -0.3107971 0.76500523
## 
## [[2]]
##                  Estimate  Std. Error   t value  Pr(>|t|)
## (Intercept) 4311.47097581 2440.121266 1.7669085 0.1276690
## loss           0.04940631    0.309099 0.1598397 0.8782525
## 
## [[3]]
##                 Estimate   Std. Error   t value  Pr(>|t|)
## (Intercept) 1687.1787584 3543.1408684 0.4761817 0.6540251
## loss           0.1309993    0.2830779 0.4627677 0.6629728
## 
## [[4]]
##                 Estimate    Std. Error   t value  Pr(>|t|)
## (Intercept) 2061.0688312 1164.74158341 1.7695503 0.1515190
## loss           0.0414772    0.07078186 0.5859863 0.5893551
## 
## [[5]]
##                  Estimate  Std. Error    t value  Pr(>|t|)
## (Intercept) 4064.45972680 2241.921036  1.8129362 0.1674982
## loss          -0.09955676    0.113622 -0.8762103 0.4454157
## 
## [[6]]
##                 Estimate   Std. Error    t value  Pr(>|t|)
## (Intercept) 620.43279829 2300.8713491 0.26965123 0.8127021
## loss          0.01094283    0.1123065 0.09743715 0.9312645
## 
## [[7]]
##                  Estimate    Std. Error   t value  Pr(>|t|)
## (Intercept) 777.333754530 144.682631190  5.372682 0.1171514
## loss         -0.008107459   0.007600268 -1.066733 0.4794512
## 
## [[8]]
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3723.720377        NaN     NaN      NaN
## loss          -0.197212        NaN     NaN      NaN

In either case, the factors are generally not significant, though in some cases the intercepts are. This suggests that the assumptions underlying the chain ladder method are not supported by the data, and a pattern of constant emergence in each age appears to be more suitable.

Factor times parameter

This is a Bornhuetter-Ferguson style approach, in which we disregard emergence to date, and have one parameter (e.g. ultimate expected losses) for each accident year. \[ E(c_{i, k+1} - c_{i,k}| c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = h_i f_k \] For a triangle with \(N\) accident years, this model will have \(2N - 2\) parameters, twice the \(N-1\) parameters of the chain ladder method.
  • The Cape Cod method is a special case of this approach, in which there is a single \(h\) for all accident years, representing an expected loss ratio. (This has the same number of parameters as the chain ladder method, since a change in \(h\) can be offset by an inverse change in all the \(f_k\)s.)
  • Between these two extremes, a Bornhuetter-Ferguson model can also be fit by setting some, but not all, of the accident year parameters equal to each other. This would be particularly valuable in the later years, where data is sparse.
  • A similar approach could be taken to reduce the number of development year parameters, e.g. by grouping years that have similar percentages of emergence.
  • Linear extrapolation between accident years or development periods could also be used to reduce the number of parameters.

To fit this model, Venter recommends using an iterative process to minimize \[ \sum_{i,k} \left((c_{i,k+1} - c_{i,k}) - h_i f_k\right)^2 \] The approach is to fix some initial set of values for \(f_k\), such as the values coming from the chain ladder method. Then, select the best \(h_i\) for these values: \[ h_i = \frac{\sum_k f_k (c_{i,k+1} - c_{i,k})}{\sum_k f_k^2} \] Then, keeping the \(h_i\) fixed, optimize \(f_k\): \[ f_k = \frac{\sum_i h_i (c_{i,k+1} - c_{i, k})}{\sum_i h_i^2} \] Iterate until convergence is reached. A simpler approach to fitting a multiplicative model would be to use a log-link GLM, though a drawback of this approach is that it does not work with a negative incremental loss (and there is one in this triangle), so I fit the model to the data without this point. Venter’s model, in which there is a different parameter for each accident year and each development period, corresponds to treating accident year and development period as categorical variables. I have also slightly modified the data prep to include the most recent diagonal as well, which can be used for this approach.
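
Before turning to the GLM, here is a minimal sketch of the alternating least-squares fit just described, applied to the incremental losses from the Mack triangle. The implementation is my own, the additive chain ladder averages are used as crude starting values, and the parameters are only identified up to a common scale.

cum.matrix = as.matrix(mack.triangle)[, -1]   # cumulative triangle; NA below the diagonal
incr.matrix = cbind(cum.matrix[, 1], cum.matrix[, -1] - cum.matrix[, -ncol(cum.matrix)])   # incremental losses
f = colMeans(incr.matrix, na.rm = TRUE)   # starting values for the development parameters
for (iteration in 1:50) {
  # With f fixed, the least-squares h_i is an f-weighted average over the observed development periods
  h = sapply(1:nrow(incr.matrix), function(i) {
    observed = !is.na(incr.matrix[i, ])
    sum(f[observed] * incr.matrix[i, observed]) / sum(f[observed]^2)
  })
  # With h fixed, the least-squares f_k is an h-weighted average over the observed accident years
  f = sapply(1:ncol(incr.matrix), function(k) {
    observed = !is.na(incr.matrix[, k])
    sum(h[observed] * incr.matrix[observed, k]) / sum(h[observed]^2)
  })
}
round(f / sum(f), 4)   # implied emergence pattern, normalized to sum to 1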

venter.regression = mack.table[, prev_dev := k-1]
venter.regression = merge(venter.regression, venter.regression[, .(i, k, prev_loss = loss)], by.x = c("i", "prev_dev"), by.y = c("i", "k"), all.x = TRUE)
venter.regression[is.na(prev_loss), prev_loss := 0]
venter.regression[, incremental_loss := loss - prev_loss]
venter.regression[, i_factor := as.factor(i)]
venter.regression[, k_factor := as.factor(k)]
bf.model = glm(incremental_loss ~ i_factor + k_factor, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
summary(bf.model)
## 
## Call:
## glm(formula = incremental_loss ~ i_factor + k_factor, family = gaussian(link = "log"), 
##     data = venter.regression[incremental_loss > 0])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2389.4   -770.7     -9.0    916.9   3311.5  
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  7.43868    0.34199  21.751 < 0.0000000000000002 ***
## i_factor2    0.04172    0.35845   0.116              0.90800    
## i_factor3    0.38377    0.31202   1.230              0.22692    
## i_factor4    0.53040    0.29961   1.770              0.08539 .  
## i_factor5    0.67987    0.29012   2.343              0.02491 *  
## i_factor6    0.22597    0.33508   0.674              0.50450    
## i_factor7    0.17310    0.35088   0.493              0.62485    
## i_factor8    0.45242    0.32809   1.379              0.17665    
## i_factor9   -0.19098    0.55464  -0.344              0.73265    
## i_factor10   0.19324    0.86107   0.224              0.82374    
## k_factor2    0.77729    0.25466   3.052              0.00432 ** 
## k_factor3    0.67434    0.26154   2.578              0.01429 *  
## k_factor4    0.37717    0.29293   1.288              0.20634    
## k_factor5    0.09639    0.34023   0.283              0.77861    
## k_factor6   -0.24978    0.44018  -0.567              0.57403    
## k_factor7   -0.83626    0.93348  -0.896              0.37645    
## k_factor8   -1.19388    1.55748  -0.767              0.44849    
## k_factor9   -1.75805    3.86074  -0.455              0.65166    
## k_factor10  -2.29119    9.48449  -0.242              0.81052    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2657788)
## 
##     Null deviance: 239165641  on 53  degrees of freedom
## Residual deviance:  93018000  on 35  degrees of freedom
## AIC: 968.65
## 
## Number of Fisher Scoring iterations: 9

To determine the percentage reported in each development period, exponentiate the development period parameters and normalize them so that they sum to 1. Note that the first development period is the base level, so its coefficient is implicitly 0 and its factor is \(\exp(0) = 1\).

dev.parameters = bf.model$coefficients[11:19]
dev.factors = c(1, exp(dev.parameters))
dev.normalization = sum(dev.factors)
dev.factors = dev.factors / dev.normalization
dev.factors
##             k_factor2  k_factor3  k_factor4  k_factor5  k_factor6 
## 0.10541262 0.22933229 0.20689807 0.15370840 0.11607924 0.08211337 
##  k_factor7  k_factor8  k_factor9 k_factor10 
## 0.04567810 0.03194463 0.01817117 0.01066211

In a similar manner, we can determine the accident year parameters, grouping the intercept with these parameters. The normalization from the previous step is re-incorporated so that the normalization performed above does not change the overall prediction.

ay.parameters = c(0, bf.model$coefficients[2:10])
intercept = bf.model$coefficients[1]
ay.parameters = ay.parameters + intercept
ay.factors = exp(ay.parameters) * dev.normalization
ay.factors
##             i_factor2  i_factor3  i_factor4  i_factor5  i_factor6 
##   16131.89   16819.18   23678.63   27418.03   31838.35   20221.86 
##  i_factor7  i_factor8  i_factor9 i_factor10 
##   19180.65   25361.24   13327.32   19570.71

The numbers don’t match Venter’s exactly, due to the removal of the negative incremental loss and the differences in fitting methodology, but the overall trend in the numbers is similar. Venter points out that alternative normalizations may be performed; for example, rather than normalizing so that the development period factors sum to 1, we could normalize so that the accident year parameter for the oldest accident year is equal to the loss for that year on the most recent diagonal. (Note the unintuitive estimate of an ultimate loss of 16K on an accident year that already has an incurred loss above 18K.)
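
For example, the alternative normalization just described can be applied by rescaling both sets of parameters by the same constant, which leaves the fitted values unchanged (a sketch using the objects defined above):

alternative.scale = mack.triangle[AY == 1, DY_10] / ay.factors[1]   # latest diagonal loss for AY 1 divided by its fitted parameter
alternative.ay.factors = ay.factors * alternative.scale
alternative.dev.factors = dev.factors / alternative.scale
alternative.ay.factors[1]   # now equals the AY 1 loss on the most recent diagonal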

To compare these results to the chain ladder approach, we should delete the predictions in the first development period from the calculation, since the chain ladder method does not provide predictions for these values.

venter.regression[incremental_loss > 0 , prediction := predict(bf.model, type = "response")]
venter.SSE = venter.regression[k != 1 & incremental_loss > 0, sum((prediction - incremental_loss)^2)]
venter.adjusted.SSE = venter.SSE / (44 - 18)^2
venter.AIC = venter.SSE * exp(2 * 18 / 44)
venter.BIC = venter.SSE * 44 ^ ( 18 / 44)
print(paste0("The adjusted SSE is ", venter.adjusted.SSE))
## [1] "The adjusted SSE is 86428.3749247407"
print(paste0("The AIC approximation is ", venter.AIC))
## [1] "The AIC approximation is 132414300.914265"
print(paste0("The BIC approximation is ", venter.BIC))
## [1] "The BIC approximation is 274741707.25359"

In this framework, the Cape Cod method, in which there is a single parameter for all accident years, corresponds to removing accident year from the model and allowing the intercept to play the role of the single “h” parameter:

cape.cod.model = glm(incremental_loss ~ k_factor, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
summary(cape.cod.model)
## 
## Call:
## glm(formula = incremental_loss ~ k_factor, family = gaussian(link = "log"), 
##     data = venter.regression[incremental_loss > 0])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3571.5   -995.3    -11.0    747.0   3625.7  
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  7.77947    0.22545  34.507 < 0.0000000000000002 ***
## k_factor2    0.70672    0.25410   2.781              0.00794 ** 
## k_factor3    0.67212    0.25960   2.589              0.01300 *  
## k_factor4    0.31220    0.29953   1.042              0.30295    
## k_factor5    0.12806    0.34118   0.375              0.70920    
## k_factor6   -0.09966    0.41822  -0.238              0.81276    
## k_factor7   -0.72877    0.88238  -0.826              0.41331    
## k_factor8   -1.34172    1.59075  -0.843              0.40354    
## k_factor9   -2.09419    4.09900  -0.511              0.61197    
## k_factor10  -2.63197    9.91337  -0.265              0.79187    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2905864)
## 
##     Null deviance: 239165641  on 53  degrees of freedom
## Residual deviance: 127855554  on 44  degrees of freedom
## AIC: 967.83
## 
## Number of Fisher Scoring iterations: 5

As before, we can normalize the development period factors:

cc.dev.parameters = cape.cod.model$coefficients[2:10]
cc.dev.factors = c(1, exp(cc.dev.parameters))
cc.dev.normalization = sum(cc.dev.factors)
cc.dev.factors = cc.dev.factors / cc.dev.normalization
cc.dev.factors
##               k_factor2   k_factor3   k_factor4   k_factor5   k_factor6 
## 0.107147613 0.217223001 0.209836343 0.146410104 0.121786489 0.096984050 
##   k_factor7   k_factor8   k_factor9  k_factor10 
## 0.051699134 0.028008054 0.013197395 0.007707817

The single parameter \(h\) is:

h = exp(cape.cod.model$coefficients[1]) * cc.dev.normalization
h
## (Intercept) 
##    22315.01

Evaluating this model as before:

venter.regression[incremental_loss > 0 , cc.prediction := predict(cape.cod.model, type = "response")]
cc.SSE = venter.regression[k != 1 & incremental_loss > 0, sum((cc.prediction - incremental_loss)^2)]
cc.adjusted.SSE = cc.SSE / (44 - 10)^2
cc.AIC = cc.SSE * exp(2 * 10 / 44)
cc.BIC = cc.SSE * 44 ^ ( 10 / 44)
print(paste0("The adjusted SSE is ", cc.adjusted.SSE))
## [1] "The adjusted SSE is 83514.122973307"
print(paste0("The AIC approximation is ", cc.AIC))
## [1] "The AIC approximation is 152098293.522096"
print(paste0("The BIC approximation is ", cc.BIC))
## [1] "The BIC approximation is 228154863.400844"

The adjusted SSE and BIC indicate that the Cape Cod approach is superior to the BF method, but the AIC does not.

Intermediate approaches would generally involve simplifying the “i_factor” and “k_factor” variables prior to fitting, such as by grouping levels with similar parameter estimates, or by fitting them continuously rather than discretely. To illustrate this idea, Venter proposes the following grouping:
  • Development period: 2 and 3; 7 and 8; 9 and 10. Removes 3 parameters.
  • Accident year: 1 and 2 are the same; 6 through 10 are the same; 3 and 5 have their own parameters, with 4 being the average of the two (i.e. linear extrapolation). Removes 6 parameters.

In the log-link GLM framework, it’s tricky to enforce Venter’s condition on Accident Year 4 on the predicted value scale, but it is easy to enforce it on the linear predictor scale, which should still be a reasonable approximation when the parameters are close to 0.

venter.regression[, grouped_k_factor := as.factor(ifelse(k == 2 | k == 3, "2-3",
                                                  ifelse(k == 7 | k == 8, "7-8", 
                                                  ifelse(k == 9 | k == 10, "9-10", as.character(k)))))]
venter.regression[, i_continuous_capped := ifelse(i < 3, 2,
                                                  ifelse(i > 5, 6, i))] # Linear extrapolation between levels 3, 4, 5
venter.regression[, grouped_i_factor := as.factor(ifelse(i < 3, "<3", 
                                                         ifelse(i > 5, ">5", "3-5")))]
revised.bf.model = glm(incremental_loss ~ grouped_k_factor + i_continuous_capped + grouped_i_factor, data = venter.regression[incremental_loss > 0], family = gaussian(link = "log"))
summary(revised.bf.model)
## 
## Call:
## glm(formula = incremental_loss ~ grouped_k_factor + i_continuous_capped + 
##     grouped_i_factor, family = gaussian(link = "log"), data = venter.regression[incremental_loss > 
##     0])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2453.3   -905.8    -80.2    919.3   3257.4  
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           7.17985    0.32599  22.024 < 0.0000000000000002 ***
## grouped_k_factor2-3   0.70870    0.21280   3.330              0.00176 ** 
## grouped_k_factor4     0.34512    0.25883   1.333              0.18926    
## grouped_k_factor5     0.07793    0.30414   0.256              0.79897    
## grouped_k_factor6    -0.26326    0.39900  -0.660              0.51282    
## grouped_k_factor7-8  -0.98515    0.75602  -1.303              0.19933    
## grouped_k_factor9-10 -1.93400    3.43072  -0.564              0.57580    
## i_continuous_capped   0.14508    0.10054   1.443              0.15610    
## grouped_i_factor>5   -0.35623    0.44901  -0.793              0.43183    
## grouped_i_factor3-5   0.22708    0.28684   0.792              0.43279    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2259412)
## 
##     Null deviance: 239165641  on 53  degrees of freedom
## Residual deviance:  99412810  on 44  degrees of freedom
## AIC: 954.24
## 
## Number of Fisher Scoring iterations: 8

Performance of this model is as follows:

venter.regression[incremental_loss > 0 , rev_prediction := predict(revised.bf.model, type = "response")]
rev.bf.SSE = venter.regression[k != 1 & incremental_loss > 0, sum((rev_prediction - incremental_loss)^2)]
rev.bf.adjusted.SSE = rev.bf.SSE / (44 - 9)^2
rev.bf.AIC = rev.bf.SSE * exp(2 * 9 / 44)
rev.bf.BIC = rev.bf.SSE * 44 ^ ( 9 / 44)
print(paste0("The adjusted SSE is ", rev.bf.adjusted.SSE))
## [1] "The adjusted SSE is 55435.7383395955"
print(paste0("The AIC approximation is ", rev.bf.AIC))
## [1] "The AIC approximation is 102233175.156621"
print(paste0("The BIC approximation is ", rev.bf.BIC))
## [1] "The BIC approximation is 147260645.436505"

Based on these statistics, this is the best-performing model so far.

In the additive chain ladder approach, we calculate the average loss emerging in each development period:

additive.cl = venter.regression[, .(additive_cl_prediction = mean(incremental_loss)), by = k]
datatable(additive.cl)

The additive CL model can be assessed as follows. Although it is not necessary to remove the record with the negative incremental loss, doing so allows for a fairer comparison with the earlier results.

additive.cl.assessment = merge(venter.regression, additive.cl, by = "k")
add.cl.SSE = additive.cl.assessment[k != 1 & incremental_loss > 0, sum((additive_cl_prediction - incremental_loss)^2)]
add.cl.adjusted.SSE = add.cl.SSE / (44 - 9)^2
add.cl.AIC = add.cl.SSE * exp(2 * 9 / 44)
add.cl.BIC = add.cl.SSE * 44 ^ ( 9 / 44)
print(paste0("The adjusted SSE is ", add.cl.adjusted.SSE))
## [1] "The adjusted SSE is 79051.7781554908"
print(paste0("The AIC approximation is ", add.cl.AIC))
## [1] "The AIC approximation is 145785273.628082"
print(paste0("The BIC approximation is ", add.cl.BIC))
## [1] "The BIC approximation is 209994783.559439"

The performance is similar to (and better than) the Cape Cod approach.

Factor times parameter with calendar year effect

This is similar to the above approach, except that we add a factor for each calendar year (diagonal): \[ E(c_{i, k+1} - c_{i,k} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = h_i f_k g_{i+k} \]
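
This can be explored in the same GLM framework by adding the diagonal index as a categorical term. The following is a sketch of my own (not Venter's fitting approach), extending the grouped model above; the usual caveats about sparse diagonals and convergence of the log-link Gaussian fit apply.

venter.regression[, cy_factor := as.factor(i + k - 1)]   # diagonal (calendar year) index
cy.model = glm(incremental_loss ~ grouped_k_factor + i_continuous_capped + grouped_i_factor + cy_factor,
               data = venter.regression[incremental_loss > 0],
               family = gaussian(link = "log"))
summary(cy.model)

Whether the additional diagonal parameters are warranted can then be judged with the same SSE-based metrics used above.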

Testing for Patterns in Residuals

Residuals against previous loss (linearity test)

Mack plots the scaled residuals for the predictions corresponding to each of the three weightings against \(c_{i,k}\).
  • If the model assumption of a linear (proportional) fit is correct, these residuals should be random and show no pattern.
  • If there were a pattern, it could be indicative of a non-linear process. For example, trying to fit a line to a curve will produce a positive-negative-positive pattern in the residuals.
  • This is essentially equivalent to looking for a linear pattern in plots of incremental against previous cumulative.
  • The test can be performed individually for each development age.
These residuals are obtained by dividing the difference between the observed and predicted values by a quantity proportional to the assumed standard deviation:
  1. For the \(c_{i,k}^2\)-weighted factors, calculate \(c_{i,k+1} - c_{i,k} f_{k,0}\)
  2. For the \(c_{i,k}\)-weighted factors, calculate \((c_{i,k+1} - c_{i,k}f_{k,1})/\sqrt{c_{i,k}}\)
  3. For the unweighted factors, calculate \((c_{i,k+1} - c_{i,k} f_{k,2}) / c_{i,k}\)
mack.residuals = merge(mack.age.to.age, mack.factors, by = "k")
mack.residuals[, residual_0 := next_loss - loss * f_k_0]
mack.residuals[, residual_1 := (next_loss - loss * f_k_1) / sqrt(loss)]
mack.residuals[, residual_2 := (next_loss - loss * f_k_2) / loss]
datatable(mack.residuals)

We can plot the residuals in each development period, for each residual type:

plot.mack.residuals = function(dev_period, residual_number = 1) {
  residual_name = paste0("residual_", residual_number)
  return(ggplot(data = mack.residuals[k == dev_period], aes(x = loss, y = eval(parse(text = residual_name)))) + geom_point() + labs(title = paste0("Residuals for f_", dev_period, "_", residual_number)))
}

For the first development period:

plot(plot.mack.residuals(dev_period = 1, residual_number = 0))

plot(plot.mack.residuals(dev_period = 1, residual_number = 1))

plot(plot.mack.residuals(dev_period = 1, residual_number = 2))

In all cases, we see a decreasing trend in the residuals, suggesting that the model fit is not appropriate. For the second development period:

plot(plot.mack.residuals(dev_period = 2, residual_number = 0))

plot(plot.mack.residuals(dev_period = 2, residual_number = 1))

plot(plot.mack.residuals(dev_period = 2, residual_number = 2))

In the second development period, there is now little evidence of a pattern in the residuals.

Residuals over time (Stability test)

Sequences of high or low residuals, when the residuals are plotted against accident year, indicate instability of the factors.
  • If factors are stable, then all years can be used to produce estimates.
  • If factors are not stable, a possible approach is to use a weighted average of factors, with more weight given to the recent years. This should only be done with good reason, since it can increase estimation errors.
  • An alternative approach is to apply the Berquist-Sherman adjustment, but in this case the adjusted triangle should still be tested for stability.

Mack does not provide plots of residuals over time, but they can be produced using the same data as above:

plot.mack.residuals.over.time = function(dev_period, residual_number = 1) {
  residual_name = paste0("residual_", residual_number)
  return(ggplot(data = mack.residuals[k == dev_period], aes(x = i, y = eval(parse(text = residual_name)))) + geom_point() + labs(title = paste0("Residuals for f_", dev_period, "_", residual_number)))
}

For the first development period:

plot.mack.residuals.over.time(dev_period = 1, residual_number = 0)

plot.mack.residuals.over.time(dev_period = 1, residual_number = 1)

plot.mack.residuals.over.time(dev_period = 1, residual_number = 2)

Compare these plots to the residuals for the linear-with-constant model fit in the previous section:

first.period = mack.residuals[k == 1]
first.period[, lwc_residuals := resid(lin.with.constant)]
ggplot(data = first.period, aes(x = i, y = lwc_residuals)) + geom_point() + labs(title = "Residuals for the Linear With Constant model, first development period")

The residuals for the linear-with-constant model are a bit more random, and don’t show the increasing trend exhibited by the chain ladder method.

For the second development period:

plot.mack.residuals.over.time(dev_period = 2, residual_number = 0)

plot.mack.residuals.over.time(dev_period = 2, residual_number = 1)

plot.mack.residuals.over.time(dev_period = 2, residual_number = 2)

The residuals tend to be higher in the more recent accident years, suggesting that estimates could be based on only the most recent diagonals.

A non-residual approach to testing stability is to compare plots of the factors by year to an \(n\)-year moving average (i.e. the average of the factors from the most recent \(n\) diagonals).
  • Major shifts in the moving average suggest using averages of recent diagonals
  • Variability of the moving average is the concern, not the variability of individual factors around the moving average.
  • Large fluctuations in individual factors suggest using more data, not less.

A three-year moving average for the first development period in the Mack triangle can be calculated as follows:

mack.ma = mack.age.to.age[k == 1]
mack.ma[, factor_shift_1 := shift(obs_dev_factor, type = "lag")]
mack.ma[, factor_shift_2 := shift(factor_shift_1, type = "lag")]
mack.ma[, moving_average := (obs_dev_factor + factor_shift_1 + factor_shift_2) / 3]
ggplot(data = mack.ma, aes(x = i, y = moving_average)) + geom_point(color = "red") + geom_line(color = "red") + geom_point(aes(y = obs_dev_factor), color = "blue") + geom_line(aes(y = obs_dev_factor), color = "blue")
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_path).

This suggests that using only the most recent data may be appropriate; however, the high values of the moving average early on are driven by an unusually large development factor in the second accident year, so this may be the only problematic data point.
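
The same check can be applied to other development periods with a small helper (a sketch; the function name is my own):

plot.moving.average = function(dev.period) {
  column.data = mack.age.to.age[k == dev.period][order(i)]
  # Three-year moving average of the observed development factors, as above
  column.data[, moving_average := (obs_dev_factor + shift(obs_dev_factor, 1) + shift(obs_dev_factor, 2)) / 3]
  ggplot(data = column.data, aes(x = i, y = obs_dev_factor)) +
    geom_point(color = "blue") + geom_line(color = "blue") +
    geom_point(aes(y = moving_average), color = "red") + geom_line(aes(y = moving_average), color = "red") +
    labs(title = paste0("Factors (blue) and 3-year moving average (red), development period ", dev.period))
}
plot(plot.moving.average(dev.period = 2))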

Testing for Column Correlations

Spearman Correlation Approach

One implication of the assumption that \[ E(c_{i, k+1} | c_{i,1}, c_{i,2}, \ldots, c_{i,k}) = c_{i,k}f_k \] is that adjacent observed development factors \(c_{i,k} / c_{i, k-1}\) and \(c_{i,k+1} / c_{i,k}\) are uncorrelated. To demonstrate this, it suffices to show that \[ E\left(\frac{c_{i,k}}{c_{i, k-1}} \frac{c_{i,k+1}}{c_{i,k}}\right) = E\left(\frac{c_{i,k}}{c_{i, k-1}}\right) E\left(\frac{c_{i,k+1}}{c_{i,k}}\right) \] This can be demonstrated using the law of total expectation, for \(j\leq k\): \[ E\left(\frac{c_{i, k+1}}{c_{i,j}}\right) = E[E[c_{i,k+1} | c_{i,1}, \ldots, c_{i,k}] / c_{i,j}] = E[c_{i,k} / c_{i,j}] f_k \] Specializing to \(j = k-1\) (for the left side), and \(j=k\) (for the right side), we obtain \[ E\left(\frac{c_{i,k}}{c_{i, k-1}} \frac{c_{i,k+1}}{c_{i,k}}\right) = E\left(\frac{c_{i, k+1}}{c_{i,k-1}}\right) = E\left(\frac{c_{i,k}}{c_{i, k-1}}\right)f_k = E\left(\frac{c_{i,k}}{c_{i, k-1}}\right) E\left(\frac{c_{i,k+1}}{c_{i,k}}\right) \]

Mack proposes a test for correlation based on Spearman’s rank correlation coefficient, which has the benefit of not requiring a distributional assumption. The test is applied to adjacent columns, with the test statistic summed over the whole table. The first step is to calculate the observed development factors \(c_{i,k+1} / c_{i,k}\):

rank.correlation.table = mack.table[, next_dev := k+1]
rank.correlation.table = merge(rank.correlation.table, rank.correlation.table[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
rank.correlation.table[, obs_dev_factor := next_loss / loss]
datatable(rank.correlation.table[order(k)])

Next, within each development period, calculate the rank of each observed development factor. This is denoted by \(r_{i,k}\).

rank.correlation.table = rank.correlation.table %>% group_by(k) %>% mutate(r = order(order(obs_dev_factor)))

In order to compare adjacent columns, the triangular shape of the data means we also need the ranks computed with the element on the most recent diagonal deleted, so that we are comparing columns of the same length. This rank is denoted by \(s_{i,k}\).

shortened.table = rank.correlation.table %>% group_by(k) %>% mutate(diagonal_ay = max(i))
shortened.table = shortened.table %>% filter(i != diagonal_ay)
shortened.table = shortened.table %>% group_by(k) %>% mutate(s = order(order(obs_dev_factor)))
shortened.table$j = shortened.table$k + 1
# pair the ranks r in column k with the shortened ranks s from the preceding column (k-1)
rank.correlation.table = merge(rank.correlation.table, shortened.table %>% ungroup() %>% select(i, j, s), by.x = c("i", "k"), by.y = c("i", "j"))
datatable(as.data.table(rank.correlation.table)[order(k)])

The formula for Spearman’s rank correlation coefficient on column \(k\) is \[ T_k = 1 - 6\sum_{1\leq i \leq n_k} \frac{(r_{i,k} - s_{i,k})^2}{n_k^3 - n_k} \] where \(n_k = I-k\) is the number of pairs of ranks in column \(k\). We can calculate this as follows:

rank.correlation.table = as.data.table(rank.correlation.table)
rank.correlation.summary = rank.correlation.table[, .(T = 1 - 6 * sum((r - s)^2) / (.N^3 - .N), number_of_pairs = .N), by = k]

Note that the oldest pair of columns contains only a single pair of ranks, so the denominator \(n_k^3 - n_k\) is zero and no result is produced; we will delete this row:

rank.correlation.summary = rank.correlation.summary[!is.na(T)]
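
As a cross-check (my own addition, not part of Mack's procedure), the column-by-column values can be compared with R's built-in Spearman correlation of the two rank columns; because the ranks contain no ties, the two calculations agree exactly:

usable.columns = rank.correlation.table[, .N, by = k][N > 1, k]   # columns with more than one pair of ranks
spearman.check = rank.correlation.table[k %in% usable.columns, .(T_check = cor(r, s, method = "spearman")), by = k]
merge(rank.correlation.summary, spearman.check, by = "k")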

Under the assumption that the columns are not correlated, the correlation coefficients for the individual columns have the properties \[ E[T_k] = 0 \] and \[ \mathrm{Var}(T_k) = \frac{1}{n_k -1} \] Therefore, if \(T_k\) is far from 0, then we would conclude that adjacent columns are correlated.

Mack recommends extending the test to the entire triangle, in a single test statistic, for the following reasons:
  • Avoid an accumulation of error probabilities.
  • It tells us whether the correlations prevail globally, rather than on a small part of the triangle.

To extend this test to the whole triangle, we can take a weighted average of \(T_k\), using \(n_k-1\) as the weight. (Note that this is the inverse of \(\mathrm{Var}(T_k)\).) Define \[ T = \frac{\sum (n_k-1) T_k}{\sum (n_k-1)} \] It is straightforward to validate that \(E[T] = 0\) and that \[ \mathrm{Var}(T) = \left(\sum (n_k-1)\right)^{-1} = \binom{I-2}{2}^{-1} \] In our example, this test statistic is:

T.statistic = rank.correlation.summary[, sum((number_of_pairs - 1) * T) / sum(number_of_pairs - 1)]
T.statistic
## [1] 0.070

The variance of the test statistic is:

T.sd = sqrt(1/(rank.correlation.summary[, sum(number_of_pairs - 1)]))
T.sd
## [1] 0.1889822

Mack recommends a test at the 50% confidence level, corresponding to a Z value of 0.67, on the grounds that the test is only approximate in nature. In this case, we reject the hypothesis that the columns are uncorrelated if \(|T|\) exceeds

0.67 * T.sd
## [1] 0.1266181

Since \(T\) lies within the 50% confidence interval, we do not reject the hypothesis that the columns are uncorrelated.

Pearson Correlation Approach

Venter’s approach is to calculate the Pearson correlation coefficient for pairs of subsequent columns, and then to look at the number of pairs of columns for which there is a statistically significant correlation. (Note that Venter uses ratios of incremental to cumulative, rather than cumulative to cumulative; however, these ratios differ by exactly 1, so correlation calculations will be identical using either approach.) We begin by calculating observed development factors, pairing up subsequent columns, and deleting the last element from the longer column:

pearson.correlation.table = mack.table[, next_dev := k+1]
pearson.correlation.table = merge(pearson.correlation.table, pearson.correlation.table[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
pearson.correlation.table[, obs_dev_factor := next_loss / loss]
pearson.correlation.table = merge(pearson.correlation.table, pearson.correlation.table[, .(i, k, next_dev_factor = obs_dev_factor)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
datatable(pearson.correlation.table[order(k)])

Using \(X\) to denote the earlier column and \(Y\) to denote the later column, we obtain the following summary statistics:

pearson.correlation.summary = pearson.correlation.table[, .(mean_X = mean(obs_dev_factor),
                                                            mean_Y = mean(next_dev_factor),
                                                            mean_X_squared = mean(obs_dev_factor^2),
                                                            mean_Y_squared = mean(next_dev_factor^2),
                                                            mean_XY = mean(obs_dev_factor * next_dev_factor),
                                                            number_of_obs = .N), by = k]
pearson.correlation.summary[, var_X := mean_X_squared - mean_X^2]
pearson.correlation.summary[, var_Y := mean_Y_squared - mean_Y^2]
pearson.correlation.summary[, cov_XY := mean_XY - mean_X * mean_Y]
pearson.correlation.summary[, cor_XY := cov_XY / sqrt(var_X * var_Y)]
datatable(pearson.correlation.summary[, .(k, var_X, var_Y, cov_XY, cor_XY, number_of_obs)])

The test described by Venter is based on the statistic \[ T = r \sqrt{(n-2) / (1-r^2)} \] where \(r\) is the sample correlation coefficient, and \(n\) is the number of rows in the shorter column. Calculating this value in the above table:

pearson.correlation.summary = pearson.correlation.summary[number_of_obs > 1]
pearson.correlation.summary[, T := cor_XY * sqrt((number_of_obs - 2)/(1 - cor_XY^2))]
datatable(pearson.correlation.summary[,.(k, cor_XY, number_of_obs, T)])
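
This is the same \(t\) statistic produced by R's built-in correlation test. As a cross-check (my own addition), for the first pair of columns:

first.pair = pearson.correlation.table[k == 1]
cor.test(first.pair$obs_dev_factor, first.pair$next_dev_factor)
# The reported t value matches the T statistic above for k = 1; note that cor.test's p-value is two-sided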

The test for correlation is based on considering \(T\) to have the \(t\)-distribution with \(n-2\) degrees of freedom. In the following example, we identify a pair of columns as having significant correlation if the observed \(T\) statistic is significant at the 10% level:

pearson.correlation.summary = pearson.correlation.summary[number_of_obs > 2]
pearson.correlation.summary[, p_value := 1 - pt(abs(T), number_of_obs - 2)]
pearson.correlation.summary[, significant_10 := ifelse(p_value < 0.1, 1, 0)]
datatable(pearson.correlation.summary[, .(k, number_of_obs, T, p_value, significant_10)])

One pair of columns is significant at the 10% level. To extend this test to the whole table, we can treat the number of significant pairs as having a binomial distribution with success probability \(0.1\) and number of trials equal to the number of pairs of columns tested. In this case, the mean and standard deviation of this distribution are:

n = nrow(pearson.correlation.summary)
p = 0.1
binomial.mean = p * n
binomial.sd = sqrt(n * p * (1-p))

As an example, Venter uses a threshold of 3.33 standard deviations; in this case, the threshold is:

threshold = binomial.mean + 3.33 * binomial.sd
print(paste0("If more than ", threshold, " pairs of columns have statistically significant correlations, then the table does not satisfy the independence assumption."))
## [1] "If more than 3.0470402530404 pairs of columns have statistically significant correlations, then the table does not satisfy the independence assumption."

The above implicitly assumes a normal approximation. For such a small number of observations, we can instead calculate the probability of seeing 1 or more significant pairs directly:

binomial.p = 1 - (1-p)^n
print(paste0("The probability that 1 or more pairs of columns are significant is ", binomial.p))
## [1] "The probability that 1 or more pairs of columns are significant is 0.468559"

Given that this probability is large, we would conclude that there is no reason to reject the assumption that the columns in the table are independent. When columns are correlated, Venter suggests using the relationship \[ E[XY] = E[X]E[Y] + \mathrm{Cov}(X,Y) \] to correct the results.

Testing for Calendar Year Effects

A calendar year effect will affect the reported losses along a diagonal, and the presence of such an effect violates the assumption that accident years are independent, since a calendar year effect will affect different accident years in the same way. Examples of calendar year effects include changes in claims handling or case reserving practices, and external influences such as changes in inflation or major court decisions.

The \(j\)th diagonal is the set of all entries where the sum of the row and column indices is \(j+1\): \[ D_j = \{c_{j,1}, c_{j-1, 2}, \ldots, c_{1, j}\} \] A calendar year effect influences two adjacent sets of development factors. If the entries of \(D_j\) are larger than usual, then the factors \[ A_j = \{c_{j,2} / c_{j,1}, c_{j-1, 3} / c_{j-1, 2}, \ldots, c_{1, j+1} / c_{1,j}\} \] will be smaller than usual, since the elements of \(D_j\) appear in the denominator. The factors \[ A_{j-1} = \{c_{j-1, 2 } /c_{j-1, 1}, \ldots, c_{1, j} / c_{1, j-1}\} \] will be larger than usual, since the elements of \(D_j\) appear in the numerator.

Binomial Distribution Test

This approach is based on subdividing the development factors in each column based on whether they are “large” or “small”, and then checking whether any diagonals have predominantly large or small factors. The first step is to calculate the observed development factors, along with a new column to indicate which diagonal the factor belongs to.

diagonal.test.table = mack.table[, next_dev := k+1]
diagonal.test.table = merge(diagonal.test.table, diagonal.test.table[, .(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
diagonal.test.table[, obs_dev_factor := next_loss / loss]
diagonal.test.table[, diagonal := i + k - 1]
datatable(diagonal.test.table[order(k)])

Factors are considered to be large or small based on whether they are above or below the median for the column. If there are an odd number of entries in the column, then the factor equal to the median is eliminated from the analysis.

diagonal.test.table = as.data.table(diagonal.test.table %>% group_by(k) %>% mutate(median_factor = median(obs_dev_factor)))
diagonal.test.table[, large_or_small := ifelse(obs_dev_factor > median_factor, "L", 
                                               ifelse(obs_dev_factor < median_factor, "S", "N"))]
datatable(diagonal.test.table[order(k)])

Under this construction, each factor that is not eliminated has a 50% probability of being “large” and 50% probability of being “small”. Now, for each diagonal other than the first, we count the number of “large” and “small” elements on the diagonal, denoted by \(L_j\) and \(S_j\), respectively. (More generally, any diagonal that contains only one non-excluded factor should be removed, since the value of the test statistic is not random – it is always 0.)

diagonal.test.table[, large_indicator := ifelse(large_or_small == "L", 1, 0)]
diagonal.test.table[, small_indicator := ifelse(large_or_small == "S", 1, 0)]
diagonal.test.summary = diagonal.test.table[diagonal != 1, .(large_count = sum(large_indicator), small_count = sum(small_indicator), factor_count = sum(large_indicator) + sum(small_indicator)), by = diagonal]
datatable(diagonal.test.summary[order(diagonal)])

Under the hypothesis that there are no calendar year effects, both \(L_j\) and \(S_j\) should be close to \((L_j + S_j)/2\), aside from random fluctuations. This can be tested by calculating \[ Z_j = \min(L_j, S_j) \] and testing whether it is significantly smaller than \((L_j + S_j)/2\).

diagonal.test.summary[, Z_j := pmin(large_count, small_count)]
datatable(diagonal.test.summary)

The expected value and variance of \(Z_j\) depend on the total number of non-excluded factors on the diagonal, denoted by \(n\). Since \(S_j + L_j = n\), for small values of \(n\) it is easy to calculate the mean and variance from first principles. The following example does this for \(n=5\):

Z_5.analysis = data.table("L" = 0:5)
Z_5.analysis[, S := 5 - L]
Z_5.analysis[, Z := pmin(S, L)]
Z_5.analysis[, Z_squared := Z^2]
Z_5.analysis[, probability := dbinom(x = L, size = 5, prob = 0.5)]
datatable(Z_5.analysis)
E_Z = Z_5.analysis[, sum(Z * probability)]
E_Z_squared = Z_5.analysis[, sum(Z_squared * probability)]
var_Z = E_Z_squared - E_Z^2
print(paste0("The expected value of Z_5 is ", E_Z))
## [1] "The expected value of Z_5 is 1.5625"
print(paste0("The variance of Z_5 is ", var_Z))
## [1] "The variance of Z_5 is 0.371093750000001"

There are also formulas for calculating these moments: \[ E(Z_j) = \frac{n}{2} - \binom{n-1}{m}\frac{n}{2^n} \] and \[ \mathrm{Var}(Z_j) = \frac{n(n-1)}{4} - \binom{n-1}{m} \frac{n(n-1)}{2^n} + E(Z_j) - E(Z_j)^2 \] where \(m = \lfloor \frac{n-1}{2}\rfloor\). In this example, we obtain the following values:

diagonal.test.summary[, m := floor((factor_count - 1) / 2)]
diagonal.test.summary[, E_Z := factor_count / 2 - choose(factor_count - 1, m) * factor_count / 2^factor_count]
diagonal.test.summary[, var_Z := factor_count * (factor_count - 1) / 4 - choose(factor_count - 1, m) * factor_count * (factor_count - 1) / 2^factor_count + E_Z - E_Z^2]
datatable(diagonal.test.summary)
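As a check (my own, not from the papers), the formulas reproduce the first-principles values for \(n = 5\) calculated above:

n.check = 5
m.check = floor((n.check - 1) / 2)
E_Z.check = n.check / 2 - choose(n.check - 1, m.check) * n.check / 2^n.check
var_Z.check = n.check * (n.check - 1) / 4 - choose(n.check - 1, m.check) * n.check * (n.check - 1) / 2^n.check + E_Z.check - E_Z.check^2
print(paste0("Formula values for n = 5: E(Z) = ", E_Z.check, ", Var(Z) = ", var_Z.check))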

The test statistic is \[ Z = \sum_j Z_j \] This statistic, along with its mean and variance, can be calculated as follows:

Z = diagonal.test.summary[, sum(Z_j)]
E_Z = diagonal.test.summary[, sum(E_Z)]
var_Z = diagonal.test.summary[, sum(var_Z)]
print(paste0("The test statistic Z is ", Z))
## [1] "The test statistic Z is 14"
print(paste0("The expected value of Z is ", E_Z))
## [1] "The expected value of Z is 12.875"
print(paste0("The variance of Z is ", var_Z))
## [1] "The variance of Z is 3.978515625"

Assuming that \(Z\) is normally distributed, we would reject the hypothesis that there are no calendar year effects if the test statistic falls outside a 95% confidence interval:

lower_CI = E_Z - 2 * sqrt(var_Z)
upper_CI = E_Z + 2 * sqrt(var_Z)
print(paste0("The 95% confidence interval is [", lower_CI, ", ", upper_CI, "]"))
## [1] "The 95% confidence interval is [8.88575665069176, 16.8642433493082]"

Since the test statistic lies within this interval, we do not reject the hypothesis that there are no calendar year effects, and the chain ladder method is not rejected on these grounds.

Additive CY Effects via Regression

Venter uses regression, with indicator variables for each diagonal, to test whether there are significant additive calendar year effects. (Note that this is equivalent to treating the “diagonal” field in the dataset as a categorical variable.) This approach also allows the effect to be measured, and information about company operations can be used to determine whether it is likely to be a permanent or temporary effect. The specification of this model is:
  • Response variable: incremental losses at all but the first development age
  • An indicator variable for each diagonal. The parameter for this variable provides the additive calendar year effect.
  • A continuous variable for each development age other than the last. In the rows corresponding to the incremental losses emerging at the next age, this variable equals the cumulative losses at that age; in all other rows it is zero. The parameters for these variables provide the cumulative-to-incremental development factors.

Venter uses the following incremental loss table:

venter.table = fread("./venter_table.csv")
datatable(venter.table)

To set up the regression, we need to define a diagonal variable, and calculate the cumulative values in each development age:

venter.table[, diagonal := as.factor(i + k)]
venter.cumulative.0 = venter.table[k <= 0, .(cumulative_0 = sum(incremental_loss)), by = i][i <= 3]
venter.cumulative.1 = venter.table[k <= 1, .(cumulative_1 = sum(incremental_loss)), by = i][i <= 2]
venter.cumulative.2 = venter.table[k <= 2, .(cumulative_2 = sum(incremental_loss)), by = i][i <= 1]
venter.CY = merge(venter.table, venter.cumulative.0, by = "i", all.x = TRUE)
venter.CY = merge(venter.CY, venter.cumulative.1, by = "i", all.x = TRUE)
venter.CY = merge(venter.CY, venter.cumulative.2, by = "i", all.x = TRUE)
venter.CY = venter.CY[k >= 1]
venter.CY[, cumulative_0 := ifelse(k == 1, cumulative_0, 0)]
venter.CY[, cumulative_1 := ifelse(k == 2, cumulative_1, 0)]
venter.CY[, cumulative_2 := ifelse(k == 3, cumulative_2, 0)]
datatable(venter.CY[order(k)])

We can now fit a linear regression model to this data:

venter.CY.test = lm(incremental_loss ~ cumulative_0 + cumulative_1 + cumulative_2 + diagonal, data = venter.CY)
summary(venter.CY.test)
## 
## Call:
## lm(formula = incremental_loss ~ cumulative_0 + cumulative_1 + 
##     cumulative_2 + diagonal, data = venter.CY)
## 
## Residuals:
## ALL 6 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -0.500         NA      NA       NA
## cumulative_0    2.500         NA      NA       NA
## cumulative_1    1.500         NA      NA       NA
## cumulative_2    1.438         NA      NA       NA
## diagonal3       1.000         NA      NA       NA
## diagonal4      -7.000         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 5 and 0 DF,  p-value: NA

Unfortunately, this model is not very interesting: since the number of parameters is equal to the number of observations, the model is saturated. For a more interesting example, we can apply the technique to the larger Mack table, which will not result in a saturated model. Note that this table contains cumulative values, not incremental ones, so the transformation on this dataset is different:

mack.CY.regression = mack.table[, diagonal := as.factor(i+k)]
mack.CY.regression = merge(mack.CY.regression, mack.table[,.(i, k, next_loss = loss)], by.x = c("i", "next_dev"), by.y = c("i", "k"))
mack.CY.regression[, incremental_loss := next_loss - loss]
for (n in 1:9) {
  # Cumulative losses at age n when the response is the increment emerging at age n+1; zero otherwise
  mack.CY.regression[, new_variable := ifelse(k == n, loss, 0)]
  setnames(mack.CY.regression, "new_variable", paste0("cumulative_", n))
}
datatable(mack.CY.regression[order(k)])

Fit the model to the data:

mack.CY.model = lm(incremental_loss ~ cumulative_1 + cumulative_2 + cumulative_3 + cumulative_4 + cumulative_5 + cumulative_6 + cumulative_7 + cumulative_8 + cumulative_9 + diagonal, data = mack.CY.regression)
summary(mack.CY.model)
## 
## Call:
## lm(formula = incremental_loss ~ cumulative_1 + cumulative_2 + 
##     cumulative_3 + cumulative_4 + cumulative_5 + cumulative_6 + 
##     cumulative_7 + cumulative_8 + cumulative_9 + diagonal, data = mack.CY.regression)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2753.6 -1051.7  -263.3  1053.8  3470.0 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1462.12865 2425.98218   0.603  0.55174   
## cumulative_1    0.35811    0.33440   1.071  0.29368   
## cumulative_2    0.07311    0.12119   0.603  0.55137   
## cumulative_3   -0.06446    0.08063  -0.799  0.43102   
## cumulative_4   -0.10135    0.06435  -1.575  0.12691   
## cumulative_5   -0.11895    0.05675  -2.096  0.04557 * 
## cumulative_6   -0.17099    0.05906  -2.895  0.00742 **
## cumulative_7   -0.19269    0.07020  -2.745  0.01063 * 
## cumulative_8   -0.23195    0.08907  -2.604  0.01479 * 
## cumulative_9   -0.19942    0.11009  -1.812  0.08120 . 
## diagonal3    1625.12139 2599.11715   0.625  0.53705   
## diagonal4     791.07533 2309.09165   0.343  0.73456   
## diagonal5    2699.54739 2197.40779   1.229  0.22985   
## diagonal6    3149.85090 2325.48953   1.354  0.18681   
## diagonal7    3326.99484 2290.37207   1.453  0.15786   
## diagonal8    3117.65253 2302.72728   1.354  0.18699   
## diagonal9    3248.54529 2279.29915   1.425  0.16555   
## diagonal10   2431.49739 2240.00494   1.085  0.28730   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1754 on 27 degrees of freedom
## Multiple R-squared:  0.6113, Adjusted R-squared:  0.3665 
## F-statistic: 2.498 on 17 and 27 DF,  p-value: 0.01631

Note that none of the diagonal terms are statistically significant. Alternative approaches include incorporating diagonal effects (either multiplicative or additive) into the additive chain ladder approach.

When the diagonal terms are significant, a natural question is how to treat them in forecasting.
  • If age effects are considered to be the dominant influence, and diagonal effects are expected to be occasional distortions, then it is best to keep the diagonal effects in the model (in order to produce accurate estimates of the age effects), but to use only the age effects in predictions.
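For example, the following sketch (my own, assuming the mack.CY.model fit above) produces predictions that keep the fitted age effects but set the diagonal factor to its reference level, so that no diagonal adjustment enters the forecast:

# Copy the regression data and neutralize the diagonal effect by using the baseline level
age.only.data = copy(mack.CY.regression)
age.only.data[, diagonal := factor(levels(diagonal)[1], levels = levels(diagonal))]
# Predictions based on the intercept and age (cumulative_k) effects only
age.only.predictions = predict(mack.CY.model, newdata = age.only.data)
head(age.only.predictions)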

Multiplicative CY Effects

The general form of an emergence model with a multiplicative diagonal effect \(g\) and accident year effect \(h\) is \[ E[q(w, d+1) | \text{data to } w+d] = f(d)h(w)g(w+d) \] A typical interpretation is that \(g\) represents cumulative claims inflation to that diagonal since the first calendar year. If this is the case, then the model can be simplified by setting \[ g(w+d) = (1 + r)^{w+d} \] so that only the rate of inflation \(r\) needs to be estimated. The number of parameters can be reduced further by replacing \(f(d)\) and \(h(w)\) with the trends \(f(d) = (1 + s)^d\) and \(h(w) = h(1+t)^w\), where \(h\) is a constant, and then grouping terms with the same exponent: \[ h(1+s)^d(1+r)^{w+d}(1+t)^w = h[(1+s)(1+r)]^d[(1+r)(1+t)]^w \] noting that \((1+s)(1+r) = 1 + s + r + rs\), so \(s+r+rs\) can be treated as a single parameter, and likewise with \((1+r)(1+t)\).
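A quick numeric check of this grouping identity, using illustrative values that are not taken from the paper (a 5% calendar year trend, a 10% development trend, and a 2% accident year trend):

# Hypothetical trend values for illustration only
r.cy = 0.05; s.dev = 0.10; t.ay = 0.02
h = 1000; d = 3; w = 2
# The two parameterizations give the same expected emergence
h * (1 + s.dev)^d * (1 + r.cy)^(w + d) * (1 + t.ay)^w
h * ((1 + s.dev) * (1 + r.cy))^d * ((1 + r.cy) * (1 + t.ay))^w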

Variability of the Ultimate Claim Estimate

Mack provides an estimator of the proportionality constant \(\alpha_k\) from the chain ladder variance assumption: \[ \alpha_k^2 = \frac{1}{I-k-1} \sum_{1 \leq j \leq I-k} c_{j,k}\left(\frac{c_{j, k+1}}{c_{j,k}} - f_k \right)^2 \] This can be calculated using the estimated and observed development factors determined earlier:

mack.alpha.table = mack.residuals[, .(alpha_k_squared = sum(loss * (obs_dev_factor - f_k_1)^2) / (.N - 1)), by = k]
datatable(mack.alpha.table)

It’s not possible to estimate \(\alpha_9^2\) because there is only a single data point at this development age. Mack recommends two approaches. The first is linear extrapolation of the logarithm of the estimate. As a check of the reasonableness of this approach, plot the logarithm of the estimate against \(k\) to assess whether a linear fit is appropriate:

mack.alpha.table[, log_alpha_k_squared := log(alpha_k_squared)]
ggplot(data = mack.alpha.table, aes(x = k, y = log_alpha_k_squared)) + geom_point() + geom_smooth(method = "lm")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).

Overall, the linear fit looks quite good. The extrapolated value corresponding to this method can be calculated as follows:

alpha.extrapolation = lm(log_alpha_k_squared ~ k, data = mack.alpha.table)
alpha.predictions = predict(alpha.extrapolation, newdata = mack.alpha.table)
print(paste0("The extrapolated value for alpha_9 squared is ", exp(alpha.predictions[9])))
## [1] "The extrapolated value for alpha_9 squared is 0.645486792551869"

A simplified approach involves selecting \(\alpha_9^2\) such that the following two assumptions are satisfied:
  1. On a log scale, differences between adjacent pairs of \(\alpha\) are constant. This means that \(\alpha_{I-3} / \alpha_{I-2} = \alpha_{I-2} / \alpha_{I-1}\).
  2. The values of \(\alpha\) are decreasing.

Combined, this means that the projection for \(\alpha_{I-1}^2\) is the minimum of \(\alpha_{I-2}^4 / \alpha_{I-3}^2\), \(\alpha_{I-2}^2\), and \(\alpha_{I-3}^2\). In this example:

alpha.8.squared = mack.alpha.table[k == 8, alpha_k_squared]
alpha.7.squared = mack.alpha.table[k == 7, alpha_k_squared]
alpha.9.squared.projection = pmin(alpha.8.squared^2 / alpha.7.squared, alpha.7.squared, alpha.8.squared)
print(paste0("The extrapolated value for alpha_9 squared is ", alpha.9.squared.projection))
## [1] "The extrapolated value for alpha_9 squared is 1.34342546562448"
mack.alpha.table[k == 9, alpha_k_squared := alpha.9.squared.projection]

Once \(\alpha_k^2\) has been estimated, the standard error of the ultimate loss estimate for accident year \(i\) can be calculated as \[ \mathrm{se}(c_{i, I})^2 = c_{i, I}^2 \sum_{I+1-i \leq k \leq I-1} \frac{\alpha_k^2}{f_k^2}\left(\frac{1}{c_{i,k}} + \frac{1}{\sum_{1\leq j \leq I-k} c_{j,k}} \right) \] where the values \(c_{i,k}\) beyond the latest diagonal are the projected values. Considerations when assessing the variance of the reserve estimate include:
  • Since the reported amount \(c_{i, I+1-i}\) is known, the reserve \(R_i = c_{i,I} - c_{i, I+1-i}\) has the same standard error as the ultimate: \(\mathrm{se}(R_i) = \mathrm{se}(c_{i,I})\).
  • The formula reflects both the process variance of future development and the estimation error in the development factors.

In order to calculate the ultimate claim amounts and reserves, bring the development factors, the \(\alpha_k^2\) estimates, and the latest diagonal together in one table; the standard errors are calculated in the sketch that follows:

mack.reserve = merge(mack.factors, mack.alpha.table, by = "k")
mack.reserve = mack.reserve[order(-k)]
mack.reserve[, LDF := cumprod(f_k_1)]
mack.final.diagonal = mack.table[i + k == 11, .(k, i, loss)]
mack.reserve = merge(mack.reserve, mack.final.diagonal, by = "k")
mack.reserve[, ultimate := loss * LDF]
mack.reserve[, reserve := ultimate - loss]
datatable(mack.reserve)
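The table above gives the ultimates and reserves. The standard errors come from Mack's formula; the following is a minimal sketch of that calculation (my own implementation, using the objects constructed above), noting that the standard error of the reserve equals the standard error of the ultimate because the reported amount is fixed:

n.years = 10
f = mack.factors[order(k), f_k_1]                     # f_1, ..., f_9
alpha2 = mack.alpha.table[order(k), alpha_k_squared]  # alpha_1^2, ..., alpha_9^2
# Sum of c_{j,k} over the accident years used to estimate f_k (j = 1, ..., I - k)
col.sum = sapply(1:(n.years - 1), function(dev) mack.table[k == dev & i <= n.years - dev, sum(loss)])
se.R = numeric(0)
for (ay in 2:n.years) {
  latest = n.years + 1 - ay                           # latest observed development age for this AY
  c.hat = mack.table[i == ay & k == latest, loss]     # observed, then projected, value of c_{i,k}
  total = 0
  for (dev in latest:(n.years - 1)) {
    total = total + (alpha2[dev] / f[dev]^2) * (1 / c.hat + 1 / col.sum[dev])
    c.hat = c.hat * f[dev]                            # develop to the next age
  }
  se.R = c(se.R, sqrt(c.hat^2 * total))               # c.hat is now the ultimate for this AY
}
data.table(i = 2:n.years, se_R = se.R)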

Confidence Intervals

Confidence intervals should be created for the reserve amount \(R_i\), and then shifted if an interval for the ultimate amount is needed. This is because the reported part of the ultimate claim amount is not random. Constructing a confidence interval requires a distributional assumption. Options include:
  • A normal assumption may be appropriate if the volume of outstanding claims is large enough for the central limit theorem to apply.
  • If the distribution is highly skewed, or if the standard error is greater than 50% of the reserve amount, then a lognormal distribution may be used. (With a standard error greater than 50% of the reserve, a normal approximation can produce confidence intervals that include negative values.)

Parameters for the lognormal distribution can be determined by matching moments: \[ \exp(\mu_i + \sigma_i^2/2) = R_i \] and \[ \exp(2\mu_i + \sigma_i^2) (\exp(\sigma_i^2)-1) = \mathrm{se}(R_i)^2 \] Solving these equations for the lognormal distribution parameters gives \[ \sigma_i^2 = \log(1 + \mathrm{se}(R_i)^2 / R_i^2) \] and \[ \mu_i = \log(R_i) - \sigma_i^2/2 \] Percentiles can be calculated by determining the value \(z\) corresponding to the same percentile on the standard normal distribution, then calculating \[ \exp(\mu_i + z \sigma_i) \] Confidence intervals resulting from the lognormal distribution are not symmetric, and business policy may be used to determine the upper and lower percentiles.

Using the example from the previous section, and the given value of \(\mathrm{se}(R) / R = 0.52\), we have:

cv.R = 0.52
R = mack.reserve[, sum(reserve)]
sigma.R = sqrt(log(1 + cv.R^2))
mu.R = log(R) - sigma.R^2 / 2
R.percentile.90 = exp(mu.R + 1.28 * sigma.R)
R.percentile.10 = exp(mu.R - 1.28 * sigma.R)
print(paste0("The 90th percentile of the reserve estimate is ", R.percentile.90))
## [1] "The 90th percentile of the reserve estimate is 86494.4254313207"
print(paste0("The 10th percentile of the reserve estimate is ", R.percentile.10))
## [1] "The 10th percentile of the reserve estimate is 24721.8698299094"

To determine confidence intervals for the individual accident years, one approach is to select a common value of \(z\) across all accident years such that the sum of results across accident years equals the value of the desired percentile for the whole reserve. In Mack’s example, values of \(z = 1.13208\) (87th percentile) and \(z = -0.8211\) (21st percentile) in individual accident years produce results that sum to the 90th and 10th percentiles given above. Thus, this 80% confidence interval for the whole reserve corresponds to 66% confidence intervals on the individual accident years. This is illustrated below, using the standard errors for \(R_i\) calculated by Mack:

z.lower = -0.8211
z.upper = 1.13208
mack.reserve$R_cv = c(1.5, 0.59, 0.49, 0.41, 0.55, 0.53, 0.46, 1.01, 1.34)
mack.reserve[, sigma_R := sqrt(log(1 + R_cv^2))]
mack.reserve[, mu_r := log(reserve) - sigma_R^2 / 2]
mack.reserve[, lower_percentile_R := exp(mu_r + z.lower * sigma_R)]
mack.reserve[, upper_percentile_R := exp(mu_r + z.upper * sigma_R)]
mack.reserve[, lower_percentile_U := lower_percentile_R + loss]
mack.reserve[, upper_percentile_U := upper_percentile_R + loss]
datatable(mack.reserve[, .(i, lower_percentile_R, reserve, upper_percentile_R, lower_percentile_U, ultimate, upper_percentile_U)])

Validate that the reserve percentiles sum up to the corresponding total reserve percentiles:

R.total.lower = mack.reserve[, sum(lower_percentile_R)]
R.total.upper = mack.reserve[, sum(upper_percentile_R)]
print(paste0("The sum of the lower percentiles is ", R.total.lower))
## [1] "The sum of the lower percentiles is 24904.6333877776"
print(paste0("The sum of the upper percentiles is ", R.total.upper))
## [1] "The sum of the upper percentiles is 86234.9057646017"

Empirical Limits

The term empirical limits refers to the estimates that would be produced if we used the maximum and minimum observed age-to-age factors to develop claims to ultimate. Using the Mack triangle, these values are:

limit.factors = mack.age.to.age[, .(max_factor = max(obs_dev_factor), min_factor = min(obs_dev_factor)), by = "k"][order(-k)]
limit.factors[, max_LDF := cumprod(max_factor)]
limit.factors[, min_LDF := cumprod(min_factor)]
mack.empirical = merge(mack.reserve, limit.factors, by = "k")
mack.empirical[, empirical_limit_upper := loss * max_LDF]
mack.empirical[, empirical_limit_lower := loss * min_LDF]
datatable(mack.empirical[, .(k, LDF, loss, ultimate, min_LDF, max_LDF, empirical_limit_lower, empirical_limit_upper)])