Introduction

This notebook illustrates the concepts in the Exam 7 reading Loss Development Using Credibility by Eric Brosius. These concepts are illustrated by reproducing the examples in the paper and through the use of simulated data. The following packages are used in this notebook:

easypackages::packages("data.table", "DT", "ggplot2")
## Warning: package 'data.table' was built under R version 3.4.2

Least Squares Development

Description of the Method

The general framework involves estimating the loss amount \(y\) for an accident year at a given valuation point, based on the value \(x\) for the same accident year at an earlier valuation point. The general form of the estimator of \(y\) is \[ L(x) = a + bx \] If we use least squares regression to determine the parameters \(a\) and \(b\), then \[ b = \frac{\overline{xy} - \overline{x} \overline{y}}{\overline{x^2}- \overline{x}^2} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}, \] and \[ a = \overline{y} - b \overline{x}, \] where the covariance and variance are based on the empirical distribution. This method is useful when:
  • Random year-to-year fluctuations are significant, and are the primary driver of differences in loss emergence between accident years
  • The book of business is stable over time
  • Inflation is either not material, or is corrected for
  • Caseload effects are material (see section below)
The least squares method can also be given a probabilistic interpretation. Note that the least squares regression line can be re-expressed as \[ L(x) = (x - E(X)) \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} + E(Y), \] which is the best linear approximation to the Bayesian estimate of expected ultimate claims, \(Q(x) = E[Y \mid X = x]\). This equation provides guidance on how the reserve estimate \(L(x)-x\) should respond to the reported value \(x\):
  1. If reported amounts increase, and \(\mathrm{Cov}(X,Y) < \mathrm{Var}(X)\), then reserves should decrease.
  2. If \(\mathrm{Cov}(X,Y) = \mathrm{Var}(X)\), then a change in reported amounts should not change the reserve.
  3. If reported amounts increase, and \(\mathrm{Cov}(X,Y) > \mathrm{Var}(X)\), then reserves should increase.
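
To make these three cases concrete, note that since \(b = \mathrm{Cov}(X,Y)/\mathrm{Var}(X)\), the implied reserve is \(L(x) - x = a + (b-1)x\), which decreases, stays flat, or increases in \(x\) according to whether \(b\) is less than, equal to, or greater than 1. A minimal sketch with made-up numbers (not from the paper):

implied.reserve = function(x, a, b) a + (b - 1) * x   # reserve implied by the fitted line

x.values = c(80, 100, 120)                     # hypothetical reported amounts
implied.reserve(x.values, a = 50, b = 0.8)     # b < 1 (Cov < Var): reserve decreases as x grows
implied.reserve(x.values, a = 50, b = 1.0)     # b = 1 (Cov = Var): reserve unaffected by x
implied.reserve(x.values, a = 50, b = 1.2)     # b > 1 (Cov > Var): reserve increases with x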

The following table is provided as an example by Brosius:

brosius.example <- fread("./Brosius.csv")
datatable(brosius.example)

Our objective is to determine an estimate of AY1991 at 27 months, given that AY1991 at 15 months is equal to 40,490. Fitting a linear model to this data, we obtain:

brosius.lm = lm(Valuation_27_mo ~ Valuation_15_mo, data = brosius.example[AY != 1991])
summary(brosius.lm)
## 
## Call:
## lm(formula = Valuation_27_mo ~ Valuation_15_mo, data = brosius.example[AY != 
##     1991])
## 
## Residuals:
##     1     2     3     4     5     6 
## -1171  3560 -1253 -3534 -1763  4161 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.024e+03  2.286e+03   2.635 0.057898 .  
## Valuation_15_mo 9.678e-01  8.469e-02  11.427 0.000335 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3483 on 4 degrees of freedom
## Multiple R-squared:  0.9703, Adjusted R-squared:  0.9628 
## F-statistic: 130.6 on 1 and 4 DF,  p-value: 0.0003346

The predictions produced by this model are:

y = predict(brosius.lm, newdata = brosius.example)
brosius.example[,prediction:=y]
datatable(brosius.example)

We can also produce these results using the regression formulas directly:

xy.bar = brosius.example[AY!=1991, mean(as.numeric(Valuation_15_mo) * as.numeric(Valuation_27_mo))]
x.bar = brosius.example[AY!=1991, mean(Valuation_15_mo)]
y.bar = brosius.example[AY!=1991, mean(Valuation_27_mo)]
x.squared.bar = brosius.example[AY!=1991, mean(Valuation_15_mo^2)]
b = (xy.bar - x.bar * y.bar)/(x.squared.bar - x.bar^2)
a = y.bar - b * x.bar
predicted.value = a + b * brosius.example[AY == 1991, Valuation_15_mo]
predicted.value
## [1] 45210.5

(The “as.numeric” conversion was required here because the product of two loss amounts can exceed the largest value R can store in a 32-bit integer.)
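
As a quick illustration of that limit (my own aside, not from the paper): R integers are capped at \(2^{31}-1\), and integer arithmetic that exceeds this cap returns NA with an overflow warning.

.Machine$integer.max   # largest representable R integer: 2147483647
50000L * 50000L        # 2.5e9 exceeds the cap, so R returns NA with an overflow warning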

Relationship to Other Methods

Other methods for estimating unpaid claims appear as special cases of the linear model framework:
  • Link ratio method:
    • Corresponds to \(a=0\).
    • Corresponds to a situation in which higher-than-expected reported claims correspond to higher ultimate claims.
    • If the regression approach gives \(a<0\), this may be an indication that the link ratio method is more appropriate, since the regression line would produce negative estimates for small reported values.
  • Loss ratio method / budgeted loss method:
    • Corresponds to \(b=0\)
    • Implies that there is no correlation between emerging losses and ultimate losses. Higher-than-expected reported losses simply indicate an acceleration of the reporting pattern.
    • If the regression approach gives a value of \(b<0\), then this may be an indication that the loss ratio method may be appropriate.
  • Bornhuetter-Ferguson method:
    • Corresponds to the case \(b=1\)
    • Implies that the unreported amount is independent of the reported amount.

A key insight is that all three of these methods are special cases of a continuum of models of the form \(L(x) = a + bx\), and intermediate choices of \(a\) and \(b\) may also be appropriate, as sketched below.
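
As a rough comparison (my own sketch, not from the paper), each special case can be evaluated on the Brosius example data using the x.bar and y.bar averages computed earlier, at the AY 1991 reported value of 40,490:

x.1991 = brosius.example[AY == 1991, Valuation_15_mo]

# Link ratio method (a = 0): develop the reported amount by the ratio of the averages
(y.bar / x.bar) * x.1991

# Budgeted loss method (b = 0): ignore the reported amount entirely
y.bar

# Bornhuetter-Ferguson (b = 1): add an expected unreported amount to the reported amount;
# here y.bar - x.bar stands in for it (the intercept least squares would give with b fixed at 1)
x.1991 + (y.bar - x.bar)

These can be compared with the least squares prediction of 45,210.5 obtained earlier.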

Simulation Results

Simulation of claim count development can be used to assess these models. The following notation is used in this section:
  • \(X\): the number of claims reported by year-end
  • \(Y\): the ultimate number of claims
  • \(R(x)\): the expected number of unreported claims given that \(x\) claims have been reported, i.e. \(E[Y - X \mid X = x]\)
  • \(Q(x)\): the expected ultimate number of claims given that \(x\) claims have been reported, i.e. \(E[Y \mid X = x]\)

In a Poisson-Binomial model, we assume that \(Y\) has the Poisson distribution with mean \(\mu\), and that each claim has a fixed probability \(d\) of being reported by year-end. The number of unreported claims is then Poisson-distributed with mean \(\mu(1-d)\), independently of the number reported, so \[ R(x) = \mu(1-d) \] and \[ Q(x) = x + \mu(1-d) \] Let’s simulate 8 years of data with the same parameters provided by Brosius:

set.seed(12345)
## Warning in set.seed(12345): '.Random.seed' is not an integer vector but of
## type 'NULL', so ignored
mu = 4
d = 0.5
number.of.years = 8
number.of.simulations = 10
# Build one row for each (year, simulation) combination via a cross join
simulation.table = CJ(year = 1:number.of.years, simulation = 1:number.of.simulations)
simulation.table[,ultimate.claims := rpois(number.of.years * number.of.simulations, mu)]
simulation.table[,reported.claims := rbinom(number.of.years * number.of.simulations, ultimate.claims, d)]

The results for a typical simulation look like this:

datatable(simulation.table[simulation == 1])

Next, we’ll calculate the fitted line for each simulation, along with the link ratio parameter \(c = \frac{\overline{Y}}{\overline{X}}\):

fitted.parameters = simulation.table[, .(xbar = mean(reported.claims),
                                         ybar = mean(ultimate.claims),
                                         xybar = mean(reported.claims * ultimate.claims),
                                         xsquaredbar = mean(reported.claims^2)), by = simulation]
fitted.parameters[, b := (xybar - xbar * ybar) / (xsquaredbar - xbar^2)]
fitted.parameters[, a := ybar - b * xbar]
fitted.parameters[, c := ybar / xbar]
datatable(fitted.parameters)

The true parameter values are easy to determine for this theoretical model: the number of unreported claims is Poisson with mean \(\mu(1-d) = 2\), so \(R(x)=2\) and \(Q(x) = x+2\). This gives parameter values of \(a=2\) and \(b=1\). While there is considerable volatility in the parameter estimates among the 10 simulations, on average they are reasonable:

fitted.parameters[, mean(a)]
## [1] 1.753269
fitted.parameters[, mean(b)]
## [1] 1.169251

For the link ratio method, to produce an unbiased estimate we require that \[ E[(c-1)X] = 2 \] Therefore, \[ c = 1 + 2/E[X] \] Since \(E[X] = \mu d = 2\), this gives \(c=2\) in this example. The average estimate of \(c\) is

fitted.parameters[, mean(c)]
## [1] 2.027709

Note that this approach will never produce the correct estimate, \(x+2\), since all estimates produced using this method are of the form \(L(x) = cx\). We can also compare the in-sample predictions from the two methods to assess the overall fit to the data by calculating the mean squared error in each simulation:

simulation.predictions = merge(simulation.table, fitted.parameters, by = "simulation")
simulation.predictions[, link_ratio_prediction := c * reported.claims]
simulation.predictions[, linear_prediction := a + b * reported.claims]
simulation.summary = simulation.predictions[, .(link_MSE = sum((link_ratio_prediction - ultimate.claims)^2)/.N, linear_MSE = sum((linear_prediction - ultimate.claims)^2)/.N), by = "simulation"]
datatable(simulation.summary)

It’s not surprising that the linear estimate produces better in-sample results, since its parameters are selected precisely by minimizing the mean squared error. We can also visualize the results of the simulations and predictions:

visualize.simulation = function(simulation.number) {
  ggplot(data = simulation.predictions[simulation == simulation.number],
         aes(x = reported.claims, y = ultimate.claims)) +
    geom_point(col = "green") +
    geom_line(aes(y = linear_prediction), col = "blue") +
    geom_line(aes(y = link_ratio_prediction), col = "red") +
    labs(title = paste0("Results of Simulation ", simulation.number))
}
visualize.simulation(1)

visualize.simulation(2)

visualize.simulation(3)

For a more rigorous approach to assessing the model, we should select parameters based on a training set and assess performance on a holdout set. (This isn’t practical for estimation of unpaid claims in real applications due to the limited amount of data available, but it is easy to do with simulated data.) For this purpose, I’ll use simulations 1 through 7 for training, and simulations 8 through 10 as holdout. MSE will be calculated on the holdout data.

training.data = simulation.table[simulation <= 7]
xbar = training.data[, mean(reported.claims)]
ybar = training.data[, mean(ultimate.claims)]
xybar = training.data[, mean(reported.claims * ultimate.claims)]
xsquaredbar = training.data[, mean(reported.claims^2)]
b.sim = (xybar - xbar * ybar) / (xsquaredbar - xbar^2)
a.sim = ybar - b.sim * xbar
c.sim = ybar / xbar
holdout.data = simulation.table[simulation >= 8]
holdout.data[, link_ratio_prediction := c.sim * reported.claims]
holdout.data[, linear_prediction := a.sim + b.sim * reported.claims]
holdout.MSE.linear = holdout.data[, mean((linear_prediction - ultimate.claims)^2)]
holdout.MSE.linear
## [1] 3.424594
holdout.MSE.link = holdout.data[, mean((link_ratio_prediction - ultimate.claims)^2)]
holdout.MSE.link
## [1] 4.708333

On unseen data, the linear fit is an improvement over the link ratio method.

Credibility-weighting Interpretation

Recall that the expected value of the process variance is \[ \mathrm{EVPV} = E_Y(\mathrm{Var}(X|Y)) \] and the variance of the hypothetical means is \[ \mathrm{VHM} = \mathrm{Var}_Y(E[X|Y]) \] Interpreting \(X\) as the reported amount and \(Y\) as the ultimate amount, then EVPV represents the variability of the loss reporting process, while VHM represents the variability of the loss occurrence process. A credibility formula can be derived by assuming that there is some constant \(d\) such that \(E[X|Y=y] = dy\). In this case, \[ L(x) = Z \frac{x}{d} + (1-Z) E[Y] \] where \[ Z = \frac{\mathrm{VHM}}{\mathrm{VHM} + \mathrm{EVPV}} \] In other words, this is a credibility-weighted combination of the link ratio estimate (\(x/d\)) and the budgeted loss estimate (\(E[Y]\)). Special cases include:
  1. When \(\mathrm{VHM} = 0\), then \(Z=0\), so this produces the budgeted loss estimate. This corresponds to a situation in which there is no variability in the loss occurrence process.
  2. When \(\mathrm{EVPV} = 0\), then \(Z=1\), so this produces the link ratio estimate. This corresponds to a situation in which there is no variability in the loss reporting process.
  3. To the extent that there is uncertainty in either of the above, a weighted average is used.

Applying this interpretation to the theoretical Poisson-Binomial model described above, we first calculate \[ \mathrm{EVPV} = E_Y[\mathrm{Var}(X|Y)] = E_Y[Yd(1-d)] = \mu d (1-d) \] and \[ \mathrm{VHM} = \mathrm{Var}_Y(E[X|Y]) = \mathrm{Var}_Y(dY) = d^2\mu \] Therefore, \[ Z = \frac{d^2 \mu}{d^2\mu + \mu d(1-d)} = d, \] so the credibility formula reduces to \[ L(x) = x + (1-d) \mu \] This is consistent with the earlier result for \(Q(x)\).
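
We can check this against the simulation parameters used earlier (\(\mu = 4\), \(d = 0.5\)): \(\mathrm{EVPV} = \mu d(1-d) = 1\), \(\mathrm{VHM} = d^2\mu = 1\), so \(Z = 0.5 = d\), and the credibility formula gives \(L(x) = x + 2\), matching the true parameters \(a = 2\) and \(b = 1\) identified in the simulation section. A quick arithmetic check:

# Theoretical credibility for the Poisson-Binomial simulation parameters (mu = 4, d = 0.5)
EVPV.sim = mu * d * (1 - d)                # variability of the reporting process
VHM.sim = d^2 * mu                         # variability of the occurrence process
Z.sim = VHM.sim / (VHM.sim + EVPV.sim)     # equals d = 0.5
c(a = (1 - Z.sim) * mu, b = Z.sim / d)     # recovers a = 2, b = 1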

Given numerical results, we can indirectly determine the credibility by fitting a linear regression line and solving for \(Z\). Returning to the Brosius example from the beginning of the notebook, we have a \(d\) value of

d = x.bar / y.bar
d
## [1] 0.798229

Compare the linear regression formula to the credibility-weighting formula: \[ a + bx = Z \frac{x}{d} + (1-Z) E[Y] \] Comparing the coefficient of \(x\) on either side, we obtain \[ Z = bd \] Therefore, in the numerical example,

Z = b * d
Z
## [1] 0.7725372
In cases where historical data is not considered a reliable indicator of the future, the credibility formula can be used to relate a set of assumptions about the variability of the claims occurrence and reporting processes to the credibility value implied by those assumptions. Brosius provides an example in which we assume the following (dollar amounts in millions):
  • The expected ultimate loss is \(E[Y] = 12\), with standard deviation \(\sigma(Y) = 3\)
  • The expected proportion of losses reported to date is \(d = 0.75\), and the reported proportion \(X/Y\) has standard deviation \(\sigma(X/Y) = 0.14\)
  • The reported loss to date is \(x = 6\)

In this case, we can calculate VHM, EVPV, and the credibility weight as follows:

VHM = 0.75^2 * 3^2
EVPV = 0.14^2 *(3^2 + 12^2)
Z.2 = VHM / (VHM + EVPV)
Z.2
## [1] 0.6280004

Therefore, with a reported value of \(x = 6\), the credibility-weighted estimate is

assumption.based.estimate = Z.2 * 6 / 0.75 + (1 - Z.2) * 12
assumption.based.estimate
## [1] 9.487998

Note that this lies between the budgeted loss estimate of $12M and the link ratio estimate of $8M. The Bornhuetter-Ferguson estimate would be $9M; the fact that the credibility-weighted estimate is higher corresponds to the observation that \(b = Z/d<1\) (so less weight is given to the reported loss than in the Bornhuetter-Ferguson method).
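
For reference, the three benchmark figures quoted above can be reproduced directly (amounts in $ millions; the variable names here are my own):

x.reported = 6; E_Y.assumed = 12; d.assumed = 0.75
x.reported / d.assumed                        # link ratio estimate: 8
E_Y.assumed                                   # budgeted loss estimate: 12
x.reported + (1 - d.assumed) * E_Y.assumed    # Bornhuetter-Ferguson estimate: 9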

Given that the assumed means and standard deviations are based on judgment, performing sensitivity testing on these assumptions is advisable. Let’s assess \(\sigma(X/Y)\) over a range of 0.10 to 0.20, and \(\sigma(Y)\) over a range of 2 to 4. First, write a function to calculate the credibility-weighted estimate given arbitrary parameters:

cred.reserve.estimate = function(X, E_Y, d, sigma_Y, sigma_X_div_Y) {
  VHM = d^2 * sigma_Y^2
  EVPV = sigma_X_div_Y ^2 * (sigma_Y^2 + E_Y^2)
  Z = VHM / (VHM + EVPV)
  estimate = Z * X / d + (1 - Z) * E_Y
  return(estimate)
}

Test the function to validate that it reproduces the above results:

cred.reserve.estimate(6, 12, 0.75, 3, 0.14)
## [1] 9.487998

Define the range of values to test, apply the function over this range, then plot the results:

sigma_Y_range = seq(2, 4, by = 0.1)
sigma_X_div_Y_range = seq(0.10, 0.20, by = 0.005)
# Cross join the two ranges and evaluate the estimate on each combination
cred.sensitivity.test = CJ(sigma_Y = sigma_Y_range, sigma_X_div_Y = sigma_X_div_Y_range)
cred.sensitivity.test[, reserve_estimate := cred.reserve.estimate(X = 6, E_Y = 12, d = 0.75,
                                                                  sigma_Y = sigma_Y,
                                                                  sigma_X_div_Y = sigma_X_div_Y)]
ggplot(data = cred.sensitivity.test, aes(x = sigma_Y, y = sigma_X_div_Y, fill = reserve_estimate)) + geom_raster() + labs(title = "Credibility-Weighted Reserve Estimates - Sensitivity Analysis")

The maximum estimate over this range of assumptions is

cred.sensitivity.test[, max(reserve_estimate)]
## [1] 10.89841

and the minimum estimate over this range is

cred.sensitivity.test[, min(reserve_estimate)]
## [1] 8.603774

Caseload Effect

The caseload effect is the tendency of claims to be reported more slowly when the caseload is high: as a result, \(E[X|Y=y]/y\) is not expected to be constant, but a decreasing function of \(y\). This can be addressed by assuming that \[ E[X|Y=y] = dy + x_0 \] for some value \(x_0 > 0\). Since \(x_0\) is constant, \(X\) has the same EVPV and VHM as \(X - x_0\). In this case, the credibility-weighted formula becomes \[ L(x) = Z \frac{x - x_0}{d} + (1-Z) E[Y] \] In practice it is typically not possible to determine the values of \(x_0\) and \(d\) separately, but the key implication of the formula above is that their combined effect is captured by the least squares method, since the fitted line already includes an intercept.
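
As a sketch (my own extension, not from the paper) of how the earlier cred.reserve.estimate function could incorporate a caseload effect, the reported amount can be shifted by \(x_0\) before applying the development ratio, keeping the same VHM and EVPV inputs:

# Hypothetical variant of cred.reserve.estimate implementing L(x) = Z*(x - x0)/d + (1 - Z)*E[Y]
cred.reserve.estimate.caseload = function(X, E_Y, d, sigma_Y, sigma_X_div_Y, x0) {
  VHM = d^2 * sigma_Y^2
  EVPV = sigma_X_div_Y^2 * (sigma_Y^2 + E_Y^2)
  Z = VHM / (VHM + EVPV)
  Z * (X - x0) / d + (1 - Z) * E_Y
}
cred.reserve.estimate.caseload(6, 12, 0.75, 3, 0.14, x0 = 0)   # x0 = 0 recovers the earlier 9.49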

Application Across Multiple Years

Brosius demonstrates how to apply the linear regression approach using the following reported loss triangle:

brosius.triangle <- fread("./Brosius_triangle.csv")
brosius.triangle
##      AY    EP rep_loss_12 rep_loss_24 rep_loss_36 rep_loss_48 rep_loss_60
## 1: 1985  4260         102         104         209         650         847
## 2: 1986  5563           0         543        1309        2443        3003
## 3: 1987  7777         412        2310        3083        3358        4099
## 4: 1988  8871         219         763        1637        1423          NA
## 5: 1989 10465         969        4090        3801          NA          NA
## 6: 1990 11986           0        3467          NA          NA          NA
## 7: 1991 12873         932          NA          NA          NA          NA

The first step is to divide each entry by earned premium, to account for the significant growth in the book of business.

brosius.triangle.2 = brosius.triangle[,.(AY)]
brosius.triangle.2[, LR_12 := brosius.triangle$rep_loss_12 / brosius.triangle$EP]
brosius.triangle.2[, LR_24 := brosius.triangle$rep_loss_24 / brosius.triangle$EP]
brosius.triangle.2[, LR_36 := brosius.triangle$rep_loss_36 / brosius.triangle$EP]
brosius.triangle.2[, LR_48 := brosius.triangle$rep_loss_48 / brosius.triangle$EP]
brosius.triangle.2[, LR_60 := brosius.triangle$rep_loss_60 / brosius.triangle$EP]
brosius.triangle.2
##      AY      LR_12      LR_24      LR_36     LR_48     LR_60
## 1: 1985 0.02394366 0.02441315 0.04906103 0.1525822 0.1988263
## 2: 1986 0.00000000 0.09760920 0.23530469 0.4391515 0.5398166
## 3: 1987 0.05297673 0.29702970 0.39642536 0.4317860 0.5270670
## 4: 1988 0.02468718 0.08601060 0.18453387 0.1604103        NA
## 5: 1989 0.09259436 0.39082656 0.36321070        NA        NA
## 6: 1990 0.00000000 0.28925413         NA        NA        NA
## 7: 1991 0.07239960         NA         NA        NA        NA

The approach involves first developing the most mature accident years to ultimate, then working backward to determine ultimate amounts for less mature years. In this case, we use the assumption that the 60-to-ultimate factor is 1.1.

ult_60 = 1.1
brosius.triangle.2[, LR_ult := LR_60 * ult_60]
brosius.triangle.2
##      AY      LR_12      LR_24      LR_36     LR_48     LR_60    LR_ult
## 1: 1985 0.02394366 0.02441315 0.04906103 0.1525822 0.1988263 0.2187089
## 2: 1986 0.00000000 0.09760920 0.23530469 0.4391515 0.5398166 0.5937983
## 3: 1987 0.05297673 0.29702970 0.39642536 0.4317860 0.5270670 0.5797737
## 4: 1988 0.02468718 0.08601060 0.18453387 0.1604103        NA        NA
## 5: 1989 0.09259436 0.39082656 0.36321070        NA        NA        NA
## 6: 1990 0.00000000 0.28925413         NA        NA        NA        NA
## 7: 1991 0.07239960         NA         NA        NA        NA        NA

The ultimate value for AY 1988 is obtained through linear regression of the ultimate loss ratio on the 48-month loss ratio, for AY 1985 to 1987. The 48-month value for AY 1988 is then used to produce a prediction of its ultimate loss ratio.

regression.data.48 = brosius.triangle.2[,.(AY, LR_48, LR_ult)]
regression.48 = lm(LR_ult ~ LR_48, data = regression.data.48)
summary(regression.48)
## 
## Call:
## lm(formula = LR_ult ~ LR_48, data = regression.data.48)
## 
## Residuals:
##          1          2          3 
##  5.779e-05  2.190e-03 -2.248e-03 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 0.020073   0.004978   4.032  0.15476   
## LR_48       1.301453   0.013589  95.774  0.00665 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003139 on 1 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9998 
## F-statistic:  9173 on 1 and 1 DF,  p-value: 0.006647
regression.data.48$prediction = predict(regression.48, newdata = regression.data.48)
brosius.triangle.2[AY == 1988, LR_ult := regression.data.48[AY == 1988, prediction]]
brosius.triangle.2
##      AY      LR_12      LR_24      LR_36     LR_48     LR_60    LR_ult
## 1: 1985 0.02394366 0.02441315 0.04906103 0.1525822 0.1988263 0.2187089
## 2: 1986 0.00000000 0.09760920 0.23530469 0.4391515 0.5398166 0.5937983
## 3: 1987 0.05297673 0.29702970 0.39642536 0.4317860 0.5270670 0.5797737
## 4: 1988 0.02468718 0.08601060 0.18453387 0.1604103        NA 0.2288391
## 5: 1989 0.09259436 0.39082656 0.36321070        NA        NA        NA
## 6: 1990 0.00000000 0.28925413         NA        NA        NA        NA
## 7: 1991 0.07239960         NA         NA        NA        NA        NA

Brosius calculates supplementary information, such as the credibility weight, as a reasonability check:

d.48 = regression.data.48[AY <= 1987, mean(LR_48) / mean(LR_ult)]
d.48
## [1] 0.7351388
Z = d.48 * regression.48$coefficients[[2]]
Z
## [1] 0.9567489
regression.48$coefficients[[1]]
## [1] 0.02007257

Credibility assigned to the link ratio method is expected to be high for mature accident years, because by this point most of the remaining variance arises from the claim occurrence process rather than the reporting process. We now work backward, using the predictions of the ultimate values from previous steps as the response variable in each regression. For AY 1989, we have:

regression.data.36 = brosius.triangle.2[,.(AY, LR_36, LR_ult)]
regression.36 = lm(LR_ult ~ LR_36, data = regression.data.36)
summary(regression.36)
## 
## Call:
## lm(formula = LR_ult ~ LR_36, data = regression.data.36)
## 
## Residuals:
##        1        2        3        4 
##  0.00787  0.16646 -0.03485 -0.13948 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.1538     0.1562   0.985    0.429
## LR_36         1.1624     0.6261   1.857    0.204
## 
## Residual standard error: 0.1556 on 2 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.6328, Adjusted R-squared:  0.4492 
## F-statistic: 3.447 on 1 and 2 DF,  p-value: 0.2045
regression.data.36$prediction = predict(regression.36, newdata = regression.data.36)
brosius.triangle.2[AY == 1989, LR_ult := regression.data.36[AY == 1989, prediction]]
brosius.triangle.2
##      AY      LR_12      LR_24      LR_36     LR_48     LR_60    LR_ult
## 1: 1985 0.02394366 0.02441315 0.04906103 0.1525822 0.1988263 0.2187089
## 2: 1986 0.00000000 0.09760920 0.23530469 0.4391515 0.5398166 0.5937983
## 3: 1987 0.05297673 0.29702970 0.39642536 0.4317860 0.5270670 0.5797737
## 4: 1988 0.02468718 0.08601060 0.18453387 0.1604103        NA 0.2288391
## 5: 1989 0.09259436 0.39082656 0.36321070        NA        NA 0.5760180
## 6: 1990 0.00000000 0.28925413         NA        NA        NA        NA
## 7: 1991 0.07239960         NA         NA        NA        NA        NA
d.36 = regression.data.36[AY <= 1988, mean(LR_36) / mean(LR_ult)]
d.36
## [1] 0.5337822
Z = d.36 * regression.36$coefficients[[2]]
Z
## [1] 0.6204876
regression.36$coefficients[[1]]
## [1] 0.1538088

For AY 1990:

regression.data.24 = brosius.triangle.2[,.(AY, LR_24, LR_ult)]
regression.24 = lm(LR_ult ~ LR_24, data = regression.data.24)
summary(regression.24)
## 
## Call:
## lm(formula = LR_ult ~ LR_24, data = regression.data.24)
## 
## Residuals:
##        1        2        3        4        5 
## -0.08386  0.22650  0.03613 -0.12820 -0.05057 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.2810     0.1176   2.389   0.0968 .
## LR_24         0.8843     0.5172   1.710   0.1859  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1619 on 3 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.4935, Adjusted R-squared:  0.3247 
## F-statistic: 2.923 on 1 and 3 DF,  p-value: 0.1859
regression.data.24$prediction = predict(regression.24, newdata = regression.data.24)
brosius.triangle.2[AY == 1990, LR_ult := regression.data.24[AY == 1990, prediction]]
brosius.triangle.2
##      AY      LR_12      LR_24      LR_36     LR_48     LR_60    LR_ult
## 1: 1985 0.02394366 0.02441315 0.04906103 0.1525822 0.1988263 0.2187089
## 2: 1986 0.00000000 0.09760920 0.23530469 0.4391515 0.5398166 0.5937983
## 3: 1987 0.05297673 0.29702970 0.39642536 0.4317860 0.5270670 0.5797737
## 4: 1988 0.02468718 0.08601060 0.18453387 0.1604103        NA 0.2288391
## 5: 1989 0.09259436 0.39082656 0.36321070        NA        NA 0.5760180
## 6: 1990 0.00000000 0.28925413         NA        NA        NA 0.5367703
## 7: 1991 0.07239960         NA         NA        NA        NA        NA
d.24 = regression.data.24[AY <= 1989, mean(LR_24) / mean(LR_ult)]
d.24
## [1] 0.4077528
Z = d.24 * regression.24$coefficients[[2]]
Z
## [1] 0.3605842
regression.24$coefficients[[1]]
## [1] 0.280977

For AY 1991:

regression.data.12 = brosius.triangle.2[,.(AY, LR_12, LR_ult)]
regression.12 = lm(LR_ult ~ LR_12, data = regression.data.12)
summary(regression.12)
## 
## Call:
## lm(formula = LR_ult ~ LR_12, data = regression.data.12)
## 
## Residuals:
##       1       2       3       4       5       6 
## -0.2283  0.1714  0.1030 -0.2189  0.0585  0.1144 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4224     0.1142   3.697   0.0209 *
## LR_12         1.0272     2.4967   0.411   0.7018  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1978 on 4 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0406, Adjusted R-squared:  -0.1992 
## F-statistic: 0.1693 on 1 and 4 DF,  p-value: 0.7018
regression.data.12$prediction = predict(regression.12, newdata = regression.data.12)
brosius.triangle.2[AY == 1991, LR_ult := regression.data.12[AY == 1991, prediction]]
brosius.triangle.2
##      AY      LR_12      LR_24      LR_36     LR_48     LR_60    LR_ult
## 1: 1985 0.02394366 0.02441315 0.04906103 0.1525822 0.1988263 0.2187089
## 2: 1986 0.00000000 0.09760920 0.23530469 0.4391515 0.5398166 0.5937983
## 3: 1987 0.05297673 0.29702970 0.39642536 0.4317860 0.5270670 0.5797737
## 4: 1988 0.02468718 0.08601060 0.18453387 0.1604103        NA 0.2288391
## 5: 1989 0.09259436 0.39082656 0.36321070        NA        NA 0.5760180
## 6: 1990 0.00000000 0.28925413         NA        NA        NA 0.5367703
## 7: 1991 0.07239960         NA         NA        NA        NA 0.4967743
d.12 = regression.data.12[AY <= 1990, mean(LR_12) / mean(LR_ult)]
d.12
## [1] 0.07103454
Z = d.12 * regression.12$coefficients[[2]]
Z
## [1] 0.07296916
regression.12$coefficients[[1]]
## [1] 0.4224029
Two reasonability checks are demonstrated by the above results:
  • The credibility \(Z\) assigned to the reported losses decreases steadily as maturity decreases (roughly 0.96 at 48 months, 0.62 at 36 months, 0.36 at 24 months, and 0.07 at 12 months), consistent with the expectation that reported losses deserve more weight for mature accident years.
  • The implied expected proportion reported, \(d\), also decreases with decreasing maturity (0.74, 0.53, 0.41, 0.07), consistent with a sensible reporting pattern, while the regression intercept (the budgeted component) increases from 0.02 to 0.42.

The ultimate loss amounts can be calculated by multiplying the ultimate loss ratio by the earned premium:

brosius.triangle = merge(brosius.triangle, brosius.triangle.2[, .(AY, LR_ult)], by = "AY")
brosius.triangle[, ultimate_loss := EP * LR_ult]
brosius.triangle
##      AY    EP rep_loss_12 rep_loss_24 rep_loss_36 rep_loss_48 rep_loss_60
## 1: 1985  4260         102         104         209         650         847
## 2: 1986  5563           0         543        1309        2443        3003
## 3: 1987  7777         412        2310        3083        3358        4099
## 4: 1988  8871         219         763        1637        1423          NA
## 5: 1989 10465         969        4090        3801          NA          NA
## 6: 1990 11986           0        3467          NA          NA          NA
## 7: 1991 12873         932          NA          NA          NA          NA
##       LR_ult ultimate_loss
## 1: 0.2187089       931.700
## 2: 0.5937983      3303.300
## 3: 0.5797737      4508.900
## 4: 0.2288391      2030.032
## 5: 0.5760180      6028.028
## 6: 0.5367703      6433.729
## 7: 0.4967743      6394.975
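
As a closing note, the same fit-and-fill step is repeated at each maturity, so it could be wrapped in a small helper. The sketch below is my own generalization (it assumes the LR_ column naming used above, and that LR_ult is still NA for the less mature accident years when each call is made); it is not part of Brosius’ presentation.

# Regress LR_ult on the loss ratio at a given age (lm drops rows with missing values),
# then fill in LR_ult for the accident year whose latest diagonal sits at that age.
develop.age = function(triangle, age, fill.ay) {
  lr.col = paste0("LR_", age)
  fit = lm(as.formula(paste("LR_ult ~", lr.col)), data = triangle)
  triangle[AY == fill.ay, LR_ult := predict(fit, newdata = triangle[AY == fill.ay])]
  fit
}
# e.g. develop.age(brosius.triangle.2, 48, 1988) would perform the same step as the
# AY 1988 regression above, provided the later accident years have not yet been filled in.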