Introduction
This tutorial shows how to use some of the features of the ChainLadder package for R, along with sample loss reserving data from the CAS website. It also includes a short summary of the models used in the script.
The script is based on the demo included in the ChainLadder package, with some added examples and links. We hope to improve this. Please contact shawn@kerper-bowron.com for more information.
R code will appear like this
Click here for a version of this guide that can be pasted directly into R.
Data and Packages
You will need loss reserving and other liability data which can be found on the CAS website.
This example uses the ChainLadder and plyr packages. Both need to be installed before they can be loaded. To load the packages, execute the following commands:
library(ChainLadder)
library(plyr)
Data Input
L.M.B. Template
Here is a data set (right click link, "Save as") derived from the CAS data that mimics the format of data commonly used in practice. You may even use your own data if the format is the same as our data.
If you wish to do this, please first read the CAS Data section for an explanation of the following code, then proceed to the Mack Chain Ladder section.
OL <- read.csv("LMB_Template.csv", header = TRUE, sep = ",")
LossTri <- as.triangle(OL, origin="AccidentYear", dev="DevelopmentLag", value="CumIncurLoss")
PaidTri <- as.triangle(OL, origin="AccidentYear", dev="DevelopmentLag", value="CumPaidLoss")
Execute this code to print the data in triangle format:
LossTri
PaidTri
This code is explained in the "Converting to Triangle Format" section.
CAS Data
Bring in Data
First, read your data into R with read.csv. The arguments used in the function are:
- header: TRUE if there are column names in the file being read in
- sep: This is the delimiter in your file
OthLiabData <- read.csv("http://www.casact.org/research/reserve_data/othliab_pos.csv", header = TRUE, sep = ",")
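The effect of header and sep can be tried out without downloading anything. The snippet below is a minimal sketch that reads a small, made-up extract from a text string; the three rows and the semicolon delimiter are assumptions chosen only for illustration.

```r
# Hypothetical three-row extract; the values are made up for illustration.
csv_text <- "AccidentYear;DevelopmentLag;IncurLoss
1988;1;100
1988;2;150
1989;1;110"

# header = TRUE takes the column names from the first line;
# sep = ";" tells read.csv that fields are separated by semicolons.
toy <- read.csv(text = csv_text, header = TRUE, sep = ";")
str(toy)
```

With sep left at "," the same text would be read as a single column, which is a quick way to spot a wrong delimiter.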
Summarizing Data
Instead of keeping the data separated by insurance company, the ddply call below sums the reported incurred losses, the cumulative paid losses, and the earned premiums across companies, and arranges the results by accident year, development year, and development lag.
Note: The incurred loss data in the Excel file is the total loss reserve, and the bulk loss data is the IBNR data. Therefore, to get the reported incurred data, subtract the bulk loss from the incurred loss.
Arguments in the function:
- OthLiabData: the data frame that will be split
- .(…): the variables used to split the data frame
- summarise: (not a misspelling) a function that creates new columns for the summed data
- sum: sums the given column within each subset of the data; the name to the left of "=" becomes the name of the new output column
SumData <- ddply(OthLiabData, .(AccidentYear, DevelopmentYear, DevelopmentLag), summarise,
                 IncurLoss = sum(IncurLoss_H1 - BulkLoss_H1),
                 CumPaidLoss = sum(CumPaidLoss_H1),
                 EarnedPremDIR = sum(EarnedPremDIR_H1))
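To see what the grouping and summing does, here is a small base R sketch on made-up data for two hypothetical companies. It uses aggregate rather than ddply so that it runs without plyr; the numbers and the Company column are assumptions for illustration only.

```r
# Hypothetical data: two companies, one accident year, two development lags.
toy <- data.frame(
  Company        = c("A", "A", "B", "B"),
  AccidentYear   = c(1988, 1988, 1988, 1988),
  DevelopmentLag = c(1, 2, 1, 2),
  IncurLoss_H1   = c(100, 150, 200, 260),
  BulkLoss_H1    = c(40, 30, 80, 60),
  CumPaidLoss_H1 = c(20, 70, 50, 120)
)

# Reported incurred = incurred loss minus bulk (IBNR) loss.
toy$ReportedIncurred <- toy$IncurLoss_H1 - toy$BulkLoss_H1

# Sum over companies within each accident year / development lag cell.
summed <- aggregate(
  toy[c("ReportedIncurred", "CumPaidLoss_H1")],
  by  = toy[c("AccidentYear", "DevelopmentLag")],
  FUN = sum
)
summed
```

Each row of summed now holds the totals over both companies for one accident year and development lag, which is the shape needed before building a triangle.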
Converting to Triangle Format
To build the loss triangle, first exclude any developments after 1997 by executing:
OL <- SumData[SumData$DevelopmentYear<1998,]
as.triangle organizes a data table into triangle format. The arguments follow:
- origin: the column that supplies the row names of the triangle
- dev: the column that supplies the column names of the triangle
- value: the column that supplies the entries of the triangle, generally the claim amounts, paid losses, etc.
LossTri <- as.triangle(OL, origin="AccidentYear", dev="DevelopmentLag", value="IncurLoss")
PaidTri <- as.triangle(OL, origin="AccidentYear", dev="DevelopmentLag", value="CumPaidLoss")
Execute this code to print the data in triangle format:
LossTri
PaidTri
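The reshaping that as.triangle performs can be pictured with a base R sketch. This is only an illustration of the long-to-wide idea on made-up numbers, using tapply; it is not how ChainLadder implements as.triangle, and the result is a plain matrix rather than a triangle object.

```r
# Hypothetical long-format data: one row per accident year and development lag.
long <- data.frame(
  AccidentYear   = c(1988, 1988, 1988, 1989, 1989, 1990),
  DevelopmentLag = c(1, 2, 3, 1, 2, 1),
  IncurLoss      = c(100, 150, 170, 110, 168, 120)
)

# Rows come from the origin column, columns from the dev column,
# and cells from the value column. Unobserved cells become NA.
tri <- tapply(
  long$IncurLoss,
  list(origin = long$AccidentYear, dev = long$DevelopmentLag),
  sum
)
tri
```

The NA entries in the lower right are the future developments that the models below try to predict.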
Functions
Mack Chain Ladder
The MackChainLadder model uses the chain ladder approach to predict the ultimate and IBNR values for each row (in this case, each accident year) of a cumulative loss triangle, and it estimates the standard error of those predictions. By default, the model predicts the ultimate values using chain ladder ratios with no tail factor. The model can also use two other ratios in place of the chain ladder ratio: the simple average of the development ratios, or the ratio from a regression through the origin, which is a weighted average of the development ratios with weights equal to the squared losses. The standard errors follow Mack's method, which is described in detail here. One ingredient of that calculation, the variability of the final development period, cannot be estimated directly from the triangle; by default it is approximated with a log-linear regression, and Mack's own approximation is available as an alternative.
A tail factor can also be included in the predictions of the ultimate values, either by entering the tail factor manually or by estimating it with a log-linear regression.
Note: In order to use Mack's estimation for standard error, three assumptions need to be true. These are listed in the Details section of the MackChainLadder help page, which is accessible by entering ?MackChainLadder into R.
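The chain ladder projection itself can be sketched in a few lines of base R. The example below uses a made-up 3x3 cumulative triangle and the volume-weighted chain ladder ratios with no tail factor. It is an illustration of the idea only and leaves out the standard error calculation that MackChainLadder adds.

```r
# Hypothetical cumulative triangle (rows = accident years, columns = lags).
tri <- matrix(
  c(100, 150, 170,
    110, 168,  NA,
    120,  NA,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(origin = c("1988", "1989", "1990"), dev = c("1", "2", "3"))
)

n    <- ncol(tri)
f    <- numeric(n - 1)   # chain ladder ratios
full <- tri              # triangle to be completed

for (k in seq_len(n - 1)) {
  # Ratio of column sums over the rows where both lags are observed.
  obs  <- !is.na(tri[, k + 1])
  f[k] <- sum(tri[obs, k + 1]) / sum(tri[obs, k])
  # Project the missing cells of the next lag from the current lag.
  fill <- is.na(full[, k + 1])
  full[fill, k + 1] <- full[fill, k] * f[k]
}

latest   <- apply(tri, 1, function(r) tail(r[!is.na(r)], 1))
ultimate <- full[, n]
ibnr     <- ultimate - latest

f
full
ibnr
```

The oldest accident year is fully developed, so its IBNR is zero; the most recent year relies on both ratios and carries the largest IBNR.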
MackChainLadder uses the following arguments:
- Triangle (LossTri in this example): the cumulative loss triangle.
- alpha: the ratio used in the prediction of ultimate values. alpha=1 (default) is the chain ladder ratio, alpha=0 is the simple average of the development ratios, and alpha=2 is the ratio from a regression through the origin (a weighted average of the development ratios with weights equal to the squared losses).
- weights: Default: 1, which sets the weights for all triangle entries to 1. Otherwise specify weights as a matrix of the same dimension as Triangle with all weight entries set to either 0 or 1.
- tail: can be logical or a numeric value. If tail=FALSE (default), no tail factor is applied. If tail=TRUE, a tail factor is estimated via a linear extrapolation of log(Chain Factors-1). If tail is a numeric value (>1), then this value is used instead.
- tail.se and tail.sigma: the standard error and the variability (sigma) of the tail factor, respectively. Both are only needed if there is a tail factor and you have numeric values to enter. Otherwise they are NULL and the model estimates them by a log-linear regression.
- est.sigma: the method used to estimate the variability (sigma) of the final development period, which feeds into the standard errors. The default is "log-linear"; set the argument to "Mack" to use Mack's approximation mentioned above.
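The three alpha settings can be compared on a single development step in base R. The sketch below assumes that each ratio is a weighted average of the individual development ratios, with weights equal to the earlier loss raised to the power alpha; dev_factor and the numbers are hypothetical and not part of ChainLadder.

```r
# Weighted average of individual development ratios, weights = c_now^alpha.
# dev_factor is a hypothetical helper written for this illustration.
dev_factor <- function(c_now, c_next, alpha = 1) {
  ratios <- c_next / c_now
  w      <- c_now^alpha
  sum(w * ratios) / sum(w)
}

# Hypothetical losses at two consecutive development lags for two accident years.
c_now  <- c(100, 200)
c_next <- c(150, 320)

dev_factor(c_now, c_next, alpha = 0)  # simple average of 1.5 and 1.6
dev_factor(c_now, c_next, alpha = 1)  # chain ladder ratio: sum(c_next) / sum(c_now)
dev_factor(c_now, c_next, alpha = 2)  # regression through the origin
```

The larger alpha is, the more the accident years with larger losses dominate the ratio.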
Keep in mind that any argument not passed in takes its default value. Execute the following code to use the Mack Chain Ladder function:
MCL <- MackChainLadder(LossTri, est.sigma="Mack")
Executing MCL will print the following columns of information per accident year (origin period):
- Latest: the latest observed cumulative claim amount for the origin period
- Dev.To.Date: the development to date, or the ratio of the latest over the predicted ultimate
- Ultimate: the predicted ultimate claim
- IBNR: the predicted IBNR reserve
- Mack.S.E.: the standard error of the predicted IBNR. Because the ultimate is the known latest amount plus the IBNR, this is also the standard error of the predicted ultimate. Since the estimate is unbiased (shown in Mack's 1999 paper), a confidence interval for the true ultimate value can be built from the predicted ultimate and the standard error.
- CV(IBNR): the coefficient of variation, or the ratio of the standard error over the predicted IBNR
The bottom output gives the totals of the latest, ultimates, and IBNRs. It also gives the standard error of the total ultimate (this is not the total of the standard errors). The total development to date is the ratio of the total latest to the total ultimate, and the total CV(IBNR) is the ratio of the total standard error to the total IBNR.
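The relationship between these columns can be checked by hand. The sketch below uses made-up figures for one accident year and assumes a normal approximation for the interval, which is an assumption of this example and not something the package output states.

```r
# Hypothetical figures for a single accident year.
latest  <- 4000
ibnr    <- 1000
mack_se <- 200

ultimate    <- latest + ibnr
dev_to_date <- latest / ultimate   # Dev.To.Date
cv_ibnr     <- mack_se / ibnr      # CV(IBNR)

# Approximate 95% interval, assuming the prediction error is roughly normal.
z       <- qnorm(0.975)
ci_ibnr <- ibnr + c(-1, 1) * z * mack_se
ci_ult  <- ultimate + c(-1, 1) * z * mack_se

c(dev_to_date = dev_to_date, cv_ibnr = cv_ibnr)
ci_ibnr
ci_ult
```

The interval for the ultimate is the interval for the IBNR shifted by the latest amount, since the latest amount is known.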
If the absolute value of the CV is greater than 25%, then another model or a log-linear regression should be considered. Here are some more useful functions:
plot(MCL)
This plots six graphs. Starting from the top left: a stacked bar chart of the latest claims position plus IBNR, with Mack's standard error, by origin period; to its right, a plot of the forecasted development patterns for all origin periods (numbered, starting with 1 for the oldest origin period); and four residual plots. The residual plots show the standardized residuals against fitted values, origin period, calendar period, and development period.
The residual plots should be scattered with no pattern or direction for Mack's method of calculating the standard error to apply. A pattern could be the result of a trend that should be investigated further. More information on that can be found here.
The MackChainLadder object also stores other valuable data that you can print, such as the chain ladder ratios, the standard errors of the chain ladder ratios, and the explained and unexplained variability.
names(MCL)
This lists the data contained in the MackChainLadder variable. Any values listed under the "names" can be accessed and printed. Simply type the model name (MCL in our example) followed by the "$" sign and then type the name like the examples below.
Prints the chain ladder ratios:
MCL$f
This prints the entire forecasted triangle:
MCL$FullTriangle
Tail Factors
Below are some examples using tail factors:
MackChainLadder(LossTri, est.sigma="Mack", tail=TRUE)
Using this code will make Mack Chain Ladder estimate the tail factor using the process described earlier.
MackChainLadder(LossTri, est.sigma="Mack", tail=1.05, tail.se=.75, tail.sigma=1.25)
In this hypothetical situation, the tail factor, its standard error, and its sigma have been estimated beforehand.
MackChainLadder(LossTri, est.sigma="Mack", tail=1.05)
In this hypothetical situation, the tail factor was found beforehand, but Mack Chain Ladder estimates tail.se and tail.sigma.
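The idea behind the log-linear tail estimate can be sketched in base R: regress log(ratio - 1) on the development period, extrapolate the fitted line to later periods, and multiply the extrapolated ratios together. The ratios below are made up, and the choice of 100 extrapolated periods is an assumption of this sketch; the package's own routine may differ in its details.

```r
# Hypothetical chain ladder ratios for development periods 1 to 3.
f <- c(1.5, 1.25, 1.125)
d <- data.frame(k = seq_along(f), y = log(f - 1))

# Linear regression of log(ratio - 1) on the development period.
fit <- lm(y ~ k, data = d)

# Extrapolate to later periods (100 periods is an arbitrary cut-off here).
future_k <- (length(f) + 1):(length(f) + 100)
future_f <- 1 + exp(predict(fit, newdata = data.frame(k = future_k)))

# The tail factor is the product of all extrapolated ratios.
tail_factor <- prod(future_f)
tail_factor
```

In this made-up example, ratio - 1 halves every period, so the first extrapolated ratio is 1.0625 and the later ones quickly approach 1.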
Munich Chain Ladder
The Munich chain ladder model predicts ultimate claims based on a cumulative paid and an incurred claims triangle. This model uses the correlation between incurred losses and paid losses to make future projections for both the total paid and incurred ultimate. The "Munich" ratios are calculated using chain ladder ratios, paid/incurred ratios, and the slope of the regression line in the residual plot of the incurred (or paid) losses. For a better idea of how these ratios are calculated, please read this paper.
The variability (sigma) of the final development period for the incurred and paid triangles can be estimated either by a log-linear regression (default) or by Mack's approximation, as in the MackChainLadder model, or by a combination of the two. This model also has the option to include a tail factor in either ultimate calculation.
MunichChainLadder uses the following arguments:
- The first (and necessary) argument is the cumulative paid triangle
- The second (and necessary) argument is the incurred loss triangle
- est.sigmaI: how the variability of the final development period is estimated for the incurred loss triangle, either "log-linear" (default) or "Mack"
- est.sigmaP: how the variability of the final development period is estimated for the paid loss triangle, either "log-linear" (default) or "Mack"
- tailP: defines how the tail for the paid loss triangle is calculated. If TRUE, a log-linear regression is used to estimate the tail factor; a numeric value (>1) can also be entered as the tail factor. Otherwise it is FALSE (default) and no tail is included in the model.
- tailI: defines how the tail for the incurred loss triangle is calculated. If TRUE, a log-linear regression is used to estimate the tail factor; a numeric value (>1) can also be entered as the tail factor. Otherwise it is FALSE (default) and no tail is included in the model.
Recall that if any argument is not present then the default value is assumed.
MuCL <- MunichChainLadder(PaidTri,LossTri)
MuCL will print the following output for each accident year (origin year):
- Latest Paid/Incurred: the latest development for the paid and incurred triangles
- P/I Ratio: the latest paid over the latest incurred developments
- Ult. Paid/Ult. Incurred: the predicted ultimate values in the paid and incurred triangles
- Ult. P/I Ratio: the ratio of the predicted paid ultimate over the predicted incurred ultimate
At the bottom under "Totals", the output gives the sums of the incurred and paid last developments and the predicted total ultimate for each. It also gives the total latest P/I ratio and the predicted total ultimate P/I ratio.
It is important to note that the ultimate P/I ratios should be close to 1. If they are not, then further investigation is needed in order to determine if this is a good model for the data.
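The P/I checks described above amount to simple ratios. The sketch below uses made-up latest and ultimate values for three accident years; the 5% tolerance for "close to 1" is an assumption of this example, not a rule from the package.

```r
# Hypothetical Munich chain ladder output for three accident years.
munich <- data.frame(
  origin          = c("1988", "1989", "1990"),
  latest_paid     = c(150, 120, 60),
  latest_incurred = c(170, 168, 120),
  ult_paid        = c(168, 185, 180),
  ult_incurred    = c(170, 190, 206),
  stringsAsFactors = FALSE
)

munich$pi_ratio     <- munich$latest_paid / munich$latest_incurred
munich$ult_pi_ratio <- munich$ult_paid / munich$ult_incurred

# Total ultimate P/I ratio: ratio of the sums, not the mean of the ratios.
total_ult_pi <- sum(munich$ult_paid) / sum(munich$ult_incurred)

# Flag accident years whose ultimate P/I ratio is more than 5% away from 1.
flagged <- munich$origin[abs(munich$ult_pi_ratio - 1) > 0.05]

munich
total_ult_pi
flagged
```

Here only the most recent accident year is flagged, which would be the place to start any further investigation.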
Plotting the Munich results serves to give a quick overview of the data, and to check the residual plots:
plot(MuCL)
This plots four graphs. Starting from the top left: a bar chart of forecasted ultimate claims costs by Munich Chain Ladder (MCL) on paid and incurred data by origin period. The bar chart next to it compares the ratio of forecasted ultimate claims cost on paid and incurred data based on the Mack Chain Ladder and Munich Chain Ladder methods. The two residual plots at the bottom show the correlation of "Munich" (P/I) ratios against the paid chain ladder ratios and the correlation of "Munich" (P/I) ratios against the incurred chain ladder ratios.
The residual plots should be random and show no pattern or direction. Otherwise, a better model might be needed and this matter should be investigated more.
Like the Mack model, the Munich model stores multiple values, for example the Munich forecasts of the paid and incurred triangles:
MuCL$MCLPaid
MuCL$MCLIncurred
There are several other values contained in the model as well; these can be found with the function names(MuCL).
To print any of the values, add the name after MuCL$ like in the Mack example above.
BootChainLadder
The BootChainLadder is a model that provides a predicted distribution for the IBNR values for a claims triangle. However, this model predicts IBNR values by a different method than the previous two models. First, the development factors are calculated and then they are used in a backwards recursion to predict values for the past loss triangle. Then the predicted values and the actual values are used to calculate Pearson residuals. The residuals are adjusted by a formula specified in appendix 3 of this paper.
The adjusted residuals are then resampled with replacement. Using the resampled residuals and the predicted losses from before, the model solves for the actual losses in the Pearson formula and forms a new loss triangle. The steps for predicting past losses and residuals are then repeated for this new triangle. After that, the model uses chain ladder ratios to predict the future losses, then calculates the ultimate and IBNR values as in the previous Mack model. This cycle is performed R times, where R is an argument of the model (the default is 999). The IBNR for each origin period is calculated from each of the R triangles and used to form a predictive distribution, from which summary statistics such as the mean, prediction error, and quantiles are obtained.
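One pass of the residual step can be sketched in base R. The example assumes the unscaled Pearson residual (actual - fitted) / sqrt(fitted) and uses made-up actual and fitted claims; the adjustment of the residuals and the process error simulation are left out, so this illustrates the resampling idea only.

```r
# Hypothetical actual and fitted (back-predicted) claims for six triangle cells.
actual <- c(100, 50, 20, 110, 58, 120)
fitted <- c( 98, 52, 20, 112, 56, 120)

# Unscaled Pearson residuals.
pearson <- (actual - fitted) / sqrt(fitted)

# Resample the residuals with replacement and solve the Pearson formula
# for the claims to obtain one pseudo data set.
set.seed(1)
resampled <- sample(pearson, replace = TRUE)
pseudo    <- resampled * sqrt(fitted) + fitted

# Sanity check: using the original residuals recovers the actual claims.
roundtrip <- pearson * sqrt(fitted) + fitted

pseudo
roundtrip
```

Repeating the resampling step many times and projecting each pseudo data set forward is what produces the predictive distribution of the IBNR.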
The BootChainLadder model takes in the following arguments:
- The first argument is the cumulative claims triangle
- R: the number of bootstraps (the default is 999)
- process.distr: the way the process error is calculated for each predicted IBNR value, with the options of "gamma" (default) and "od.pois" (over-dispersed Poisson)
B <- BootChainLadder(LossTri, R=5000)
B
The output has some of the same values as the Munich and Mack models did. The Mean IBNR and SD IBNR are the mean and the standard deviation of the predictive distribution of the IBNRs for each origin year.
The output also gives the 75% and 95% quantiles of the predictive distribution of IBNRs. In other words, 75% or 95% of the predicted IBNRs lie at or below the given values.
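These summary statistics are ordinary sample statistics of the simulated IBNR values. The sketch below uses gamma-distributed random numbers as a hypothetical stand-in for 999 bootstrapped IBNRs of one origin year; the shape and rate are made up.

```r
set.seed(42)

# Hypothetical stand-in for 999 bootstrapped IBNR values (mean about 1000).
sim_ibnr <- rgamma(999, shape = 25, rate = 0.025)

# Mean, standard deviation, and the 75% and 95% quantiles.
summary_stats <- c(
  mean = mean(sim_ibnr),
  sd   = sd(sim_ibnr),
  quantile(sim_ibnr, probs = c(0.75, 0.95))
)
summary_stats

# Share of simulated IBNRs at or below the 95% quantile.
mean(sim_ibnr <= summary_stats["95%"])
```

The last line shows the reading of a quantile directly: roughly 95% of the simulated IBNRs fall at or below the 95% value.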
Comparing Predictions with Actual Developments
We can compare the actual incurred loss triangle with our models.
The ActualUlt variable is the sum of all the actual ultimates, to be compared with the predicted total ultimate:
actualTri <- as.triangle(SumData, origin="AccidentYear", dev="DevelopmentLag", value="IncurLoss")
ActualUlt <- sum(actualTri[,10])
actualTri
ActualUlt
The actual losses can then be compared with the losses predicted by the chain ladder method.
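A comparison of this kind can be sketched as follows. The predicted and actual ultimates below are made-up numbers for three accident years, used only to show the arithmetic.

```r
# Hypothetical predicted and actual ultimates by accident year.
predicted_ult <- c("1988" = 170, "1989" = 190.4, "1990" = 205.9)
actual_ult    <- c("1988" = 170, "1989" = 185.0, "1990" = 215.0)

# Error by accident year, in amount and relative to the actual ultimate.
error     <- predicted_ult - actual_ult
pct_error <- error / actual_ult

# Relative error of the total predicted ultimate.
total_pct_error <- sum(predicted_ult) / sum(actual_ult) - 1

round(cbind(predicted_ult, actual_ult, error, pct_error), 3)
total_pct_error
```

Errors by accident year can partly offset each other, so the total error is usually smaller in relative terms than the largest single-year error.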