Blog » Modeling ALAE Using Copulas
Posted by Greg McNulty on 09 Dec 2015 22:16
Introduction:
Imagine you are a reinsurance pricing actuary tasked with pricing (or costing) an excess of loss contract. A typical method is to determine the expected number of claims excess of some threshold, and then to choose a severity distribution representing the probability of different sizes of loss above that threshold. A Pareto curve would be a typical choice, or you might use a semi-parametric mixed exponential distribution. Assuming these distributions represent only the ceding company's incurred loss, you can also apply the company's limit profile to get what is called an exposure estimate of the loss to the layer.
But what about the ceding company's loss adjustment expenses, commonly known as ALAE? For many lines of business these expenses are covered in addition to the insured's policy limit, and that is the case we will assume here. Usually an excess of loss reinsurance contract will cover some of the ceding company's ALAE for claims in the layer, or even below the layer. There are two common reinsurance treatments: ALAE included, which means the indemnity and ALAE are added together and the reinsurer is responsible for however much of that sum falls in the layer; and ALAE pro-rata, which means the reinsurer pays the same percentage of the ALAE as it pays of the loss.
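As a small illustration of the two treatments (the claim and the layer are hypothetical, chosen only to make the arithmetic easy), consider $1.5M of indemnity and $300k of ALAE ceded to a $1M xs $1M layer:
limit <- 1e6; attachment <- 1e6                # a hypothetical $1M xs $1M layer
claim_indemnity <- 1.5e6                       # $1.5M of indemnity
claim_alae <- 3e5                              # $300k of ALAE
# ALAE included: layer the sum of indemnity and ALAE.
pmax(0, pmin(limit, claim_indemnity + claim_alae - attachment))           # 800,000
# ALAE pro-rata: layer the indemnity, then add the same share of the ALAE.
layered_indemnity <- pmax(0, pmin(limit, claim_indemnity - attachment))   # 500,000
layered_indemnity + claim_alae * layered_indemnity / claim_indemnity      # 600,000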
So we need to adjust our exposure loss cost estimate for ALAE. The traditional, and still very common, way this is done is to select an overall ratio of ALAE to loss, e.g. 5% or perhaps 20%, and then multiply each indemnity value by that ratio to determine the amount of ALAE for that claim. For example, with a 20% ALAE load a $1M indemnity loss would have exactly $200k of ALAE, and every $1M claim would have exactly that same amount of ALAE.
While this seems reasonable, it actually makes two very strong implicit assumptions: it forces the distribution of ALAE to be a scaled copy of the distribution of indemnity, and it forces the two to be 100% correlated. We might suspect that the ALAE distribution is not a scaled copy of the indemnity distribution, especially if there is a significant effect of policy limit capping, which there often is: a $1M indemnity limit and a 20% ALAE load imply a maximum possible ALAE incurred of $200k. We will look at the data further down to evaluate the correlation assumption.
Quick Theory of Bivariate Distributions:
Claim data with indemnity and ALAE information is an example of a bivariate distribution: random points living on the x-y plane with a certain probability distribution. At first you might recall all the one-dimensional distributions actuaries use (Pareto, gamma, normal, exponential, etc.) and think there must be something like that many bivariate distributions squared. Luckily, Sklar's theorem states that a bivariate distribution is completely defined by the marginal distributions of the variables, i.e. the univariate distributions of x by itself and y by itself, and the copula, which relates their cumulative distribution functions:
(1) $$F_{X,Y}(x,y) = C\big(F_X(x),\,F_Y(y)\big)$$
A copula is simply a bivariate (or multivariate) distribution on the unit square (or cube, etc.). This means that we can fit univariate distributions to ALAE and indemnity each in isolation, something more actuaries are comfortable with, and then fit a copula to the bivariate ALAE-indemnity data transformed to $[0,1]\times[0,1]$, without worrying about any loss of information.
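As a concrete example, and the family we will end up fitting in Step 4, the Gumbel copula with parameter $\theta \ge 1$ is

$$C_\theta(u,v) = \exp\!\left\{-\left[(-\ln u)^\theta + (-\ln v)^\theta\right]^{1/\theta}\right\},$$

where $\theta = 1$ gives independence and larger values of $\theta$ give stronger dependence, particularly in the upper tail.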
See the references at the bottom for more information.
The Goal
Back to the premise: you are a reinsurance actuary trying to price an excess of loss contract. You already have a severity curve for indemnity, as discussed in the first paragraph, and an estimate of the expected number of claims excess of a certain threshold. You have a dataset of indemnity and ALAE amounts for a set of claims. We will not worry about trend or development (assume everything is trended and at ultimate already). What you are trying to do is refine the traditional assumption of ALAE as a fixed percent of indemnity and use a copula to model the bivariate nature of claims.
What we will do in this blog is present and walk through R code that performs this analysis step by step. If you download the .csv files attached (see link at bottom of page), you should be able to follow along and reproduce the results. We will discuss places where changes could be made as well. The full R code is embedded in the blog and also attached.
Step 1. Load R Libraries Required for Running the Code
Most of the functions we use have thankfully been programmed by somebody else. Loading these packages gives us access to all of those functions. R can download them automatically, or you may have to download and install them manually.
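If any of the contributed packages are missing (stats and stats4 ship with R itself), a one-time install from CRAN along these lines should work:
install.packages(c("actuar", "copula", "distr"))
# One-time download from CRAN; stats and stats4 are part of the base R installation.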
library("actuar")
library("stats")
library("stats4")
library("copula")
library("distr")
Step 2. Input and Setup Data
We start by importing a file containing the loss data, with one column for indemnity (I will refer to indemnity as loss in the code; hopefully it will be clear from the context) and one for ALAE. The first handful of rows is shown below the code.
setwd("c:\\directory")
# Change this to where the .csv file with the data is stored
copula_data <- read.csv("copula_data.csv", header = TRUE)
loss | alae |
---|---|
4468750 | 5571.34 |
3490000 | 808363.4 |
3450000 | 180757 |
3425000 | 713552.2 |
2895000 | 394624.5 |
1958925 | 36074.6 |
1626762 | 326741.5 |
990000 | 936229.8 |
980885.4 | 13156.44 |
Here is a quick look at the summary statistics of our data:
summary(copula_data)
loss | alae |
---|---|
Min.: 100000 | Min.: 0 |
1st Qu.: 132812 | 1st Qu.: 12390 |
Median: 197500 | Median: 48396 |
Mean: 374647 | Mean: 115491 |
3rd Qu.: 336250 | 3rd Qu.: 126048 |
Max.: 4468750 | Max.: 1767626 |
We remove any ALAE data points that are exactly 0, which allows us to fit curves using the logarithm of the data. An adjustment could be made at the end to add back in the probability of zero ALAE, but I have not done that here.
alae_data <- copula_data$alae
alae_data <- alae_data[alae_data>0]
#The ALAE data will have a point mass at 0 which our fitted distributions do not account for.
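As a rough sketch of the adjustment mentioned above (not used in the rest of this post), one could record the empirical probability of zero ALAE and mix it back in at simulation time:
p_zero_alae <- mean(copula_data$alae == 0)
#Empirical probability of a claim carrying zero ALAE.
#At simulation time (Step 7), each simulated ALAE amount could then be set to zero
#with this probability, for example:
#simloss[,2] <- ifelse(runif(nrow(simloss)) < p_zero_alae, 0, simloss[,2])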
Finally, we transform the data to $[0,1]\times[0,1]$ by using the rank function and then dividing by the number of data points plus one. Some of the copula fitting procedures break down if there are ties (repeated values) in the data, so we apply a tie-breaking procedure that randomly selects one of the equal entries to be ranked ahead of the other.
set.seed(123)
rankdata <- sapply(copula_data , rank, ties.method = "random") / (nrow(copula_data) + 1)
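Equivalently, recent versions of the 'copula' package provide a pseudo-observation helper that performs the same rank/(n+1) transform with random tie-breaking; either approach should give essentially the same input for the copula fit:
rankdata_alt <- copula::pobs(copula_data, ties.method = "random")
#Same transform as above: ranks divided by (number of data points + 1). This should
#match rankdata up to the random tie-breaks.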
A quick look at the data (log scale with a linear trendline) shows the 100% correlation assumption to be unrealistic:
plot(copula_data, log="xy")
#Scatter plot of indemnity versus ALAE on log-log axes.
abline(lm(copula_data[,2]~copula_data[,1]), untf = TRUE)
#Adds the fitted linear trendline; untf = TRUE draws the line fitted in the original
#(untransformed) coordinates on the log-scaled axes.

Step 3. ALAE Distribution Fitting
There are so many resources discussing curve fitting for univariate data that I won't go into the theory. The following code takes the ALAE data and uses maximum likelihood to fit a Pareto, log-gamma, Weibull and lognormal distribution:
nLL1 <- function(mu, sigma) -sum(stats::dlnorm(alae_data, mu, sigma, log = TRUE))
# This defines a function giving the negative log-likelihood of the data for a given set of distribution parameters. This line is for the lognormal distribution.
alae_fit1<-mle(nLL1, start = list(mu = 10, sigma = 1), method = "L-BFGS-B", nobs = length(alae_data), lower = c(1,0.01))
#This finds the distribution parameters minimizing the function defined above, i.e. the maximum likelihood fit.
nLL2 <- function(shape, scale) -sum(stats::dweibull(alae_data, shape, scale, log = TRUE))
# Same as above but for the Weibull distribution.
alae_fit2<-mle(nLL2, start = list(shape = 1, scale = 50000), method = "L-BFGS-B", nobs = length(alae_data), lower = c(0.1,100))
nLL3 <- function(shapelog, ratelog) -sum(actuar::dlgamma(alae_data, shapelog, ratelog, log = TRUE))
#Log-gamma distribution.
alae_fit3<-mle(nLL3, start = list(shapelog = 60, ratelog = 1), method = "L-BFGS-B", nobs = length(alae_data), lower = c(0.01,0.01))
nLL4 <- function(shape, scale) -sum(actuar::dpareto(alae_data, shape, scale, log = TRUE))
# Pareto distribution.
alae_fit4<-mle(nLL4, start = list(shape = 1.1, scale = 1000), method = "L-BFGS-B", nobs = length(alae_data), lower = c(1,100))
# The following code creates a graph displaying each of the fitted curves against the empirical distribution of the data. The x-axis is the ALAE amount and the y-axis is the cumulative probability that ALAE is less than or equal to that amount.
x<-seq(0,max(alae_data),max(alae_data)/1000) # This defines the x-axis range for the following graph to encompass all ALAE data.
plot(x,plnorm(x,coef(alae_fit1)[1], coef(alae_fit1)[2]),type="l",col="red", main="ECDF vs Curve Fits")
#We give each distribution a different color as indicated in the code below.
lines(x,pweibull(x,coef(alae_fit2)[1], coef(alae_fit2)[2]),type="l",col="blue")
lines(x,plgamma(x,coef(alae_fit3)[1], coef(alae_fit3)[2]),type="l",col="orange")
lines(x,ppareto(x,coef(alae_fit4)[1], coef(alae_fit4)[2]),type="l",col="green")
plot(ecdf(alae_data),add=TRUE)
# At this point the user may select which fit they think is the best. I typically found the Pareto to be the best fit, so the rest of the code assumes the Pareto distribution is chosen. The code can be modified to make a different selection.
summary(alae_fit4)
alae_fit4@coef
If everything worked correctly you should see this graph:

The graph shows each of the four fitted distributions against the empirical cumulative distribution of ALAE amounts. In practice the Pareto often seems to be the best fit, as it is here (shown in green).
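If a more formal check than the visual comparison is wanted, the four stats4 fits can also be compared by AIC; this is a supplementary step, not part of the original workflow:
sapply(list(lognormal = alae_fit1, weibull = alae_fit2, loggamma = alae_fit3, pareto = alae_fit4), AIC)
#AIC = 2*(number of parameters) - 2*(maximized log-likelihood); lower is better.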
Step 4. Copula Fitting
There are two aspects to copula fitting. Just as with univariate distributions, there are different families of copulas, and within a family any given dataset will have a best-fit member (according to some goodness-of-fit measure). From my own experimentation I found that the best-fitting copula family for various datasets of liability indemnity and ALAE was the Gumbel copula. This was also chosen as the best family in Frees and Valdez [3], Micocci [9] and Venter [12]. The Gumbel has several desirable properties: it has a single parameter, it is an extreme-value copula (meaning it is appropriate for the right-tailed, truncated data we often work with in reinsurance), and it has closed-form expressions for Kendall's tau and the upper tail dependence coefficient.
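For readers who want to test the family choice on their own data rather than take the Gumbel as given, one quick sketch using the same fitCopula machinery is to fit a few common one-parameter families and compare their maximized log-likelihoods (the candidates below are illustrative, not a recommendation):
candidate_copulas <- list(gumbel = gumbelCopula(dim = 2), clayton = claytonCopula(dim = 2), frank = frankCopula(dim = 2))
sapply(candidate_copulas, function(cop) logLik(fitCopula(cop, rankdata, method = "ml")))
#Each family has one parameter, so the maximized log-likelihoods are directly comparable;
#higher is better.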
So for this exercise we assumed the Gumbel to be the correct copula family and then fit the best parameter (as it is a single parameter copula) using the maximum likelihood option in the 'copula' package's fitCopula method:
fitted_copula<-fitCopula(gumbelCopula(dim=2), rankdata, method = "ml")
#We have assumed use of the Gumbel copula.
summary(fitted_copula)
theta <- fitted_copula@copula@parameters
tail_correlation <- 2 - 2^(1/theta)
tail_correlation
# The upper tail correlation is one way of uniquely describing a member of a one-parameter copula family (e.g. Gumbel).
plot(rankdata)
In my trials with different datasets of casualty large loss data, the fitted tail correlations came out between 0.2 and 0.4. This is on a scale of 0 to 1 for the upper tail correlation measure, as opposed to -1 to 1 for the traditional Pearson correlation measure. After running the code you should also see a plot of the empirical copula data:

You can see the upper tail correlation in the cluster of points in the upper right-hand corner: when indemnity is large, i.e. in its upper quantiles, ALAE also tends to be relatively large, i.e. in the upper quantiles of the ALAE distribution. You can see even more distinctly the lack of points in the upper-left and lower-right corners, which means it is very rare for a small ALAE amount to accompany a large loss amount and vice versa. If ALAE and indemnity were independent, the points would be uniformly scattered across the entire square and you would see a similar number of points in each of the four corners.
Step 5. Indemnity Exposure Curve Loading
indexpo_data<-read.csv("indemnity_curve.csv", header = FALSE)
#This assumes the user already has an indemnity distribution, i.e. exposure curve, they want to use. The file has loss amounts in the first column and cumulative probabilities in the second.
indexpo_data[1001, 2] <- 1
#Force the cumulative probability at the last point to be exactly 1.
supp <- indexpo_data[2:1001, 1]
#Support points: the loss amounts, excluding the first row (the model threshold itself).
probs <- indexpo_data[, 2][-1] - indexpo_data[1:1000, 2]
#Probability masses: first differences of the cumulative probabilities.
indemnity<-DiscreteDistribution(supp, probs)
#This just defines the cumulative distribution function of the exposure curve as a distribution object in R.
As mentioned above, we are not fitting a curve to the indemnity data; we are assuming that you already have an empirical distribution representing projected indemnity amounts. This is usually based on the types of business written by the ceding company and at what limits.
The minimum value in the indemnity severity distribution is $100,000. This is what we will call the model threshold. This just simplifies the analysis by not requiring us to know about the severity distribution far below the reinsurance attachment point.
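If you want to follow along but do not have an exposure curve file handy, a hypothetical stand-in with the same structure as indemnity_curve.csv (loss amounts and cumulative probabilities) could be built from, say, a single-parameter Pareto above the $100,000 model threshold; the parameter values here are purely illustrative:
threshold <- 1e5                                                #model threshold
alpha <- 1.5                                                    #illustrative Pareto shape
grid <- exp(seq(log(threshold), log(2e7), length.out = 1001))   #log-spaced loss amounts
cdf <- 1 - (threshold / grid)^alpha                             #single-parameter Pareto CDF
cdf[1001] <- 1                                                  #close the curve at the top point
indemnity_example <- DiscreteDistribution(supp = grid[-1], prob = diff(cdf))
#Same construction as above. To use it in place of the .csv-based curve, assign it to
#the name indemnity.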
Step 6. Plot the Selected Bivariate Distribution Against the Actual Data
With both marginal distributions and a copula we now have a full model for the ALAE-indemnity bivariate distribution. We can plot some simulated data against the actual data to visually assess the model for any blatant errors:
simcopula <- rCopula(nrow(copula_data), fitted_copula@copula)
# This simulates as many random draws from the fitted copula as there are in the original dataset.
simloss <- cbind(distr::q(indemnity)(simcopula[,1]), qpareto(simcopula[,2], alae_fit4@coef["shape"], alae_fit4@coef["scale"]))
# This uses the cumulative ALAE and loss distributions to transform the copula data, which is in terms of percentiles, into loss/ALAE amounts.
plot(copula_data, xlim = c(0, 10^7), ylim = c(0, 1.5*10^6), main = "input data v simulated data", xlab = "indemnity", ylab = "ALAE")
par(new=T)
plot(simloss, axes = F, type = "p", col=2, xlim = c(0, 10^7), ylim = c(0, 1.5*10^6), xlab = "", ylab = "")
par(new=F)
# This is a plot of the actual data versus simulated data.
If all went according to plan, you should see a plot similar to this:

Of course, the above plot may be of limited value because we've only simulated as many points from the fitted distribution as there are points in the original dataset, so even two such plots from the same fitted distribution may look very different.
Step 7. Simulation and Creation of a Loss Plus ALAE Distribution
The final output is based on empirical simulation, not theoretical integration, so the first step is to simulate from our fitted bivariate distribution:
n_simulations <-100000
# define the number of simulations to do for creation of the curve
simcopula <- rCopula(n_simulations, fitted_copula@copula)
simloss <- cbind(distr::q(indemnity)(simcopula[,1]), qpareto(simcopula[,2], alae_fit4@coef["shape"], alae_fit4@coef["scale"]))
Interestingly, we only need to simulate from the copula distribution. These simulated points live on the unit square; we then use the "q" (quantile) function of the fitted distributions to convert the cumulative percentiles into ALAE and indemnity amounts.
The first type of output we can create is an "ALAE included" cumulative severity distribution. We add the simulated indemnity and ALAE amounts together for each simulated point and rank them to create a univariate distribution. This can come in handy if we are trying to evaluate or simulate multiple layers simultaneously in Excel or some other program but need a single univariate severity distribution. It should be noted that in the ALAE pro-rata reinsurance case, a univariate distribution cannot replicate the loss to the layer for each claim. We export the resulting empirical distribution to a .csv:
loss_alaeinc<-simloss[,1]+simloss[,2]
n_points <-1000
#Define the number of points desired for the final empirical loss plus alae distribution. Should be orders of magnitude less than n_simulations for stability.
cumulative_prob<-c(1:n_points)/n_points
loss_dist_alaeinc <- cbind(quantile(loss_alaeinc, cumulative_prob), cumulative_prob)
write.csv(loss_dist_alaeinc, "loss_dist_alaeinc.csv")
#Output the empirical distribution based on simulated data to a .csv file (assuming this is desired for use in some other program, spreadsheet, etc.)
Step 8a. Creation of a Loss Cost Exhibit: ALAE Included Treatment
The other form of output is a summary of loss statistics for the layers of interest. The final goal is to have an estimate for the expected loss, frequency, severity and standard deviation of loss for each layer as these are typically used in the determination of the price or cost of the reinsurance.
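For reference, in terms of the claim frequency $\lambda$ at the model threshold and the per-claim layered loss $X$ (taken as zero for simulated claims that do not reach the layer), the statistics computed in the code below are the usual compound Poisson quantities:

$$E[S] = \lambda\,E[X], \qquad \text{frequency} = \lambda\,\Pr(X > 0),$$
$$\text{severity} = E[X \mid X > 0], \qquad \sigma_S = \sqrt{\lambda\,E[X^2]},$$

where the last expression assumes Poisson claim counts, matching the comment in the code.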
We start with a .csv file containing the desired reinsurance limit and attachment for each layer of interest, plus blank columns that we will fill in with R. Here is what the input file looks like:
limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 0 | 0 | 0 | 0 |
500000 | 500000 | 0 | 0 | 0 | 0 |
1000000 | 1000000 | 0 | 0 | 0 | 0 |
3000000 | 2000000 | 0 | 0 | 0 | 0 |
5000000 | 5000000 | 0 | 0 | 0 | 0 |
The only additional piece of information we need is the frequency at the model threshold, which is our minimum modeled indemnity loss amount of $100,000. We assume this number has already been estimated and is given. Here is the code for the ALAE included reinsurance case:
freq_at_threshold <- 16
loss_cost_exhibit_alaeinc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))
#This function calculates the reinsurance loss given the ground up loss and alae, limit and attachment.
for (i in seq(length.out=nrow(loss_cost_exhibit_alaeinc))) {
layeredloss <- layer(simloss[,1]+simloss[,2], loss_cost_exhibit_alaeinc$limit[i], loss_cost_exhibit_alaeinc$attachment[i])
#Amount of loss plus ALAE within the layer for each simulated loss.
count <- sum(layeredloss > 0)
#Number of simulated losses over the reinsurance attachment
mean <- mean(layeredloss)
#Average of the simulated layered loss. Note that layered losses of zero are included in the average.
mean_sq <- mean(layeredloss^2)
#Again, zeroes included.
loss_cost_exhibit_alaeinc$expected_loss[i] <- freq_at_threshold * mean
loss_cost_exhibit_alaeinc$frequency[i] <- count * freq_at_threshold / n_simulations
loss_cost_exhibit_alaeinc$severity[i] <- mean * n_simulations / count
#Adjusting for the zeroes included in mean.
loss_cost_exhibit_alaeinc$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
#This assumes Poisson frequency. We use the threshold frequency rather than the layer frequency because the mean of the squared layered losses already includes the simulated losses of zero.
}
loss_cost_exhibit_alaeinc
write.csv(loss_cost_exhibit_alaeinc, "loss_cost_exhibit_output_ALAE_Incl.csv")
If all went well, you should have the following table both in R and in a .csv file:
limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1943577 | 10.9496 | 177502 | 661010 |
500000 | 500000 | 1770682 | 5.8824 | 301014 | 872540 |
1000000 | 1000000 | 1178566 | 2.45936 | 479216 | 984557 |
3000000 | 2000000 | 1003379 | 0.76896 | 1304852 | 1520420 |
5000000 | 5000000 | 261281 | 0.16784 | 1556727 | 935496 |
Step 8b. Creation of a Loss Cost Exhibit: ALAE Pro-Rata Treatment
This code is mostly identical to Step 8a, but the calculations now assume ALAE pro-rata reinsurance treatment, so the layeredloss variable definition changes: it becomes the layered indemnity (from the first simloss column) plus the ALAE amount (the second column) multiplied by the ratio of the layered indemnity to the total indemnity.
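In symbols, with $L$ the ground-up indemnity, $L_{\text{layer}}$ the indemnity ceded to the layer and $A$ the ALAE, the reinsurer's loss under pro-rata treatment is

$$\text{layered loss} = L_{\text{layer}} + A \times \frac{L_{\text{layer}}}{L}.$$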
loss_cost_exhibit_alaepr <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
for (i in seq(length.out=nrow(loss_cost_exhibit_alaepr))) {
layeredindem <- layer(simloss[,1], loss_cost_exhibit_alaepr$limit[i], loss_cost_exhibit_alaepr$attachment[i])
#Amount of indemnity falling in the layer.
layeredloss <- layeredindem + simloss[,2] * layeredindem/simloss[,1]
#Reinsurance loss under ALAE pro-rata treatment: the layered indemnity plus the same share of the ALAE as the reinsured indemnity is of the total indemnity.
count <- sum(layeredloss > 0)
mean <- mean(layeredloss)
mean_sq <- mean(layeredloss^2)
loss_cost_exhibit_alaepr$expected_loss[i] <- freq_at_threshold * mean
loss_cost_exhibit_alaepr$frequency[i] <- count * freq_at_threshold / n_simulations
loss_cost_exhibit_alaepr$severity[i] <- mean * n_simulations / count
loss_cost_exhibit_alaepr$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaepr
write.csv(loss_cost_exhibit_alaepr, "loss_cost_exhibit_output_ALAE_Prorata.csv")
If all went well, you should have the following table both in R and in a .csv file:
limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1856440 | 9.02144 | 205781 | 776859 |
500000 | 500000 | 1594784 | 4.72624 | 337432 | 976762 |
1000000 | 1000000 | 970301 | 1.028 | 943872 | 1077113 |
3000000 | 2000000 | 902565 | 0.66144 | 1364546 | 1722935 |
5000000 | 5000000 | 107998 | 0.12592 | 857669 | 924822 |
Comparison to Classical Method
We should compare the final results of the copula method for modeling ALAE to the classical assumption discussed earlier, namely that ALAE is a fixed percentage of indemnity for every loss. The first step is to determine what that fixed percentage should be. A very simple way is to take the total ALAE in our loss dataset and divide it by the total indemnity. This is a commonly used method and so gives us a fair comparison.
ALAE_load <- 1+ sum(copula_data$alae)/sum(copula_data$loss)
ALAE_load
# We assume that the fixed ALAE load to be applied to each claim is the total ALAE in the dataset divided by the total loss (including below the threshold), which is a typical practice.
The next step is to prepare the loss cost exhibits for ALAE included and pro-rata treatment under the classical assumption:
Step 8c. Creation of a Loss Cost Exhibit: ALAE Included Treatment, Classical Assumption
loss_cost_exhibit_alaeinc_clsc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
for (i in seq(length.out=nrow(loss_cost_exhibit_alaeinc_clsc))) {
layeredloss <- layer(simloss[,1]*ALAE_load, loss_cost_exhibit_alaeinc_clsc$limit[i], loss_cost_exhibit_alaeinc_clsc$attachment[i])
#We load each indemnity amount by the ALAE load and then apply the layering. The rest of the calculations are identical.
count <- sum(layeredloss > 0)
mean <- mean(layeredloss)
mean_sq <- mean(layeredloss^2)
loss_cost_exhibit_alaeinc_clsc$expected_loss[i] <- freq_at_threshold * mean
loss_cost_exhibit_alaeinc_clsc$frequency[i] <- count * freq_at_threshold / n_simulations
loss_cost_exhibit_alaeinc_clsc$severity[i] <- mean * n_simulations / count
loss_cost_exhibit_alaeinc_clsc$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaeinc_clsc
write.csv(loss_cost_exhibit_alaeinc_clsc, "loss_cost_exhibit_output_ALAE_Incl_clsc.csv")
If all went well, you should have the following table both in R and in a .csv file:
limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1902908 | 10.6384 | 178872 | 650561 |
500000 | 500000 | 1832596 | 5.78512 | 316778 | 878039 |
1000000 | 1000000 | 1317938 | 2.51984 | 523024 | 999219 |
3000000 | 2000000 | 1172114 | 0.78608 | 1491087 | 1630258 |
5000000 | 5000000 | 303197 | 0.20064 | 1511148 | 856929 |
Step 8d. Creation of a Loss Cost Exhibit: ALAE Pro-Rata Treatment, Classical Assumption
loss_cost_exhibit_alaepr_clsc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
for (i in seq(length.out=nrow(loss_cost_exhibit_alaepr_clsc))) {
layeredloss <- ALAE_load*layer(simloss[,1], loss_cost_exhibit_alaepr_clsc$limit[i], loss_cost_exhibit_alaepr_clsc$attachment[i])
#Exercise: Work out that this is the correct formula for reinsurance loss in this case.
count <- sum(layeredloss > 0)
mean <- mean(layeredloss)
mean_sq <- mean(layeredloss^2)
loss_cost_exhibit_alaepr_clsc$expected_loss[i] <- freq_at_threshold * mean
loss_cost_exhibit_alaepr_clsc$frequency[i] <- count * freq_at_threshold / n_simulations
loss_cost_exhibit_alaepr_clsc$severity[i] <- mean * n_simulations / count
loss_cost_exhibit_alaepr_clsc$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaepr_clsc
write.csv(loss_cost_exhibit_alaepr_clsc, "loss_cost_exhibit_output_ALAE_Prorata_clsc.csv")
If all went well, you should have the following table both in R and in a .csv file:
limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1956277 | 9.02144 | 216848 | 766881 |
500000 | 500000 | 1726739 | 4.72624 | 365351 | 1013374 |
1000000 | 1000000 | 1062436 | 1.028 | 1033498 | 1135475 |
3000000 | 2000000 | 963839 | 0.66144 | 1457183 | 1719035 |
5000000 | 5000000 | 87520 | 0.12592 | 695043 | 621557 |
Now that we have the loss cost exhibit for each ALAE treatment and method, we can do a quick comparison. The following code generates a table of percentage differences:
100*(loss_cost_exhibit_alaepr/loss_cost_exhibit_alaepr_clsc-1)[,3:6]
You should get a table similar to this for the ALAE pro-rata case:
expected_loss | frequency | severity | std_dev |
---|---|---|---|
-5.150366 | 0 | -5.150366 | 0.1551662 |
-7.610009 | 0 | -7.610009 | -2.7739925 |
-8.051739 | 0 | -8.051739 | -3.8005262 |
-5.527266 | 0 | -5.527266 | 2.7570940 |
18.515967 | 0 | 18.515967 | 49.7287914 |
Note that the frequency doesn't change: under ALAE pro-rata treatment the frequency is determined by the indemnity amount alone, which does not depend on the ALAE. Since the frequency is the same, it makes sense that the difference in loss cost is entirely due to differences in severity, so we see those two differences being equal. But why is the loss cost, or severity, lower for the new method in every layer except the very highest layer, where it is much higher? It could be that for very high indemnity amounts, the tail correlation of the copula tends to draw very high ALAE amounts which, due to the heavy-tailedness of the Pareto distribution, are much greater than the average ALAE implied by the ALAE ratio. Or it could be simulation error, since we have only a finite number of points. In this example, more simulations should be run to increase the stability of the top layer.
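One rough way to gauge how much of the top-layer difference is simulation noise (a supplementary check, not part of the original code) is the Monte Carlo standard error of that layer's expected loss under the copula method:
top_layer_indem <- layer(simloss[,1], 5e6, 5e6)
top_layer_loss <- top_layer_indem + simloss[,2] * top_layer_indem / simloss[,1]
#ALAE pro-rata layered loss for the 5M xs 5M layer, as in the loop above.
freq_at_threshold * sd(top_layer_loss) / sqrt(n_simulations)
#Standard error of the expected loss estimate; it shrinks like 1/sqrt(n_simulations).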
Let's look at the ALAE included case:
100*(loss_cost_exhibit_alaeinc/loss_cost_exhibit_alaeinc_clsc-1)[,3:6]
You should get a table like this:
expected_loss | frequency | severity | std_dev |
---|---|---|---|
2.305544 | 2.9593188 | -0.6349835 | 1.69892766 |
-2.483069 | 2.1675698 | -4.5519714 | -0.05632885 |
-8.914428 | -1.0747844 | -7.9248187 | -0.51161208 |
-12.769818 | -0.5792305 | -12.2616102 | -6.07735758 |
-17.075425 | -17.2059984 | 0.1577083 | 4.78690599 |
This is interesting because we see the loss cost in the lowest layer increase and in the highest layer decrease, which is the opposite of the ALAE pro-rata case. For the low layer, this may be because our model allows large ALAE amounts to occur with small indemnity amounts (with probability determined by the copula), so claims whose indemnity is below the attachment point divided by the ALAE load can still enter the layer, whereas under the classical assumption they could not. For the highest layer we may be seeing the benefit of the partial correlation given by the copula, as opposed to the 100% correlation of the classical assumption.
Conclusion
As you probably noticed from the comparison tables, the classical method does a fine job most of the time (otherwise the alarm would have been sounded already!). What I would like you to take away from this, rather than just blindly implementing the method, is to think about how ALAE has its own distribution and is tail correlated with indemnity. This has implications for particular scenarios, for example a $1M layer attachment when all policy limits (applying only to indemnity) are $1M and the reinsurance covers ALAE included with the loss.
What about layers that attach just above the ALAE load times a common policy limit: do losses from those policy limits really contribute no expected loss to the layer? Occasionally a reinsurance contract will say the ALAE treatment can be either included or pro-rata, whichever the client prefers. What should this cost, and how do the distribution of ALAE and its tail correlation with indemnity affect that cost? With this blog post as a starting point, hopefully you are in a better position to answer those questions.
Greg McNulty, FCAS
SCOR Reinsurance
gmcnulty@scor.com
Comments:
I think Greg's blog would get more exposure and discussion if it were available as a shiny app. If you click on the "Files" link at the bottom of this wikidot page you will see Greg's R code. This would be loaded into RStudio, which would then "compile" it into a shiny app. Also at the bottom you will also see three csv files the program needs. Versions of those three files on one's computer could be selected using shiny drop-down boxes. It has been a few weeks/months since I read Greg's paper, but there may be one or two other defaults in his algorithm that could be changed with shiny selection widgets. RStudio will host the online app for free.
I've had experience building csv file selection boxes in shiny online apps and have been intending to start this project for some time. But it would be more fun to work on this with other people — and might actually get done that way! Let me know if you're interested by replying to this post. Thanks.
Dan
Dan: did anyone ever reply to this? I've recently tried to get back into shiny and this could be fun.
Hi Brian, thanks for the reply. I think you are being modest — upon visiting the word-cloud example in the shiny gallery, I noted PirateGrunt lurking in the footnote!
I've since started blogging about shiny-ing Greg's code on my tri-know-bits site, and have enough material for the next two weeks: displaying all plots, and uploading to shiny's free online hosting service. However, I note that you have also registered for shiny's free service, so it's up for discussion under whose name to upload mauc: yours, mine, or someone else on the committee who might want to give this a try.
Ultimately, I believe mauc users will get a deeper appreciation of copule (Italian plural) in practice if the app could receive users' own data. But first things first.