Modeling ALAE Using Copulas


Posted by Greg McNulty on 09 Dec 2015 22:16


Imagine you are a reinsurance pricing actuary tasked with pricing (or costing) an excess of loss contract. A typical method would be to determine the expected number of claims in excess of some threshold, and then to choose a severity distribution representing the probability of different sizes of loss above that threshold. A Pareto curve would be a typical example, or you might use a semi-parametric mixed exponential distribution. Assuming these distributions represent only the ceding company's incurred loss, you can also apply their limit profile to get what's called an exposure estimate of the loss to the layer.

But what about the ceding company's allocated loss adjustment expenses, commonly known as ALAE? For many lines of business these expenses are covered in addition to the insured's policy limit, and that is the case we will assume here. Usually an excess of loss reinsurance contract will cover some of the ceding company's ALAE for claims in the layer, or even below the layer. There are two common reinsurance treatments: ALAE included, which means you add the indemnity and ALAE together and the reinsurer is responsible for however much of that sum is in the layer, and ALAE pro-rata, which means the reinsurer pays the same percentage of the ALAE as it paid of the loss.

So we need to adjust our exposure loss cost estimate for ALAE. The traditional, and still very common, way this is done is to select an overall ratio of ALAE to loss, e.g. 5% or perhaps 20%, and then multiply each indemnity value by that amount to determine the amount of ALAE for that claim. For example, with a 20% ALAE load a $1M indemnity loss would have exactly $200k of ALAE, and every $1M claim would have exactly that same amount of ALAE.

While this seems reasonable, it actually makes two very strong implicit assumptions. It forces the distribution of ALAE to be a scaled copy of the distribution of indemnity, and it forces the two to be 100% correlated. We might suspect that the ALAE distribution is not a scaled copy of the indemnity distribution, especially if there is a significant effect of policy limit capping, which there often is. A $1M indemnity limit and a 20% ALAE load imply a maximum possible incurred ALAE of $200k. We will look at the data further down to evaluate the correlation assumption.
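Both implicit assumptions can be seen in a few lines of R. This is only an illustrative sketch: the lognormal parameters, the $1M policy limit and the 20% load are made-up values, not taken from the dataset used later.

```r
# Illustrative sketch with hypothetical values (not the blog's dataset).
set.seed(1)
policy_limit <- 1e6
alae_load    <- 0.20   # hypothetical fixed ALAE-to-indemnity ratio

# Simulated indemnity amounts, capped at the policy limit.
indemnity <- pmin(rlnorm(1000, meanlog = 12, sdlog = 1.5), policy_limit)

# Classical assumption: ALAE is a fixed percentage of indemnity on every claim.
alae_classical <- alae_load * indemnity

max_alae    <- max(alae_classical)             # cannot exceed alae_load * policy_limit
correlation <- cor(indemnity, alae_classical)  # exactly linear relationship
```

Under these assumptions the largest possible ALAE is the load times the policy limit, and the correlation between indemnity and ALAE is 1 by construction.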

Quick Theory of Bivariate Distributions:

Claim data with indemnity and ALAE information is an example of a bivariate distribution: random points living on the x-y plane with a certain probability distribution. At first, you might recall all the one-dimensional distributions actuaries use (Pareto, gamma, normal, exponential, etc.) and think there must be that many bivariate distributions squared. Luckily, Sklar’s theorem states that a bivariate distribution is completely defined by the marginal distributions of the variables, i.e. the univariate distribution of x by itself and of y by itself, together with the copula which relates their cumulative distribution functions:

\begin{equation} F(x,y)=C(u,v);u= F_X (x), v= F_Y (y) \end{equation}

A copula is simply a bivariate (or multivariate) distribution on the unit square (or cube, etc.). This means that we can fit univariate distributions to ALAE and indemnity each in isolation, something more actuaries are comfortable with, and then fit a copula to the bivariate ALAE-indemnity data transformed to $[0,1]\times[0,1]$ without worrying about any loss of information.
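As a small illustration of why the transformation to the unit square loses nothing about the dependence, the sketch below builds a hypothetical dependent pair (the variable names and parameters are invented for the example), maps each margin to $(0,1)$ by ranks, and checks that a rank-based dependence measure is unchanged.

```r
# Illustrative sketch with simulated, hypothetical data.
set.seed(42)
n <- 500

# A shared factor z drives both margins, so x and y are dependent.
z <- rnorm(n)
x <- exp(12 + z + rnorm(n, sd = 0.8))   # indemnity-like margin
y <- exp(10 + 0.5 * z + rnorm(n))       # ALAE-like margin

# Transform each margin to the unit interval using ranks.
u <- rank(x) / (n + 1)
v <- rank(y) / (n + 1)

# Kendall's tau depends only on the ordering of the points, so it is the
# same before and after the transformation.
tau_raw    <- cor(x, y, method = "kendall")
tau_copula <- cor(u, v, method = "kendall")
```

The margins of `u` and `v` are roughly uniform, so everything that remains in the scatter of `(u, v)` is dependence structure, which is what the copula describes.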

See the references at the bottom for more information.

The Goal

Back to the premise: you are a reinsurance actuary trying to price an excess of loss contract. You already have a severity curve for indemnity, as discussed in the first paragraph, and an estimate of the expected number of claims excess of a certain threshold. You have a dataset of indemnity and ALAE amounts for a set of claims. We will not worry about trend or development (assume everything is trended and at ultimate already). What you are trying to do is refine the traditional assumption of ALAE as a fixed percent of indemnity and use a copula to model the bivariate nature of claims.

What we will do in this blog is present and walk through R code that performs this analysis step by step. If you download the .csv files attached (see link at bottom of page), you should be able to follow along and reproduce the results. We will discuss places where changes could be made as well. The full R code is embedded in the blog and also attached.

Step 1: Load R Libraries Required for Running the Code

Most of the functions we use have thankfully been programmed by somebody else. Loading these packages gives us access to all those functions. R can automatically download these or you might have to manually download and unzip them.
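The library-loading listing for this step does not appear above. As a sketch, the package list below is inferred from the functions called in later steps (`mle` from stats4, `dlgamma` and `dpareto` from actuar, `DiscreteDistribution` from distr, `fitCopula` from copula); treat it as an assumption rather than the original code.

```r
# Assumed package list, inferred from the functions used in later steps.
required_pkgs <- c("stats4", "actuar", "distr", "copula")

# Check which packages are installed without attaching them yet.
installed    <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Report what is missing instead of installing automatically.
  message("Please install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach every package so the later code can call the functions directly.
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```

Missing packages can usually be installed with `install.packages()`.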


Step 2. Input and Setup Data

We started by importing a file containing the loss data with a column for indemnity (I will refer to indemnity as loss in the code, hopefully it will be clear from the context) and ALAE. The first handful of rows are shown below the code.

# Change this to where the .csv file with the data is stored

copula_data <- read.csv("copula_data.csv", header = TRUE)
loss alae
4468750 5571.34
3490000 808363.4
3450000 180757
3425000 713552.2
2895000 394624.5
1958925 36074.6
1626762 326741.5
990000 936229.8
980885.4 13156.44

Here is a quick look at the summary statistics of our data:

loss ALAE
Min. : 100000 Min. : 0
1st Qu.: 132812 1st Qu.: 12390
Median : 197500 Median : 48396
Mean : 374647 Mean : 115491
3rd Qu.: 336250 3rd Qu.: 126048
Max. : 4468750 Max. : 1767626

We removed any ALAE data points at exactly 0. This will allow us to fit curves to the logarithm of the data. An adjustment could be made at the end to add back in the probability of 0 ALAE but I have not done that here.

alae_data <- copula_data$alae

alae_data <- alae_data[alae_data>0] 
#The ALAE data will have a point mass at 0 which our fitted distributions do not account for.

Finally we transformed the data to be in $[0,1]\times[0,1]$ by using the rank function and then dividing by the number of data points plus one. Some of the copula fitting procedures break down if there are ties or repeats in the data, so we applied a tie-break procedure which just randomly selects one of the equal entries to be ranked ahead of the other.

rankdata <- sapply(copula_data , rank, ties.method = "random") / (nrow(copula_data) + 1)
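To see what the rank transformation and the random tie-break do, here is a toy sketch on five hypothetical amounts containing ties. Dividing by the number of points plus one keeps every value strictly between 0 and 1.

```r
# Hypothetical amounts with repeated values.
x <- c(100000, 250000, 250000, 400000, 100000)

set.seed(7)  # the tie-break is random, so fix the seed for reproducibility
u <- rank(x, ties.method = "random") / (length(x) + 1)

# Every observation gets a distinct rank, so u contains the values 1/6, ..., 5/6
# in some order, with tied amounts ordered at random.
```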

A quick look at the data (log scale with linear trendline) showed the 100% correlation assumption to be unrealistic:

plot(copula_data, log="yx")
abline(lm(copula_data[,2]~copula_data[,1]), untf = TRUE)

Step 3. ALAE Distribution Fitting

There are so many resources discussing curve fitting for univariate data that I won’t go into the theory, but the following code took the ALAE data and used maximum likelihood to fit Pareto, log-gamma, Weibull and lognormal distributions:

nLL1 <- function(mu, sigma) -sum(stats::dlnorm(alae_data, mu, sigma, log = TRUE)) 
# This defines a function giving the negative log-likelihood of the data for a given set of distribution parameters. This line is for the log-normal distribution.
alae_fit1<-mle(nLL1, start = list(mu = 10, sigma = 1), method = "L-BFGS-B", nobs = length(alae_data), lower = c(1,0.01)) 
#This finds the distribution parameters minimizing the function defined above, i.e. the maximum likelihood fit.

nLL2 <- function(shape, scale) -sum(stats::dweibull(alae_data, shape, scale, log = TRUE)) 
# Same as above but for the Weibull distribution.
alae_fit2<-mle(nLL2, start = list(shape = 1, scale = 50000), method = "L-BFGS-B", nobs = length(alae_data), lower = c(0.1,100))

nLL3 <- function(shapelog, ratelog) -sum(actuar::dlgamma(alae_data, shapelog, ratelog, log = TRUE)) 
#Log-gamma distribution.
alae_fit3<-mle(nLL3, start = list(shapelog = 60, ratelog = 1), method = "L-BFGS-B", nobs = length(alae_data), lower = c(0.01,0.01))

nLL4 <- function(shape, scale) -sum(actuar::dpareto(alae_data, shape, scale, log = TRUE)) 
# Pareto distribution.
alae_fit4<-mle(nLL4, start = list(shape = 1.1, scale = 1000), method = "L-BFGS-B", nobs = length(alae_data), lower = c(1,100))

# The following code creates a graph displaying each of the fitted curves against the empirical distribution of the data. The x-axis is the ALAE amount and the y-axis is the cumulative probability that ALAE is at most x.

x<-seq(0,max(alae_data),max(alae_data)/1000) # This defines the x-axis range for the following graph to encompass all ALAE data.

plot(x,plnorm(x,coef(alae_fit1)[1], coef(alae_fit1)[2]),type="l",col="red", main="ECDF vs Curve Fits")

#We give each distribution a different color as indicated in the code below.
lines(x,pweibull(x,coef(alae_fit2)[1], coef(alae_fit2)[2]),type="l",col="blue")
lines(x,plgamma(x,coef(alae_fit3)[1], coef(alae_fit3)[2]),type="l",col="orange")
lines(x,ppareto(x,coef(alae_fit4)[1], coef(alae_fit4)[2]),type="l",col="green")
lines(ecdf(alae_data), do.points = FALSE) # Assumed reconstruction: overlays the empirical CDF (in black) that the plot title refers to; this line is not in the listing as posted.

# At this point the user may select which fit they think is the best. I typically found the Pareto to be the best fit and so the rest of the code assumes the Pareto distribution is chosen. The code can be modified to make a different selection.
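Besides judging the fits by eye, one common way to compare candidate distributions is an information criterion. The sketch below is not part of the original analysis: it uses simulated stand-in data and base R's `optim` rather than `mle`, and compares a lognormal and a Weibull fit by AIC.

```r
# Illustrative sketch on simulated stand-in data (not the blog's ALAE data).
set.seed(123)
alae_sim <- rlnorm(2000, meanlog = 10.5, sdlog = 1.6)

# Negative log-likelihoods. Positive parameters are optimised on the log scale
# so the optimiser cannot step into invalid (negative) values.
nll_lnorm   <- function(p) -sum(dlnorm(alae_sim, p[1], exp(p[2]), log = TRUE))
nll_weibull <- function(p) -sum(dweibull(alae_sim, exp(p[1]), exp(p[2]), log = TRUE))

fit_lnorm   <- optim(c(10, 0), nll_lnorm)
fit_weibull <- optim(c(0, log(50000)), nll_weibull)

# AIC = 2 * (number of parameters) + 2 * (negative log-likelihood); lower is better.
aic <- c(lognormal = 2 * 2 + 2 * fit_lnorm$value,
         weibull   = 2 * 2 + 2 * fit_weibull$value)
best_fit <- names(which.min(aic))
```

For the fits in this post, which are `mle` objects from stats4, calling `AIC()` on each fitted object should give the same kind of comparison.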


If everything worked correctly you should see this graph:


The graph shows each of the four fitted distributions against the actual cumulative distribution of ALAE amounts. In practice the Pareto often seems to be the best fit, as it is here in green.

Step 4. Copula Fitting

There are two aspects to copula fitting. Just like with univariate distributions, there are different families of distributions, and within a family any given dataset will have a best-fit member (according to some goodness-of-fit measure). From my own experimentation I found that the best fitting copula family for various datasets of liability indemnity and ALAE was the Gumbel copula. This was also chosen as the best family in Frees and Valdez [3], Micocci [9] and Venter [12]. The Gumbel has the desirable properties of being single parameter, being an extreme-value copula (meaning it's appropriate for right tailed truncated data as we often work with in reinsurance), and having closed form expressions for Kendall's tau rank correlation and for upper tail correlation.

So for this exercise we assumed the Gumbel to be the correct copula family and then fit the best parameter (as it is a single parameter copula) using the maximum likelihood option in the 'copula' package's fitCopula method:

fitted_copula<-fitCopula(gumbelCopula(dim=2), rankdata, method = "ml") 
#We have assumed use of the Gumbel copula.


theta <- fitted_copula@copula@parameters

tail_correlation <- 2 - 2^(1/theta)
# The upper tail correlation is one way of uniquely describing a member of a one-parameter copula family (e.g. Gumbel). 
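The relationship between the Gumbel parameter and its summary measures is simple enough to sketch directly. The helper names below are hypothetical and use only base R: the upper tail measure is the same formula as in the code above, Kendall's tau for the Gumbel copula is $1 - 1/\theta$, and the last helper inverts the tail formula.

```r
# Closed-form summaries for a Gumbel copula with parameter theta >= 1.
# (Helper names are hypothetical; they are not part of the 'copula' package.)
gumbel_upper_tail  <- function(theta) 2 - 2^(1 / theta)
gumbel_kendall_tau <- function(theta) 1 - 1 / theta

# Inverse relationship: the theta implied by an upper tail value lambda in [0, 1).
gumbel_theta_from_tail <- function(lambda) log(2) / log(2 - lambda)

# Parameters corresponding to upper tail values of 0.2, 0.3 and 0.4.
theta_grid <- gumbel_theta_from_tail(c(0.2, 0.3, 0.4))
tau_grid   <- gumbel_kendall_tau(theta_grid)
```

At `theta = 1` both measures are zero, which is the independence case; both increase toward 1 as `theta` grows.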


In my trials with different datasets of casualty large loss data, I had fitted tail correlations of between 0.2 and 0.4. This is on a scale of 0 to 1 for the upper tail correlation measure, as opposed to -1 to 1 for the traditional Pearson correlation measure. After running the code you should have seen a plot of the empirical copula data:


You can see the upper tail correlation since there is a cluster of points in the upper right hand corner. That means that when indemnity is large or in the upper quantiles, then ALAE also tends to be relatively large, or in the upper quantiles of the ALAE distribution. You can also see even more distinctly the lack of points in the upper-left and lower-right corners. This means that it is very rare to have a small ALAE amount accompanying a large loss amount and vice versa. If ALAE and indemnity were independent, then the points would be uniformly scattered across the entire square and you would see a similar number of points in each of the four corners.

Step 5. Indemnity Exposure Curve Loading

indexpo_data<-read.csv("indemnity_curve.csv", header = FALSE) 
#This assumes the user already has an indemnity distribution, i.e. exposure curve, they want to use.

indexpo_data[1001, 2] <- 1

supp = indexpo_data[2:1001, 1]
probs = indexpo_data[, 2][-1] -  indexpo_data[1:1000, 2]

indemnity<-DiscreteDistribution(supp, probs) 
#This just defines the cumulative distribution function of the exposure curve as a distribution object in R.

As mentioned above, we are not fitting a curve to the indemnity data; we are assuming that you already have an empirical distribution representing projected indemnity amounts. This is usually based on the types of business written by the ceding company and at what limits.

The minimum value in the indemnity severity distribution is $100,000. This is what we will call the model threshold. This just simplifies the analysis by not requiring us to know about the severity distribution far below the reinsurance attachment point.
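The conversion from a table of cumulative probabilities to a discrete distribution can be sketched in base R. The four-point curve below is made up for illustration, and `q_indemnity` is a hypothetical helper standing in for the quantile function that the distr object provides.

```r
# Hypothetical exposure curve: loss amounts and cumulative probabilities F(x).
amount   <- c(100000, 250000, 500000, 1000000)
cum_prob <- c(0.00, 0.55, 0.80, 1.00)

# As in the code above, the first row is the threshold and the probability
# mass for each later amount is the increase in the cumulative probability.
supp  <- amount[-1]
probs <- diff(cum_prob)

# Quantile function: the smallest support point whose cumulative probability
# is at least u.
q_indemnity <- function(u) {
  supp[findInterval(u, cumsum(probs), left.open = TRUE) + 1]
}

example_draws <- q_indemnity(c(0.10, 0.60, 0.99))
```

Here a percentile of 0.10 maps to 250,000, 0.60 maps to 500,000 and 0.99 maps to 1,000,000.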

Step 6. Plot the Selected Bivariate Distribution Against the Actual Data

With both marginal distributions and a copula we now have a full model for the ALAE-indemnity bivariate distribution. We can plot some simulated data against the actual data to visually assess the model for any blatant errors:

simcopula <- rCopula(nrow(copula_data), fitted_copula@copula) 
# This simulates as many random draws from the fitted copula as there are in the original dataset.

simloss <- cbind(distr::q(indemnity)(simcopula[,1]), qpareto(simcopula[,2], alae_fit4@coef["shape"], alae_fit4@coef["scale"])) 
# This uses the cumulative ALAE and loss distributions to transform the copula data, which is in terms of percentiles, into loss/ALAE amounts.

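plot(copula_data, xlim = c(0, 10^7), ylim = c(0, 1.5*10^6), main = "input data v simulated data", xlab = "indemnity", ylab = "ALAE") 
par(new = TRUE) 
# Keep the first plot on the device so the simulated points are drawn over it instead of replacing it.
plot(simloss, axes = FALSE, type = "p", col=2, xlim = c(0, 10^7), ylim = c(0, 1.5*10^6), xlab = "", ylab = "")
# This is a plot of the actual data versus simulated data (simulated points in color 2).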

If all went according to plan, you should see a plot similar to this:


Of course, the above plot may be of limited value because we’ve only simulated as many points from the fitted distribution as there are points in the original dataset, so even two such plots from the same fitted distribution may look very different.

Step 7. Simulation and Creation of a Loss Plus ALAE Distribution

The final output is based on empirical simulation, not theoretical integration, so the first step is to simulate from our fitted bivariate distribution:

n_simulations <-100000 
# define the number of simulations to do for creation of the curve

simcopula <- rCopula(n_simulations, fitted_copula@copula) 

simloss <- cbind(distr::q(indemnity)(simcopula[,1]), qpareto(simcopula[,2], alae_fit4@coef["shape"], alae_fit4@coef["scale"]))

Interestingly, we need to simulate only from the copula distribution. These points live on the unit square; we then use the “q” (quantile) function of each fitted marginal distribution to convert from cumulative percentiles to amounts of indemnity and ALAE.
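The percentile-to-amount conversion is ordinary inverse-transform sampling. As a sketch, the helper functions below write out the Pareto CDF in the parameterisation that actuar uses, $F(x) = 1 - (\text{scale}/(x+\text{scale}))^{\text{shape}}$, and its inverse. The function names and the parameter values are hypothetical.

```r
# Pareto CDF and quantile function in the shape/scale form used by actuar.
# (Hypothetical helper names, written out so the example needs only base R.)
pareto_cdf      <- function(x, shape, scale) 1 - (scale / (x + scale))^shape
pareto_quantile <- function(u, shape, scale) scale * ((1 - u)^(-1 / shape) - 1)

shape <- 1.5     # hypothetical fitted parameters
scale <- 80000

# Percentiles such as one column of a copula draw...
u <- c(0.10, 0.50, 0.90, 0.99)
# ...become ALAE amounts via the quantile function.
x <- pareto_quantile(u, shape, scale)

# Applying the CDF to the amounts recovers the original percentiles.
u_check <- pareto_cdf(x, shape, scale)
```

Because the quantile function is increasing, the ordering of the copula draws, and therefore their dependence structure, carries over to the simulated amounts.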

The first type of output we can create is an “ALAE Included” severity cumulative distribution. We add the simulated indemnity and ALAE amounts together for each simulated point, and rank them to create a univariate distribution. This could come in handy if we are trying to evaluate or simulate multiple layers simultaneously in Excel or some other program, but need a single univariate severity distribution. It should be noted that in the ALAE pro-rata reinsurance case, a univariate distribution cannot replicate the loss to the layer for each claim. We export the resulting empirical distribution to a .csv:


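n_points <-1000 
#Define the number of points desired for the final empirical loss plus alae distribution. Should be orders of magnitude less than n_simulations for stability.

loss_alaeinc <- simloss[,1] + simloss[,2] 
#Simulated indemnity plus ALAE for each claim. This definition is missing from the listing as posted and is reconstructed from the description above.

cumulative_prob <- (1:n_points) / n_points 
#Evenly spaced cumulative probabilities at which to read off the distribution. Also reconstructed; the original spacing is an assumption.

loss_dist_alaeinc <- cbind(quantile(loss_alaeinc, cumulative_prob), cumulative_prob)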

write.csv(loss_dist_alaeinc, "loss_dist_alaeinc.csv") 
#Output the empirical distribution based on simulated data to a .csv file (assuming this is desired for use in some other program, spreadsheet, etc.)

Step 8a. Creation of a Loss Cost Exhibit: ALAE Included Treatment

The other form of output is a summary of loss statistics for the layers of interest. The final goal is to have an estimate for the expected loss, frequency, severity and standard deviation of loss for each layer as these are typically used in the determination of the price or cost of the reinsurance.

We start with a .csv file with the desired reinsurance limit and attachment for the layers of interest. We also have blank spaces where we will use R to enter the statistics of interest. Here is what the input file looks like:

limit attachment expected_loss frequency severity std_dev
250000 250000 0 0 0 0
500000 500000 0 0 0 0
1000000 1000000 0 0 0 0
3000000 2000000 0 0 0 0
5000000 5000000 0 0 0 0

The only additional item of information we need is the frequency at the model threshold, which is our minimum modeled indemnity loss amount of $100,000. We assume this number has been estimated already and is given. Here is the code in the ALAE included reinsurance case:

freq_at_threshold <- 16

loss_cost_exhibit_alaeinc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)

layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))  
#This function calculates the reinsurance loss given the ground up loss and alae, limit and attachment.

for (i in seq(length.out=nrow(loss_cost_exhibit_alaeinc))) {
  layeredloss <- layer(simloss[,1]+simloss[,2], loss_cost_exhibit_alaeinc$limit[i], loss_cost_exhibit_alaeinc$attachment[i]) 
#Amount of loss plus ALAE within the layer for each simulated loss.

  count <- sum(layeredloss > 0) 
#Number of simulated losses over the reinsurance attachment
  mean <- mean(layeredloss) 
#Average of the simulated layered loss. Note that layered losses of zero are included in the average.
  mean_sq <- mean(layeredloss^2) 
#Again, zeroes included.

  loss_cost_exhibit_alaeinc$expected_loss[i] <- freq_at_threshold * mean 
  loss_cost_exhibit_alaeinc$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaeinc$severity[i] <- mean * n_simulations / count 
#Adjusting for the zeroes included in mean.

  loss_cost_exhibit_alaeinc$std_dev[i] <- sqrt(mean_sq * freq_at_threshold) 
#This assumes Poisson frequency. We use the threshold frequency instead of the layer frequency because mean_sq includes simulated losses of zero.
}


write.csv(loss_cost_exhibit_alaeinc, "loss_cost_exhibit_output_ALAE_Incl.csv")

If all went well, you should have the following table both in R and in a .csv file:

limit attachment expected_loss frequency severity std_dev
250000 250000 1943577 10.9496 177502 661010
500000 500000 1770682 5.8824 301014 872540
1000000 1000000 1178566 2.45936 479216 984557
3000000 2000000 1003379 0.76896 1304852 1520420
5000000 5000000 261281 0.16784 1556727 935496
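To make the layering and the exhibit formulas concrete, here is a toy sketch on four hypothetical loss plus ALAE amounts and a 500k xs 500k layer. It uses the same `layer` function and the same compound Poisson formulas as the loop above, with a made-up threshold frequency.

```r
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))

# Hypothetical ground-up loss plus ALAE amounts.
gross   <- c(200000, 600000, 1200000, 4000000)
layered <- layer(gross, limit = 500000, attachment = 500000)
# 200k is below the attachment, 600k pierces the layer by 100k,
# and the last two amounts exhaust the 500k limit.

# Compound Poisson aggregate with frequency lambda at the model threshold:
# mean = lambda * E[X] and variance = lambda * E[X^2], where X is the
# layered loss per claim including the zeroes.
lambda        <- 16
expected_loss <- lambda * mean(layered)
std_dev       <- sqrt(lambda * mean(layered^2))

# Layer frequency and severity, as in the exhibit.
frequency <- lambda * mean(layered > 0)
severity  <- expected_loss / frequency
```

Here three of the four claims enter the layer, so the layer frequency is 12 and the expected loss of 4.4M equals frequency times severity.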

Step 8b. Creation of a Loss Cost Exhibit: ALAE Pro-Rata Treatment

This code is mostly identical to Step 8a, but we do the calculations assuming ALAE pro-rata reinsurance treatment. We need to change the layeredloss variable definition. It is the layered indemnity (computed from the first simloss column) plus the ALAE amount (second column) multiplied by the ratio of the layered indemnity to the total indemnity.
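A single hypothetical claim shows how the two treatments differ. The amounts and the layer below are made up for illustration.

```r
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))

# One hypothetical claim and a 500k xs 500k layer.
indemnity  <- 800000
alae       <- 200000
limit      <- 500000
attachment <- 500000

# ALAE included: layer the sum of indemnity and ALAE.
loss_included <- layer(indemnity + alae, limit, attachment)

# ALAE pro-rata: layer the indemnity alone, then add the same share of ALAE
# as the layered indemnity is of total indemnity.
layered_indem <- layer(indemnity, limit, attachment)
loss_prorata  <- layered_indem + alae * layered_indem / indemnity
```

With these numbers the layered indemnity is 300,000, which is 37.5% of the indemnity, so the pro-rata loss is 300,000 + 75,000 = 375,000, while the ALAE included loss is the full 500,000 limit.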

loss_cost_exhibit_alaepr <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)

for (i in seq(length.out=nrow(loss_cost_exhibit_alaepr))) {
  layeredindem <- layer(simloss[,1], loss_cost_exhibit_alaepr$limit[i], loss_cost_exhibit_alaepr$attachment[i]) 
#Amount of indemnity falling in the layer.
  layeredloss <- layeredindem + simloss[,2] * layeredindem/simloss[,1] 
#Reinsurance loss under ALAE pro-rata treatment: layered indemnity plus the same share of ALAE as the layered indemnity is of total indemnity.

  count <- sum(layeredloss > 0) 
  mean <- mean(layeredloss) 
  mean_sq <- mean(layeredloss^2) 

  loss_cost_exhibit_alaepr$expected_loss[i] <- freq_at_threshold * mean 
  loss_cost_exhibit_alaepr$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaepr$severity[i] <- mean * n_simulations / count 
  loss_cost_exhibit_alaepr$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
}

write.csv(loss_cost_exhibit_alaepr, "loss_cost_exhibit_output_ALAE_Prorata.csv")

If all went well, you should have the following table both in R and in a .csv file:

limit attachment expected_loss frequency severity std_dev
250000 250000 1856440 9.02144 205781 776859
500000 500000 1594784 4.72624 337432 976762
1000000 1000000 970301 1.028 943872 1077113
3000000 2000000 902565 0.66144 1364546 1722935
5000000 5000000 107998 0.12592 857669 924822

Comparison to Classical Method

We should compare the final results of the copula method for modeling ALAE to the classical assumption we talked about before. The classical assumption is that ALAE is a fixed percent of indemnity for every loss. The first step is to determine what that fixed percentage should be. A very simple way is to take the total ALAE in our loss dataset and divide by the total indemnity. This is a commonly used method and so will give us a fair comparison.

ALAE_load <- 1+ sum(copula_data$alae)/sum(copula_data$loss) 
# We assume that the fixed ALAE load to be applied to each claim is the total ALAE in the dataset divided by the total loss (including below the threshold), which is a typical practice.

The next step is to prepare the loss cost exhibits for ALAE included and pro-rata treatment under the classical assumption:

Step 8c. Creation of a Loss Cost Exhibit: ALAE Included Treatment, Classical Assumption

loss_cost_exhibit_alaeinc_clsc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)

for (i in seq(length.out=nrow(loss_cost_exhibit_alaeinc_clsc))) {
  layeredloss <- layer(simloss[,1]*ALAE_load, loss_cost_exhibit_alaeinc_clsc$limit[i], loss_cost_exhibit_alaeinc_clsc$attachment[i]) 
#We load each indemnity amount by the ALAE load and then apply the layering. The rest of the calculations are identical.

  count <- sum(layeredloss > 0) 
  mean <- mean(layeredloss) 
  mean_sq <- mean(layeredloss^2) 

  loss_cost_exhibit_alaeinc_clsc$expected_loss[i] <- freq_at_threshold * mean 
  loss_cost_exhibit_alaeinc_clsc$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaeinc_clsc$severity[i] <- mean * n_simulations / count 
  loss_cost_exhibit_alaeinc_clsc$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
}

write.csv(loss_cost_exhibit_alaeinc_clsc, "loss_cost_exhibit_output_ALAE_Incl_clsc.csv")

If all went well, you should have the following table both in R and in a .csv file:

limit attachment expected_loss frequency severity std_dev
250000 250000 1902908 10.6384 178872 650561
500000 500000 1832596 5.78512 316778 878039
1000000 1000000 1317938 2.51984 523024 999219
3000000 2000000 1172114 0.78608 1491087 1630258
5000000 5000000 303197 0.20064 1511148 856929

Step 8d. Creation of a Loss Cost Exhibit: ALAE Pro-Rata Treatment, Classical Assumption

loss_cost_exhibit_alaepr_clsc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)

for (i in seq(length.out=nrow(loss_cost_exhibit_alaepr_clsc))) {
 layeredloss <- ALAE_load*layer(simloss[,1], loss_cost_exhibit_alaepr_clsc$limit[i], loss_cost_exhibit_alaepr_clsc$attachment[i]) 
#Exercise: Work out that this is the correct formula for reinsurance loss in this case.

  count <- sum(layeredloss > 0) 
  mean <- mean(layeredloss) 
  mean_sq <- mean(layeredloss^2) 

  loss_cost_exhibit_alaepr_clsc$expected_loss[i] <- freq_at_threshold * mean 
  loss_cost_exhibit_alaepr_clsc$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaepr_clsc$severity[i] <- mean * n_simulations / count 
  loss_cost_exhibit_alaepr_clsc$std_dev[i] <- sqrt(mean_sq * freq_at_threshold)
}

write.csv(loss_cost_exhibit_alaepr_clsc, "loss_cost_exhibit_output_ALAE_Prorata_clsc.csv")
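For readers who want to check the exercise in the code comment above, here is one way to work it. The notation is introduced only for this sketch: $x$ is the indemnity on a claim, $\ell(x)$ is the indemnity in the layer, and $r$ is the fixed ALAE ratio, so that ALAE_load $= 1 + r$ and the classical ALAE on the claim is $r x$. Under pro-rata treatment the reinsurer pays the layered indemnity plus the share $\ell(x)/x$ of the ALAE:

\begin{equation} \ell(x) + r x \cdot \frac{\ell(x)}{x} = (1 + r)\,\ell(x) \end{equation}

The indemnity $x$ cancels, so loading the layered indemnity by ALAE_load gives the same result as layering first and then allocating ALAE pro-rata.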

If all went well, you should have the following table both in R and in a .csv file:

limit attachment expected_loss frequency severity std_dev
250000 250000 1956277 9.02144 216848 766881
500000 500000 1726739 4.72624 365351 1013374
1000000 1000000 1062436 1.028 1033498 1135475
3000000 2000000 963839 0.66144 1457183 1719035
5000000 5000000 87520 0.12592 695043 621557

Now that we have the loss cost exhibit for each ALAE treatment and method, we can do a quick comparison. The following code generates a table of percentage differences:
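The comparison listing does not appear here. A minimal sketch of such a comparison follows, assuming the percentage difference is measured as the copula result relative to the classical result. That direction is an assumption, `pct_diff` is a hypothetical helper, and the two small exhibits are made-up numbers for illustration only.

```r
# Percentage difference of the copula exhibit relative to the classical
# exhibit, column by column. The direction (copula / classical - 1) is an
# assumption, not confirmed by the original listing.
pct_diff <- function(copula_exhibit, classical_exhibit,
                     cols = c("expected_loss", "frequency", "severity", "std_dev")) {
  100 * (copula_exhibit[, cols] / classical_exhibit[, cols] - 1)
}

# Tiny made-up exhibits, for illustration only.
copula_ex    <- data.frame(expected_loss = c(950, 1200), frequency = c(9, 1),
                           severity = c(95, 1200), std_dev = c(400, 900))
classical_ex <- data.frame(expected_loss = c(1000, 1000), frequency = c(9, 1),
                           severity = c(100, 1000), std_dev = c(400, 750))

comparison <- pct_diff(copula_ex, classical_ex)
```

With the objects built in Steps 8b and 8d, the call would be along the lines of `pct_diff(loss_cost_exhibit_alaepr, loss_cost_exhibit_alaepr_clsc)`.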


You should get a table similar to this for the ALAE pro-rata case:

expected_loss frequency severity std_dev
-5.150366 0 -5.150366 0.1551662
-7.610009 0 -7.610009 -2.7739925
-8.051739 0 -8.051739 -3.8005262
-5.527266 0 -5.527266 2.7570940
18.515967 0 18.515967 49.7287914

Note that the frequency doesn't change because under ALAE pro-rata treatment the frequency is determined by the indemnity amount only, which does not depend on the ALAE. Since the frequency is the same, the difference in loss cost is entirely due to differences in severity, so those two columns are equal. But why is the loss, or severity, lower under the new method for every layer except the very highest, where it is much higher? It could be that for very high indemnity amounts, the tail correlation of the copula tends to draw very high ALAE amounts which, due to the heavy-tailedness of the Pareto distribution, are much greater than the average ALAE implied by the ALAE ratio. Or it could be simulation error, since we have only a finite number of points. In this example, more simulations should be done to increase the stability of the top layer.
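One rough way to judge whether a top-layer difference could be simulation noise is to look at the Monte Carlo standard error of the simulated layer mean. The sketch below uses a hypothetical heavy-tailed severity, not the fitted model from this post, and compares a small and a large simulation for a 5M xs 5M layer.

```r
set.seed(2015)
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))

# Hypothetical heavy-tailed severities above a 100k threshold, simulated by
# inverse transform from a single-parameter Pareto with shape 1.2.
sim_severity <- function(n) 100000 * runif(n)^(-1 / 1.2)

# Mean layered loss per claim and its Monte Carlo standard error.
mc_summary <- function(n, limit = 5e6, attachment = 5e6) {
  layered <- layer(sim_severity(n), limit, attachment)
  c(mean = mean(layered), std_error = sd(layered) / sqrt(n))
}

small_run <- mc_summary(1e4)
large_run <- mc_summary(1e6)
```

The standard error shrinks roughly with the square root of the number of simulations, so 100 times as many simulations gives about a tenfold reduction. Comparing the standard error with the size of the difference between methods gives a sense of how much of that difference is noise.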

Let's look at the ALAE included case:


You should get a table like this:

expected_loss frequency severity std_dev
2.305544 2.9593188 -0.6349835 1.69892766
-2.483069 2.1675698 -4.5519714 -0.05632885
-8.914428 -1.0747844 -7.9248187 -0.51161208
-12.769818 -0.5792305 -12.2616102 -6.07735758
-17.075425 -17.2059984 0.1577083 4.78690599

This is interesting because we see the loss cost in the lowest layer increase and in the highest layer decrease, which is the opposite of the ALAE pro-rata case. For the low layer, this may be because our model allows large ALAE amounts to occur with small indemnity amounts (with probability determined by the copula), so claims whose indemnity is below the attachment divided by the ALAE load can still enter the layer, whereas under the classical assumption they could not. For the highest layer we may be getting the benefit of the partial correlation given by the copula, as opposed to the 100% correlation in the classical assumption.


As you probably noticed from the comparison table, the classical method is doing a fine job most of the time (otherwise the alarm would have been sounded already!). What I would like you to take away from this, rather than just blindly implementing the method, is to think about how ALAE has its own distribution and is tail correlated with indemnity. This has implications for certain particular scenarios, for example a $1M layer attachment when all policy limits (applying only to indemnity) are $1M and the reinsurance covers ALAE included with the loss.

What about layers that attach just above the ALAE load times a common policy limit: do losses from those policy limits really contribute no expected loss to the layer? Occasionally a reinsurance contract will say the ALAE treatment can be either included or pro-rata, whichever the client prefers. What should this cost, and how do the distribution of ALAE and its tail correlation with indemnity affect that cost? With this blog post as a starting point, hopefully you are in a better position to answer those questions.

Greg McNulty, FCAS
SCOR Reinsurance

1. Camphausen, F. et al. “Package ‘distr’”. Version 2.4. February 7, 2013.
2. Dutang, C. et al. “actuar: An R Package for Actuarial Science”. Journal of Statistical Software. March 2008, Volume 25, Issue 7.
3. Frees, E.; Valdez, E. “Understanding Relationships Using Copulas”. North American Actuarial Journal, Volume 2, Number 1. 1998.
4. Genest, C.; MacKay, J. “The Joy of Copulas: Bivariate Distributions with Uniform Marginals”. The American Statistician, Volume 40, Issue 4 (Nov., 1986),280-283.
5. Geyer, C. “Maximum Likelihood in R”. September 30, 2003.
6. Hofert, M. et al. “Package ‘copula’”. Version 0.999-5, November 2012.
7. Joe, Harry. Multivariate Models and Dependence Concepts. Monographs on Statistics and Probability 73, Chapman & Hall/CRC, 2001.
8. Kojadinovic, I.; Yan, J. “Modeling Multivariate Distributions with Continuous Margins Using the copula R Package”. Journal of Statistical Software. May 2010, Volume 34, Issue 9.
9. Micocci, M.; Masala, G. “Loss-ALAE modeling through a copula dependence structure”. Investment Management and Financial Innovations. Volume 6, Issue 4, 2009.
10. Ricci, V. “Fitting Distributions with R”. Release 0.4-21, February 2005.
11. Ruckdeschel, P. et al. “S4 Classes for Distributions—a manual for packages "distr", "distrEx", "distrEllipse", "distrMod", "distrSim", "distrTEst", "distrTeach", version 2.4”. February 5, 2013.
12. Venter, G. “Tails of Copulas”. Proceedings of the Casualty Actuarial Society. Arlington, Virginia. 2002: LXXXIX, 68-113.
13. Yan, J. “Enjoy the Joy of Copulas: With a Package copula”. Journal of Statistical Software. October 2007, Volume 21, Issue 4.
