JuliaActuary
Practical, extensible, and open-source actuarial modeling and analysis.

# Bayesian vs Limited Fluctuation Experience Analysis

Alternate title: Why actuaries should cease to care about the number 1082.

## Introduction

This notebook briefly discusses two approaches to experience analysis (rate setting). One traditional method, Limited Fluctuation Credibility (LFC), has been around for a very long time although was never intended to be the most accurate predictive method. However, many actuaries still discuss the notion of "full" or "partially" credible indicating a reliance on the LFC concepts.

The Bayesian approach uses explicit assumptions about the statistical relationships and all data given to the model to make inferences. Small datasets lead to greater uncertainty, while larger datasets never reach a point that could be considered "fully credible", although the posterior density may narrow considerably.

This notebook argues that the Bayesian approach is superior to the LFC heuristics and should be adopted more widely in actuarial practice.

The example will use an auto-claims dataset and will use the first two years of experience to predict the third.

## Limited Fluctuation Credibility

The Limited Fluctuation Method was so named because it allowed premiums to fluctuate from year to year based on experience, while limiting those fluctuations by giving less than full credibility to premiums based on limited data. In contrast, setting premium rates by giving full credibility to recent experience could be called the Full Fluctuation Method. While every credibility method serves to limit fluctuations, this method acquired its name because it was the first. The Limited Fluctuation Method, also known as Classical Credibility, is the most widely-used credibility method because it can be relatively simple to apply. Outside North America, this method is sometimes referred to as American Credibility.

Quoted from Atkinson, 2019.

### Formulas

The Limited Fluctuation Method components: a credibility weight, $Z$, an Observed Rate, and a Prior Rate.

$$\text{Credibility-Weighted Rate} = Z × \text{Observed Rate} + (1 – Z) × \text{Prior Rate}$$

With probability equal to 0.9 (we'll call this LC_p) that the Observed Rate does not differ from the true rate by more than 0.05 (we'll call this LC_r).

$$\text{Claims for full credibility} = \left(\frac{\text{Z-Score}}{\text{ratio}}\right)^{2}$$

Using the inputs above, we have a z-score corresponding to 0.9 of 1.6448536269514717, so: ( 1.6448536269514717 ÷ 0.05)² ≈ 1082

LC_full_credibility = round(Int,(quantile(Normal(),1-(1-LC_p)/2)/LC_r)^2)
1082

Atkinson goes on to nicely summarize the square root method which assigns full credibility, $Z = 1$, when the number of actual claims equals or exceeds the full credibility threshold of 1082 claims.

When the number of claims is less than full credibility:

$$Z = \sqrt{\frac{\text{no. claims in group}}{\text{no. claims full credibility}}}$$

### Issues with Limited Fluctuation

• Ignores available information:

• Why does our $Z$ not vary by exposures?

• No mechanism for a related group to inform the credibility of another

• Results in a single point estimate with only approximate measure of estimate uncertainty

• Relies on a number of assumptions/constraints:

• That aggregated claims can be approximated by a normal distribution

• The thresholds of 0.9 and 0.05 are arbitrary (e.g. all of the same arguments against p-value thresholds can apply to this approach)

## Bayesian Approach

Bayes' formula of updating the posterior probability conditioned on the observed data:

$$P(A\mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$

This is well known to most actuaries... but generally not applied frequently in practice!

The issue is that once the probability distributions and data become non-trivial, the formula becomes analytically intractable. Work over the last several decades has set the stage for addressing these situations by developing algorithms that let you sample the Bayesian posterior, even if you can't analytically say what that is.

A full overview is beyond the scope of this notebook, which is simply to demonstrate that Limited Fluctuation Credibility is of questionable use in practice and that superior tools exist. For references on modern Bayesian statistics, see the end notes.

Note that this is describing a different, more first-principles approach than the the Buhlman Bayesian approach, which attempts to simply relate group experience to population experience. The Bayesian approach described here is much more general and extensible.

### The formulation

Contrary to Limited Fluctuation, the Bayesian approach forces one to be explicit about the presumed structure of the probability model. The flexibility of the statistical model allows one to incorporate actuarial judgement in a quantitative way. For example, in this example we assume that the claims experience of each group informs a global (hyperparameter) prior distribution which we could use as a starting point for a new type of observation. More on this once the data is introduced.

## Sample Claims Prediction

The data comes from an Allstate auto-claims data via Kaggle. It contains exposure level information about predictor variable and claim amounts for calendar years 2005-2007.

For simplicity, we will focus on the narrow objective of estimating the claims rate at the level of automobile make and model. We will use the years 2005 and 2006 as the training data set and then 2007 to evaluate the predictive model.

The original data is over 13 million rows, we will load an already summarized CSV and split it into train and test sets based on the Calendar_Year.

train,test = let
df.p_observed = df.claims ./ df.n

train = df[df.Calendar_Year .< 2007,:]
test  = df[df.Calendar_Year .== 2007,:]

train, test

end
(2413×5 DataFrame
Row │ Calendar_Year  Blind_Model  n      claims  p_observed
│ Int64          String7      Int64  Int64   Float64
──────┼───────────────────────────────────────────────────────
1 │          2005  K.78          6132      41  0.00668624
2 │          2005  Q.22         35141     270  0.00768333
3 │          2005  AR.41        18719     174  0.00929537
4 │          2006  AR.41        18868     196  0.010388
5 │          2005  D.20          4573      27  0.00590422
⋮   │       ⋮             ⋮         ⋮      ⋮         ⋮
2410 │          2005  AJ.51            1       0  0.0
2411 │          2006  AJ.51            1       0  0.0
2412 │          2005  BQ.6             1       0  0.0
2413 │          2006  BQ.6             1       0  0.0
2404 rows omitted, 1289×5 DataFrame
Row │ Calendar_Year  Blind_Model  n      claims  p_observed
│ Int64          String7      Int64  Int64   Float64
──────┼───────────────────────────────────────────────────────
1 │          2007  X.45         99368     789  0.00794018
2 │          2007  Y.29         55783     435  0.00779807
3 │          2007  P.18          4923      46  0.0093439
4 │          2007  X.40          5573      22  0.0039476
5 │          2007  Y.42         25800     168  0.00651163
⋮   │       ⋮             ⋮         ⋮      ⋮         ⋮
1286 │          2007  AE.6             2       0  0.0
1287 │          2007  CB.6             1       0  0.0
1288 │          2007  BU.32            1       0  0.0
1289 │          2007  CA.5             1       0  0.0
1280 rows omitted)

### Discussion of Bayesian model

Each group (make and model combination) has an expected claim rate $μ_i$, which is informed by the global hyperparameter $μ$ and variance $\sigma^2$. Then, the observed claims by group are assumed to be distributed according to a Poisson distribution.

A complete overview of modern Bayesian models is beyond the scope of this notebook, but a few key points:

• We are forced with the Bayesian approach to be explicit about the assumptions (versus all of the implicit assumptions of alternative techniques like LF)

• We set priors which are the assumed distribution of model parameters before "learning" from the observations. With enough data, it can result in a posterior that the prior was very skeptical of beforehad.

• With the volume of data in the training set, the priors we select here are not that important. Some comments on why they were chosen though:

• The rate of claims is linked with a logistic function, which looks like an integral sign of sorts. logistic(0.0) equals 0.5 while very negative inputs approach 0.0 and positive numbers approach 1.0. We do this so that the rate of claims is always constrained in the range $(0,1)$

• $μ \sim Normal(-2,4)$ says that we expect the population of auto claims rate to be less than 0.5 (logistic(-2) ≈ .12) but very wide range of possible values given the wide standard deviation.

• $σ \sim Exponential(0.25)$ says that we expect the standard devation of an individual group to be positive, but not super wide.

• $μ_i \sim Normal(μ,σ)$ is the actual prior for each group's rate of claim and is informed by the above hyperparameters.

• @model is a Turing.jl macro which enables interpreting the nice ~ syntax, which makes the Julia code look very similar to the traditional mathematical notation.

• Note the use of broadcasting with the dot syntax (e.g. the . in .~). This tells Julia to vectorize and fuse the computation.

• data.claims .~ Poisson.(data.n .* logistic.(μ_i)) means "each value in data.claims is a random outcome (.~) distributed accroding to a corresponding Poisson distribution with $\lambda =n \times \text{logistic}(\mu_i)$ where $text{logistic}(\mu_i)$ is the average claims rate for each group.

@model function model_poisson(data)
# hyperparameter that informs the prior for each group
μ ~ Normal(-2,4)
σ ~ Exponential(0.25)

# the random variable representing the average claim rate for each group
# filldist creates a set of random variable without needing to list out each one
μ_i ~ filldist(Normal(μ,σ),length(unique(data.Blind_Model)))

# use the poisson appproximation to the binomial claim outcome with a
# logisitc link function to keep the probability between 0 and 1
data.claims .~ Poisson.(data.n .* logistic.(μ_i))
end
model_poisson (generic function with 2 methods)

Here's what the prior distribution of claims looks like... and peeking ahead to what the posterior for a single group of claims looks like. See how even though our posterior was very wide (favoring small claims rates), it was dominated by the data to create a very narrow posterior average claims rate.

let
# sample from the priors before learning from any data
ch_prior = sample(mp,Prior(),1500)

f = Figure()
ax = Axis(f[1,1],title="Prior distribution of expected claims rate for a group")
μ = vec(ch_prior["μ_i[1]"].data)
hist!(ax,logistic.(μ);bins=50)
ax2 = Axis(f[2,1],title="Posterior distribution of expected claims rate for a group")
μ = vec(cp["μ_i[1]"].data)
hist!(ax2,logistic.(μ);bins=50)
f
end


The model is combined with the data and Turing.jl is used to computationally arrive at the posterior distribution for the parameters in the statistical model, $\mu$, $\mu_i$, and $\sigma$.

mp = let
# combine the different years in the training set
condensed = @chain train begin
groupby(:Blind_Model)
@combine begin
:n = sum(:n)
:claims = sum(:claims)
end
end
model_poisson(condensed);
end
DynamicPPL.Model{typeof(model_poisson), (:data,), (), (), Tuple{DataFrame}, Tuple{}, DynamicPPL.DefaultContext}(Main.var"workspace#3".model_poisson, (data = 1238×3 DataFrame
Row │ Blind_Model  n       claims
│ String7      Int64   Int64
──────┼─────────────────────────────
1 │ K.78          14058      94
2 │ Q.22          71401     532
3 │ AR.41         37587     370
4 │ D.20          10019      50
5 │ AJ.129       100824     693
⋮   │      ⋮         ⋮       ⋮
1235 │ AQ.1              1       0
1236 │ AJ.96             1       0
1237 │ AJ.51             2       0
1238 │ BQ.6              2       0
1229 rows omitted,), NamedTuple(), DynamicPPL.DefaultContext())

### Bayesian Posterior Sampling

Here's where recent advances in algorithms and computing power make Bayesian analyis possible. We can't analytically compute the posterior distribution, but we can genererate samples from the posterior such that the frequency of the sampled result appears in proportion to the true posterior density. This uses a technique called Markov Chain Monte Carlo, abbreviated MCMC. There are different algorithms in this family, and we will use one called the No-U-Turn Sampler (NUTS for short).

The result of the sample function is a set of data containing data that is generated in proportion to the posterior distributions and we will use this data to make predictions and understand the distribution of our parameters.

# on an M1 Macbook Air, the sampler takes about 10 minutes to run. We use saved results by default in order to avoid uneccesary website build time or to make it easier for users to open the notebook and see results quickly
# change use_saved_results to false in order to rerun the chain within this notebook
cp = let
use_saved_results = true
cp = if use_saved_results
h5open(pth, "r") do f
end
else
chain = sample(mp, NUTS(), 500) # this is the line that runs the MCMC sampler

# uncomment this section to save the chain to the given location
h5open("_data/claims_model_chain.h5", "w") do f
write(f, chain)
end
chain
end
cp
end;

### Visualizing the Posterior Density

This is a plot of the posterior density for all of the many, many $\mu$ parameters in our model. The black line shows the distribution which comes from our hyperparameters μ and σ. In the event of coming across a new group of interest (in our case, a new make of car), this is the prior distribution for the expected claims rate. The model has learned this from the data itself, and serves to regularize predictions.

dplot(cp, "μ")

## Predictions and Results

Here we compare four predictive models:

1. Naive, where the predicted rate for the 2007 year is the average of each groups' for 2005-2006

2. Limited Fluctuation Overall, where the "prior" in the LFC formula is the overall mean claim rate for 2005-2006

3. Limited Fluctuation Year-by-Year where the "prior" in the LFC formula is the claim rate for the ith group in 2005 and updated using 2006 claim rates.

4. Bayesian approach where the predicted claim rate is based on the bayesian model discussed above.

### Predictions

claims_summary_posterior = let
# combine the dataset by model
df = @chain train begin
groupby(:Blind_Model)
@combine begin
:n = sum(:n)
:claims = sum(:claims)
end
end
pop_μ = sum(df.claims) / sum(df.n)
df[!,:pop_μ] .= pop_μ

## Bayesian prediction
# get the mean of the posterior estimate for the ith model group
means = map(1:length(unique(df.Blind_Model))) do i

## Conclusion

This notebook shows that the Bayesian approach results in claims predictions with less total predictive error than the limited fluctuation method.

### Further thoughts

The Bayesian approach could be extended to improve its accuracy even further:

• Using vehicle make data to add more hierarchical structure to the statistical model. For example, one may observe that Porsches experience crashes at a higher rate than Volvos. LFC cannot embed that sort of overlapping hierarchy into its framework.

• The Bayesian hyperparameter provides a framework to think about "unseen" make-model combinations

### Downsides to the Bayesian approach

• Computationally intensive. Complex @models can take very long time to run (many hours), compared to relatively quick frequentest methods like maximum likelihood estimation.

If this notebook has piqued your interest in Bayesian techniques, the following books are recommended learning resources (from easiest to most difficult):

• Statistical Rethinking

• Bayes Rules!

• Bayeisan Data Analysis

## Appendices

### 1. Misc utility functions

"""
dplot(chain,parameter_string)

plot the posterior density for model parameters matching the given string.
"""
function dplot(ch,param_string)
f = Figure()
ax = Axis(f[1, 1])

# plot each group posterior
for (ind, param) in enumerate(ch.name_map.parameters)
if contains(string(param),param_string)
v = vec(getindex(ch, param).data)
density!(ax,logistic.(v), color = (:red, 0.0),
strokecolor = (:red,0.3), strokewidth = 1, strokearound = false, label="Individual group posterior")
end
end

# Plot hyperparameters
hyper_mean = mean(getindex(ch, :μ).data)
hyper_sigma = mean(getindex(ch, :σ).data)
d = density!(ax,logistic.(rand(Normal(hyper_mean,hyper_sigma),1000)),color = (:black, 0.),
strokecolor = (:black,0.3), strokewidth = 3, strokearound = false,label="Hyper-parameter distribution")

xlims!(0.0,0.03)
hideydecorations!(ax)
axislegend(ax,unique=true)
f
end

Additional legwork to get the alternate limited fluctuation approach data:

# An alternate approach to LFC where the first year becomes the prior, adjusted by data from the seonc year.
LF2 = let
# split dataset by year and recombine
df2005 = @subset(train,:Calendar_Year .== 2005)
df2006 = @subset(train,:Calendar_Year .== 2006)
df = outerjoin(df2005,df2006,on=:Blind_Model,renamecols= "_2005" => "_2006")

# use 2005 actuals as LFC prior, and the overall mean if model is missing
μ_2005 = sum(skipmissing(df.claims_2005)) / sum(skipmissing(df.n_2005))
df.assumed = coalesce.(df.p_observed_2005,μ_2005)

df.LF2_Z = min.(1, .√(coalesce.(df.claims_2006,0.) ./ LC_full_credibility))
df.LF2_μ = let Z = df.LF2_Z
# use the 2005 mean if the model not observed in 2006
Z .* coalesce.(df.p_observed_2006,μ_2005) .+ (1 .- Z) .* df.assumed
end
df
end
Calendar_Year_2005 Blind_Model n_2005 claims_2005 p_observed_2005 Calendar_Year_2006 n_2006 claims_2006 p_observed_2006 assumed LF2_Z LF2_μ
1 2005 "AR.41" 18719 174 0.00929537 2006 18868 196 0.010388 0.00929537 0.425613 0.00976039
2 2005 "D.20" 4573 27 0.00590422 2006 5446 23 0.00422328 0.00590422 0.145798 0.00565914
3 2005 "AJ.129" 50021 351 0.00701705 2006 50803 342 0.00673189 0.00701705 0.562211 0.00685673
4 2005 "AQ.17" 42684 278 0.00651298 2006 44018 263 0.00597483 0.00651298 0.49302 0.00624766
5 2005 "BW.3" 37903 215 0.00567237 2006 37752 195 0.00516529 0.00567237 0.424525 0.0054571
6 2005 "BW.167" 19098 136 0.00712116 2006 23639 158 0.00668387 0.00712116 0.382133 0.00695406
7 2005 "Y.9" 47236 383 0.00810822 2006 44970 372 0.00827218 0.00810822 0.586351 0.00820436
8 2005 "BH.29" 7850 85 0.010828 2006 7945 77 0.00969163 0.010828 0.266767 0.0105249
9 2005 "K.2" 21362 174 0.0081453 2006 24834 221 0.00889909 0.0081453 0.451942 0.00848597
10 2005 "N.8" 4215 34 0.00806643 2006 6036 30 0.00497018 0.00806643 0.166513 0.00755086
...
1238 missing "BG.8" missing missing missing 2006 1 0 0.0 0.00748894 0.0 0.00748894

### 2. Packages used

begin
using CSV
using DataFramesMeta
using Distributions
using Turing
using MCMCChains
using PlutoUI; TableOfContents()
using DataFrames
using Logging; Logging.disable_logging(Logging.Warn);
using StatsFuns
using StatisticalRethinking
using CairoMakie
using MCMCChainsStorage, HDF5
end`

Built with Julia 1.8.3 and

CSV 0.10.7
CairoMakie 0.9.1
DataFrames 1.3.6
DataFramesMeta 0.12.0
Distributions 0.25.76
HDF5 0.16.12
MCMCChains 5.5.0
MCMCChainsStorage 0.1.2
PlutoUI 0.7.48
StatisticalRethinking 4.5.2
StatsFuns 1.0.1
Turing 0.21.12