JuliaActuary
Practical, extensible, and open-source actuarial modeling and analysis.

Bayesian vs Limited Fluctuation Experience Analysis

Alternate title: Why actuaries should cease to care about the number 1082.

Introduction

This notebook briefly discusses two approaches to experience analysis (rate setting). One traditional method, Limited Fluctuation Credibility (LFC), has been around for a very long time, although it was never intended to be the most accurate predictive method. Nevertheless, many actuaries still discuss the notion of "full" or "partial" credibility, indicating a continued reliance on LFC concepts.

The Bayesian approach uses explicit assumptions about the statistical relationships and all data given to the model to make inferences. Small datasets lead to greater uncertainty, while larger datasets never reach a point that could be considered "fully credible", although the posterior density may narrow considerably.

This notebook argues that the Bayesian approach is superior to the LFC heuristics and should be adopted more widely in actuarial practice.

The example will use an auto-claims dataset and will use the first two years of experience to predict the third.

Limited Fluctuation Credibility

The Limited Fluctuation Method was so named because it allowed premiums to fluctuate from year to year based on experience, while limiting those fluctuations by giving less than full credibility to premiums based on limited data. In contrast, setting premium rates by giving full credibility to recent experience could be called the Full Fluctuation Method. While every credibility method serves to limit fluctuations, this method acquired its name because it was the first. The Limited Fluctuation Method, also known as Classical Credibility, is the most widely-used credibility method because it can be relatively simple to apply. Outside North America, this method is sometimes referred to as American Credibility.

Quoted from Atkinson, 2019.

Formulas

The Limited Fluctuation Method has three components: a credibility weight $Z$, an Observed Rate, and a Prior Rate.

$$\text{Credibility-Weighted Rate} = Z \times \text{Observed Rate} + (1 - Z) \times \text{Prior Rate}$$

The full-credibility standard requires that, with probability 0.9 (we'll call this LC_p), the Observed Rate does not differ from the true rate by more than a ratio of 0.05 (we'll call this LC_r).

$$\text{Claims for full credibility} = \left(\frac{\text{Z-Score}}{\text{ratio}}\right)^{2}$$

Using the inputs above, the z-score corresponding to a two-sided probability of 0.9 is approximately 1.645, so: (1.645 ÷ 0.05)² ≈ 1082
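For reference, here are the two inputs as Julia definitions (a minimal sketch; the names match those used in the cell below):

LC_p = 0.9    # probability that the observed rate is within the tolerance
LC_r = 0.05   # tolerance, as a ratio of the true rate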

LC_full_credibility = round(Int,(quantile(Normal(),1-(1-LC_p)/2)/LC_r)^2)
1082

Atkinson goes on to nicely summarize the square-root method, which assigns full credibility, $Z = 1$, when the number of actual claims equals or exceeds the full-credibility threshold of 1082 claims.

When the number of claims is less than full credibility:

$$Z = \sqrt{\frac{\text{no. claims in group}}{\text{no. claims full credibility}}}$$
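A quick worked example of the square-root rule (a sketch with illustrative numbers; only LC_full_credibility comes from the notebook):

# partial credibility for a group with 400 observed claims
claims_in_group = 400
Z = min(1, √(claims_in_group / LC_full_credibility))   # ≈ 0.61

# blend the observed and prior rates with the credibility weight
observed_rate = 0.009
prior_rate = 0.007
Z * observed_rate + (1 - Z) * prior_rate               # ≈ 0.0082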

Issues with Limited Fluctuation

Bayesian Approach

Bayes' formula updates the posterior probability conditioned on the observed data:

$$P(A\mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$

This is well known to most actuaries... but generally not applied frequently in practice!

The issue is that once the probability distributions and data become non-trivial, the formula becomes analytically intractable. Work over the last several decades has set the stage for addressing these situations by developing algorithms that let you sample from the Bayesian posterior, even when you can't derive it analytically.
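To see why sampling is the workaround, contrast with one of the rare analytically tractable cases (a sketch with illustrative numbers): a Beta prior on a claim rate paired with a binomial claim count is conjugate, so the posterior has a closed form. The hierarchical model used later in this notebook has no such closed form.

# conjugate Beta-Binomial example: the posterior is available in closed form
prior = Beta(2, 198)                            # prior mean of 1%
n, claims = 10_000, 85                          # exposures and observed claims
posterior = Beta(2 + claims, 198 + n - claims)  # Beta(87, 10113)
mean(posterior)                                 # ≈ 0.0085; no MCMC required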

A full overview is beyond the scope of this notebook, which is simply to demonstrate that Limited Fluctuation Credibility is of questionable use in practice and that superior tools exist. For references on modern Bayesian statistics, see the end notes.

Note that this describes a different, more first-principles approach than the Bühlmann Bayesian approach, which simply relates group experience to population experience. The Bayesian approach described here is much more general and extensible.

The formulation

Contrary to Limited Fluctuation, the Bayesian approach forces one to be explicit about the presumed structure of the probability model. The flexibility of the statistical model allows one to incorporate actuarial judgement in a quantitative way. For example, here we assume that the claims experience of each group informs a global (hyperparameter) prior distribution, which we could use as a starting point for a new type of observation. More on this once the data is introduced.

Sample Claims Prediction

The data comes from an Allstate auto-claims dataset via Kaggle. It contains exposure-level information about predictor variables and claim amounts for calendar years 2005-2007.

For simplicity, we will focus on the narrow objective of estimating the claims rate at the level of automobile make and model. We will use the years 2005 and 2006 as the training data set and then 2007 to evaluate the predictive model.

The original data is over 13 million rows, so we will load an already-summarized CSV and split it into train and test sets based on the Calendar_Year.

train,test = let
    pth = download("https://raw.githubusercontent.com/JuliaActuary/Learn/master/data/condensed_auto_claims.csv")
    df = CSV.read(pth,DataFrame,normalizenames=true)
    df.p_observed = df.claims ./ df.n

    train = df[df.Calendar_Year .< 2007,:]
    test  = df[df.Calendar_Year .== 2007,:]

    train, test
end
(2413×5 DataFrame
  Row │ Calendar_Year  Blind_Model  n      claims  p_observed
      │ Int64          String7      Int64  Int64   Float64
──────┼───────────────────────────────────────────────────────
    1 │          2005  K.78          6132      41  0.00668624
    2 │          2005  Q.22         35141     270  0.00768333
    3 │          2005  AR.41        18719     174  0.00929537
    4 │          2006  AR.41        18868     196  0.010388
    5 │          2005  D.20          4573      27  0.00590422
  ⋮   │       ⋮             ⋮         ⋮      ⋮         ⋮
 2410 │          2005  AJ.51            1       0  0.0
 2411 │          2006  AJ.51            1       0  0.0
 2412 │          2005  BQ.6             1       0  0.0
 2413 │          2006  BQ.6             1       0  0.0
                                             2404 rows omitted, 1289×5 DataFrame
  Row │ Calendar_Year  Blind_Model  n      claims  p_observed
      │ Int64          String7      Int64  Int64   Float64
──────┼───────────────────────────────────────────────────────
    1 │          2007  X.45         99368     789  0.00794018
    2 │          2007  Y.29         55783     435  0.00779807
    3 │          2007  P.18          4923      46  0.0093439
    4 │          2007  X.40          5573      22  0.0039476
    5 │          2007  Y.42         25800     168  0.00651163
  ⋮   │       ⋮             ⋮         ⋮      ⋮         ⋮
 1286 │          2007  AE.6             2       0  0.0
 1287 │          2007  CB.6             1       0  0.0
 1288 │          2007  BU.32            1       0  0.0
 1289 │          2007  CA.5             1       0  0.0
                                             1280 rows omitted)

Discussion of Bayesian model

Each group (make and model combination) has a latent parameter $\mu_i$, drawn from a global distribution with hyperparameters $\mu$ (mean) and $\sigma$ (standard deviation). The group's expected claim rate is $\operatorname{logistic}(\mu_i)$, and the observed claims by group are assumed to be distributed according to a Poisson distribution.
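In symbols, the hierarchy (matching the Turing.jl model defined below) is:

$$\mu \sim \operatorname{Normal}(-2, 4), \qquad \sigma \sim \operatorname{Exponential}(0.25)$$

$$\mu_i \sim \operatorname{Normal}(\mu, \sigma), \qquad \text{claims}_i \sim \operatorname{Poisson}\big(n_i \cdot \operatorname{logistic}(\mu_i)\big)$$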

A complete overview of modern Bayesian models is beyond the scope of this notebook, but the model definition is short enough to read in full:

@model function model_poisson(data)
    # hyperparameters that inform the prior for each group
    μ ~ Normal(-2,4)
    σ ~ Exponential(0.25)

    # the random variable representing the average claim rate for each group
    # filldist creates a set of random variables without needing to list out each one
    μ_i ~ filldist(Normal(μ,σ),length(unique(data.Blind_Model)))

    # use the Poisson approximation to the binomial claim outcome with a
    # logistic link function to keep the probability between 0 and 1
    data.claims .~ Poisson.(data.n .* logistic.(μ_i))
end
model_poisson (generic function with 2 methods)

Here's what the prior distribution of claims looks like... and, peeking ahead, what the posterior for a single group of claims looks like. See how even though our prior was very wide (favoring small claims rates), it was dominated by the data, producing a very narrow posterior average claims rate.

let 
    # sample from the priors before learning from any data
    ch_prior = sample(mp,Prior(),1500)
    
    f = Figure()
    ax = Axis(f[1,1],title="Prior distribution of expected claims rate for a group")
    μ = vec(ch_prior["μ_i[1]"].data)
    hist!(ax,logistic.(μ);bins=50)
    ax2 = Axis(f[2,1],title="Posterior distribution of expected claims rate for a group")
    μ = vec(cp["μ_i[1]"].data)
    hist!(ax2,logistic.(μ);bins=50)
    linkxaxes!(ax,ax2)
    f
end

The model is combined with the data and Turing.jl is used to computationally arrive at the posterior distribution for the parameters in the statistical model, $\mu$, $\mu_i$, and $\sigma$.

mp = let
    # combine the different years in the training set 
    condensed = @chain train begin
        groupby(:Blind_Model)
        @combine begin
            :n = sum(:n)
            :claims = sum(:claims)
        end
    end
    model_poisson(condensed);
end
DynamicPPL.Model{typeof(model_poisson), (:data,), (), (), Tuple{DataFrame}, Tuple{}, DynamicPPL.DefaultContext}(Main.var"workspace#3".model_poisson, (data = 1238×3 DataFrame
  Row │ Blind_Model  n       claims
      │ String7      Int64   Int64
──────┼─────────────────────────────
    1 │ K.78          14058      94
    2 │ Q.22          71401     532
    3 │ AR.41         37587     370
    4 │ D.20          10019      50
    5 │ AJ.129       100824     693
  ⋮   │      ⋮         ⋮       ⋮
 1235 │ AQ.1              1       0
 1236 │ AJ.96             1       0
 1237 │ AJ.51             2       0
 1238 │ BQ.6              2       0
                   1229 rows omitted,), NamedTuple(), DynamicPPL.DefaultContext())

Bayesian Posterior Sampling

Here's where recent advances in algorithms and computing power make Bayesian analysis possible. We can't analytically compute the posterior distribution, but we can generate samples from the posterior such that each sampled result appears in proportion to the true posterior density. This uses a technique called Markov Chain Monte Carlo, abbreviated MCMC. There are different algorithms in this family, and we will use one called the No-U-Turn Sampler (NUTS for short).

The result of the sample function is a set of draws generated in proportion to the posterior density; we will use these draws to make predictions and understand the distribution of our parameters.

Run or download the chain:

# on an M1 Macbook Air, the sampler takes about 10 minutes to run. We use saved results by default in order to avoid unnecessary website build time and to make it easier for users to open the notebook and see results quickly
# change `use_saved_results` to false in order to rerun the chain within this notebook
cp = let
    use_saved_results = true
    cp = if use_saved_results
        pth = download("https://github.com/JuliaActuary/Learn/raw/2c40debad1cfaa25b27d440f442473749dfa8d13/data/claims_model_chain.h5")
        h5open(pth, "r") do f
            read(f, Chains)
        end
    else
        chain = sample(mp, NUTS(), 500) # this is the line that runs the MCMC sampler

        # uncomment this section to save the chain to the given location
        h5open("_data/claims_model_chain.h5", "w") do f
            write(f, chain)
        end
        chain
    end
    cp
end;

Visualizing the Posterior Density

This is a plot of the posterior density for each of the many $\mu_i$ parameters in our model. The black line shows the distribution implied by the hyperparameters μ and σ. Upon coming across a new group of interest (in our case, a new make of car), this is the prior distribution for the expected claims rate. The model has learned this distribution from the data itself, and it serves to regularize predictions.

dplot(cp, "μ")
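To make the "new group" point concrete, the learned hyperparameters can be pushed through the same link function to simulate a claims-rate prior for a group with no data (a sketch; cp is the chain sampled above):

let
    # draw a claims rate for a hypothetical new group from each posterior
    # sample of the hyperparameters (μ, σ)
    μ_draws = vec(getindex(cp, :μ).data)
    σ_draws = vec(getindex(cp, :σ).data)
    new_group_rates = logistic.(rand.(Normal.(μ_draws, σ_draws)))
    quantile(new_group_rates, [0.05, 0.5, 0.95])
end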

Predictions and Results

Here we compare four predictive models:

  1. Naive, where the predicted rate for the 2007 year is each group's average claim rate for 2005-2006

  2. Limited Fluctuation Overall, where the "prior" in the LFC formula is the overall mean claim rate for 2005-2006

  3. Limited Fluctuation Year-by-Year, where the "prior" in the LFC formula is the claim rate for the ith group in 2005, updated using 2006 claim rates

  4. Bayesian, where the predicted claim rate is based on the Bayesian model discussed above

Predictions

claims_summary_posterior = let
    # combine the dataset by model
    df = @chain train begin
        groupby(:Blind_Model)
        @combine begin
            :n = sum(:n)
            :claims = sum(:claims)
        end
    end
    pop_μ = sum(df.claims) / sum(df.n)
    df[!,:pop_μ] .= pop_μ

    ## Bayesian prediction
    # get the mean of the posterior estimate for the ith model group
    means = map(1:length(unique(df.Blind_Model))) do i
        d = logistic.(getindex(cp, Symbol("μ_i[$i]")).data)
        (est = mean(d), se = std(d))
    end

    df.p_observed = df.claims ./ df.n
    df.bayes_μ = [x.est for x in means]
    df.bayes_se = [x.se for x in means]

    ## Limited Fluctuation (square-root rule)
    # using the overall population mean as the prior rate
    df.LF_Z = min.(1, .√(df.claims ./ LC_full_credibility))
    df.LF_μ = let Z = df.LF_Z
        Z .* (df.p_observed) .+ (1 .- Z) .* pop_μ
    end

    ## Limited Fluctuation
    # using the first year as the prior, 2nd year as new data
    # some additional processing creates the LF2 dataframe, see appendix
    df = leftjoin(df, LF2[:,[:Blind_Model,:LF2_μ]], on=:Blind_Model)

    df = innerjoin(df, test; on=:Blind_Model, renamecols= "_train" => "_test")

    # predictions on the test set using the predictive rates times exposures
    df.pred_bayes = df.n_test .* df.bayes_μ_train
    df.pred_LF = df.n_test .* df.LF_μ_train
    df.pred_LF2 = df.n_test .* df.LF2_μ_train
    df.pred_naive = df.n_test .* df.p_observed_train

    sort!(df, :n_train, rev=true)
end

  Row │ Blind_Model  n_train  claims_train  pop_μ_train  p_observed_train  bayes_μ_train  bayes_se_train  LF_Z_train  ⋯
──────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │ K.7           382210          3365    0.0072979        0.00880406     0.00879048     0.000153097    1.0
    2 │ X.45          192591          1542    0.0072979        0.0080066      0.00799648     0.000177584    1.0
    3 │ AU.14         188702          1687    0.0072979        0.00894002     0.00890045     0.000206362    1.0
    4 │ W.16          150304          1064    0.0072979        0.00707899     0.0070734      0.000208565    0.991647
    5 │ AU.11         133445          1188    0.0072979        0.00890254     0.0088512      0.000246467    1.0
    6 │ BO.38         125206           775    0.0072979        0.0061898      0.00619061     0.000208256    0.846325
    7 │ AO.7          109746          1055    0.0072979        0.00961311     0.00954709     0.00028441     0.987444
    8 │ AJ.58         107538           891    0.0072979        0.00828544     0.00823783     0.000273748    0.907455
    9 │ AJ.52         107252           691    0.0072979        0.00644277     0.00644673     0.000249251    0.799145
   10 │ AJ.129        100824           693    0.0072979        0.00687336     0.00684067     0.000248461    0.8003
  ⋮   │      ⋮           ⋮           ⋮            ⋮               ⋮               ⋮               ⋮            ⋮
 1224 │ AJ.118             1             0    0.0072979        0.0            0.00639861     0.00166099     0.0

Bayesian approach has lower error

Looking at the accumulated absolute error, the Bayesian approach has about 16% lower error than the Limited Fluctuation approaches.
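The totals behind that figure can be computed directly from the prediction columns (a sketch using claims_summary_posterior as defined above):

let df = claims_summary_posterior
    # total absolute error on the 2007 test year, by method
    total_abs_error(pred) = sum(abs.(pred .- df.claims_test))
    (bayes = total_abs_error(df.pred_bayes),
     LF    = total_abs_error(df.pred_LF),
     LF2   = total_abs_error(df.pred_LF2),
     naive = total_abs_error(df.pred_naive))
end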

let df = claims_summary_posterior
    f = Figure()
    ax = Axis(f[1,1],title="Cumulative Predictive Residual Error", subtitle="(Lower is better predictive accuracy)",xlabel=L"$i^{th}$ vehicle group",ylabel=L"absolute error")
    lines!(ax,cumsum(abs.(df.pred_LF .- df.claims_test)),label="Limited Fluctuation Overall",color=(:purple,0.5))
    lines!(ax,cumsum(abs.(df.pred_LF2 .- df.claims_test)),label="Limited Fluctuation Year-by-Year",color=(:blue,0.5))
    lines!(ax,cumsum(abs.(df.pred_bayes .- df.claims_test)),label="Bayesian",color=(:red,0.5))
    lines!(ax,cumsum(abs.(df.pred_naive .- df.claims_test)),label="Naive",color=(:grey10,0.5))
    axislegend(ax,position=:rb)
    f
end

Total Actual to Expected

Here, offsetting errors happen to make the total A/E for the naive and LF Year-by-Year approaches (the latter is a good proxy for the former) come out better than the Bayesian approach's.

Given the lower error of the Bayesian approach, one would expect that over multiple years it would produce more accurate predictions than the alternative methods.
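The aggregate A/E ratios can be read off the same prediction columns (a sketch; a ratio of 1.0 means total predicted claims matched total actual claims):

let df = claims_summary_posterior
    actual = sum(df.claims_test)
    # actual-to-expected ratio for 2007, by method
    (bayes = actual / sum(df.pred_bayes),
     LF    = actual / sum(df.pred_LF),
     LF2   = actual / sum(df.pred_LF2),
     naive = actual / sum(df.pred_naive))
end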

let 
    post = claims_summary_posterior
    X = @chain vcat(train,test) begin
        groupby(:Calendar_Year)
        @combine :claim_rate = sum(:claims) / sum(:n)
    end
    f = Figure()
    ax = Axis(f[1,1],xlabel="year",ylabel="claims rate")
    scatter!(ax,X.Calendar_Year,X.claim_rate, label="data")
    scatter!(ax,[2007],[sum(post.pred_bayes)/sum(post.n_test)], label="Bayes", marker=:hline,markersize=30)
    scatter!(ax,[2007],[sum(post.pred_LF)/sum(post.n_test)], label="LF Overall",marker=:hline,markersize=30)
    scatter!(ax,[2007],[sum(post.pred_LF2)/sum(post.n_test)], label="LF Year-by-Year",marker=:hline,markersize=30)
    scatter!(ax,[2007],[sum(post.pred_naive)/sum(post.n_test)], label="naive",marker=:hline,markersize=30)
    axislegend(ax,position=:rb)
    ylims!(0.005,0.008)
    f
end

Some Remarks about the Results

With limited, real-world data, drawing conclusions is a little messy because we don't know what the ground truth should be, but here are some thoughts that seem to be consistent with what the data and results suggest:

Conclusion

This notebook shows that the Bayesian approach results in claims predictions with less total predictive error than the limited fluctuation method.

Further thoughts

The Bayesian approach could be extended to improve its accuracy even further:

Downsides to the Bayesian approach

Further Reading

If this notebook has piqued your interest in Bayesian techniques, the following books are recommended learning resources (from easiest to most difficult):

Appendices

1. Misc utility functions

"""
    dplot(chain,parameter_string)

plot the posterior density for model parameters matching the given string.
"""
function dplot(ch,param_string)
    f = Figure()
    ax = Axis(f[1, 1])

    # plot each group posterior
    for (ind, param) in enumerate(ch.name_map.parameters)
        if contains(string(param),param_string)
            v = vec(getindex(ch, param).data)
            density!(ax,logistic.(v), color = (:red, 0.0),
                strokecolor = (:red,0.3), strokewidth = 1, strokearound = false, label="Individual group posterior")
        end
    end

    # plot hyperparameters
    hyper_mean = mean(getindex(ch, :μ).data)
    hyper_sigma = mean(getindex(ch, :σ).data)
    d = density!(ax,logistic.(rand(Normal(hyper_mean,hyper_sigma),1000)),color = (:black, 0.),
        strokecolor = (:black,0.3), strokewidth = 3, strokearound = false,label="Hyper-parameter distribution")

    xlims!(0.0,0.03)
    hideydecorations!(ax)
    axislegend(ax,unique=true)
    f
end

Additional legwork to get the alternate limited fluctuation approach data:

# An alternate approach to LFC where the first year becomes the prior, adjusted by data from the second year.
LF2 = let 
    # split dataset by year and recombine
    df2005 = @subset(train,:Calendar_Year .== 2005)
    df2006 = @subset(train,:Calendar_Year .== 2006)
    df = outerjoin(df2005,df2006,on=:Blind_Model,renamecols= "_2005" => "_2006")

    # use 2005 actuals as LFC prior, and the overall mean if model is missing
    μ_2005 = sum(skipmissing(df.claims_2005)) / sum(skipmissing(df.n_2005))
    df.assumed = coalesce.(df.p_observed_2005,μ_2005)
    
    df.LF2_Z = min.(1, .√(coalesce.(df.claims_2006,0.) ./ LC_full_credibility))
    df.LF2_μ = let Z = df.LF2_Z
        # use the 2005 mean if the model not observed in 2006
        Z .* coalesce.(df.p_observed_2006,μ_2005) .+ (1 .- Z) .* df.assumed
    end
    df
end
  Row │ Calendar_Year_2005  Blind_Model  n_2005   claims_2005  p_observed_2005  Calendar_Year_2006  n_2006  claims_2006  p_observed_2006  assumed     LF2_Z     LF2_μ
──────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │ 2005                AR.41         18719           174       0.00929537  2006                 18868          196       0.010388     0.00929537  0.425613  0.00976039
    2 │ 2005                D.20           4573            27       0.00590422  2006                  5446           23       0.00422328   0.00590422  0.145798  0.00565914
    3 │ 2005                AJ.129        50021           351       0.00701705  2006                 50803          342       0.00673189   0.00701705  0.562211  0.00685673
    4 │ 2005                AQ.17         42684           278       0.00651298  2006                 44018          263       0.00597483   0.00651298  0.49302   0.00624766
    5 │ 2005                BW.3          37903           215       0.00567237  2006                 37752          195       0.00516529   0.00567237  0.424525  0.0054571
    6 │ 2005                BW.167        19098           136       0.00712116  2006                 23639          158       0.00668387   0.00712116  0.382133  0.00695406
    7 │ 2005                Y.9           47236           383       0.00810822  2006                 44970          372       0.00827218   0.00810822  0.586351  0.00820436
    8 │ 2005                BH.29          7850            85       0.010828    2006                  7945           77       0.00969163   0.010828    0.266767  0.0105249
    9 │ 2005                K.2           21362           174       0.0081453   2006                 24834          221       0.00889909   0.0081453   0.451942  0.00848597
   10 │ 2005                N.8            4215            34       0.00806643  2006                  6036           30       0.00497018   0.00806643  0.166513  0.00755086
  ⋮   │         ⋮                ⋮          ⋮           ⋮              ⋮                ⋮               ⋮         ⋮              ⋮             ⋮          ⋮          ⋮
 1238 │ missing             BG.8         missing     missing        missing     2006                     1            0       0.0           0.00748894  0.0       0.00748894

2. Packages used

begin 
    using CSV
    using DataFramesMeta
    using Distributions
    using Turing
    using MCMCChains
    using PlutoUI; TableOfContents()
    using DataFrames
    using Logging; Logging.disable_logging(Logging.Warn);
    using StatsFuns
    using StatisticalRethinking
    using CairoMakie
    using MCMCChainsStorage, HDF5
end

Built with Julia 1.8.5 and

CSV 0.10.7
CairoMakie 0.9.1
DataFrames 1.3.6
DataFramesMeta 0.12.0
Distributions 0.25.76
HDF5 0.16.12
MCMCChains 5.5.0
MCMCChainsStorage 0.1.2
PlutoUI 0.7.48
StatisticalRethinking 4.5.2
StatsFuns 1.0.1
Turing 0.21.12

Run this Pluto Notebook

To run this page locally, download this file and open it with Pluto.jl.

The packages in JuliaActuary are open-source and liberally licensed (MIT License) to allow wide private and commercial usage of the packages, like the base Julia language and many other packages in the ecosystem. See terms of this site.