Using MortaltiyTables.jl with DataFrames

JuliaActuary is an ecosystem of packages that makes Julia the easiest language to get started for actuarial workflows.

Using MortaltiyTables.jl with DataFrames

MortalityTables.jl standard representation

MortalityTables.jl stores the rates in a very efficient manner as a collection of vectors indexed by attained age.

using MortalityTables

name = "2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"
vbt = MortalityTables.table(name)
1
Or any other mort.soa.org table
MortalityTable (Insured Lives Mortality):
   Name:
       2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB
   Fields: 
       (:select, :ultimate, :metadata)
   Provider:
       Society of Actuaries
   mort.SOA.org ID:
       1118
   mort.SOA.org link:
       https://mort.soa.org/ViewTable.aspx?&TableIdentity=1118
   Description:
       2001 Valuation Basic Table (VBT) Residual Standard Select and Ultimate Table -  Male Nonsmoker. Basis: Age Nearest Birthday. Minimum Select Age: 0. Maximum Select Age: 99. Minimum Ultimate Age: 25. Maximum Ultimate Age: 120

First, we include the package, and then we’ll pick a table, where all of the mort.soa.org tables are mirrored into your MortalityTables.jl installation.

To see how the data is represented, we can look at the the select data for a 55 year old and see the attained age and mortality rates:

vbt.select[55]
66-element OffsetArray(::Vector{Float64}, 55:120) with eltype Float64 with indices 55:120:
 0.00139
 0.00218
 0.00288
 0.00344
 0.00403
 0.00485
 0.00599
 0.00736
 0.00879
 0.010590.63838
 0.67541
 0.71458
 0.75603
 0.79988
 0.84627
 0.89536
 0.94729
 1.0

This is very efficient and convenient for modeling, but a lot of times you want the data matched up with policy data in a DataFrame.

Getting data into a dataframe

Generate sample data

using DataFrames

sample_size = 10_000

sample_data = let
    # generate fake data
    df = DataFrame(
        "sex" => rand(["Male","Female"],sample_size),
        "smoke" => rand(["Smoker","Nonsmoker"],sample_size),
        "issue_age" => rand(25:65,sample_size),
        )
    
    # a random offset of issue age is the current attained age
    df.attained_age = df.issue_age .+ rand(1:10,sample_size)
    df
end
10000×4 DataFrame
9975 rows omitted
Row sex smoke issue_age attained_age
String String Int64 Int64
1 Male Smoker 36 38
2 Male Nonsmoker 62 67
3 Female Smoker 58 62
4 Female Smoker 35 36
5 Male Nonsmoker 47 54
6 Female Smoker 63 64
7 Male Nonsmoker 32 35
8 Female Nonsmoker 25 29
9 Female Smoker 37 38
10 Male Nonsmoker 27 33
11 Male Smoker 57 59
12 Female Smoker 37 47
13 Male Smoker 53 62
9989 Male Nonsmoker 27 35
9990 Female Nonsmoker 50 57
9991 Female Nonsmoker 63 70
9992 Male Smoker 36 41
9993 Male Smoker 47 56
9994 Female Nonsmoker 25 34
9995 Male Smoker 58 64
9996 Male Smoker 32 35
9997 Female Smoker 46 55
9998 Female Nonsmoker 39 41
9999 Female Smoker 47 48
10000 Female Smoker 48 58

Define the table set you want to use

There are a lot of different possible combinations of parameters that you might want to use, such as rates that vary by sex, risk class, table set (VBT/CSO/etc), smoking status, relative risk, ALB/ANB, etc.

It’s easy to define the parameters applicable to your assumption set. Here, we’ll use a dictionary to define the relationship:

rate_map = Dict(
    "Male" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"),
        ),
    
    "Female" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Nonsmoker, ANB"),
        )
    );

And then we’ll define a function to look up the relevant rate. Note how the function matches the levels we defined for the assumption set dictionary above.

function rate_lookup(assumption_map,sex,smoke,issue_age,attained_age)
    # pick the relevant table
    table = assumption_map[sex][smoke]
    
    # check if the select rate exists, otherwise look to the ultimate table
    if issue_age in eachindex(table.select)
        table.select[issue_age][attained_age]
    else
        table.ultimate[attained_age]
    end
end
rate_lookup (generic function with 1 method)

Lining up with dataframe

By mapping each row’s data to the lookup function, we get a vector of rates for our data:

rates = map(eachrow(sample_data)) do row
    rate_lookup(rate_map, row.sex, row.smoke, row.issue_age, row.attained_age)
end
10000-element Vector{Float64}:
 0.00128
 0.01019
 0.00872
 0.00055
 0.00361
 0.00642
 0.00064
 0.00035
 0.00061
 0.000710.00209
 0.01047
 0.00064
 0.01746
 0.00125
 0.00741
 0.00047
 0.00212
 0.00995

And finally, we can just add this to the dataframe:

sample_data.expectation = rates

sample_data
10000×5 DataFrame
9975 rows omitted
Row sex smoke issue_age attained_age expectation
String String Int64 Int64 Float64
1 Male Smoker 36 38 0.00128
2 Male Nonsmoker 62 67 0.01019
3 Female Smoker 58 62 0.00872
4 Female Smoker 35 36 0.00055
5 Male Nonsmoker 47 54 0.00361
6 Female Smoker 63 64 0.00642
7 Male Nonsmoker 32 35 0.00064
8 Female Nonsmoker 25 29 0.00035
9 Female Smoker 37 38 0.00061
10 Male Nonsmoker 27 33 0.00071
11 Male Smoker 57 59 0.00729
12 Female Smoker 37 47 0.00334
13 Male Smoker 53 62 0.01684
9989 Male Nonsmoker 27 35 0.0008
9990 Female Nonsmoker 50 57 0.00375
9991 Female Nonsmoker 63 70 0.01
9992 Male Smoker 36 41 0.00209
9993 Male Smoker 47 56 0.01047
9994 Female Nonsmoker 25 34 0.00064
9995 Male Smoker 58 64 0.01746
9996 Male Smoker 32 35 0.00125
9997 Female Smoker 46 55 0.00741
9998 Female Nonsmoker 39 41 0.00047
9999 Female Smoker 47 48 0.00212
10000 Female Smoker 48 58 0.00995