Using MortaltiyTables.jl with DataFrames

JuliaActuary is an ecosystem of packages that makes Julia the easiest language to get started for actuarial workflows.
mortalitytables
dataframes
tutorial

Using MortaltiyTables.jl with DataFrames

MortalityTables.jl standard representation

MortalityTables.jl stores the rates in a very efficient manner as a collection of vectors indexed by attained age.

using MortalityTables

name = "2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"
vbt = MortalityTables.table(name)
1
Or any other mort.soa.org table
MortalityTable (Insured Lives Mortality):
   Name:
       2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB
   Fields: 
       (:select, :ultimate, :metadata)
   Provider:
       Society of Actuaries
   mort.SOA.org ID:
       1118
   mort.SOA.org link:
       https://mort.soa.org/ViewTable.aspx?&TableIdentity=1118
   Description:
       2001 Valuation Basic Table (VBT) Residual Standard Select and Ultimate Table -  Male Nonsmoker. Basis: Age Nearest Birthday. Minimum Select Age: 0. Maximum Select Age: 99. Minimum Ultimate Age: 25. Maximum Ultimate Age: 120

First, we include the package, and then we’ll pick a table, where all of the mort.soa.org tables are mirrored into your MortalityTables.jl installation.

To see how the data is represented, we can look at the the select data for a 55 year old and see the attained age and mortality rates:

vbt.select[55]
66-element OffsetArray(::Vector{Float64}, 55:120) with eltype Float64 with indices 55:120:
 0.00139
 0.00218
 0.00288
 0.00344
 0.00403
 0.00485
 0.00599
 0.00736
 0.00879
 0.01059
 0.01211
 0.014
 0.01584
 ⋮
 0.53905
 0.57031
 0.60339
 0.63838
 0.67541
 0.71458
 0.75603
 0.79988
 0.84627
 0.89536
 0.94729
 1.0

This is very efficient and convienent for modeling, but a lot of times you want the data matched up with policy data in a DataFrame.

Getting data into a dataframe

Generate sample data

using DataFrames

sample_size = 10_000

sample_data = let
    # generate fake data
    df = DataFrame(
        "sex" => rand(["Male","Female"],sample_size),
        "smoke" => rand(["Smoker","Nonsmoker"],sample_size),
        "issue_age" => rand(25:65,sample_size),
        )
    
    # a random offset of issue age is the current attained age
    df.attained_age = df.issue_age .+ rand(1:10,sample_size)
    df
end
10000×4 DataFrame
9975 rows omitted
Row sex smoke issue_age attained_age
String String Int64 Int64
1 Female Nonsmoker 65 71
2 Male Nonsmoker 33 35
3 Male Nonsmoker 44 45
4 Male Smoker 36 37
5 Female Smoker 57 64
6 Female Smoker 40 47
7 Female Nonsmoker 31 40
8 Female Nonsmoker 44 45
9 Male Smoker 40 49
10 Male Nonsmoker 47 56
11 Male Smoker 48 56
12 Male Nonsmoker 54 58
13 Female Nonsmoker 25 34
9989 Male Smoker 36 45
9990 Male Smoker 39 44
9991 Male Nonsmoker 49 50
9992 Male Smoker 45 55
9993 Male Smoker 65 70
9994 Male Smoker 60 68
9995 Male Nonsmoker 27 36
9996 Male Nonsmoker 52 60
9997 Female Nonsmoker 56 65
9998 Female Nonsmoker 40 47
9999 Female Nonsmoker 34 39
10000 Female Nonsmoker 63 71

Define the table set you want to use

There are a lot of different possible combinations of parameters that you might want to use, such as rates that vary by sex, risk class, table set (VBT/CSO/etc), smoking status, relative risk, ALB/ANB, etc.

It’s easy to define the parameters applicable to your assumption set. Here, we’ll use a dictionary to define the relationship:

rate_map = Dict(
    "Male" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"),
        ),
    
    "Female" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Nonsmoker, ANB"),
        )
    );

And then we’ll define a function to look up the relevant rate. Note how the function matches the levels we defined for the assumption set dictionary above.

function rate_lookup(assumption_map,sex,smoke,issue_age,attained_age)
    # pick the relevant table
    table = assumption_map[sex][smoke]
    
    # check if the select rate exists, otherwise look to the ulitmate table
    if issue_age in eachindex(table.select)
        table.select[issue_age][attained_age]
    else
        table.ultimate[attained_age]
    end
end
rate_lookup (generic function with 1 method)

Lining up with dataframe

By mapping each row’s data to the lookup function, we get a vector of rates for our data:

rates = map(eachrow(sample_data)) do row
    rate_lookup(rate_map, row.sex, row.smoke, row.issue_age, row.attained_age)
end
10000-element Vector{Float64}:
 0.00971
 0.00056
 0.00091
 0.001
 0.01237
 0.00292
 0.00081
 0.00066
 0.00527
 0.00483
 0.00977
 0.00373
 0.00064
 ⋮
 0.00362
 0.00263
 0.00127
 0.00981
 0.02741
 0.02602
 0.00087
 0.00615
 0.00744
 0.00145
 0.00059
 0.01198

And finally, we can just add this to the dataframe:

sample_data.expectation = rates

sample_data
10000×5 DataFrame
9975 rows omitted
Row sex smoke issue_age attained_age expectation
String String Int64 Int64 Float64
1 Female Nonsmoker 65 71 0.00971
2 Male Nonsmoker 33 35 0.00056
3 Male Nonsmoker 44 45 0.00091
4 Male Smoker 36 37 0.001
5 Female Smoker 57 64 0.01237
6 Female Smoker 40 47 0.00292
7 Female Nonsmoker 31 40 0.00081
8 Female Nonsmoker 44 45 0.00066
9 Male Smoker 40 49 0.00527
10 Male Nonsmoker 47 56 0.00483
11 Male Smoker 48 56 0.00977
12 Male Nonsmoker 54 58 0.00373
13 Female Nonsmoker 25 34 0.00064
9989 Male Smoker 36 45 0.00362
9990 Male Smoker 39 44 0.00263
9991 Male Nonsmoker 49 50 0.00127
9992 Male Smoker 45 55 0.00981
9993 Male Smoker 65 70 0.02741
9994 Male Smoker 60 68 0.02602
9995 Male Nonsmoker 27 36 0.00087
9996 Male Nonsmoker 52 60 0.00615
9997 Female Nonsmoker 56 65 0.00744
9998 Female Nonsmoker 40 47 0.00145
9999 Female Nonsmoker 34 39 0.00059
10000 Female Nonsmoker 63 71 0.01198