Using MortaltiyTables.jl with DataFrames

MortalityTables.jl standard representation

MortalityTables.jl stores the rates in a very efficient manner as a collection of vectors indexed by attained age.

using MortalityTables

name = "2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"
vbt = MortalityTables.table(name)

1: Or any other mort.soa.org table

MortalityTable (Insured Lives Mortality):
   Name:
       2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB
   Fields: 
       (:select, :ultimate, :metadata)
   Provider:
       Society of Actuaries
   mort.SOA.org ID:
       1118
   mort.SOA.org link:
       https://mort.soa.org/ViewTable.aspx?&TableIdentity=1118
   Description:
       2001 Valuation Basic Table (VBT) Residual Standard Select and Ultimate Table -  Male Nonsmoker. Basis: Age Nearest Birthday. Minimum Select Age: 0. Maximum Select Age: 99. Minimum Ultimate Age: 25. Maximum Ultimate Age: 120

First, we include the package, and then we’ll pick a table, where all of the mort.soa.org tables are mirrored into your MortalityTables.jl installation.

To see how the data is represented, we can look at the the select data for a 55 year old and see the attained age and mortality rates:

vbt.select[55]

66-element OffsetArray(::Vector{Float64}, 55:120) with eltype Float64 with indices 55:120:
 0.00139
 0.00218
 0.00288
 0.00344
 0.00403
 0.00485
 0.00599
 0.00736
 0.00879
 0.01059
 0.01211
 0.014
 0.01584
 ⋮
 0.53905
 0.57031
 0.60339
 0.63838
 0.67541
 0.71458
 0.75603
 0.79988
 0.84627
 0.89536
 0.94729
 1.0

This is very efficient and convienent for modeling, but a lot of times you want the data matched up with policy data in a DataFrame.

Getting data into a dataframe

Generate sample data

using DataFrames

sample_size = 10_000

sample_data = let
    # generate fake data
    df = DataFrame(
        "sex" => rand(["Male","Female"],sample_size),
        "smoke" => rand(["Smoker","Nonsmoker"],sample_size),
        "issue_age" => rand(25:65,sample_size),
        )
    
    # a random offset of issue age is the current attained age
    df.attained_age = df.issue_age .+ rand(1:10,sample_size)
    df
end

10000×4 DataFrame

9975 rows omitted

Row	sex	smoke	issue_age	attained_age
	String	String	Int64	Int64
1	Female	Nonsmoker	65	71
2	Male	Nonsmoker	33	35
3	Male	Nonsmoker	44	45
4	Male	Smoker	36	37
5	Female	Smoker	57	64
6	Female	Smoker	40	47
7	Female	Nonsmoker	31	40
8	Female	Nonsmoker	44	45
9	Male	Smoker	40	49
10	Male	Nonsmoker	47	56
11	Male	Smoker	48	56
12	Male	Nonsmoker	54	58
13	Female	Nonsmoker	25	34
⋮	⋮	⋮	⋮	⋮
9989	Male	Smoker	36	45
9990	Male	Smoker	39	44
9991	Male	Nonsmoker	49	50
9992	Male	Smoker	45	55
9993	Male	Smoker	65	70
9994	Male	Smoker	60	68
9995	Male	Nonsmoker	27	36
9996	Male	Nonsmoker	52	60
9997	Female	Nonsmoker	56	65
9998	Female	Nonsmoker	40	47
9999	Female	Nonsmoker	34	39
10000	Female	Nonsmoker	63	71

Define the table set you want to use

There are a lot of different possible combinations of parameters that you might want to use, such as rates that vary by sex, risk class, table set (VBT/CSO/etc), smoking status, relative risk, ALB/ANB, etc.

It’s easy to define the parameters applicable to your assumption set. Here, we’ll use a dictionary to define the relationship:

rate_map = Dict(
    "Male" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"),
        ),
    
    "Female" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Nonsmoker, ANB"),
        )
    );

And then we’ll define a function to look up the relevant rate. Note how the function matches the levels we defined for the assumption set dictionary above.

function rate_lookup(assumption_map,sex,smoke,issue_age,attained_age)
    # pick the relevant table
    table = assumption_map[sex][smoke]
    
    # check if the select rate exists, otherwise look to the ulitmate table
    if issue_age in eachindex(table.select)
        table.select[issue_age][attained_age]
    else
        table.ultimate[attained_age]
    end
end

rate_lookup (generic function with 1 method)

Lining up with dataframe

By mapping each row’s data to the lookup function, we get a vector of rates for our data:

rates = map(eachrow(sample_data)) do row
    rate_lookup(rate_map, row.sex, row.smoke, row.issue_age, row.attained_age)
end

10000-element Vector{Float64}:
 0.00971
 0.00056
 0.00091
 0.001
 0.01237
 0.00292
 0.00081
 0.00066
 0.00527
 0.00483
 0.00977
 0.00373
 0.00064
 ⋮
 0.00362
 0.00263
 0.00127
 0.00981
 0.02741
 0.02602
 0.00087
 0.00615
 0.00744
 0.00145
 0.00059
 0.01198

And finally, we can just add this to the dataframe:

sample_data.expectation = rates

sample_data

10000×5 DataFrame

9975 rows omitted

Row	sex	smoke	issue_age	attained_age	expectation
	String	String	Int64	Int64	Float64
1	Female	Nonsmoker	65	71	0.00971
2	Male	Nonsmoker	33	35	0.00056
3	Male	Nonsmoker	44	45	0.00091
4	Male	Smoker	36	37	0.001
5	Female	Smoker	57	64	0.01237
6	Female	Smoker	40	47	0.00292
7	Female	Nonsmoker	31	40	0.00081
8	Female	Nonsmoker	44	45	0.00066
9	Male	Smoker	40	49	0.00527
10	Male	Nonsmoker	47	56	0.00483
11	Male	Smoker	48	56	0.00977
12	Male	Nonsmoker	54	58	0.00373
13	Female	Nonsmoker	25	34	0.00064
⋮	⋮	⋮	⋮	⋮	⋮
9989	Male	Smoker	36	45	0.00362
9990	Male	Smoker	39	44	0.00263
9991	Male	Nonsmoker	49	50	0.00127
9992	Male	Smoker	45	55	0.00981
9993	Male	Smoker	65	70	0.02741
9994	Male	Smoker	60	68	0.02602
9995	Male	Nonsmoker	27	36	0.00087
9996	Male	Nonsmoker	52	60	0.00615
9997	Female	Nonsmoker	56	65	0.00744
9998	Female	Nonsmoker	40	47	0.00145
9999	Female	Nonsmoker	34	39	0.00059
10000	Female	Nonsmoker	63	71	0.01198