# How to smooth a Kaplan-Meier plot

## Motivation

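In reports based on Danish national registries, it is forbidden to identify individuals. A Kaplan-Meier curve steps at every observed event time, so with small numbers a single step can correspond to a single person. One solution is to smooth the Kaplan-Meier curve. Here, two smoothing techniques, lowess and the exponentially weighted moving average (EWMA), are presented and compared.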

## The example data

We use a classical Stata example dataset:

webuse drug2, clear

stset, clear


The variables are:

describe


Contains data from drug2.dta
Observations:            48                  Patient Survival in Drug Trial
Variables:             4                  9 Oct 2017 08:17
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
name         type    format    label      Variable label
--------------------------------------------------------------------------------
studytime       byte    %8.0g                 Months to death or end of exp.
died            byte    %8.0g                 1 if patient died
drug            byte    %8.0g                 Drug type (0=placebo)
age             byte    %8.0g                 Patient's age at start of exp.
--------------------------------------------------------------------------------
Sorted by:


The first six rows look like this (each row is one person):

list in 1/6, sepby(studytime) abbreviate(20)


     +-------------------------------+
     | studytime   died   drug   age |
     |-------------------------------|
  1. |         1      1      0    61 |
  2. |         1      1      0    65 |
     |-------------------------------|
  3. |         2      1      0    59 |
     |-------------------------------|
  4. |         3      1      0    52 |
     |-------------------------------|
  5. |         4      1      0    56 |
  6. |         4      1      0    67 |
     +-------------------------------+


## Generating the data behind the Kaplan-Meier plots

First, -sts generate- is used to compute the Kaplan-Meier failure probabilities.

stset studytime, failure(died) noshow
sts generate fail_sts = f
label variable fail_sts "KM failure"


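To plot the curve as a proper Kaplan-Meier step function, the time variable is shifted: after sorting by _t, each observation is assigned the time of the next observation.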

sort _t
generate _t_shift = _t[_n+1]
label variable _t_shift "Time (Months)"
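
To see what the shift does, the following minimal sketch applies the same `[_n+1]` lead to a four-row toy dataset. The variables `t` and `t_shift` are hypothetical and not part of drug2. The last observation has no successor, so its shifted time is missing.

```stata
* Toy data (hypothetical): four observation times, one of them tied
clear
input t
1
1
2
4
end

* Same shift as above: each row gets the time of the next row
sort t
generate t_shift = t[_n+1]

* The last row has no next row, so t_shift is missing there
list
```

In the drug2 data, this means the last observation is left out of any plot or smoother that uses `_t_shift`.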


A lowess-smoothed approximation can be made with the command -lowess-.

lowess fail_sts _t_shift, nograph generate(fail_lowess)
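
The amount of smoothing is controlled by the `bwidth()` option of -lowess-, which defaults to 0.8. As a sketch, assuming a narrower bandwidth of 0.4 (an arbitrary value chosen for illustration), a second, less smoothed curve can be generated for comparison. The variable names `fail_lowess_bw04` and `fail_lowess_bw08` are hypothetical; the first lines only rebuild the variables from the steps above so that the block runs on its own.

```stata
* Rebuild the variables from the steps above
webuse drug2, clear
stset studytime, failure(died) noshow
sts generate fail_sts = f
sort _t
generate _t_shift = _t[_n+1]

* bwidth() is the share of observations used around each point.
* 0.4 is an assumed value for illustration; 0.8 is the default.
lowess fail_sts _t_shift, bwidth(0.4) nograph generate(fail_lowess_bw04)
lowess fail_sts _t_shift, bwidth(0.8) nograph generate(fail_lowess_bw08)

* Compare the two smoothed curves with the Kaplan-Meier estimate
list _t_shift fail_sts fail_lowess_bw04 fail_lowess_bw08 in 1/6
```

A smaller bandwidth follows the Kaplan-Meier steps more closely, which may work against the aim of hiding individual events, so the choice is a trade-off.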


An EWMA-smoothed approximation with weight 0.8 can be computed recursively.

generate fail_ewma = .
replace fail_ewma = cond(_n == 1, fail_sts, ///
0.8*(fail_sts - fail_ewma[_n-1]) + fail_ewma[_n-1])
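
The recursion above is equivalent to `ewma[i] = w*x[i] + (1 - w)*ewma[i-1]` with `w = 0.8` and `ewma[1] = x[1]`. The following minimal sketch shows the effect of the weight on a toy series with a single jump. The variables `x`, `ewma_08` and `ewma_03` are hypothetical, and the weight 0.3 is an assumed value chosen only for contrast.

```stata
* Toy series (hypothetical, not from drug2): a single jump from 0 to 1
clear
input x
0
1
1
1
end

* EWMA with weight 0.8: start at x[1], then update row by row.
* -replace- works through the observations in order, so
* ewma_08[_n-1] is already the updated value of the previous row.
generate double ewma_08 = x in 1
replace ewma_08 = 0.8*x + (1 - 0.8)*ewma_08[_n-1] in 2/L

* Same recursion with weight 0.3: slower to follow the jump
generate double ewma_03 = x in 1
replace ewma_03 = 0.3*x + (1 - 0.3)*ewma_03[_n-1] in 2/L

list
```

With weight 0.8 the smoothed series reaches 0.8, 0.96 and 0.992 after the jump; with weight 0.3 it reaches 0.3, 0.51 and 0.657. A lower weight therefore gives a smoother curve that lags further behind the original.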


The original and the smoothed Kaplan-Meier curves are formatted and compared in one graph.

format fail* %6.2f
twoway ///
(line fail_sts _t_shift, sort connect(step)) ///
(line fail_lowess _t_shift) ///
(line fail_ewma _t_shift) ///
, legend(on position(5) ring(0)  cols(1) ///
order(- "Kaplan-Meier" 1 "True" 2 "lowess" 3 "EWMA")) ///
name(km, replace)


In the graph, the EWMA curve with weight 0.8 follows the true Kaplan-Meier curve more closely than the lowess curve does. The lowess curve is smoother.
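
The comparison above is visual. As a rough numeric check, the absolute deviation of each smoothed curve from the Kaplan-Meier estimate can be summarized. This is only a sketch: the variable names `dev_lowess` and `dev_ewma` are hypothetical, and the first lines rebuild the three curves from the steps above so that the block runs on its own.

```stata
* Rebuild the three curves from the steps above
webuse drug2, clear
stset studytime, failure(died) noshow
sts generate fail_sts = f
sort _t
generate _t_shift = _t[_n+1]
lowess fail_sts _t_shift, nograph generate(fail_lowess)
generate fail_ewma = .
replace fail_ewma = cond(_n == 1, fail_sts, ///
    0.8*(fail_sts - fail_ewma[_n-1]) + fail_ewma[_n-1])

* Absolute deviation of each smoothed curve from the Kaplan-Meier estimate
generate double dev_lowess = abs(fail_lowess - fail_sts)
generate double dev_ewma   = abs(fail_ewma - fail_sts)

* Mean and maximum deviation per smoother
summarize dev_lowess dev_ewma
```

The smoother with the smaller mean deviation is the one closer to the Kaplan-Meier curve. A larger deviation is not necessarily worse here, since the purpose of the smoothing is to blur individual events.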

## References

The do file for this document

Last update: 2022-04-18, Stata version 17