How to smooth a Kaplan-Meier plot

Motivation

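In the Danish national registries, it is forbidden to identify individuals in reports. An unsmoothed Kaplan-Meier curve can do exactly that, because every step in the curve marks a time at which one or a few individuals had an event. One solution is to smooth the Kaplan-Meier curve. Here, two smoothing techniques, lowess and the exponentially weighted moving average (EWMA), are presented and compared.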

The example data

We use a classic Stata example dataset:

webuse drug2, clear

stset, clear

The variables are:

describe

Contains data from drug2.dta
 Observations:            48                  Patient Survival in Drug Trial
    Variables:             4                  9 Oct 2017 08:17
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
studytime       byte    %8.0g                 Months to death or end of exp.
died            byte    %8.0g                 1 if patient died
drug            byte    %8.0g                 Drug type (0=placebo)
age             byte    %8.0g                 Patient's age at start of exp.
-------------------------------------------------------------------------------
Sorted by: 

The first six observations look like this (each row is one patient):

list in 1/6, sepby(studytime) abbreviate(20)

     +-------------------------------+
     | studytime   died   drug   age |
     |-------------------------------|
  1. |         1      1      0    61 |
  2. |         1      1      0    65 |
     |-------------------------------|
  3. |         2      1      0    59 |
     |-------------------------------|
  4. |         3      1      0    52 |
     |-------------------------------|
  5. |         4      1      0    56 |
  6. |         4      1      0    67 |
     +-------------------------------+

Generating the data behind the Kaplan-Meier plots

First, -sts generate- is used to compute the Kaplan-Meier failure probabilities, F(t) = 1 - S(t), where S(t) is the Kaplan-Meier survivor function.

stset studytime, failure(died) noshow
sts generate fail_sts = f
label variable fail_sts "KM failure"
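As a sanity check, the failure probabilities should equal one minus the Kaplan-Meier survivor function. The sketch below is an illustration, not part of the original workflow: it reloads the data in a separate frame (the frame name km_check and the variable surv_sts are hypothetical), so the data in memory are left untouched. The block form of the -frame- prefix requires Stata 16 or later and must be run from a do-file.

```stata
* Work in a separate, hypothetical frame so the data in memory are untouched
capture frame drop km_check
frame create km_check
frame km_check {
    webuse drug2, clear
    stset studytime, failure(died) noshow
    sts generate fail_sts = f     // Kaplan-Meier failure function F(t)
    sts generate surv_sts = s     // Kaplan-Meier survivor function S(t)
    * F(t) should equal 1 - S(t) in every observation
    assert reldif(fail_sts, 1 - surv_sts) < 1e-6
    * A failure probability must lie in [0, 1]
    assert inrange(fail_sts, 0, 1)
}
```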

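To draw a proper Kaplan-Meier step curve, the time variable is shifted: after sorting on _t, each failure probability is paired with the time of the next observation, stored in _t_shift.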

sort _t
generate _t_shift = _t[_n+1]
label variable _t_shift "Time (Months)"
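Listing the original and shifted times side by side shows what the shift does. This is a minimal sketch, again working in a separate hypothetical frame named km_check (Stata 16 or later, run from a do-file). The assertions confirm two properties: the shifted time is never earlier than the row's own time, and only the last row gets a missing _t_shift, so that row drops out of the plot.

```stata
* Work in a separate, hypothetical frame so the data in memory are untouched
capture frame drop km_check
frame create km_check
frame km_check {
    webuse drug2, clear
    stset studytime, failure(died) noshow
    sts generate fail_sts = f
    sort _t
    generate _t_shift = _t[_n+1]    // time of the next row
    * The shifted time is never earlier than the row's own time
    assert _t_shift >= _t if !missing(_t_shift)
    * Only the last row lacks a successor, so only it has a missing _t_shift
    assert missing(_t_shift) == (_n == _N)
    list _t _t_shift fail_sts in 1/6, abbreviate(12)
}
```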

A lowess-smoothed approximation can be made with the command -lowess-:

lowess fail_sts _t_shift, nograph generate(fail_lowess)
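By default, -lowess- uses a bandwidth of 0.8 (option bwidth()), so each smoothed value is based on 80% of the data. A narrower bandwidth should follow the step curve more closely at the price of less smoothing. The following sketch compares the default with a bandwidth of 0.3; that value, the frame name km_check and the variable names are illustrative assumptions, not part of the original workflow.

```stata
* Work in a separate, hypothetical frame so the data in memory are untouched
capture frame drop km_check
frame create km_check
frame km_check {
    webuse drug2, clear
    stset studytime, failure(died) noshow
    sts generate fail_sts = f
    sort _t
    generate _t_shift = _t[_n+1]
    * Default bandwidth (0.8) and a narrower, illustrative bandwidth (0.3)
    lowess fail_sts _t_shift, nograph bwidth(0.8) generate(fail_lowess_80)
    lowess fail_sts _t_shift, nograph bwidth(0.3) generate(fail_lowess_30)
    * Mean absolute deviation from the unsmoothed failure probabilities
    generate dev_80 = abs(fail_lowess_80 - fail_sts)
    generate dev_30 = abs(fail_lowess_30 - fail_sts)
    tabstat dev_80 dev_30, statistics(mean)
}
```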

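An EWMA-smoothed approximation with weight 0.8 can be computed recursively: each smoothed value is 0.8 times the current failure probability plus 0.2 times the previous smoothed value.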

generate fail_ewma = . 
replace fail_ewma = cond(_n == 1, fail_sts, ///
    0.8*(fail_sts - fail_ewma[_n-1]) + fail_ewma[_n-1])
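Written out, with F_i the Kaplan-Meier failure probability in row i (rows sorted by time), \tilde F_i the EWMA and λ = 0.8 the weight, the -replace- command computes the recursion below. It works in Stata because -replace- processes the rows in order, so fail_ewma[_n-1] already holds the smoothed value of the previous row. Unrolling the recursion shows that row i-k enters with weight λ(1-λ)^k. With λ = 0.8, the previous row contributes 0.16 and the row before that 0.032, which is why the EWMA stays close to the unsmoothed curve. Because F_i never decreases and \tilde F_i is a weighted average of F_1 to F_i, the EWMA never lies above the unsmoothed failure curve.

```latex
\begin{aligned}
\tilde F_1 &= F_1, \\
\tilde F_i &= \lambda F_i + (1-\lambda)\,\tilde F_{i-1}
            = \tilde F_{i-1} + \lambda\,(F_i - \tilde F_{i-1}), \qquad i \ge 2, \\
\tilde F_i &= \sum_{k=0}^{i-2} \lambda (1-\lambda)^k F_{i-k} + (1-\lambda)^{i-1} F_1 .
\end{aligned}
```

The weights in the last line sum to one, since \sum_{k=0}^{i-2} \lambda (1-\lambda)^k = 1 - (1-\lambda)^{i-1}.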

The three failure curves are formatted and compared in one graph:

format fail* %6.2f
twoway ///
    (line fail_sts _t_shift, sort connect(step)) ///
    (line fail_lowess _t_shift) ///
    (line fail_ewma _t_shift) ///
        , legend(on position(5) ring(0)  cols(1) ///
                order(- "Kaplan-Meier" 1 "True" 2 "lowess" 3 "EWMA")) ///
        name(km, replace)

The EWMA with weight 0.8 follows the unsmoothed Kaplan-Meier curve more closely than the lowess curve does, while the lowess curve is smoother.
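The closeness comparison can also be put into numbers. This is a sketch under stated assumptions: it uses the mean absolute deviation from the unsmoothed failure probabilities as the closeness measure and EWMA weights of 0.8, 0.5 and 0.2. These choices, the frame name km_check and all variable names are illustrative and do not come from the original. Smaller weights give more weight to earlier rows, so they should smooth more and deviate more.

```stata
* Work in a separate, hypothetical frame so the data in memory are untouched
capture frame drop km_check
frame create km_check
frame km_check {
    webuse drug2, clear
    stset studytime, failure(died) noshow
    sts generate fail_sts = f
    sort _t
    generate _t_shift = _t[_n+1]
    * EWMA for a few illustrative weights, computed while the data are sorted on _t
    foreach w in 0.8 0.5 0.2 {
        local tag = round(100*`w')
        generate ewma_`tag' = fail_sts if _n == 1
        * replace runs through the rows in order, so the previous row is already smoothed
        replace ewma_`tag' = `w'*fail_sts + (1-`w')*ewma_`tag'[_n-1] if _n > 1
        generate dev_ewma_`tag' = abs(ewma_`tag' - fail_sts)
    }
    lowess fail_sts _t_shift, nograph generate(fail_lowess)
    generate dev_lowess = abs(fail_lowess - fail_sts)
    * Mean absolute deviation, on the rows where lowess is defined
    tabstat dev_lowess dev_ewma_* if !missing(_t_shift), statistics(mean)
}
```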



Last update: 2022-04-18, Stata version 17