//OFF
/*
if `c(version)' >= 13 cls
cd "C:/Users/au24026/Documents/STATA/StataHacks/docs/Coding Stata/Cox_regression"
do2md using "cox_regression"
*/
use drug2, clear
//ON
/***
# How to hide small steps in a Kaplan-Meier plot
In Danish national registries it is forbidden to report groups smaller than 5.
Since steps in Kaplan-Meier plots are often based on fewer than 5 persons,
reporting Kaplan-Meier plots for small datasets is a problem.
Two solutions are presented here: a lowess-smoothed version of the Kaplan-Meier
curve, and a Kaplan-Meier curve in which each step is based on at least 5 persons.
## The example data
We use a classical Stata example dataset:
`webuse drug2, clear`
***/
/**/stset, clear
/***
The variables are:
***/
describe
/***
The data look like this (each row is a person):
***/
list in 1/6, sepby(studytime) abbreviate(20)
/***
## Generating the data behind the Kaplan-Meier plots
First **sts generate** is used to get the survival probabilities, and the
failure probabilities are then computed as 1 minus the survival probabilities.
***/
stset studytime, failure(died) noshow
/**/sts generate survival = s
/**/generate failure = 1 - survival
/**/label variable failure "KM failure"
/**/format failure %6.2f
/***
A lowess-smoothed **twoway** graph of failure vs. studytime is one way to report
the Kaplan-Meier plot.
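
As a rough sketch, the lowess-smoothed curve can be drawn on its own as shown
here. The setup lines repeat the steps above so the snippet runs by itself. The
bandwidth `bwidth(0.5)` and the graph name `km_lowess` are assumptions chosen
for illustration, not values used elsewhere in this document (the default
bandwidth of **twoway lowess** is 0.8):

```
// Setup repeated from above so the snippet is self-contained
webuse drug2, clear
stset studytime, failure(died) noshow
sts generate survival = s
generate failure = 1 - survival
// Lowess-smoothed failure curve; a smaller bwidth() follows the steps more closely
twoway (lowess failure studytime, bwidth(0.5) lcolor(blue)), ///
ytitle("KM failure") name(km_lowess, replace)
```

A smaller bandwidth stays closer to the original steps, so it is worth checking
that the smoothed curve does not simply reproduce the small steps it is meant to hide.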
## Making steps of at least 5
The variable n_prsns counts the number of persons at each time (variable
studytime). The count is only saved in the last row for each time; all other
rows are set to 0.
***/
/**/bysort studytime: generate n_prsns = cond(_n == _N, _N, 0)
/***
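To get the accumulated number of persons over time one can use relative
references (such as `acc_prsns[_n-1]`) and the function **cond**. The count
keeps accumulating while the previous value is below 5 and restarts once 5 has
been reached. If the count in the last row is below 5, it is added to the count
of the row before it, which is then set to missing: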
```
generate acc_prsns = n_prsns if _n == 1
replace acc_prsns = cond(acc_prsns[_n-1] < 5, n_prsns + acc_prsns[_n-1], n_prsns) if _n > 1
if acc_prsns[_N] < 5 {
replace acc_prsns = n_prsns + acc_prsns[_N-1] if _n == _N
replace acc_prsns = . if _n == _N-1
}
```
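
To see how the running count behaves, here is a minimal sketch on a made-up toy
dataset of seven persons (the `input` values are hypothetical and only serve as
an illustration). It is meant to be run in a separate session, since it clears
the data in memory:

```
clear
input studytime
1
1
2
3
3
3
4
end
// Persons per time point, stored in the last row of each time point
bysort studytime: generate n_prsns = cond(_n == _N, _N, 0)
// Running count that restarts after it has reached 5
generate acc_prsns = n_prsns if _n == 1
replace acc_prsns = cond(acc_prsns[_n-1] < 5, n_prsns + acc_prsns[_n-1], n_prsns) if _n > 1
// The last group has fewer than 5 persons, so it is added to the row before it
if acc_prsns[_N] < 5 {
replace acc_prsns = n_prsns + acc_prsns[_N-1] if _n == _N
replace acc_prsns = . if _n == _N-1
}
list, sepby(studytime)
```

Because **replace** works through the rows in order, `acc_prsns[_n-1]` already
holds the updated value of the previous row. In this toy data the count should
reach 6 at time 3, and the single person at time 4 is then combined with it,
leaving 7 in the last row and a missing value in the row before it.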
***/
//OFF
generate acc_prsns = n_prsns if _n == 1
replace acc_prsns = cond(acc_prsns[_n-1] < 5, n_prsns + acc_prsns[_n-1], n_prsns) if _n > 1
if acc_prsns[_N] < 5 {
replace acc_prsns = n_prsns + acc_prsns[_N-1] if _n == _N
replace acc_prsns = . if _n == _N-1
}
//ON
/***
Only the failure values based on at least 5 persons are selected:
***/
/**/generate failure2 = failure if acc_prsns > 4 & !missing(acc_prsns)
/**/quietly summarize failure2
/**/replace failure2 = `r(min)' if _n == 1
/**/replace failure2 = `r(max)' if _n == _N
/**/label variable failure2 "KM failure with steps of at least 5"
/**/format failure2 %6.2f
/***
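One thing to keep in mind for the selection above: Stata treats a missing value
as larger than any number, so a bare comparison such as `acc_prsns > 4` is also
true where acc_prsns is missing (which can happen in the second-to-last row
after the last group has been merged). A minimal illustration, which is not part
of the workflow itself:

```
// Missing compares as larger than any number, so this displays 1
display (. > 4)
// Adding a guard with missing() excludes the missing value, so this displays 0
display (. > 4 & !missing(.))
```
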
## A graph comparison
Finally, a graphical comparison of the classical Kaplan-Meier curve, the
lowess-smoothed version, and the Kaplan-Meier curve based on steps of at least
5 persons is presented:
***/
twoway ///
(line failure studytime, lcolor(black) connect(stairstep)) ///
(lowess failure studytime, lcolor(blue) ) ///
(line failure2 studytime, lcolor(red) connect(stairstep)) ///
, legend(on position(5) ring(0) cols(1) ///
order(1 "Kaplan-Meier" 2 "Kaplan-Meier lowess" 3 "Kaplan-Meier steps of 5") ///
) ///
name(km, replace)
//OFF
graph export km.png, width(1600) height(1200) replace
//ON
/***
***/