/*
cls
cd "C:\Users\au24026\Documents\STATA\StataHacks\docs\Statistical Analysis\Understanding mlogit"
do2md using understanding_mlogit
*/
/***
# Understanding mlogit and margins
The objective is to understand what multinomial regression -mlogit- returns.
And to show how to report these results using the commands -margins-, -coefplot-, -mchange-,
and -mchangeplot-. The last two commands are from the package spost13.
Other sources on -mlogit- in Stata:
* [idre](https://stats.idre.ucla.edu/stata/dae/multinomiallogistic-regression/)
# Stata example dataset
"We have data on the type of health insurance available to 616 psychologically
depressed subjects in the United States (Tarlov et al. 1989; Wells et al. 1989).
The insurance is categorized as either an indemnity plan (that is, regular
fee-for-service insurance, which may have a deductible or coinsurance rate) or
a prepaid plan (a fixed up-front payment allowing subsequent unlimited use as
provided, for instance, by an HMO).
The third possibility is that the subject has no insurance whatsoever.
We wish to explore the demographic factors associated with each subjectâ€™s
insurance choice.
**One of the demographic factors in our data is the race of the participant,
coded as white or nonwhite.**
"
Retrieving the dataset and variables.
```
use patid age male nonwhite insure site using http://www.stata-press.com/data/r15/sysdsn1.dta, clear
```
***/
//OFF
use patid age male nonwhite insure site using sysdsn1, clear
//ON
/****/metadata, style(html) caption(Metadata describtion of data)
/***
Labelling data.
***/
/**/label variable nonwhite "Race"
/**/label define nonwhite 0 "White" 1 "Non white"
/**/label values nonwhite nonwhite
/***
# What are the estimates from -mlogit-
The estimates from are odds ratios (or log odds ratios) comparing the odds (or
log odds) of current value of the outcome with the reference value of the
outcome dependent on the exposure variable.
First the -mlogit- command itself.
Both crude and adjusted (*male* and *age*) estimates of the effect of *nonwhite*
on *insure* is estimated. With option **rrr** odds ratios are reported.
The estimates and their confidence intervals are gathered in a matrix *tbl*.
***/
/**/regmat, outcome(insure) exposure(i.nonwhite) adjustments("" "i.male c.age") ///
drop(se p) decimals(4) label: mlogit, nolog baseoutcome(2) rrr
/**/matrix tbl = r(regmat)
/**/matrix roweq tbl = Prepaid Uninsure
/***
For comparison more variables is needed.
First in order to compare the *insure* value Indemnity (1) with the reference (2)
value Prepaid a new variable *insure12*.
Recoding is necessary since logit requires outcome variable to be zero/one.
Note that the third value not used for comparison is set to missing.
***/
/**/generate insure12 = (insure == 1) if insure != 3 & !missing(insure)
/**/label variable insure12 "Prepaid"
/**/label define insure12 0 "Prepaid" 1 "Indemnity"
/**/label values insure12 insure12
/***
Then a logistic regression is done to get odds ratio estimates and their
confidence intervals.
Estimates and confidence intervals are added to the matrix *tbl*.
***/
/**/regmat, outcome(insure12) exposure(i.nonwhite) adjustments("" "i.male c.age") ///
drop(se p) decimals(4) label: logit, nolog or
/**/matrix tbl = tbl \ r(regmat)
/***
Now comparing the outcome value Uninsure (3) with the reference value Prepaid (2)
is similar to the above.
***/
/**/generate insure32 = (insure == 3) if insure != 1 & !missing(insure)
/**/label variable insure32 "Uninsure"
/**/label define insure32 0 "Prepaid" 1 "Uninsure"
/**/label values insure32 insure32
/**/regmat, outcome(insure32) exposure(i.nonwhite) adjustments("" "i.male c.age") ///
drop(se p) decimals(4) label: logit, nolog or
/**/matrix tbl = tbl \ r(regmat)
/***
Finally, below is the matrix *tbl* presented with the aggregated results from
the different regressions.
***/
/****/matprint tbl, style(html) decimals(4) title(Comparing multinomial logistic ///
regression with two logistic regression where the base compared to one of ///
the other two outcomes and where the one not in the comparison is set to ///
missing. Unjusted estimates (Adjustment 1) are the same whereas the adjusted ///
estimates (Adjustment 2) differ a bit (due to doing a combined multinomial ///
logistic regression versus two separate logistic regressions). ///
So regression estimates in multinomial regressions are odds ratios (option rrr) ///
or log odds ratios (no rrr option).)
/***
# Visualising estimates from -mlogit-
Let's look at the estimate of the effects of *nonwhite* on *insure* adjusted for
*age*, *gender* and *site*.
***/
/**/mlogit insure b0.nonwhite c.age i.male i.site, rrr base(1) nolog
/**/estimates store mlogit_1
/***
The command -coefplot- is an excellent tool to visualise estimates and their
confidence interval.
***/
coefplot ///
(,keep(*Pr*:*) drop(_cons) label(Prepaid vs Indemnity)) ///
(,keep(*Un*:*) drop(_cons) label(Uninsure vs Indemnity)) ///
, eform xline(1, lcolor(red%40)) legend(rows(2)) generate ///
name(coefplot1, replace)
*display `"`r(graph)'"'
/***
As seen in the returned text in the log the option **generate** saves a set of
variables with prefix __.
Some of these are used to generate a variable label.
***/
/**/replace __mlbl = string(__b, "%6.2f") + " (" + string(__ll1, "%6.2f") + "; " ///
+ string(__ul1, "%6.2f") + ")" if !missing(__b)
/***
The estimates and their confidence intervals are to be placed nicely above one
another to the right in the graph.
The x-value are to be the rounded maximum upper confidence interval
limit (**__ull**).
***/
/**/summarize __ul1
/**/replace __mlpos = round(r(max), 0.1) if !missing(__b)
/**/addplot: scatter __at __mlpos, ms(i) mlabel(__mlbl) mlabsize(vsmall) ///
xlabel(-1(1)4) xscale(range(0 8)) yticks(0.5(1)5.5) ///
legend(order(2 `"Prepaid vs Indemnity"' 4 `"Uninsure vs Indemnity"'))
//OFF
graph export coefplot1.png, width(1600) replace
//ON
/***
![A -coefplot- graph with odds ratio estimates and their 95% confidence intervals added to the right.](coefplot1.png)
# Margins and mlogit
Understanding margins is easiest when there is only one main effect in the
regression.
***/
mlogit insure b0.nonwhite, rrr base(1) nolog
/**/estimates store mlogit_main
/***
The LR test in -mlogit- the same as a classical LR chisquare test for
independence. Compare to the -tabulate- output below.
Then -margins- predicts the probabilities of the outcome (*insure*) given the
exposure (*nonwhite*), ie the expected probability $$P(insure|nonwhite)$$.
***/
margins nonwhite, cformat(%6.4f)
/***
compared to the column percentages from e.g. -tabulate-.
***/
tabulate insure nonwhite, col lrchi2
/***
The marginal effect (risk difference):
$$P(insure|nonwhite = Non white) - P(insure|nonwhite = White)$$ can found by
***/
margins r.nonwhite, cformat(%6.4f)
/***
Or by:
***/
margins, dydx(nonwhite) post
/***
Since the marginal effects are the risk differences between two sets of
column probabilities where each set sums to one, we have that marginal effects
(risk differences) per construction sums to zero.
Option **post** is needed for the coming -coefplot- command.
A coefplot can be made by:
***/
/**/coefplot (, keep(*:1._predict) label(Indemnity)) ///
(, keep(*:2._predict) label(Prepaid)) ///
(, keep(*:3._predict) label(Uninsure)) ///
, swapnames xline(0) legend(rows(1)) ///
mlabel(string(@b, "%5.2f") + " (" + string(@ll, "%5.2f") + "; " + string(@ul, "%5.2f") + ")") ///
mlabposition(12) mlabsize(vsmall) name(coefplot2, replace)
//OFF
graph export coefplot2.png, width(1600) replace
//ON
/***
![A coefplot visualising the marginal effect (risk difference) of the variable
*nonwhite* on the outcome *insure*. Note that the marginal effects by
constructions sums to zero.](coefplot2.png)
Similar results can be achieved by the spost13 commands -mchange-.
***/
/**/estimates restore mlogit_main
mchange, stats(ci)
/***
The results can be visualised by -mchangeplot-.
The slightly modified graph by -mchangeplot- below can be compared to the one
above.
***/
mchangeplot, name(mchangeplot, replace)
//OFF
gr_edit .plotregion1.textbox4.text = {}
gr_edit .plotregion1.textbox4.text.Arrpush `"Non white"'
gr_edit .plotregion1.textbox4.text.Arrpush vs White
gr_edit .plotregion1.textbox4.DragBy .15 -.03
graph export mchangeplot.png, width(1600) replace
//ON
/***
![A mchangeplot. Again note that the estimates by contruction sums to zero.](mchangeplot.png)
***/