Understanding mlogit and margins

Aim

The objective is to understand what multinomial regression -mlogit- returns. And to show how to report these results using the commands -margins-, -coefplot-, -mchange-, and -mchangeplot-. The last two commands are from the package spost13.

Other sources on -mlogit- in Stata:

idre

Stata example dataset

"We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989; Wells et al. 1989). The insurance is categorized as either an indemnity plan (that is, regular fee-for-service insurance, which may have a deductible or coinsurance rate) or a prepaid plan (a fixed up-front payment allowing subsequent unlimited use as provided, for instance, by an HMO). The third possibility is that the subject has no insurance whatsoever. We wish to explore the demographic factors associated with each subject’s insurance choice.

One of the demographic factors in our data is the race of the participant, coded as white or nonwhite.

Retrieving the dataset and variables.

use patid age male nonwhite insure site using http://www.stata-press.com/data/r15/sysdsn1.dta, clear

Metadata describtion of data
Name	Index	Label	Value Label Name	Format	Value Label Values	n	unique	missing
patid	1			%9.0g		644	644	0
age	2	NEMC (ISCNRD-IBIRTHD)/365.25		%10.0g		643	632	1
male	3	NEMC PATIENT MALE		%8.0g		644	2	0
nonwhite	4			%9.0g		644	2	0
insure	5		insure	%9.0g	1 "Indemnity" 2 "Prepaid" 3 "Uninsure"	616	3	28
site	6			%9.0g		644	3	0

Labelling data.

label variable nonwhite "Race"
label define nonwhite 0 "White" 1 "Non white"
label values nonwhite nonwhite

What are the estimates from -mlogit-

The estimates from are odds ratios (or log odds ratios) comparing the odds (or log odds) of current value of the outcome with the reference value of the outcome dependent on the exposure variable.

First the -mlogit- command itself.

Both crude and adjusted (male and age) estimates of the effect of nonwhite on insure is estimated. With option rrr odds ratios are reported.

The estimates and their confidence intervals are gathered in a matrix tbl.

regmat, outcome(insure) exposure(i.nonwhite) adjustments("" "i.male c.age") ///
        drop(se p) decimals(4) label: mlogit, nolog baseoutcome(2) rrr
matrix tbl = r(regmat)
matrix roweq tbl = Prepaid Uninsure

For comparison more variables is needed.

First in order to compare the insure value Indemnity (1) with the reference (2) value Prepaid a new variable insure12.

Recoding is necessary since logit requires outcome variable to be zero/one. Note that the third value not used for comparison is set to missing.

generate insure12 = (insure == 1) if insure != 3 & !missing(insure)
label variable insure12 "Prepaid"
label define insure12 0 "Prepaid" 1 "Indemnity"
label values insure12 insure12

Then a logistic regression is done to get odds ratio estimates and their confidence intervals.

Estimates and confidence intervals are added to the matrix tbl.

regmat, outcome(insure12) exposure(i.nonwhite) adjustments("" "i.male c.age") ///
        drop(se p) decimals(4) label: logit, nolog or
matrix tbl = tbl \ r(regmat)

Now comparing the outcome value Uninsure (3) with the reference value Prepaid (2) is similar to the above.

generate insure32 = (insure == 3) if insure != 1 & !missing(insure)
label variable insure32 "Uninsure"
label define insure32 0 "Prepaid" 1 "Uninsure"
label values insure32 insure32
regmat, outcome(insure32) exposure(i.nonwhite) adjustments("" "i.male c.age") ///
        drop(se p) decimals(4) label: logit, nolog or
matrix tbl = tbl \ r(regmat)

Finally, below is the matrix tbl presented with the aggregated results from the different regressions.

Comparing multinomial logistic regression with two logistic regression where the base compared to one of the other two outcomes and where the one not in the comparison is set to missing. Unjusted estimates (Adjustment 1) are the same whereas the adjusted estimates (Adjustment 2) differ a bit (due to doing a combined multinomial logistic regression versus two separate logistic regressions). So regression estimates in multinomial regressions are odds ratios (option rrr) or log odds ratios (no rrr option).
		Adjustment 1			Adjustment 2
		b	Lower 95% CI	Upper 95% CI	b	Lower 95% CI	Upper 95% CI
Prepaid	Race (Non white)	-0.6608	-1.0836	-0.2380	-0.7313	-1.1605	-0.3021
Uninsure	Race (Non white)	-0.2829	-1.0624	0.4967	-0.2980	-1.0827	0.4868
Prepaid	Race (Non white)	-0.6608	-1.0836	-0.2380	-0.7158	-1.1443	-0.2873
Uninsure	Race (Non white)	-0.2829	-1.0624	0.4967	-0.2923	-1.0764	0.4919

Visualising estimates from -mlogit-

Let's look at the estimate of the effects of nonwhite on insure adjusted for age, gender and site.

mlogit insure b0.nonwhite c.age i.male i.site, rrr base(1) nolog
estimates store mlogit_1

The command -coefplot- is an excellent tool to visualise estimates and their confidence interval.

coefplot ///
        (,keep(*Pr*:*) drop(_cons) label(Prepaid vs Indemnity)) ///
        (,keep(*Un*:*) drop(_cons) label(Uninsure vs Indemnity)) ///
        , eform xline(1, lcolor(red%40)) legend(rows(2) ring(1) position(6)) generate ///
        name(coefplot1, replace)


Generated variables:
Variable      Storage   Display    Value
    name         type    format    label      Variable label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
__by            byte    %10.0g     __by       subgraph ID
__plot          byte    %10.0g     __plot     plot ID
__at            float   %10.0g                plot position (category axis)
__mlbl          str1    %9s                   marker label
__mlpos         byte    %10.0g                marker label position
__b             double  %10.0g                coefficient
__V             double  %10.0g                variance
__se            double  %10.0g                standard error
__t             double  %10.0g                t or z statistic
__df            byte    %10.0g                degrees of freedom
__pval          double  %10.0g                p-value
__ll1           double  %10.0g                CI1: lower limit
__ul1           double  %10.0g                CI1: upper limit

As seen in the returned text in the log the option generate saves a set of variables with prefix __. Some of these are used to generate a variable label.

replace __mlbl = string(__b, "%6.2f") + " (" + string(__ll1, "%6.2f") + "; " ///
        + string(__ul1, "%6.2f") + ")" if !missing(__b)

The estimates and their confidence intervals are to be placed nicely above one another to the right in the graph. The x-value are to be the rounded maximum upper confidence interval limit (__ull).

summarize __ul1
replace __mlpos = round(r(max), 0.1) if !missing(__b)
addplot: scatter __at __mlpos, ms(i) mlabel(__mlbl) mlabsize(vsmall) ///
        xlabel(-1(1)4) xscale(range(-1 8)) yticks(0.5(1)5.5) ///
        legend(order(2 `"Prepaid vs Indemnity"' 4 `"Uninsure vs Indemnity"'))

A -coefplot- graph with odds ratio estimates and their 95% confidence intervals added to the right.

Margins and mlogit

Understanding margins is easiest when there is only one main effect in the regression.

mlogit insure b0.nonwhite, rrr base(1) nolog


Multinomial logistic regression                         Number of obs =    616
                                                        LR chi2(2)    =   9.62
                                                        Prob > chi2   = 0.0081
Log likelihood = -551.78348                             Pseudo R2     = 0.0086

------------------------------------------------------------------------------
      insure |        RRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Indemnity    |  (base outcome)
-------------+----------------------------------------------------------------
Prepaid      |
    nonwhite |
  Non white  |      1.936      0.418     3.06    0.00        1.269       2.955
       _cons |      0.829      0.078    -2.00    0.05        0.690       0.996
-------------+----------------------------------------------------------------
Uninsure     |
    nonwhite |
  Non white  |      1.459      0.595     0.93    0.35        0.656       3.244
       _cons |      0.143      0.026   -10.90    0.00        0.101       0.203
------------------------------------------------------------------------------
Note: _cons estimates baseline relative risk for each outcome.

estimates store mlogit_main

The LR test in -mlogit- the same as a classical LR chisquare test for independence. Compare to the -tabulate- output below.

Then -margins- predicts the probabilities of the outcome (insure) given the exposure (nonwhite), ie the expected probability $$P(insure|nonwhite)$$.

margins nonwhite, cformat(%6.4f)


Adjusted predictions                                       Number of obs = 616
Model VCE: OIM

1._predict: Pr(insure==Indemnity), predict(pr outcome(1))
2._predict: Pr(insure==Prepaid), predict(pr outcome(2))
3._predict: Pr(insure==Uninsure), predict(pr outcome(3))

-----------------------------------------------------------------------------------
                  |            Delta-method
                  |     Margin   std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
_predict#nonwhite |
         1#White  |     0.5071     0.0225    22.57    0.00       0.4630      0.5511
     1#Non white  |     0.3554     0.0435     8.17    0.00       0.2701      0.4407
         2#White  |     0.4202     0.0222    18.94    0.00       0.3767      0.4637
     2#Non white  |     0.5702     0.0450    12.67    0.00       0.4820      0.6585
         3#White  |     0.0727     0.0117     6.23    0.00       0.0499      0.0956
     3#Non white  |     0.0744     0.0239     3.12    0.00       0.0276      0.1211
-----------------------------------------------------------------------------------

compared to the column percentages from e.g. -tabulate-.

tabulate insure nonwhite, col lrchi2


+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |         Race
    insure |     White  Non white |     Total
-----------+----------------------+----------
 Indemnity |       251         43 |       294 
           |     50.71      35.54 |     47.73 
-----------+----------------------+----------
   Prepaid |       208         69 |       277 
           |     42.02      57.02 |     44.97 
-----------+----------------------+----------
  Uninsure |        36          9 |        45 
           |      7.27       7.44 |      7.31 
-----------+----------------------+----------
     Total |       495        121 |       616 
           |    100.00     100.00 |    100.00 

 Likelihood-ratio chi2(2) =   9.6231   Pr = 0.008

The marginal effect (risk difference): $$P(insure|nonwhite = Non white) - P(insure|nonwhite = White)$$ can found by

margins r.nonwhite, cformat(%6.4f)


Contrasts of adjusted predictions                          Number of obs = 616
Model VCE: OIM

1._predict: Pr(insure==Indemnity), predict(pr outcome(1))
2._predict: Pr(insure==Prepaid), predict(pr outcome(2))
3._predict: Pr(insure==Uninsure), predict(pr outcome(3))

-----------------------------------------------------------
                        |         df        chi2     P>chi2
------------------------+----------------------------------
      nonwhite@_predict |
(Non white vs White) 1  |          1        9.60     0.0020
(Non white vs White) 2  |          1        8.94     0.0028
(Non white vs White) 3  |          1        0.00     0.9504
                 Joint  |          2       10.02     0.0067
-----------------------------------------------------------

-------------------------------------------------------------------------
                        |            Delta-method
                        |   Contrast   std. err.     [95% conf. interval]
------------------------+------------------------------------------------
      nonwhite@_predict |
(Non white vs White) 1  |    -0.1517     0.0490       -0.2477     -0.0557
(Non white vs White) 2  |     0.1500     0.0502        0.0517      0.2484
(Non white vs White) 3  |     0.0017     0.0266       -0.0504      0.0537
-------------------------------------------------------------------------

Or by:

margins, dydx(nonwhite) post


Conditional marginal effects                               Number of obs = 616
Model VCE: OIM

dy/dx wrt: 1.nonwhite

1._predict: Pr(insure==Indemnity), predict(pr outcome(1))
2._predict: Pr(insure==Prepaid), predict(pr outcome(2))
3._predict: Pr(insure==Uninsure), predict(pr outcome(3))

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.nonwhite   |  (base outcome)
-------------+----------------------------------------------------------------
1.nonwhite   |
    _predict |
          1  |     -0.152      0.049    -3.10    0.00       -0.248      -0.056
          2  |      0.150      0.050     2.99    0.00        0.052       0.248
          3  |      0.002      0.027     0.06    0.95       -0.050       0.054
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Since the marginal effects are the risk differences between two sets of column probabilities where each set sums to one, we have that marginal effects (risk differences) per construction sums to zero.

Option post is needed for the coming -coefplot- command.

A coefplot can be made by:

coefplot (, keep(*:1._predict) label(Indemnity)) ///
         (, keep(*:2._predict) label(Prepaid))   ///
         (, keep(*:3._predict) label(Uninsure))  ///
    , swapnames xline(0) legend(cols(1)) ///
        mlabel(string(@b, "%5.2f") + " (" + string(@ll, "%5.2f") + "; " + string(@ul, "%5.2f") + ")") ///
        mlabposition(12) mlabsize(vsmall) name(coefplot2, replace)

A coefplot visualising the marginal effect (risk difference) of the variable
*nonwhite* on the outcome *insure*. Note that the marginal effects by
constructions sums to zero.

Similar results can be achieved by the spost13 commands -mchange-.

estimates restore mlogit_main
mchange, stats(ci)


mlogit: Changes in Pr(y) | Number of obs = 616

Expression: Pr(insure), predict(outcome())

                    | Indemnity    Prepaid   Uninsure 
--------------------+---------------------------------
nonwhite            |                                 
 Non white vs White |    -0.152      0.150      0.002 
                 LL |    -0.248      0.052     -0.050 
                 UL |    -0.056      0.248      0.054 

Average predictions

             | Indemnity    Prepaid   Uninsure 
-------------+---------------------------------
  Pr(y|base) |     0.477      0.450      0.073

The do file for this document

Last update: 2022-04-22, Stata version 17

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search