What is propensity score subgrouping (-pssg-)?

Given estimates from a logit, logistic, or probit regression, propensity scores are calculated and subclassified into blocks within which the propensity score varies little between the two outcome groups of the regression. Only blocks amenable to splitting are split.

-pssg- is in many ways similar to -pscore-, but the arguments differ. The main difference is that the regression is run separately and passed to -pssg- through -estimates-.

The reporting is also more compressed to give a better overview.

Finally, the algorithm is slightly different. The algorithm used here is described in section 13.5 of Imbens and Rubin (2015).
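As a rough illustration of the kind of check a subclassification algorithm of this type performs on a single block, the sketch below computes the log odds of the estimated propensity score, a two-sample t statistic for its difference between the two groups, and the median score as a candidate split point. It is an assumption that this resembles what -pssg- does internally. The built-in auto dataset, the toy model, and the variable names ps_hat and lps are hypothetical and chosen only for illustration.

```stata
* Illustration only: one homogeneity check, not the internals of pssg
sysuse auto, clear
logit foreign mpg weight, nolog

* Estimated propensity score and its log odds (the "linearized" score)
predict double ps_hat if e(sample), pr
generate double lps = logit(ps_hat)

* Two-sample t statistic for the difference in mean log odds
ttest lps, by(foreign)
display "t = " %6.3f r(t)

* Candidate split point: the median of the estimated score in the block
summarize ps_hat, detail
display "median = " %6.3f r(p50)
```

In a scheme like this, a block would be split at the median only when the t statistic exceeds the chosen limit and both halves keep enough observations from each group.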

The results of -pssg- can be used with -atts-, if it is installed.

Syntax

The syntax is:

pssg modelname|.  [using/] [, options]
Modelname is the name under which estimation results were stored using estimates.
If a period is given as the argument, the most recent regression is used.

Options

Stored results

pssg stores the following in r():

Versions

pssg has been tested in Stata 12.1 IC, 13.1 IC, 14.2 IC, and 15.1 IC.

Installation

To install use the command: ssc install matrixtools

A demonstration of -pssg-

Example data

The data are from Becker and Ichino (2002) and are described at nber.org.

To get the data from Becker and Ichino, 2002:

use https://users.nber.org/~rdehejia/data/nsw_dw.dta if treat, clear
append using "https://users.nber.org/~rdehejia/data/psid_controls.dta"

To replicate the example data, the following derived variables are needed.

foreach var of varlist age education re74 re75 {
generate `var'2 = `var'^2
}
generate blackU74 = black *(re74 == 0)

A regression must be run before the propensity score subclassification can be made.

logit treat age age2 education education2 married black hispanic re74 re75 re742 re752 blackU74, nolog

Logistic regression                             Number of obs     =      2,675
                                                LR chi2(11)       =     935.35
                                                Prob > chi2       =     0.0000
Log likelihood = -204.97536                     Pseudo R2         =     0.6953

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |      0.332      0.120     2.76    0.01        0.096       0.568
        age2 |     -0.006      0.002    -3.43    0.00       -0.010      -0.003
   education |      0.849      0.348     2.44    0.01        0.168       1.531
  education2 |     -0.051      0.017    -2.93    0.00       -0.084      -0.017
     married |     -1.886      0.299    -6.30    0.00       -2.472      -1.299
       black |      1.136      0.352     3.23    0.00        0.446       1.825
    hispanic |      1.969      0.567     3.47    0.00        0.858       3.080
        re74 |     -0.000      0.000    -3.00    0.00       -0.000      -0.000
        re75 |     -0.000      0.000    -5.24    0.00       -0.000      -0.000
       re742 |      0.000      0.000     3.72    0.00        0.000       0.000
       re752 |      0.000      0.000     0.20    0.84       -0.000       0.000
    blackU74 |      2.144      0.427     5.02    0.00        1.308       2.981
       _cons |     -7.475      2.444    -3.06    0.00      -12.264      -2.686
------------------------------------------------------------------------------
Note: 22 failures and 0 successes completely determined.

The usual period notation can be used to refer to the most recent regression.

estimates store ps1
pssg ., g(g0)

--------------------------------------------------
   Group lb  Group ub      t  status   N0  N1    N
--------------------------------------------------
1     0.001     0.010  2.193       0  663   4  667
2     0.010     0.074  1.980       0  330   4  334
3     0.074     0.160  1.493       0   77   7   84
4     0.160     0.280  2.146       0   31  11   42
5     0.280     0.399  1.620       0   22  19   41
6     0.399     0.905  2.785       0   27  57   84
7     0.905     0.974  1.773       0    7  76   83
--------------------------------------------------

The summary table has 7 columns: the lower and upper bounds of the group interval (intervals are closed at least on the right), the t-value from the homogeneity test, the status of the interval (0 = no split needed, 1 = not eligible for further splits, 2 = t-test failed), and the number of observations among controls (N0), among the exposed (N1), and in total (N).
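To look at the grouping beyond the summary table, the group variable can be cross-tabulated against the treatment indicator. The lines below are a minimal sketch that assumes the example data are in memory and that g0 was created by the pssg call above. The choice of age as the covariate to inspect is arbitrary.

```stata
* Assumes g0 was created by: pssg ., g(g0)
* Cell counts per group and treatment arm, including unassigned observations
tabulate g0 treat, missing

* Mean age per group and treatment arm as a quick within-group balance check
tabulate g0 treat, summarize(age) means
```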

Regressions stored with -estimates store- can be passed as the argument.

Below, the acceptance limit for the t-test is set to 1. This limit is too low and leads to an unacceptable grouping.

pssg ps1, groupname(g1) tvalue(1)

1 group found, 2 required
1 group found, 2 required
1 group found, 2 required
1 group found, 2 required
1 group found, 2 required
---------------------------------------------------
    Group lb  Group ub      t  status   N0  N1    N
---------------------------------------------------
1      0.001     0.002              2  335   0  335
2      0.002     0.010  0.851       0  328   4  332
3      0.010     0.028              2  167   0  167
4      0.028     0.074  0.446       0  163   4  167
5      0.074     0.103  0.356       0   40   2   42
6      0.103     0.122              2   21   0   21
7      0.122     0.136  0.475       0    6   5   11
8      0.136     0.160              2   10   0   10
9      0.160     0.224  0.301       0   19   2   21
10     0.224     0.280  0.047       0   12   9   21
11     0.280     0.316  0.362       0    8   2   10
12     0.316     0.355  1.035       1    4   6   10
13     0.355     0.377  0.150       0    7   4   11
14     0.377     0.399  0.030       0    3   7   10
15     0.399     0.525  0.178       0   10  11   21
16     0.525     0.637  0.987       0    5  16   21
17     0.637     0.776  0.518       0   12   9   21
18     0.776     0.905              2    0  21   21
19     0.905     0.957  1.004       1    2  39   41
20     0.957     0.964  2.493       1    2  19   21
21     0.964     0.974  0.136       0    3  18   21
---------------------------------------------------
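Several of the groups in this table contain only controls or only exposed units, which is why no t-value is reported for them. One way to list such groups directly from the data is sketched below. It assumes the group variable g1 from the call above is in memory. The helper variables has_treated and has_control are hypothetical names introduced here.

```stata
* Assumes g1 was created by: pssg ps1, groupname(g1) tvalue(1)
* Flag groups that contain at least one exposed / at least one control unit
bysort g1: egen byte has_treated = max(treat)
bysort g1: egen byte has_control = max(1 - treat)

* Groups where one of the two arms is empty
tabulate g1 if !missing(g1) & (has_treated == 0 | has_control == 0)
```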

Regressions saved to disk can also be used as the basis, through the using modifier.

estimates save ps1, replace
pssg using ps1.ster, groupname(g2) scorename(pscr2) 

--------------------------------------------------
   Group lb  Group ub      t  status   N0  N1    N
--------------------------------------------------
1     0.001     0.010  2.193       0  663   4  667
2     0.010     0.074  1.980       0  330   4  334
3     0.074     0.160  1.493       0   77   7   84
4     0.160     0.280  2.146       0   31  11   42
5     0.280     0.399  1.620       0   22  19   41
6     0.399     0.905  2.785       0   27  57   84
7     0.905     0.974  1.773       0    7  76   83
--------------------------------------------------

Works with atts command

Note that the generated group variable is missing outside the common support, so the comsup option in -atts- is redundant.

set seed 1221
atts re78 treat, pscore(pscr2) blockid(g2)


ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      178        1157    1854.474     798.703       2.322

---------------------------------------------------------
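For intuition about what a stratification estimate of this kind represents, the sketch below computes the difference in mean re78 between exposed and controls within each group and averages those differences with the number of exposed units as weights. This is a hand-rolled approximation under the assumption that re78, treat and g2 are in memory. It is not the code used by -atts-, and it is not claimed to reproduce the figure above exactly. The names y, n and diff are hypothetical.

```stata
* Assumes re78, treat and the group variable g2 are in memory
preserve
keep if !missing(g2)

* Mean outcome and cell size per group and treatment arm
collapse (mean) y = re78 (count) n = re78, by(g2 treat)
reshape wide y n, i(g2) j(treat)

* Group-wise difference, averaged with the number of exposed as weights
generate double diff = y1 - y0
summarize diff [aweight = n1]
display "Hand-computed ATT: " %9.3f r(mean)
restore
```

Groups without exposed units have a missing weight and drop out of the average, which matches the idea that the estimate targets the exposed population.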

The do file for this document

Last update: 2019-07-24, Stata version 15.1