# Returned variables from -psmatch2-

## Aim

Here, the convenience variables generated by the Stata command **psmatch2**
are explained using the example dataset *cattaneo2*.

At stephenporter.org an unknown international expert on matching is quoted for:

```
I can never figure out what psmatch2 is doing, and that’s actually one of
the reasons I don’t use it much and don’t tend to recommend it.
The documentation isn’t very clear and there are a number of things where
I think it should be doing something but then it’s not clear that it is.
```

This leads Stephen Porter to conclude:

```
Not exactly a compelling endorsement.
This is why I now recommend to my students that they avoid psmatch2:
how can you justify using a software procedure if you have no idea what it is doing?
Especially considering so many other procedures available for Stata, as well as matchit in R.
```

The main problem for Stephen Porter is the deriving of the weights, *_weight*.

It turns out that -psmatch2- quite elegant transforms trimming and caliper matching into simple weights.

From Stata version 13, there has been -teffects- in [TE] and

the standard errors for ATT are conservative
Note as written in the link above, that there now is an option **ai(#)** or
**altvariance** which makes -psmatch2- return the Abadie-Imbens standard errors.

The beauty of -psmatch2- is that the returned variables makes the user a lot freer in the following analysis. Hence, it is still interesting to use -psmatch2-.

### The -psmatch2- help file

From **psmatch2** version 4.0.12 30jan2016 E. Leuven, B. Sianesi, the help file
says:

```
Convenience variables in Stata’s -psmatch2- (Caliper matching)
-psmatch2- creates a number of variables for the convenience of the user:
*_treated* is a variable that equals 0 for control observations and 1 for
treatment observations.
*_support* is an indicator variable with equals 1 if the observation is
on the common support and 0 if the observation is off the support.
*_pscore* is the estimated propensity score or a copy of the one
provided by pscore().
*_outcome_variable* for every treatment observation stores the value of
the matched outcome.
*_weight*. For nearest neighbor matching, it holds the frequency with
which the observation is used as a match;
with option ties and k-nearest neighbors matching it holds the normalized
weight;
for kernel matching, and llr matching with a weight other than
stata's tricube, it stores the overall weight given to the matched
observation.
When estimating att only *_weight* = 1 for the treated.
*_id* In the case of one-to-one and nearest-neighbors matching, a new
identifier created for all observations.
*_nk* In the case of one-to-one and nearest-neighbors matching, for every
treatment observation, it stores the observation number of the k-th
matched control observation.
Do not forget to sort by _id if you want to use the observation
number (id) of for example the 1st nearest neighbor as in
sort _id
g x_of_match = x[_n1]
*_nn* In the case of nearest-neighbors matching, for every treatment
observation, it stores the number of matched control observations.
```

This quoted version of the -psmatch2- help comes after the Porter post.

It might be that the help for -psmatch2- has been modified after the post.

With respect to the points mentioned in the Porter post, I think that the -psmatch2- help is quite clear.

## What is returned from -psmatch2-

We now look into what is returned from -psmatch2- using the same data as Stephen Porter does. Propensities are added.

First we look into what is returned in the one-to-one case, then the one-to-many. In both cases, estimates and comparison tests are reported.

### The Stephen Porter example data

```
use http://www.stata-press.com/data/r13/cattaneo2, clear
```

Creating the propensities

```
logit mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace
predict p
```

### The returned variables from -psmatch2-, the one to one case

Using the propensities in the **psmatch2** command

```
summarize p
display `=0.25*r(sd)'
```

.0295236

```
psmatch2 mbsmoke, outcome(bweight) pscore(p) neighbor(1) common ///
caliper(`=0.25*r(sd)') noreplacement logit
```

---------------------------------------------------------------------------------------- Variable Sample | Treated Controls Difference S.E. T-stat ----------------------------+----------------------------------------------------------- bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83 ATT | 3136.60303 3381.92899 -245.32596 27.3360463 -8.97 ----------------------------+----------------------------------------------------------- Note: S.E. does not take into account that the propensity score is estimated. psmatch2: | psmatch2: Common Treatment | support assignment | Off suppo On suppor | Total -----------+----------------------+---------- Untreated | 0 3,778 | 3,778 Treated | 5 859 | 864 -----------+----------------------+---------- Total | 5 4,637 | 4,642

```
label variable _treated "Treatment"
```

For the description of the convenience variables only the outcome, the exposure and the propensity score is needed.

```
keep bweight mbsmoke p _*
format p _pscore %6.3f
```

Five non-treated rows

```
preserve
keep if !_treated
```

(864 observations deleted)

```
list in 1/5, noobs
```

+----------------------------------------------------------------------------------------------------------------+ | bweight mbsmoke p _pscore _treated _support _weight _bweight _id _n1 _nn _pdif | |----------------------------------------------------------------------------------------------------------------| | 3459 nonsmoker 0.120 0.120 Untreated On support . . 1674 . 0 . | | 3260 nonsmoker 0.409 0.409 Untreated On support 1 . 3606 . 0 . | | 3572 nonsmoker 0.239 0.239 Untreated On support 1 . 2980 . 0 . | | 2948 nonsmoker 0.168 0.168 Untreated On support 1 . 2546 . 0 . | | 2410 nonsmoker 0.099 0.099 Untreated On support . . 983 . 0 . | +----------------------------------------------------------------------------------------------------------------+

```
restore
```

Five treated rows

```
preserve
keep if _treated
```

(3,778 observations deleted)

```
list in 1/5, noobs
```

+------------------------------------------------------------------------------------------------------------------+ | bweight mbsmoke p _pscore _treated _support _weight _bweight _id _n1 _nn _pdif | |------------------------------------------------------------------------------------------------------------------| | 3090 smoker 0.099 0.099 Treated On support 1 2693 3829 1015 1 9.847e-06 | | 3166 smoker 0.293 0.293 Treated On support 1 3657 4342 3271 1 .00026615 | | 3147 smoker 0.157 0.157 Treated On support 1 2595 3992 2153 1 .00022986 | | 2580 smoker 0.433 0.433 Treated On support 1 3459 4546 3651 1 .0004093 | | 2693 smoker 0.375 0.375 Treated On support 1 3345 4468 3536 1 .00032921 | +------------------------------------------------------------------------------------------------------------------+

```
restore
```

*_id*

The variable *_id* a new identifier created for all observations.
The values are row numbers.

*_nn*

The variable *_nn* counts the number of neighbors found.
Note variable *_nn* can be zero when no neighbors are found.

*_n#*

The variable *_n1* is the *_id* for the first neighbor etc.
Note variable *_n#* can be missing when no neighbors are found.

*_treated*

This is a copy of the treatment variable

```
compare _treated mbsmoke
```

---------- Difference ---------- Count Minimum Average Maximum ------------------------------------------------------------------------ _treated=mbsmoke 4642 ---------- Jointly defined 4642 0 0 0 ---------- Total 4642

*_support*

The non-treated are always having *_support* equal to one.

Depending on choices in eg options **caliper** and **trim** the treated can be
in common support (*_support* equal one) or not (*_support* equal zero).

Note that *_support* also is zero if no match is found or already taken
(*_n1* is missing and *_nn* is zero).

```
list _* if _treated & !_nn, noobs
```

+----------------------------------------------------------------------------------+ | _pscore _treated _support _weight _bweight _id _n1 _nn _pdif | |----------------------------------------------------------------------------------| | 0.577 Treated Off support . . 4627 . 0 . | | 0.577 Treated Off support . . 4625 . 0 . | | 0.577 Treated Off support . . 4626 . 0 . | | 0.658 Treated Off support . . 4635 . 0 . | | 0.648 Treated Off support . . 4634 . 0 . | +----------------------------------------------------------------------------------+

*_pscore*

This is the same as the variable containing the propensity score.

```
compare _pscore p
```

---------- Difference ---------- Count Minimum Average Maximum ------------------------------------------------------------------------ _pscore=p 4642 ---------- Jointly defined 4642 0 0 0 ---------- Total 4642

*_bweight*

When there is only one matched value it is given in the variable _outcome,
here *_bweight*.
To see how *_bweight* is the matched value of *bweight*, the rows must be sorted
*_id*. Then the matched value can be found using the variable *_n1*.

```
sort _id
generate matched_bweight = bweight[_n1]
compare matched_bweight _bweight
```

---------- Difference ---------- Count Minimum Average Maximum ------------------------------------------------------------------------ matched~t=_bweight 859 ---------- Jointly defined 859 0 0 0 Jointly missing 3783 ---------- Total 4642

```
drop matched_bweight
```

*_weight*

The treated always get value one if the variable *_support* is one, ie that the
row is in the common support.
When there is no replacement, the matched non-treated row also get the value 1.
If the matching is with replacement, the value of *_weight* is the number of
times the non-treated value is matched to a treated value.

The weights sums up to the same number (the number of treated on support) for non-treated and treated.

```
sumat _weight, statistics(sum) rowby(_treated) decimals(0)
```

------------------------- sum ------------------------- Treatment(Untreated) 859 Treatment(Treated) 859 -------------------------

*_pdif*

The variable *_pdif* is the absolute difference between the propensity scores
for the treated and non-treated.

```
sort _id
generate double tmpdif = abs(_pscore - _pscore[_n1])
compare tmpdif _pdif
```

---------- Difference ---------- Count Minimum Average Maximum ------------------------------------------------------------------------ tmpdif=_pdif 859 ---------- Jointly defined 859 0 0 0 Jointly missing 3783 ---------- Total 4642

```
drop tmpdif
```

### Reproducing the estimates and test

The ATT can be reproduced using the command -ttest- (compare with ATT estimates from -psmatch2- above)

```
ttest bweight == _bweight if _nn, unpaired
```

Two-sample t test with equal variances ------------------------------------------------------------------------------ Variable | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- bweight | 859 3136.603 19.15138 561.302 3099.014 3174.192 _bweight | 859 3381.929 19.506 571.6957 3343.644 3420.214 ---------+-------------------------------------------------------------------- Combined | 1,718 3259.266 13.98103 579.4963 3231.844 3286.688 ---------+-------------------------------------------------------------------- diff | -245.326 27.33605 -298.9414 -191.7105 ------------------------------------------------------------------------------ diff = mean(bweight) - mean(_bweight) t = -8.9744 H0: diff = 0 Degrees of freedom = 1716 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000

or regression with frequency weights (compare with ATT estimates from -psmatch2- above).

```
regress bweight i._treated [aw=_weight], noheader
```

(sum of wgt is 1,718) ------------------------------------------------------------------------------ bweight | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- _treated | Treated | -245.326 27.336 -8.97 0.00 -298.941 -191.710 _cons | 3381.929 19.330 174.96 0.00 3344.017 3419.841 ------------------------------------------------------------------------------

### The variable *_weight* from -psmatch2-, the one to many case

The data is (again)

```
use http://www.stata-press.com/data/r13/cattaneo2, clear
```

The scores are found the same way.

```
logit mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace
predict p
```

And -psmatch2- is called similar as above now requring 5 neighbors and trimming 5 percent in each end.

```
summarize p
display `=0.25*r(sd)'
```

.0295236

```
psmatch2 mbsmoke, outcome(bweight) pscore(p) neighbor(5) caliper(`=0.25*r(sd)') logit
```

---------------------------------------------------------------------------------------- Variable Sample | Treated Controls Difference S.E. T-stat ----------------------------+----------------------------------------------------------- bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83 ATT | 3137.65972 3372.793 -235.133275 24.657814 -9.54 ----------------------------+----------------------------------------------------------- Note: S.E. does not take into account that the propensity score is estimated. | psmatch2: psmatch2: | Common Treatment | support assignment | On suppor | Total -----------+-----------+---------- Untreated | 3,778 | 3,778 Treated | 864 | 864 -----------+-----------+---------- Total | 4,642 | 4,642

```
label variable _treated "Treatment"
keep bweight mbsmoke p _*
format p _pscore %6.3f
```

The sum of *_weight* is (again) the same for non-treated and treated, the number
of treated in common support.

```
sumat _weight, statistics(sum) rowby(_treated) decimals(0)
```

------------------------- sum ------------------------- Treatment(Untreated) 864 Treatment(Treated) 864 -------------------------

To understand how the weights are calculated we look at the last of the treated.

```
sort _treated _id
local treated = _id[_N]
```

Then all the matched id's for the last treated is found and saved in the local
*untreat*. A variable *slct* is generated to mark the rows where the matched
id's appear.

```
local untreat ""
generate slct = .
forvalues val = 1/5 {
local untreated`val' = _n`val'[_N]
if `untreated`val'' < . {
local untreat "`untreat', `untreated`val''"
replace slct = inlist(`untreated`val'', _n1, _n2, _n3, _n4, _n5)
}
}
```

We see that the last treated with id 4642 is matched with the non-treated id's (3777, 3776, 3775).

`list bweight _* if inlist(_id, 4642 , 3777, 3776, 3775), noobs`

+-----------------------------------------------------------------------------------------------------------------------------+ | bweight _pscore _treated _support _weight _bweight _id _n1 _n2 _n3 _n4 _n5 _nn _pdif | |-----------------------------------------------------------------------------------------------------------------------------| | 3204 0.708 Untreated On support .73333333 . 3775 . . . . . 0 . | | 3289 0.719 Untreated On support .53333333 . 3776 . . . . . 0 . | | 3740 0.720 Untreated On support .33333333 . 3777 . . . . . 0 . | | 2350 0.737 Treated On support 1 3411 4642 3777 3776 3775 . . 3 .01672489 | +-----------------------------------------------------------------------------------------------------------------------------+

The value of *_bweight* for the treated id is the mean of the non-treated values.

`summarize bweight if inlist(_id , 3777, 3776, 3775)`

Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- bweight | 3 3411 288.0746 3204 3740

The *_weight* values for the non-treated are:

```
mkmat _weight if inlist(_id `untreat'), rownames(_id )
matrix roweq _weight = "non-treated id's"
```

```
_weight
```

------------------ ------ --------- non-treated id's 3775 0.7333

```
3776 0.5333
3777 0.3333
```

To understand the weights for the non-treated one must look where they are matched.

```
list bweight _* if slct, noobs
```

+----------------------------------------------------------------------------------------------------------------------------+ | bweight _pscore _treated _support _weight _bweight _id _n1 _n2 _n3 _n4 _n5 _nn _pdif | |----------------------------------------------------------------------------------------------------------------------------| | 3280 0.704 Treated On support 1 3476.4 4640 3773 3774 3772 3775 3771 5 .00034734 | | 2722 0.706 Treated On support 1 3396.2 4641 3775 3774 3773 3772 3776 5 .00193231 | | 2350 0.737 Treated On support 1 3411 4642 3777 3776 3775 . . 3 .01672489 | +----------------------------------------------------------------------------------------------------------------------------+

Eg, the non-treated with id 3775 appear in the means for the treated with id's 4640 (n = 5), 4641 (n = 5) and 4642 (n = 3).

So the weight for the non-treated with id 3775 is:

$$\frac{1}{5} + \frac{1}{5} + \frac{1}{3} = \frac{11}{15} = 0.7333$$

The weight for the non-treated with id 3776 is:

$$\frac{1}{5} + \frac{1}{3} = \frac{8}{15} = 0.5333$$

Finally, the weight for the non-treated with id 3777 is:

$$\frac{1}{3} = 0.3333$$

Quite simple.

### Reproducing the estimates and test

Again, the ATT can be reproduced using the command -ttest- (compare with ATT estimates from -psmatch2- above)

```
ttest bweight == _bweight if _nn, unpaired
```

Two-sample t test with equal variances ------------------------------------------------------------------------------ Variable | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- bweight | 864 3137.66 19.08197 560.8931 3100.207 3175.112 _bweight | 864 3372.793 9.298175 273.3094 3354.543 3391.043 ---------+-------------------------------------------------------------------- Combined | 1,728 3255.226 10.98101 456.472 3233.689 3276.764 ---------+-------------------------------------------------------------------- diff | -235.1333 21.22681 -276.7663 -193.5003 ------------------------------------------------------------------------------ diff = mean(bweight) - mean(_bweight) t = -11.0772 H0: diff = 0 Degrees of freedom = 1726 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000

or regression with frequency weights (compare with ATT estimates from -psmatch2- above).

```
regress bweight i._treated [aw=_weight], noheader
```

(sum of wgt is 1,728) ------------------------------------------------------------------------------ bweight | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- _treated | Treated | -235.133 21.242 -11.07 0.00 -276.785 -193.482 _cons | 3372.793 15.021 224.54 0.00 3343.341 3402.245 ------------------------------------------------------------------------------

Last update: 2022-04-22, Stata version 17