/*
cd "C:/Users/au24026/Documents/STATA/StataHacks/docs/My commands/basetable"
*/
/***
# What is **-basetable-**
**-basetable-** is a simple yet highly efficient tool for interactively building
the first table required in most medical/epidemiological papers.
The typical layout of these tables is grouping (categorical) variable as column
header and then a set of rows of different variables being compared by each
group in the header and in a total.
The tables can off course also be used for survey like analysis.
When the table building is over, the result can be inserted into one or more
sheets in a excel workbook.
If the labelling of variables and values have been done carefully the table
outputs in the excel workbooks will be almost publication ready.
It is tested in version 12.1 ic, 13.1 ic, 14.2 ic, and 15.1 ic although export
to Excel only works from Stata version 13 and up.
## Syntax
The syntax is: `basetable column_variable [summary_variables] [if] [in] [using] [, options]`
### column_variable
The first argument must be a categorical variable with a value label.
### summary_variables
The rest of the arguments are variable names followed by with suboptions inside
braces.
#### Suboptions
Suboptions are:
* Headers (in square brackets)
* What follows each variable (except the the first) in soft brackets. Variable
suboptions depend on whether they should be presented as catagorical or
continuous variables
##### Header suboptions
Header suboptions are a header text (could be blank) possibly ending with a
hashtag (#) for a subcount separated by a comma (,) optionally a separator text
for the rest of the columns and optionally a local if condition (The subcount
from the hashtag matches the condition)
##### Categorical variable suboptions
Categorical variables has suboptions:
* 0, r, or R in order to report row percentages in the report table,
* 1, c, or C in order to report column percentages in the report table
* A label value to show only the row for the selected value in the report table.
There can be a second suboption separated by a comma: 0, r, R, 1, c, C
as above or a ci for a Wald confidence interval
##### continuous variable suboptions
continuous variables has a numeric format(eg %6.2f) as first argument and
eventually a local report specification (sd, iqr, iqi, ci or pi) separated by a
comma.
### Options
The options for basetable are:
* **l**og: Show the underlying Stata output
* **n**thousands: Add thousands separator to n values
* **pc**tformat: Alter the format used for the percentages for the categorical
summary variables. The argument must be a numeric format
* **pv**format: Alter the format used for the P value. The argument must be a
numeric format
* **c**ontinousreport: Specify overall default continuous report. The values
must be one of sd, iqr, iqi, ci or pi
* **cap**tion(string): Caption for the basetable output
* **to**p(string): String containing text prior to table content. Default is
dependent of the value of the style option
* **u**ndertop(string): String containing text between header and table content.
Default is dependent of the value of the style option.
* **b**ottom(string): String containing text after to table content.
Default is dependent of the value of the style option
* **m**issing: Show missing report to the right of the table
* **sm**all: Specify the limit for being small wrt. hidesmall. Default is 5
* **h**idesmall: Hide data when count values are less than small (default 5).
Note that the number less than "small" sometimes can be deduced from
surrounding values
* **st**yle(string): The output can be shown in the formats: smcl, csv, html,
latex, tex, or md. The default is smcl
* **r**eplace: The styled output can be saved into a file specified by using.
If an existing file should be replaced in the process, replace should be set
* **t**oxl(string): The argument is a string containing 4 values separated by a
comma. Thevalues needed are: 1) path and filename on the excel book to use,
2) the sheet name to use. 3) (Optional) the row number in the sheet to place
the table in. Must always appear with the column number. 4) (Optional) the
column number in the sheet to place the table in. Must always appear with the
row number 5) (Optional) replace - replace/overwrite the content in the sheet
**This option is only for Stata version 13 and up.**
***/
/***/do "basetable_demo_data.do"
/***
# Building reports
**The overall example here is to compare the variables age, race, smoke and ftv
for different values of the variable low.**
In principle there are two sorts of variables: continuous like height, weight,
blood preasure etc. And Categorical being grouping variables like sex/gender,
age groups, race etc. There are different types of reporting for the two groups.
## continuous variables
continuous variables can have up to two arguments separated by comma, **no space
allowed in the argument string**.
First part, the format, is necessary. It is just a standard Stata format that is
specified, typically a fixed one like %6.2f saying that the length including dot
is 6 and there are 2 decimals shown.
Second part is optional, specifying what is to be reported. It is one of the
strings:
* sd - Mean and standard deviation, the default. P-value is from oneway anova
* iqr - Interquartile range. P-value is from the Kruskal-Wallis test
* iqi - Interquartile interval. P-value is from the Kruskal-Wallis test
* ci - Mean and 95% Confindence interval. P-value is from oneway anova
* pi - Mean and 95% Prediction interval. P-value is from oneway anova
The second part if used overrules the overall type of reporting specified by the
option continuousreport demonstrated below.
Here is all the ways that the continuous variable age can be reported:
***/
basetable low age(%6.1f) age(%6.2f,sd) age(%6.0f,iqr) age(%6.0f,iqi) age(%6.1f,ci) age(%6.1f,pi)
/***
Default reporting can be changed by the option continuousreport with one of the
arguments sd (default), iqr, iqi, ci or pi.
***/
basetable low race(%6.0f) age(%6.0f), continuousreport(iqr)
/***
## Categorical variables
Categorical variables are grouping variables like sex, race, smoking etc.
The typical reporting way for categorical variables is counting numbers and
calculating percentages, either raltive to row totals or column totals.
Sometimes only the report of one of the values is needed in the table.
All versions are shown for the variable race below, row percentages
(argument: r), column percentages (argument: c), row percentages for the
value white of race (argument: white), and a 95% confidence Interval
(arguments: white, ci):
***/
basetable low race(r) race(c) race(white) race(white,r) race(white,ci)
/***
In the next 2 subsections a comparison with the **-basetable-** report is
necessary:
***/
basetable low ftv(c)
/***
### Setting the overall percentage format
The overall percentages format can be changed by the option pctformat with a
number format as an argument. Below is the overall percentage format set to be
of length 6 and with 3 decimals:
***/
basetable low ftv(c), pctformat(%6.3f)
/***
### Hide small count values
When working eg at places like Denmark Statistics it is important to anomyze
individuals by not reporting small counts, eg counts less than 5.
This is handled in **-basetable-** by optional options hidesmall (hide small
values) and small (specifying the limit for what is small, default is 5).
Two examples:
***/
basetable low ftv(c) ftv(r), hidesmall
basetable low ftv(c) ftv(r), hidesmall small(8)
/***
## Using if and in
It is now possible to use [if] and [in] in connection with basetable:
***/
basetable low [continuous,***] age(%6.0f,iqr) age(%6.0f,iqi) [***, ***] [Categorical] ftv(c) race(c) if smoke == 1 in 100/180
/***
## Separating large tables into blocks
Sometimes the variables can be grouped, eg into continuous and Categorical
variables.
It is possible to add headers in the variable list like [headedtext, separator]
- headertext in the label column and separator shown in the rest of the columns.
In older versions it was ordinary brackets instead of squared ones that were
used for headers.
**Note that commas are not allowed in headedtext.**
Below is added [continuous, ***] and [Categorical], default separator is blank:
***/
basetable low [continuous, ***] age(%6.0f,iqr) age(%6.0f,iqi) [***,***] [Categorical] ftv(c) race(c) smoke(Yes)
/***
Actually it is also possible to add an local if statement to the header
like eg [Race White, if race == 1] limiting the calculations following to that
condition as well as the global.
A header with no conditions resets the local condition back to the global
condition.
It is demonstrated below:
***/
basetable low [Race All] age(%6.0f,iqi) [Race White #, if race == 1] ///
age(%6.0f,iqi) [Race Black #, if race == 2] age(%6.0f,iqi) ///
[Race Other #, if race == 3] age(%6.0f,iqi) [Race All] age(%6.0f,iqi)
/***
Note that adding a # to the header text adds a count a percentage for the
specified condition.
The tables with local if statements leaves "empty" when needed:
***/
basetable low age(%6.2f, iqi) race(c) [Only low birth of non-blacks #, if low == 1 & race != 2] age(%6.2f, iqi) race(c)
/***
## Adding missing values report
Adding the option missing (or m) adds an extra column with a missing report:
***/
basetable low [continuous,***] age(%6.0f,iqr) age(%6.0f,iqi) [Categorical] ftv(c) race(c) smoke(Yes), missing
/***
## Sending the output for excel workbooks
The table can at any time be send to an Excel workbook with the option and toxl
the arguments: path_and_name_of_the_excelfile, sheetname, rownumber, columnnumber,
replace. The last 3 arguments are optional, but rownumber and columnnumber must
be there both or not at all.
If the workbook exists the sheet is added if the sheet doesn't already exists.
To overwrite a existing worksheet use the argument replace.
Below is shown how to add 2 tables in 2 sheets of a workbook and then overwrite
the first table:
***/
capture rm "tables.xls" // To remove outfile if it exists
basetable low age(%6.2f) race(c) smoke(Yes), toxl(tables.xls, Tabel 1)
/***
Column width can set by adding a set of comma separated numbers in soft
brackets at the end in option toxl ( Default is (70, 20) ):
***/
basetable low age(%6.2f,iqi) race(r) smoke(Yes), toxl(tables.xls, Tabel 2, (40,30, 20))
/***
The last column width is used for columns with no column width set.
Setting replace in option toxl overwrites sheets with existing name.
***/
basetable low age(%6.2f,ci) race(r) smoke(Yes), toxl(tables.xls, Tabel 1, replace)
/***
[You can see the resulting Excel book here](tables.xls)
**Just answer Yes if Excel promts you at opening an Excel book with extension
xls instead of xlsx.**
It is a Stata/Excel bug.
## Handling errors
Errors should be presented where they are seen. In this case it means that
errors are presented as lines int the table.
For example if there are to few observations:
***/
basetable low age(%6.1f) if ftv == 6
/***
If a non existing variable is used for columns:
***/
capture noisily basetable slow
/***
If a string variable is used for columns:
***/
generate str_low = string(low)
capture noisily basetable str_low
/***
If a variable with no value labels is used for columns:
***/
generate low_no_lbl = low
capture noisily basetable low_no_lbl
basetable low low_no_lbl(r)
*label define low_no_lbl 0 "N" 1 "1"
*label values low_no_lbl low_no_lbl
*basetable low low_no_lbl(r)
generate age_no_lbl = age
basetable low age_no_lbl(%6.2f)
/***
If wrong arguments are set for the summary variables:
***/
basetable low age(%6.2d,iqi) age(%6.2f,iqir) race(elf)
/***
If values cannot be calculated a dot (missing) is set in its place (sd is
missing for age when birthweight is low):
***/
basetable low age(%6.1f) if ftv == 4
/***
# Howto's
## Howto remove the total column and/or the P value column in a basetable output
Assume that the column Total has to be ignored in styled output.
***/
basetable low age(%6.0f, iqi) race(c) smoke(Yes, ci), missing nototal nopvalue
/***
## Reporting a set of variables without any column variable
By generating a dummy variable total and using the options *nototal* and
*nopvalue* it is possible to generate a summary table across variables of
different types:
***/
/**/generate total = 1
/**/label variable total "Total"
/**/label define total 1 "Column header"
/**/label values total total
basetable total age(%6.0f, ci) age(%6.0f, iqi) race(c) smoke(Yes) smoke(Yes, ci), missing nototal nopvalue