Welcome to StataHacks

Hacking is about writing computer programs for fun. It is also about getting things done, sometimes elegantly, sometimes not.

This homepage has two purposes:

  1. To be a working proof of concept for the literate programming tools for Stata that I have developed
  2. To be a platform for demonstrating the use of different commands and concepts in Stata

Most parts of this homepage are built from logs of Stata code files (do files) containing Stata code and comments written in markdown.

The idea is to make it simple to integrate Stata code with commentary text.

The text could be written in markdown, LaTeX, HTML, or something else and then sent on for further processing. For flexibility I prefer markdown.
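As a minimal sketch of the idea, a do file mixing commentary and code might look like the one below. The `/***` and `***/` markers around the markdown blocks are an assumption made for illustration and not necessarily the exact convention of the tools used on this site; Stata itself simply treats them as ordinary block comments. The example uses the auto dataset that ships with Stata.

```stata
// Assumption: markdown text lives in block comments opened by /*** and
// closed by ***/. Stata ignores these when the do file is run.

/***
# Mileage in the auto data

The table below shows mean mileage for domestic and foreign cars.
***/
sysuse auto, clear
tabulate foreign, summarize(mpg)

/***
Next we look at how mileage varies with the weight of the car.
***/
regress mpg weight
```

Running the do file executes only the Stata commands, while a separate processing step can turn the log and the comment blocks into a markdown document.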

It will be possible to download these do files for home study. The links are at the end of each chapter.

The tools used to build this content are:

Use the menu at the top to see my Stata Hacks.

Site plan

This site has the following plan:

  1. Home: This page!
  2. Links: Links to interesting homepages on Stata
  3. Stata Hacks: Things that are good to know about Stata and the use of Stata
  4. My commands: Over time I have produced a set of Stata commands. Extensive documentation of them can be found here

Reproducible research and literate programming

Everyone who does reproducible research has to keep track of the commands that lead to a graph or a table and, before that, to the dataset needed for the work.

So everyone doing reproducible research is in fact a programmer, using one or more programming tools to produce whatever is needed to understand the research question at hand.

Being a programmer, the reproducible researcher can learn from experience in the programming world.

One experience is that code has to be documented. But if the documentation is kept separate from the code, the two quite often drift out of sync after a while.

To remedy this, the idea emerged that code and documentation could be combined into one document, which is used both for running the code again and for extracting the documentation into several formats at any time.

This is known as literate programming, formulated by Knuth in 1984. It is implemented in, for example, Python, where I first met the concept.

In the end, a product like an article, a homepage, or a slide presentation is the ultimate documentation of the research process. So this document, following Knuth, should contain all the code (only some of whose results are presented), all comments (some of them hidden in the product), and the text.

The goal of the reproducible research process is to produce some refined information, combining elements like text, tables, formulas and graphs into a presentation like an article, a slideshow, a homepage, a book or similar. What is noteworthy is that this process is interactive.

Also noteworthy is that the product of the research is an extract from a far larger set of analyses. Some analyses are presented, and others should be kept to document the choices made in the presentation.

The information in the part left out of the extract is as important as the information shown, since it keeps track of interesting questions, usually with a not very important answer.

But after a while it is forgotten what was done and what wasn't.

So a reproducible researcher has to:

The second part is code and text important to the matter at hand. It should be easy to re-include parts of it if necessary. And of course it should be just as easy to take out other text and code parts.

To see more on literate programming.

About me

Since January 2014 I have been working as a statistician at the Department of Public Health at the University of Aarhus.

There, Stata is used as the primary statistical tool.

When I see a need for new commands in Stata I like to build them.

My Stata commands are mainly written in Mata, which is a programming language incorporated into Stata.

To see more about me try: