A note on Gregorian and ISO8601 date and week calculations in Mata

Using data from eg Statistics Denmark (DST) sometimes time is measured in weeks. This will be week numbers using the ISO8601 standard and not the Gregorian as is standard in Stata and Mata.

The Gregorian calender

The base for all calculations are that a year is approximtely 365.2425 days. In fractions this is $$365.2425 = 365 + \frac{1}{4} - \frac{1}{100} + \frac{1}{400}$$. To handle that a year is not whole number of days some year are 365 days long and some are 366 days long (leap years).

Day calculations in Stata and Mata

In Stata and Mata dates are integer number of days since 1960-01-01 (is base zero). Stata facilitates Gregorian calendar calculations.

To convert a date into a number of days Mata has functions

Examples are

mata: mdy(3,12,1960) // = 71

  71
mata: doy(mdy(3,12,1960)) // = 72

  72
mata: dow(mdy(3,12,1960)) // = 6

  6

These functions are scalar functions. It would be far better if these functions were vector functions. Also, it would e.g. to have years as a vector argument and locking the month and day to a specific date each year.

Hence, vector functions in Mata are introduced having either vector or scalar arguments.

The first converts year(s), month(s), and day(s) into Stata dates.

real colvector ymd2td(real colvector y, m, d)
{
        real scalar r, R
        real colvector td

        R = max((rows(y), rows(m), rows(d)))
        y =  rows(y) == 1  ? J(R, 1, y) : y
        m =  rows(m) == 1  ? J(R, 1, m) : m
        d =  rows(d) == 1  ? J(R, 1, d) : d
        if ( (R=rows(y)) != rows(m) | R != rows(d) ) _error("Input vectors must have same length")
        td = J(R,1,.)
        for(r=1;r<=R;r++) td[r] = mdy(m[r],d[r],y[r])
        return(td)      
}

The second simply vectorises Stata date(s) to day(s) of year.

real colvector td2doy(real colvector td)
{
        real scalar r, R
        real colvector out

        R = rows(td)
        out = J(R,1,.)
        for(r=1;r<=R;r++) out[r] = doy(td[r])
        return(out)
}

The third converts year(s) and day(s) of year into Stata date(s).

real colvector y_doy2td(real colvector y, doy)
{
        real scalar R

        R = max((rows(y), rows(doy)))
        y =  rows(y) == 1  ? J(R, 1, y) : y
        doy =  rows(doy) == 1  ? J(R, 1, doy) : doy
        if ( rows(y) != rows(doy) ) _error("Input vectors must have same length")
        return(year2days(y) :- year2days(1960) :+ doy)
}

Now it is easy to build vectorised function converting year(s), month(s), and day(s) into to day(s) of year.

real colvector ymd2doy(real colvector y, m, d)
{
        return(td2doy(ymd2td(y, m, d)))
}

With these extended functions derivement of vectorised calendar calculations can begin.

Leap years (Gregorian)

The first thing is to identify a leap year. Based on the fractions above this is quite simple: If a year is dividable with 4 and not with 100 or dividable with 400 then the year is a leap year.

real colvector is_leap_year(real colvector y)
{
        return( (!mod(y, 4) :& mod(y, 100)) :| !mod(y, 400) )
}

The number of days in a given year can found be the function below.

real colvector daysinyear(real colvector y)
{
        return(365 :+ is_leap_year(y))
}

To get the number of days from year 0 until a given year (y).

real colvector year2days(real colvector y)
{
        return(365*y + floor(0.25*y) - floor(0.01*y) + floor(0.0025*y))
}

Note that this function is based on the assumption that the Gregorian calendar has been used always (which it hasn't). Dates before 1582-10-15 should be handled with caution. The floor fractions in the function year2days calculates the number of leap years until the given year.

And given a number of days since Januar the 1st. year 0 the number of years can be found by:

real colvector days2year(real colvector days)
{
        return(round(days / 365.2425))
}

ISO8601 calculations

ISO dates can be presented in Stata/Mata by:

string colvector str_isodate(real colvector td)
{
        return(strofreal(td, "%tdCCYY-NN-DD"))
}

As can seen date and time values are ordered from the largest to smallest unit of time: year, month (or week), day, hour, minute, second, and fraction of second ISO8601.

The first day of week in ISO8601 is monday. In Stata/Mata first day of week is sunday. This is remidied by the function below.

real colvector iso_td_dow(real colvector td)
{
        real scalar r, R
        real colvector dw

        R = rows(td)
        dw = J(R,1,.) 
        for(r=1;r<=R;r++) dw[r] = dow(td[r])
        return(dw + 7 :* (dw :== 0))    
}

Given year, month and day of a date the day of week can be found by the function.

real colvector iso_ymd_dow(real colvector y, m, d)
{
        return(iso_td_dow(ymd2td(y,m,d)))       
}

An ISO calendar year is long if and only if the corresponding Gregorian year either begins or ends (or both) on a Thursday ISO calendar.

real colvector iso_long_year(real colvector y)
{
        return(iso_ymd_dow(y,1,1) :== 4 :| (is_leap_year(y) :& iso_ymd_dow(y,1,1) :== 3))       
}

The formula from Wikipedia on week numbering is used in the function below to calculate the week number.

real colvector isoweeknbr(real colvector y, m, d) 
{ 
        real colvector td, wnbr

        td = ymd2td(y, m, d)
        wnbr = floor((td2doy(td) - iso_td_dow(td) :+ 10) :/ 7)
        wnbr = (slct = wnbr :< 1) :* (52 :+ iso_long_year(y :- 1)) + !slct :* wnbr
        wnbr = (slct = (wnbr :> 52 :+ iso_long_year(y))) + !slct :* wnbr
        return(wnbr)
}

By definition the yyyy-01-04 is always in week one and has ordinal day 4. Since ISO weekdays has base 1, 5 - iso_ymd_dow(y, 1, 4) is the first monday in a year.

The following function returns monday of week as day of year for a year and a week number.

real colvector iso_mow_doy(real colvector y, w)
{
        real scalar R

        R = max((rows(y), rows(w)))
        y =  rows(y) == 1  ? J(R, 1, y) : y
        w =  rows(w) == 1  ? J(R, 1, w) : w
        if ( (rows(y)) != rows(w) ) _error("Input vectors must have same length")
        return(5 :- iso_ymd_dow(y, 1, 4) + 7 :* (w :- 1) :- is_leap_year(y))
}

It is better to get a Stata date, hence the function below.

real colvector iso_mow_td(real colvector y, w)
{
        return(y_doy2td(y, iso_mow_doy(y, w)))
}

Validation

In ISO8601 the 4th. January is always in the first week of the year, and the 28th. December is always in the last week.

So the number of weeks between week numbers for these two dates must be either 51 or 52. The later when the year has 53 weeks (iso_long_year == 1).

Looking at the years from 1583 up until 2200.

mata: y = 1583::2200

The years where the difference in weeks is not 51 is selected.

mata: slct = (isoweeknbr(y,12,28) - isoweeknbr(y,1,4) - iso_long_year(y)) :!= 51

Listing all years not satisfying the selection gives.

mata: select((y, isoweeknbr(y,1,4), isoweeknbr(y,12,28), iso_long_year(y)), slct)

No years found!

The number of days between last monday of previous year

mata: d1= iso_mow_td(y:-1, 52:+iso_long_year(y:-1))

and first monday of current year

mata: d2 = iso_mow_td(y, 1)

should be 7.

The years where this is not the case is selected.

mata: slct = (diff = d2 - d1) :!= 7
mata: select((str_isodate(d1), str_isodate(d2), strofreal((y, diff, iso_long_year(y:-1), iso_mow_doy(y:-1, 52:+iso_long_year(y:-1)), iso_mow_doy(y, 1)))), slct)

No years found!


The do file for this document

Last update: 2019-03-01, Stata version 15.1