
Monday, July 19, 2021

Time Travel with py datatable 1.0

The R package data.table has become a tool of choice for working with big tabular data thanks to its versatility and performance. Its Python counterpart, py datatable, follows its R cousin in performance and is steadily catching up in functionality. A notable omission, temporal data types, was addressed in version 1.0 with two new types:

  • datatable.Type.date32 to represent and store a particular calendar date without a time component and
  • datatable.Type.time64 to store a specific moment in time (i.e. a date with a time component)

and the datatable.time family of functions: https://datatable.readthedocs.io/en/latest/api/time.html

Let's have a brief overview of how to use them.

datatable.Type.date32

This type represents a calendar date without a time component. Internally it stores the date as a 32-bit signed integer counting the number of days since (positive) or before (negative) the epoch (1970-01-01). Thus, this type includes dates within the range of approximately ±5.8 million years, which places the oldest storable date in the Late Miocene Epoch and the latest one in the year 5,879,610 of the 58797th century, a future unknown even to science fiction.
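The storage model described above can be sketched with nothing but the Python standard library. The snippet below does not call datatable: `EPOCH` and `days_since_epoch` are hypothetical helpers, and the 365.2425-day average Gregorian year is an assumption used only to estimate the range.

```python
from datetime import date

# The epoch that a date32-style counter starts from, per the description above.
EPOCH = date(1970, 1, 1)

def days_since_epoch(d: date) -> int:
    """Return the signed day count that a date32-style integer would hold."""
    return (d - EPOCH).days

epoch_days = days_since_epoch(date(1970, 1, 1))     # the epoch itself
after_days = days_since_epoch(date(2021, 7, 19))    # a date after the epoch
before_days = days_since_epoch(date(1969, 12, 31))  # a date before the epoch

# Rough span of a signed 32-bit day counter, in average Gregorian years.
span_years = 2**31 / 365.2425

print(epoch_days, after_days, before_days)  # 0 18827 -1
print(f"about +/-{span_years / 1e6:.2f} million years")
```

Dates before the epoch simply produce negative counts, which is why a signed integer is needed.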



 




 

There are various ways to create a date32 column in a datatable frame:







 

or







 

Remember to use the ISO 8601 format (YYYY-MM-DD) when representing dates as strings; otherwise parsing fails silently.
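Because a silent failure is easy to miss, one option is to validate date strings before loading them. The sketch below uses only the standard library, not datatable: `non_iso_dates` is a hypothetical helper, and it assumes that the plain YYYY-MM-DD layout is what the data should contain.

```python
import re
from datetime import date

# Strict YYYY-MM-DD layout: four digits, dash, two digits, dash, two digits.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def non_iso_dates(values):
    """Return the strings that are not valid YYYY-MM-DD calendar dates."""
    rejected = []
    for text in values:
        if not ISO_DATE.fullmatch(text):
            rejected.append(text)     # wrong layout, e.g. US-style slashes
            continue
        try:
            date.fromisoformat(text)  # right layout; now check the calendar
        except ValueError:
            rejected.append(text)     # e.g. February 30th
    return rejected

raw = ["2021-07-19", "07/19/2021", "2021-02-30", "19.07.2021"]
bad = non_iso_dates(raw)
print(bad)  # ['07/19/2021', '2021-02-30', '19.07.2021']
```

An empty result means every string has the expected layout and names a real calendar date.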



 

 

 

 

 

 

If a frame already contains dates as strings, a combination of three functions suffices to parse them and create the corresponding date32 values entirely within the datatable API: the string slicer datatable.str.slice() (to extract the date elements as substrings), the cast function datatable.as_type() (to convert str to int) and the constructor function datatable.time.ymd() (to create the date32 type).
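The three steps named above (slice out the parts, cast them to integers, construct the date) are easier to follow on a single value. The sketch below mirrors them in plain Python and does not call datatable: `parse_ymd` is a hypothetical helper, and the slice positions assume YYYY-MM-DD strings. In datatable the functions named above apply the same idea to whole columns.

```python
from datetime import date

def parse_ymd(text: str) -> date:
    """Build a date from a YYYY-MM-DD string by slicing, casting, constructing."""
    year = int(text[0:4])    # slice the year characters, cast str -> int
    month = int(text[5:7])   # skip the dash at index 4
    day = int(text[8:10])    # skip the dash at index 7
    return date(year, month, day)

strings = ["2021-07-19", "1970-01-01", "1999-12-31"]
dates = [parse_ymd(s) for s in strings]
print(dates[0].isoformat())  # 2021-07-19
```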








datatable.Type.time64

This type represents a specific moment in time and is stored internally as a 64-bit integer containing the number of nanoseconds since the epoch (1970-01-01) in UTC.
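As with date32, the representation can be sketched in plain Python without calling datatable. `EPOCH` and `nanos_since_epoch` are hypothetical helpers, and the 365.2425-day average year is an assumption used only to estimate how many years a signed 64-bit nanosecond counter can span.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def nanos_since_epoch(moment: datetime) -> int:
    """Return the signed nanosecond count that a time64-style integer would hold."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    # Python datetimes only resolve microseconds, so the last three digits are 0.
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000

noon = datetime(2021, 7, 19, 12, 0, 0, tzinfo=timezone.utc)
noon_ns = nanos_since_epoch(noon)
print(noon_ns)  # 1626696000000000000

# Rough span of a signed 64-bit nanosecond counter, in average Gregorian years.
span_years = 2**63 / 1e9 / 86_400 / 365.2425
print(f"about +/-{span_years:.0f} years around 1970")
```

The trade-off against date32 is visible in the numbers: nanosecond resolution narrows the representable range from millions of years to a few centuries.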


 

 

 

 

A time64 column can be created in the same fashion as a date32 column above.


 

 

 

 

Again, time strings should use the ISO 8601 format. To create a time from its components, use the constructor function datatable.time.ymdt().
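What assembling a moment from its components amounts to can be sketched in plain Python. The snippet below does not call datatable.time.ymdt(): `components_to_nanos` is a hypothetical helper, its parameter order is an assumption made for this illustration, and the components are assumed to be in UTC.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def components_to_nanos(year, month, day, hour, minute, second, nanosecond=0):
    """Combine date and time parts into nanoseconds since 1970-01-01 (UTC assumed)."""
    days = (date(year, month, day) - EPOCH).days
    seconds = days * 86_400 + hour * 3_600 + minute * 60 + second
    return seconds * 1_000_000_000 + nanosecond

stamp = components_to_nanos(2021, 7, 19, 12, 30, 15, 250)
print(stamp)  # 1626697815000000250
```

Working with an integer count keeps the full nanosecond precision, which Python's own datetime objects cannot represent.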








datatable.time.* Functions

To work effectively with the date32 and time64 types, datatable includes special functions in the datatable.time family:

  • constructors ymd() and ymdt() and
  • date and time part functions: day(), day_of_week(), hour(), minute(), month(), nanosecond(), second(), year()

The constructors were showcased already; the part functions come in handy when filtering data, among other tasks.
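The kind of filter that part functions enable can be illustrated in plain Python. The sketch below does not call datatable; it uses the standard library's date attributes as stand-ins for month() and day_of_week(). The ISO weekday numbering (Monday=1 through Sunday=7) belongs to this sketch and is not a statement about datatable's numbering.

```python
from datetime import date

dates = [
    date(2021, 7, 17),  # Saturday
    date(2021, 7, 18),  # Sunday
    date(2021, 7, 19),  # Monday
    date(2021, 8, 2),   # Monday, but in August
]

# Keep only July dates that fall on a weekday (ISO numbering: Monday=1 ... Sunday=7).
july_weekdays = [d for d in dates if d.month == 7 and d.isoweekday() <= 5]
print(july_weekdays)  # [datetime.date(2021, 7, 19)]
```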


