Marketflow is a collection of tools for reading and processing financial market data, which often comes in difficult-to-work-with formats. See below for an overview of the data sources currently supported. The typical user is expected to work primarily in Python, but other runtimes can be used as well; for example, we have some experiments with Spark, though we have not yet included them here. If you would like to help build a community around open-source econometrics and related work, please consider contributing your code!

This library was initially developed in a BIDS collaborative project with the D-Lab’s director, Justin McCrary, led by Dav Clark (a member of both D-Lab and BIDS). It is the work of many contributors, all of whom are currently attributed as authors.


The Trade and Quote (TAQ) dataset from the NYSE comes in heavily compressed zip archives of fixed-width data. The column formats have changed over the years, though we currently believe that files of a given width will have the same format (essentially, the format has only ever been enlarged to include more columns). The documentation for these formats is usually hidden behind a paywall, but can likely be obtained from wherever you are getting the data themselves.
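Because the format appears to be determined by the record width, a first step when handling an unfamiliar archive is to measure that width. The following is a minimal sketch using only the standard library, not marketflow's own code. The function name `record_width` and its `skip_lines` parameter are hypothetical, and it assumes that each archive holds a single data file and that records are newline-terminated.

```python
import io
import zipfile


def record_width(archive, skip_lines: int = 0) -> int:
    """Return the byte length of one record line, including its terminator.

    `archive` is a path or a binary file-like object holding a zip archive.
    `skip_lines` (hypothetical) skips leading lines such as a header.
    """
    with zipfile.ZipFile(archive) as zf:
        member = zf.namelist()[0]  # assumption: one data file per archive
        with zf.open(member) as fh:
            for _ in range(skip_lines):
                fh.readline()
            return len(fh.readline())


# Build a tiny in-memory archive with made-up content to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", b"HEADER\r\n" + b"093000AAPL00012345\r\n" * 2)
buffer.seek(0)

width = record_width(buffer, skip_lines=1)
print(width)
```

The measured width could then be used as a key into a table of known column layouts, taken from the format documentation you have access to.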

We have optimized a pathway that maps bytes from the file into a numpy structured array, with some degree of control over conversion to binary types. We feel that our fixed-width parser is of high quality and should likely be split out into a separate library!
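To illustrate the general idea of mapping fixed-width bytes onto a numpy structured array, here is a small sketch. It is not the marketflow parser: the field names, widths, and the two-implied-decimals price convention are invented for the example and do not describe the real TAQ layout.

```python
import numpy as np

# Hypothetical fixed-width layout (NOT the real TAQ format): every field is
# first read as raw ASCII bytes of a known width.
TEXT_DTYPE = np.dtype([
    ("time", "S6"),     # HHMMSS as ASCII digits
    ("symbol", "S4"),   # space-padded ticker
    ("price", "S8"),    # ASCII digits, two implied decimal places
    ("newline", "S1"),  # line terminator
])

# Target layout after converting selected fields to binary types.
BINARY_DTYPE = np.dtype([
    ("time", np.int32),
    ("symbol", "S4"),
    ("price", np.float64),
])


def parse_fixed_width(raw: bytes) -> np.ndarray:
    """View `raw` as fixed-width records, then convert fields to binary types."""
    # frombuffer reinterprets the bytes in place; no per-line splitting needed.
    text = np.frombuffer(raw, dtype=TEXT_DTYPE)
    out = np.empty(text.shape, dtype=BINARY_DTYPE)
    out["time"] = text["time"].astype(np.int32)
    out["symbol"] = np.char.strip(text["symbol"])  # drop space padding
    out["price"] = text["price"].astype(np.int64) / 100
    return out


sample = b"093000AAPL00012345\n093001IBM 00006789\n"
parsed = parse_fixed_width(sample)
print(parsed)
```

Reading every field as bytes first and converting only the columns you need is one way to get the "degree of control over conversion" described above; fields you never convert cost nothing beyond the initial view.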

The scripts and examples described below can be used as part of your analysis, or as informal documentation of how the library can be used.

Utility scripts

There are a handful of scripts installed along with this package:

Runs the taq2h5 function from marketflow.hdf5, converting zipped fixed-width TAQ BBO files to HDF5 with PyTables. Invoke with -h or --help for more info.
Runs the main function from marketflow.ITCHbin, exporting the variable-length binary records from binary ITCH data files.
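For readers unfamiliar with variable-length binary records, the sketch below shows one common way to iterate over them. It is not the code in marketflow.ITCHbin. It assumes each record is preceded by a two-byte big-endian length prefix, and the function name `iter_records` is hypothetical; check the specification for your ITCH files before relying on this framing.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the payload of each length-prefixed record in `stream`.

    Assumption: every record starts with a 2-byte big-endian length.
    """
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of file
        if len(header) < 2:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield payload


# Two made-up records: a 3-byte payload followed by a 1-byte payload.
sample = struct.pack(">H", 3) + b"Sab" + struct.pack(">H", 1) + b"X"
records = list(iter_records(io.BytesIO(sample)))
print(records)
```

Using a generator keeps memory use flat even for large files, since only one record is held at a time.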

Developer utility scripts

Scripts that are primarily intended for development are not installed by default, to avoid cluttering the user's PATH. Currently, there is one such script, generate_test_data.py, which creates a small HDF5 file suitable for quick testing and distribution.

Jupyter Python Notebooks

We have some examples of code in the tests/ directory in the marketflow repo. These can be browsed directly on GitHub.

