HDF5 for Python

About the project

The h5py package is a Pythonic interface to the HDF5 binary data format.

It lets you store huge amounts of numerical data and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want.
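Here is a minimal sketch of those ideas. It assumes h5py and NumPy are installed, and the file path and the names `measurements`, `run1`, `temperature`, and `units` are hypothetical. The dataset is tiny rather than multi-terabyte, but slicing behaves the same way: h5py reads just the selected region instead of loading the whole dataset.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # Store a 10x10 array of floats as a dataset named "measurements".
    dset = f.create_dataset("measurements", data=np.arange(100.0).reshape(10, 10))
    # Attributes attach small pieces of metadata ("tags") to a dataset.
    dset.attrs["units"] = "volts"
    # Groups work like folders for organizing many datasets in one file.
    run = f.create_group("run1")
    run.create_dataset("temperature", data=np.linspace(20.0, 25.0, 6))

with h5py.File(path, "r") as f:
    # NumPy-style slicing reads only the selected 2x3 block from disk.
    block = f["measurements"][2:4, 5:8]
    units = f["measurements"].attrs["units"]
    # Paths with "/" reach into groups; [...] reads the full dataset.
    temps = f["run1/temperature"][...]
```

The slice result is an ordinary NumPy array, so it remains usable after the file is closed.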

H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy array syntax. For example, you can iterate over datasets in a file, or check out the .shape or .dtype attributes of datasets. You don't need to know anything special about HDF5 to get started.
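The dictionary-style interface can be sketched as follows, assuming h5py 3.x. The file path and the dataset names `images` and `labels` are hypothetical. Assigning an array to a key creates a dataset, iterating over a file yields member names, and each dataset exposes `.shape` and `.dtype` like a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")  # hypothetical file

with h5py.File(path, "w") as f:
    # Dictionary-style assignment creates a dataset from a NumPy array.
    f["images"] = np.zeros((3, 64, 64), dtype="uint8")
    f["labels"] = np.arange(3, dtype="int64")

with h5py.File(path, "r") as f:
    # Iterating over a file (or group) yields member names, like dict keys.
    summary = {name: (f[name].shape, f[name].dtype) for name in f}
    present = "labels" in f  # membership test, as with a dict

for name, (shape, dtype) in sorted(summary.items()):
    print(name, shape, dtype)
```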

In addition to the easy-to-use high-level interface, h5py rests on an object-oriented Cython wrapping of the HDF5 C API. Almost anything you can do from C in HDF5, you can do from h5py.
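A hedged sketch of dropping down to that low-level layer, assuming h5py 3.x. The `h5py.h5f`, `h5py.h5d`, and `h5py.h5s` modules wrap the corresponding HDF5 C routines (`H5Fopen`, `H5Dopen`, `H5Dread`, and so on). The file path and the dataset name `grid` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "lowlevel.h5")  # hypothetical file

# Create the file with the high-level interface first.
with h5py.File(path, "w") as f:
    f["grid"] = np.arange(24.0).reshape(4, 6)

# Low-level calls: file and object names are passed as bytes.
fid = h5py.h5f.open(path.encode(), h5py.h5f.ACC_RDONLY)  # wraps H5Fopen
dsid = h5py.h5d.open(fid, b"grid")                       # wraps H5Dopen
dims = dsid.get_space().get_simple_extent_dims()         # wraps H5Sget_simple_extent_dims

# Read the whole dataset into a preallocated array (wraps H5Dread);
# h5s.ALL selects the entire memory and file dataspaces.
out = np.empty(dims, dtype=dsid.dtype)
dsid.read(h5py.h5s.ALL, h5py.h5s.ALL, out)

# Low-level identifiers are released when their Python objects are garbage-collected.
```

In everyday code the high-level `File` and `Dataset` objects are usually more convenient; the low-level identifiers are there for features the high-level interface does not expose.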

Best of all, the files you create are in a widely-used standard binary format, which you can exchange with other people, including those who use programs like IDL and MATLAB.

Stable Downloads

All downloads are now available at the Python Package Index (PyPI).

Check out the install guide.

Development

All development for h5py takes place on GitHub. Before sending a pull request, please ping the mailing list at Google Groups.

Documentation

The h5py user manual is a great place to start; you may also want to check out the FAQ.

There's an O'Reilly book, Python and HDF5, written by the lead author of h5py, Andrew Collette.

General questions are always welcome on the mailing list.