The first thing I do when I'm dealing with data that has more than, let's say, 10,000 rows is to put it into HDF format and work with that. It saves a ton of time while developing a script. I had a Python script that built a histogram; it took ~15 sec on a file with 100k rows, but after converting the file to HDF first it ran in ~0.5 sec. The import in Python is also much shorter (two lines).
HDF is built for high-performance numerical I/O. It's great: you can query the stored structures and even take slices of arrays right on the command line (with the h5tools).
It's also widely supported by Octave, Python, R, Matlab, and so on. And there's no real drawback, since you can always pipe the data into existing command-line tools via h5dump.
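In case it helps, here's a minimal sketch of what that conversion and the short import can look like in Python, using NumPy and h5py (the file names and the dataset name "values" are made up, and the original script may well have used PyTables instead):

    # Minimal sketch, not the commenter's actual script: convert a plain-text
    # column of numbers to HDF5 once, then load it quickly on every later run.
    # The file names and the dataset name "values" are made up.
    import numpy as np
    import h5py

    # One-off conversion: parse the slow text file, store it as an HDF5 dataset.
    data = np.loadtxt("measurements.txt")        # slow: text parsing
    with h5py.File("measurements.h5", "w") as f:
        f.create_dataset("values", data=data)

    # Later runs: roughly the "two line" import mentioned above.
    with h5py.File("measurements.h5", "r") as f:
        data = f["values"][:]                    # fast: raw binary read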
Wow, those benchmarks are old. 195MHz test machine?
Still, 100K rows is not much. I just did that 100K-row read/sum bit in Octave; it took about a quarter of a second to extract and sum. I assume HDF rocks for much larger sets.
http://www.hdfgroup.org/tools5desc.html#1
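For reference, a rough way to repeat that read/sum check, here in Python with h5py rather than Octave (same hypothetical file and dataset names as in the sketch above):

    # Quick-and-dirty timing of "read everything and sum it", text vs. HDF5.
    # Paths and the dataset name are hypothetical.
    import time
    import numpy as np
    import h5py

    t0 = time.perf_counter()
    total_txt = np.loadtxt("measurements.txt").sum()
    t1 = time.perf_counter()

    with h5py.File("measurements.h5", "r") as f:
        total_h5 = f["values"][:].sum()
    t2 = time.perf_counter()

    print("text: %.3fs  hdf5: %.3fs" % (t1 - t0, t2 - t1))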
HDF5:
http://www.hdfgroup.org/HDF5/RD100-2002/HDF5_Performance.pdf
http://www.hdfgroup.org/HDF5/RD100-2002/HDF5_Overview.pdf