Hacker News

Newbie question here - I'm just curious. Is it quicker to store the data this way in loads of files or to use Postgres?


For new development I'd recommend PostgreSQL over flat files for most projects.

Really depends on what you're trying to store though. For large data (images, audio) that can't fit in a table row, the filesystem is way better.
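One common way to combine the two is to keep the large bytes in ordinary files and store only metadata plus the file path in the database. Below is a minimal sketch of that pattern, not anything HN is known to do. Python's built-in `sqlite3` stands in for Postgres, and the `media` table and the `store_blob`/`load_blob` helpers are hypothetical names.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_blob(conn: sqlite3.Connection, blob_dir: Path, name: str, data: bytes) -> int:
    """Write the bytes to a file; the database row only records where they are."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_dir / digest  # content-addressed file name (an assumption of this sketch)
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO media (name, path, size) VALUES (?, ?, ?)",
        (name, str(path), len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_blob(conn: sqlite3.Connection, media_id: int) -> bytes:
    """Look up the path in the database, then read the bytes from the filesystem."""
    row = conn.execute("SELECT path FROM media WHERE id = ?", (media_id,)).fetchone()
    if row is None:
        raise KeyError(media_id)
    return Path(row[0]).read_bytes()


if __name__ == "__main__":
    blob_dir = Path(tempfile.mkdtemp())
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media (id INTEGER PRIMARY KEY, name TEXT, path TEXT, size INTEGER)"
    )
    media_id = store_blob(conn, blob_dir, "cat.jpg", b"fake jpeg bytes")
    print(len(load_blob(conn, media_id)))
```

The database stays small and queryable (names, sizes, ownership), while the filesystem does what it is good at: serving big opaque byte streams.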

In our case we started with flat files, and buying breathing room is the first step to move past them.


Out of curiosity: why did you start with flat files? It looks like Hacker News was started in 2007, when relational databases had been around for quite some time and were the standard way to store this kind of forum data (see: every popular forum framework at the time). The decision to store this sort of data (a news/link forum with comments, 100% text) as flat files is very confusing to me.


My guess is that Arc, the Lisp dialect created by Paul Graham that runs HN, was new, so writing and maintaining a database driver was out of the question.

Today, perhaps the way to go would be to talk to the database through some sort of JSON web service written in another language, rather than writing a driver.


That would be my guess as well. It's one thing to decide you want a simple forum and have it coded within a couple of days. It's entirely another to spend months creating a stable database library and keeping it up to date with all the latest changes.


Or you could, you know, use a more popular language.


I don't know why he did it, but some possible reasons:

If you already understand your OS well, filesystems have simple, known, and reliable performance characteristics. Databases involve a lot more code, and are harder to reason about.

If you're starting something, it pays to start simple. How many 2007 projects made it to today with this much traffic? A very tiny fraction.

If you're keeping a lot of data hot in RAM and working with it directly (which I hazily understand is HN's approach), then databases don't buy you much. Typical database usage is to use a database not just as a persistence engine, but a calculation engine, a locking engine, a cross-machine coordination engine, and other stuff as well. If all you need is persistence, then that isn't very hard to do yourself.
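As a rough illustration of "persistence only", here is a minimal sketch that assumes one JSON file per item. It is not HN's actual code (that's written in Arc), and `FlatFileStore` is a hypothetical name. Every item lives in a dict in RAM, reads never touch the disk, and each save rewrites that item's file atomically via a temp file plus `os.replace`.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


class FlatFileStore:
    """Keep every item in a dict; persist each item as its own JSON file."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)
        self.items: dict[int, dict] = {}
        # Load everything into RAM once at startup; reads are served from memory.
        for path in self.directory.glob("*.json"):
            self.items[int(path.stem)] = json.loads(path.read_text())

    def save(self, item_id: int, item: dict) -> None:
        self.items[item_id] = item
        final = self.directory / f"{item_id}.json"
        # Write to a temp file in the same directory, flush it to disk, then
        # rename it over the old file: readers see either the old version or
        # the new one, never a half-written file.
        fd, tmp = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(item, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, final)

    def get(self, item_id: int) -> dict | None:
        return self.items.get(item_id)
```

There's no query language, no locking and no cross-machine coordination here, which is exactly the point: if a single process owns the data, those are the parts you can skip. One simplification: the sketch doesn't fsync the directory after the rename, so a power loss immediately after a save could still lose that save.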

For things you intend to build and maintain yourself, "standard way" may not buy you anything. Graham already had a toolbox he knew perfectly well. He didn't have a lot of incentive to learn somebody else's way.


See also the Viaweb FAQ -- http://www.paulgraham.com/vwfaq.html

pg calls out Lisp and flat files as unconventional choices that worked well enough.


There are too many variables to give a straight answer. Writing to a bunch of different files at the same time? Postgres is probably faster, since all that scattered disk I/O gets turned into sequential writes to the WAL. Don't have the resources to give Postgres? Files may be faster than watching pg choke on a lack of RAM or CPU time.
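To make the WAL point concrete, here is a toy append-only log. `AppendOnlyLog` is a hypothetical name, and it is nothing like Postgres's real WAL in detail: it ignores a torn final line after a crash, checkpoints and compaction. It only shows the core idea: a batch of logical writes becomes one sequential append plus one `fsync`, and state is rebuilt by replaying the log on startup.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


class AppendOnlyLog:
    """Toy write-ahead-style log: every change is appended to a single file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.state: dict[str, str] = {}
        if path.exists():
            with path.open() as f:
                for line in f:  # replay the log in order to rebuild in-memory state
                    record = json.loads(line)
                    self.state[record["key"]] = record["value"]
        self.file = path.open("a")

    def write_batch(self, changes: dict[str, str]) -> None:
        # Many logical writes become one sequential append and one fsync,
        # instead of seeking around to rewrite many separate files.
        for key, value in changes.items():
            self.file.write(json.dumps({"key": key, "value": value}) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())
        self.state.update(changes)

    def close(self) -> None:
        self.file.close()
```

The flip side is that the log only grows; a real system periodically writes the current state out elsewhere (a checkpoint) and discards old log segments, which is where much of a database's extra complexity comes from.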

