
Is this an easier way to do the "store Parquet on S3 > stream to DuckDB" pattern that's popping up more and more?


> MemTables are flushed periodically to object storage as a string-sorted table (SST). The flush interval is configurable.
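A minimal sketch of what that quoted sentence describes. It assumes a plain dict stands in for object storage, and it swaps the configurable time-based flush interval for a key-count threshold so the behavior is deterministic. The class and parameter names are hypothetical, not SlateDB's API:

```python
import json


class ToyMemTable:
    """Hypothetical write buffer that flushes to a sorted, immutable table."""

    def __init__(self, object_store, flush_every=3):
        self.object_store = object_store  # dict standing in for S3: key -> bytes
        self.flush_every = flush_every    # stand-in for the configurable flush interval
        self.buffer = {}
        self.sst_count = 0

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return None
        # The flushed table is just the buffered entries written out in key order.
        rows = sorted(self.buffer.items())
        name = f"sst/{self.sst_count:06d}.json"
        self.object_store[name] = json.dumps(rows).encode()
        self.sst_count += 1
        self.buffer = {}
        return name


store = {}
mem = ToyMemTable(store, flush_every=3)
for k, v in [("b", 2), ("a", 1), ("c", 3), ("d", 4)]:
    mem.put(k, v)
print(sorted(store))  # "a", "b", "c" were flushed together; "d" is still buffered
```

Once written, the flushed objects are never modified, which is what makes this layout a good fit for object storage.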

Looks like it has a pretty similar structure under the hood, but DuckDB would get you more powerful queries.

FYI, DuckDB directly supports writes (and transactions), so you don't necessarily even need the separate store step.


Do you know of any resources/examples for the setup you mean? It sounds interesting, but from a quick search I didn't find anything straightforward.


Check out Apache Iceberg. It's a format for storing Parquet data in object storage, for both reads and writes. Not sure if DuckDB does Iceberg (I know ClickHouse does), but it's a similar principle: disaggregating data from compute.
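A heavily simplified sketch of the idea behind table formats like Iceberg. This is not Iceberg's actual spec: real Iceberg uses metadata files, manifest lists, and manifests, and commits by atomically swapping a catalog pointer. Here a table is modeled as a list of immutable snapshots, each listing the data files (e.g. Parquet objects) that make up the table at that point. The `s3://bucket/...` paths and class names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # e.g. Parquet object keys in S3


@dataclass
class ToyTable:
    """Hypothetical table whose state is a list of immutable snapshots."""
    snapshots: list = field(default_factory=list)

    def current(self):
        return self.snapshots[-1] if self.snapshots else Snapshot(0, ())

    def append_files(self, new_files):
        # Writers never modify existing files; they publish a new snapshot
        # listing the previous files plus the newly written ones.
        base = self.current()
        snap = Snapshot(base.snapshot_id + 1, base.data_files + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, snapshot_id):
        # Older snapshots stay readable, which enables "time travel" queries.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(snapshot_id)


table = ToyTable()
table.append_files(["s3://bucket/t/part-0.parquet"])
table.append_files(["s3://bucket/t/part-1.parquet"])
print(table.current().data_files)
print(table.files_as_of(1))
```

Any query engine that can read the snapshot's file list can read the table, which is the "data disaggregated from compute" point.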


Yes, DuckDB does do Iceberg.

https://duckdb.org/docs/extensions/iceberg


This is more targeted at OLTP-style workloads with mutable data and potentially multiple writers.
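A generic sketch of how an LSM-style store on object storage can handle those two things. It is not a description of this project's actual mechanism. Mutable data works because reads are newest-wins across the memtable and the flushed tables, with a tombstone marking deletes. Multiple writers can be coordinated if the object store offers a conditional put (compare-and-swap) on a shared manifest. The `ConditionalStore` class and its method are hypothetical stand-ins for such a feature:

```python
TOMBSTONE = object()  # marks a deleted key


def lookup(key, memtable, ssts):
    """Newest-wins read: memtable first, then tables from newest to oldest."""
    for layer in [memtable, *reversed(ssts)]:
        if key in layer:
            value = layer[key]
            return None if value is TOMBSTONE else value
    return None


class ConditionalStore:
    """Hypothetical object store offering compare-and-swap on a key."""

    def __init__(self):
        self.objects = {}  # key -> (version, value)

    def put_if_version(self, key, expected_version, value):
        current_version = self.objects.get(key, (0, None))[0]
        if current_version != expected_version:
            return False  # another writer committed first; caller must re-read and retry
        self.objects[key] = (current_version + 1, value)
        return True


ssts = [{"k1": "v1", "k2": "old"}, {"k2": "new", "k1": TOMBSTONE}]
print(lookup("k2", {}, ssts))  # the newer table's value wins
print(lookup("k1", {}, ssts))  # the key was deleted by a tombstone

store = ConditionalStore()
print(store.put_if_version("manifest", 0, ["sst-1"]))  # first writer succeeds
print(store.put_if_version("manifest", 0, ["sst-2"]))  # stale writer is rejected
```

An append-only Parquet-on-S3 setup has no equivalent of the per-key update path, which is roughly the distinction being drawn here.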



