That's right! Under the hood we're doing the same thing when a UDF is created, so it's still language agnostic, but for Python it offers a much nicer and much-needed wrapper, designed for actual users rather than for show. If this translates just as well to the other chdb bindings (Go, Rust, Node, Bun, etc.) and lets them attach native functions, UDFs might become a major force for chdb adoption.
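To make the "language agnostic" point concrete, here is a minimal sketch, assuming the mechanism in question is a ClickHouse-style executable UDF: a script that receives rows on stdin (tab-separated here) and writes one result per line to stdout. The names `sum_udf` and `run_udf` are hypothetical and only for illustration; this is not chdb's actual wrapper code, and stdin/stdout are simulated with in-memory buffers.

```python
import io


def sum_udf(lhs, rhs):
    # Hypothetical user function: arguments arrive as strings,
    # one per tab-separated column, so they are converted explicitly.
    return int(lhs) + int(rhs)


def run_udf(func, stdin, stdout):
    """Apply func to each tab-separated input row, writing one result per line."""
    for line in stdin:
        args = line.rstrip("\n").split("\t")
        stdout.write(f"{func(*args)}\n")
    stdout.flush()


# Simulate the rows a database process would pipe to the script.
rows = io.StringIO("12\t22\n1\t2\n")
out = io.StringIO()
run_udf(sum_udf, rows, out)
result = out.getvalue()
print(result, end="")
```

Since the contract in this sketch is just text over pipes, any language that can read stdin and write stdout could implement the same loop; a per-language wrapper only has to hide this plumbing from the user.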
The embedded db is cool, but what's the security boundary around UDFs?
I figure running "standard" SQL is safe, because it's just some functions implemented in the database (SPJ). It's not malware (maybe only slow ;-)). But UDFs, which could be Python or, worse, a binary, are more of an open-ended danger.
Thank you, Lorenzo! When I first read about chDB, I thought it was a great idea. When I tried the implementation, I was even more impressed! So this little tutorial was born.
I have been talking to Auxten about adding some interactivity to the code examples on doc.chdb.io, so hopefully the chDB docs will soon be even friendlier to newcomers.
I ran the same queries and got similar results, but the bandwidth utilization I measured was significantly different. On the same fly.io instance with 1 vCPU/256 MB, both queries completed successfully, but ClickHouse/chdb reached 10 MB/s (max) and, as you would expect, completed the count faster, while DuckDB only peaked at around 2.5 MB/s.
This might be due to the tiny resources, but I like rock-bottom measurements. Did anyone else notice a similar bandwidth utilization gap?
Disclaimer: I am a chdb maintainer! DuckDB is currently thinner and has lots of active contributors and mature integrations, while chdb is still in its early stages. BUT if you already love ClickHouse (like we do), chdb is a great choice: it inherits ClickHouse's stability and performance and, more importantly, all of its 70+ supported formats for the embedded use case, without any of the server/client requirements. That makes it perfect for fast in-process and serverless OLAP executions.
Note that chdb is based on the ClickHouse codebase but is completely community-powered, so there's no feud with DuckDB (I'm a quackhead, too!), which actually offers lots of great inspiration and many integration opportunities with ClickHouse/chdb for combined compute and processing of datasets. I personally love both and use them together all the time in my colab "OLAPps".
Different beasts, but if by any chance you love ClickHouse already and just want to run OLAP queries in-process, there's chdb: https://github.com/chdb-io/chdb
Nothing to watch out for. You're referring to a simple Dockerfile provided for convenient testing and to show configuration options; neither Grafana nor MinIO is part of IOx or the community builds.