Working on Livedocs.com

We are building a modern alternative to Jupyter, something like Cursor meets Jupyter.


https://livedocs.com

An AI data scientist for serious data work. Think of it as an AI-native Jupyter notebook.


Check out livedocs.com; we built a notebook around Polars and DuckDB (disclaimer: I'm the founder).


We use Vega pretty heavily; it's a broader ecosystem. Where it really shines is in combination with Altair and VegaFusion, which do the number-crunching on the backend and return a chart spec that can just be rendered on the front-end.

That makes it particularly useful when building interactive visualizations with a lot of data.
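
For anyone curious what that pattern looks like, here's a minimal sketch (assuming Altair 5+ with the vegafusion packages installed; the data frame is made up):

    import altair as alt
    import numpy as np
    import pandas as pd

    # Pretend this frame has millions of rows; with VegaFusion enabled,
    # only the aggregated histogram bins travel to the front-end.
    df = pd.DataFrame({"value": np.random.default_rng(0).normal(size=100_000)})

    alt.data_transformers.enable("vegafusion")  # number-crunching moves to the backend

    chart = alt.Chart(df).mark_bar().encode(
        alt.X("value:Q", bin=alt.Bin(maxbins=50)),
        alt.Y("count()"),
    )

    spec = chart.to_dict(format="vega")  # pre-aggregated chart spec, not 100k raw rows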


Is the Python code executing via Pyodide in the browser?


That's correct


Out of curiosity, what took you 10 years? Keen to hear your journey, as someone building in the same space.


Year 1 - Getting the first version out that worked for me and my colleagues.
Year 2 - Cleaning it up enough to promote wider use. Documentation.
Years 3-5 - Minor bug fixes only, as I thought qStudio solved the problem.
Year 6 - I realised restricting qStudio to only 1-2 database technologies was foolish. Major change to support many more databases. Improved generic SQL highlighting. Added a partial dark mode.
Years 7-8 - Minor bug fixes.
Year 9 - Added a proper Dark Mode and support for many themes by using FlatLaf. Now looking properly modern.
Year 10 - Realised that I'm not fully solving the problem: for most data analysts I should support creating the analysis itself (pivot tables) and improve exporting (real Excel export, not just nicely escaped CSV).

There were more learnings, like that I should definitely have gone fully open source at the start. It's harder to do later.


Good software takes ten years, at least according to Joel. That's a significant application, of course.


Reference: https://www.joelonsoftware.com/2001/07/21/good-software-take... I've just read it for the first time. Thank you for sharing this.


> Out of curiosity, what took you 10 years

That's not the first release; that's "just" 3.0. They released qStudio 1.25 in 2013 (their first blog post): https://www.timestored.com/b/qstudio-kdb-ide-1-25-released/


Noob question: What is the advantage of replicating data into a warehouse vs. just querying it in place on a postgres database?


Typically data warehouses are OLAP databases that have much better performance than OLTP databases for large queries.

There might also be several applications in a company, each with their own database, and a need to produce reports based on combinations of data from multiple applications.

I think that in many cases your question is based on an idea that is completely right: engineers are too eager to split applications across multiple databases and tack on separate data warehouses. The costs of maintaining separate databases are often higher than initially thought, especially when some of the data in the warehouse needs to flow back into the application database, for example for customer-facing analytics.

I think many companies would be better served by considering traditional data-warehousing needs directly in their main application database and abstaining from splitting out databases. Having one single ACID source of truth and paying a bit more for a single beefy database server makes a lot more sense than is commonly thought, especially now that many customer-facing products, like recommendation systems, are "data driven". At least that's my impression after working in the space for a while.
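
As a concrete illustration of the multiple-applications point above, here's a sketch of joining two application databases in one query using DuckDB's postgres extension (the connection strings, schemas and table names are all invented):

    import duckdb

    con = duckdb.connect()
    con.install_extension("postgres")
    con.load_extension("postgres")

    # Attach two hypothetical application databases, read-only
    con.sql("ATTACH 'dbname=billing host=localhost' AS billing (TYPE postgres, READ_ONLY)")
    con.sql("ATTACH 'dbname=crm host=localhost' AS crm (TYPE postgres, READ_ONLY)")

    # One report combining data from both applications
    report = con.sql("""
        SELECT c.segment, SUM(i.amount) AS revenue
        FROM billing.public.invoices AS i
        JOIN crm.public.customers AS c ON c.id = i.customer_id
        GROUP BY c.segment
    """).df()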


If the postgres database is recording business transactions, you don't want to cause your business to stop being able to take credit cards because you generated a report.


Assuming you use a connection pool, why would it stop? Either the query returns a result or it doesn't? Am I missing something?


Reporting queries can put a significant load on the db, to the point that it interrupts service.


Furthermore, Postgres is an OLTP (transactional) database, designed to efficiently perform updates and deletes on individual rows. OLAP (analytical) databases and query engines like ClickHouse, Presto, Druid, etc. are designed for efficient processing across many rows of mostly unchanging data.

Analytical queries (like "find the average sales price across all orders over the past year, grouped by store and region") can be 100-1000x faster in an OLAP database compared to Postgres.
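
To make that concrete, here's roughly what such a query looks like against a columnar engine, using DuckDB's Python API (the table and column names are invented):

    import duckdb

    con = duckdb.connect("warehouse.duckdb")

    # Scans only the columns involved, not whole rows
    avg_prices = con.sql("""
        SELECT store, region, AVG(sale_price) AS avg_sale_price
        FROM orders
        WHERE order_date >= CURRENT_DATE - INTERVAL 1 YEAR
        GROUP BY store, region
    """).df()

A row store has to read every column of every row it touches; a column store reads just store, region, sale_price and order_date, which is where much of that 100-1000x comes from.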


I see, thanks!


What about using a read-only replica for reporting? Are there any downsides to that? It seems easier to manage.


That's the use case for CDC: to make it equally easy to use a DW. As always, the complexity is just air you move around in the balloon. The OLTP db can spit out the events and forget them; how you load them efficiently is now a data engineer's problem to solve (if it were easy to write at event grain on an OLAP store, you would not need an OLTP one). Kafka usually enters the room at this stage, and the simplification promise becomes tenuous.
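
For a sense of what that hand-off looks like, here's a minimal sketch of consuming Debezium-style row-change events and micro-batching them toward the warehouse (the topic name, payload shape, and loader are all hypothetical; uses the kafka-python client):

    import json
    from kafka import KafkaConsumer

    def load_into_warehouse(rows):
        # Hypothetical loader: in practice a bulk COPY/INSERT into the OLAP store
        print(f"loading {len(rows)} row-change events")

    consumer = KafkaConsumer(
        "app.public.orders",  # hypothetical CDC topic emitted from the OLTP side
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v),
    )

    batch = []
    for msg in consumer:
        batch.append(msg.value)   # each message is one row change (insert/update/delete)
        if len(batch) >= 10_000:  # micro-batch: OLAP stores hate row-at-a-time writes
            load_into_warehouse(batch)
            batch.clear()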


That works great if all the data you need for reporting is in the database you're replicating.

You'd likely want a data warehouse if you also need to report on data that isn't in your prod database (e.g. Stripe, your CRM, marketing data, etc.).

If setting up a data warehouse, etl, BI, etc. sounds like a lot of work to get reports, you're right, it is.

shameless plug: we're making this all much simpler at https://www.definite.app/


Additionally, unless your data model is designed as append-only (which is unusual and requires logic downstream), you won't be able to track updates and deletions, which are valuable for reporting.


Event sourcing
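
A toy sketch of that append-only, event-sourcing idea (all names invented): current state is derived by replaying the log, so updates and deletions remain visible to reporting:

    # Append-only log of row changes; nothing is ever updated in place
    events = [
        {"order_id": 1, "type": "created", "price": 100},
        {"order_id": 1, "type": "price_changed", "price": 90},
        {"order_id": 1, "type": "deleted"},
    ]

    # Replay the log to materialize current state; the history itself
    # is preserved, so reporting can still see the update and the delete
    state = {}
    for e in events:
        if e["type"] == "created":
            state[e["order_id"]] = {"price": e["price"]}
        elif e["type"] == "price_changed":
            state[e["order_id"]]["price"] = e["price"]
        elif e["type"] == "deleted":
            state.pop(e["order_id"], None)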

Also, reporting/analytics workloads tend to be ad-hoc queries that are hard to optimize, so you generally favor raw scan speed over indexes. Frequently, for analytics and reporting it's more efficient to use a columnar DB than a row-oriented one.


When you need to do medium- to large-scale analytical queries. Postgres is fairly slow at the aggregate/group-by queries needed for analytics. Think building Google Analytics-type functionality.


Data warehouses are structured to handle large volumes of data and complex queries more efficiently than a typical transactional database like PostgreSQL.


That sounds like a great add! Will add this in over the weekend!


and "add to calendar" link


This is great, especially to introduce new devs to models. I use (and love) TablePlus (https://tableplus.com/) which has a diagram generator plugin that does the same!


I don't have a formal compsci background, but one of the most enlightening things I've undertaken recently was a course involving data modeling. This seems like an important exercise many overlook.


I imagine this is possible because of better modulation of the acoustic and pressure controllers to minimize energy loss.

