
> best way to build intuition for a dataset

Could you elaborate on this, please? I work with a lot of datasets, and have found Python + libraries (plt/pd/np/scipy/regex) to be far more useful. But that might just be my inexperience with Excel.

Can you give a few examples of analyses that work better in Excel than in Python?



Not the GP, but I think they are speaking specifically to non-programmers with this statement.

It's not about which analyses are more performant/easier in one tool versus the other; it's about how you most easily introduce a general audience to big data: reading, manipulating, and transforming it.

I actually disagree with their statement tbh, as I think that it's too nuanced of a situation to scope like this.

I used to work in a university, and depending on the dataset and the intended output, I would switch between R and Excel for the students. Those who needed R-level analysis eventually saw why it was more useful for them than Excel and got good at seeing when to use R versus when to use Excel.

Those who had datasets/output goals that didn't need heavy lifting really just needed Excel. It's not incorrect to say that learning heavier tooling/languages is a benefit, but there is also the time it takes to learn a given toolset and become efficient with it. The heavier toolsets have their nuances, and accomplishing the same task in a less robust tool like Excel is the more efficient and better approach for those who have extremely limited time and are not likely to need the heavier toolset in the future.

It's just a simple cost-benefit analysis -- what tool is going to give the best return on time investment?

There is a very valid and reasonable argument that investing in the heavier toolsets will eventually reach a point where even the simple tasks that Excel and other tools let users perform more easily with less knowledge are faster/better in the heavier language. The question is "when is it optimal for a given person to invest the time to get to that stage?", and that's a question you can't always answer with an informed decision, since the data needed isn't always available and it's hard to predict the future.


You're definitely correct that it's a nuanced question whether, for a given (user, analysis) pair, the user is better off in Excel or Python/R/etc. Specifically with respect to building intuition for a dataset, however, there is a huge benefit to having an interactive data representation (if only for the ability to scroll and see all of your data).

Because you can think of Mito as a frontend interface to Pandas, using Mito doesn't prohibit you from building intuition or analyzing your data in the same way you would if you didn't have the spreadsheet frontend. It just helps you write the Python/Pandas code faster + see the most up-to-date version of your dataset in real time.

The typical Mito user uses Mito multiple times throughout an analysis. A common pattern is: start by just visualizing the data in Mito, create a few graphs to help understand the distribution using matplotlib (right now we only have a tiny bit of graphing support), pass the data back into Mito to do some filtering and cleaning, then lastly create a pivot table output using Mito. Of course, it varies greatly from user to user, but that's a general flow we see often!
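
For readers who haven't used Pandas, here is a rough sketch of what the inspect / filter-and-clean / pivot flow described above looks like in plain Pandas. This is not Mito's generated code: the data, the column names (`region`, `product`, `sales`), and the filter threshold are all made up for illustration, and the matplotlib graphing step is left out.

```python
import pandas as pd

# Hypothetical data; the column names and values are invented for this sketch.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "B", "A"],
    "sales": [100.0, None, 250.0, 180.0, 120.0],
})

# Step 1: look at the data (in a spreadsheet frontend this is just scrolling).
print(df.head())
print(df["sales"].describe())

# Step 2: filter and clean. Drop rows with missing sales, keep sales >= 100.
cleaned = df.dropna(subset=["sales"])
cleaned = cleaned[cleaned["sales"] >= 100]

# Step 3: pivot table of total sales, one row per region, one column per product.
pivot = cleaned.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum"
)
print(pivot)
```

Each step here is one or two lines, but you only see the result when you print it, which is the gap a live spreadsheet view is meant to close.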


I think it's simpler than that. The way the data is presented to you in Excel makes it incredibly easy to grok.


For me it's SQL and then simply visualizing.


As a bit of background on Mito, it works by passing the parameters from the frontend spreadsheet to the Python kernel backend, which transpiles the spreadsheet formula into Python/Pandas code [0]. So what we're hoping to do down the line is let the user pick which language to translate to, SQL and R being the obvious next steps. But that's a ways away :)

[0] https://trymito.io/blog/transpiler
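
To make the idea of transpiling a spreadsheet formula into Pandas code concrete, here is a toy sketch. It is not Mito's implementation (see [0] for that); the function name `transpile_formula`, the `df` variable name, and the formula syntax it accepts are assumptions for illustration. It only handles bare column names, numbers, and arithmetic operators.

```python
import re


def transpile_formula(new_column: str, formula: str, df_name: str = "df") -> str:
    """Turn a spreadsheet-style formula like '=price * qty' into a line of pandas code.

    Toy version: every bare identifier is treated as a column name.
    Function calls, ranges, and string literals are not handled.
    """
    if not formula.startswith("="):
        raise ValueError("formula must start with '='")
    expression = formula[1:].strip()
    # Replace each identifier with a column lookup on the dataframe.
    expression = re.sub(
        r"[A-Za-z_][A-Za-z0-9_]*",
        lambda match: f"{df_name}[{match.group(0)!r}]",
        expression,
    )
    return f"{df_name}[{new_column!r}] = {expression}"


generated = transpile_formula("total", "=price * qty + 2")
print(generated)  # df['total'] = df['price'] * df['qty'] + 2
```

A real transpiler would parse the formula into a syntax tree rather than use a regex, which is also what would make it possible to emit SQL or R from the same parsed formula.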


The problem with SQL is that it is not great with pivoting. But maybe that is not a big problem when you auto-generate the SQL.

I agree, SQL is what I like more for mangling. Except for the pivot/melt part, that is.
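
A small sketch of why pivoting is awkward in SQL and why auto-generation helps: standard SQL needs one hand-written `CASE` expression per output column, so the query depends on the values in the data. The helper below (a hypothetical `build_pivot_sql`, using SQLite from Python's standard library and a made-up `sales` table) reads the distinct values first and generates the query from them. In Pandas the same reshaping is a single `pivot_table` call.

```python
import sqlite3


def quote_ident(name):
    # Quote a SQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    # Quote a SQL string literal, doubling any embedded single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_pivot_sql(conn, table, index_col, pivot_col, value_col):
    """Generate a pivot query with one SUM(CASE ...) column per distinct pivot value."""
    t, i, p, v = (quote_ident(x) for x in (table, index_col, pivot_col, value_col))
    distinct = [row[0] for row in conn.execute(f"SELECT DISTINCT {p} FROM {t} ORDER BY 1")]
    cases = ", ".join(
        f"SUM(CASE WHEN {p} = {quote_literal(d)} THEN {v} END) AS {quote_ident(str(d))}"
        for d in distinct
    )
    return f"SELECT {i}, {cases} FROM {t} GROUP BY {i} ORDER BY {i}"


# Hypothetical table and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "A", 100), ("East", "B", 50), ("West", "A", 250), ("West", "B", 180)],
)

sql = build_pivot_sql(conn, "sales", "region", "product", "amount")
print(sql)
rows = conn.execute(sql).fetchall()
print(rows)
```

The generated query has to be rebuilt whenever a new product value appears, which is the part that is tedious by hand and trivial for a code generator.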



