
Lots of folks in finance; you can share a CSV with any finance company and they can process it. It's text.


Humans generate decisions / text information at rates of ~bytes per second at most. There are barely enough humans around to generate 21 GB/s of information even if all they did was make financial decisions!

So 21 GB/s would be solely algos talking to algos... Given all the investment in the algos, surely they don't need to be exchanging CSV around?
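Rough arithmetic behind that claim (the ~10 bytes per second of decision output per person is an assumed figure, purely for illustration):

    # Back-of-envelope: how many people would it take to produce 21 GB/s of decisions?
    # The per-human rate below is an assumption for illustration only.
    feed_rate = 21e9                     # bytes per second
    per_human = 10                       # assumed bytes of decisions per second per person
    humans_needed = feed_rate / per_human
    print(f"{humans_needed:,.0f} people")  # 2,100,000,000 people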


Standards (whether official or de facto) often aren't the best in isolation, but they're the best in reality because they're widely used.

Imagine you want to replace CSV for this purpose. From a purely technical view, this makes total sense. So you investigate, come up with a better standard, make sure it has all the capabilities everyone needs from the existing stuff, write a reference implementation, and go off to get it adopted.

First place you talk to asks you two questions: "Which of my partner institutions accept this?" "What are the practical benefits of switching to this?"

Your answer to the first is going to be "none of them" and the answer to the second is going to be vague hand-wavey stuff around maintainability and making programmers happier, with maybe a little bit of "this properly handles it when your clients' names have accent marks."

Next place asks the same questions, and since the first place wasn't interested, you have the same answers....

Replacing existing standards that are Good Enough is really, really hard.


CSV is a questionable choice for a dataset that size. It's not very efficient in terms of size (real numbers take more bytes to store as text than as binary), it's not the fastest to parse (due to escaping), and a single delimiter or escape out of place corrupts everything afterwards. That's not to mention all the issues around encoding, different delimiters, etc.
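To get a feel for the size point, here's a quick illustrative comparison of one double stored as decimal text versus as a fixed 8-byte binary value:

    import struct

    x = 1234.56789012345
    text_bytes = str(x).encode("utf-8")   # decimal text, e.g. b'1234.56789012345'
    binary_bytes = struct.pack("<d", x)   # IEEE 754 double, always 8 bytes

    print(len(text_bytes), len(binary_bytes))  # 16 vs 8 here, before any CSV delimiters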


It's great for when people need to be in the loop, looking at the data, maybe loading it in Excel, etc. (I use it myself...). But there aren't enough humans around for 21 GB/s.


> (real numbers take more bytes to store as text than as binary)

Depends on the distribution of numbers in the dataset. It's quite common to have small numbers. For these, text is a more efficient representation than binary, especially compared to 64-bit or larger binary encodings.
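A quick sketch of that point: a small integer like 7 is one byte as text (plus a delimiter in CSV), while a fixed-width 64-bit encoding always spends 8 bytes:

    import struct

    n = 7
    print(len(str(n)))                # 1 byte as text (plus delimiter in CSV)
    print(len(struct.pack("<q", n)))  # 8 bytes as a fixed-width int64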


The only real example I can think of is the US options market feed. It is up to something like 50 GiB/s now, and is open 6.5 hours per day. Even a small subset of the feed that someone may be working on for data analysis could be huge. I agree CSV shouldn't even be used here, but I'm sure it is.


OPRA is a half dozen terabytes of data per day compressed.

CSV wouldn't even be considered.


You might have accumulated some decades of data in that format and now want to ingest it into a database.


Yes, but if you have decades of data, what does it matter if you have to wait a minute or ten to convert it?
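A minimal sketch of such a one-off conversion, assuming a hypothetical trades.csv with a header row, streamed into SQLite in chunks via pandas:

    import sqlite3
    import pandas as pd

    # One-off conversion: stream a large CSV into a SQLite table in chunks,
    # so memory stays bounded even for decades of accumulated data.
    # "trades.csv" and the table name are placeholders for illustration.
    conn = sqlite3.connect("trades.db")
    for chunk in pd.read_csv("trades.csv", chunksize=1_000_000):
        chunk.to_sql("trades", conn, if_exists="append", index=False)
    conn.close()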


> Humans generate decisions / text information at rates of ~bytes per second at most

Yes, but the consequences of these decisions are worth much more. You attach an ID to the user, and an ID to the transaction. You store the location and time where it was made. Etc.


I think these would add only a small amount of information (and in a DB would be modelled as joins). It only adds lots of data if done very inefficiently.
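Roughly what "modelled as joins" means here (an illustrative sketch, not any particular institution's schema): the per-decision row carries only small IDs, and the heavier user/venue details live once in their own tables:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Illustrative schema: each trade row stores only small foreign keys;
        -- the descriptive data is stored once and joined when needed.
        CREATE TABLE users  (user_id  INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE venues (venue_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE trades (trade_id INTEGER PRIMARY KEY,
                             user_id  INTEGER REFERENCES users(user_id),
                             venue_id INTEGER REFERENCES venues(venue_id),
                             ts       TEXT,
                             amount   REAL);
    """)
    # Each trade adds only a few dozen bytes; nothing descriptive is repeated per row.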


Why are you theorizing? I can tell you from out there that it's used massively, and it's not going away; quite the contrary. Even rather small banks can end up generating various reports, etc., which can easily become huge.

The speed of human decisions plays basically no role here, just as it doesn't with messaging generally; there is way more to companies than just a direct keyboard-to-output link.


You seem to not realize that most humans are not coders.

And non-coders use proprietary software, which usually has an export to CSV or XLS to be compatible with Microsoft Office.



