
> these design choices seem to limit its use to relatively small files

1. As a rule of thumb, I have been focusing on functionality before optimization. That said, `tv` is really fast, and it is simply false that it only works for relatively small files. I just pushed a 624MB file through `tv`: it ran in 2.8 seconds, where `column` takes 5.0 seconds on the same file. Now, I would love help from programmers smarter than me; I am sure there are plenty of optimization gains to be had in `tv`. I just want to make sure potential users are not misled. `tv` is performant.

> Some (most?) tools that output data in columns and fit each one to the largest value in that column need to scan the whole file as a first pass just to start displaying data.

> Not only is it the case with this tool, but from what I'm reading in main.rs it looks like it's also loading the whole file in memory.

2. `tv` reads once, but parses only partially. It reads the full file, but only to count the total number of rows; it parses (`take`s) just the first n rows:

https://github.com/alexhallam/tv/blob/b548f0d19f64438d53f732...

https://github.com/alexhallam/tv/blob/b548f0d19f64438d53f732...
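Since the thread is already reading main.rs, here is a rough illustration of that read-once / parse-partly split. This is not the author's code, just a minimal std-only Rust sketch under assumed names (`dims_and_sample`, `data.csv`), with a naive comma split standing in for real CSV parsing:

    use std::error::Error;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    // Every line is read (so the row count is exact), but only the
    // first n lines are split into fields for display.
    fn dims_and_sample(path: &str, n: usize) -> Result<(usize, Vec<Vec<String>>), Box<dyn Error>> {
        let reader = BufReader::new(File::open(path)?);
        let mut sample: Vec<Vec<String>> = Vec::with_capacity(n);
        let mut row_count = 0usize;

        for line in reader.lines() {
            let line = line?; // read every line...
            if sample.len() < n {
                // ...but parse only the first n into fields
                sample.push(line.split(',').map(str::to_string).collect());
            }
            row_count += 1;
        }
        Ok((row_count, sample))
    }

    fn main() -> Result<(), Box<dyn Error>> {
        let (rows, sample) = dims_and_sample("data.csv", 25)?;
        println!("{} rows total; displaying the first {}", rows, sample.len());
        Ok(())
    }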



If the goal is to calculate the correct column width, you have to do one pass through the data before writing the first row.

If the file can be read multiple times (not a UNIX stream), you can just read the file twice.

If the file is a stream, instead of retaining the entire dataset in memory, you can write to a temporary file and re-parse it after calculating the widths.
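For what it's worth, that temp-file variant is only a couple of passes. A minimal std-only Rust sketch (not from `tv`; naive comma split, hypothetical temp path):

    use std::error::Error;
    use std::fs::File;
    use std::io::{self, BufRead, BufReader, BufWriter, Write};

    fn main() -> Result<(), Box<dyn Error>> {
        let tmp_path = std::env::temp_dir().join("tv_spool.csv");
        let mut spool = BufWriter::new(File::create(&tmp_path)?);
        let mut widths: Vec<usize> = Vec::new();
        let mut rows = 0usize;

        // Pass 1: copy stdin to disk, tracking the row count and the
        // widest field seen in each column as the stream goes by.
        for line in io::stdin().lock().lines() {
            let line = line?;
            for (i, field) in line.split(',').enumerate() {
                if i == widths.len() {
                    widths.push(0);
                }
                widths[i] = widths[i].max(field.len());
            }
            writeln!(spool, "{}", line)?;
            rows += 1;
        }
        spool.flush()?;

        // Pass 2: re-read the temp file and print with the known widths,
        // so the dataset itself never has to sit in memory.
        println!("dimensions: {} x {}", rows, widths.len());
        for line in BufReader::new(File::open(&tmp_path)?).lines() {
            let line = line?;
            for (i, field) in line.split(',').enumerate() {
                print!("{:<w$}  ", field, w = widths[i]);
            }
            println!();
        }
        Ok(())
    }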


The correct column width is calculated from the first n rows, not the full file.
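To make "widths from the first n rows" concrete, a hypothetical helper along the lines of the sketch upthread; it only ever sees the sampled rows, so a wider value past row n would not widen its column:

    // Widths are derived from the sample alone, not the full file.
    fn widths_from_sample(sample: &[Vec<String>]) -> Vec<usize> {
        let mut widths: Vec<usize> = Vec::new();
        for row in sample {
            for (i, field) in row.iter().enumerate() {
                if i == widths.len() {
                    widths.push(0);
                }
                widths[i] = widths[i].max(field.len());
            }
        }
        widths
    }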

A stream does not work for `tv` because, with a stream, the number of rows in the file is not known a priori. Displaying the dimensions of the file is a priority for `tv`, and I am very happy with that trade-off: I would rather know the dimensions of a file than have a stream of unknown dimensions.


If you did it the way he's describing, you would stream through the input to count the rows while writing it out to a temp file, then re-parse that temp file for the actual data.

I'm not saying you should or shouldn't, but your use case doesn't bar you from using streams.


I see. Thanks for the clarification.



