
> It's better to store text in git IMHO.

I think the bias towards text files in Git only reflects that Git's defaults, such as the diff tool and file-type detection, are configured to handle text files. If you add a custom file type and configure Git not to treat it as text, which includes setting gitattributes so it doesn't touch things like newline characters, then Git works just as well.

https://git-scm.com/docs/gitattributes
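As a rough illustration of how those attributes resolve, here is a Python sketch of a simplified .gitattributes lookup. It assumes glob matching via fnmatchcase (real Git uses gitignore-style pattern rules, where `*` does not cross directories) and models only the built-in `binary` macro, which the gitattributes docs define as `-diff -merge -text`. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Built-in macro from the gitattributes docs: "binary" expands to "-diff -merge -text".
MACROS = {"binary": ["-diff", "-merge", "-text"]}

def parse_gitattributes(text: str) -> list:
    """Parse simplified .gitattributes lines into (pattern, [attr, ...]) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        pattern, *attrs = line.split()
        expanded = []
        for attr in attrs:
            expanded.append(attr)                 # the macro attribute itself is set too
            expanded.extend(MACROS.get(attr, []))  # then its expansion
        rules.append((pattern, expanded))
    return rules

def attributes_for(path: str, rules: list) -> dict:
    """Resolve attributes for a path; a later matching line overrides an earlier one."""
    result = {}
    basename = path.rsplit("/", 1)[-1]
    for pattern, attrs in rules:
        # Simplification: patterns without "/" are matched against the basename only.
        target = path if "/" in pattern else basename
        if not fnmatchcase(target, pattern.lstrip("/")):
            continue
        for attr in attrs:
            if attr.startswith("-"):
                result[attr[1:]] = False         # explicitly unset, e.g. -text
            elif attr.startswith("!"):
                result.pop(attr[1:], None)       # back to unspecified
            elif "=" in attr:
                key, value = attr.split("=", 1)  # set to a value, e.g. eol=lf
                result[key] = value
            else:
                result[attr] = True              # plain set
    return result

rules = parse_gitattributes("*.pdf binary\n*.txt text eol=lf\n")
print(attributes_for("docs/spec.pdf", rules))
```

With `*.pdf binary`, a PDF ends up with `text` unset, so line-ending conversion leaves it alone, and `diff`/`merge` unset, so Git won't try to show or merge it line by line.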



Diffs are purely a UI mechanic of Git. They're computed on demand and don't really exist at the data layer, where blobs are stored. That's the real spot where all the problems are.
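To make that split concrete: Git names and stores each version of a file as a whole blob, addressed by a SHA-1 hash of a small header plus the full content, and a diff is just a function run over two such blobs when you ask for one. The Python sketch below assumes the classic SHA-1 object format (not SHA-256 repositories) and uses difflib as a stand-in for Git's own diff machinery; the helper names are hypothetical.

```python
import difflib
import hashlib

def blob_id(data: bytes) -> str:
    """Object ID for a blob: SHA-1 over the header b"blob <size>" + NUL, then the full content."""
    header = b"blob " + str(len(data)).encode() + b"\x00"
    return hashlib.sha1(header + data).hexdigest()

def diff_on_demand(old: bytes, new: bytes, path: str) -> str:
    """Compare two stored snapshots only when asked; the diff itself is never stored."""
    return "".join(difflib.unified_diff(
        old.decode().splitlines(keepends=True),
        new.decode().splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    ))

v1, v2 = b"a\nb\n", b"a\nc\n"
print(blob_id(v1), blob_id(v2))  # two independent, complete snapshots
print(diff_on_demand(v1, v2, "f.txt"))
```

Note that the one-line edit still produces a second, complete blob; nothing at this layer records "what changed".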

In Git's design, most binaries can't realistically be packed and compressed, so the net result is that there's basically a full copy of every version of that file in your repo, forever. That 10MiB binary got modified 5 times? That's 50MiB of bloat stuck in your Git repository until the end of time. It doesn't matter if you delete it in a later commit; the old versions stay in history. Space inefficiency like this is a core issue that compounds many problems in practice.
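A quick way to see why such files bloat the repository: zlib, which Git uses to compress objects, gets almost nothing out of data that is already compressed, so each stored version costs roughly its full size. The Python sketch below uses seeded random bytes as an assumed stand-in for an already-compressed binary (zip, many CAD and asset formats) and needs Python 3.9+ for `randbytes`; the helper names are hypothetical.

```python
import random
import zlib

MiB = 1024 * 1024

def naive_history_bytes(file_size: int, versions: int) -> int:
    """Worst case: neither compression nor deltas help, so every version is kept in full."""
    return file_size * versions

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib's default level."""
    return len(zlib.compress(data)) / len(data)

# Already-compressed data looks like random bytes to zlib; source code does not.
compressed_like = random.Random(0).randbytes(1_000_000)
text_like = b"def handler(request):\n    return ok\n" * 30_000

print(naive_history_bytes(10 * MiB, 5) // MiB, "MiB of history")
print(round(compression_ratio(compressed_like), 3), round(compression_ratio(text_like), 3))
```

The 10MiB-times-5 arithmetic from the comment falls out directly, and the ratios show the gap: the text shrinks to a tiny fraction of its size, while the random stand-in stays at essentially 100%.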

Then there's the fact that a lot of the algorithms start falling over. For example, a Git rebase touches the disk multiple times (applying the patch, updating the index) for every commit in the series being rebased. This gets very expensive when the working tree is filled with tons of blobs, the repository is large (many files, even if 99.9% of them are small), and the series is long.
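A back-of-the-envelope model shows why the size of the repository, not the size of the change, dominates: if every replayed commit ends with a full index rewrite, the work is roughly (tracked files) × (commits in the series). The Python sketch below assumes exactly that naive strategy. The function name and numbers are hypothetical, and real Git has optimizations this toy ignores.

```python
def replay_series(index: dict, patches: list) -> int:
    """Toy rebase: apply each commit's patch, then rewrite the whole index.

    Returns the total number of index entries written. Hypothetical cost model
    for illustration only, not Git's actual implementation.
    """
    entries_written = 0
    for patch in patches:
        index.update(patch)            # "apply" the commit: touch only the changed paths
        entries_written += len(index)  # naive rewrite: every entry, changed or not
    return entries_written

# 100,000 tracked files and a 50-commit series where each commit edits one file.
index = {f"src/file{i}.txt": f"blob-{i}" for i in range(100_000)}
patches = [{"src/file0.txt": f"blob-edit-{n}"} for n in range(50)]
print(replay_series(index, patches))  # 5000000
```

Fifty one-file commits cost five million entry writes here, purely because the repository is big.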

I sort of doubt most programmers want to write/rewrite 50TiB of binary files in their Git repo every day. Some certainly do, I guess, but I suspect most of us just want to shove a few dozen reference PDFs and a CAD file or two into our repository, maybe some zip files or .so files that get auto-updated, and use our basic workflows without hitting performance cliffs. Some will want to store game assets, which is harder. But today it's mostly unsatisfying for anything but the smallest, most glacially moving binary files.


It doesn't. What you see when you display the content is only half of it; the other half is how the system is able to store the content through delta compression.
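That storage half can be sketched in Python as a copy/insert delta: a new version is stored as instructions that copy byte ranges out of an older base version, plus any literal new bytes. Git's pack deltas are built from the same two kinds of instruction, but this sketch finds matches line by line with difflib rather than using Git's own matching. The function names and the 8-bytes-per-copy size estimate are assumptions.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as ("copy", offset, length) and ("insert", data) instructions."""
    base_lines = base.splitlines(keepends=True)
    target_lines = target.splitlines(keepends=True)
    # Byte offset at which each base line starts, so copies address raw bytes.
    starts = [0]
    for line in base_lines:
        starts.append(starts[-1] + len(line))
    ops = []
    matcher = SequenceMatcher(None, base_lines, target_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", starts[i1], starts[i2] - starts[i1]))
        elif j2 > j1:  # "replace" or "insert": ship the new bytes literally
            ops.append(("insert", b"".join(target_lines[j1:j2])))
        # "delete" emits nothing: those base bytes are simply never copied
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target version from the base plus the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

def delta_size(ops: list) -> int:
    """Rough stored size: literal bytes plus an assumed 8 bytes per copy instruction."""
    return sum(8 if op[0] == "copy" else len(op[1]) for op in ops)

base = b"".join(b"line %d\n" % i for i in range(1000))
target = base.replace(b"line 500\n", b"line 500 (edited)\n")
ops = make_delta(base, target)
print(len(target), delta_size(ops))  # full copy vs. delta
```

For a one-line edit to a text file the delta is a few dozen bytes. A file with no line structure, or one whose bytes are reshuffled by compression on every save, degenerates into a single large "insert", which is the bloat described upthread.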

Here "text" versus "binary" is a bit of a red herring; what really matters is whether the content is diffable. But in 99% of cases, "binary" and "text" are effectively synonyms for "undiffable" and "diffable".
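Git's own default for that call is a cheap content sniff rather than a file-type check, which is why "binary" and "undiffable" line up so often. The Python sketch below approximates the heuristic in Git's xdiff-interface.c (a NUL byte within the first 8000 bytes means binary) and lets an explicit gitattributes `diff`/`-diff` setting override it. It ignores other cases Git handles, and the function names are hypothetical.

```python
from typing import Optional

FIRST_FEW_BYTES = 8000  # size of the window Git's binary check inspects

def looks_binary(data: bytes) -> bool:
    """Approximation of Git's default sniff: a NUL byte early on means "binary"."""
    return b"\x00" in data[:FIRST_FEW_BYTES]

def is_diffable(data: bytes, diff_attr: Optional[bool] = None) -> bool:
    """An explicit diff/-diff attribute wins; otherwise fall back to the content sniff."""
    if diff_attr is not None:
        return diff_attr
    return not looks_binary(data)

print(is_diffable(b"plain text\n"))                   # diffable text
print(is_diffable(b"%PDF-1.7\n\x00\x01..."))          # sniffed as binary
print(is_diffable(b"plain text\n", diff_attr=False))  # forced off by -diff
```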



