Every file in TAR ends with 1KiB of zeros as “end of file marker”
1 point by vitiral on Sept 17, 2023 | 17 comments
https://docs.fileformat.com/compression/tar/

I don't understand how this format was ever thought to be a good idea

* the filename field is exactly 100 bytes

* file size is octal, at most 8^12 (correction: 8^11, since 1 byte is a \0, so 8 GiB). Why octal and not binary?

* there is an extra "end of file" marker of 1024 bytes of zeros

* contains NUMERIC owner/group IDs... on a serialized data format meant to be sent between computers (whose IDs, other than root's, might not agree, right?)
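
For reference, a rough sketch of how those fields sit in the 512-byte header block (offsets per the usual ustar layout; illustrative only, not a real tar reader):

    # Sketch: pull the fields listed above out of one 512-byte classic
    # (ustar) header block. Numeric fields are ASCII octal text.
    def parse_header(block: bytes):
        def octal(field: bytes) -> int:
            # numeric fields are octal digits, padded with NULs/spaces
            return int(field.rstrip(b"\0 ") or b"0", 8)
        name = block[0:100].rstrip(b"\0").decode()   # fixed 100-byte name field
        uid  = octal(block[108:116])                 # numeric owner id
        gid  = octal(block[116:124])                 # numeric group id
        size = octal(block[124:136])                 # 12-byte octal size field
        return name, uid, gid, size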

Why the extra EoF? Is this out of concern for data corruption? If so, why not a simple CRC checker for both detection and recovery?

The whole thing seems poorly designed yet is ubiquitous in Linux. I've used it for more than a decade without ever asking about its format.

Some alternatives off the top of my head:

* remove EoF waste

* 2-byte filename lengths, allowing file NAMES up to 64KiB and removing the wasted bytes when most of the 100-byte field goes unused.

* add CRC checksum

* NAMED owner/group, permitting some kind of cross-platform usage. Or just remove this feature entirely (preferable IMO)

* Don't use octal for file size and get nearly infinite file sizes.
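
Purely to illustrate that wish list, a hypothetical entry layout with binary fields and a CRC could be as simple as this (every field and name here is invented; it isn't any real format):

    import struct, zlib

    def pack_entry(name: str, data: bytes) -> bytes:
        # hypothetical layout: 2-byte name length, 8-byte binary size,
        # UTF-8 name, CRC32 over header+data, then the data itself
        name_b = name.encode("utf-8")
        header = struct.pack("<HQ", len(name_b), len(data)) + name_b
        crc = zlib.crc32(header + data)
        return header + struct.pack("<I", crc) + data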



Most of the alternatives you suggest are implemented in PAX, which is an extension of the tar format. The EoF block is useful if the archive is written directly to physical media without a filesystem: it lets you determine where the archive ends.
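
For example, Python's tarfile module can write PAX archives, where an over-long name or an over-size file spills into extended header records instead of the fixed fields (a minimal sketch; the file name and contents are made up):

    import io, tarfile

    data = b"hello"
    info = tarfile.TarInfo(name="dir/" + "very_long_name_" * 20 + ".txt")  # > 100 bytes
    info.size = len(data)
    with tarfile.open("example.tar", "w", format=tarfile.PAX_FORMAT) as tf:
        tf.addfile(info, io.BytesIO(data))  # the long name goes into a pax extended header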


Wouldn't such a feature be better put on the physical media controller/protocol itself?


Yes; but then people wanted to be able to read the tapes they'd written long before the hardware and media acquired such features, so TAR kept the ability (it was more "don't change things that ain't broke", I think).

ARC and ZIP files are written as a fresh take on the idea of archive files, with much more capable hardware, after TAR had been around a couple decades. They have many features designed to use those new hardware capabilities, and were (and still are) very popular because of those.

They have bits that probably seem dated now, too. Breaking archives into floppy-sized chunks, but without any sort of forward error correction? No format support for Unicode? (Who cares, it hadn't been invented when the ZIP file spec was created.)


Because because.

TAR was written for very simple/tiny machines by today's standards, and was designed to read/write full valid blocks on physical tapes with constraints on spool-up and spool-down times/distances.
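
Concretely, everything tar writes is padded out to whole 512-byte blocks, so a block-oriented tape only ever sees full blocks; a rough sketch (the helper name is made up):

    BLOCK = 512

    def pad_to_block(data: bytes) -> bytes:
        # pad member data with NULs up to the next 512-byte boundary,
        # so the device is only ever handed full blocks
        rem = len(data) % BLOCK
        return data if rem == 0 else data + b"\0" * (BLOCK - rem)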

The description here seems reasonable: https://en.wikipedia.org/wiki/Tar_(computing)


I don't think any of the alternatives I listed couldn't be done on an embedded device or a tape drive, though. I understand that the format is old, so perhaps many of the arbitrary constraints weren't seen as that bad at the time.


This format was developed a long time ago, before the luxury of (a) experience and (b) newer, more capable storage hardware.

It's really strange to complain that a legacy format is full of bad features by modern tastes and hardware - how do you think we worked out which features of formats and hardware were good or bad in the first place?

The history in the Wikipedia page that I linked is instructive.


I'm arguing that these features make no sense _in the old context_ either. Why waste 1KiB, plus the unused (100 - namelen) bytes per file, when space is so precious?


We used to have files spanning many tapes. We changed the OS to do dead reckoning on the end of each tape so we could stop well clear of the actual end mark. That way individual tapes could be copied and substituted if needed. Hard to see the reason if you don't just know.


I don't think I understand but maybe that's the point. Maybe it seems mysterious because there were other requirements at the time which were themselves already mysterious?


If you try to copy a tape to a slightly shorter tape you will be out of luck, and only find out when you get there.


Because some tape drives could only read and write whole (512B) blocks, and the way to be relatively sure that you didn't have a new file was to see two blocks of zeros.
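
In other words, a reader pulling raw blocks off a device, with no filesystem telling it how long the archive is, just keeps reading until it sees two zero blocks in a row. A rough sketch (not robust; a real reader would also follow the sizes in the headers):

    BLOCK = 512
    ZERO_BLOCK = b"\0" * BLOCK

    def blocks_until_end(stream):
        # yield 512-byte blocks from a raw stream (e.g. a tape device) until
        # the end-of-archive marker: two consecutive all-zero blocks
        zeros = 0
        while zeros < 2:
            block = stream.read(BLOCK)
            if len(block) < BLOCK:
                break  # ran off the end of the media
            zeros = zeros + 1 if block == ZERO_BLOCK else 0
            yield block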

The Wikipedia item explains this.


I think I'm confused. You make it sound like the "tape reader" hardware/driver isn't talking to the "file reader" part in software. Didn't the file reader tell the tape reader the size of the file, so it would already know where the end was (how many blocks it should read)?


Tar is a descendant of file formats that were written to tapes. Recovery was to the same system & space was not cheap. There is also an error in the maximal size: it's 8 GiB, not 64. The last byte must be a \0. This is fixed in modern GNU and BSD tar.
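
The arithmetic behind that: the size field is 12 bytes of octal text, the last of which must be the \0, leaving 11 octal digits:

    >>> 8 ** 11                     # 11 octal digits of size
    8589934592
    >>> 8 ** 11 == 8 * 1024 ** 3    # = 8 GiB
    True
    >>> 8 ** 12 == 64 * 1024 ** 3   # 12 digits would have read as 64 GiB
    True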


Thanks for the correction.

If space was so important, then why waste 1KiB per file? Why waste (100 - namelen) bytes per file? Why spend space on owner/group IDs?


There was a push to use the newer and better-designed CPIO format, with limited success... seems tar is just too ingrained.

edit: looked into it a bit and it was not much better in fact.


I don't understand... Here is a better way...

No. Don't even think about it. tar is. It always will be that way so leave it.


I'm not suggesting changing it; I'm trying to understand how it could possibly have been this way in the first place.



