The confusion arises from the fact that, when referring to RAM, we (mistakenly) use 'kilobyte', 'megabyte' or 'gigabyte' to mean 2^10, 2^20 or 2^30 bytes. The replacement terms 'kibibyte', 'mebibyte', etc. do not appear to have gained widespread adoption. One can nostalgically appeal to the good old days in which a megabyte was a megabyte, and screw the Bureau International des Poids et Mesures, but those days are long gone.
However, in common usage we now have the pretty awful scenario where, if a person says "a megabyte", the actual number of bytes they're talking about can change depending on where the data is stored! A megabyte of RAM is not a megabyte of hard disk space. The solution would be to standardise on binary prefixes, since we pretty much have to talk about RAM that way, and a consistent measurement across different media is eminently sensible.
The "hard drive maker conspiracy" story is driven by the fact that manufacturers have no real incentive to switch to binary prefixes, because that would make their drives look "smaller". Do you want the 300 gigabyte drive or the 279.4 gibibyte drive? Aside from the fact that hardly anyone knows what a gibibyte is, in the absence of any more information the larger number is probably better. Even worse, if you created a 300 gibibyte drive to compete with the 300 gigabyte drive, consumers would probably not realise that the 300 gibibyte drive is bigger. It's not exactly a conspiracy, but it is a suboptimal arrangement that results from the manufacturers' incentives.
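For anyone wondering where 279.4 comes from: it's the same byte count divided by 2^30 instead of 10^9. A quick Python sketch (the helper name is just made up for illustration):

```python
GB = 10**9   # SI gigabyte (decimal)
GiB = 2**30  # IEC gibibyte (binary)

def gb_to_gib(gb: float) -> float:
    """Convert a capacity in decimal gigabytes to binary gibibytes."""
    return gb * GB / GiB

print(f"300 GB = {gb_to_gib(300):.1f} GiB")  # the same number of bytes, just a smaller-looking figure
```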
> since we pretty much have to talk about RAM that way, and a consistent measurement across different media is eminently sensible.
Except that the HDD manufacturers are going to keep doing what they've been, and it'll just create more confusion among consumers. "What's the conversion factor between GB and GiB?"
> Even worse, if you created a 300 gibibyte drive to compete with the 300 gigabyte drive, consumers would probably not realise that the 300 gibibyte drive is bigger
One manufacturer could just start using phrases like "300 REAL gigabytes!" and market it aggressively. "7% more storage than the competition! Finally a drive that stores the amount you paid for!"
As for why the -bi- binary prefixes haven't caught on, I think one of the biggest obstacles is that they just sound hilariously stupid...
Their claimed storage is correct. "Real gigabytes" are what disk manufacturers, Linux, and others report. Microsoft calculates gibibytes (GiB, 2^30 bytes) but has always labeled them as gigabytes (GB, 10^9 bytes), which is what perpetuates this confusion.
They could also just standardize their storage sizes on the higher number and still advertise the higher number -- i.e., instead of selling a 300GB hard drive, they could sell a 322.12GB drive, which stores 300GiB.
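Going the other way, here is a rough Python sketch (again, the helper name is hypothetical) of the 300 GiB = 322.12 GB figure, plus the roughly 7% gap that the "300 REAL gigabytes" joke upthread relies on:

```python
GB = 10**9
GiB = 2**30

def gib_to_gb(gib: float) -> float:
    """Convert binary gibibytes to decimal gigabytes."""
    return gib * GiB / GB

size_gb = gib_to_gb(300)
extra = (GiB / GB - 1) * 100  # how much bigger a GiB is than a GB, in percent
print(f"300 GiB = {size_gb:.2f} GB ({extra:.1f}% more than 300 GB)")
```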
The hard drive makers are correct. We should fix the mistake with RAM: 4GB of RAM refers to gibibytes, not gigabytes. RAM comes in power-of-2 chunks due to its architecture. There is no reason the capacity of a group of spinning platters needs to be divided up into powers of 2. That said, there is no reason to rate an SSD in gigabytes rather than gibibytes.
Ironically, my spell checker shows a little red squiggly under gibibyte. Even the developers don't believe it's a real word.
> "[...] and screw the Bureau International des Poids et Mesures, but those days are long gone."
They're really not. SI still doesn't define any units relating to information quantity, so the BIPM isn't relevant. This is one issue where an appeal to authority really doesn't fly. The relevant authorities (IEC, ISO, IEEE, JEDEC, etc.) didn't try to address the ambiguity until the late 1990s, after it had spread into the general population.
> The confusion arises from the fact that, when referring to RAM, we (mistakenly) use 'kilobyte', 'megabyte' or 'gigabyte' to mean 2^10, 2^20 or 2^30 bytes.
It's not a mistake. In a living language, weight of usage trumps prescriptivism.
> However, in common usage we now have the pretty awful scenario where, if a person says "a megabyte", the actual number of bytes they're talking about can change depending on where the data is stored! A megabyte of RAM is not a megabyte of hard disk space.
That's not correct. A megabyte of RAM and a megabyte of HD are both 1,048,576 bytes, one megabyte. Again, usage trumps prescriptivism.
I agree that the hard-drive-manufacturer thing is not a "conspiracy;" it's simple false advertising that slips through the cracks of the law because of, again, a foolish belief in prescriptivism on the part of the enforcers.
The only backup awareness days are the days you get audited (by an auditor or by a disaster). Unfortunately, all other days are "backups cost a lot/are difficult to maintain/have never been tested" days.
Good question, especially when the linked post is wrong, or at least poorly written.
>Because in 1960, the Bureau International des Poids et Mesures decided that the SI prefix G- meant 10^9.
This is wrong. The switch from GB to GiB happened in 1998 and came from the International Electrotechnical Commission (IEC)[1]; before that, a GB was indeed 2^30. Then the old GB became the GiB, and the new GB was standardized to use base 10.
Both are correct. It's just that some people decided that prefixes (G-, M-, etc.) taken from a certain standard should mean something different in their area of expertise.
Sorry to split hairs, but until 1998 GiB didn't exist and GB were indeed 2³⁰, so it's wrong to say GB are now 10⁹ because of a decision the BIPM made in 1960.
GB are 10⁹ because the IEC, in accordance with its mission, made them so in 1998, incidentally basing its work on that 1960 decision.
Without the IEC, we would still have GB referring to 2^30. I can tell because I lived through this change as the annoying CS student who corrected something the professor said in front of the class, and was right about it.
When you say "GB were indeed 2³⁰", do you mean it was officially defined to be that, or that it was common practice for people to call it that. G- was officially defined in 1960 to be 10^9.
In the case of my submission, I legitimately came across spiped while researching a problem I was working on today, and I thought others might find it useful.
At time of writing there are three links to tarsnap.com on the front page: this one, scrypt, and spiped. I think that means that you win HN, sir. It's been a good run everyone, time to wind things down.
Hard drives used to be sized in powers of 2 bytes. I distinctly remember when all manufacturers suddenly started stating drive sizes in powers of 10 bytes, and the explanation that made the most sense wasn't a sudden desire to conform to SI prefix standards, but a wish to market their drives as having greater capacity.
In fact, I've got a 20MB SCSI drive here which I should hook up to verify...
The article is a strawman: No one argues that the SI prefix G- is base 10. The example: "A 200 GB hard drive holds..." is blatant question begging ("It's 10^9 because it's 10^9").
The real questions are: Why did hard drive manufacturers move from a (misnamed) base 2 to base 10? They were confused before but then saw the light (decades later)? Why did all OSes and utilities use base 2? Why do most still use base 2? Why can't we use base 2 now?
Making the consumer think he gets more for his money doesn't give you an advantage over your competitor (if he's doing the same thing), but it does put more money into the industry as a whole. Imagine a home ice cream machine fad. All manufacturers might rise more or less equally, but they all make more money now that people believe their lives are enhanced by putting their money into ice cream.
As disk sizes started to increase, the difference started to matter more and more. 1KiB is only 2.4% more than 1KB, but 1TiB is 10% more than 1TB.
I think it makes sense that this didn't become significant until disk sizes started reaching GB levels, and therefore that manufacturers didn't switch to base 10 until then.
Besides making more of a difference for bigger disks, there is a huge performance penalty on x86 for disks whose sectors are not a multiple of 1 KiB. And since earlier models were MB-sized, they couldn't even get those few percent back.
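The growing gap is easy to tabulate: each step up multiplies in another factor of 1024/1000. A small Python sketch (the function name is just for illustration):

```python
def gap_percent(power: int) -> float:
    """How much larger the binary unit is than the decimal one, in percent.

    power=1 compares KiB with kB, power=2 MiB with MB, and so on.
    """
    return (2 ** (10 * power) / 10 ** (3 * power) - 1) * 100

for power, prefix in enumerate("KMGTP", start=1):
    print(f"1 {prefix}iB is {gap_percent(power):.1f}% larger than 1 {prefix}B")
```

That prints 2.4%, 4.9%, 7.4%, 10.0% and 12.6%, so the discrepancy roughly adds another 2.4 points with every prefix.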
Except when they do. Amazon Web Services uses 2^30-byte GBs for bandwidth and EBS disk sizes. They measure EC2 ephemeral disk sizes in 10^9-byte GBs, though...
Re: AWS using 2^30 for bandwidth. That's a serious error - I have never seen bandwidth measured in anything other than SI prefixes. In fact, on the few occasions I've seen data transmission measurements called out as GiB, I've done a bit of research and discovered that the author was incorrect, and that the actual data transmission was really GB. Memory is the only place you should ever see GB mean 2^30 - and it would be nice (though unlikely) if everyone could just switch to GiB when referring to memory, and then GB=GB, and GiB=GiB.
Is there any proof that hard drive manufacturers actually switched? It's always just repeated as truth. I remember megabytes being all over the place in the days of floppy disks: 1000 * 1000, or 1000 * 1024, or 1024 * 1024. Hardware engineers have always been more likely to use powers of 10 (e.g. megabits), whereas software developers have always been more likely to use (binary) megabytes.
> Why did hard drive manufacturers move from a (misnamed) base 2 to base 10?
A similar question is - why do all gas stations sell gas for x.yy9? It's impossible to sell something for 9/10 of a penny but in the U.S. they all do it.
If you ask the people doing it, you'll get the answer that it serves the consumer better, it's just a coincidence that it happens to make their product look artificially better/cheaper/whatever than it is.
Indeed, the original PC/XT 306-4-17 drive contains 10,653,696 bytes, or slightly more than the 10MB (10,485,760) it was advertised as.
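The 10,653,696 figure is just cylinders × heads × sectors per track × bytes per sector. A minimal Python sketch, assuming 512-byte sectors (the function name is hypothetical):

```python
SECTOR_BYTES = 512  # assumed sector size

def chs_capacity(cylinders: int, heads: int, sectors_per_track: int,
                 sector_bytes: int = SECTOR_BYTES) -> int:
    """Raw capacity of a drive described by its cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * sector_bytes

xt = chs_capacity(306, 4, 17)
surplus = xt - 10 * 2**20  # bytes beyond a binary "10MB" (10,485,760)
print(xt, surplus)         # 10653696 167936
```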
Ditto with flash devices; due to their addressing architecture, they are inherently binarily-capacitised(?). I have here a 16MB USB drive from when they first came out, and it stores exactly 16,777,216 bytes, or 32,768 512-byte sectors. Back then, flash memory was all SLC, and it was reliable enough that only the few spare bytes on each page were needed for remapping/ECC, and the OS's filesystem bad-block management could be used.
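A quick sanity check of the sector count, assuming the drive is exactly 16 × 2^20 bytes and uses 512-byte sectors:

```python
SECTOR_BYTES = 512
capacity = 16 * 2**20  # a "16MB" flash drive sized in binary units
sectors, remainder = divmod(capacity, SECTOR_BYTES)
print(capacity, sectors, remainder)  # 16777216 32768 0
```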
I remember wondering why anyone called them 1.44MB. I could understand 1.38MB, and I could understand 1.457664MB, but where in hell did 1.44MB come from?
Now I realize that they mixed 2^10 and 10^3 in the same sentence. Thanks for finally answering a question that's been in the back of my brain for 20+ years!
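To spell out the mixed unit: assuming the standard 3.5" HD floppy geometry (80 tracks per side, 2 sides, 18 sectors per track, 512-byte sectors), the raw formatted capacity only comes out to "1.44" if you divide by 1000 × 1024:

```python
# Standard 3.5" HD floppy geometry: 80 tracks per side, 2 sides,
# 18 sectors per track, 512 bytes per sector
raw_bytes = 80 * 2 * 18 * 512

print(raw_bytes)                  # 1474560
print(raw_bytes / (1000 * 1024))  # 1.44    (the mixed "megabyte" on the label)
print(raw_bytes / 10**6)          # 1.47456 (decimal MB)
print(raw_bytes / 2**20)          # 1.40625 (binary MiB)
```

The other figures people quote for these disks usually come from counting only the space left for files after formatting, which is a bit less than the raw total.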
I call shenanigans. If you want to claim that 10-based kMG prefixes ought to be universal for all things without exception, that's one thing. But the author says:
> Unlike everything else in the world of computing, RAM is addressed in hardware. When you're designing a piece of silicon, you want to have N address lines and have every combination of zeroes and ones map to a memory location — to do otherwise would make the logic far more complicated. Nothing else is addressed this way.
So it's OK to use 2-based kMG for RAM, but not for hard drives? But hard drives get mapped to memory, and memory gets mapped to hard drives. I have pages of memory written out to disk, and I have inodes of files cached in memory. So my hard drive will be subdivided into pieces that are 2-based, and my partitions will normally have a whole number of such pieces, right? Addressing, of whatever sort, is often more convenient if the different levels of subdivision are 2-based (because the arithmetic can be bitwise or mostly bitwise, rather than multiplication and division everywhere).
...so the idea that hard drives might want to report their entire size in a 2-based unit isn't even remotely as far-fetched as the author claims. It's 2s all the way down.
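To make the "bitwise arithmetic" point concrete: with a power-of-2 block size, splitting an address into block number and offset is a shift and a mask, while any other block size needs a real division. A Python sketch, assuming a hypothetical 4 KiB block size:

```python
BLOCK_SHIFT = 12                # hypothetical 4 KiB blocks (2**12 bytes)
BLOCK_SIZE = 1 << BLOCK_SHIFT
OFFSET_MASK = BLOCK_SIZE - 1

def split_pow2(addr: int) -> tuple[int, int]:
    """Block number and offset using only a shift and a mask."""
    return addr >> BLOCK_SHIFT, addr & OFFSET_MASK

def split_generic(addr: int, block_size: int) -> tuple[int, int]:
    """The same split for an arbitrary block size needs a division."""
    return divmod(addr, block_size)

addr = 1_000_003
assert split_pow2(addr) == split_generic(addr, BLOCK_SIZE)
print(split_pow2(addr))           # (244, 579)
print(split_generic(addr, 4000))  # (250, 3)
```

In hardware or a tight kernel path the shift-and-mask version is essentially free, which is the whole argument for 2s all the way down.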
If you mean GiB, say GiB; if you mean GB, say GB. Tarsnap does the correct thing, following that more-than-ten-year-old standard (IEEE 1541-2002).
Blaming Tarsnap, or the people making hard drives and other media, is like people who build pages with some old FrontPage and then blame other browsers for not supporting them, because IE (used by the majority!) does.
Every time this question comes up, I like to point out that, contrary to common belief, decimal prefixes (as in 1kB = 1000 bytes) are MUCH more commonly used than binary prefixes in the computer industry: http://blog.zorinaq.com/?e=35
Unit prefixes should be unambiguous: 1 GB = 1000 MB; 1 GiB = 2^10 MiB. People who say 1 GB = 1024 MB are wrong. Let's not perpetuate this slip in judgment (while acknowledging that many programs still use this misbegotten system).
The units were pretty unambiguous for a long time. For a long time disk storage was measured in multiples of 2^10, at least in operating systems; then hard disk manufacturers decided to report disk sizes in multiples of 10^3. Obviously the prefixes having different meanings in different contexts is not perfect, but that's nothing new: for people in SI countries, "kilo" on its own means kilogram, and other meanings hardly come up. Calories are counted in kcal without ever mentioning the kilo prefix. A lot of the confusion in IT comes from adding extra binary "-bi-" units that feel contrived and aren't very well accepted.
Maybe among IT people they were, but IT isn't everything. If you had shown 1kX to a physicist, he would've automatically assumed you were talking about 10^3 times X, no matter what X was. I actually live (and grew up) in an SI country, and the system works beautifully. Sure, people say kilo to refer to 1kg, but this practice gets a huge amount of criticism from teachers over here, who keep insisting that kilo=1000.
I was in a shop looking for a camera SD card and they were all listed with capacities in "Gb". I thought it was unusual to see these sold by the gigabit. (Turned out they weren't.)
"Why is 1 GB equal to 10^9 bytes instead of 2^30?" Answer: it's not. Yes, I know that metric prefixes have official definitions; but inflammable means flammable. Language changes. A kilobyte has meant 2^10 bytes since the very first time anyone ever used the term, and it has continued to mean that in every case except for dishonest drive manufacturers and revisionist prescriptivists. Weight of usage ought to trump.
The idea of using different words to distinguish between real and fake gigabytes is a good one, but it makes the fundamental mistake of getting the "real" one backwards. It should be gigabytes and metric gigabytes, not gibibytes and gigabytes.
Does any operating system or commonly used program report file sizes in terms of 1 GB = 10^9 bytes? As far as I know they all use GiB and MiB, whether they call it that or not.
$ dd if=/dev/zero of=/dev/null bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.157131 s, 6.8 GB/s
'dd' interprets command-line arguments "M" and "K" as 2^20 and 2^10 respectively, but when it reports the total amount copied, it uses the decimal GB (10^9). It also uses decimal SI units in the average speed calculation.
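A simplified Python sketch of that asymmetry. The suffix table below is a small subset modeled on the multiplicative suffixes GNU coreutils documents for dd (K/M/G binary, kB/MB/GB decimal); it is an illustration, not dd's actual parser:

```python
import re

# Simplified subset of dd-style size suffixes (assumption, not the full list):
# K, M, G are binary (1024-based); kB, MB, GB are decimal (1000-based).
SUFFIXES = {
    "": 1,
    "K": 2**10, "M": 2**20, "G": 2**30,
    "kB": 10**3, "MB": 10**6, "GB": 10**9,
}

def parse_size(text: str) -> int:
    """Parse a number with an optional suffix from the table above."""
    m = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if m is None or m.group(2) not in SUFFIXES:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * SUFFIXES[m.group(2)]

total = parse_size("1M") * parse_size("1K")  # bs=1M count=1K
print(total, f"({total / 10**9:.1f} GB)")    # binary input, decimal report
```

This reproduces the "1073741824 bytes (1.1 GB)" line from the transcript: the arguments are read in binary units, and the summary is printed in decimal ones.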
My favorite file manager, Thunar, reports file sizes in decimal units. Not sure if it's distro-specific, though.
Many other programs, like 'df', have a switch (--si) that turns decimal units on or off.
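A switch like that is basically a choice of divisor. Here is a hypothetical human-readable formatter in the spirit of `df -h` vs `df --si` (not a copy of their exact output format):

```python
def human_size(n: int, si: bool = False) -> str:
    """Format a byte count using decimal (si=True) or binary multiples.

    Hypothetical helper; the rounding and labels are illustrative only.
    """
    base = 1000 if si else 1024
    units = ["B", "kB", "MB", "GB", "TB"] if si else ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.1f} {unit}"
        value /= base
    return f"{value:.1f} {units[-1]}"

print(human_size(2**30))            # 1.0 GiB
print(human_size(2**30, si=True))   # 1.1 GB
print(human_size(300 * 10**9))      # 279.4 GiB, the "missing space" on a 300 GB drive
```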
I remember seeing a French Mac many years ago (pre-OS X). I don't know what kind of mega it used (there was no giga back then), but file sizes were reported as "Mo" rather than "MB", for "mega-octets", "octet" being the French word for byte. It sounded much classier to me.
Of course, aficionados of RFCs will know that "octet" is (or at least was) the preferred term for "byte" in IETF documents as well. I don't think this is because the RFC editor was secretly French, although I'm sure he was thoroughly classy; presumably it's because early RFCs were written in an era when the byte had not quite settled down at eight bits, and the term needed to be unambiguous.
Snow Leopard apparently, which was released in 2009. Ubuntu's policies on the subject seem to be about the same age.
I suppose I should've said that up until five years ago, nearly everyone was using powers of two for file sizes. And Windows/Mac never used the correct prefixes.
I don't think that giving the users something that _is_ wrong is a better solution...
Why not switch to GiB? Power users would understand, and others wouldn't even notice!
I don't think patio11 would be the right choice for this particular market.
cperciva certainly seems smart and capable enough to grow his business in that direction if he so chooses, without handing it over to a 'suit', even if that suit is well thought of for his bombing of HN.
If it helps, remember that patio11 leveraged "Bingo Card Creator" into a position from which he dispensed "expert" advice, and then from there, into paid business consulting gigs. It's his job to plaster the internet with advice, and HN was (and perhaps still is) one of his main advertising outlets.
Truly successful people are generally busy being successful; people who are telling you they're successful and that they can sell you advice to help you be successful too ought to send your alarm bells ringing.
The whole idea of white labeling it would be that we'd be trusting cperciva with the technical aspects under the hood, which he is Freakin' Awesome at, and patio11 with wrapping it in the kind of sales pitch / UI that I can show to a non-technical person and say "Yes, let me spend the company's money on this".
Someone as smart as cperciva can figure out how to "wrap it up in a sales pitch" if he wants it wrapped up in a sales pitch, even if that means hiring someone to do it.
It'll be authentic, and he won't give a disproportionate amount of the value of his creation to someone who has a much less complex job.
The two options aren't "the original owner releases a new layer on top of his service" and "no such layer can exist". That's the whole point of white-labeling, and I doubt cperciva would mind people releasing layers that use his service as a backend, in the same way he releases a layer that adds value to Amazon's storage.
He wouldn't be giving away any value, somebody else would be starting a company that happened to pay him for service.
> He wouldn't be giving away any value, somebody else would be starting a company that happened to pay him for service.
If he wouldn't be giving away value, then how could there be enough margin for said company to survive? The point of white-labeling is to let other people build solutions outside of your core competency, using your system effectively as a commodity.
For someone like cperciva, that would be giving away an awful lot of value to people whose sole contribution would be marketing and extremely high-level software development.
He should hire to fill this gap, not give away his margins to marketing suits.
You mean, like he did in the bit where he said: "Finally, even for RAM calling 2^30 bytes "1 GB" isn't really proper; instead, the IEC binary multiplier prefix "Gi-" should be used."
First, that doesn't refer to the parallel naming schemes, it only criticizes use of "GB" as an acronym.
Second, it's not specific enough -- the fact that two parallel measurement schemes are in effect is the answer to the question he asks in his title. That deserves more than an offhand, incomplete reference.