
>Network protocols specify big endian for the order of transmission

Only in the parts specified by the protocol (headers etc). I encourage everyone sending data over the network in a novel way to just use little-endian.



That's not good advice. You can only make such a claim if the sender and receiver are guaranteed to be running on little-endian architectures. Better advice is to always consider endianness when designing protocols and have a strategy to handle it.
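To make that concrete, here's a minimal sketch of one such strategy (my own illustration; the helper names are made up): decode wire integers with shifts from a byte pointer, so the code is correct on any host endianness and the wire order is whatever the spec says.

    #include <stdint.h>

    /* Wire order is whatever the protocol specifies; host endianness
       never matters because memory is never reinterpreted. */
    static uint32_t read_u32le(const uint8_t *p) {
        return (uint32_t)p[0]
             | (uint32_t)p[1] << 8
             | (uint32_t)p[2] << 16
             | (uint32_t)p[3] << 24;
    }

    static uint32_t read_u32be(const uint8_t *p) {
        return (uint32_t)p[0] << 24
             | (uint32_t)p[1] << 16
             | (uint32_t)p[2] << 8
             | (uint32_t)p[3];
    }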


How is that "better advice"? Big endian architectures are pretty much dead (x86, ARM and RISC-V are all little endian; some ARM chips are bi-endian, but not Apple's) and there's no discernible compelling advantage that would allow a comeback. You absolutely want to specify the byte order in new protocols as little endian.


> and there's no discernible compelling advantage

Actually, for wire encodings, there is, although I've (I-think-)literally never seen any proponent of big endian bring it up (versus the bullshit "it's human-readable" nonsense[0][1]): big endian encodings of unsigned numbers have lexicographic order that matches their numeric order.
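A quick self-contained demonstration of that property (my own sketch, nothing standardized): memcmp on big-endian encodings agrees with numeric order, while on little-endian encodings it doesn't.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static void put_be32(uint8_t out[4], uint32_t v) {
        out[0] = (uint8_t)(v >> 24); out[1] = (uint8_t)(v >> 16);
        out[2] = (uint8_t)(v >> 8);  out[3] = (uint8_t)v;
    }

    static void put_le32(uint8_t out[4], uint32_t v) {
        out[0] = (uint8_t)v;         out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)(v >> 16); out[3] = (uint8_t)(v >> 24);
    }

    int main(void) {
        uint8_t a[4], b[4];
        put_be32(a, 255); put_be32(b, 256);
        printf("big-endian    memcmp(255, 256): %d\n", memcmp(a, b, 4)); /* negative: matches numeric order */
        put_le32(a, 255); put_le32(b, 256);
        printf("little-endian memcmp(255, 256): %d\n", memcmp(a, b, 4)); /* positive: order inverted */
        return 0;
    }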

The most obvious concrete example of why this is useful is a keys-sorted encoding of a hash table: if you encode keys in size-type-value format, you can check sortedness by lexicographic order of type-value strings (which means you can add new types without old software needing to know how to compare them), and you'll get integer keys in inspection-friendly numeric order rather than semi-random order. (Encoding negative numbers with a type id of T_UINT-1 lets you extend this to them as well.)
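A rough sketch of what that key encoding can look like (the type id and layout here are made up, not a real format): a one-byte type tag followed by the value in big-endian, so sortedness of the key set is just lexicographic comparison of the raw type+value strings.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    enum { T_UINT = 0x10 };   /* hypothetical type id */

    /* Encode an unsigned key as [type][8-byte big-endian value]. */
    static void encode_uint_key(uint8_t out[9], uint64_t v) {
        out[0] = T_UINT;
        for (int i = 0; i < 8; i++)
            out[1 + i] = (uint8_t)(v >> (8 * (7 - i)));
    }

    /* Lexicographic "a <= b" on raw key bytes; works even for key types
       this code has never heard of, since it never decodes the value. */
    static int keys_in_order(const uint8_t *a, size_t alen,
                             const uint8_t *b, size_t blen) {
        size_t n = alen < blen ? alen : blen;
        int c = memcmp(a, b, n);
        return c < 0 || (c == 0 && alen <= blen);
    }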

At a more abstract level, where (zero-padded) little-endian numbers have the same value at different granularities, this means that big-endian numbers have invariant lexicographic order at different granularities: two strings viewed as bits, bytes, or uint32s are consistently in the same order.

You can kind of use reverse-lexicographic order for some of this, but there are obvious problems with sending data in value-type order rather than type-value, so forward-lexicographic tends to be strongly enforced.

0: "You mean for arabic numerals, except not actual arabic numerals, because Arabic is written right-to-left, so the numbers are little-endian there, but ended up big endian because they stayed least-signifiant-digit-right rather than least-signifiant-digit-first when imported into Latin."

1: "Also, so (supposedly) is decimal and sign-magnitude, but we've (agonizingly slowly) learned that those aren't good ideas."


There are still big-endian-only ARM chips out there, and some of them are basically the only processors in their class (TMS570 in particular); while I wish they were bi-endian, big endian platforms are still alive and well.


There are still way, way more little-endian platforms, and that's not likely to change in the future. So it makes sense to use the representation that's most efficient for them when designing protocols.


The efficiency difference is negligible; the cost of an in-register byte swap is basically zero compared to the cost of getting the word from memory into the register to begin with. The bigger deal is making sure that people remember that we /are/ still in a bi-endian world, and writing protocols and code with that in mind.
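For a sense of scale, the swap itself is a single instruction on common compilers and targets (sketch using the GCC/Clang builtin):

    #include <stdint.h>

    /* Typically compiles to one load plus one BSWAP (x86) / REV (ARM);
       the load dominates. */
    static uint32_t load_u32_swapped(const uint32_t *p) {
        return __builtin_bswap32(*p);
    }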


The performance cost in isolation is not high, but the benefit of being able to reinterpret-cast the buffer you got from the network and access it directly without any intermediate preprocessing is high. Most programming languages do not allow accessing fields in an endian-agnostic way; you need accessors and a lot of custom code, so the code-complexity benefits of ignoring the issue are great.
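As a sketch of the difference (the header layout and names are hypothetical, and the overlay approach assumes a fixed, padding-free layout): when the wire order matches the host, one memcpy into a struct is enough; otherwise every field goes through a swap.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohl/ntohs for the big-endian-wire case */

    struct msg_header {      /* hypothetical wire header, 8 bytes, no padding */
        uint32_t length;
        uint16_t type;
        uint16_t flags;
    };

    /* Little-endian wire on a little-endian host: copy and use fields as-is. */
    static struct msg_header read_le(const uint8_t *buf) {
        struct msg_header h;
        memcpy(&h, buf, sizeof h);   /* memcpy sidesteps alignment/aliasing issues */
        return h;
    }

    /* Big-endian wire on a little-endian host: every field needs a swap/accessor. */
    static struct msg_header read_be(const uint8_t *buf) {
        struct msg_header h;
        memcpy(&h, buf, sizeof h);
        h.length = ntohl(h.length);
        h.type   = ntohs(h.type);
        h.flags  = ntohs(h.flags);
        return h;
    }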


There are practically no big endian architectures anymore. Little endian is a sensible default. The weird architectures should bear the burden of complexity.




That’s just a waste of cycles on encoding and decoding, as most/all senders and receivers are LE nowadays.


ARM has big endian, so you're wrong, and it takes literally nanoseconds to byte swap.


Can you provide any numbers showing that more than 0.1% of networked ARM devices are big-endian? Pretty much all modern consumer-facing ARM devices (including anything made by Apple, Android phones, and Nintendo devices) are little endian, either exclusively or in their standard configuration.


ARM supports both big & little endian, but basically everyone only uses little endian on ARM.


Not true. Most are set up to run little endian but support both.


I think they mean for any new protocol, defining it as little-endian.


You could call it the krowten byte order.



