> If the community is offering you a port to an architecture, whether it is 4 days old or 40 years old, that means the community actively wants to use your software on it – otherwise, nobody would put in the effort. Ports like this are hard, and authors like me already know we are fighting an uphill battle just trying to make upstream projects care.
I've received plenty of open-source contributions over the years for some feature or other that I don't care about. I used to accept these pull requests. But all too often, the person who wrote the patch disappears. Then for years I receive bug reports about a feature I didn't write and don't care about. What do I do with those reports? Ignore them? Fix the bugs myself? Bleh.
I don't publish open-source projects so I can be a volunteer maintainer, in perpetuity, for someone else's feature ideas.
If it's a small change, or something I would have done myself eventually, then fine. But there is a very real burden that comes from maintaining support for weird features and rare computer architectures. As this article points out, you need to actively test on real hardware to make sure code doesn't rot. Unfortunately, I don't have a pile of exotic computers around that I can use to test my software. And you need to test software constantly, or there's a good chance you'll break something and not notice.
That said, is there an easy way to run software in "big endian" mode on any modern computer? I'd happily run my test suite in big endian mode if I could do so easily.
> What do I do with those reports? Ignore them? Fix the bugs myself? Bleh.
"I don't have access to a test environment, but if you want to write a fix, let me know and I may be able to point you in the right direction" is a perfectly reasonable response.
> That said, is there an easy way to run software in "big endian" mode on any modern computer?
QEMU userspace emulation is usually the easiest for most "normal" programs IMO. Once you set it up you just run the other architecture binaries like normal binaries on your test system (with no need to emulate a full system). Very much the same concept as Prism/Rosetta on Windows/macOS for running x86 apps on ARM systems except it can be any target QEMU supports.
The glaring omission from that long post is the term "opportunity cost".
Ensuring a code base indefinitely supports arbitrary architectures carries a substantial code architecture cost. Furthermore, it is difficult to guarantee testing going forward or that the toolchains available for those architectures will continue to evolve with your code base. I'm old enough to have lived this reality back when it was common. It sucked hard. I've also written a lot of code that was portable to some very weird silicon so I know what that entails. It goes far beyond endian-ness, that is just one aspect of silicon portability.
The expectation that people should volunteer their time for low ROI unpleasantness that has a high risk of being unmaintainable in the near future is unreasonable. There are many other facets of the code base where that time may be better invested. That's not "anti-portable", it is recognition of the potential cost to a large base of existing users when you take it on. The Pareto Principle applies here.
Today, I explicitly only support two architectures: 64-bit x86 and ARM (little-endian). It is wonderful that we have arrived at the point where this is a completely viable proposition. In most cases the cost of supporting marginal users on rare architectures in the year 2026 is not worth it. The computing world is far, far less fragmented than it used to be.
> Ensuring a code base indefinitely supports arbitrary architectures carries a substantial code architecture cost.
I'd say just the opposite; it nudges you towards well-factored approaches and ultimately carries a code architecture benefit, just like having automated tests or using structured programming.
For a relatively small set of dimensions this is true. But the more abstractions the code needs to accommodate, the trickier and more prone to leaky abstractions it becomes. Removing one axis of complexity can be incredibly helpful.
For the Ardour codebase (922k LoC at present, ignoring the vendored GTK/Gtkmm trees), we've found that every new architecture and new OS/platform that we've ported to has thrown up challenges and notably improved the code. That has included FPU/SIMD specializations too.
> Today, I explicitly only support two architectures: 64-bit x86 and ARM (little-endian). It is wonderful that we have arrived at the point where this is a completely viable proposition. In most cases the cost of supporting marginal users on rare architectures in the year 2026 is not worth it.
This - and efforts to reintroduce BE should be resisted in the same way as people who want to drive on the other side of the road for pure whimsy.
I note that we've mostly converged on one set of floating point semantics as well, although across a range of bit widths.
Why not wasm?
Wasm is in an awkward place, because Memory64 is widely but not universally supported. Which means that if you want to support Wasm, you probably have to support 32-bit environments in general. Depending on the project, that can be trivial, but it may also require you to rewrite a lot of low-level code in the project and its dependencies.
The whole thing rests on these assertions:
> It is usually easy to write code that is endian-safe. Any code that is not endian-safe is poorly written and harder to maintain at best, and possibly obscuring security bugs at worst. Any project maintainer should be jumping for joy when they receive a patch adding a big-endian port of their project, especially if it includes reports that tests pass and the software works. That is the sign of a codebase that has a level of sanity that should not be noteworthy, yet is.
And every single sentence is false.
The tower collapses once you remove any of the bases, let alone all of them.
Main issue I have with it is access to testing. It is extra work to be able to test big endian. I don’t want to think about big endian while writing code but it would be ok to do it if I could easily run tests in big endian.
If you want to keep software working on systems with a 9-bit byte or other weirdness, that's entirely on you. No one else needs or wants the extra complexity. Little endian is logical and won, big endian is backwards and lost for good reason. (Look at how arbitrary precision arithmetic is implemented on a BE system; chances are it's effectively LE anyway.)
You don't need the "Little endian is logical" part.
Most people just don't care and can't be bothered to spend time making sure code is "endian portable".
Couldn't care less if it is easier to "read in crash dumps" TBH.
I don't even write server code to be portable to anything other than x86_64 or lately just use avx512 without any fallback since it is not needed in practice.
I'm not doing anything people care about probably but I imagine it is a similar feeling for people that do.
I would rather have small software that compiles fast than add 50 #ifdefs and make it burn my eyes, spending time wondering "but would this work on big endian?"
BE is not logical in any way, it is just a tradition, like the use of decimal numbers.
The use of automatic computers has forced a transition from the use of arbitrary conventions that did not have any logical motivation to the most efficient methods of data representation, like binary numbers in little-endian format.
Little-endian is more efficient even when you compute with pen and paper; if it feels awkward, that is just because you were taught differently as a child.
There are special circumstances when a big-endian representation is the right choice, e.g. when you interpret a bit string as a binary polynomial, in order to implement an error-detection code with a CRC. However, for general-purpose numbers, little-endian is the optimum choice.
No, BE is logical because it puts bits and bytes in the same order. That humans use BE is also nice but secondary to that. I don't have strong feelings about whether fifty-one thousand nine hundred sixty-six is written as 0xcafe or 0xefac, but I feel quite comfortable suggesting that 0xfeca is absurd. (FWIW, this is a weak argument for what computers should do; if LE is more efficient for machines then let them use it)
Edit: switched example to hex
Edit2: actually this is still slightly out of whack, but I don't feel like switching to binary so take it as a loose representation rather than literal
My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132,551. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!", you want to hear "You have been awarded One hundred and thirty two thousand and ... dollars!" The largest sums change decisions dramatically so it makes sense they come first.
On computers however, we basically always use exact arithmetic and exact, fixed logic where learning the higher order doesn't help (we're not doing approximations and decisions based on incomplete information), in fact for mathematical reasons in the exact cases it's usually better to compute and utilize the lowest bits first (e.g. in the case of sums and multiplication algos I am familiar with). [note1]
Overall I'm slightly surprised some automatic/universal translation methods for the most common languages haven't been made, although I guess there may be some significant difficulties or impossibilities (for example, if you send a bunch of bits/bytes outside, there's no general way to predict the endianness it should be in). I suspect LLMs will make this task much easier (without a more traditional universal translation algorithm).
[note1] Also, the time required to receive all bits from say a 64b number as opposed to the first k bits tends to be a negligible or even 0 difference, in both human terms (receiving data over a network) and machine terms (receiving data over a bus; optimizing an algorithm that uses numbers in complicated ways; etc.), again different from human communication and thought.
> My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132,551. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!", you want to hear "You have been awarded One hundred and thirty two thousand and ... dollars!" The largest sums change decisions dramatically so it makes sense they come first.
And yet in Arabic, the numbers are written in order from the least to the most significant digit, even if they are not really pronounced that way, starting from the numbers in the hundreds and up: "1234" is read as essentially "one thousand two hundred four-and-thirty", the same way the German does it. And yes, the order looks like it's the same as in e.g. English, but Arabic is written right to left. So, no, it's absolutely fine to write numbers in little endian even in the language that pronounces it the big-endian or even the mixed-endian way.
There are plenty of ways for language to be better now that we know far more about arithmetic than when number words were created.
"One Five Five Two Three One" is 6 words, 6 syllables long, whereas "One Hundred and Thirty Two Thousand" is 6 words, 9 syllables long and conveys less information. Even shortening it to "One Hundred Thirty Two Thousand" is still 5 words, 8 syllables long and conveys less information.
You can also easily convey high-order digits first by using an unambiguous "and/add" construction: "Thousand Two Three One Add One Five Five". You have now conveyed the three high-order digits in 4 words, 5 syllables. You also convey the full number in 8 words, 9 syllables, in contrast to "One Hundred Thirty Two Thousand One Hundred Fifty Five" which is 9 words, 14 syllables.
You could go even further and express things in pseudo-scientific notation, which would be even more general and close to as efficient. "Zero E Three (10^3) Two Three One" is 6 words, 7 syllables, but no longer requires unique separator words like "Thousand", "Million", "Billion", etc. This shows even greater efficiency if you are conveying "One Hundred Thirty Thousand", which would be something more like "Zero E Four (10^4) Three One", since the scientific-notation digit position description is highly uniform.
This distinction might seem somewhat arbitrary since this just seems like it is changing the order for the sake of things. However, the advantage of little-endian description is that it is non-contextual. When you say the number "One" it literally always means the one's place "One". If you wish to speak of a different positional "One" you would prefix it with the position e.g. "Zero E Three (10^3) One". In contrast, in the normal way of speaking numbers "One" could mean any positional one. Are you saying "One Hundred", "One Thousand", "One Hundred Million"? You need to wait for subsequent words to know what "One" is being said. Transcription must fundamentally buffer a significant fraction of the word stream to disambiguate.
It also results in the hilariously duplicative "One Hundred Thirty Two Thousand One Hundred Fifty Five", which has positional signifiers on basically every word: "One Hun-dred Thir-ty Two Thou-sand One Hun-dred Fif-ty Five". Fully 8 of the 14 syllables are used for positional disambiguation to reduce necessary lookahead. "And/Add" constructions get you that for a fraction of the word and syllable count. They allow arbitrary chunking since you can separate digit streams on any boundary. It also reinforces the fact that numbers are just composites of their components, which may help with numeracy.
Little endian is actually just better in every respect, except for compatibility and familiarity, if we use our modern robust knowledge of arithmetic to formulate the grammar rules.
> No, BE is logical because it puts bits and bytes in the same order.
This sounds confused. The "order" of bits is only an artifact of our human notation, not some inherent order. If you look at how an integer is implemented in hardware (say in a register or in combinational logic), you're not going to find the bits being reversed every byte.
> Okay, if you get everyone to write bits the other way I'll endorse LE as intuitive/logical.
You're still confused, unfortunately. (Note: In everything that follows, I'm just pretending "Arabic numerals" came from Arabic. The actual history is more complicated but irrelevant to my point, so let's go with that.)
First, you're confusing intuitive with logical. They are not the same thing. E.g., survivorship bias (look up the whole WWII plane thing) is unintuitive, but extremely logical.
Second, even arguing intuitiveness here doesn't really make sense, because the direction of writing numerals is itself intrinsically arbitrary. If our writing system was such that a million dollars was written as "000,000,1$", suddenly you wouldn't find big-endian any more intuitive.
In fact, if you were an Arabic speaker and your computer was in Arabic (right to left) rather than English (left to right), then your hex editor would display right-to-left on the screen, and you would already find little-endian intuitive!
In other words, the only reason you find this unintuitive is that you speak English, which is (by unfortunate historical luck) written in "big-endian" form! Note that this has nothing to do with being right-to-left but left-to-right, but rather with whether the place values increase or decrease in the same direction as the prose. In Arabic, place values increase in the direction of the prose, which makes little-endian entirely intuitive to an Arabic speaker!
To put it another way, arguing LE is unintuitive is like claiming something being right-handed is somehow more intuitive than left-handed. If that's true, it's because you're used to being right-handed, not because right-handedness itself is somehow genuinely more intuitive. (And neither of these has anything to do with one being more or less logical than the other.)
Until then, I want my bits and bytes notated uniformly.
AFAIK it was only IBM whose CPUs were consistently BE for both bit and byte order (i.e. bit 0 is also the most significant bit.) Every other CPU which is BE for bytes is still LE for bits (bit 0 least significant.)
> this is a weak argument for what computers should do; if LE is more efficient for machines then let them use it
Computers really don't care. Literally. Same number of gates either way. But for everything besides dumping it makes sense that the least significant byte and the least significant bit are numbered starting from zero. It makes intuitive mathematical sense.
Not only dumping, but yes I agree it only matters when humans are in the loop. My most annoying encounters with endianness was when writing and debugging assembly, and I assure you dumping memory was not the only pain point.
> Computers really don't care. Literally. Same number of gates either way.
Eh. That depends; the computer architectures used to be way weirder than what we have today. IBM 1401 used variable-length BCDs (written in big-endian); its version of BCDIC literally used numbers from 1 to 9 as digits "1" to "9" (number 0 was blank/space, and number 10 would print as "0"). So its ADD etc. instructions took pointers to the last digits of numbers added, and worked backwards; in fact, pretty much all of indexing on that machine moved backwards: MOV also worked from higher addresses down to lower ones, and so on.
> If someone has managed to make your software work on 32-bit, you should accept that port as it will help ensure your software remains efficient even on 64-bit systems. After all, if every program is required to fit in 4 GB, that means the 32 GB RAM workstation you are spoiled with can run a full 8 programs on it!
This is just so fundamentally wrong that it makes the whole rest of TFA highly suspect (and yes, most of the rest of it is also pretty wrong).
> and you refuse a community port to another architecture, you are doing a huge disservice to your community
Someone who has a computer that my software can't run on isn't in my community. If they really want to use the software, they have the option of: 1) get a different computer, or 2) maintain their own custom-special port of my software forever.
In other words, they have to JOIN the community if they want the BENEFITS of the community. It's not my job to extend my community to encompass every possible use case and hardware platform.
I think the tricky thing here is that I simply don't have the time, patience, or resources to maintain this stuff. Let's say I have an LE-only project. Someone ports it to work on BE. Now it needs CI for BE. I write a patch in the future and the BE tests fail. Now I need to fix them. Potential contributors need to get the tests to pass. Who's using BE, though? Is the original porter even still using it?
The author betrays their own point with the anecdote about 586 support: they had tests, the tests passed, but the emulator was buggy, masking the issue. Frankly, if you're the Linux kernel and nobody has the hardware to run the tests on an actual device, it says a lot. But it also shows the limits of relying on QEMU when the emulation itself is buggy. How is someone who runs a small project supposed to debug a BE issue when a user report might force you to debug the emulator first?
For me, I'll always welcome folks engaging with my work. But I'll be hesitant to take on maintenance of anything that takes my attention away from delivering value to the overwhelming majority of my users, especially if the value of the effort disappears over time (e.g., because nobody is making those CPUs anymore).
The more portable and general your code is, the less use you can make of the hardware.
The article touches on the 32/64 bit split. A lot of the code I write nowadays doesn't run on 32 bit systems, not because it uses a lot of RAM, but because having an actually usable 64-bit address range enables you to write programs that you couldn't on 32 bit.
If you want to write code that works on big endian systems, systems where pointers aren't integers or bytes aren't 8 bits, all the power to you. I'm happy to pretend big endian is not a thing and focus my limited manpower on the hardware that my program will run on.
> Big endian systems store numbers the way us humans do: the largest number is written first.
Obviously the author was trying to just give a quick example to aid visualization, but here's some nitpicking: I can probably come up with at least IV writing systems used by humans that don't use "big endian" for numbers. Or either, really.
Examples: Tally marks, Ancient Egyptian numerals, Hebrew and Attic numerals, and obviously Roman numerals.
Also, lots of languages order their number words somewhat... randomly in written form (French, Danish, Old English, ...).
The convention that a smaller numeral written to the left of a bigger one is subtracted instead of added is a later addition to Roman numerals.
The earlier system would write VIIII instead of IX.
With the original Roman numerals, the order of writing was completely irrelevant, because all parts were added and addition is commutative, so VIIII is the same as IIIIV or IIIVI.
Even in the later variant of Roman numerals, you can change the order of many symbols without changing the value.
> In closing, let me reiterate this point so it is crystal clear. If you are a maintainer of a libre software project and you refuse a community port to another architecture, you are doing a huge disservice to your community and to your software’s overall quality. As the Linux kernel has demonstrated, you can accept new ports, and deprecate old ports, as community demands and interest waxes and wanes.
Every feature has a cost and port to a different architecture has a huge cost in ongoing maintenance and testing.
This is open source. The maintainer isn’t refusing a port. The maintainer is refusing to accept being a maintainer for that port.
A person is always free to fork the open source project and maintain the port themselves as a fork.
In my experience, as someone who has gone through this as maintainer of two decent sized projects, that simply doesn't work.
The author of the 'port' probably doesn't know your whole codebase like you, so they are going to need help to get their code polished and merged.
For endian issues, the bugs are often subtle and can occur in strange places (it's hard to grep for 'someone somewhere made an endian assumption'), so you often get dragged into debugging.
Now let's imagine we get everything working, CI set up, I make a PR which breaks the big-endian build. My options are:
1) Start fixing endian bugs myself -- I have other stuff to do!
2) Wait for my 'endian maintainer' to find and fix the bug -- might take weeks, they have other stuff to do!
3) Just disable the endian tests in CI, eventually someone will come complain, maybe a debian packager.
At the end of the day I have finite hours on this earth, and there are just so few big endian users -- I often think there are more packagers who want to make software work on their machine in a kind of 'pokemon-style gotta catch em all', than actual users.
That should have been true, but unfortunately the most popular programming languages do not have distinct data types for bit strings, non-negative numbers, integer residues a.k.a. modular numbers, binary polynomials and binary polynomial residues.
So in C and the like one uses "unsigned" regardless if bit strings or non-negative numbers are needed.
Because no explicit conversions between bit strings and numeric types are used, it is common to end up with expressions that implicitly assume a certain endianness of the numbers.
This is the most frequent source of bugs that manifest when a program is run on a machine with an opposite endianness than assumed in the program.
This. In the specific case of endianness, if you have bugs with it you're probably already doing something wrong. But in general, supporting weird architectures is not something that should be expected to be foisted on any arbitrary projects, especially if the person who does the initial port just disappears again.
I don't have an opinion either way on this author's belief about whether the port should be accepted upstream.
I did, however, learn a lot googling some of the terms they dropped and finding out things like the PowerPC architecture getting an update as recently as 2025.
Several of their references I knew from my first tech lead mentioning their own early career. I am surprised at how much still has active development.
> In closing, let me reiterate this point so it is crystal clear. If you are a maintainer of a libre software project and you refuse a community port to another architecture, you are doing a huge disservice to your community and to your software’s overall quality.
> For those who don’t know, endianness is simply how the computer stores numbers. Big endian systems store numbers the way us humans do: the largest number is written first.
Really, what's first? You're so keen on having the big end first, but when it comes to looking at memory, you look... starting at the little end of memory first??? What's up with that?
> I happen to prefer big endian systems in my own development life because they are easier for me to work with, especially reading crash dumps.
It always comes back to this. But that's not a good rationale for either the inconsistency of mixed bit/byte numbering, where bit 0 is the least significant bit but byte 0 is the most significant byte, or true big-endian numbering, where the least significant bit of a number might be bit 7, 15, 31, or 63, depending on the size of the integer.
> (Porting to different endianness can help catch obscure bugs.)
Yeah, I'm sure using 9 bit bytes would catch bugs, too, but nobody does that either.
I'm glad I did my undergrad at UC Davis in the mid '00s, where portability and well-defined behavior were valued over proprietary, implementation-specific, non-portable assumptions. The lazy, rationalizing, throw-the-baby-out-with-the-bathwater folks are disappointments to engineering excellence.
It's very hard in most languages to portably handle endianness; almost by definition, if your code has an issue where endianness affects behavior, it's not portable.
I tend to take another viewpoint (while I understand yours): if it's not tested, it doesn't work. And nowadays it's really hard to test big-endian code. I don't have a big-endian machine, and I find running different-endian software in QEMU really annoying and difficult.
A post about big-endian testing with QEMU was posted on HN just a few days ago:
https://news.ycombinator.com/item?id=47626462
https://www.hanshq.net/big-endian-qemu.html
The glaring omission from that long post is the term "opportunity cost".
Ensuring a code base indefinitely supports arbitrary architectures carries a substantial code architecture cost. Furthermore, it is difficult to guarantee testing going forward or that the toolchains available for those architectures will continue to evolve with your code base. I'm old enough to have lived this reality back when it was common. It sucked hard. I've also written a lot of code that was portable to some very weird silicon so I know what that entails. It goes far beyond endian-ness, that is just one aspect of silicon portability.
The expectation that people should volunteer their time for low ROI unpleasantness that has a high risk of being unmaintainable in the near future is unreasonable. There are many other facets of the code base where that time may be better invested. That's not "anti-portable", it is recognition of the potential cost to a large base of existing users when you take it on. The Pareto Principle applies here.
Today, I explicitly only support two architectures: 64-bit x86 and ARM (little-endian). It is wonderful that we have arrived at the point where this is a completely viable proposition. In most cases the cost of supporting marginal users on rare architectures in the year 2026 is not worth it. The computing world is far, far less fragmented than it used to be.
> Ensuring a code base indefinitely supports arbitrary architectures carries a substantial code architecture cost.
I'd say just the opposite; it nudges you towards well-factored approaches and ultimately carries a code architecture benefit, just like having automated tests or using structured programming.
For a relatively small set of dimensions this is true. But the more abstractions the code needs to accommodate, the trickier and more prone to leaky abstractions it becomes. Removing one axis of complexity can be incredibly helpful.
For the Ardour codebase (922k LoC at present, ignoring the vendored GTK/Gtkmm trees), we've found that every new architecture and new OS/platform that we've ported to has thrown up challenges and notably improved the code. That has included FPU/SIMD specializations too.
> Today, I explicitly only support two architectures: 64-bit x86 and ARM (little-endian). It is wonderful that we have arrived at the point where this is a completely viable proposition. In most cases the cost of supporting marginal users on rare architectures in the year 2026 is not worth it.
This - and efforts to reintroduce BE should be resisted in the same way as people who want to drive on the other side of the road for pure whimsy.
I note that we've mostly converged on one set of floating point semantics as well, although across a range of bit widths.
Why not wasm?
Wasm is in an awkward place, because Memory64 is widely but not universally supported. Which means that if you want to support Wasm, you probably have to support 32-bit environments in general. Depending on the project, that can be trivial, but it may also require you to rewrite a lot of low-level code in the project and its dependencies.
The whole thing rests on these assertions:
> It is usually easy to write code that is endian-safe. Any code that is not endian-safe is poorly written and harder to maintain at best, and possibly obscuring security bugs at worst. Any project maintainer should be jumping for joy when they receive a patch adding a big-endian port of their project, especially if it includes reports that tests pass and the software works. That is the sign of a codebase that has a level of sanity that should not be noteworthy, yet is.
And every single sentence is false.
The tower collapses once you remove any of the bases, let alone all of them.
Main issue I have with it is access to testing. It is extra work to be able to test big endian. I don’t want to think about big endian while writing code but it would be ok to do it if I could easily run tests in big endian.
If you want to keep software working on systems with a 9-bit byte or other weirdness, that's entirely on you. No one else needs or wants the extra complexity. Little endian is logical and won, big endian is backwards and lost for good reason. (Look at how arbitrary precision arithmetic is implemented on a BE system; chances are it's effectively LE anyway.)
You don't need the "Little endian is logical" part.
Most people just don't care and can't be bothered to spend time making sure code is "endian portable".
Couldn't care less if it is easier to "read in crash dumps" TBH.
I don't even write server code to be portable to anything other than x86_64 or lately just use avx512 without any fallback since it is not needed in practice.
I'm not doing anything people care about probably but I imagine it is a similar feeling for people that do.
I would rather have small software that compiles fast than to add 50 #ifdef into it and make it burn my eyes, and spend time thinking "but would this work in big endian"
> Little endian is logical and won, big endian is backwards and lost for good reason.
No, BE is logical, but LE is efficient (for machines).
BE is not logical in any way, it is just a tradition, like the use of decimal numbers.
The use of automatic computers has forced a transition from the use of arbitrary conventions that did not have any logical motivation to the most efficient methods of data representation, like binary numbers in little-endian format.
Little-endian is more efficient even when you compute by pen on paper, if it feels awkward that is just because you were taught differently as a child.
There are special circumstances when a big-endian representation is the right choice, e.g. when you interpret a bit string as a binary polynomial, in order to implement an error-detection code with a CRC. However, for general-purpose numbers, little-endian is the optimum choice.
No, BE is intuitive for humans who write digits with the highest power on the left.
LE is logical which is also why it is more efficient and more intuitive for humans once they get past “how we write numbers with a pencil”.
No, BE is logical because it puts bits and bytes in the same order. That humans use BE is also nice but secondary to that. I don't have strong feelings about whether fifty-one thousand nine hundred sixty-six is written as 0xcafe or 0xefac, but I feel quite comfortable suggesting that 0xfeca is absurd. (FWIW, this is a weak argument for what computers should do; if LE is more efficient for machines then let them use it)
Edit: switched example to hex
Edit2: actually this is still slightly out of whack, but I don't feel like switching to binary so take it as a loose representation rather than literal
My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132,551. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!", you want to hear "You have been awarded one hundred and thirty two thousand and... dollars!" The largest sums change decisions dramatically, so it makes sense they come first.
On computers however, we basically always use exact arithmetic and exact, fixed logic where learning the higher order doesn't help (we're not doing approximations and decisions based on incomplete information), in fact for mathematical reasons in the exact cases it's usually better to compute and utilize the lowest bits first (e.g. in the case of sums and multiplication algos I am familiar with). [note1]
Overall I'm slightly surprised some automatic/universal translation methods for the most common languages haven't been made, although I guess there may be some significant difficulties or impossibilities (for example, if you send a bunch of bits/bytes outside, there's no general way to predict the endianness they should be in). I suspect LLMs will make this task much easier (without a more traditional universal translation algorithm).
[note1] Also, the time required to receive all bits from say a 64b number as opposed to the first k bits tends to be a negligible or even 0 difference, in both human terms (receiving data over a network) and machine terms (receiving data over a bus; optimizing an algorithm that uses numbers in complicated ways; etc.), again different from human communication and thought.
> My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132551 dollars. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!", you want to hear "You have been awarded One hundred and thirty two thousand and ... dollars!" The largest sums change decisions dramatically so it makes sense they come first.
And yet in Arabic, the numbers are written in order from the least to the most significant digit, even if they are not really pronounced that way, starting from the numbers in the hundreds and up: "1234" is read as essentially "one thousand two hundred four-and-thirty", the same way the German does it. And yes, the order looks like it's the same as in e.g. English, but Arabic is written right to left. So, no, it's absolutely fine to write numbers in little endian even in the language that pronounces it the big-endian or even the mixed-endian way.
There are plenty of ways for language to be better now that we know far more about arithmetic than when number words were created.
"One Five Five Two Three One" is 6 words, 6 syllables long, whereas "One Hundred and Thirty Two Thousand" is 6 words, 9 syllables long and conveys less information. Even shortening it to "One Hundred Thirty Two Thousand" is still 5 words, 8 syllables long and conveys less information.
You can also easily convey high order digits first by using an unambiguous "and/add" construction: "Thousand Two Three One Add One Five Five". You have now conveyed the three high order digits in 5 words, 5 syllables. You also convey the full number in 9 words, 9 syllables, in contrast to "One Hundred Thirty Two Thousand One Hundred Fifty Five" which is 9 words, 14 syllables.
You could go even further and express things in pseudo-scientific notation which would be even more general and close to as efficient. "Zero E Three (10^3) Two Three One" which is 6 words, 6 syllables, but no longer requires unique separator words like "Thousand", "Million", "Billion", etc. This shows even greater efficiency if you are conveying "One Hundred Thirty Thousand" which would be something more like "Zero E Four (10^4) Three One" since the scientific notation digit position description is highly uniform.
This distinction might seem somewhat arbitrary since this just seems like it is changing the order for the sake of things. However, the advantage of little-endian description is that it is non-contextual. When you say the number "One" it literally always means the one's place "One". If you wish to speak of a different positional "One" you would prefix it with the position e.g. "Zero E Three (10^3) One". In contrast, in the normal way of speaking numbers "One" could mean any positional one. Are you saying "One Hundred", "One Thousand", "One Hundred Million"? You need to wait for subsequent words to know what "One" is being said. Transcription must fundamentally buffer a significant fraction of the word stream to disambiguate.
It also results in the hilariously duplicative "One Hundred Thirty Two Thousand One Hundred Fifty Five" which has positional signifiers for basically every word: "One Hun-dred Thir-ty Two Thousand One Hun-dred Fif-ty Five". Fully 8 of the 14 syllables are used for positional disambiguation to reduce necessary lookahead. "And/Add" constructions get you that for a fraction of the word and syllable count. They allow arbitrary chunking since you can separate digit streams on any boundary. It also reinforces the fact that numbers are just composites of their components, which may help with numeracy.
Little endian is actually just better in every respect, except for compatibility and familiarity, if we use our modern robust knowledge of arithmetic to formulate the grammar rules.
> No, BE is logical because it puts bits and bytes in the same order.
This sounds confused. The "order" of bits is only an artifact of our human notation, not some inherent order. If you look at how an integer is implemented in hardware (say in a register or in combinational logic), you're not going to find the bits being reversed every byte.
Okay, if you get everyone to write bits the other way I'll endorse LE as intuitive/logical. Until then, I want my bits and bytes notated uniformly.
> Okay, if you get everyone to write bits the other way I'll endorse LE as intuitive/logical.
You're still confused, unfortunately. (Note: In everything that follows, I'm just pretending "Arabic numerals" came from Arabic. The actual history is more complicated but irrelevant to my point, so let's go with that.)
First, you're confusing intuitive with logical. They are not the same thing. e.g, survivorship bias (look up the whole WWII plane thing) is unintuitive, but extremely logical.
Second, even arguing intuitiveness here doesn't really make sense, because the direction of writing numerals is itself intrinsically arbitrary. If our writing system was such that a million dollars was written as "000,000,1$", suddenly you wouldn't find big-endian any more intuitive.
In fact, if you were an Arabic speaker and your computer was in Arabic (right to left) rather than English (left to right), then your hex editor would display right-to-left on the screen, and you would already find little-endian intuitive!
In other words, the only reason you find this unintuitive is that you speak English, which is (by unfortunate historical luck) written in "big-endian" form! Note that this has nothing to do with right-to-left versus left-to-right per se, but rather with whether the place values increase or decrease in the direction of the prose. In Arabic, place values increase in the direction of the prose, which makes little-endian entirely intuitive to an Arabic speaker!
To put it another way, arguing LE is unintuitive is like claiming something being right-handed is somehow more intuitive than left-handed. If that's true, it's because you're used to being right-handed, not because right-handedness itself is somehow genuinely more intuitive. (And neither of these has anything to do with one being more or less logical than the other.)
> Until then, I want my bits and bytes notated uniformly.
AFAIK it was only IBM whose CPUs were consistently BE for both bit and byte order (i.e. bit 0 is also the most significant bit.) Every other CPU which is BE for bytes is still LE for bits (bit 0 least significant.)
Your example is only for dumping memory.
> this is a weak argument for what computers should do; if LE is more efficient for machines then let them use it
Computers really don't care. Literally. Same number of gates either way. But for everything besides dumping it makes sense that the least significant byte and the least significant bit are numbered starting from zero. It makes intuitive mathematical sense.
> Same number of gates either way
Definitely not, which is why many 8-bit CPUs are LE. Carries propagate upwards, and incrementers are cheaper than a length-dependent subtraction.
Not only dumping, but yes I agree it only matters when humans are in the loop. My most annoying encounters with endianness was when writing and debugging assembly, and I assure you dumping memory was not the only pain point.
> Computers really don't care. Literally. Same number of gates either way.
Eh. That depends; the computer architectures used to be way weirder than what we have today. IBM 1401 used variable-length BCDs (written in big-endian); its version of BCDIC literally used numbers from 1 to 9 as digits "1" to "9" (number 0 was blank/space, and number 10 would print as "0"). So its ADD etc. instructions took pointers to the last digits of numbers added, and worked backwards; in fact, pretty much all of indexing on that machine moved backwards: MOV also worked from higher addresses down to lower ones, and so on.
> BE is intuitive for humans who write digits with the highest power on the left.
But only because when they dump memory, they start with the lowest address, lol.
Why don't these people reverse numberlines and cartesian coordinate systems while they're at it?
A lot of graphics APIs do actually reverse the y-coordinate for historical reasons.
LE is not "logical", it won because the IBM PC compatible won, simple as that.
> If someone has managed to make your software work on 32-bit, you should accept that port as it will help ensure your software remains efficient even on 64-bit systems. After all, if every program is required to fit in 4 GB, that means the 32 GB RAM workstation you are spoiled with can run a full 8 programs on it!
This is just so fundamentally wrong that it makes the whole rest of TFA highly suspect (and yes, most of the rest of it is also pretty wrong).
> and you refuse a community port to another architecture, you are doing a huge disservice to your community
Someone who has a computer that my software can't run on isn't in my community. If they really want to use the software, they have the option of: 1) get a different computer, or 2) maintain their own custom-special port of my software forever.
In other words, they have to JOIN the community if they want the BENEFITS of the community. It's not my job to extend my community to encompass every possible use case and hardware platform.
> I happen to prefer big endian systems in my own development life because they are easier for me to work with, especially reading crash dumps.
If hex editors were mirrored both left to right and right to left, would it be easier to read little endian dumps?
xxd has the `-e` option for exactly this use case:
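A quick demonstration (the exact column spacing may differ by xxd version):

```shell
# Four bytes on disk: fe ca be ba, i.e. the 32-bit little-endian value 0xbabecafe
printf '\xfe\xca\xbe\xba' > sample.bin

xxd sample.bin      # plain dump: bytes in memory order
xxd -e sample.bin   # little-endian word dump: shows "babecafe"
```

With `-e`, xxd groups the bytes into words and prints each word with its most significant byte first, so a little-endian dump reads like the big-endian dumps the parent prefers.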
I think the tricky thing here is that I simply don't have the time, patience, or resources to maintain this stuff. Let's say I have a LE-only project. Someone ports it to work on BE. Now it needs CI for BE. I write a patch in the future and the BE tests fail. Now I need to fix them. Potential contributors need to get the tests to pass. Who's using BE, though? Is the original porter even still using it?
The author betrays their own point with the anecdote about 586 support: they had tests, the tests passed, but the emulator was buggy, masking the issue. Frankly, if you're the Linux kernel and nobody has the hardware to run the tests on an actual device, that says a lot. It also shows that even QEMU struggles here when the emulation isn't behaving as it should. How is someone who runs a small project supposed to debug a BE issue when a user report might require debugging the emulator itself?
For me, I'll always welcome folks engaging with my work. But I'll be hesitant to take on maintenance of anything that takes my attention away from delivering value to the overwhelming majority of my users, especially if the value of the effort disappears over time (e.g., because nobody is making those CPUs anymore).
The more portable and general your code is, the less use you can make of the hardware.
The article touches on the 32/64 bit split. A lot of the code I write nowadays doesn't run on 32 bit systems, not because it uses a lot of RAM, but because having an actually usable 64-bit address range enables you to write programs that you couldn't on 32 bit.
If you want to write code that works on big endian systems, systems where pointers aren't integers or bytes aren't 8 bits, all the power to you. I'm happy to pretend big endian is not a thing and focus my limited manpower on the hardware that my program will run on.
Little endian. Starting at zero and extending is logical. Big endian is dumb and confusing.
> Big endian systems store numbers the way us humans do: the largest number is written first.
Obviously the author was trying to just give a quick example to aid visualization, but here's some nitpicking: I can probably come up with at least IV writing systems used by humans that don't use "big endian" for numbers. Or little endian, for that matter.
Examples: Tally marks, Ancient Egyptian numerals, Hebrew and Attic numerals, and obviously Roman numerals.
Also, lots of languages in written form order words somewhat... randomly (French, Danish, old English, ...).
Roman numerals are big endian though. The current year is written as MMXXVI, not IVXXMM.
The convention that smaller number written to the left of bigger numbers should be subtracted instead of added, is a later addition to the Roman numerals.
The earlier system would write VIIII instead of IX.
With the original Roman numerals, the order of writing was completely irrelevant, because all parts were added and addition is commutative, so VIIII is the same as IIIIV or IIIVI.
Even in the later variant of Roman numerals, you can change the order of many symbols without changing the value.
> In closing, let me reiterate this point so it is crystal clear. If you are a maintainer of a libre software project and you refuse a community port to another architecture, you are doing a huge disservice to your community and to your software’s overall quality. As the Linux kernel has demonstrated, you can accept new ports, and deprecate old ports, as community demands and interest waxes and wanes.
Every feature has a cost, and a port to a different architecture has a huge cost in ongoing maintenance and testing.
This is open source. The maintainer isn’t refusing a port. The maintainer is refusing to accept being a maintainer for that port.
A person is always free to fork the open source project and maintain the port themselves as a fork.
Hmm, if the author of the port cares, why won't the author of the port become a maintainer of that port? This should be a two-way street.
In my experience, as someone who has gone through this as maintainer of two decent sized projects, that simply doesn't work.
The author of the 'port' probably doesn't know your whole codebase like you, so they are going to need help to get their code polished and merged.
For endian issues, the bugs are often subtle and can occur in strange places (it's hard to grep for 'someone somewhere made an endian assumption'), so you often get dragged into debugging.
Now let's imagine we get everything working, CI set up, I make a PR which breaks the big-endian build. My options are:
1) Start fixing endian bugs myself -- I have other stuff to do!
2) Wait for my 'endian maintainer' to find and fix the bug -- might take weeks, they have other stuff to do!
3) Just disable the endian tests in CI, eventually someone will come complain, maybe a debian packager.
At the end of the day I have finite hours on this earth, and there are just so few big endian users -- I often think there are more packagers who want to make software work on their machine in a kind of 'pokemon-style gotta catch em all', than actual users.
Reminder that the computer's endianness shouldn't matter. You should only care about the endianness of the streams you're reading from and writing to.
https://commandcenter.blogspot.com/2012/04/byte-order-fallac...
That should have been true, but unfortunately the most popular programming languages do not have distinct data types for bit strings, non-negative numbers, integer residues a.k.a. modular numbers, binary polynomials and binary polynomial residues.
So in C and the like one uses "unsigned" regardless if bit strings or non-negative numbers are needed.
Because no explicit conversions between bit strings and numeric types are used, it is frequent to have expressions where a certain endianness of the numbers is assumed implicitly.
This is the most frequent source of bugs that manifest when a program is run on a machine with an opposite endianness than assumed in the program.
This. In the specific case of endianness, if you have bugs with it you're probably already doing something wrong. But in general, supporting weird architectures is not something that should be expected to be foisted on any arbitrary projects, especially if the person who does the initial port just disappears again.
I don't have an opinion either way on this author's belief about the port being accepted upstream or not.
I did, however, learn a lot googling some of the terms they dropped, and found out things like the PowerPC architecture getting an update as recently as 2025.
Several of their references I knew from my first tech leads mentioning their own early career. I am surprised at how much still has active development.
> In closing, let me reiterate this point so it is crystal clear. If you are a maintainer of a libre software project and you refuse a community port to another architecture, you are doing a huge disservice to your community and to your software’s overall quality.
Linus Torvalds disagrees. Vehemently.
https://www.phoronix.com/news/Torvalds-No-RISC-V-BE
> For those who don’t know, endianness is simply how the computer stores numbers. Big endian systems store numbers the way us humans do: the largest number is written first.
Really, what's first? You're so keen on having the big end first, but when it comes to looking at memory, you look... starting at the little end of memory first??? What's up with that?
> I happen to prefer big endian systems in my own development life because they are easier for me to work with, especially reading crash dumps.
It always comes back to this. But that's not a good rationale for either the inconsistency of mixed-endianness where the least significant bit is zero but the most significant byte is zero, or true big endianness, where the least significant bit of a number might be a bit numbered 7 or numbered 15, or even 31 or 63, depending on what size integer it is.
> (Porting to different endianness can help catch obscure bugs.)
Yeah, I'm sure using 9 bit bytes would catch bugs, too, but nobody does that either.
BE was a huge mistake. Arabic numerals originated in a right-to-left language too.
> depending on what size integer it is
That's the worst part about BE: values that have a size-dependent term in them, in addition to a subtraction. 2^n vs. 2^(l-n) and 256^N vs 256^(L-N).
According to Linus, BE has been "effectively dead" for at least a decade: https://news.ycombinator.com/item?id=9451284
Arabic numerals originated in India, where the scripts are written left to right.
I'm glad I did my undergrad at UC Davis in the mid '00s, where portability and well-defined behavior were valued over proprietary, implementation-specific, non-portable assumptions. The lazy, rationalizing, throw-the-baby-out-with-the-bathwater folks are a disappointment to engineering excellence.
It's very hard in most languages to portably handle endianness -- almost by definition, if your code has an issue where endianness affects behavior, it's not portable.
I tend to take another view point (while I understand yours) -- if it's not tested, it doesn't work. And nowadays it's really hard to test big-endian code. I don't have one, I find running different-endian software in QEMU really annoying and difficult.
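For reference, the usual qemu-user recipe on Debian/Ubuntu looks roughly like this (the package names and the choice of s390x as the big-endian target are my assumptions; any big-endian target QEMU supports works the same way):

```shell
# Install a big-endian cross toolchain and user-mode QEMU
sudo apt install gcc-s390x-linux-gnu qemu-user

# Cross-compile a test binary (statically, to avoid sysroot setup)
s390x-linux-gnu-gcc -static -O2 -o test_be test.c

# Run it directly; qemu-s390x translates instructions on the fly
qemu-s390x ./test_be
```

With the `qemu-user-binfmt` package installed, the kernel's binfmt_misc support runs such binaries transparently, so `./test_be` alone works and test suites can be pointed at the foreign binaries without changes.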