gcc uses the term "architecture" to mean the instruction set of a specific CPU, while "target" covers the combination of CPU and architecture, along with other variables such as ABI, libc, endianness and more (possibly including "bare metal"). A typical compiler has a limited set of target combinations (probably one ABI and one CPU family, but possibly both 32- and 64-bit). A cross-compiler usually means either a compiler with a target other than the system it runs on, or one with multiple targets or ABIs (see also this).
Are binaries portable across different CPU architectures?
In general, no. A binary in conventional terms is native object code for a particular CPU or CPU family. But there are several cases where binaries might be moderately to highly portable:
- one architecture is a superset of another (commonly x86 binaries target i386 or i686 rather than the latest and greatest x86, e.g. -march=core2)
- one architecture provides native emulation or translation of another (you might have heard of Crusoe), or provides compatible co-processors (e.g. PS2)
- the OS and runtime support multiarch (e.g. ability to run 32-bit x86 binaries on x86_64), or make the VM/JIT seamless (Android using Dalvik or ART)
- there is support for "fat" binaries which essentially contain duplicate code for each supported architecture
If you somehow manage to solve this problem, the other portable binary problem of myriad library versions (glibc I'm looking at you) will then present itself. (Most embedded systems save you from that particular problem at least.)
If you haven't already, now is a good time to run gcc -dumpspecs and gcc --target-help to see what you're up against.
Fat binaries have various drawbacks, but still have potential uses (EFI).
Two further considerations are missing from the other answers, however: ELF and the ELF interpreter, and the Linux kernel's support for arbitrary binary formats. I won't go into detail about binaries or bytecode for non-real processors here; though it is possible to treat these as "native" and execute Java or compiled Python bytecode binaries, such binaries are independent of the hardware architecture (but depend instead on the relevant VM version, which ultimately runs a native binary).
Any contemporary Linux system will use ELF binaries (technical details in this PDF). In the case of dynamic ELF binaries, the kernel is in charge of loading the image into memory, but it is the job of the "interpreter" set in the ELF headers to do the heavy lifting. Normally this involves making sure all the dependent dynamic libraries are available (with the help of the "Dynamic" section, which lists the libraries, and some other structures which list the required symbols); but this is almost a general-purpose indirection layer.
$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses \
shared libs), stripped
$ readelf -p .interp /bin/ls
String dump of section '.interp':
[ 0] /lib/ld-linux.so.2
(/lib/ld-linux.so.2 is also an ELF binary, it does not have an interpreter, and is native binary code.)
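You can watch the interpreter's dependency resolution from the outside; on glibc systems ldd is essentially a wrapper that asks the interpreter to trace rather than run:

```shell
# List the shared libraries the interpreter would load for /bin/ls:
ldd /bin/ls
# Roughly equivalent, talking to the interpreter directly:
LD_TRACE_LOADED_OBJECTS=1 /bin/ls
```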
The problem with ELF is that the header in the binary (readelf -h /bin/ls) marks it for a specific architecture, class (32- or 64-bit), endianness and ABI.
(Apple's "universal" fat binaries use an alternative binary format, Mach-O, which solves this problem; it originated on NeXTSTEP.)
This means that an ELF executable must match the system it is to be run on.
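You can see these fields for yourself; the first bytes of the e_ident array encode the magic, class and endianness (a sketch, the decoded values are system-dependent):

```shell
# Raw e_ident prefix: 7f 'E' 'L' 'F', then class (1=32-bit, 2=64-bit)
# and data encoding (1=LSB, 2=MSB):
od -An -tx1 -N6 /bin/sh
# The decoded header, including the Machine field that must match:
readelf -h /bin/sh | grep -E 'Class|Data|ABI|Machine'
```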
One escape hatch is the interpreter: this can be any executable (including one that extracts or maps architecture-specific subsections of the original binary and invokes them), but you are still constrained by the type(s) of ELF your system will allow to run. (FreeBSD has an interesting way of handling Linux ELF files; its brandelf modifies the ELF ABI field.)
There is (using binfmt_misc) support for Mach-O on Linux; there's an example there that shows you how to create and run a fat (32- & 64-bit) binary. Resource forks/ADS, as originally done on the Mac, could be a workaround, but no native Linux filesystem supports this.
More or less the same thing applies to kernel modules, .ko files are also ELF (though they have no interpreter set). In this case there's an extra layer which uses the kernel version (uname -r) in the search path, something that could theoretically be done instead in ELF with versioning, but at some complexity and little gain I suspect.
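A sketch of that extra layer; the module search path is keyed on the running kernel's version string:

```shell
# Where modprobe looks for this kernel's modules:
echo /lib/modules/$(uname -r)
# Each .ko under there is a relocatable ELF object for one architecture;
# on a typical system (may print nothing in a minimal container):
find /lib/modules/$(uname -r) -name '*.ko*' 2>/dev/null | head -n 3
```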
As noted elsewhere, Linux does not natively support fat binaries, but there
is an active fat-binary project: FatELF. It's been around for years, it was never integrated into the standard kernel partly due to (now expired) patent concerns.
At this time it requires both kernel and toolchain support.
It does not use the binfmt_misc approach; this side-steps the ELF header issues and also allows for fat kernel modules.
- If I have an application compiled to run on an 'x86 target, Linux OS version x.y.z', can I just run the same compiled binary on another system, 'ARM target, Linux OS version x.y.z'?
Not with ELF, it won't let you do this.
- If the above is not true, is the only way to get the application source code and rebuild/recompile it using the relevant toolchain (for example, arm-linux-gnueabi)?
The simple answer is yes. (Complicated answers include emulation, intermediate representations, translators and JIT; except for the case of "downgrading" an i686 binary to only use i386 opcodes they're probably not interesting here, and the ABI fixups are potentially as hard as translating native code.)
- Similarly, if I have a loadable kernel module (device driver) that works on an 'x86 target, Linux OS version x.y.z', can I just load/use the same compiled .ko on another system, 'ARM target, Linux OS version x.y.z'?
No, ELF won't let you do this.
- If the above is not true, is the only way to get the driver source code and rebuild/recompile it using the relevant toolchain (for example, arm-linux-gnueabi)?
The simple answer is yes. I believe FatELF lets you build a .ko that is multi-architecture, but at some point a binary version for every supported architecture has to be created. Things which require kernel modules often come with the source and are built as required; VirtualBox, for example, does this.
This is already a long, rambling answer, so there's only one more detour. The kernel already has a virtual machine built in, albeit a dedicated one: the BPF VM, which is used to match packets. The human-readable filter ("host foo and not port 22") is compiled to bytecode, and the kernel packet filter executes it. The new eBPF is not just for packets; in theory the VM code is portable across any contemporary Linux, and LLVM supports it, but for security reasons it's probably not going to be suitable for anything other than administrative rules.
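You can inspect such bytecode without touching a live interface by compiling a filter against an empty capture file (the bytes below are just the standard 24-byte little-endian pcap global header, link type 1 for Ethernet):

```shell
# Write an empty pcap file so tcpdump has a link type to compile against:
printf '\xd4\xc3\xb2\xa1\x02\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x01\x00\x00\x00' > /tmp/empty.pcap
# Dump the compiled classic-BPF program for a filter expression:
command -v tcpdump >/dev/null &&
    tcpdump -r /tmp/empty.pcap -d 'host 192.0.2.1 and not port 22' || true
```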
Now, depending on how generous you are with the definition of a binary executable, you can (ab)use binfmt_misc to implement fat binary support with a shell script, and ZIP files as a container format:
#!/bin/bash
name=$1
prog=${1/*\//} # basename
prog=${prog%.woz} # remove trailing extension
root=/mnt/tmpfs
root=$(TMPDIR= mktemp -d -p ${root} woz.XXXXXX)
shift # drop the .woz path, keep remaining args
arch=$(uname -m) # i686
uname_s=$(uname -s) # Linux
glibc=$(getconf GNU_LIBC_VERSION) # glibc 2.17
glibc=${glibc// /-} # s/ /-/g
# test that the .woz archive unzips cleanly, then extract the matching arch subtree
unzip -tqq "${name}" && {
unzip -q -o -j -d ${root} "${name}" "${arch}/${uname_s}/${glibc}/*"
test -x ${root}/$prog && (
export LD_LIBRARY_PATH="${root}:${LD_LIBRARY_PATH}"
#readlink -f "${root}/${prog}" # for the curious
exec -a "${name}" "${root}/${prog}" "$@"
)
rc=$?
#rm -rf -- "${root}/${prog}" # for the brave
exit $rc
}
Call this "wozbin", and set it up with something like:
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
printf ":%s:%s:%s:%s:%s:%s:%s" \
"woz" "E" "" "woz" "" "/path/to/wozbin" "" > /proc/sys/fs/binfmt_misc/register
This registers .woz files with the kernel; the wozbin script is invoked instead, with its first argument set to the path of the invoked .woz file.
To get a portable (fat) .woz file, simply create a test.woz ZIP file with a directory hierarchy like so:
i686/
\- Linux/
\- glibc-2.12/
armv6l/
\- Linux/
\- glibc-2.17/
Within each arch/OS/libc directory (an arbitrary choice) place the architecture-specific test binary and components such as .so files.
When you invoke it, the required subdirectory is extracted into a tmpfs in-memory filesystem (on /mnt/tmpfs here) and the binary is invoked.
Update: see also the Cosmopolitan project, which has a similar idea and provides a solution for multi-platform executables using a polyglot format.