Frank DENIS random thoughts.

The best WebAssembly runtime may be no runtime at all

2023-12-11T00:00:00+01:00

When we think “a fast AOT WebAssembly compiler and runtime”, we typically think about V8, Wasmer, WasmEdge or Wasmtime.

All of these have in common that they are large, complicated pieces of software, that come with a lot of overhead, and only work on a limited set of platforms.

But how about transpiling WebAssembly code to C source code, and leveraging the state-of-the-art optimization passes of C compilers?

This is the approach taken by the wasm2c tool from the WABT package, as well as the single-file WebAssembly transpiler used to bootstrap the Zig compiler.

The output of these tools is really a line-by-line conversion of the WebAssembly code to dumb, unoptimized C code.

There are instant benefits to this. First, the output is kinda human-readable, which is useful for debugging. A WebAssembly function shows up as a regular C function that can be called directly from C or any language with a C FFI.
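To make this concrete, here is a hand-written sketch of what a tiny WebAssembly function could look like once transpiled. It is not actual w2c2 or wasm2c output: the demoInstance type and the demo_add name are hypothetical, chosen only to show the general shape (an instance pointer, followed by the WebAssembly parameters as fixed-width integers).

```c
#include <stdint.h>

/* Hypothetical instance type: a real transpiler would store the
   module's globals, tables and linear memory here. */
typedef struct {
    uint32_t unused;
} demoInstance;

/* Hand-written equivalent of:
   (func $add (param i32 i32) (result i32)
     local.get 0
     local.get 1
     i32.add) */
uint32_t demo_add(demoInstance *instance, uint32_t l0, uint32_t l1)
{
    (void) instance; /* this function doesn't touch the module state */
    /* i32.add wraps around on overflow, like unsigned addition in C */
    return l0 + l1;
}
```

Any C caller, or any language with a C FFI, can then call demo_add() like an ordinary function.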

Take existing C code, compile it to WebAssembly, transpile it back to C, and you get the same code, but sandboxed. The transformation acts as a sanitizer that improves safety by restricting the range of virtual memory accessible to each instance.

Of course, that works with any WebAssembly module, no matter what language it was originally written in.

With this approach, assembling different WebAssembly modules also becomes very easy.

Startup time is negligible. There’s no overhead. No runtime either. Just WebAssembly functions directly transpiled to C functions, that are trivial to embed in any project.

The responsibility to compile that source code to native code is left to a regular C compiler. If the generated C source code is portable enough, this is also a great way to compile and run WebAssembly on embedded targets and new operating systems that require custom compilers.

This approach can also greatly improve security and reliability. Because they don’t perform any optimization, WebAssembly to C transpilers are extremely small and simple. And the resulting code can even be compiled with formally-verified C compilers such as CompCert for high assurance code generation.

But how about features, usability and performance?

w2c2

w2c2 is probably the most advanced of these transpilers.

Among other things, it supports many WebAssembly extensions, including WASI-core and threads.

This is a fantastic piece of software, but it unfortunately requires a bit of work to set up and use.

Installing w2c2

Clone the w2c2 repository:

git clone 
cd w2c2

Compile it:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Define where the files will be installed:

export W2C2_DIR=/tmp/w2c2

Install them:

install -d "$W2C2_DIR"/{bin,lib,include/{w2c2,wasi}}
install -s w2c2/w2c2 "$W2C2_DIR"/bin/
install wasi/libw2c2wasi.a "$W2C2_DIR"/lib/
install -m 0644 ../w2c2/w2c2_base.h "$W2C2_DIR"/include/w2c2/
install -m 0644 ../wasi/wasi.h "$W2C2_DIR"/include/wasi/

Alright, let’s get back to the root directory, and add w2c2 to the PATH environment variable:

cd ../..
export PATH="$W2C2_DIR"/bin:$PATH
rehash # only needed on some shells such as zsh

On macOS, the w2c2 command weighs only about 150 KB. Yes, that’s all we need to compile WebAssembly modules. For reference, the wasmer executable alone weighs 42 MB.

Creating a WebAssembly file

Next, let’s clone the Zig project boilerplate in order to get an example WebAssembly module:

mkdir test && cd test
zig init
zig build -Dtarget=wasm32-wasi -Doptimize=ReleaseSmall \
  -Dcpu=baseline+bulk_memory+sign_ext+nontrapping_fptoint

The resulting WebAssembly module can be found in zig-out/bin/test.wasm.

By default, WebAssembly modules created by Zig aim at maximizing portability, so they target the baseline virtual CPU, which doesn’t enable any WebAssembly extension.

However, bulk memory, sign extension and non-trapping floating point are not a problem for w2c2, so we enable them.

The above example doesn’t use threads, but since Zig supports WASI threads, we could add +atomics to the list of features in order to run multithreaded WebAssembly code.

Compiling WebAssembly to C

Now that we have a test.wasm WebAssembly module, the time has come to convert it to C.

mkdir transpiled && cd transpiled

w2c2 -p ../zig-out/bin/test.wasm test.c

The above command creates the test.c and test.h files, containing a C version of our WebAssembly module.

Since the module was named test, automatically generated functions are given the test prefix. This allows multiple modules to be used together in the same application without name collisions.

Let’s try compiling this to native code, using a C compiler.

zig cc test.c -I. -I"$W2C2_DIR"/include -L"$W2C2_DIR"/lib -lw2c2wasi

error: undefined reference to symbol _main
error: undefined reference to symbol _wasiMemory
    note: referenced in /private/tmp/w2c2/lib/libw2c2wasi.a(wasi.c.o)
error: undefined reference to symbol _trap
    note: referenced in /Users/j/.cache/zig/o/5868ac77d407c9e2eae52a37fd79e706/test.o

The code contains our translated functions, but we still need to instantiate the module and call functions in order to get an application we can actually run.

Let’s create a main.c file to do so:

#include <stdio.h>
#include <stdlib.h>
#include <w2c2_base.h>
#include <wasi.h>
#include "test.h"

void trap(Trap trap)
{
    fprintf(stderr, "TRAP: %s\n", trapDescription(trap));
    abort();
}

wasmMemory *wasiMemory(void *instance)
{
    return test_memory((testInstance *) instance);
}

extern char **environ;

int main(int argc, char *argv[])
{
    testInstance i;

    testInstantiate(&i, NULL);
    if (!wasiInit(argc, argv, environ)) {
        fprintf(stderr, "failed to initialize WASI\n");
        return 1;
    }
    test__start(&i);
    testFreeInstance(&i);

    return 0;
}

In WebAssembly commands, the entry point is named _start(). Functions transpiled to C are named <module name>_<function name>, so in order to call the _start() function, we simply call test__start() with the instance as an argument.

Now that we have a main function, let’s compile all this, and link the w2c2 WASI-core implementation along the way, since our test example uses it.

zig cc -o test -O3 -s main.c test.c \
  -I. -I"$W2C2_DIR"/include/w2c2 -I"$W2C2_DIR"/include/wasi \
  -L"$W2C2_DIR"/lib -lw2c2wasi

Done!

We can now run our application:

./test

All your codebase are belong to us.

Yep, that’s it! Our WebAssembly module got converted into a small, self-contained, native executable. Executable size and memory usage are ridiculously small compared to traditional runtimes.

How about speed?

Certainly, code blindly transpiled to C cannot be as efficient as a dedicated WebAssembly compiler, right?

Let’s compile the famous libsodium test/benchmark suite to WebAssembly:

zig build -Denable_benchmarks -Dtarget=wasm32-wasi -Doptimize=ReleaseFast \
  -Dcpu=baseline+bulk_memory+sign_ext+nontrapping_fptoint

The resulting tests are placed into the zig-out/bin directory.

Now, let’s try a pretty heavy one, such as the Ed25519 signature test: zig-out/bin/sign.wasm.

The tests print the time they take to complete, excluding initialization and finalization.

First, with wasmtime, enabling all the supported extensions:

❯ time wasmtime run --wasm-features=all test.wasm
625928560000

wasmtime run --wasm-features=all /tmp/test.wasm  124.76s user 0.02s system 99% cpu 2:05.21 total

Then, after transpiling the module to C using w2c2 and compiling it with zig cc:

❯ time ./test
361604070000

./test  73.38s user 0.01s system 99% cpu 1:13.65 total

You got that right. A naive transpilation to C almost always beats dedicated WebAssembly compilers, sometimes by a large margin. That may change over time, though.

Downsides

Of course, there are downsides. An obvious one is that this is incompatible with just-in-time compilation or single-pass compilation. Compilation is not fast, since an initial transpilation pass is required before the C compiler even starts.

The wasm-to-C approach would thus not be a good fit for running arbitrary code in a Web browser.

But no matter what the runtime is, with the wasm32 target, memory accesses are limited to 32-bit offsets. Taking advantage of this and virtual memory, runtimes usually add guard pages around each instance’s memory in order to catch accesses outside the reserved areas.

But the code currently generated by w2c2 doesn’t automatically set up these guard pages. This is still left to the application.

As an alternative to guard pages, for wasm64 or on platforms without virtual memory, some traditional WebAssembly compilers can decorate memory accesses with bound-checking code. w2c2 currently can’t, but it totally could.
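For reference, a bounds-checked load can be sketched as follows. This is a generic illustration, not code emitted by w2c2 or any specific runtime; the linearMemory type and the checkedLoadU32 name are made up for this example. A real runtime would trap where this sketch returns -1.

```c
#include <stdint.h>

/* Hypothetical view of an instance's linear memory */
typedef struct {
    const uint8_t *data;
    uint64_t       size; /* in bytes */
} linearMemory;

/* Load a little-endian 32-bit value at index + offset.
   Returns 0 on success, -1 if any of the 4 bytes is out of bounds. */
int checkedLoadU32(const linearMemory *m, uint64_t index, uint32_t offset,
                   uint32_t *out)
{
    const uint8_t *p;

    /* Written so that no intermediate sum can overflow */
    if (index > m->size || offset > m->size - index ||
        m->size - index - offset < 4) {
        return -1;
    }
    p = m->data + index + offset;
    /* WebAssembly memory is little-endian regardless of the host */
    *out = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    return 0;
}
```

The price is a comparison and a branch on every memory access, which is exactly what guard pages avoid.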

In spite of having excellent support for WebAssembly extensions, WASI-core and WASI-threads, w2c2 intentionally doesn’t have bells and whistles, some of them being critical for some applications. For example, traditional runtimes often support gas metering and preemption, which are mandatory for many applications.

Is it for you?

If the justification to use WebAssembly is the ability to improve code safety, then w2c2 (and the generic wasm-to-C approach) is for you. The result will be small, efficient and easy to use in any language with a C FFI.

On platforms not supported by big runtimes, w2c2 may also work for you, and the resulting native code will be faster than interpreters such as wasm3.

For all other cases, traditional runtimes still offer way more features and flexibility.

LLVM performance improvements for WebAssembly

2023-09-26T00:00:00+02:00

WebAssembly has been an LLVM target for a long time.

Every release of LLVM brings regressions, but also new optimization passes that can affect WebAssembly.

But WebAssembly cannot be directly executed by CPUs. In order to be executed, it has to go through yet another compiler, or just-in-time compiler. So, even without new optimization passes, a slight change in the LLVM output can impact the WebAssembly compiler, and have significant impact on performance.

Zig 0.11 includes LLVM 16, while Zig 0.12 merged LLVM 17 the day it was released. At the time of testing, the C library for WASI (wasi-libc) was exactly the same in both compiler versions.

So, how does LLVM 17 compare to LLVM 16 performance-wise?

The picture below represents the performance gains of LLVM 17 (zig cc 0.12) over LLVM 16 (zig cc 0.11), as a percentage.

This is the standard libsodium benchmark, using libsodium 1.0.19, run with 500 iterations for each test, on a Zen4 CPU, {under,over}clocking turned off. Exact same libsodium version and build script, with the simd128 and bulk_memory extensions turned on.

WebAssembly modules were compiled to native code and executed using Intel’s WebAssembly runtime (iwasm). In previous libsodium benchmarks, iwasm was shown to be the fastest runtime, along with WebAssembly to C transpilers (w2c2, wasm2c).

A few regressions, some improvements, but also some massive gains, +14% and +17%, for two similar tests. With the exception of these tests, there’s nothing really significant.

At that point, it may be worth looking an extra revision back. What performance gains did we get from upgrading from LLVM 15 to LLVM 16?

Let’s find out, by using zig cc 0.10, which used LLVM 15.

Whoops! LLVM 15 is much faster than LLVM 16 on the AEGIS tests. It’s likely that there was a nasty regression for the particular patterns triggered by the AEGIS tests in LLVM 16.

So, maybe we should skip LLVM 16 and compare LLVM 15 to LLVM 17:

Verdict:

LLVM 15 was the fastest
LLVM 16 had a significant regression
LLVM 17 fixes the LLVM 16 regression, but doesn’t bring anything over LLVM 15. Performance is very similar.

Note that we just benchmarked code written in C. Maybe code written in Zig, Swift, Rust, etc. performs differently.

Discuss it on Reddit’s WebAssembly forum.

Performance of WebAssembly runtimes in 2023

2023-01-04T00:00:00+01:00

Using libsodium in a web browser has been possible since 2013, thanks to the excellent Emscripten project.

Since then, WebAssembly was introduced: a more efficient way to run code not originally written in JavaScript in a web browser.

And libsodium added first-class support for WebAssembly in 2017. On web browsers supporting it, and in contexts allowing it, that gave a nice speed boost. Like JavaScript, the same code could seamlessly run on multiple platforms.

Also like JavaScript, applications started to use WebAssembly server-side. Still like JavaScript, and ignoring bugs in runtime implementations, it doesn’t allow untrusted code to read or write memory outside of a sandbox. That alone makes it a compelling choice for application plug-ins, function-as-a-service services, smart contracts and more.

In 2019, support for a new WebAssembly target (wasm32-wasi) was added to libsodium, making it possible to use the library outside web browsers, even without a JavaScript engine.

As of today, multiple runtimes support wasm32-wasi, but on the same platform, the same code can run with very different performance across runtimes.

Benchmarking abilities for wasm32-wasi were thus added to libsodium.

This benchmark proved to be more representative of real-world performance than micro-benchmarks. Sure, libsodium is a crypto library. But the diversity of the primitives being measured exercises the vast majority of optimizations implemented (or not) by WebAssembly runtimes/compilers/JITs, and this benchmark turns out to be a good representative of real-world applications.

Since its introduction, the libsodium benchmark has been widely used by runtimes to improve their optimization pipelines, by researchers to measure the impact of experiments on WebAssembly, and by users to pick the best runtimes for their workload.

But it’s been a while since results were published here. Meanwhile, runtimes have improved, so an update was overdue.

What happened since the last benchmark

InNative is still actively maintained, but still hasn’t gotten WASI support. While it looks worth benchmarking, this is unfortunately still a showstopper for the time being.
Lucet has been EOL’d, as Wasmtime and Wasmer provide the same features, using the same code generator.
Fizzy doesn’t seem to be actively maintained any more.
WAVM doesn’t seem to be maintained any more either. WAVM implemented proposals before everybody else, and has consistently been the fastest runtime. Has Epic Games given up on WebAssembly? Minor updates keep being made in the fork used by FAASM, but there’s no more activity in the upstream repository. This is a big loss for the WebAssembly community.
SSVM became WasmEdge, the runtime from the Cloud Native Computing Foundation. The project is very active, and has been focused on performance since day one. It comes with a lot of features, including the ability to run plug-ins.
Wasm3’s development pace seems to have slowed down. However, it remains the only WebAssembly runtime that can easily be embedded into any project, with minimal footprint, and amazing performance for an interpreter. It still doesn’t have any competition in that category.
Wasmtime quickly went from version 0.40 to version 3.0.1, with version 4 being round the corner. Every release is an opportunity to update Cranelift, the code generator it is based on. A lot of improvements were recently made to Cranelift, so it’s high time to see how they reflect in benchmarks.
Wasmer kept releasing unique tools and features, such as the ability to generate standalone binaries. Their single-pass compiler also got updated. Let’s put it to the test!
Wamr saw a bunch of new releases. Pre-built binaries are also now available. This is very good news, as the compilation process used to be tedious.
Wazero is a new, zero-dependency runtime for Go. While it hasn’t reached version 1.0 yet, it looks extremely promising.
Node has excellent support for WebAssembly and WASI. Yet, the latter still isn’t enabled by default, and requires command-line flags (--experimental-wasm-bigint --experimental-wasi-unstable-preview1).
Bun showed up as a modern alternative to node, based on JavaScriptCore. It doesn’t support WASI out of the box, but wasmer-js emulates it perfectly.
GraalVM now includes experimental support for WebAssembly.

Most other runtime projects (Asmble, Wac, Windtrap, Wagon, Py-Wasm, …) appear to have been abandoned.

Compiling C code to WebAssembly

WebAssembly includes several proposals to improve performance, such as the ability to perform tail-call optimizations, support for 128-bit vectors, threads, and bulk memory operations (memcpy/memset).

Unfortunately, all these interesting additions are still only proposals that runtimes may or may not implement, and may or may not enable by default. An application compiled with support for one of these proposals will crash on a runtime that doesn’t support it, or didn’t enable it.

Ideally, every WebAssembly application should be distributed for multiple targets (wasm32-wasi-generic, wasm32-wasi-generic+simd128, wasm32-wasi-generic+simd128+threads+tail_calls, etc). But this is unmanageable, and runtimes don’t even implement the ability to automatically select a suitable target.

So, in practice, to write applications that are compatible with a wide range of runtimes, using these proposals is not an option. Bulk memory operations may be an exception: they are supported by the vast majority of runtimes, and enabled by default in wasm3, wasmedge, wasmer, wasm2c and node.
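To give an idea of what a bulk memory operation replaces, here is a rough C sketch of the behavior of a memory.copy instruction: a single bounds check, followed by a copy that tolerates overlapping ranges. The memoryCopy helper is hypothetical and only meant as an illustration; a real runtime would trap where it returns -1.

```c
#include <stdint.h>
#include <string.h>

/* Copy len bytes from src to dst inside a linear memory of memSize bytes.
   Returns 0 on success, -1 if either range is out of bounds. */
int memoryCopy(uint8_t *mem, uint64_t memSize,
               uint32_t dst, uint32_t src, uint32_t len)
{
    /* 64-bit arithmetic, so that the sums cannot wrap around */
    if ((uint64_t) src + len > memSize || (uint64_t) dst + len > memSize) {
        return -1;
    }
    /* Source and destination may overlap: use memmove(), not memcpy() */
    memmove(mem + dst, mem + src, len);
    return 0;
}
```

Without the extension, the same copy has to be expressed as a loop of individual loads and stores in the WebAssembly module.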

To benchmark the runtimes presented above, we need to build the library for the wasm32-wasi flavor of WebAssembly.

In order to do so, and since libsodium is written in C, we have to use a C/C++ (cross-)compiler that can target wasm32-wasi: the zig cc command from the Zig toolchain.

Previously, compiling libsodium to wasm32-wasi required setting the compiler to the zig cc --target=wasm32-wasi command. But the library source code now includes a Zig build file that can be used as an alternative to autotools. So, for this benchmark, the library was built with the following command:

zig build \
  -Dtarget=wasm32-wasi \
  -Drelease-fast \
  -Denable_benchmarks=true \
  -Dcpu=generic+bulk_memory

The Zig version used was 0.11.0-dev.863+4809e0ea7, and libsodium was at revision 58ae64d319246e5530c.

Along with the library itself, the resulting zig-out/bin/ folder contains WebAssembly files for every benchmark.

Methodology

Tests output the time it took for them to run. So, the benchmark ignores the setup/teardown time individual runtimes may have. Similarly, the compilation time required by ahead-of-time compilers is intentionally not measured. We only measure actual execution performance.

The benchmark was run on a Zen 2 CPU provided by Scaleway. Nothing else was running on the instance, tests were pinned to a single CPU core, and each test was run 200 times in order to further reduce noise.

Prior to being run, the WebAssembly files were further optimized with the wasm-opt -O4 --enable-bulk-memory command (part of the binaryen tools).

The exact set of WebAssembly files used for the benchmark can be found here.

The following runtimes have been benchmarked:

iwasm, which is part of the WAMR (“WebAssembly micro runtime”) package - pre-compiled files downloaded from their repository
wasm2c, included in the Zig source code for bootstrapping the compiler
wasmer 3.0, installed using the command shown on their website. The 3 backends have been individually tested
wasmtime 4.0, compiled from source
node 18.7.0 installed via the Ubuntu package
bun 0.3.0, installed via the command shown on their website
wazero from git rev 796fca4689be6, compiled from source

Unfortunately, GraalVM couldn’t take part in the benchmark, as its support for WASI appears to be very limited. The absence of random_get(), required to generate random numbers, was a blocker. It also doesn’t support bulk memory operations. Maybe next time?

Tests have also been grouped by category. This significantly improves readability compared to previous iterations of the benchmark.

Results have been median normalized. The X-axis represents how much slower a runtime is compared to the median performance (1). So, smaller is better, and a result of 2 means that the runtime was 2x slower than the median.
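The normalization can be sketched as below. This is only an illustration of the arithmetic, not the script used to produce the charts; the medianNormalize function is a name made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparison callback for doubles */
static int compareDoubles(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Divide the time of every runtime by the median time for that test.
   out[i] == 2.0 means that runtime i was 2x slower than the median.
   Returns 0 on success, -1 on invalid input or allocation failure. */
int medianNormalize(const double *times, double *out, size_t n)
{
    double *sorted;
    double  median;
    size_t  i;

    if (n == 0 || (sorted = malloc(n * sizeof *sorted)) == NULL) {
        return -1;
    }
    memcpy(sorted, times, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compareDoubles);
    median = (n % 2 == 1) ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    free(sorted);
    if (median <= 0.0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        out[i] = times[i] / median;
    }
    return 0;
}
```

For example, times of 4, 1 and 2 seconds have a median of 2, and are normalized to 2, 0.5 and 1.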

AEAD benchmark

The authenticated encryption tests feature a small function called many times (for every input block) with different parameters. Performance typically depends on auto-vectorization and register allocation.

None of the contestants appears to have been able to auto-vectorize anything, so results are fairly similar.

Authentication benchmark

These tests are based on a software implementation of the SHA-2 (SHA-256, SHA-512, SHA-512/256) hash function.

Like AEADs, they rely on a function called many times with different parameters. However, there are also quite a lot of constants, that compilers have an opportunity to inline.

In these tests, wasmedge doesn’t perform well compared to Wasmer with the LLVM backend, which is unexpected considering the fact that they are both based on LLVM.

The JavaScriptCore engine, represented by bun, shows some serious room for improvement on these tests. It seems to lack optimization passes that even single-pass compilers have.

Box benchmark

These tests combine bitwise operations and arithmetic. Results are fairly similar, with the exception of single-pass compilers that cannot afford expensive optimizations.

Arithmetic over elliptic curves

A lot of arithmetic, mostly using 64-bit registers. Unsurprisingly, results are similar to the box tests.

Hashing benchmark

BLAKE2B and SHA-2 hashing. This is similar to the authentication benchmark.

bun remains slower than other runtimes, even though BLAKE2B doesn’t use large constant tables like SHA-2 does. There’s definitely a pattern found in hash functions that JavaScriptCore cannot properly optimize yet.

Key exchange and key derivation benchmark

Same hash functions under the hood, but with a major difference in the input of these functions: here, inputs are short. Ignoring single-pass compilers, results are fairly close.

Metamorphic benchmark

Apparently, there is not a lot of wiggle room for optimizations here: single-pass compilers aside, all runtimes perform almost exactly the same, with the exception of Wazero.

These tests perform memory allocations and require a lot of random numbers (obtained via a WASI syscall). This is what may cause Wazero to be so slow, rather than how the code is optimized.

One-time authentication benchmark

This is benchmarking the Poly1305 function. This is simple arithmetic using 64-bit registers. Multi-pass compilers seem to apply the same optimizations; so do single-pass compilers.

Password hashing benchmark

Password hashing functions rely on memory reads and writes in unpredictable locations. There’s probably not much to optimize beyond what the C compiler already did.

Diffie-Hellman benchmark

Arithmetic over finite fields. Optimization opportunities include usage of ADCX/ADOX for carry propagation.
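To show what carry propagation means in portable code, here is a small sketch of a multi-limb addition. It is not taken from libsodium; addLimbs is a hypothetical helper. The WebAssembly instructions used by these benchmarks have no add-with-carry, so the carry has to be recomputed with comparisons, and it is up to the compiler to recognize the pattern and map it to native carry instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b, where numbers are arrays of n 64-bit limbs,
   least significant limb first. Returns the final carry (0 or 1). */
uint64_t addLimbs(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    size_t   i;

    for (i = 0; i < n; i++) {
        uint64_t s  = a[i] + carry;
        uint64_t c1 = s < carry; /* overflow while adding the carry */
        uint64_t c2;

        s += b[i];
        c2   = s < b[i];         /* overflow while adding b[i] */
        r[i] = s;
        carry = c1 | c2;         /* both cannot be set at the same time */
    }
    return carry;
}
```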

Overall, all runtimes perform quite well, with the exception of Wazero, which is significantly slower than Wasmer with the singlepass backend.

Secretbox benchmark

Here, we benchmark authenticated encryption using salsa20, chacha20 and poly1305. Simple arithmetic on things that mostly fit on 32-bit and 64-bit registers.

Keygen benchmark

These tests are all about randomness extraction. There’s not much to optimize here. The tests mainly measure the overhead of WASI calls to get random bytes, and, very likely, the overhead of WASI calls in general.

The difference between node and bun is expected, as node has native support for WASI, while bun requires the wasmer-js emulation layer. It shouldn’t take long for bun to also include native WASI support, which should bring it in the same ballpark as other runtimes.

wasmedge, however, doesn’t have the WASI emulation excuse. Calling external functions in wasmedge may have more overhead than with other runtimes.

Signature benchmark

Arithmetic over elliptic curves, triggering common optimization passes, and making very few WASI syscalls. Here, wasmedge, also based on LLVM, manages to be faster than wasmer with the LLVM backend.

Utilities benchmark

Codecs, comparison functions, and other simple helper functions, designed to intentionally prevent compiler optimizations, as a best effort to make them constant-time.
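As an illustration of that style of code, here is a minimal constant-time comparison sketch. It follows a common pattern and is not copied from libsodium; the ctCompare name is made up for this example. The volatile accumulator discourages the compiler from exiting the loop early, and the result is computed without any data-dependent branch. Like the real thing, this remains a best effort: the C standard doesn’t guarantee constant-time execution.

```c
#include <stddef.h>

/* Compare len bytes without stopping at the first difference.
   Returns 0 if both buffers are equal, -1 otherwise. */
int ctCompare(const unsigned char *a, const unsigned char *b, size_t len)
{
    volatile unsigned char d = 0;
    unsigned int           x;
    size_t                 i;

    for (i = 0; i < len; i++) {
        d |= a[i] ^ b[i]; /* accumulate every differing bit */
    }
    x = d;
    /* x == 0: (x - 1) wraps to all ones, bit 8 is set, result is 0.
       x in 1..255: (x - 1) >> 8 is 0, result is -1. */
    return (int) (1 & ((x - 1u) >> 8)) - 1;
}
```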

As expected, performance is very similar across runtimes.

Stream ciphers benchmark

Salsa20 and Chacha20 stream ciphers, so, mostly bitwise arithmetic involving vectors of 32-bit values. There are opportunities for auto-vectorization, but this is not trivial.
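For readers unfamiliar with these ciphers, the core of ChaCha20 is the quarter round below, applied over and over to a state made of sixteen 32-bit words. This is a standalone sketch written for this article rather than libsodium’s implementation; the function names are made up.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32) */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round: only additions, XORs and rotations */
void chachaQuarterRound(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

Four independent quarter rounds are applied to the columns of the state, then four more to its diagonals, which is where the vectorization opportunity comes from.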

Performance is very similar, with the exception of single-pass compilers.

Cumulative benchmark

iwasm, wasmer (LLVM backend) and wasmedge are all using LLVM for code generation. Intuitively, one may expect very similar performance. But this is not the case.

iwasm, a newcomer in this benchmark (at least in AOT mode), was consistently the fastest runtime. This is a pleasant surprise. Its performance is likely to be similar to that of wavm, for which it can be considered a replacement.

That being said, wasmer is very close.

In the LLVM category, I was expecting wasmedge to take the lead given how performance-focused the project has been since the beginning. Its results were a little bit disappointing. V8 (represented by node) is about as fast.

On the other hand, runtimes using cranelift for code generation (wasmer with that backend, and wasmtime) perform virtually the same.

Zooming out a little bit, what is really impressive is how close the LLVM-based results and the cranelift-based results are.

That’s right. The cranelift code generator has become as fast as LLVM. This is extremely impressive considering the fact that cranelift is a relatively young project, written from scratch by a very small (but obviously very talented) team.

The JavaScriptCore engine, represented by bun, was disappointing. As a JavaScript engine, it is blazing fast. As a WebAssembly engine, not so much yet. Some of these results can be explained by the lack of a native WASI implementation, but there’s probably more room for optimization.

wasmer with the singlepass backend and wazero have in common that they compile very quickly, in a streaming fashion. This is a critical property for some applications involving untrusted inputs.

The flipside is that expensive optimizations cannot be done. This is an inevitable tradeoff.

wazero’s results are not great yet. But its seamless integration with the Go language involves additional constraints, especially on Intel CPUs that have a limited number of registers. wazero may perform much better on ARM CPUs, and this is something that we will measure soon.

In any case, the wazero team is taking performance very seriously, and already started investigating these results.

Comparison against native code

How does WebAssembly compare against native code?

The chart above represents the ratio between the time taken to run individual tests with the fastest WebAssembly runtime, and the same tests with the library compiled as native code, with architecture-specific optimizations.

Two outliers have been hidden: the aegis128l and aegis256 tests. Native code takes advantage of native CPU instructions to compute AES rounds. WebAssembly, on the other hand, cannot take advantage of these instructions and has to fall back to slow, software implementations.

As a result, these AES-based tests were 80 times slower than native code when running WebAssembly. This is not representative of most applications. However, it highlights a real limitation of WebAssembly for cryptographic operations relying on AES.

Ignoring this, when using the fastest runtime, WebAssembly was only about 2.32 times slower (median) than native code with architecture-specific optimizations. Not bad!

Verdict

We have four classes of WebAssembly runtimes:

Interpreters, and in that category, wasm3 is probably still the best option. It’s also far easier to embed than anything else.
LLVM/Cranelift/V8-based runtimes with comparable performance.
JavaScriptCore-based runtimes, still slightly behind the rest.
Single-pass compilers.

iwasm

If you’re looking for the best performer, iwasm is currently the one to choose.

iwasm is part of the WAMR project.

Compared to other options, it is intimidating.

It feels like a kitchen sink, including disparate components (IDE integration, an “application framework library”, remote management, an SDK) that make it appear as a complicated solution to simple problems. The documentation is also a little bit messy and overwhelming.

It also has a lot of knobs, both at compile-time and at run-time. The memory usage tuning page is scary. I wish there was just a max_memory knob, rather than multiple settings that may or may not work depending on applications and how they were compiled. That being said, this also applies to other runtimes, and the default values are probably reasonable.

Ignoring this, there is a simple C API, as well as bindings for Python and Go. So in these languages, it is actually quite easy to use.

The runtime itself is very small (50 KB) and has a small memory footprint. It can be tailored to applications in order to reduce it even further, making it an excellent choice for constrained environments. Especially since it supports platforms such as Zephyr, VxWorks, NuttX and RT-Thread out of the box.

LLVM/Cranelift/V8-based runtimes

node, wasmtime, wasmedge and wasmer are in the same ballpark.

None of them has a stable API yet. This is especially true for Rust APIs. If your application depends on the Rust APIs of wasmer or wasmtime, be prepared for maintenance costs.

However, the C APIs are far more stable: wasmtime’s C API only had one recent breaking change, while wasmer completely revamped the API once, and kept it stable since.

The wasmedge C API is very different, and has many more features. Breaking changes still happen, but they are generally minor and easy to deal with.

node (or, actually, V8) has by far the most stable API, even though WASI support in node is still considered experimental.

So, node (or V8+uvwasi when embedded) is a fine, safe, conservative choice, and for WebAssembly, there’s not a lot of performance to gain in switching to something else. And yes, like others, V8 supports full module AOT compilation, too.

However, in order to run WebAssembly code, the entire V8 engine has to be shipped. This is a huge dependency. Not an issue for applications also needing support for JavaScript, but this is definitely overkill for applications that only need WebAssembly.

This is where wasmtime, wasmedge and wasmer come into play. They are smaller (= reduced attack surface), and include features suitable for running third party code/smart contracts, such as gas metering.

For most users, there are no significant differences between these three runtimes. They share similar features (such as AOT compilation) and run code the same way, roughly at the same speed.

They have some unique features that can make a difference to some applications, though.

wasmer has the largest ecosystem, with libraries making it very easy to use from many programming languages, as well as applications such as PostgreSQL and Visual Studio Code.

It also includes the ability to generate standalone binaries for all supported platforms, has a package manager and more. In spite of some breaking changes, it seems more focused on API stability than alternatives, making it a reasonable choice for applications that are planned to be maintained for a long time.

wasmedge is the runtime from the CNCF, and what Docker uses to run containers with WebAssembly applications. So, even if this is not your runtime of choice, testing that your WebAssembly code properly runs on it is highly recommended, as its popularity is likely to skyrocket.

wasmedge comes with networking support, although this is currently limited to Rust and JavaScript applications. But one of the best features of wasmedge is its plug-in system, allowing it to be extended without having to change the core runtime. Out of the box, it comes with plug-ins implementing the wasi-nn and wasi-crypto proposals, as well as an HTTP client plug-in and a simple way to run external commands in a controlled way.

wasmedge includes libraries to easily embed it into C, Rust, Go, JavaScript and Python applications. It supports preemption (this is nice!), instruction counting and gas measuring/limiting.

CVEs for quite a few vulnerabilities in wasmtime have been assigned. Some are severe (use after free, out-of-bound accesses, type confusion…) possibly leading to secret disclosure and arbitrary code execution.

Is it a bad thing? Actually not. To my knowledge, V8, JavaScriptCore and wasmtime are the only WebAssembly runtimes having publicly disclosed vulnerabilities so far.

Which is quite surprising, especially since some of these vulnerabilities were in cranelift, which is also used by wasmer.

wasmtime did responsible security disclosure after every vulnerability, and went above and beyond to prevent similar vulnerabilities from happening again.

Changes are made carefully, and the project is also constantly being fuzzed to discover new bugs.

If the intent is to run arbitrary, untrusted code outside a browser environment, wasmtime feels like the most secure option.

Without any optional features, a minimal executable linked against libwasmtime weighs 22 MB, even if AOT is used and compilation functions are not called.

This is a bit more than wasmer (a basic standalone executable produced by wasmer weighs 16 MB) but far less than wasmedge, which only ships a shared library that weighs no less than 45 MB, about as big as the node library (46 MB)!

To summarize, it’s hard to strongly recommend one of these runtimes. You can’t go wrong with any of them, unless your application needs one of their unique features.

JavaScriptCore

JavaScriptCore is to Safari what V8 is to Chrome: a portable, full-featured JavaScript engine, backed by excellent, multi-level interpreters and compilers.

Both are used in the vast majority of web browsers we use today, but also, outside the browser in node (V8) and bun (JavaScriptCore), as well as in function-as-a-service services and as a way to extend applications.

And JavaScriptCore performance is excellent, often exceeding that of V8.

Building on top of their existing compilers, they both eventually got support for WebAssembly.

Unfortunately, JavaScriptCore currently doesn’t perform as well as V8 for WebAssembly. As of today, if you’re looking for a JavaScript+WebAssembly engine, I would thus recommend V8.

That being said, JavaScriptCore developers reached out to me, asking for a license file to be added to the benchmark files. So, it is very likely that they will fix the optimizations JavaScriptCore is currently missing, and the engine will eventually perform as well as, or even better than, other runtimes.

Wazero

To embed WebAssembly modules in a Go application, wazero should be your go-to choice unless performance is absolutely critical.

wazero is entirely written in Go, with zero dependencies. That comes with many advantages: safety, portability, perfect integration with the language, and adding it to existing projects is trivial and adds negligible overhead compared to alternatives.

wasm2c

The time has now come to introduce wasm2c, which is clearly in a different category.

Can a WebAssembly compiler include a WebAssembly runtime in order to run a WebAssembly version of itself to compile itself again? If that sounds confusing, go read how Zig got rid of the C++ compiler.

wasm2c is a WASM to C transpiler. The transpiler itself was written in a couple of days, is super simple, fits in ~2000 lines of self-contained code and doesn’t try to be smart or to optimize anything. It just literally translates individual WebAssembly opcodes into very dumb C code, and lets a C compiler take care of all the optimization work. It also includes minimal support for WASI.
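To give an idea of what “literally translating opcodes” can look like, here is a toy sketch in Python. It is not the actual wasm2c code: the `transpile` function, the three-opcode subset and the `sN`/`lN` naming scheme are all assumptions made up for this illustration. Each stack slot becomes a C variable, and each opcode becomes exactly one C statement, with no attempt at optimization.

```python
def transpile(name, n_params, body):
    """Translate a tiny subset of WebAssembly stack opcodes into naive C source.

    Hypothetical illustration only: every operand stack slot becomes a C
    variable (s0, s1, ...) and every opcode becomes one C statement.
    """
    lines = []
    depth = 0      # current height of the virtual operand stack
    max_depth = 0  # how many stack variables must be declared
    for op, *args in body:
        if op == "local.get":
            lines.append(f"    s{depth} = l{args[0]};")
            depth += 1
        elif op == "i32.const":
            lines.append(f"    s{depth} = {args[0]}u;")
            depth += 1
        elif op in ("i32.add", "i32.mul"):
            if depth < 2:
                raise ValueError("operand stack underflow")
            c_op = "+" if op == "i32.add" else "*"
            depth -= 1
            # uint32_t arithmetic wraps around, like WebAssembly i32 arithmetic
            lines.append(f"    s{depth - 1} = s{depth - 1} {c_op} s{depth};")
        else:
            raise ValueError(f"unsupported opcode: {op}")
        max_depth = max(max_depth, depth)
    if depth != 1:
        raise ValueError("function body must leave exactly one result")
    params = ", ".join(f"uint32_t l{i}" for i in range(n_params)) or "void"
    decls = "".join(f"    uint32_t s{i};\n" for i in range(max_depth))
    return (f"uint32_t {name}({params}) {{\n" + decls
            + "\n".join(lines) + "\n    return s0;\n}\n")


# (a + 1) * b, written as WebAssembly-style stack code
example = [("local.get", 0), ("i32.const", 1), ("i32.add",),
           ("local.get", 1), ("i32.mul",)]
print(transpile("f", 2, example))
```

The generated function is verbose and full of redundant copies, but removing those is precisely the kind of work a C compiler is very good at.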

The wabt package includes a similar tool, with more advanced features. But looking at what the simplest implementation could do seemed interesting.

Zero dependencies needed besides a C compiler (here, I used zig cc since it was already installed). The runtime overhead is… zero: there’s no runtime! Memory overhead is also negligible. But how about performance?

Looking at this benchmark’s results, performance of such a trivial approach turns out to be outstanding. In fact, it is neck and neck with the fastest WebAssembly runtime in every single test.

But there’s a catch. The two main selling points of WebAssembly are portability (this is bytecode after all), and the guarantee that an application cannot access arbitrary memory locations.

Code compiled from wasm2c’s output provides the former: a single copy of WebAssembly code can run on multiple platforms.

However, it doesn’t currently do anything to enforce the latter. Doing so would require adding guard pages before the stack and after linear memory segments, or extra operations such as applying a mask to pointers used in load/store operations to keep them within a safe range.
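To make the masking option concrete, here is a minimal Python model of a linear memory whose loads and stores are masked. This is only a sketch: `MEM_SIZE`, `store_u8`, `load_u8` and `load_u32` are names made up for this illustration, the memory size is assumed to be a power of two, and nothing here is taken from wasm2c itself.

```python
MEM_SIZE = 1 << 16       # assumption: one 64 KiB page, must be a power of two
MASK = MEM_SIZE - 1      # all-ones mask covering every valid offset
memory = bytearray(MEM_SIZE)


def store_u8(addr, value):
    """Store one byte; the mask forces the address back into the memory."""
    memory[addr & MASK] = value & 0xFF


def load_u8(addr):
    """Load one byte from a masked address."""
    return memory[addr & MASK]


def load_u32(addr):
    """Little-endian 32-bit load; each byte address is masked individually
    so that an access straddling the end of the memory cannot escape it."""
    return int.from_bytes(
        bytes(memory[(addr + i) & MASK] for i in range(4)), "little"
    )


# An out-of-range address silently wraps around instead of touching
# anything outside the linear memory.
store_u8(MEM_SIZE + 5, 0xAB)
print(hex(load_u8(5)))
```

Note that masking wraps an out-of-range address back into the memory instead of trapping. That keeps the sandbox intact, but it is not the behavior mandated by the WebAssembly specification, which requires a trap on out-of-bounds accesses.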

But when the WebAssembly code to run is trusted, this very compact and simple approach works amazingly well, and was an excellent decision to bootstrap a compiler.

Leveraging this approach, w2c2 looks like a very promising project to keep an eye on (claiming to be only 7% slower than native code!)

Kudos, everyone!

A lot happened in the world of WebAssembly runtimes in the past two years. Projects were abandoned, but new projects saw the light, exciting new features have been added to existing runtimes and impressive performance improvements have been made (cranelift now being as fast as LLVM). There’s no clear winner any more, only good options.

We also saw new applications of WebAssembly outside browser environments, and consistent ways to use WebAssembly across programming languages (Extism). WebAssembly also became an essential component for confidential computing (Vera Cruz, Enarx, Inclavare).

WebAssembly support in languages and tools also improved. Swift can now emit WebAssembly code. C/C++ code can be compiled to WASI using zig cc instead of manually patching clang or having to install yet another entire copy of LLVM. tinygo became more mature. And new programming languages targeting WebAssembly were born.

As to using WebAssembly in web browsers, and among other things, Cowasm was open sourced: an alternative to Emscripten developed by SageMath, with real-time collaboration features. This alone unlocks a lot of possibilities.

The future for the WebAssembly ecosystem definitely looks bright!

It doesn't work

2021-03-26T00:00:00+01:00

8 AM. Like any other day, I take a quick glance at my GitHub notifications via Octobox.

“Problem”
“Doesn’t work”
“Broken”
“Failure”
“Error”
“Bug”
“Doesn’t work”
“Crash”
“Can’t build”
“Install failed”
“Not working”
“Help”
“Does not compile”
“Bug”
“Not connecting”
“Problem”

Perfect. Nothing special.

The vast majority of open source projects were born the same way. A desire to quickly solve a personal need, or to learn something.

Why not make the code public? It may be useful to other people and having the code archived and indexed by GitHub is convenient.

The first lines of code have been written. That code is not pretty, but it partially solves the initial problem already. Time to push it to GitHub. That feels like a great achievement.

The issue tracker may quickly start to fill in. “Thank you, this is exactly what I needed” would be a heartwarming thing to read, but you will never see such a comment. Most likely:

“It doesn’t work.”

Obviously, the project DOES “work.” You’ve been using it daily for a couple days already, and it does exactly what you wanted it to do.

The reporter may not have been able to install it, to configure it the way they wanted, or to use it to solve a different problem than yours.

They are actually asking for free support services. In real life, explicitly asking for help feels natural. It may even start with “hello” and end with “thank you.”

In the GitHub issue tracker, we don’t call for help. We complain about something we tried to do, and that “didn’t work.”

Certainly, this is an issue tracker. This is a place to complain, not to provide positive feedback.

However, what is often overlooked is the psychological impact this can have.

Every newly filed issue is akin to a new item in a “todo” list for the project developer. It has to be handled in some way. By reading and trying to understand it, and by responding to it in order to solve a stranger’s issue. This alone creates mental pressure. Watching a list of open issues grow is stressful. It feels like a never-ending todo list that you never really asked for, and whose resolution is not going to improve your own issues.

The vast majority of these support tickets are negatively formulated. If a user didn’t manage to install the software on their device, they will call it a bug. If there’s a syntax error in their configuration file, they will report that “it crashes.” Everything else is a “problem,” “doesn’t work” or “fails.”

While it was certainly not the intent, this overwhelming negativity has consequences. It makes developers gradually feel like shit. Their software is just a pile of junk that cannot do anything but fail.

As the number of projects you release as open source grows, the number of issues grows as well. Complaints keep coming even for projects you don’t really use any more. Can these issues be ignored? Every time you make a new project public, you sign an implicit contract with “the community” to support it, forever. And support is not so much about bug fixes as about helping users solve problems you never had yourself, whose root causes are often completely unrelated to your very projects.

A report for an actual bug, something that can be easily reproduced on an already running successful installation of the software, and that also affects the developers, is immensely welcome. Unfortunately, these are pretty much nonexistent.

Adding categories and templates doesn’t help. “It doesn’t work” calls to solve individual problems will eventually end up in categories you may expect actual bug reports in. So, you still need to check what’s behind every “doesn’t work” description in case a real bug report would hide in there.

Behind some of these “doesn’t work” issues are anonymous company employees, opening issues with accounts having no activity on GitHub besides opening support and “feature requests” tickets. With “thumb up” votes from similar accounts appearing right after the post. “Doesn’t work. We are waiting for a solution.” If this is not a “doesn’t work” ticket, this is a command: “Tag a new release. This is blocking our processes.”

There is no “we.” I am not part of your team, nor am I supposed to do the work you are being paid for, no matter if you somehow use one of my projects or not.

There’s a sense that installing free software grants the ability to get free support from its maintainers. And to some extent, it does. Because it’s hard to say “do your homework” and close the ticket. Every closed ticket needs a justification. Which will live forever in the issue tracker history, and that people are going to look at, even years after the fact. So, you may have to keep solving users’ problems, even if they are completely unrelated to the way you run the software yourself.

“Doesn’t install on the Titan framework running BeOS for iPhone 4 (Chinese version 2.7 Pro)”

Some tickets refer to environments and tools you may have never heard of. Or to custom builds. Or to configurations insanely more complicated than what you ever used.

“I don’t know” and “I don’t care” would be honest answers. The former is not a valid justification for closing the issue. The latter would backfire. So, you spend time Googling for that obscure thing, try to understand the user’s issue from the pieces of the puzzle you somehow manage to collect, and tentatively come up with a credible answer. All you really want is that issue to be closed in the best possible way: by the user themselves.

Meanwhile, stress intensifies. Every single new ticket is causing stress and anxiety. Not about the content, but about what will have to be done in order to get it closed. How much time and effort it is going to take.

Only with the hope of having some time left to actually work on the project. Time spent helping people is not spent writing code.

Project maintainers know how to install and use the software. And for their use case, it works. If the documentation is incomplete, one has to remember that this very documentation was written as a gift, to help other people. If the project doesn’t work in an environment maintainers don’t use, they shouldn’t be blamed for it.

Having feedback is great. Realizing that a project is useful to other people is fantastic and encouraging. The ability to report bugs and make suggestions is very powerful. But this is not what the GitHub issue tracker is mostly used for. It is used to complain or ask for personalized help, describing what was tried and didn’t succeed as a “bug,” or as a “problem” in a piece of software that’s “broken,” “fails” and “doesn’t work.”

People working in support departments have to handle far more painful situations, all day long. If only for that, they deserve a lot of respect.

However, they have training. They know how to handle different types of customers. They can transfer to other people. They have relevant skills and experience.

Project maintainers don’t have these. Furthermore, support employees are supporting a company’s products. They certainly share a corporate culture; however, complaints are targeted at the company’s work, not at their own work.

A “doesn’t work” ticket on a personal project issue tracker is something we take personally. There’s no one to ask for help about how to handle it in the best way, no manager or coworker to transfer the case to. Ignoring it will not make it disappear. It will still show up every single day until it eventually gets closed somehow.

Until then, “doesn’t work” tickets piling up make your work look like junk and hurt.

The GitHub issue tracker made me cry more than once. I couldn’t get any sleep after having closed tickets without substantial justification. Sometimes, I’d still like to somehow be able to say “please leave me alone” as support requests keep coming. And “ping” messages being posted on old issues that I postponed the review of because deciphering the actual problem being described was challenging.

On some projects, I eventually had to give up, and closed the issue tracker. But “doesn’t work” issues kept coming. As comments on old commits instead, because these can’t be disabled.

Issues vary a lot from one project to another, though. On a project meant to be used only by people already familiar with the domain, “it doesn’t work” issues are far less common. But instead of a free customer service center, the issue tracker can become a free consulting service, with people asking how to build an application or protocol, with your software being used somehow. It’s hard not to help. It’s hard to say no. So, spend time solving other people’s problems while you are struggling with your own, unrelated work? That’s the only way they will close the issue themselves. If you want it closed prematurely, you need a justification. “Sorry, I don’t have time” is not a justification for not leaving the ticket open. Even if this is the truth, and the best thing to do for your own mental health.

Open communication is great and necessary. And an issue tracker is definitely a very valuable tool. But it is a many-to-few topology, fueling a constant stream of negativity (in its form), which can eventually be mentally devastating.

Mitigations

Over the past 24 months, after repeated nervous breakdowns due to GitHub issues, I did a few things that helped a lot.

Probably the most important decision was to enforce a hard limit on the time spent daily handling issues, trying to convince myself that it was okay not to respond immediately.

With github-auto-locker, closed issues are automatically locked after a couple months. If an issue is closed, the case is over. Having a “me too” popping up on something discussed years ago is annoying. If there’s something new, please open a new ticket, especially since the software may have changed a lot since the original discussion and the context may be different.

After 30 days of inactivity, issues are closed with a timeout label. If there hasn’t been any activity for that long, it is unlikely that leaving the ticket open is going to change anything. Maybe it was a feature request that no one is interested in implementing. Maybe the reporter was asked for details but is never going to provide them. Maybe nobody knows how to answer the question or what it even means.

A timeout label doesn’t mean that the issue is going to be ignored. As opposed to a closed, unlabeled issue, it means that this is something to possibly take a new look at later, when time permits. A lot of feature requests were closed in that state, but eventually got implemented later.

This helps a lot. It helps reduce the size of the task lists showing up every time you log in, with the same issues being displayed repeatedly, along with their age, reminding you that the clock is ticking. A shorter list, with fresher issues, is a little bit less depressing.
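For what it’s worth, the two rules above boil down to very little logic. Here is a rough Python sketch of the decision a maintenance bot would make for a single issue. This is not how github-auto-locker is implemented: `next_action` and its return values are made up for this illustration, and “a couple months” is arbitrarily taken as 60 days.

```python
from datetime import date, timedelta

CLOSE_AFTER = timedelta(days=30)  # from the text: 30 days of inactivity
LOCK_AFTER = timedelta(days=60)   # assumption: "a couple months" taken as 60 days


def next_action(state, last_activity, today):
    """Return what a (hypothetical) maintenance bot would do with one issue."""
    idle = today - last_activity
    if state == "open" and idle >= CLOSE_AFTER:
        return "close-with-timeout-label"
    if state == "closed" and idle >= LOCK_AFTER:
        return "lock"
    return "none"


# An open issue that has been silent for 45 days gets closed with the label.
print(next_action("open", date(2021, 1, 1), date(2021, 2, 15)))
```

The point of keeping the rules this dumb is that nobody has to justify them ticket by ticket: the timeout is the justification.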

Finally, I learned to say “no” or “I don’t know.” Sometimes in a harsh way just for the sake of getting a ticket closed, but for my own sanity, that was necessary.

Benchmark of WebAssembly runtimes - 2021 Q1

2021-02-22T00:00:00+01:00

Libsodium has been fully supporting WebAssembly as a target for quite a long time. This includes its built-in benchmark suite, that can run both in web browsers and in a variety of standalone WebAssembly runtimes.

The benchmark covers different types of cryptographic primitives. Some are purely computational tasks, some are memory-hard, some require efficient register allocation, some require optimal instruction scheduling, some can greatly benefit from vectorization, some don’t benefit from vectorization at all. It also includes utility functions such as codecs.

At the end of the day, this benchmark may not be a bad representative of how different compilers would optimize real-world code. It can also help quantify the overhead of WebAssembly vs native code.

For the third time, WebAssembly runtimes were compared using this benchmark.

State of the WebAssembly runtimes in 2021

There are quite a lot of WebAssembly runtimes around. However, in early 2021, most of them are still at an early stage, have been abandoned, or don’t support the full specification or WASI.

That can be easily explained. WebAssembly as a byte code and a memory model specification looks really simple and fun to implement, akin to writing a toy VM for a corewar game.

This is actually how it originally looked. Since then, WebAssembly evolved, and keeps evolving to become a full-fledged platform. Implementing a WebAssembly runtime has become significantly more complicated and time-consuming. Runtimes such as EOS kept the bytecode and chose to build their own ecosystem.

Meanwhile, a ton of improvements were made by already well-established runtimes in the past few months. A good opportunity to run a new round of the libsodium benchmarks to see if major performance changes can be observed.

Older versions of the following runtimes had been tested in the previous rounds:

Additional runtimes for that round:

Second State VM (with some caveats)
V8 (nodeJS)
Intel WAMR (“fast interpreter” mode)
wasm3

The following runtimes were considered but couldn’t be used to run the benchmark:

InNative: no WASI support
fizzy: failed with “imported function wasi_snapshot_preview1.poll_oneoff is required”

All of these were compiled from their git code on 02/21/2021, in release mode, with the exception of NodeJS, that was installed using the stock precompiled packages. Latest stable Rust, latest Xcode, everything was up-to-date.

All the benchmarks are included in libsodium 1.0.18-stable released on 02/21/2021, and were run using the ./dist/wasm32-wasi.sh --bench command. The system compiler was LLVM 11.1.0, installed via Homebrew, and the output was automatically pre-optimized by the previous script with wasm-opt -O4. The exact same resulting WASM files were used with all the runtimes, on macOS and Linux. Individual runtimes were called by that script.

In order for the comparison between WebAssembly and native code to remain fair and representative of real-world performance, WebAssembly and native builds were compiled with the same, default optimization flags.

Compilers (JIT & AOT)

Good news: all the tests passed flawlessly on all runtimes. Previously, the same benchmark triggered bugs in some implementations, that have been fixed since.

WAVM remains the fastest WebAssembly runtime, beating the competition in every single test.

WAVM is based on LLVM. Intuitively, we may expect the LLVM backend of wasmer to perform pretty much the same, but WAVM was still about 15% faster than wasmer.

V8 (nodejs) being the second top performer was quite a surprise. WAVM and wasmer-llvm compile everything ahead of time, taking the time to apply as many optimizations as they can. V8 has two WebAssembly compilers: Liftoff, a single-pass compiler, and TurboFan, a massively parallel compiler that applies more aggressive optimizations. The combination provides instant start-up. Then, optimal speed is not going to be reached immediately, but I didn’t expect TurboFan to be that good after it kicks in.

Previous V8 libsodium benchmarks relied on wasmer-js, but as Node now includes an experimental WASI implementation, this is what was used here.

wasmer with the singlepass backend also did really well here. It performing almost as well as the JIT backend was unexpected, and I ran these benchmarks twice to make sure that I didn’t get something wrong. A look at individual tests shows the same pattern for each of them, with two minor exceptions. singlepass is also very fast to compile.

Another surprise is wasmtime performance. Remember that it uses cranelift, a very young code generator, written from scratch. Its performance is already not far from LLVM-based runtimes, which is a very impressive achievement.

lucet is the slowest WebAssembly runtime. But in the previous rounds, it was neck and neck with wasmtime.

So, what happened? lucet’s performance likely didn’t change. What is more likely to have happened is that wasmtime got a brutal speed boost after switching to the latest cranelift version and the new backends it comes with. lucet and wasmer-cranelift are bound to see the same performance boost once they also upgrade their cranelift dependency.

Trying to confirm that theory, I updated the cranelift dependency to the 0.23.0 version, and enabled the new-x64-backend feature flag to use the new backend. It didn’t make any significant difference, so there has to be something else that wasmtime does and that lucet and wasmer don’t.

Interpreters

Among the existing WebAssembly interpreters, two of them have good performance and WASI support: wasm3 and wamr (Intel’s micro-runtime).

wasm3 is a really great piece of code. It is just a bunch of small, portable, zero-dependencies C code, that can easily be embedded into any project, including iOS applications. It can also itself be compiled to WebAssembly, which is pretty cool. System requirements: ~64Kb for code and ~10Kb RAM. This is ridiculously low, making wasm3 a decent choice for constrained environments. I also found it to be very convenient for debugging WebAssembly linking issues.

WAMR or iamr is another small runtime, developed by Intel. It is slightly more complicated to use, as it requires a specific toolchain to be compiled and has various features that can be chosen at compile-time. It can also leverage LLVM to provide JIT and AOT capabilities, which is a little bit disturbing and probably not what you would use such a project for. However, it just got a “fast interpreter”, so I had to give it a spin.

So, where are we?

Native code is about 30 times faster than wasm3. Let’s face it: this is quite good! We’re talking about an interpreter here. Versus native code highly optimized for a target CPU. And a WebAssembly transform in the way, requiring the interpreter to do bounds checking everywhere.

wamr was slower than wasm3 on every single test. Native code is about 46 times faster. Which remains decent for an interpreter.
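To put these two ratios on the same scale as the “percentage of native speed” figures, a slowdown factor simply has to be inverted. The helper below is a trivial sketch; `fraction_of_native` is a name made up for this illustration, and the only inputs are the two factors quoted above.

```python
def fraction_of_native(slowdown):
    """Convert "native code is N times faster" into a share of native speed."""
    return 1.0 / slowdown


for name, slowdown in (("wasm3", 30), ("wamr", 46)):
    print(f"{name}: {fraction_of_native(slowdown):.1%} of native speed")
```

This prints 3.3% for wasm3 and 2.2% for wamr.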

Battle of the LLVMs

SSVM (Second State VM) is an LLVM-based runtime focused on performance. It comes as two separate applications: a compiler, that generates a native shared library, and a runtime, that loads a library precompiled with the former tool. Like wasmer, it can insert “gas counting” operations, which is useful for smart contracts.
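For readers not familiar with gas counting, the idea is to charge a budget as instructions execute, and to abort once the budget is exhausted. Here is a toy Python model of it. It is an interpreter rather than compiled code, and it has nothing to do with how SSVM or wasmer actually implement the feature: the `run` function, the `OutOfGas` exception, the three-instruction language and the flat cost of one unit per instruction are all assumptions made for this illustration.

```python
class OutOfGas(Exception):
    """Raised when a program exceeds its gas budget."""


def run(program, gas_limit):
    """Execute a tiny stack program, charging one gas unit per instruction.

    Returns (result, remaining_gas). Hypothetical sketch: real runtimes
    typically insert the accounting into the generated code instead.
    """
    gas = gas_limit
    stack = []
    for op, *args in program:
        gas -= 1  # assumption: every instruction costs exactly one unit
        if gas < 0:
            raise OutOfGas(f"budget of {gas_limit} units exhausted")
        if op == "push":
            stack.append(args[0] & 0xFFFFFFFF)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)  # wrap like a 32-bit integer
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack.pop(), gas


program = [("push", 2), ("push", 3), ("add",)]
print(run(program, gas_limit=10))
```

With a budget of 10 units, the three instructions above leave 7 units; with a budget of 2, the `add` instruction is never executed and `OutOfGas` is raised instead.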

Unfortunately, SSVM only supports Linux/x86_64. So, I couldn’t run it on my laptop along with the other tests. I thus conducted another set of benchmarks, on a Linux/x86_64 machine, to compare LLVM-based runtimes.

Although wasmer’s LLVM backend is consistently faster than wasmer’s other backends, the resulting code is not the fastest.

wavm remains undethroned, but SSVM comes really close.

Unfortunately, there’s a catch. 5 of the tests crashed on SSVM with the following message:

ERROR [default] execution failed: unreachable, Code: 0x89

Understandably, SSVM is still a young project, so it may not be very stable yet.

ARM CPUs

The previous benchmarks were done on machines with an Intel CPU. But with ARM CPUs being omnipresent on mobile devices, and gaining a lot of traction on servers and laptops, I had to run quick benchmarks on an ARM-based CPU, too.

Libsodium (especially 1.0.18-stable) doesn’t have optimized code for ARM CPUs, so native code has little to no advantage over a WebAssembly equivalent.

This time, I used prepackaged software.

WAVM supports aarch64 CPUs, but doesn’t provide pre-compiled binaries, and compiling LLVM on my router would take forever. So, I had to pass on it for now.

lucet only supports x86_64, so, not an option either.

That leaves us with wasmtime, wasmer, node, and interpreters.

The llvm backend of wasmer failed with:

1: module instantiation failed (engine: jit, compiler: llvm)
2: Compilation error: unknown ELF relocation 283

native/jit didn’t work any better:

Compilation error: Architecture aarch64 not supported

singlepass maybe? Nope:

The `singlepass` compiler is not included in this binary.

The binary was installed via the curl command indicated on the project’s home page. Sorry Wasmer, but I have to give up at that point. So, only the cranelift backend could be tested.

On this platform, with benchmarks not having platform-specific optimizations, wasmtime runs the test suite at ~54% the speed of native code. Not bad at all!

Node is a little bit behind, running at around 49% the speed of native code.

Even with the same backend, wasmer is behind wasmtime, even on aarch64. wasmtime really got some nice performance improvements recently!

Of course, wasm3 runs fine on aarch64, but at about 3% of native speed.

Individual results

Individual results can be downloaded here. With minor exceptions, runtimes ranked the same in every test. The WebAssembly files used for these benchmarks can also be downloaded here. They all print their own execution time, excluding initialization.

Verdict

These new results are interesting. Amazing work has been done by all runtimes in the past few months, both to implement new features and to improve performance.

If you are looking for a general recommendation for running WebAssembly code in a headless environment, here we go:

-> Just use Node.

Not exactly what I was expecting. But Node is probably a good answer for most people. V8 has become a really damn good engine to run WebAssembly code, Node has a working WASI implementation, so there is absolutely nothing wrong in using Node, even just to run a pure WebAssembly application.

This is a boring and conservative choice. But it does the job. Node has the advantage of being already available in most operating systems/linux distros. Instant integration with JavaScript is also a big plus. And you’ll be only one step away from also running your code in a web browser or Electron, with comparable performance.

This advice may come a bit premature, though. As of today, WASI support in Node still has the “experimental” tag. But it’s there, and it works. It may not be the most memory-efficient solution, though. So, if this is a concern, you may want to look at alternatives.

Are you looking for the best possible performance to build a “serverless” infrastructure? WAVM may be a better choice, combined with “snapshots” of the linear memory made after initialization, as done in the excellent FAASM.
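The snapshot trick is easy to picture: run the expensive initialization once, keep a copy of the linear memory, and start every new instance from that copy. The Python sketch below only models the idea with byte arrays. It is not FAASM’s or WAVM’s API: `Instance`, `cold_start` and `warm_start` are names made up for this illustration, and the “initialization” is a placeholder that writes four bytes.

```python
class Instance:
    """A (hypothetical) module instance owning a private linear memory."""

    def __init__(self, memory):
        self.memory = bytearray(memory)  # private, mutable copy


def cold_start(size=65536):
    """Create an instance and run its (placeholder) initialization code."""
    inst = Instance(bytes(size))
    inst.memory[0:4] = b"INIT"  # stand-in for expensive start-up work
    return inst


def warm_start(snapshot):
    """Create an instance directly from a snapshot, skipping initialization."""
    return Instance(snapshot)


# Take the snapshot once, after initialization...
snapshot = bytes(cold_start().memory)

# ...then spawn as many pre-initialized, isolated instances as needed.
a = warm_start(snapshot)
b = warm_start(snapshot)
a.memory[0] = 0  # changes made by one instance don't leak into the other
print(bytes(b.memory[:4]))
```

A real implementation would rather map the snapshot copy-on-write than copy it, so that instances only pay for the pages they actually modify.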

For untrusted applications, and if you are into Rust, you may want to consider lucet and its unique ability to stop/resume/reschedule execution.

Are you looking for a one-runtime-to-rule-them-all thing for multiple components of your application, written in a variety of programming languages? wasmer will probably be the easiest to integrate.

Are you looking for a way to test the latest WebAssembly proposals? Check out wasmtime. Ditto if you need to run WebAssembly modules from Zig. wasmtime is fast, even on ARM CPUs, although I wish I could test on one of these fancy new Apple M1 machines.

Finally, if you need something simple, lightweight, that works everywhere, wasm3 is your friend.