Frank DENIS random thoughts.

Updated WebAssembly benchmark using libsodium

WebAssembly runtimes keep improving, and quite a few things have changed since I last benchmarked them using libsodium.

So maybe now is a good time for a new run.

People suggested running wasm-opt, from Binaryen, on the WebAssembly code before compiling it. That step has been added.

This benchmark can be reproduced using the dist-build/wasm32-wasi.sh --bench command from libsodium 1.0.18-stable.
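For readers who want to try the wasm-opt step on their own modules, here is a minimal sketch in Python. It only builds and runs a `wasm-opt` command line; the `-O3` level and the file names are assumptions for illustration, not necessarily what the libsodium build script uses.

```python
import shutil
import subprocess


def wasm_opt_command(src, dst, level="-O3"):
    """Build the wasm-opt command line (the optimization level is an assumption)."""
    return ["wasm-opt", level, src, "-o", dst]


def optimize(src, dst, level="-O3"):
    """Run wasm-opt if it is installed; return None when it is not on PATH."""
    cmd = wasm_opt_command(src, dst, level)
    if shutil.which(cmd[0]) is None:
        return None  # Binaryen is not installed, nothing to do
    # check=True raises CalledProcessError if wasm-opt reports an error
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    print(" ".join(wasm_opt_command("test.wasm", "test.opt.wasm")))
```

The optimized module is then what gets handed to each runtime's compiler.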

Compared runtimes:

  • WAVM 2019-10-21 (default configuration)
  • Lucet 0.3.0 (with --opt-level 2)
  • Wasmtime 0.2.0 (with optimizations turned on)
  • Wasmer 0.8.0 and 0.9.0 (Singlepass/Cranelift/LLVM backends)

Results:

  • WAVM beats everything else, hands down.
  • Wasmer’s singlepass backend compiles quickly, but, as expected, is consistently way slower than other backends and runtimes.
  • Even though Wasmtime, Lucet and Wasmer/Cranelift use the same code generator (Cranelift), their performance is not identical. Wasmtime is the slowest, followed by Lucet, which is neck and neck with Wasmer. Different versions of Cranelift and different WASI implementations may explain the difference.
  • I was expecting Wasmer with its LLVM backend to perform the same as WAVM, also based on LLVM. Before version 0.9, Wasmer was noticeably slower, but they quickly found and addressed the issue. It’s now pretty close to WAVM, although the latter remains the fastest runtime to date.

WAVM is not only a speed demon, it is also the most featureful implementation yet, fully supporting WebAssembly 1.0, WASI, and more extensions than any other runtime (SIMD, threads, reference types, exceptions, sign extension, multiple memories…).

It is also quite portable, and comes with an extensive C and C++ API to be easily embedded in other projects. Its development pace is also quite impressive.

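Results per test, normalized so that the slowest entry in each row is 100.0 (lower is faster). FAIL marks a test that failed on that runtime.

Test lucet native wasmer wasmer-llvm wasmer-singlepass wasmer09-llvm wasmtime wavm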
aead_chacha20poly1305 50.68 17.66 52.68 57.74 100.0 34.52 56.84 25.01
aead_chacha20poly13052 51.9 21.27 56.76 58.7 100.0 33.44 54.22 26.81
aead_xchacha20poly1305 43.29 18.24 44.77 47.83 100.0 33.23 46.23 23.15
auth 58.19 13.94 51.32 40.43 100.0 30.93 57.62 17.69
auth2 60.55 18.59 72.86 46.23 100.0 37.06 67.21 21.11
auth3 77.19 19.61 65.79 47.56 100.0 39.97 69.8 21.8
auth5 61.82 17.32 58.98 43.61 100.0 32.16 66.07 22.48
auth6 56.26 28.91 67.22 44.94 100.0 44.29 67.68 36.65
auth7 54.81 17.32 58.92 51.92 100.0 33.05 67.62 21.01
box 61.35 24.33 75.08 55.27 100.0 42.55 71.76 32.02
box2 43.67 15.02 44.35 39.35 63.88 27.35 100.0 20.63
box7 62.14 11.76 69.53 54.42 100.0 42.56 78.37 32.69
box8 61.09 11.7 68.11 53.53 100.0 42.28 72.2 32.24
box_easy 61.3 11.68 71.99 53.05 100.0 42.42 70.63 32.27
box_easy2 50.25 11.86 62.67 55.21 100.0 49.02 65.63 31.18
box_seal 62.32 15.15 73.48 51.39 100.0 40.3 76.2 30.29
box_seed 85.53 25.57 74.55 52.58 100.0 41.89 76.41 28.12
chacha20 100.0 34.54 98.8 100.0 FAIL 61.89 96.21 55.55
codecs 62.4 28.16 62.47 55.39 100.0 38.23 60.43 31.23
core1 64.52 58.06 58.06 58.06 100.0 51.61 54.84 45.16
core2 65.62 50.0 53.12 59.38 100.0 46.88 53.12 43.75
core3 55.23 20.87 64.18 43.18 100.0 35.92 66.21 25.26
core4 45.45 14.05 15.7 100.0 50.41 12.4 15.7 11.57
core5 90.62 56.25 56.25 84.38 100.0 43.75 56.25 43.75
core6 51.35 100.0 51.35 62.16 94.59 43.24 94.59 37.84
core_ed25519 60.41 17.45 66.05 54.28 100.0 41.59 69.34 33.44
core_ristretto255 61.25 18.49 67.56 55.1 100.0 42.78 70.09 33.04
ed25519_convert 63.7 17.26 70.9 54.26 100.0 42.6 74.24 31.19
generichash 49.32 15.7 53.74 51.27 100.0 32.09 59.02 24.11
generichash2 60.56 9.39 56.43 39.11 100.0 31.0 64.36 18.87
generichash3 43.3 8.57 50.28 37.04 100.0 28.96 53.18 17.33
hash 36.57 11.37 100.0 26.29 63.84 31.97 40.9 12.91
hash3 60.38 20.38 81.51 38.49 100.0 36.98 65.66 19.25
kdf 49.0 10.3 52.21 43.18 100.0 31.82 62.91 19.62
keygen 43.72 20.82 42.83 41.16 100.0 26.96 44.63 19.08
kx 61.74 16.01 83.51 53.01 100.0 42.23 72.26 32.83
metamorphic 55.01 16.99 60.13 46.22 100.0 34.07 61.75 24.02
onetimeauth 56.46 40.82 85.71 84.35 100.0 63.27 59.18 29.93
onetimeauth7 54.85 32.67 56.73 57.68 100.0 40.16 59.71 35.51
pwhash_argon2i 33.55 7.83 54.59 48.62 100.0 35.65 56.24 23.58
pwhash_argon2id 34.2 8.25 56.19 50.58 100.0 35.69 58.47 23.2
pwhash_scrypt 48.95 12.08 58.7 42.76 100.0 44.77 65.75 25.38
pwhash_scrypt_ll 48.33 12.87 59.23 42.54 100.0 44.57 66.02 25.52
randombytes 60.79 24.91 73.75 68.91 100.0 47.43 76.69 42.17
scalarmult 68.74 20.26 69.25 51.67 100.0 43.26 72.77 30.28
scalarmult2 67.57 25.68 72.97 50.38 100.0 41.94 80.5 28.28
scalarmult5 63.56 12.17 71.6 55.31 100.0 44.59 74.73 33.27
scalarmult6 62.56 12.19 70.3 54.95 100.0 42.47 69.93 34.82
scalarmult7 63.78 11.96 69.79 54.34 100.0 44.03 72.6 32.58
scalarmult8 62.61 18.24 67.83 54.28 100.0 42.45 71.44 32.5
scalarmult_ed25519 61.73 15.05 71.44 53.44 100.0 42.32 73.14 32.38
scalarmult_ristretto255 62.49 14.45 70.45 52.42 100.0 41.73 74.41 30.68
secretbox 51.36 26.89 70.09 65.26 100.0 43.96 58.01 33.08
secretbox2 29.99 15.22 28.95 32.76 100.0 24.68 29.18 15.22
secretbox7 51.9 16.02 55.52 56.37 100.0 37.45 58.1 31.02
secretbox8 53.14 38.03 56.43 58.82 100.0 39.44 58.66 34.84
secretbox_easy 55.1 23.9 56.36 57.67 100.0 37.83 64.6 46.08
secretbox_easy2 66.65 24.02 56.55 56.52 100.0 46.69 54.22 48.52
secretstream_xchacha20poly1305 89.79 31.7 96.58 100.0 FAIL 63.66 95.61 55.95
shorthash 44.2 34.98 54.44 62.63 100.0 42.32 45.39 32.94
siphashx24 51.06 34.89 48.23 64.54 100.0 36.88 43.26 35.6
sodium_utils 42.92 25.94 49.19 68.04 100.0 32.74 49.97 28.03
stream 55.44 17.01 62.42 45.77 100.0 37.0 66.03 26.29
stream2 54.48 20.16 62.28 42.64 100.0 35.7 65.33 24.53
stream3 49.07 47.22 69.44 51.85 100.0 52.78 64.81 31.48
stream4 49.1 24.55 74.73 55.96 100.0 49.46 51.62 27.08
verify1 39.32 14.05 45.45 38.39 100.0 28.03 53.47 22.78
xchacha20 78.6 26.12 90.47 72.59 FAIL 54.69 100.0 42.07

Download more readable results as a PDF file here: WebAssembly benchmark
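To get a single number per runtime out of a table like this one, the rows can be parsed and summarized. The following Python sketch is one way to do it, under a few assumptions: the rows are whitespace-separated as shown above, each row is scaled so that its slowest entry is 100.0 (lower is faster), and FAIL entries are simply skipped. The geometric mean is my choice of summary here, not something the benchmark script computes. Only three rows are included to keep the example short.

```python
from statistics import geometric_mean

HEADER = "Test lucet native wasmer wasmer-llvm wasmer-singlepass wasmer09-llvm wasmtime wavm"

# Three rows copied from the table above; paste more rows to extend the sample.
ROWS = """\
aead_chacha20poly1305 50.68 17.66 52.68 57.74 100.0 34.52 56.84 25.01
box2 43.67 15.02 44.35 39.35 63.88 27.35 100.0 20.63
chacha20 100.0 34.54 98.8 100.0 FAIL 61.89 96.21 55.55
"""


def parse_table(header, rows):
    """Return {test: {runtime: score or None}}; None stands for FAIL."""
    runtimes = header.split()[1:]
    table = {}
    for line in rows.splitlines():
        if not line.strip():
            continue
        test, *cells = line.split()
        table[test] = {
            name: None if cell == "FAIL" else float(cell)
            for name, cell in zip(runtimes, cells)
        }
    return table


def summarize(table):
    """Geometric mean of the scores for each runtime, ignoring FAIL entries."""
    runtimes = next(iter(table.values())).keys()
    return {
        name: geometric_mean(
            [row[name] for row in table.values() if row[name] is not None]
        )
        for name in runtimes
    }


def ranking(table):
    """Runtimes sorted from fastest (lowest score) to slowest."""
    summary = summarize(table)
    return sorted(summary, key=summary.get)


if __name__ == "__main__":
    table = parse_table(HEADER, ROWS)
    for name in ranking(table):
        print(f"{name:20s} {summarize(table)[name]:6.2f}")
```

Skipping FAIL entries flatters a runtime that fails on expensive tests, so the summary is best read alongside the full table rather than instead of it.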

If speed is your primary concern, choosing WAVM should be a no-brainer.

On the other hand, Wasmer’s ecosystem is second to none, already providing excellent libraries for many programming languages and applications (PostgreSQL!) and a well-designed package system.

Wasmtime can generate debugging information, ready to use with LLVM. This makes it a great choice for development.

And Lucet was designed to run multiple instances on the same thread. To do so, it provides a way for an application to pause (yield), so that it can be resumed later by the host. That feature seems to be missing from other runtimes.
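The pause/resume idea can be illustrated with plain Python generators. This is only an analogy, not Lucet's API: `guest` and `run_round_robin` are hypothetical names, and a real runtime pauses compiled WebAssembly code rather than a Python function. The sketch shows a host resuming several paused instances in turn on a single thread.

```python
from collections import deque


def guest(name, steps):
    """Hypothetical guest instance: does a bit of work, then pauses."""
    for i in range(steps):
        # Hand a value to the host and pause until the host resumes us
        yield f"{name}: step {i}"


def run_round_robin(instances):
    """Host loop: resume each paused instance in turn, all on one thread."""
    queue = deque(instances)
    log = []
    while queue:
        instance = queue.popleft()
        try:
            log.append(next(instance))  # resume until the next pause
        except StopIteration:
            continue  # this instance has finished; drop it
        queue.append(instance)  # still running; schedule it again
    return log


if __name__ == "__main__":
    for line in run_round_robin([guest("a", 2), guest("b", 1)]):
        print(line)
```

Because the host decides when to resume each instance, no extra threads are needed to interleave them.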