Introduction and index of this series is here.
Someone at work posted a “Web Development With Assembly” meme as a joke, and I pulled out a “well, actually” card pointing to WebAssembly. At that point I just had to make my toy path tracer work there.
So here it is: aras-p.info/files/toypathtracer
Porting to WebAssembly
The “porting” process was super easy; I was quite impressed by how painless it was. Basically it was:
- Download & install the official Emscripten SDK, and follow the instructions there.
- Compile my source files, very similar to invoking clang on the command line, except the Emscripten compiler is emcc. This was the full command line I used:
emcc -O3 -std=c++11 -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s EXTRA_EXPORTED_RUNTIME_METHODS='["cwrap"]' -o toypathtracer.js main.cpp ../Source/Maths.cpp ../Source/Test.cpp
- Modify the existing code to make both threads & SIMD (two things that Emscripten/WebAssembly lack at the moment) optional. Was just a couple dozen lines of code starting here in this commit.
- Write the “main” C++ entry point file that is specific for WebAssembly, and the HTML page to host it.
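The “make threads & SIMD optional” step can be sketched roughly like this. It is a minimal illustration, not the code from the actual commit: the `TOY_USE_THREADS` / `TOY_USE_SIMD` / `TOY_HAVE_SSE` macros and the `Add4` / `ForEachRow` helpers are made-up names, and the only thing assumed about Emscripten is that `emcc` predefines `__EMSCRIPTEN__`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical feature switches (illustrative names, not the ones in the repository).
// emcc predefines __EMSCRIPTEN__, so the WebAssembly build turns both features off.
#if defined(__EMSCRIPTEN__)
#define TOY_USE_THREADS 0
#define TOY_USE_SIMD 0
#else
#define TOY_USE_THREADS 1
#define TOY_USE_SIMD 1
#endif

#if TOY_USE_THREADS
#include <thread>
#endif

#if TOY_USE_SIMD && (defined(__SSE__) || defined(_M_X64))
#include <xmmintrin.h>
#define TOY_HAVE_SSE 1
#else
#define TOY_HAVE_SSE 0
#endif

// Adds two groups of 4 floats: one SSE add when available, scalar loop otherwise.
inline void Add4(const float* a, const float* b, float* out)
{
#if TOY_HAVE_SSE
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}

// Calls rowFunc(y) for every y in [0, rowCount): spread over threads when enabled,
// plain serial loop otherwise. rowFunc must be safe to call concurrently for different rows.
template <typename RowFunc>
void ForEachRow(int rowCount, RowFunc rowFunc)
{
    assert(rowCount >= 0);
#if TOY_USE_THREADS
    int threadCount = (int)std::thread::hardware_concurrency();
    if (threadCount > rowCount) threadCount = rowCount;
    if (threadCount < 1) threadCount = 1;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        // Thread t handles rows t, t + threadCount, t + 2 * threadCount, ...
        workers.emplace_back([&rowFunc, t, rowCount, threadCount]() {
            for (int y = t; y < rowCount; y += threadCount)
                rowFunc(y);
        });
    }
    for (std::thread& w : workers)
        w.join();
#else
    for (int y = 0; y < rowCount; ++y)
        rowFunc(y);
#endif
}
```

The calling code stays the same on every platform; only the bodies behind the switches change, which is why this kind of change tends to be small.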
How to structure the main thing in C++ vs HTML? I basically followed the “Emscripting a C library to Wasm” doc by Google, and “Update a canvas from wasm” Rust example (my case is not Rust, but things were fairly similar). My C++ entry file is here (main.cpp), and the HTML page is here (toypathtracer.html). All pretty simple.
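To give an idea of the shape of such an entry file, here is a small sketch. It is not the actual main.cpp: the `toy_create_buffer` / `toy_render_frame` exports and the `TOY_EXPORT` macro are hypothetical names, and a placeholder gradient stands in for the path tracer. The real Emscripten pieces it relies on are `emscripten.h` and `EMSCRIPTEN_KEEPALIVE`, which keeps a function from being stripped so the page can reach it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__EMSCRIPTEN__)
#include <emscripten.h>
// extern "C" avoids C++ name mangling; KEEPALIVE keeps the function in the output.
#define TOY_EXPORT extern "C" EMSCRIPTEN_KEEPALIVE
#else
#define TOY_EXPORT extern "C"
#endif

// RGBA8 pixels living in WebAssembly memory; the page reads them to fill a canvas.
static std::vector<uint8_t> g_pixels;
static int g_width = 0, g_height = 0, g_frame = 0;

// Hypothetical export: allocates the image and returns a pointer the page can read from.
TOY_EXPORT uint8_t* toy_create_buffer(int width, int height)
{
    assert(width > 0 && height > 0);
    g_width = width;
    g_height = height;
    g_frame = 0;
    g_pixels.assign((size_t)width * (size_t)height * 4, 255);
    return g_pixels.data();
}

// Hypothetical export: renders one frame into the buffer.
// A simple gradient is used here in place of the actual path tracing.
TOY_EXPORT void toy_render_frame()
{
    for (int y = 0; y < g_height; ++y)
    {
        for (int x = 0; x < g_width; ++x)
        {
            uint8_t* p = &g_pixels[((size_t)y * g_width + x) * 4];
            p[0] = (uint8_t)(g_width > 1 ? x * 255 / (g_width - 1) : 0);   // red: left to right
            p[1] = (uint8_t)(g_height > 1 ? y * 255 / (g_height - 1) : 0); // green: top to bottom
            p[2] = (uint8_t)((g_frame * 8) & 255);                         // blue: changes per frame
            p[3] = 255;                                                    // opaque alpha
        }
    }
    ++g_frame;
}
```

On the HTML side, the page would wrap these functions with `cwrap` (which is why it is listed in `EXTRA_EXPORTED_RUNTIME_METHODS` in the command line above), call the render function once per animation frame, and copy the bytes at the returned pointer out of the WebAssembly heap into an `ImageData` for the canvas. Exactly how the heap is exposed to the page depends on the Emscripten version.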
And that’s basically it!
Ok how fast does it run?
At the moment WebAssembly does not have SIMD, and does not have “typical” (shared memory) multi-threading support.
The Web almost got multi-threading at the start of 2018, but then Spectre and Meltdown happened, and threading got promptly turned off. As soon as you have the ability to run fast atomic instructions on a thread, you can build a really high precision timer, and as soon as you have a high precision timer, you can start measuring things that reveal what got into the CPU caches. Having “just” that is enough to start building basic forms of these attacks.
By now the whole industry (CPU, OS, browser makers) has scrambled to fix these vulnerabilities, and threading might be coming back to the Web soon. However, at this time it’s not enabled by default in any browser yet.
All this means that the performance numbers of WebAssembly will be substantially lower than other CPU implementations – after all, it will be running on just one CPU core, and without any of the SIMD speedups we have done earlier.
Anyway, the results I have are below (in Mray/s; higher numbers are better). You can try it yourself at aras-p.info/files/toypathtracer
|Device|OS|Browser|Mray/s|
|---|---|---|---|
|Intel Core i9 8950HK 2.9GHz (MBP 2018)|macOS 10.13|Safari 11|5.8|
|Intel Xeon W-2145 3.7GHz|Windows 10|Chrome 70|5.3|
|AMD ThreadRipper 1950X 3.4GHz|Windows 10|Firefox 64|4.7|
|iPhone XS / XR (A12)|iOS 12|Safari|4.4|
|iPhone 8+ (A11)|iOS 12|Safari|4.0|
|iPhone SE (A9)|iOS 12|Safari|2.5|
|Galaxy Note 9 (Snapdragon 845)|Android 8.1|Chrome|2.0|
|iPhone 6 (A8)|iOS 12|Safari|1.7|
For reference, if I turn off threading & SIMD in the regular C++ version, I get 7.0 Mray/s on the Core i9 8950HK MacBook Pro. So WebAssembly at 5.1-5.8 Mray/s (roughly 73-83% of that) is slightly slower, but not “a lot”. Is nice!
All code is on github at