Couldn’t a tcc or similarly simple C compiler be used instead of a 100MB Clang? Where’s the C to wasm compiler hiding?
One issue with Wasm is you essentially can't target it with a single-pass compiler, unlike just about any real machine. Wasm can only represent reducible control flow, so you have to pass your control-flow graph through some variation of the Relooper[1,2]. I don't know if upstream tcc can do that (there are apparently some forks?..).
[1] http://troubles.md/why-do-we-need-the-relooper-algorithm-aga...
[2] https://medium.com/leaningtech/solving-the-structured-contro...
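Here is a small C illustration of the problem. It was written for this thread and is not taken from the linked posts. In `irreducible`, the `goto middle` creates a second entry into the `top`/`middle` loop, so the loop has two headers and its CFG is irreducible. Wasm's `loop` has exactly one entry point, so a compiler has to rewrite code like this first. The second function shows one common hand-written fix: a dispatch variable (`label`) inside a single loop, which is similar in spirit to how the Relooper handles such cases. Both function names are hypothetical.

```c
/* Irreducible: `goto middle` enters the loop at a second point. */
int irreducible(int n, int start_in_middle) {
    int i = 0, acc = 0;
    if (start_in_middle)
        goto middle;
top:
    acc += 1;
middle:
    acc += 10;
    if (++i < n)
        goto top;
    return acc;
}

/* Same behavior with a single-entry loop: `label` records which block
   should run next, so only structured control flow is needed. */
int structured(int n, int start_in_middle) {
    int i = 0, acc = 0;
    int label = start_in_middle ? 1 : 0;   /* 0 = top, 1 = middle */
    for (;;) {
        if (label == 0) {                  /* block "top" */
            acc += 1;
            label = 1;
        }
        acc += 10;                         /* block "middle" */
        if (++i < n) {
            label = 0;
            continue;
        }
        return acc;
    }
}
```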
> you essentially can't target it with a single-pass compiler,
That might be true if your source language has goto, but for other languages that start with structured control flow, it's possible to just carry the structure through and emit Wasm directly from the AST.
Sure, I was speaking in the context of C specifically. (In non-simplistic compilers, you may not want to preserve the source structure anyway—e.g. in Scheme or Lua with tail calls all over the place.)
Presumably C's `switch` is also a problem.
Yes. I don't recall offhand all the confusing details of what's allowed in `switch` statements in C, but here are a few brainfscks:
https://old.reddit.com/r/C_Programming/comments/16kg48y/mind...
https://old.reddit.com/r/programminghorror/comments/ylc7f3/w...
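The classic case is Duff's device, where `case` labels sit inside a `do`/`while` body. Each label is another entry into the loop. As with a `goto` into a loop, the result is irreducible and can't be mapped one-to-one onto Wasm's `block`/`loop`. Below is a minimal version for illustration. `duff_copy` is a made-up name, and unlike Duff's original, which wrote every value to a single memory-mapped register, this version advances `to`.

```c
/* Copy `count` ints from `from` to `to`, unrolled eight ways.
   The switch jumps into the middle of the do/while body. */
void duff_copy(int *to, const int *from, int count) {
    if (count <= 0)
        return;
    int n = (count + 7) / 8;      /* number of passes through the loop */
    switch (count % 8) {          /* first pass copies count % 8 items (or 8) */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```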
I went down a rabbithole and wow.
Apparently I found a comment from the author of https://github.com/stclib/STC, and then came across this example:
https://stackoverflow.com/a/76887723
`gcc -E -ISTC/include co.c`

After running it through the preprocessor, it gives me this.
This is true. In Theta (https://github.com/ThetaLang/Theta) this is exactly what we do -- no need for more than one pass for the WASM codegen.
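The following toy sketch in C makes the single-pass point concrete. It assumes a made-up AST with only constants, locals, `+`, `<`, assignment, sequencing and `while`. Because `while` is already structured, it maps directly onto Wasm's `block`/`loop`/`br_if` in one recursive walk, with no CFG or Relooper step. The output is a flat string in the WebAssembly text format. A real compiler would also track types, declare locals and usually emit the binary format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy AST: just enough node kinds to show the idea. */
typedef enum { N_CONST, N_GET, N_ADD, N_LT, N_SET, N_WHILE, N_SEQ } Kind;

typedef struct Node {
    Kind kind;
    int value;            /* N_CONST: literal; N_GET/N_SET: local index */
    struct Node *a, *b;   /* operands, condition/body, or sequence halves */
} Node;

/* Append one instruction to `out`, space-separated. */
static void emit(char *out, size_t cap, const char *s) {
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%s%s", len ? " " : "", s);
}

/* Single recursive pass: each node emits its code directly. */
static void gen(const Node *n, char *out, size_t cap) {
    char tmp[32];
    switch (n->kind) {
    case N_CONST:
        snprintf(tmp, sizeof tmp, "i32.const %d", n->value);
        emit(out, cap, tmp);
        break;
    case N_GET:
        snprintf(tmp, sizeof tmp, "local.get %d", n->value);
        emit(out, cap, tmp);
        break;
    case N_ADD:
        gen(n->a, out, cap); gen(n->b, out, cap); emit(out, cap, "i32.add");
        break;
    case N_LT:
        gen(n->a, out, cap); gen(n->b, out, cap); emit(out, cap, "i32.lt_s");
        break;
    case N_SET:
        gen(n->a, out, cap);
        snprintf(tmp, sizeof tmp, "local.set %d", n->value);
        emit(out, cap, tmp);
        break;
    case N_SEQ:
        gen(n->a, out, cap); gen(n->b, out, cap);
        break;
    case N_WHILE:
        /* while (a) b  =>  block loop <a> i32.eqz br_if 1 <b> br 0 end end */
        emit(out, cap, "block"); emit(out, cap, "loop");
        gen(n->a, out, cap); emit(out, cap, "i32.eqz"); emit(out, cap, "br_if 1");
        gen(n->b, out, cap); emit(out, cap, "br 0");
        emit(out, cap, "end"); emit(out, cap, "end");
        break;
    }
}

void compile_to_wat(const Node *root, char *out, size_t cap) {
    out[0] = '\0';
    gen(root, out, cap);
}
```

Take `while (x < 10) x = x + 1;` with `x` as local 0. For that input, the emitter produces `block loop local.get 0 i32.const 10 i32.lt_s i32.eqz br_if 1 local.get 0 i32.const 1 i32.add local.set 0 br 0 end end`. Inside the `loop`, `br 0` jumps back to the loop start, and `br_if 1` exits to the end of the enclosing `block`.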
If all you want to do is compile and run C code in the browser, you could run tcc in the blink x86_64 emulator, running in Wasm. It would take ~300 KB, less than the JS and CSS used by the average webpage.
The whole LLVM toolchain is a bit big. I think we can reduce the size much further. We actually looked into using tcc, but unfortunately tcc doesn’t have a Wasm backend (for generating Wasm output). It would be awesome if they added one!
Check out https://github.com/tyfkda/xcc. I've only used the native backend, but it's small and fast.
Nice! I didn’t know the project. Thanks for sharing!
This project is also very much worth checking out.
https://cranelift.dev/
From the page:
Cranelift is a fast, secure, relatively simple and innovative compiler backend. It takes an intermediate representation of a program generated by some frontend and compiles it to executable machine code. Cranelift is meant to be used as a library within an "embedder".
It is in successful use by the Wasmtime WebAssembly virtual machine, for just-in-time (JIT) and ahead-of-time (AOT) compilation, and also as an experimental backend for the Rust compiler.
Cranelift is an optimizing compiler, but it aims to take a fresh look at which optimizations are necessary. We have explicitly avoided features -- such as advanced alias analysis or use of undefined behavior -- that have historically led to subtle miscompilations in other compilers. Cranelift consists of about 200 thousand lines of code; in contrast, e.g. LLVM consists of over 20 million lines of code, a hundred times larger. This difference also allows Cranelift to be relatively approachable to developers, researchers, auditors and others who wish to understand how it works.
I recently wanted to use tcc for a home-baked programming side project and was surprised to find it's no longer supported, at least not by Fabrice Bellard. Upstream git still has some light activity but no releases. I wasn't sure how good an idea it is to rely on it as a code generator.
It's alive and kicking my friend https://repo.or.cz/tinycc.git/shortlog
We wait for grischka to decide when to announce a new release https://lists.nongnu.org/archive/html/tinycc-devel/2024-10/m...
I see thanks, that's great.
clang can target wasm already.
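A freestanding C file is enough to see this in action. The build command in the comment below is one possible invocation. It assumes a clang built with the WebAssembly target and the `wasm-ld` linker from LLD. Without a libc or WASI sysroot (which is what wasi-sdk provides), only code that needs no standard library will link.

```c
/* add.c: needs no libc. One possible build, assuming clang with the
   WebAssembly target and wasm-ld are installed:
     clang --target=wasm32 -nostdlib -Wl,--no-entry -Wl,--export=add -O2 -o add.wasm add.c
   This should yield a .wasm module exporting add(i32, i32) -> i32. */
int add(int a, int b) {
    return a + b;
}
```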
Very cool! I've been watching the "toolchains in Wasm" landscape for a while, and seeing a Clang/LLVM toolchain running in Wasm is awesome!
YoWASP has also had an LLVM toolchain working in Wasm for a while[1]. This version seems to solve the subprocess problem by providing an implementation of `posix_spawn`, whereas the YoWASP one uses patches to avoid subprocesses altogether.
My biggest open questions about this version concern runtime and platform support. As I understand it, this toolchain uses WASIX, which (AFAICT) works with Wasmer's own runtime and with a browser shim, but with none of the other runtimes. Are there plans to get WASIX adopted by more runtimes, or to bring WASIX up to date with the latest WASI standard (preview2)? Or, even better, to bring the features WASIX adds, like `posix_spawn`[2], into mainline WASI? I'd love to adopt this toolchain, but WASIX support doesn't seem to have caught on across the other runtimes.
[1]: https://discourse.llvm.org/t/rfc-building-llvm-for-webassemb...
[2]: https://github.com/WebAssembly/WASI/issues/414
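For context, `posix_spawn` is the POSIX call for starting a child process in one step. That is the kind of thing a compiler driver needs in order to launch separate tools such as the linker. Below is a minimal native sketch. It assumes a POSIX host with `sh` on the `PATH`, and `run_child` is a hypothetical helper. Whether the same code works inside Wasm depends on the runtime providing the call, which is the WASIX vs. WASI question raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn argv[0] (looked up on PATH), wait for it, and return its exit
   status, or -1 if it could not be spawned or did not exit normally. */
static int run_child(char *const argv[]) {
    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```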
A few weeks ago, I tried to compile Clang to WebAssembly but ran into several different errors. I fixed a lot of them, but some seemed impossible to fix, so I decided to try again at a later date. It seems I won't need to. I feel angry that someone made a convenient solution before I did, but also happy, because this probably means they have a repeatable process for compiling Clang to WASM.
It's pretty misleading not to mention the performance overhead. That's an obvious downside and quite easy to benchmark. Skipping any discussion of performance feels like sweeping it under the marketing rug :/
> Skipping any discussion of performance feels like sweeping it under the marketing rug
Expecting performance while compiling C in the browser feels redundant right now though.
You can compile C using JavaScript and target DOS if you are hardcore enough: https://github.com/Mati365/ts-c-compiler
Didn't Gary Bernhardt do this in 2014? /sarcasm
reference https://www.destroyallsoftware.com/talks/the-birth-and-death...
GCC? That's easy! :-) What about a complete system? https://webvm.io
Shameless plug: we are hosting a WebVM Hackathon next week (11-14 October) over Discord. For more information: https://cheerpx.io/hackathon
Can I compete with exaequOS?
No, the competition is explicitly based around CheerpX: an X86 virtualization technology built on top of WebAssembly
What about something that doesn't even require WebAssembly and is faster? https://bellard.org/jslinux/
Very unscientific benchmark of `clang hello.c`, after a few runs to make sure the code is downloaded/cached:
jslinux: 4.7s
wasmer: 1.3s
webvm: 1.2s
Is it possible (or does it already exist) to have interactive C++ lessons where the user's C++ code is compiled and run client-side in a web page?
Absolutely! You can even run clang in wasm targeting x86_64, and then emulate the resulting program using the blink x86_64 emulator.
I'm working on something similar, where students can compile Intel assembly and run it client-side: https://github.com/robalb/x86-64-playground
See: emception
https://github.com/jprendes/emception
Thanks, I'm looking at it, but the documentation is so scarce and I'm not very proficient in C.
What syntax can be used to run emception? Thank you.
It’s sadly more of a proof of concept than a hackable project. The Docker build in the README did work last time I tried, and there is a demo site at https://jprendes.github.io/emception/, but I’ve failed to modify it in the past to do other things.
There is a fork at https://github.com/emception/emception that is trying to make it more production-ready, but it looks like that may have stalled.
Definitely been possible for at least 5 years now. Would probably be a weekend project now.
Not really, on Firefox
If what I want is not an executable but a shared library, does this get me anything?
I currently have a use case that relies on a server running an Emscripten build (using `-sMODULARIZE` and some exports; I suppose it’s not a true dylib).
Importing a Wasm module from a Wasm module is (non)surprisingly impossible to do: you need a linker, an ABI and all that.
It is possible, provided you take some care. I was looking into this with WAForth, which compiles the Wasm and loads it via a host function (i.e. it is the host's responsibility to make it available). I wanted to enable dynamic loading of words from disk, which requires some bookkeeping and shuffling a bunch of bytes around during compilation to write out the bits the host needs to do that linking. It isn't impossible, just tedious, and in my case having to write it in WAT is a pain.
Yep, you need to do the nasty bits by hand, that's what I mean.
100MB on every page refresh just to compile C is a pretty bold direction to go in.
Except if/when it's cached.
I don’t want my cache requirements ballooning by 100mb.
> note: it requires a 100MB download
Is this how big a clang toolchain usually is?
The Clang WASI SDK weighs about 100Mb compressed. We optimized things a bit but still have a way to go (we are not yet compressing over the network). I believe we can serve everything in about 30Mb.
MB, right? Mb is megabit
I only have to bring this up because network providers still insist on measuring bits
They insist on it because it is the proper way to measure data rates on serial bit streams where out-of-band encoding doesn't divide up on octet boundaries.
They insist on it because bigger numbers sell better.
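The difference between the two units is a factor of eight, which matters for estimates like the 100 MB download mentioned above. Below is a small sketch of the arithmetic. It assumes decimal prefixes (1 MB = 10^6 bytes, 1 Mb = 10^6 bits) and an idealized link with no protocol overhead, latency or compression. The function names are made up.

```c
/* 1 byte = 8 bits and both prefixes are decimal, so the conversion
   is a single factor of 8. */
static double megabytes_to_megabits(double megabytes) {
    return megabytes * 8.0;
}

/* Idealized transfer time in seconds: ignores overhead, latency and compression. */
static double download_seconds(double megabytes, double link_mbit_per_s) {
    return megabytes_to_megabits(megabytes) / link_mbit_per_s;
}
```

At a hypothetical 100 Mbit/s, 100 MB is 800 Mb, or about 8 s. The hoped-for 30 MB would take about 2.4 s.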
Cling (the interactive C++ interpreter) should also compile to WASM.
There's a xeus-cling Jupyter kernel, which supports interactive C++ in notebooks: https://github.com/jupyter-xeus/xeus-cling
There's not yet a JupyterLite (WASM) kernel for C or C++.
What's the use case?
Like most things in software the use cases are the limits of one's imagination. The browser has always been a Turing complete development environment so this is just another demonstration.
I was also asking exactly the same question.
Every few years, progress like this reminds me of this talk by Gary Bernhardt:
https://www.destroyallsoftware.com/talks/the-birth-and-death...
Yeah, mostly because WebAssembly is the new kid in bytecode town.
Now all this needs is a simple OS running in a browser, that can edit and compile itself, post the resulting binary onto a WebDAV somewhere, and reload itself from there.
Then it becomes a fully self-sustaining OS that can live forever in a browser.
Something like exaequOS? https://exaequos.com
Check out Jeff Lindsay's Wanix project: https://wanix.sh/
Very interesting idea but I have to say that those goals are not possible with a simple OS, at least by OS definitions of simple :P
The old https://webassembly.sh/ and the new https://wasmer.sh/ have come a long way already.
All you need is a virtual filesystem of some sort, a way to download, a way to upload, an editor, a compiler, and a VT100 JS library. We already have WASI for the rest.
If JS is undesirable, then perhaps use the old framebuffer graphics approach (e.g. a region of Wasm memory that is interpreted as an ASCII screen, or maybe even as a full bitmap buffer). Then the JavaScript side just needs to forward keyboard/mouse input into memory and copy that screen region out of memory.
The framebuffer idea is used in this wasm doom port: https://github.com/diekmann/wasm-fizzbuzz/tree/main/doom
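Here is a minimal guest-side sketch of the framebuffer idea in C. It assumes an 80x25 character screen, and every name in it is hypothetical. The screen is just a byte array in linear memory, and `pending_key` is a slot that the host writes keystrokes into. When the code is compiled to Wasm, the host would read `SCREEN_ROWS * SCREEN_COLS` bytes starting at `screen_base()` out of the module's exported memory after each frame. Exporting these symbols and wiring up the JS side are not shown.

```c
#include <string.h>

#define SCREEN_COLS 80
#define SCREEN_ROWS 25

/* The "screen": plain bytes in linear memory, one char per cell. */
static char screen[SCREEN_ROWS][SCREEN_COLS];
static int cur_row, cur_col;

/* Host-facing: where the screen lives, and a slot for keystrokes. */
char *screen_base(void) { return &screen[0][0]; }
int pending_key = -1;   /* host writes a key code here; -1 means "none" */

void screen_clear(void) {
    memset(screen, ' ', sizeof screen);
    cur_row = cur_col = 0;
}

/* Move every row up by one and blank the last row. */
static void scroll_up(void) {
    memmove(screen, screen + 1, (SCREEN_ROWS - 1) * sizeof screen[0]);
    memset(screen[SCREEN_ROWS - 1], ' ', SCREEN_COLS);
}

/* Write one character, wrapping at the right edge and scrolling at the bottom. */
void screen_putc(char c) {
    if (c == '\n' || cur_col == SCREEN_COLS) {
        cur_col = 0;
        if (++cur_row == SCREEN_ROWS) {
            scroll_up();
            cur_row = SCREEN_ROWS - 1;
        }
        if (c == '\n')
            return;
    }
    screen[cur_row][cur_col++] = c;
}

void screen_puts(const char *s) {
    while (*s)
        screen_putc(*s++);
}

/* Consume the key the host left in memory, if any. */
int screen_poll_key(void) {
    int k = pending_key;
    pending_key = -1;
    return k;
}
```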
WASIX already does all the other stuff you mentioned, including in the browser. The one thing it's missing is GUI, mainly because there's no standard GUI interface in POSIX.
It is possible. I already embedded chibicc in exaequOS. I will continue with xcc and clang
And then use webrtc (or ideally someone can revive a webtransport-p2p please!) to serve itself from a page to other people.
Ideally http3 over webtransport-p2p!
Then add some network discovery so we can advertise & find what's available on our networks!
Do you have a proper link to the webtransport-p2p idea? I've done a few searches but I think there's some mix of current implementation and deprecated implementation somehow.
What is it that needs reviving?
The spec is inactive, AFAIK, with no implementations. I got it backwards, pardon: p2p-webtransport. https://github.com/w3c/p2p-webtransport
I don't know why it's fallen off, to be honest, or what was raised against it. It's highly desirable to a lot of p2p folks and a very promising replacement for WebRTC data transport.
"Yeah, yeah, but your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should." ....
"We do what we must
Because we can"