- I jot down various silly ideas for MuScript in the future here
% id = "01HA0GPJ8B05TABAEC9JV73VRR"
+ parallelization with an event loop
% id = "01HA0GPJ8B48K60BWQ2XZZ0PB5"
+ the thing with a pass-based architecture is that with enough locking, it *may* be easy to parallelize.
% id = "01HA0GPJ8B5QB6DF8YVTYC2HJY"
+ I can imagine parallelization existing on many levels here.
% id = "01HA0GPJ8BMQ8JAH09YGC3Y7VW"
- if you have a language like Zig where every line can be tokenized independently, you can spawn a task per line.
% id = "01HA0GPJ8BKGJ15YYNS9C1QRYB"
- you could analyze all the type declarations in parallel. though with dependencies it gets hairy.
% id = "01HA0GPJ8B2984NNB7E8X0Z1X7"
- you could analyze the contents of classes in parallel.
% id = "01HA0GPJ8BFYNKVA6Z923E2370"
- thing is, this stuff gets pretty hairy when you get to think about dependencies between all these different stages.
% id = "01HA0GPJ8B915YCDR0MZSW5RN1"
- hence why we haven't seen very many compilers that would adopt this architecture; most of them just sort of do their thing on one thread, and expect to parallelize by spawning more processes
of `cc` or `rustc` or what have you.
% id = "01HA0GPJ8BHF1KRM8KGFMT1875"
+ where with a pass-based architecture the problem is dependencies between independent stages you're trying to parallelize, with a query-based architecture like MuScript's it gets even harder,
because the entire compiler is effectively a dependency ~~hell~~ machine.
% id = "01HA0GPJ8B20D3MKV6TMB19GW2"
- one query depends on 10 subqueries, which all depend on their own subqueries, and so on and so forth.
% id = "01HA0GPJ8BA4T1C36R8WFJ3G2F"
- but there's _technically_ nothing holding us back from executing certain queries in parallel.
% id = "01HA0GPJ8B3FGTM417H46N0EXD"
- in fact, imagine if we executed _all_ queries in parallel.
% id = "01HA0GPJ8BENERTESAFQJ7G3R9"
+ enter: the `async` event loop compiler
% id = "01HA0GPJ8BF9E0MW1WS2EJY0J8"
- there is a central event loop that distributes tasks to be done to multiple threads
% id = "01HA0GPJ8B5WA232A319JTEFM2"
- every query function is `async`
% id = "01HA0GPJ8BQ15PT2M4BTA1ZNDX"
- meaning it suspends execution of the current function, writes back "need to compute the type ID of `Person` because that's not yet available" into a concurrent set (very important that it's a _set_) - let's call this set the "TODO set"
% id = "01HA0GPJ8B34GP8TD21AY8NXS1"
- on the next iteration, the event loop spawns a task for each element of the TODO set, and the tasks compute all the questions asked
% id = "01HA0GPJ8BKPHBPN44HM6WHKDJ"
- because we're using a set, the computation is never duplicated; remember that if an answer has already been memoized, it does not spawn a task and instead returns the answer immediately
% id = "01HA0GPJ8B0V2VJMAV1YCQ19Q8"
- though this may be hard to do with Rust because, as far as I know, there is no way to suspend a function conditionally? **(needs research.)**
% id = "01HA0GPJ8BBREEJCJRWPJJNR3N"
- once there are no more tasks in the queue, we're done compiling
+ this is because UnrealScript has some fairly idiosyncratic syntax which requires us to treat _some_ things in braces `{}` as strings, such as `cpptext`
% id = "01HAS9RREBQY6AWTXMD6DNS9DF"
- ```unrealscript
cpptext
{
// this has to be parsed as C++ code, which has some differing lexing rules
- we even lex variable metadata `var int Something <ToolTip=bah>;` using the lexer, storing invalid characters and errors as some `InvalidCharacter` token kind or something
+ and that's without emitting diagnostics - let the parser handle those instead
% id = "01HAS9RREBWZKAZGFKH3BXE409"
- one place where the current approach of the lexer eagerly emitting diagnostics fails is the case of `<ToolTip=3D location>`, where `3D` is parsed as a number literal with an invalid suffix and thus errors out