diff --git a/content/fmt2.dj b/content/fmt2.dj index 5399a67..8fe4649 100644 --- a/content/fmt2.dj +++ b/content/fmt2.dj @@ -19,7 +19,8 @@ So I started thinking about how I could improve on that. I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\ Until today. -This write-up describes this alternative version of the library in detail. +This write-up describes an improved version of the library in detail, with support for positional arguments and a much smaller generated code size! +I hope you like it. ## Usage @@ -63,7 +64,7 @@ assert(strcmp( ) == 0); ``` -Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, and never allows reading arguments out of bounds. +Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, returns the number of characters that would be written, and never allows reading arguments out of bounds. As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`). This library does the same thing, though using the new format string syntax. @@ -153,10 +154,10 @@ This trick with a template erasing `T*` into `void*` is actually really useful i If there's any knowledge worth remembering from this article, it would be this technique. Aside from that, we once again make use of parameter packs. -This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers). +This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers). 
-Among the `const` soup, you may notice the `ffuncs` array being `static const`. -This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack. +Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`. +This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack. Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered. @@ -300,9 +301,14 @@ The same [extra goodies][page:fmt#Extras] as in the previous post can be used (i This library is slightly larger than the previous, being 73 lines of code long. I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though. -I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries. +...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't! +I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf. +Go with the extra lines of code. +This version of the library is better. +It's in the same ballpark either way, and honestly I only really flaunted the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\ And `printf`. + Don't forget about `printf`. 
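The type-erasure trick and the `static const` `ffuncs` table discussed above are the core of the new design. Below is a minimal, self-contained sketch of how the pieces could fit together. It is an illustration, not the library's actual code. `String_Buffer`, the two `write_value` overloads, the `Format_Func` alias, and the hole syntax (`%` followed by a single 0-based digit) are all assumptions. Only the names `format`, `format_untyped`, `write_value_erased`, and `ffuncs` are borrowed from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical output buffer: never writes past `cap`, but keeps counting
// how many characters *would* have been written.
struct String_Buffer {
    char* data;
    std::size_t cap;      // must be at least 1, for the terminating '\0'
    std::size_t len = 0;  // characters requested so far (may exceed cap)

    void write(const char* s, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i, ++len) {
            if (len + 1 < cap) data[len] = s[i];  // keep the last slot for '\0'
        }
    }
    void finish() { data[len < cap ? len : cap - 1] = '\0'; }
};

// Hypothetical per-type printers; a real library would have many more.
void write_value(String_Buffer& buf, const char* s) { buf.write(s, std::strlen(s)); }
void write_value(String_Buffer& buf, int v) {
    char tmp[16];
    int n = std::snprintf(tmp, sizeof tmp, "%d", v);
    buf.write(tmp, static_cast<std::size_t>(n));
}

// Every erased printer has the same signature, so they can share one table.
using Format_Func = void (*)(String_Buffer&, const void*);

// Type erasure: the template recovers the concrete `T` from the `const void*`.
template <typename T>
void write_value_erased(String_Buffer& buf, const void* value) {
    write_value(buf, *static_cast<const T*>(value));
}

// Non-template worker: a single copy in the executable, shared by all calls.
void format_untyped(String_Buffer& buf, const char* fstr, std::size_t count,
                    const Format_Func* ffuncs, const void* const* values) {
    while (*fstr) {
        if (fstr[0] == '%' && fstr[1] >= '0' && fstr[1] <= '9') {
            std::size_t index = static_cast<std::size_t>(fstr[1] - '0');
            if (index < count)
                ffuncs[index](buf, values[index]);  // bounds-checked lookup
            else
                buf.write(fstr, 2);  // no such argument: print the hole verbatim
            fstr += 2;
        } else {
            buf.write(fstr, 1);
            ++fstr;
        }
    }
}

// Thin template: only builds the two tables, then hands off.
template <typename... Args>
std::size_t format(String_Buffer& buf, const char* fstr, const Args&... args) {
    assert(buf.cap > 0);
    // One table per combination of argument types, emitted as read-only data.
    // The trailing nullptr only avoids a zero-length array when Args is empty.
    static const Format_Func ffuncs[] = {&write_value_erased<Args>..., nullptr};
    // Pointers to this call's arguments, built on the stack.
    const void* const values[] = {&args..., nullptr};
    format_untyped(buf, fstr, sizeof...(Args), ffuncs, values);
    buf.finish();
    return buf.len;
}
```

Here is a worked example. Formatting `"Part #%0 (%1), missing: %2"` with the arguments `3` and `"wheel"` into a 64-byte buffer should yield `Part #3 (wheel), missing: %2` and return 28. The out-of-range `%2` is copied through unchanged. Because `format_untyped` is a plain function, every instantiation of `format` shrinks to building two small tables, while the parsing loop exists exactly once.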
@@ -322,20 +328,241 @@ If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-b ### Code size The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly. -Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values. + +The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values. This is a good thing for embedded use cases. You should prefer this implementation over the previous one for that. -I couldn't get clang to inline references to `write_value` into the function table, though. -There's always an intermediary function generated from the instantiation of `write_value_erased`. +In my game, the space taken up in the executable by instantiations of `format` is *61.7%* of what the previous version needed! +Read on for a detailed analysis. -This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased`, `write_value_erased`, etc. +--- -I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard. -You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it. +To illustrate this, let's look at an example. 
+I have a function `usages` which uses `format` in a few different ways. -It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either. +```cpp +void usages( + String_Buffer& buf, + const char* filename, + const char* part_name, int part_index, + Entity_Id entity_id, Vec3 position) +{ + format(buf, "/prop/{}", filename); + format(buf, "Part #{} ({})", part_index, part_name); + format(buf, "{} at {}", entity_id, position); +} +``` + +This will generate three separate template instantiations of `format`: + +```cpp +void format(String_Buffer&, char const*, char const* const&) +void format(String_Buffer&, char const*, int const&, char const* const&) +void format(String_Buffer&, char const*, Entity_Id const&, Vec3 const&) +``` + +Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example. +Each instantiation, after inlining, results in code that looks like this: + +```cpp +void format( + String_Buffer& buf, + char const* fstr, + int const& a1, + char const* const& a2) +{ + if (next_hole(buf, fstr)) + write_value(buf, a1); + if (next_hole(buf, fstr)) + write_value(buf, a2); + while (next_hole(buf, fstr)) {} +} +``` + +From the machine code perspective, this comes out to 120 bytes of code. +This doesn't seem like a lot, but it quickly multiplies when you consider that format function calls are going to end up having different sets of argument types. + +My game currently has about 16k lines of code, and while there's barely any user-facing text in it yet (most of it is logs and ImGui strings), there are quite a few unique instantiations of `format` (35, to be exact). +Here's the full list with their byte sizes (click the *bold* header to unfold the list). 
+ +:::: details + +::: summary + +List of `fmt::format` instantiations sorted by byte size + +::: + +::: details-content + +```cpp +0x5c <> +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x7c +0x7c +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0xb0 +0xc0 +0xc0 +0xc0 +0xc0 +0xc0 +0xd2 +0xd2 +0xf2 +0xf2 +``` + +::: + +:::: + +Summing it all up, that's 5164 bytes of machine code. +For 35 unique combinations of arguments! + +Now, let's replace the previous function with the new one. +Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings---but the format strings themselves do not affect the machine code size, so that's fine. + +First, there's the static data for the function lookup tables. + +:::: details + +::: summary + +List of lookup tables from instantiations of the new `fmt::format` + +::: + +::: details-content + +```cpp +0x00 <> +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x18 +0x18 +0x18 +0x18 +0x18 +0x18 +0x20 +0x20 +0x28 +0x28 +``` + +::: + +:::: + +This comes out at 576 bytes total, obviously with tables for more arguments taking up more space. + +In an embedded setting, the tables will likely be a lot smaller, because a 32-bit or 16-bit address space means 2× or 4× smaller pointers. + +Now, for the functions themselves. +Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post. 
+ +:::: details + +::: summary + +List of instantiations of the new `fmt::format` + +::: + +::: details-content + +```cpp +0x3f <> +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x4e +0x4e +0x4e +0x4e +0x4e +0x4e +0x53 +0x53 +0x5d +0x5d +``` + +::: + +:::: + +That's 2611 bytes of machine code, and summing it up with the space taken up by lookup tables, comes out at 3187 bytes in the executable. +That's *61.7%* the size of the previous version---quite a hefty save! + +And this would only multiply in larger codebases. +Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine... ### C version @@ -348,3 +575,6 @@ The worst part would probably be emulating the parameter packs, because the prep Maybe in another post. +--- + +Thank you once again to my friend Tori for reviewing a draft of this post! diff --git a/src/html/djot.rs b/src/html/djot.rs index 45a6f8d..5e475de 100644 --- a/src/html/djot.rs +++ b/src/html/djot.rs @@ -311,6 +311,9 @@ impl<'a> Writer<'a> { } match c { + Container::Heading { id, .. } => { + write!(out, r##">"##)?; + } Container::TableCell { alignment, .. } if !matches!(alignment, Alignment::Unspecified) => { @@ -445,7 +448,7 @@ impl<'a> Writer<'a> { } out.push_str("

"); } - Container::Heading { level, .. } => write!(out, "")?, + Container::Heading { level, .. } => write!(out, "
")?, Container::TableCell { head: false, .. } => out.push_str(""), Container::TableCell { head: true, .. } => out.push_str(""), Container::Caption => out.push_str(""), diff --git a/static/css/doc.css b/static/css/doc.css index dbcfa38..b4c6349 100644 --- a/static/css/doc.css +++ b/static/css/doc.css @@ -56,6 +56,12 @@ main.doc { grid-column: main; } + & hr, + & pre, + & th-literate-program { + grid-column: left-code / right-wide; + } + & p { padding-top: 0.5lh; padding-bottom: 0.5lh; @@ -67,7 +73,8 @@ main.doc { padding-bottom: 0.5lh; } - & h3 { + & h3, + & h4 { margin: 0; padding-top: 0.5lh; padding-bottom: 0.5lh; @@ -90,7 +97,6 @@ main.doc { & pre, & th-literate-program { padding: 0.8rem var(--code-block-h-padding); - grid-column: left-code / right-wide; & code { --recursive-wght: 500; @@ -149,6 +155,69 @@ main.doc { text-align: center; } + & details { + /* I wanted this to work on grid layout, but currently it is impossible to set +
to display: grid; across all browsers. + Instead you have to include a element after the summary. */ + + --details-marker-size: var(--code-block-h-padding); + --details-indent-size: var(--code-block-h-padding); + + grid-column: left-code / right-wide; + + padding-top: 0.5lh; + padding-bottom: 0.5lh; + + & > summary { + display: flex; + flex-direction: row; + align-items: center; + + --recursive-wght: 600; + border-bottom: 1px solid var(--border-1); + + cursor: pointer; + + &::before { + content: ""; + + display: block; + width: calc(2 * var(--details-marker-size)); + height: calc(2 * var(--details-marker-size)); + flex-shrink: 0; + + background-image: var(--icon-expand); + background-position: 50% 50%; + background-repeat: no-repeat; + } + } + + &[open] > summary::before { + background-image: var(--icon-collapse); + } + + & > details-content { + display: grid; + grid-template-columns: + [indent] auto + [main] 1fr; + + border-bottom: 1px solid var(--border-1); + + &::before { + content: ""; + display: block; + width: 100%; + margin: 0 var(--details-indent-size); + border-left: 1px solid var(--border-1); + } + + & > * { + grid-column: main; + } + } + } + & .wide { grid-column: left-wide / right; } @@ -209,15 +278,18 @@ main.doc { & .doc-text { --code-block-grid-space: 0; - & pre, - & th-literate-program { + & details { + --details-marker-size: 1.6rem; + --details-indent-size: 1.6rem; + } + + & > pre, + & > th-literate-program { /* Stretch to whole page. This way of doing it feels a bit brittle, though. It might be good to refactor this to CSS grid at some point. 
*/ padding-left: var(--doc-padding); padding-right: var(--doc-padding); - margin-left: calc(var(--doc-padding) * -1); - margin-right: calc(var(--doc-padding) * -1); border-radius: 0; border-left: none; border-right: none; @@ -229,6 +301,20 @@ main.doc { } } + & > pre, + & > th-literate-program, + & > details { + margin-left: calc(var(--doc-padding) * -1); + margin-right: calc(var(--doc-padding) * -1); + } + + & > details { + & > summary, + & > details-content { + padding-right: var(--doc-padding); + } + } + & figure figcaption { &.overlay-bottom-right { position: static; diff --git a/static/css/main.css b/static/css/main.css index 4e77930..0a4bb77 100644 --- a/static/css/main.css +++ b/static/css/main.css @@ -376,6 +376,41 @@ a.secret { text-decoration: none; } +/* Links to headings should be invisible by default, only appearing on hover. */ + +h1, +h2, +h3, +h4, +h5, +h6 { + & > a { + color: var(--text-color); + text-decoration: none; + + &:visited { + color: var(--text-color); + } + + &:hover { + text-decoration: underline; + } + } +} + +@media (hover: none) { + h1, + h2, + h3, + h4, + h5, + h6 { + & > a { + text-decoration: underline; + } + } +} + /* Make blockquotes a bit prettier */ blockquote { @@ -500,17 +535,17 @@ section.feed { /* Titles */ & h2 { - & a, - & a:visited { + & a { color: var(--text-color); - } + text-decoration: underline; - & a:visited { - color: color-mix( - in srgb, - var(--background-color), - var(--text-color) 60% - ); + &:visited { + color: color-mix( + in srgb, + var(--background-color), + var(--text-color) 60% + ); + } } } @@ -718,8 +753,6 @@ h1.page-title { text-decoration: underline; text-decoration-color: transparent; - transition: var(--transition-duration) text-decoration-color; - &:hover { text-decoration-color: var(--text-color); }
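The code-size figures quoted in the `content/fmt2.dj` changes can be re-derived from the three unfolding size lists. The sketch below tallies each list at compile time as runs of (size, count). The grouping is read directly off the lists. The final 32-bit line rests on an assumption: as the multiples of `0x08` suggest, each lookup table holds one 8-byte pointer per argument.

```cpp
#include <cstddef>

// One run of identical entries from a size list: `count` items of `size` bytes.
struct Size_Group {
    unsigned size;
    unsigned count;
};

template <std::size_t N>
constexpr unsigned total_bytes(const Size_Group (&groups)[N]) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < N; ++i) sum += groups[i].size * groups[i].count;
    return sum;
}

template <std::size_t N>
constexpr unsigned total_count(const Size_Group (&groups)[N]) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < N; ++i) sum += groups[i].count;
    return sum;
}

// Old `format` instantiations (machine code).
constexpr Size_Group old_format[] = {
    {0x5c, 1}, {0x6c, 10}, {0x7c, 2}, {0x8e, 12},
    {0xb0, 1}, {0xc0, 5},  {0xd2, 2}, {0xf2, 2},
};
// New function lookup tables (read-only data).
constexpr Size_Group new_tables[] = {
    {0x00, 1}, {0x08, 12}, {0x10, 12}, {0x18, 6}, {0x20, 2}, {0x28, 2},
};
// New `format` instantiations (machine code).
constexpr Size_Group new_format[] = {
    {0x3f, 1}, {0x47, 12}, {0x49, 12}, {0x4e, 6}, {0x53, 2}, {0x5d, 2},
};

constexpr unsigned old_total = total_bytes(old_format);
constexpr unsigned new_total = total_bytes(new_tables) + total_bytes(new_format);

static_assert(total_count(old_format) == 35, "35 old instantiations");
static_assert(total_count(new_format) == 35, "35 new instantiations");
static_assert(old_total == 5164, "old machine code");
static_assert(total_bytes(new_tables) == 576, "lookup tables");
static_assert(total_bytes(new_format) == 2611, "new machine code");
static_assert(new_total == 3187, "new total");
// 3187 / 5164 = 0.6171..., i.e. the quoted 61.7% (617 tenths of a percent).
static_assert(new_total * 1000 / old_total == 617, "size ratio");
// Assumed 4-byte pointers on a 32-bit target would halve the tables.
static_assert(total_bytes(new_tables) / 2 == 288, "32-bit tables");
```

All totals match the figures in the post. On the 64-bit build, the tables account for 576 of the 3187 bytes. That is why narrower pointers on an embedded target would shrink the new version further.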