diff --git a/content/fmt2.dj b/content/fmt2.dj
index 5399a67..8fe4649 100644
--- a/content/fmt2.dj
+++ b/content/fmt2.dj
@@ -19,7 +19,8 @@ So I started thinking about how I could improve on that.
I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\
Until today.
-This write-up describes this alternative version of the library in detail.
+This write-up describes an improved version of the library in detail, with support for positional arguments, and much smaller generated code size!
+I hope you like it.
## Usage
@@ -63,7 +64,7 @@ assert(strcmp(
) == 0);
```
-Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, and never allows reading arguments out of bounds.
+Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, returns the number of characters that would be written, and never allows reading arguments out of bounds.
As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
This library does the same thing, though using the new format string syntax.
@@ -153,10 +154,10 @@ This trick with a template erasing `T*` into `void*` is actually really useful i
If there's any knowledge worth remembering from this article, it would be this technique.
Aside from that, we once again make use of parameter packs.
-This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
+This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
-Among the `const` soup, you may notice the `ffuncs` array being `static const`.
-This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack.
+Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`.
+This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack.
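To make the `static const` point a bit more concrete, here is a minimal sketch of the idea. The function table depends only on the argument _types_, so it can live in read-only data, while the value pointers change with every call and have to be built on the stack. Everything named `sketch_*` or `Sketch_*` is a simplified stand-in assumed for illustration, not the library's actual code. In particular, `sketch_format_untyped` just prints the values separated by spaces instead of parsing a format string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the library's buffer type.
struct Sketch_Buffer {
    std::string text;
};

// Hypothetical per-type printers, standing in for `write_value` overloads.
inline void sketch_write_value(Sketch_Buffer& buf, int value)
{
    buf.text += std::to_string(value);
}

inline void sketch_write_value(Sketch_Buffer& buf, const char* value)
{
    buf.text += value;
}

// Every erased printer shares this one function pointer type.
using Sketch_Format_Func = void (*)(Sketch_Buffer&, const void*);

// Erases `const T*` into `const void*` and casts it back before printing.
template <typename T>
void sketch_write_value_erased(Sketch_Buffer& buf, const void* value)
{
    sketch_write_value(buf, *static_cast<const T*>(value));
}

// Assumption: a real `format_untyped` would parse a format string here.
// This stand-in only walks both tables and separates values with spaces.
inline void sketch_format_untyped(
    Sketch_Buffer& buf,
    const Sketch_Format_Func* funcs,
    const void* const* values,
    std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0)
            buf.text += ' ';
        funcs[i](buf, values[i]);
    }
}

template <typename... Args>
void sketch_format(Sketch_Buffer& buf, const Args&... args)
{
    static_assert(sizeof...(Args) > 0, "this sketch needs at least one argument");

    // Depends only on the types in `Args`, so it can be a constant table
    // in the executable's read-only data.
    static const Sketch_Format_Func funcs[] = { sketch_write_value_erased<Args>... };

    // Depends on the actual arguments, so it is filled in on every call.
    const void* const values[] = { &args... };

    sketch_format_untyped(buf, funcs, values, sizeof...(Args));
}
```

Calling `sketch_format(buf, part_index, part_name)` with an `int` and a `const char*` builds a two-entry function table once, and a two-entry value table per call.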
Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.
@@ -300,9 +301,14 @@ The same [extra goodies][page:fmt#Extras] as in the previous post can be used (i
This library is slightly larger than the previous, being 73 lines of code long.
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.
-I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.
+...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't!
+I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf.
+Go with the extra lines of code.
+This version of the library is better.
+It's in the same ballpark either way, and honestly I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\
And `printf`.
+
Don't forget about `printf`.
@@ -322,20 +328,241 @@ If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-b
### Code size
The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
-Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
+
+The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
This is a good thing for embedded use cases.
You should prefer this implementation over the previous one for that.
-I couldn't get clang to inline references to `write_value` into the function table, though.
-There's always an intermediary function generated from the instantiation of `write_value_erased`.
+In my game, the size emitted into the executable for instantiations of `format` is *61.7%* of the previous version!
+Read on for a detailed analysis.
-This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased`, `write_value_erased`, etc.
+---
-I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard.
-You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it.
+To illustrate this a bit, let's look at an example.
+I have a function `usages` which uses `format` in a few different ways.
-It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either.
+```cpp
+void usages(
+ String_Buffer& buf,
+ const char* filename,
+ const char* part_name, int part_index,
+ Entity_Id entity_id, Vec3 position)
+{
+ format(buf, "/prop/{}", filename);
+ format(buf, "Part #{} ({})", part_index, part_name);
+ format(buf, "{} at {}", entity_id, position);
+}
+```
+
+This will generate three separate template instantiations of `format`:
+
+```cpp
+void format(String_Buffer&, char const*, char const* const&)
+void format(String_Buffer&, char const*, int const&, char const* const&)
+void format(String_Buffer&, char const*, Entity_Id const&, Vec3 const&)
+```
+
+Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example.
+Each instantiation, after inlining, results in code that looks like this:
+
+```cpp
+void format(
+ String_Buffer& buf,
+ char const* fstr,
+ int const& a1,
+ char const* const& a2)
+{
+ if (next_hole(buf, fstr))
+ write_value(buf, a1);
+ if (next_hole(buf, fstr))
+ write_value(buf, a2);
+ while (next_hole(buf, fstr)) {}
+}
+```
+
+From the machine code perspective, this comes out to 120 bytes of code.
+This doesn't seem like a lot, but it quickly multiplies when you consider that format function calls are going to end up having different sets of argument types.
+
+My game currently has about 16k lines of code, and barely any user-facing text right now (most of it is logs and ImGui strings), but there are quite a few unique instantiations of `format`: 35, to be exact.
+Here's the full list with their byte sizes (click the *bold* header to unfold the list).
+
+:::: details
+
+::: summary
+
+List of `fmt::format` instantiations sorted by byte size
+
+:::
+
+::: details-content
+
+```cpp
+0x5c <>
+0x6c
+0x6c
+0x6c
+0x6c
+0x6c
+0x6c
+0x6c
+0x6c
+0x6c
+0x6c
+0x7c
+0x7c
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0x8e
+0xb0
+0xc0
+0xc0
+0xc0
+0xc0
+0xc0
+0xd2
+0xd2
+0xf2
+0xf2
+```
+
+:::
+
+::::
+
+Summing it all up, that's 5164 bytes of machine code.
+For 35 unique combinations of arguments!
+
+Now, let's replace the previous function with the new one.
+Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings. The format strings themselves do not affect the machine code size, though, so that's fine.
+
+First, there's the static data for the function lookup tables.
+
+:::: details
+
+::: summary
+
+List of lookup tables from instantiations of the new `fmt::format`
+
+:::
+
+::: details-content
+
+```cpp
+0x00 <>
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x08
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x10
+0x18
+0x18
+0x18
+0x18
+0x18
+0x18
+0x20
+0x20
+0x28
+0x28
+```
+
+:::
+
+::::
+
+This comes out at 576 bytes total, obviously with tables for more arguments taking up more space.
+
+In an embedded setting, this will likely be a lot less due to a smaller (32-bit or 16-bit) address space, and therefore 2× or 4× smaller pointers.
+
+Now, for the functions themselves.
+Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post.
+
+:::: details
+
+::: summary
+
+List of instantiations of the new `fmt::format`
+
+:::
+
+::: details-content
+
+```cpp
+0x3f <>
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x47
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x49
+0x4e
+0x4e
+0x4e
+0x4e
+0x4e
+0x4e
+0x53
+0x53
+0x5d
+0x5d
+```
+
+:::
+
+::::
+
+That's 2611 bytes of machine code, which, together with the 576 bytes taken up by the lookup tables, comes out at 3187 bytes in the executable.
+That's *61.7%* the size of the previous version---quite a hefty save!
+
+And this would only multiply in larger codebases.
+Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine...
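
For anyone who wants to double-check the arithmetic, here's a small compile-time sketch using the totals quoted above. The entry counts are inferred from the lookup table sizes under the assumption of 8-byte function pointers (so a `0x10` table holds two entries). The names are made up for this example.

```cpp
#include <cstddef>

// Totals quoted in the text above.
constexpr std::size_t old_code_bytes = 5164;  // 35 instantiations of the old `format`
constexpr std::size_t new_code_bytes = 2611;  // 35 instantiations of the new `format`

// Inferred from the table list: 1 table with 0 entries, 12 with 1,
// 12 with 2, 6 with 3, 2 with 4, and 2 with 5 (assuming 8-byte pointers).
constexpr std::size_t total_table_entries =
    1 * 0 + 12 * 1 + 12 * 2 + 6 * 3 + 2 * 4 + 2 * 5;

// Size of all lookup tables for a given function pointer size in bytes.
constexpr std::size_t table_bytes(std::size_t pointer_size)
{
    return total_table_entries * pointer_size;
}

constexpr std::size_t new_total_bytes = new_code_bytes + table_bytes(8);

// New size relative to the old one, in tenths of a percent, rounded to nearest.
constexpr std::size_t ratio_tenths_of_percent =
    (new_total_bytes * 1000 + old_code_bytes / 2) / old_code_bytes;

static_assert(total_table_entries == 72, "72 function pointers in total");
static_assert(table_bytes(8) == 576, "matches the lookup table total");
static_assert(new_total_bytes == 3187, "matches the new total");
static_assert(ratio_tenths_of_percent == 617, "61.7% of the old size");
```

With 4-byte or 2-byte pointers, `table_bytes` gives 288 or 144 bytes for the same set of tables, which is where the embedded saving would come from. The code size itself would of course differ on another architecture.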
### C version
@@ -348,3 +575,6 @@ The worst part would probably be emulating the parameter packs, because the prep
Maybe in another post.
+---
+
+Thank you once again to my friend Tori for reviewing a draft of this post!
diff --git a/src/html/djot.rs b/src/html/djot.rs
index 45a6f8d..5e475de 100644
--- a/src/html/djot.rs
+++ b/src/html/djot.rs
@@ -311,6 +311,9 @@ impl<'a> Writer<'a> {
}
match c {
+ Container::Heading { id, .. } => {
+ write!(out, r##">"##)?;
+ }
Container::TableCell { alignment, .. }
if !matches!(alignment, Alignment::Unspecified) =>
{
@@ -445,7 +448,7 @@ impl<'a> Writer<'a> {
}
out.push_str("
");
}
- Container::Heading { level, .. } => write!(out, "")?,
+ Container::Heading { level, .. } => write!(out, "")?,
Container::TableCell { head: false, .. } => out.push_str(""),
Container::TableCell { head: true, .. } => out.push_str(""),
Container::Caption => out.push_str(""),
diff --git a/static/css/doc.css b/static/css/doc.css
index dbcfa38..b4c6349 100644
--- a/static/css/doc.css
+++ b/static/css/doc.css
@@ -56,6 +56,12 @@ main.doc {
grid-column: main;
}
+ & hr,
+ & pre,
+ & th-literate-program {
+ grid-column: left-code / right-wide;
+ }
+
& p {
padding-top: 0.5lh;
padding-bottom: 0.5lh;
@@ -67,7 +73,8 @@ main.doc {
padding-bottom: 0.5lh;
}
- & h3 {
+ & h3,
+ & h4 {
margin: 0;
padding-top: 0.5lh;
padding-bottom: 0.5lh;
@@ -90,7 +97,6 @@ main.doc {
& pre,
& th-literate-program {
padding: 0.8rem var(--code-block-h-padding);
- grid-column: left-code / right-wide;
& code {
--recursive-wght: 500;
@@ -149,6 +155,69 @@ main.doc {
text-align: center;
}
+ & details {
+ /* I wanted this to work on grid layout, but currently it is impossible to set
+ details to display: grid; across all browsers.
+ Instead you have to include a details-content element after the summary. */
+
+ --details-marker-size: var(--code-block-h-padding);
+ --details-indent-size: var(--code-block-h-padding);
+
+ grid-column: left-code / right-wide;
+
+ padding-top: 0.5lh;
+ padding-bottom: 0.5lh;
+
+ & > summary {
+ display: flex;
+ flex-direction: row;
+ align-items: center;
+
+ --recursive-wght: 600;
+ border-bottom: 1px solid var(--border-1);
+
+ cursor: pointer;
+
+ &::before {
+ content: "";
+
+ display: block;
+ width: calc(2 * var(--details-marker-size));
+ height: calc(2 * var(--details-marker-size));
+ flex-shrink: 0;
+
+ background-image: var(--icon-expand);
+ background-position: 50% 50%;
+ background-repeat: no-repeat;
+ }
+ }
+
+ &[open] > summary::before {
+ background-image: var(--icon-collapse);
+ }
+
+ & > details-content {
+ display: grid;
+ grid-template-columns:
+ [indent] auto
+ [main] 1fr;
+
+ border-bottom: 1px solid var(--border-1);
+
+ &::before {
+ content: "";
+ display: block;
+ width: 100%;
+ margin: 0 var(--details-indent-size);
+ border-left: 1px solid var(--border-1);
+ }
+
+ & > * {
+ grid-column: main;
+ }
+ }
+ }
+
& .wide {
grid-column: left-wide / right;
}
@@ -209,15 +278,18 @@ main.doc {
& .doc-text {
--code-block-grid-space: 0;
- & pre,
- & th-literate-program {
+ & details {
+ --details-marker-size: 1.6rem;
+ --details-indent-size: 1.6rem;
+ }
+
+ & > pre,
+ & > th-literate-program {
/* Stretch to whole page.
This way of doing it feels a bit brittle, though.
It might be good to refactor this to CSS grid at some point. */
padding-left: var(--doc-padding);
padding-right: var(--doc-padding);
- margin-left: calc(var(--doc-padding) * -1);
- margin-right: calc(var(--doc-padding) * -1);
border-radius: 0;
border-left: none;
border-right: none;
@@ -229,6 +301,20 @@ main.doc {
}
}
+ & > pre,
+ & > th-literate-program,
+ & > details {
+ margin-left: calc(var(--doc-padding) * -1);
+ margin-right: calc(var(--doc-padding) * -1);
+ }
+
+ & > details {
+ & > summary,
+ & > details-content {
+ padding-right: var(--doc-padding);
+ }
+ }
+
& figure figcaption {
&.overlay-bottom-right {
position: static;
diff --git a/static/css/main.css b/static/css/main.css
index 4e77930..0a4bb77 100644
--- a/static/css/main.css
+++ b/static/css/main.css
@@ -376,6 +376,41 @@ a.secret {
text-decoration: none;
}
+/* Links to headings should be invisible by default, only appearing on hover. */
+
+h1,
+h2,
+h3,
+h4,
+h5,
+h6 {
+ & > a {
+ color: var(--text-color);
+ text-decoration: none;
+
+ &:visited {
+ color: var(--text-color);
+ }
+
+ &:hover {
+ text-decoration: underline;
+ }
+ }
+}
+
+@media (hover: none) {
+ h1,
+ h2,
+ h3,
+ h4,
+ h5,
+ h6 {
+ & > a {
+ text-decoration: underline;
+ }
+ }
+}
+
/* Make blockquotes a bit prettier */
blockquote {
@@ -500,17 +535,17 @@ section.feed {
/* Titles */
& h2 {
- & a,
- & a:visited {
+ & a {
color: var(--text-color);
- }
+ text-decoration: underline;
- & a:visited {
- color: color-mix(
- in srgb,
- var(--background-color),
- var(--text-color) 60%
- );
+ &:visited {
+ color: color-mix(
+ in srgb,
+ var(--background-color),
+ var(--text-color) 60%
+ );
+ }
}
}
@@ -718,8 +753,6 @@ h1.page-title {
text-decoration: underline;
text-decoration-color: transparent;
- transition: var(--transition-duration) text-decoration-color;
-
&:hover {
text-decoration-color: var(--text-color);
}