diff --git a/content/fmt2.dj b/content/fmt2.dj index 5399a67..8fe4649 100644 --- a/content/fmt2.dj +++ b/content/fmt2.dj @@ -19,7 +19,8 @@ So I started thinking about how I could improve on that. I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\ Until today. -This write-up describes this alternative version of the library in detail. +This write-up describes an improved version of the library in detail, with support for positional arguments and a much smaller generated code size! +I hope you like it. ## Usage @@ -63,7 +64,7 @@ assert(strcmp( ) == 0); ``` -Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, and never allows reading arguments out of bounds. +Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, returns the number of characters that would be written, and never allows reading arguments out of bounds. As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`). This library does the same thing, though using the new format string syntax. @@ -153,10 +154,10 @@ This trick with a template erasing `T*` into `void*` is actually really useful i If there's any knowledge worth remembering from this article, it would be this technique. Aside from that, we once again make use of parameter packs. -This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers). +This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers). 
-Among the `const` soup, you may notice the `ffuncs` array being `static const`. -This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack. +Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`. +This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack. Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered. @@ -300,9 +301,14 @@ The same [extra goodies][page:fmt#Extras] as in the previous post can be used (i This library is slightly larger than the previous, being 73 lines of code long. I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though. -I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries. +...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't! +I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf. +Go with the extra lines of code. +This version of the library is better. +It's in the same ballpark either way, and honestly I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\ And `printf`. + Don't forget about `printf`. 
@@ -322,20 +328,241 @@ If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-b ### Code size The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly. -Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values. + +The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values. This is a good thing for embedded use cases. You should prefer this implementation over the previous one for that. -I couldn't get clang to inline references to `write_value` into the function table, though. -There's always an intermediary function generated from the instantiation of `write_value_erased`. +In my game, the size emitted into the executable for instantiations of `format` is *61.7%* of what the previous version emitted! +Read on for a detailed analysis. -This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased`, `write_value_erased`, etc. +--- -I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard. -You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it. +To illustrate this a bit, let's look at an example. 
+I have a function `usages` which uses `format` in a few different ways. -It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either. +```cpp +void usages( + String_Buffer& buf, + const char* filename, + const char* part_name, int part_index, + Entity_Id entity_id, Vec3 position) +{ + format(buf, "/prop/{}", filename); + format(buf, "Part #{} ({})", part_index, part_name); + format(buf, "{} at {}", entity_id, position); +} +``` + +This will generate three separate template instantiations of `format`: + +```cpp +void format(String_Buffer&, char const*, char const* const&) +void format(String_Buffer&, char const*, int const&, char const* const&) +void format(String_Buffer&, char const*, Entity_Id const&, Vec3 const&) +``` + +Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example. +Each instantiation, after inlining, results in code that looks like this: + +```cpp +void format( + String_Buffer& buf, + char const* fstr, + int const& a1, + char const* const& a2) +{ + if (next_hole(buf, fstr)) + write_value(buf, a1); + if (next_hole(buf, fstr)) + write_value(buf, a2); + while (next_hole(buf, fstr)) {} +} +``` + +From the machine code perspective, this comes out to 120 bytes of code. +This doesn't seem like a lot, but it quickly multiplies when you consider that calls to `format` are going to end up with many different sets of argument types. + +My game currently has about 16k lines of code, though barely any user-facing text right now---most of it is logs and ImGui strings---but there are quite a few unique instantiations of `format` (35, to be exact). +Here's the full list with their byte sizes (click the *bold* header to unfold the list). 
+ +:::: details + +::: summary + +List of `fmt::format` instantiations sorted by byte size + +::: + +::: details-content + +```cpp +0x5c <> +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x6c +0x7c +0x7c +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0x8e +0xb0 +0xc0 +0xc0 +0xc0 +0xc0 +0xc0 +0xd2 +0xd2 +0xf2 +0xf2 +``` + +::: + +:::: + +Summing it all up, that's 5164 bytes of machine code. +For 35 unique combinations of arguments! + +Now, let's replace the previous function with the new one. +Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings---but the format strings themselves do not affect the machine code size, so that's fine. + +First, there's the static data for the function lookup tables. + +:::: details + +::: summary + +List of lookup tables from instantiations of the new `fmt::format` + +::: + +::: details-content + +```cpp +0x00 <> +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x08 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x10 +0x18 +0x18 +0x18 +0x18 +0x18 +0x18 +0x20 +0x20 +0x28 +0x28 +``` + +::: + +:::: + +This comes out at 576 bytes total, obviously with tables for more arguments taking up more space. + +In an embedded setting, this will likely be a lot less due to a smaller (16-bit or 32-bit) address space, and therefore 2× or 4× smaller pointers. + +Now, for the functions themselves. +Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post. 
+ +:::: details + +::: summary + +List of instantiations of the new `fmt::format` + +::: + +::: details-content + +```cpp +0x3f <> +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x47 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x49 +0x4e +0x4e +0x4e +0x4e +0x4e +0x4e +0x53 +0x53 +0x5d +0x5d +``` + +::: + +:::: + +That's 2611 bytes of machine code, and adding the 576 bytes taken up by the lookup tables brings the total to 3187 bytes in the executable. +That's *61.7%* of the size of the previous version---quite a hefty saving! + +And this would only multiply in larger codebases. +Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine... ### C version @@ -348,3 +575,6 @@ The worst part would probably be emulating the parameter packs, because the prep Maybe in another post. +--- + +Thank you once again to my friend Tori for reviewing a draft of this post!
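
A note on the diff above, for readers who don't have the full post at hand: the code-size argument rests on a thin `format` template that only builds two lookup tables and hands them to a non-template `format_untyped`. The following is a minimal sketch of that shape, not the post's actual implementation. It assumes a `std::string` in place of `String_Buffer`, single-digit `%0`..`%9` holes as a guess at the `%n` syntax, and a hypothetical `Format_Func` alias; the trailing `nullptr` entries are also an addition of this sketch, there only to keep the arrays non-empty when no arguments are passed.

```cpp
#include <cstddef>
#include <string>

// Assumption: std::string stands in for the post's String_Buffer.
using Out = std::string;

// Hypothetical per-type formatters; the real library has its own set.
inline void write_value(Out& out, int v) { out += std::to_string(v); }
inline void write_value(Out& out, const char* v) { out += v; }

// Erases T behind a const void*, so every instantiation shares one signature
// and can be stored in a plain table of function pointers.
template <typename T>
void write_value_erased(Out& out, const void* p) {
    write_value(out, *static_cast<const T*>(p));
}

// Hypothetical alias for the erased formatter signature.
using Format_Func = void (*)(Out&, const void*);

// Non-template core: parses the format string once for all argument types.
// Holes are assumed to be '%' followed by a single 0-based digit.
inline void format_untyped(Out& out, const char* fstr,
                           const Format_Func* funcs,
                           const void* const* values, std::size_t count) {
    for (const char* p = fstr; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9') {
            std::size_t index = static_cast<std::size_t>(p[1] - '0');
            if (index < count) {
                funcs[index](out, values[index]);
                ++p;  // skip the digit; the loop increment skips past it
                continue;
            }
            // Hole without a matching argument: fall through and copy it
            // verbatim, one character at a time.
        }
        out += *p;
    }
}

// Thin template front end: only sets up the two tables.
template <typename... Args>
void format(Out& out, const char* fstr, const Args&... args) {
    // static const: lives in read-only data instead of being rebuilt on the
    // stack on every call. The nullptr sentinel is this sketch's addition.
    static const Format_Func ffuncs[] = {write_value_erased<Args>..., nullptr};
    // Value pointers do have to be set up per call.
    const void* const values[] = {&args..., nullptr};
    format_untyped(out, fstr, ffuncs, values, sizeof...(Args));
}
```

Under these assumptions, `format(s, "Part #%0 (%1)", 3, "wheel")` appends `Part #3 (wheel)` to `s`, and a hole such as `%2` with only two arguments is copied through unchanged. Each distinct argument list costs one small table plus a short stub, which is the effect the byte counts in the diff measure.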