fmt2: revision

りき萌 2025-09-15 23:22:06 +02:00
parent 530c137223
commit 51e3c0cda3

@ -19,7 +19,8 @@ So I started thinking about how I could improve on that.
I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\
Until today.
This write-up describes this alternative version of the library in detail.
This write-up describes an improved version of the library in detail, with support for positional arguments and a much smaller generated code size!
I hope you like it.
## Usage
@ -63,7 +64,7 @@ assert(strcmp(
) == 0); ) == 0);
``` ```
Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, and never allows reading arguments out of bounds.
Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, returns the number of characters that would be written, and never allows reading arguments out of bounds.
As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
This library does the same thing, though using the new format string syntax.
@ -153,10 +154,10 @@ This trick with a template erasing `T*` into `void*` is actually really useful i
If there's any knowledge worth remembering from this article, it would be this technique.
Aside from that, we once again make use of parameter packs.
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
Among the `const` soup, you may notice the `ffuncs` array being `static const`.
Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`.
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack.
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack.
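As a rough illustration of the two ideas above (erasing `T*` into `void*` through a template, and expanding a parameter pack inside a brace-enclosed initialiser to build a `static const` table), here is a small self-contained sketch. Only `write_value` and `write_value_erased` are names taken from the text; `Sketch_Buffer`, `Erased_Writer`, `write_nth`, and `write_reversed` are assumptions invented for this example, and the real library may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the library's output buffer.
struct Sketch_Buffer { std::string text; };

// Per-type writers, chosen by ordinary overload resolution.
inline void write_value(Sketch_Buffer& buf, int value) { buf.text += std::to_string(value); }
inline void write_value(Sketch_Buffer& buf, const char* value) { buf.text += value; }

// Every writer is erased down to this one signature.
using Erased_Writer = void (*)(Sketch_Buffer&, const void*);

// One instantiation per argument type: cast the void* back to T* and forward.
template <typename T>
void write_value_erased(Sketch_Buffer& buf, const void* value)
{
    write_value(buf, *static_cast<const T*>(value));
}

// Untyped core: writes argument `index`, or nothing if it is out of bounds.
inline void write_nth(Sketch_Buffer& buf, std::size_t index, std::size_t count,
                      const Erased_Writer* writers, const void* const* values)
{
    if (index < count) writers[index](buf, values[index]);
}

// Typed front end: builds both tables, then visits the arguments in reverse
// to show that they can be reached in any order.
template <typename... Args>
std::string write_reversed(const Args&... args)
{
    static_assert(sizeof...(Args) > 0, "this sketch needs at least one argument");
    // Pack expansions inside brace-enclosed initialisers.
    static const Erased_Writer writers[] = { &write_value_erased<Args>... };
    const void* const values[] = { &args... };

    Sketch_Buffer buf;
    for (std::size_t i = sizeof...(Args); i-- > 0;) {
        write_nth(buf, i, sizeof...(Args), writers, values);
        if (i != 0) buf.text += ' ';
    }
    return buf.text;
}
```

Calling `write_reversed(1, "two")` produces `"two 1"`. Note that the string literal is deduced as `char [4]`, so this call instantiates `write_value_erased<char [4]>`; each distinct array length gets its own instantiation.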
Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.
@ -300,9 +301,14 @@ The same [extra goodies][page:fmt#Extras] as in the previous post can be used (i
This library is slightly larger than the previous, being 73 lines of code long.
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.
I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.
...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't!
I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf.
Go with the extra lines of code.
This version of the library is better.
It's in the same ballpark either way, and honestly I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\
And `printf`.
Don't forget about `printf`.
@ -322,20 +328,241 @@ If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-b
### Code size
The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
This is a good thing for embedded use cases.
You should prefer this implementation over the previous one for that.
I couldn't get clang to inline references to `write_value` into the function table, though.
There's always an intermediary function generated from the instantiation of `write_value_erased`.
This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased<char [6]>`, `write_value_erased<char [7]>`, etc.
I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard.
You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it.
It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either.
In my game, the size emitted into the executable for instantiations of `format` is *61.7%* of the previous version!
Read on for a detailed analysis.
---
To illustrate this a bit, let's set an example.
I have a function `usages` which uses `format` in a few different ways.
```cpp
void usages(
String_Buffer& buf,
const char* filename,
const char* part_name, int part_index,
Entity_Id entity_id, Vec3 position)
{
format(buf, "/prop/{}", filename);
format(buf, "Part #{} ({})", part_index, part_name);
format(buf, "{} at {}", entity_id, position);
}
```
This will generate three separate template instantiations of `format`:
```cpp
void format<char const*>(String_Buffer&, char const*, char const* const&)
void format<int, char const*>(String_Buffer&, char const*, int const&, char const* const&)
void format<Entity_Id, Vec3>(String_Buffer&, char const*, Entity_Id const&, Vec3 const&)
```
Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example.
Each instantiation, after inlining, results in code that looks like this:
```cpp
void format<int, char const*>(
String_Buffer& buf,
char const* fstr,
int const& a1,
char const* const& a2)
{
if (next_hole(buf, fstr))
write_value(buf, a1);
if (next_hole(buf, fstr))
write_value(buf, a2);
while (next_hole(buf, fstr)) {}
}
```
From the machine code perspective, this comes out to 120 bytes of code.
This doesn't seem like a lot, but it quickly multiplies when you consider that calls to `format` are going to end up having different sets of argument types.
My game currently has about 16k lines of code, though barely any user-facing text right now---most of it is logs and ImGui strings---but there are quite a few unique instantiations of `format` (35, to be exact).
Here's the full list with their byte sizes (click the *bold* header to unfold the list).
:::: details
::: summary
List of `fmt::format` instantiations sorted by byte size
:::
::: details-content
```cpp
0x5c <>
0x6c <bool>
0x6c <char [128]>
0x6c <char [20]>
0x6c <char [32]>
0x6c <char const*>
0x6c <char*>
0x6c <double>
0x6c <int>
0x6c <unsigned int>
0x6c <unsigned long>
0x7c <Format_Hex>
0x7c <float>
0x8e <Writer::Chunk*, unsigned long long>
0x8e <char [128], char const*>
0x8e <char const*, Entity_Id>
0x8e <char const*, Vec2i>
0x8e <char const*, Vec3>
0x8e <char const*, char [80]>
0x8e <char const*, char const*>
0x8e <char const*, char*>
0x8e <char const*, unsigned int>
0x8e <int, char [32]>
0x8e <int, char const*>
0x8e <int, int>
0xb0 <char [32], char const*, char const*>
0xc0 <Writer::Chunk*, unsigned long long, char const*>
0xc0 <char const*, int, char const*>
0xc0 <char const*, int, int>
0xc0 <int, int, char const*>
0xc0 <int, int, int>
0xd2 <char const*, int, int, char const*>
0xd2 <int, char const*, int, int>
0xf2 <char const*, int, int, unsigned int, char const*>
0xf2 <int, int, int, int, char const*>
```
:::
::::
Summing it all up, that's 5164 bytes of machine code.
For 35 unique combinations of arguments!
Now, let's replace the previous function with the new one.
Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings---but the format strings themselves do not affect the machine code size, so that's fine.
First, there's the static data for the function lookup tables.
:::: details
::: summary
List of lookup tables from instantiations of the new `fmt::format`
:::
::: details-content
```cpp
0x00 <>
0x08 <Format_Hex>
0x08 <bool>
0x08 <char [128]>
0x08 <char [20]>
0x08 <char [32]>
0x08 <char const*>
0x08 <char*>
0x08 <double>
0x08 <float>
0x08 <int>
0x08 <unsigned int>
0x08 <unsigned long>
0x10 <Writer::Chunk*, unsigned long long>
0x10 <char [128], char const*>
0x10 <char const*, Entity_Id>
0x10 <char const*, Vec2i>
0x10 <char const*, Vec3>
0x10 <char const*, char [80]>
0x10 <char const*, char const*>
0x10 <char const*, char*>
0x10 <char const*, unsigned int>
0x10 <int, char [32]>
0x10 <int, char const*>
0x10 <int, int>
0x18 <Writer::Chunk*, unsigned long long, char const*>
0x18 <char [32], char const*, char const*>
0x18 <char const*, int, char const*>
0x18 <char const*, int, int>
0x18 <int, int, char const*>
0x18 <int, int, int>
0x20 <char const*, int, int, char const*>
0x20 <int, char const*, int, int>
0x28 <char const*, int, int, unsigned int, char const*>
0x28 <int, int, int, int, char const*>
```
:::
::::
This comes out at 576 bytes total, obviously with tables for more arguments taking up more space.
In an embedded setting, this will likely be a lot less due to a smaller (32-bit or 16-bit) memory space, and therefore 2× or 4× smaller pointers.
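The table sizes listed above follow directly from the pointer size: one function pointer per argument. A tiny sketch of that relationship, assuming the erased function pointer type looks roughly like the hypothetical `Erased_Writer` below (the library's actual type name and signature may differ):

```cpp
#include <cassert>
#include <cstddef>

struct Sketch_Buffer; // hypothetical; only needed to spell the pointer type
using Erased_Writer = void (*)(Sketch_Buffer&, const void*);

// Bytes of static table needed for an instantiation taking `arg_count` arguments.
constexpr std::size_t table_bytes(std::size_t arg_count)
{
    return arg_count * sizeof(Erased_Writer);
}
```

On a target with 8-byte function pointers, `table_bytes(5)` is 40 (`0x28`), matching the largest entries in the list. On a target with 4-byte or 2-byte function pointers, the same table would take 20 or 10 bytes.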
Now, for the functions themselves.
Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post.
:::: details
::: summary
List of instantiations of the new `fmt::format`
:::
::: details-content
```cpp
0x3f <>
0x47 <Format_Hex>
0x47 <bool>
0x47 <char [128]>
0x47 <char [20]>
0x47 <char [32]>
0x47 <char const*>
0x47 <char*>
0x47 <double>
0x47 <float>
0x47 <int>
0x47 <unsigned int>
0x47 <unsigned long>
0x49 <Writer::Chunk*, unsigned long long>
0x49 <char [128], char const*>
0x49 <char const*, Entity_Id>
0x49 <char const*, Vec2i>
0x49 <char const*, Vec3>
0x49 <char const*, char [80]>
0x49 <char const*, char const*>
0x49 <char const*, char*>
0x49 <char const*, unsigned int>
0x49 <int, char [32]>
0x49 <int, char const*>
0x49 <int, int>
0x4e <Writer::Chunk*, unsigned long long, char const*>
0x4e <char [32], char const*, char const*>
0x4e <char const*, int, char const*>
0x4e <char const*, int, int>
0x4e <int, int, char const*>
0x4e <int, int, int>
0x53 <char const*, int, int, char const*>
0x53 <int, char const*, int, int>
0x5d <char const*, int, int, unsigned int, char const*>
0x5d <int, int, int, int, char const*>
```
:::
::::
That's 2611 bytes of machine code, which together with the 576 bytes taken up by lookup tables comes out at 3187 bytes in the executable.
That's *61.7%* the size of the previous version---quite a hefty saving!
And this would only multiply in larger codebases.
Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine...
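For anyone who wants to double-check the totals, here is a small sketch that re-adds the figures from the three lists. The sizes and counts are read straight off those lists, grouped by identical size; `Size_Group` and `total_bytes` are just helper names made up for this check.

```cpp
#include <cassert>
#include <cstddef>

// A byte size and the number of instantiations in the lists that have it.
struct Size_Group { std::size_t bytes; std::size_t count; };

// Sums bytes * count over all groups in the array.
template <std::size_t N>
std::size_t total_bytes(const Size_Group (&groups)[N])
{
    std::size_t sum = 0;
    for (std::size_t i = 0; i < N; ++i) sum += groups[i].bytes * groups[i].count;
    return sum;
}

// Old `format` instantiations.
const Size_Group old_code[] = {
    {0x5c, 1}, {0x6c, 10}, {0x7c, 2}, {0x8e, 12},
    {0xb0, 1}, {0xc0, 5}, {0xd2, 2}, {0xf2, 2},
};
// New version: static lookup tables.
const Size_Group new_tables[] = {
    {0x00, 1}, {0x08, 12}, {0x10, 12}, {0x18, 6}, {0x20, 2}, {0x28, 2},
};
// New version: `format` instantiations.
const Size_Group new_code[] = {
    {0x3f, 1}, {0x47, 12}, {0x49, 12}, {0x4e, 6}, {0x53, 2}, {0x5d, 2},
};
```

`total_bytes(old_code)` gives 5164, `total_bytes(new_tables)` gives 576, and `total_bytes(new_code)` gives 2611. The new total is 2611 + 576 = 3187 bytes, and 3187 / 5164 ≈ 0.617.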
### C version
@ -348,3 +575,6 @@ The worst part would probably be emulating the parameter packs, because the prep
Maybe in another post.
---
Thank you once again to my friend Tori for reviewing a draft of this post!