fmt2: revision
parent
530c137223
commit
51e3c0cda3
1 changed files with 243 additions and 13 deletions
256
content/fmt2.dj
|
@ -19,7 +19,8 @@ So I started thinking about how I could improve on that.
|
||||||
I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\
|
I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\
|
||||||
Until today.
|
Until today.
|
||||||
|
|
||||||
This write-up describes this alternative version of the library in detail.
|
This write-up describes an improved version of the library in detail, with support for positional arguments and a much smaller generated code size!
|
||||||
|
I hope you like it.
|
||||||
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
@ -63,7 +64,7 @@ assert(strcmp(
|
||||||
) == 0);
|
) == 0);
|
||||||
```
|
```
|
||||||
|
|
||||||
Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, and never allows reading arguments out of bounds.
|
Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, gives back the number of characters that would be written, and never allows reading arguments out of bounds.
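
To make the first two invariants concrete, here is a minimal sketch of a buffer that truncates but keeps counting, in the spirit of `snprintf`. `Counting_Buffer` and `put` are hypothetical names made up for this illustration; the library's actual `String_Buffer` may well look different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical buffer for illustration only, not the library's String_Buffer.
struct Counting_Buffer {
    char* data;
    std::size_t capacity;
    std::size_t length = 0; // characters that *would* have been written
};

// Stores the character only if it fits, but always counts it.
inline void put(Counting_Buffer& buf, char c) {
    if (buf.length < buf.capacity) {
        buf.data[buf.length] = c;
    }
    ++buf.length;
}

// Writes a NUL-terminated string one character at a time.
inline void put_string(Counting_Buffer& buf, const char* str) {
    for (; *str != '\0'; ++str) {
        put(buf, *str);
    }
}
```

The caller can compare `length` against `capacity` afterwards to find out whether the output was truncated.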
|
||||||
|
|
||||||
As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
|
As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
|
||||||
This library does the same thing, though using the new format string syntax.
|
This library does the same thing, though using the new format string syntax.
|
||||||
|
@ -153,10 +154,10 @@ This trick with a template erasing `T*` into `void*` is actually really useful i
|
||||||
If there's any knowledge worth remembering from this article, it would be this technique.
|
If there's any knowledge worth remembering from this article, it would be this technique.
|
||||||
|
|
||||||
Aside from that, we once again make use of parameter packs.
|
Aside from that, we once again make use of parameter packs.
|
||||||
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
|
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
|
||||||
|
|
||||||
Among the `const` soup, you may notice the `ffuncs` array being `static const`.
|
Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`.
|
||||||
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack.
|
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack.
|
||||||
|
|
||||||
Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.
|
Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.
|
||||||
|
|
||||||
|
@ -300,9 +301,14 @@ The same [extra goodies][page:fmt#Extras] as in the previous post can be used (i
|
||||||
This library is slightly larger than the previous, being 73 lines of code long.
|
This library is slightly larger than the previous, being 73 lines of code long.
|
||||||
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.
|
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.
|
||||||
|
|
||||||
I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.
|
...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't!
|
||||||
|
I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf.
|
||||||
|
Go with the extra lines of code.
|
||||||
|
This version of the library is better.
|
||||||
|
|
||||||
|
It's in the same ballpark either way, and honestly I only really flaunted the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\
|
||||||
And `printf`.
|
And `printf`.
|
||||||
|
|
||||||
Don't forget about `printf`.
|
Don't forget about `printf`.
|
||||||
|
|
||||||
|
|
||||||
|
@ -322,20 +328,241 @@ If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-b
|
||||||
### Code size
|
### Code size
|
||||||
|
|
||||||
The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
|
The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
|
||||||
Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
|
|
||||||
|
The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
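
As a rough illustration of that shape, here is a heavily simplified sketch. It is not the library's actual code: `Sketch_Buffer` and `format_sketch` are hypothetical names, the signatures are guesses, and the hole syntax is assumed to be `%` followed by a single 0-based digit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for String_Buffer; the real one is not a std::string.
struct Sketch_Buffer {
    std::string text;
};

// Overloads that know how to print a concrete type.
inline void write_value(Sketch_Buffer& buf, int v) { buf.text += std::to_string(v); }
inline void write_value(Sketch_Buffer& buf, const char* v) { buf.text += v; }

// One signature for every printable type: the value arrives as const void*.
using Format_Func = void (*)(Sketch_Buffer&, const void*);

// Erases T: casts the untyped pointer back and forwards to write_value.
template <typename T>
void write_value_erased(Sketch_Buffer& buf, const void* value) {
    write_value(buf, *static_cast<const T*>(value));
}

// Non-template worker; the parsing loop exists only once in the executable.
// Assumed syntax for this sketch: '%' followed by one decimal digit.
inline void format_untyped(Sketch_Buffer& buf, const char* fstr,
    const Format_Func* funcs, const void* const* values, std::size_t count)
{
    for (const char* c = fstr; *c != '\0'; ++c) {
        if (c[0] == '%' && c[1] >= '0' && c[1] <= '9') {
            std::size_t index = static_cast<std::size_t>(c[1] - '0');
            if (index < count) {
                funcs[index](buf, values[index]);
            } else {
                buf.text.append(c, 2); // holes without an argument stay verbatim
            }
            ++c; // skip the digit
        } else {
            buf.text += *c;
        }
    }
}

template <typename... Args>
void format_sketch(Sketch_Buffer& buf, const char* fstr, const Args&... args) {
    // static const: a read-only table, one per instantiation.
    // The +1 keeps both arrays non-empty when the pack is empty.
    static const Format_Func funcs[sizeof...(Args) + 1] = { write_value_erased<Args>... };
    // Value pointers live on the stack, since they differ on every call.
    const void* const values[sizeof...(Args) + 1] = { &args... };
    format_untyped(buf, fstr, funcs, values, sizeof...(Args));
}
```

Each instantiation of `format_sketch` only fills in two small arrays and makes one call, which is why the per-instantiation code stays short no matter how many holes the format string has.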
|
||||||
|
|
||||||
This is a good thing for embedded use cases.
|
This is a good thing for embedded use cases.
|
||||||
You should prefer this implementation over the previous one for that.
|
You should prefer this implementation over the previous one for that.
|
||||||
|
|
||||||
I couldn't get clang to inline references to `write_value` into the function table, though.
|
In my game, the size emitted into the executable for instantiations of `format` is *61.7%* of the previous version!
|
||||||
There's always an intermediary function generated from the instantiation of `write_value_erased`.
|
Read on for a detailed analysis.
|
||||||
|
|
||||||
This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased<char [6]>`, `write_value_erased<char [7]>`, etc.
|
---
|
||||||
|
|
||||||
I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard.
|
To illustrate this a bit, let's look at an example.
|
||||||
You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it.
|
I have a function `usages` which uses `format` in a few different ways.
|
||||||
|
|
||||||
It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either.
|
```cpp
|
||||||
|
void usages(
|
||||||
|
String_Buffer& buf,
|
||||||
|
const char* filename,
|
||||||
|
const char* part_name, int part_index,
|
||||||
|
Entity_Id entity_id, Vec3 position)
|
||||||
|
{
|
||||||
|
format(buf, "/prop/{}", filename);
|
||||||
|
format(buf, "Part #{} ({})", part_index, part_name);
|
||||||
|
format(buf, "{} at {}", entity_id, position);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This will generate three separate template instantiations of `format`:
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
void format<char const*>(String_Buffer&, char const*, char const* const&)
|
||||||
|
void format<int, char const*>(String_Buffer&, char const*, int const&, char const* const&)
|
||||||
|
void format<Entity_Id, Vec3>(String_Buffer&, char const*, Entity_Id const&, Vec3 const&)
|
||||||
|
```
|
||||||
|
|
||||||
|
Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example.
|
||||||
|
Each instantiation, after inlining, results in code that looks like this:
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
void format<int, char const*>(
|
||||||
|
String_Buffer& buf,
|
||||||
|
char const* fstr,
|
||||||
|
int const& a1,
|
||||||
|
char const* const& a2)
|
||||||
|
{
|
||||||
|
if (next_hole(buf, fstr))
|
||||||
|
write_value(buf, a1);
|
||||||
|
if (next_hole(buf, fstr))
|
||||||
|
write_value(buf, a2);
|
||||||
|
while (next_hole(buf, fstr)) {}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
From the machine code perspective, this comes out to 120 bytes of code.
|
||||||
|
This doesn't seem like a lot, but it quickly multiplies when you consider that format function calls are going to end up having different sets of argument types.
|
||||||
|
|
||||||
|
My game currently has about 16k lines of code, though barely any user-facing text right now---most of it is logs and ImGui strings---but there are quite a few unique instantiations of `format` (35 to be exact.)
|
||||||
|
Here's the full list with their byte sizes (click the *bold* header to unfold the list).
|
||||||
|
|
||||||
|
:::: details
|
||||||
|
|
||||||
|
::: summary
|
||||||
|
|
||||||
|
List of `fmt::format` instantiations sorted by byte size
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: details-content
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
0x5c <>
|
||||||
|
0x6c <bool>
|
||||||
|
0x6c <char [128]>
|
||||||
|
0x6c <char [20]>
|
||||||
|
0x6c <char [32]>
|
||||||
|
0x6c <char const*>
|
||||||
|
0x6c <char*>
|
||||||
|
0x6c <double>
|
||||||
|
0x6c <int>
|
||||||
|
0x6c <unsigned int>
|
||||||
|
0x6c <unsigned long>
|
||||||
|
0x7c <Format_Hex>
|
||||||
|
0x7c <float>
|
||||||
|
0x8e <Writer::Chunk*, unsigned long long>
|
||||||
|
0x8e <char [128], char const*>
|
||||||
|
0x8e <char const*, Entity_Id>
|
||||||
|
0x8e <char const*, Vec2i>
|
||||||
|
0x8e <char const*, Vec3>
|
||||||
|
0x8e <char const*, char [80]>
|
||||||
|
0x8e <char const*, char const*>
|
||||||
|
0x8e <char const*, char*>
|
||||||
|
0x8e <char const*, unsigned int>
|
||||||
|
0x8e <int, char [32]>
|
||||||
|
0x8e <int, char const*>
|
||||||
|
0x8e <int, int>
|
||||||
|
0xb0 <char [32], char const*, char const*>
|
||||||
|
0xc0 <Writer::Chunk*, unsigned long long, char const*>
|
||||||
|
0xc0 <char const*, int, char const*>
|
||||||
|
0xc0 <char const*, int, int>
|
||||||
|
0xc0 <int, int, char const*>
|
||||||
|
0xc0 <int, int, int>
|
||||||
|
0xd2 <char const*, int, int, char const*>
|
||||||
|
0xd2 <int, char const*, int, int>
|
||||||
|
0xf2 <char const*, int, int, unsigned int, char const*>
|
||||||
|
0xf2 <int, int, int, int, char const*>
|
||||||
|
```
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
Summing it all up, that's 5164 bytes of machine code.
|
||||||
|
For 35 unique combinations of arguments!
|
||||||
|
|
||||||
|
Now, let's replace the previous function with the new one.
|
||||||
|
Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings---but the format strings themselves do not affect the machine code size, so that's fine.
|
||||||
|
|
||||||
|
First, there's the static data for the function lookup tables.
|
||||||
|
|
||||||
|
:::: details
|
||||||
|
|
||||||
|
::: summary
|
||||||
|
|
||||||
|
List of lookup tables from instantiations of the new `fmt::format`
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: details-content
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
0x00 <>
|
||||||
|
0x08 <Format_Hex>
|
||||||
|
0x08 <bool>
|
||||||
|
0x08 <char [128]>
|
||||||
|
0x08 <char [20]>
|
||||||
|
0x08 <char [32]>
|
||||||
|
0x08 <char const*>
|
||||||
|
0x08 <char*>
|
||||||
|
0x08 <double>
|
||||||
|
0x08 <float>
|
||||||
|
0x08 <int>
|
||||||
|
0x08 <unsigned int>
|
||||||
|
0x08 <unsigned long>
|
||||||
|
0x10 <Writer::Chunk*, unsigned long long>
|
||||||
|
0x10 <char [128], char const*>
|
||||||
|
0x10 <char const*, Entity_Id>
|
||||||
|
0x10 <char const*, Vec2i>
|
||||||
|
0x10 <char const*, Vec3>
|
||||||
|
0x10 <char const*, char [80]>
|
||||||
|
0x10 <char const*, char const*>
|
||||||
|
0x10 <char const*, char*>
|
||||||
|
0x10 <char const*, unsigned int>
|
||||||
|
0x10 <int, char [32]>
|
||||||
|
0x10 <int, char const*>
|
||||||
|
0x10 <int, int>
|
||||||
|
0x18 <Writer::Chunk*, unsigned long long, char const*>
|
||||||
|
0x18 <char [32], char const*, char const*>
|
||||||
|
0x18 <char const*, int, char const*>
|
||||||
|
0x18 <char const*, int, int>
|
||||||
|
0x18 <int, int, char const*>
|
||||||
|
0x18 <int, int, int>
|
||||||
|
0x20 <char const*, int, int, char const*>
|
||||||
|
0x20 <int, char const*, int, int>
|
||||||
|
0x28 <char const*, int, int, unsigned int, char const*>
|
||||||
|
0x28 <int, int, int, int, char const*>
|
||||||
|
```
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
This comes out at 576 bytes total, obviously with tables for more arguments taking up more space.
|
||||||
|
|
||||||
|
In an embedded setting, this will likely be a lot less due to a smaller (16-bit or 32-bit) address space, and therefore 4× or 2× smaller pointers.
|
||||||
|
|
||||||
|
Now, for the functions themselves.
|
||||||
|
Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post.
|
||||||
|
|
||||||
|
:::: details
|
||||||
|
|
||||||
|
::: summary
|
||||||
|
|
||||||
|
List of instantiations of the new `fmt::format`
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::: details-content
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
0x3f <>
|
||||||
|
0x47 <Format_Hex>
|
||||||
|
0x47 <bool>
|
||||||
|
0x47 <char [128]>
|
||||||
|
0x47 <char [20]>
|
||||||
|
0x47 <char [32]>
|
||||||
|
0x47 <char const*>
|
||||||
|
0x47 <char*>
|
||||||
|
0x47 <double>
|
||||||
|
0x47 <float>
|
||||||
|
0x47 <int>
|
||||||
|
0x47 <unsigned int>
|
||||||
|
0x47 <unsigned long>
|
||||||
|
0x49 <Writer::Chunk*, unsigned long long>
|
||||||
|
0x49 <char [128], char const*>
|
||||||
|
0x49 <char const*, Entity_Id>
|
||||||
|
0x49 <char const*, Vec2i>
|
||||||
|
0x49 <char const*, Vec3>
|
||||||
|
0x49 <char const*, char [80]>
|
||||||
|
0x49 <char const*, char const*>
|
||||||
|
0x49 <char const*, char*>
|
||||||
|
0x49 <char const*, unsigned int>
|
||||||
|
0x49 <int, char [32]>
|
||||||
|
0x49 <int, char const*>
|
||||||
|
0x49 <int, int>
|
||||||
|
0x4e <Writer::Chunk*, unsigned long long, char const*>
|
||||||
|
0x4e <char [32], char const*, char const*>
|
||||||
|
0x4e <char const*, int, char const*>
|
||||||
|
0x4e <char const*, int, int>
|
||||||
|
0x4e <int, int, char const*>
|
||||||
|
0x4e <int, int, int>
|
||||||
|
0x53 <char const*, int, int, char const*>
|
||||||
|
0x53 <int, char const*, int, int>
|
||||||
|
0x5d <char const*, int, int, unsigned int, char const*>
|
||||||
|
0x5d <int, int, int, int, char const*>
|
||||||
|
```
|
||||||
|
|
||||||
|
:::
|
||||||
|
|
||||||
|
::::
|
||||||
|
|
||||||
|
That's 2611 bytes of machine code, and summing it up with the space taken up by lookup tables, comes out at 3187 bytes in the executable.
|
||||||
|
That's *61.7%* the size of the previous version---quite a hefty saving!
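
For anyone who wants to double-check the arithmetic, the three lists above can be grouped by size and summed at compile time. This is only a sanity check over the figures already listed; `Size_Group` and `total_bytes` are names made up for this snippet.

```cpp
#include <cassert>
#include <cstddef>

// A byte size from the lists above, and how many instantiations share it.
struct Size_Group {
    int bytes;
    int count;
};

template <std::size_t N>
constexpr int total_bytes(const Size_Group (&groups)[N]) {
    int sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += groups[i].bytes * groups[i].count;
    }
    return sum;
}

// Old `format` instantiations (machine code only).
constexpr Size_Group old_code[] = {
    {0x5c, 1}, {0x6c, 10}, {0x7c, 2}, {0x8e, 12},
    {0xb0, 1}, {0xc0, 5}, {0xd2, 2}, {0xf2, 2},
};
// New version: static lookup tables.
constexpr Size_Group new_tables[] = {
    {0x00, 1}, {0x08, 12}, {0x10, 12}, {0x18, 6}, {0x20, 2}, {0x28, 2},
};
// New version: machine code of the instantiations.
constexpr Size_Group new_code[] = {
    {0x3f, 1}, {0x47, 12}, {0x49, 12}, {0x4e, 6}, {0x53, 2}, {0x5d, 2},
};

static_assert(total_bytes(old_code) == 5164, "old machine code");
static_assert(total_bytes(new_tables) == 576, "new lookup tables");
static_assert(total_bytes(new_code) == 2611, "new machine code");
// 3187 / 5164 is roughly 0.617; checked here in integer per-mille.
static_assert((total_bytes(new_tables) + total_bytes(new_code)) * 1000
    / total_bytes(old_code) == 617, "ratio");
```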
|
||||||
|
|
||||||
|
And this would only multiply in larger codebases.
|
||||||
|
Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine...
|
||||||
|
|
||||||
|
|
||||||
### C version
|
### C version
|
||||||
|
@ -348,3 +575,6 @@ The worst part would probably be emulating the parameter packs, because the prep
|
||||||
|
|
||||||
Maybe in another post.
|
Maybe in another post.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Thank you once again to my friend Tori for reviewing a draft of this post!
|
||||||
|
|