578 lines
18 KiB
Text
578 lines
18 KiB
Text
title = "A simple string formatting library, round two"
|
||
id = "doc?20250917-fmt2"
|
||
updated = "2025-09-17T20:56:00+02:00"
|
||
tags = ["all", "programming", "cxx"]
|
||
|
||
+++
|
||
|
||
After implementing [a simple string formatting library in 65 lines of code][page:fmt.dj], I thought I was done.
|
||
The API was nice, it did what I needed it to do, so: _project complete_.
|
||
|
||
One piece of functionality that's commonly available in more complete libraries was missing though, and that was _positional arguments_.
|
||
|
||
I got a [comment on Lobsters](https://lobste.rs/c/bbwxbz) about it at the time, though I waved it away thinking I wouldn't need positional arguments anyways.
|
||
So far, I've been right---there still haven't been any cases in my game where I _needed_ positional arguments---but without them, the library feels a bit... incomplete.
|
||
|
||
So I started thinking about how I could improve on that.
|
||
I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\
|
||
Until today.
|
||
|
||
This write-up describes this an improved version of the library in detail, with support for positional arguments, and much smaller generated code size!
|
||
I hope you like it.
|
||
|
||
|
||
## Usage
|
||
|
||
This library has usage very similar to the original.
|
||
The main difference comes from the format string: instead of using `{}` for holes, it uses `%n`, where `n` is a digit between `0` and `9`.
|
||
|
||
```cpp
|
||
fmt::format(buf, "Hello, %0!", "world");
|
||
assert(strcmp(str, "Hello, world!") == 0);
|
||
|
||
fmt::format(buf, "[%0] [%1] %2", "main", "info", "Hewwo :3");
|
||
assert(strcmp(str, "[main] [info] Hewwo :3") == 0);
|
||
```
|
||
|
||
`{}` no longer needs escaping, but the `%0` pattern does---and it's done in a manner similar to that of libc `printf`.
|
||
To print a single `%` character followed by a digit, double the `%`, like so:
|
||
|
||
```cpp
|
||
fmt::format(buf, "%%0");
|
||
assert(strcmp(str, "%0") == 0);
|
||
```
|
||
|
||
Any other patterns are interpreted verbatim, and do not need escaping---though you may choose to double the `%`, for consistency with cases where it does need escaping.
|
||
|
||
```cpp
|
||
fmt::format(buf, "%0% complete", "3.14");
|
||
assert(strcmp(str, "3.14% complete") == 0);
|
||
```
|
||
|
||
The big advantage of this library comes from the fact that arguments can be reordered or repeated.
|
||
|
||
```cpp
|
||
fmt::format(
|
||
buf,
|
||
"Words of the day: '%2', '%1', '%0'. You chose '%1'.",
|
||
"matrix", "crystal", "rivulet");
|
||
assert(strcmp(
|
||
str,
|
||
"Words of the day: 'rivulet', 'crystal', 'matrix'. You chose 'matrix'."
|
||
) == 0);
|
||
```
|
||
|
||
Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, gives back the amount of characters that would be written, and never allows reading arguments out of bounds.
|
||
|
||
As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
|
||
This library does the same thing, though using the new format string syntax.
|
||
|
||
```cpp
|
||
fmt::format(buf, "%0% complete" /* no arguments */);
|
||
assert(strcmp(str, "%0% complete") == 0);
|
||
```
|
||
|
||
|
||
## Implementation walkthrough
|
||
|
||
The library starts off with the same boilerplate as [the first one][page:fmt.dj].
|
||
If you haven't read the original write-up, I would strongly recommend doing that right now.
|
||
|
||
```cpp
|
||
#include <cstring>
|
||
|
||
struct String_Buffer
|
||
{
|
||
char* str;
|
||
int cap;
|
||
int len = 0;
|
||
};
|
||
|
||
static void write(String_Buffer& buf, const char* str, int len)
|
||
{
|
||
int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
|
||
int write_len = len > remaining_cap ? remaining_cap : len;
|
||
if (write_len > 0)
|
||
memcpy(buf.str + buf.len, str, write_len);
|
||
buf.len += len;
|
||
}
|
||
```
|
||
|
||
The signatures for functions writing values into the output string stay the same.
|
||
This library uses function overloading to resolve which function a value should be printed with, based on the value's type---just like the first library did.
|
||
|
||
```cpp
|
||
void write_value(String_Buffer& buf, const char* value)
|
||
{
|
||
write(buf, value, strlen(value));
|
||
}
|
||
|
||
// You may choose to add more overloads here.
|
||
```
|
||
|
||
The difference comes from how parsing the string and writing out format arguments is driven.
|
||
|
||
Previously, parsing the string was driven by an expansion of a parameter pack.
|
||
It allowed the library to remain compact and simple, but it's what prevented reordering or repeating of arguments.
|
||
There is no way you can reorder the expansion of a parameter pack (or, a compile-time operation) using run-time values.
|
||
|
||
This time around, arguments are passed into the library through an array of *type-erased pointers to the values*, as well as *functions that format the values behind those pointers.*
|
||
|
||
```cpp
|
||
using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);
|
||
|
||
void format_untyped(
|
||
String_Buffer& buf,
|
||
const char* fstr,
|
||
int nargs,
|
||
const void* const* values,
|
||
Format_Function_Untyped* const* ffuncs);
|
||
```
|
||
|
||
`format` then constructs those arrays in a type-safe manner, helping itself with a template function, which _erases_ its argument's type into `const void*`---allowing its instantiations to be stuffed into an array of function pointers of the same type.
|
||
|
||
```cpp
|
||
template<typename T>
|
||
void write_value_erased(String_Buffer& buf, const void* value)
|
||
{
|
||
write_value(buf, *(const T*)value);
|
||
}
|
||
|
||
template<typename... Args>
|
||
void format(String_Buffer& buf, const char* fstr, const Args&... args)
|
||
{
|
||
static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
|
||
const void* const values[] = {&args...};
|
||
static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...};
|
||
format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
|
||
}
|
||
```
|
||
|
||
This trick with a template erasing `T*` into `void*` is actually really useful in general for writing polymorphic code.
|
||
If there's any knowledge worth remembering from this article, it would be this technique.
|
||
|
||
Aside from that, we once again make use of parameter packs.
|
||
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
|
||
|
||
Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`.
|
||
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack.
|
||
|
||
Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.
|
||
|
||
```cpp
|
||
void format_untyped(
|
||
String_Buffer& buf,
|
||
const char* fstr,
|
||
int nargs,
|
||
const void* const* values,
|
||
Format_Function_Untyped* const* ffuncs)
|
||
{
|
||
const char* start = fstr;
|
||
while (*fstr != 0) {
|
||
if (*fstr == '%') {
|
||
write(buf, start, fstr - start);
|
||
++fstr;
|
||
start = fstr;
|
||
if (*fstr >= '0' && *fstr <= '9') {
|
||
int index = *fstr - '0';
|
||
if (index < nargs) {
|
||
(ffuncs[index])(buf, values[index]);
|
||
++fstr;
|
||
start = fstr;
|
||
} else {
|
||
start = fstr - 1; // include %n sequence verbatim
|
||
}
|
||
}
|
||
} else {
|
||
++fstr;
|
||
}
|
||
}
|
||
write(buf, start, fstr - start);
|
||
}
|
||
```
|
||
|
||
The full code listing, split into a header and an implementation file, and packed into a namespace, is available below.
|
||
|
||
```cpp
|
||
#pragma once
|
||
|
||
struct String_Buffer
|
||
{
|
||
char* str;
|
||
int cap;
|
||
int len = 0;
|
||
};
|
||
|
||
namespace fmt {
|
||
|
||
void write_value(String_Buffer& buf, const char* value);
|
||
|
||
using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);
|
||
|
||
void format_untyped(
|
||
String_Buffer& buf,
|
||
const char* fstr,
|
||
int nargs,
|
||
const void* const* values,
|
||
Format_Function_Untyped* const* ffuncs);
|
||
|
||
template<typename T>
|
||
void write_value_erased(String_Buffer& buf, const void* value)
|
||
{
|
||
write_value(buf, *(const T*)value);
|
||
}
|
||
|
||
template<typename... Args>
|
||
void format(String_Buffer& buf, const char* fstr, const Args&... args)
|
||
{
|
||
static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
|
||
const void* values[] = {&args...};
|
||
static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...};
|
||
format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
|
||
}
|
||
|
||
}
|
||
```
|
||
|
||
```cpp
|
||
#include "format.hpp"
|
||
|
||
#include <cstring>
|
||
|
||
namespace fmt {
|
||
|
||
static void write(String_Buffer& buf, const char* str, int len)
|
||
{
|
||
int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
|
||
int write_len = len > remaining_cap ? remaining_cap : len;
|
||
if (write_len > 0)
|
||
memcpy(buf.str + buf.len, str, write_len);
|
||
buf.len += len;
|
||
}
|
||
|
||
void write_value(String_Buffer& buf, const char* value)
|
||
{
|
||
write(buf, value, strlen(value));
|
||
}
|
||
|
||
void format_untyped(
|
||
String_Buffer& buf,
|
||
const char* fstr,
|
||
int nargs,
|
||
const void* const* values,
|
||
Format_Function_Untyped* const* ffuncs)
|
||
{
|
||
const char* start = fstr;
|
||
while (*fstr != '\0') {
|
||
if (*fstr == '%') {
|
||
write(buf, start, fstr - start);
|
||
++fstr;
|
||
start = fstr;
|
||
if (*fstr >= '0' && *fstr <= '9') {
|
||
int index = *fstr - '0';
|
||
if (index < nargs) {
|
||
(ffuncs[index])(buf, values[index]);
|
||
++fstr;
|
||
start = fstr;
|
||
} else {
|
||
start = fstr - 1; // include %n sequence verbatim
|
||
}
|
||
}
|
||
} else {
|
||
++fstr;
|
||
}
|
||
}
|
||
write(buf, start, fstr - start);
|
||
}
|
||
|
||
}
|
||
```
|
||
|
||
The same [extra goodies][page:fmt#Extras] as in the previous post can be used (implementations of `write_value` for various types, functions improving ergonomics).
|
||
|
||
|
||
## Remarks
|
||
|
||
|
||
### Source code length
|
||
|
||
This library is slightly larger than the previous, being 73 lines of code long.
|
||
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.
|
||
|
||
...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't!
|
||
I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf.
|
||
Go with the extra lines of code.
|
||
This version of the library is better.
|
||
|
||
It's in the same ballpark either way, and honestly I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\
|
||
And `printf`.
|
||
|
||
Don't forget about `printf`.
|
||
|
||
|
||
### `%` for holes
|
||
|
||
I chose `%n` as the hole syntax this time, because it was about as simple to parse as the previous choice of `{}`.
|
||
If I went with `{n}` instead, I would have to add extra code checking that there's only one digit inside the braces.
|
||
The downside of course is that you need to write out the indices manually in case they're a monotonic sequence (`%1, %2, %3`), as there is no natural syntax for an indexless hole (like there is with `{}`).
|
||
|
||
Also, due to the `%n` syntax only parsing one character, the library is limited to accepting 10 format arguments `%0`--`%9`.
|
||
That should be more than enough for 99.999% of your use cases, though.
|
||
It would be easier to add support for multiple digits with the `{n}` syntax, but the runtime cost would be bigger than supporting only a single digit.
|
||
|
||
If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-based by tweaking the digit parsing code in `format_untyped`.
|
||
|
||
|
||
### Code size
|
||
|
||
The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
|
||
|
||
The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
|
||
|
||
This is a good thing for embedded use cases.
|
||
You should prefer this implementation over the previous one for that.
|
||
|
||
In my game, the size emitted into the executable for instantiations of `format` is *61.7%* of the previous version!
|
||
Read on for a detailed analysis.
|
||
|
||
---
|
||
|
||
To illustrate this a bit, let's set an example.
|
||
I have a function `usages` which uses `format` in a few different ways.
|
||
|
||
```cpp
|
||
void usages(
|
||
String_Buffer& buf,
|
||
const char* filename,
|
||
const char* part_name, int part_index,
|
||
Entity_Id entity_id, Vec3 position)
|
||
{
|
||
format(buf, "/prop/{}", filename);
|
||
format(buf, "Part #{} ({})", part_index, part_name);
|
||
format(buf, "{} at {}", entity_id, position);
|
||
}
|
||
```
|
||
|
||
This will generate three separate template instantiations of `format`:
|
||
|
||
```cpp
|
||
void format<char const*>(String_Buffer&, char const*, char const* const&)
|
||
void format<int, char const*>(String_Buffer&, char const*, int const&, char const* const&)
|
||
void format<Entity_Id, Vec3>(String_Buffer&, char const*, Entity_Id const&, Vec3 const&)
|
||
```
|
||
|
||
Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example.
|
||
Each instantiation, after inlining, results in code that looks like this:
|
||
|
||
```cpp
|
||
void format<int, char const*>(
|
||
String_Buffer& buf,
|
||
char const* fstr,
|
||
int const& a1,
|
||
char const* const& a2)
|
||
{
|
||
if (next_hole(buf, fstr))
|
||
write_value(buf, a1);
|
||
if (next_hole(buf, fstr))
|
||
write_value(buf, a2);
|
||
while (next_hole(buf, fstr)) {}
|
||
}
|
||
```
|
||
|
||
From the machine code perspective, this comes out to 120 bytes of code.
|
||
This doesn't seem like a lot, but it quickly multiplies when you consider that format function calls are going to end up having different sets of arguments types.
|
||
|
||
My game currently has about 16k lines of code, though barely any user-facing text right now---most of it is logs and ImGui strings---but there are quite a few unique instantiations of `format` (35 to be exact.)
|
||
Here's the full list with their byte sizes (click the *bold* header to unfold the list).
|
||
|
||
:::: details
|
||
|
||
::: summary
|
||
|
||
List of `fmt::format` instantiations sorted by byte size
|
||
|
||
:::
|
||
|
||
::: details-content
|
||
|
||
```cpp
|
||
0x5c <>
|
||
0x6c <bool>
|
||
0x6c <char [128]>
|
||
0x6c <char [20]>
|
||
0x6c <char [32]>
|
||
0x6c <char const*>
|
||
0x6c <char*>
|
||
0x6c <double>
|
||
0x6c <int>
|
||
0x6c <unsigned int>
|
||
0x6c <unsigned long>
|
||
0x7c <Format_Hex>
|
||
0x7c <float>
|
||
0x8e <Writer::Chunk*, unsigned long long>
|
||
0x8e <char [128], char const*>
|
||
0x8e <char const*, Entity_Id>
|
||
0x8e <char const*, Vec2i>
|
||
0x8e <char const*, Vec3>
|
||
0x8e <char const*, char [80]>
|
||
0x8e <char const*, char const*>
|
||
0x8e <char const*, char*>
|
||
0x8e <char const*, unsigned int>
|
||
0x8e <int, char [32]>
|
||
0x8e <int, char const*>
|
||
0x8e <int, int>
|
||
0xb0 <char [32], char const*, char const*>
|
||
0xc0 <Writer::Chunk*, unsigned long long, char const*>
|
||
0xc0 <char const*, int, char const*>
|
||
0xc0 <char const*, int, int>
|
||
0xc0 <int, int, char const*>
|
||
0xc0 <int, int, int>
|
||
0xd2 <char const*, int, int, char const*>
|
||
0xd2 <int, char const*, int, int>
|
||
0xf2 <char const*, int, int, unsigned int, char const*>
|
||
0xf2 <int, int, int, int, char const*>
|
||
```
|
||
|
||
:::
|
||
|
||
::::
|
||
|
||
Summing it all up, that's 5164 bytes of machine code.
|
||
For 35 unique combinations of arguments!
|
||
|
||
Now, let's replace the previous function with the new one.
|
||
Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings---but the format string themselves do not affect the machine code size, so that's fine.
|
||
|
||
First, there's the static data for the function lookup tables.
|
||
|
||
:::: details
|
||
|
||
::: summary
|
||
|
||
List of lookup tables from instantiations of the new `fmt::format`
|
||
|
||
:::
|
||
|
||
::: details-content
|
||
|
||
```cpp
|
||
0x00 <>
|
||
0x08 <Format_Hex>
|
||
0x08 <bool>
|
||
0x08 <char [128]>
|
||
0x08 <char [20]>
|
||
0x08 <char [32]>
|
||
0x08 <char const*>
|
||
0x08 <char*>
|
||
0x08 <double>
|
||
0x08 <float>
|
||
0x08 <int>
|
||
0x08 <unsigned int>
|
||
0x08 <unsigned long>
|
||
0x10 <Writer::Chunk*, unsigned long long>
|
||
0x10 <char [128], char const*>
|
||
0x10 <char const*, Entity_Id>
|
||
0x10 <char const*, Vec2i>
|
||
0x10 <char const*, Vec3>
|
||
0x10 <char const*, char [80]>
|
||
0x10 <char const*, char const*>
|
||
0x10 <char const*, char*>
|
||
0x10 <char const*, unsigned int>
|
||
0x10 <int, char [32]>
|
||
0x10 <int, char const*>
|
||
0x10 <int, int>
|
||
0x18 <Writer::Chunk*, unsigned long long, char const*>
|
||
0x18 <char [32], char const*, char const*>
|
||
0x18 <char const*, int, char const*>
|
||
0x18 <char const*, int, int>
|
||
0x18 <int, int, char const*>
|
||
0x18 <int, int, int>
|
||
0x20 <char const*, int, int, char const*>
|
||
0x20 <int, char const*, int, int>
|
||
0x28 <char const*, int, int, unsigned int, char const*>
|
||
0x28 <int, int, int, int, char const*>
|
||
```
|
||
|
||
:::
|
||
|
||
::::
|
||
|
||
This comes out at 576 bytes total, obviously with tables for more arguments taking up more space.
|
||
|
||
In an embedded setting, this will likely be a lot less due to a smaller (16-bit or 32-bit) memory space, and therefore 2× or 4× smaller pointers.
|
||
|
||
Now, for the functions themselves.
|
||
Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post.
|
||
|
||
:::: details
|
||
|
||
::: summary
|
||
|
||
List of instantiations of the new `fmt::format`
|
||
|
||
:::
|
||
|
||
::: details-content
|
||
|
||
```cpp
|
||
0x3f <>
|
||
0x47 <Format_Hex>
|
||
0x47 <bool>
|
||
0x47 <char [128]>
|
||
0x47 <char [20]>
|
||
0x47 <char [32]>
|
||
0x47 <char const*>
|
||
0x47 <char*>
|
||
0x47 <double>
|
||
0x47 <float>
|
||
0x47 <int>
|
||
0x47 <unsigned int>
|
||
0x47 <unsigned long>
|
||
0x49 <Writer::Chunk*, unsigned long long>
|
||
0x49 <char [128], char const*>
|
||
0x49 <char const*, Entity_Id>
|
||
0x49 <char const*, Vec2i>
|
||
0x49 <char const*, Vec3>
|
||
0x49 <char const*, char [80]>
|
||
0x49 <char const*, char const*>
|
||
0x49 <char const*, char*>
|
||
0x49 <char const*, unsigned int>
|
||
0x49 <int, char [32]>
|
||
0x49 <int, char const*>
|
||
0x49 <int, int>
|
||
0x4e <Writer::Chunk*, unsigned long long, char const*>
|
||
0x4e <char [32], char const*, char const*>
|
||
0x4e <char const*, int, char const*>
|
||
0x4e <char const*, int, int>
|
||
0x4e <int, int, char const*>
|
||
0x4e <int, int, int>
|
||
0x53 <char const*, int, int, char const*>
|
||
0x53 <int, char const*, int, int>
|
||
0x5d <char const*, int, int, unsigned int, char const*>
|
||
0x5d <int, int, int, int, char const*>
|
||
```
|
||
|
||
:::
|
||
|
||
::::
|
||
|
||
That's 2611 bytes of machine code, and summing it up with the space taken up by lookup tables, comes out at 3187 bytes in the executable.
|
||
That's *61.7%* the size of the previous version---quite a hefty save!
|
||
|
||
And this would only multiply in larger codebases.
|
||
Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine...
|
||
|
||
|
||
### C version
|
||
|
||
It feels to me like with a bit of elbow grease, this library could be adapted to plain old C as well.
|
||
|
||
You would have to replace the template bits with macros and `...` / `va_list` though, and I'm not entirely sure how to resolve types into functions---feels like C11 `_Generic` could be of help.
|
||
|
||
The worst part would probably be emulating the parameter packs, because the preprocessor doesn't make it easy to transform the individual elements in a `__VA_ARGS__` list---but it [seems possible](https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms), if a bit horrid.
|
||
|
||
Maybe in another post.
|
||
|
||
---
|
||
|
||
Thank you once again to my friend Tori for giving feedback on a draft of this post!
|