treehouse/content/fmt2.dj

title = "A simple string formatting library, round two"
id = "doc?20250917-fmt2"
updated = "2025-09-17T20:56:00+02:00"
tags = ["all", "programming", "cxx"]

+++

After implementing [a simple string formatting library in 65 lines of code][page:fmt.dj], I thought I was done.
The API was nice, it did what I needed it to do, so: _project complete_.

One piece of functionality that's commonly available in more complete libraries was missing though, and that was _positional arguments_.

I got a [comment on Lobsters](https://lobste.rs/c/bbwxbz) about it at the time, though I waved it away thinking I wouldn't need positional arguments anyways.
So far, I've been right---there still haven't been any cases in my game where I _needed_ positional arguments---but without them, the library feels a bit... incomplete.

So I started thinking about how I could improve on that.
I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\
Until today.

This write-up describes this an improved version of the library in detail, with support for positional arguments, and much smaller generated code size!
I hope you like it.


## Usage

This library has usage very similar to the original.
The main difference comes from the format string: instead of using `{}` for holes, it uses `%n`, where `n` is a digit between `0` and `9`.

```cpp
fmt::format(buf, "Hello, %0!", "world");
assert(strcmp(str, "Hello, world!") == 0);

fmt::format(buf, "[%0] [%1] %2", "main", "info", "Hewwo :3");
assert(strcmp(str, "[main] [info] Hewwo :3") == 0);
```

`{}` no longer needs escaping, but the `%0` pattern does---and it's done in a manner similar to that of libc `printf`.
To print a single `%` character followed by a digit, double the `%`, like so:

```cpp
fmt::format(buf, "%%0");
assert(strcmp(str, "%0") == 0);
```

Any other patterns are interpreted verbatim, and do not need escaping---though you may choose to double the `%`, for consistency with cases where it does need escaping.

```cpp
fmt::format(buf, "%0% complete", "3.14");
assert(strcmp(str, "3.14% complete") == 0);
```

The big advantage of this library comes from the fact that arguments can be reordered or repeated.

```cpp
fmt::format(
	buf,
	"Words of the day: '%2', '%1', '%0'. You chose '%1'.",
	"matrix", "crystal", "rivulet");
assert(strcmp(
	str,
	"Words of the day: 'rivulet', 'crystal', 'matrix'. You chose 'matrix'."
) == 0);
```

Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, gives back the amount of characters that would be written, and never allows reading arguments out of bounds.

As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
This library does the same thing, though using the new format string syntax.

```cpp
fmt::format(buf, "%0% complete" /* no arguments */);
assert(strcmp(str, "%0% complete") == 0);
```


## Implementation walkthrough

The library starts off with the same boilerplate as [the first one][page:fmt.dj].
If you haven't read the original write-up, I would strongly recommend doing that right now.

```cpp
#include <cstring>

struct String_Buffer
{
	char* str;
	int cap;
	int len = 0;
};

static void write(String_Buffer& buf, const char* str, int len)
{
	int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
	int write_len = len > remaining_cap ? remaining_cap : len;
	if (write_len > 0)
		memcpy(buf.str + buf.len, str, write_len);
	buf.len += len;
}
```

The signatures for functions writing values into the output string stay the same.
This library uses function overloading to resolve which function a value should be printed with, based on the value's type---just like the first library did.

```cpp
void write_value(String_Buffer& buf, const char* value)
{
	write(buf, value, strlen(value));
}

// You may choose to add more overloads here.
```

The difference comes from how parsing the string and writing out format arguments is driven.

Previously, parsing the string was driven by an expansion of a parameter pack.
It allowed the library to remain compact and simple, but it's what prevented reordering or repeating of arguments.
There is no way you can reorder the expansion of a parameter pack (or, a compile-time operation) using run-time values.

This time around, arguments are passed into the library through an array of *type-erased pointers to the values*, as well as *functions that format the values behind those pointers.*

```cpp
using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);

void format_untyped(
	String_Buffer& buf,
	const char* fstr,
	int nargs,
	const void* const* values,
	Format_Function_Untyped* const* ffuncs);
```

`format` then constructs those arrays in a type-safe manner, helping itself with a template function, which _erases_ its argument's type into `const void*`---allowing its instantiations to be stuffed into an array of function pointers of the same type.

```cpp
template<typename T>
void write_value_erased(String_Buffer& buf, const void* value)
{
	write_value(buf, *(const T*)value);
}

template<typename... Args>
void format(String_Buffer& buf, const char* fstr, const Args&... args)
{
	static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
	const void* const values[] = {&args...};
	static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...};
	format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
}
```

This trick with a template erasing `T*` into `void*` is actually really useful in general for writing polymorphic code.
If there's any knowledge worth remembering from this article, it would be this technique.

Aside from that, we once again make use of parameter packs.
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).

Among the [`const` soup](https://cdecl.org/), you may notice the `ffuncs` array being `static const`.
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's read-only data section, instead of generating code to write the function pointers onto the stack.

Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.

```cpp
void format_untyped(
	String_Buffer& buf,
	const char* fstr,
	int nargs,
	const void* const* values,
	Format_Function_Untyped* const* ffuncs)
{
	const char* start = fstr;
	while (*fstr != 0) {
		if (*fstr == '%') {
			write(buf, start, fstr - start);
			++fstr;
			start = fstr;
			if (*fstr >= '0' && *fstr <= '9') {
				int index = *fstr - '0';
				if (index < nargs) {
					(ffuncs[index])(buf, values[index]);
					++fstr;
					start = fstr;
				} else {
					start = fstr - 1; // include %n sequence verbatim
				}
			}
		} else {
			++fstr;
		}
	}
	write(buf, start, fstr - start);
}
```

The full code listing, split into a header and an implementation file, and packed into a namespace, is available below.

```cpp
#pragma once

struct String_Buffer
{
	char* str;
	int cap;
	int len = 0;
};

namespace fmt {

void write_value(String_Buffer& buf, const char* value);

using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);

void format_untyped(
	String_Buffer& buf,
	const char* fstr,
	int nargs,
	const void* const* values,
	Format_Function_Untyped* const* ffuncs);

template<typename T>
void write_value_erased(String_Buffer& buf, const void* value)
{
	write_value(buf, *(const T*)value);
}

template<typename... Args>
void format(String_Buffer& buf, const char* fstr, const Args&... args)
{
	static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
	const void* values[] = {&args...};
	static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...};
	format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
}

}
```

```cpp
#include "format.hpp"

#include <cstring>

namespace fmt {

static void write(String_Buffer& buf, const char* str, int len)
{
	int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
	int write_len = len > remaining_cap ? remaining_cap : len;
	if (write_len > 0)
		memcpy(buf.str + buf.len, str, write_len);
	buf.len += len;
}

void write_value(String_Buffer& buf, const char* value)
{
	write(buf, value, strlen(value));
}

void format_untyped(
	String_Buffer& buf,
	const char* fstr,
	int nargs,
	const void* const* values,
	Format_Function_Untyped* const* ffuncs)
{
	const char* start = fstr;
	while (*fstr != '\0') {
		if (*fstr == '%') {
			write(buf, start, fstr - start);
			++fstr;
			start = fstr;
			if (*fstr >= '0' && *fstr <= '9') {
				int index = *fstr - '0';
				if (index < nargs) {
					(ffuncs[index])(buf, values[index]);
					++fstr;
					start = fstr;
				} else {
					start = fstr - 1; // include %n sequence verbatim
				}
			}
		} else {
			++fstr;
		}
	}
	write(buf, start, fstr - start);
}

}
```

The same [extra goodies][page:fmt#Extras] as in the previous post can be used (implementations of `write_value` for various types, functions improving ergonomics).


## Remarks


### Source code length

This library is slightly larger than the previous, being 73 lines of code long.
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.

...is what I would say if I really _cared_ about squeezing every last line of code out of the library, but I don't!
I wrote this little library to be simple, extensible, and maintainable, so don't treat it as code golf.
Go with the extra lines of code.
This version of the library is better.

It's in the same ballpark either way, and honestly I only really flailed the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.\
And `printf`.

Don't forget about `printf`.


### `%` for holes

I chose `%n` as the hole syntax this time, because it was about as simple to parse as the previous choice of `{}`.
If I went with `{n}` instead, I would have to add extra code checking that there's only one digit inside the braces.
The downside of course is that you need to write out the indices manually in case they're a monotonic sequence (`%1, %2, %3`), as there is no natural syntax for an indexless hole (like there is with `{}`).

Also, due to the `%n` syntax only parsing one character, the library is limited to accepting 10 format arguments `%0`--`%9`.
That should be more than enough for 99.999% of your use cases, though.
It would be easier to add support for multiple digits with the `{n}` syntax, but the runtime cost would be bigger than supporting only a single digit.

If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-based by tweaking the digit parsing code in `format_untyped`.


### Code size

The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.

The new string formatter instead initialises a lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.

This is a good thing for embedded use cases.
You should prefer this implementation over the previous one for that.

In my game, the size emitted into the executable for instantiations of `format` is *61.7%* of the previous version!
Read on for a detailed analysis.

---

To illustrate this a bit, let's set an example.
I have a function `usages` which uses `format` in a few different ways.

```cpp
void usages(
	String_Buffer& buf,
	const char* filename,
	const char* part_name, int part_index,
	Entity_Id entity_id, Vec3 position)
{
	format(buf, "/prop/{}", filename);
	format(buf, "Part #{} ({})", part_index, part_name);
	format(buf, "{} at {}", entity_id, position);
}
```

This will generate three separate template instantiations of `format`:

```cpp
void format<char const*>(String_Buffer&, char const*, char const* const&)
void format<int, char const*>(String_Buffer&, char const*, int const&, char const* const&)
void format<Entity_Id, Vec3>(String_Buffer&, char const*, Entity_Id const&, Vec3 const&)
```

Clang 21.1.0 with `-O3` inlines these into the calling function, but let's consider them out of line for this example.
Each instantiation, after inlining, results in code that looks like this:

```cpp
void format<int, char const*>(
	String_Buffer& buf,
	char const* fstr,
	int const& a1,
	char const* const& a2)
{
	if (next_hole(buf, fstr))
		write_value(buf, a1);
	if (next_hole(buf, fstr))
		write_value(buf, a2);
	while (next_hole(buf, fstr)) {}
}
```

From the machine code perspective, this comes out to 120 bytes of code.
This doesn't seem like a lot, but it quickly multiplies when you consider that format function calls are going to end up having different sets of arguments types.

My game currently has about 16k lines of code, though barely any user-facing text right now---most of it is logs and ImGui strings---but there are quite a few unique instantiations of `format` (35 to be exact.)
Here's the full list with their byte sizes (click the *bold* header to unfold the list).

:::: details

::: summary

List of `fmt::format` instantiations sorted by byte size

:::

::: details-content

```cpp
0x5c <>
0x6c <bool>
0x6c <char [128]>
0x6c <char [20]>
0x6c <char [32]>
0x6c <char const*>
0x6c <char*>
0x6c <double>
0x6c <int>
0x6c <unsigned int>
0x6c <unsigned long>
0x7c <Format_Hex>
0x7c <float>
0x8e <Writer::Chunk*, unsigned long long>
0x8e <char [128], char const*>
0x8e <char const*, Entity_Id>
0x8e <char const*, Vec2i>
0x8e <char const*, Vec3>
0x8e <char const*, char [80]>
0x8e <char const*, char const*>
0x8e <char const*, char*>
0x8e <char const*, unsigned int>
0x8e <int, char [32]>
0x8e <int, char const*>
0x8e <int, int>
0xb0 <char [32], char const*, char const*>
0xc0 <Writer::Chunk*, unsigned long long, char const*>
0xc0 <char const*, int, char const*>
0xc0 <char const*, int, int>
0xc0 <int, int, char const*>
0xc0 <int, int, int>
0xd2 <char const*, int, int, char const*>
0xd2 <int, char const*, int, int>
0xf2 <char const*, int, int, unsigned int, char const*>
0xf2 <int, int, int, int, char const*>
```

:::

::::

Summing it all up, that's 5164 bytes of machine code.
For 35 unique combinations of arguments!

Now, let's replace the previous function with the new one.
Granted, this is using the new `%n` syntax which is incompatible with the old `{}`, and I haven't replaced the format strings---but the format string themselves do not affect the machine code size, so that's fine.

First, there's the static data for the function lookup tables.

:::: details

::: summary

List of lookup tables from instantiations of the new `fmt::format`

:::

::: details-content

```cpp
0x00 <>
0x08 <Format_Hex>
0x08 <bool>
0x08 <char [128]>
0x08 <char [20]>
0x08 <char [32]>
0x08 <char const*>
0x08 <char*>
0x08 <double>
0x08 <float>
0x08 <int>
0x08 <unsigned int>
0x08 <unsigned long>
0x10 <Writer::Chunk*, unsigned long long>
0x10 <char [128], char const*>
0x10 <char const*, Entity_Id>
0x10 <char const*, Vec2i>
0x10 <char const*, Vec3>
0x10 <char const*, char [80]>
0x10 <char const*, char const*>
0x10 <char const*, char*>
0x10 <char const*, unsigned int>
0x10 <int, char [32]>
0x10 <int, char const*>
0x10 <int, int>
0x18 <Writer::Chunk*, unsigned long long, char const*>
0x18 <char [32], char const*, char const*>
0x18 <char const*, int, char const*>
0x18 <char const*, int, int>
0x18 <int, int, char const*>
0x18 <int, int, int>
0x20 <char const*, int, int, char const*>
0x20 <int, char const*, int, int>
0x28 <char const*, int, int, unsigned int, char const*>
0x28 <int, int, int, int, char const*>
```

:::

::::

This comes out at 576 bytes total, obviously with tables for more arguments taking up more space.

In an embedded setting, this will likely be a lot less due to a smaller (16-bit or 32-bit) memory space, and therefore 2× or 4× smaller pointers.

Now, for the functions themselves.
Remember that these instantiations only set up the lookup tables for `format_untyped`, so they're likely to be inlined into the caller---though I've inhibited that with the `[[gnu::noinline]]` attribute, to sum up the figures for this post.

:::: details

::: summary

List of instantiations of the new `fmt::format`

:::

::: details-content

```cpp
0x3f <>
0x47 <Format_Hex>
0x47 <bool>
0x47 <char [128]>
0x47 <char [20]>
0x47 <char [32]>
0x47 <char const*>
0x47 <char*>
0x47 <double>
0x47 <float>
0x47 <int>
0x47 <unsigned int>
0x47 <unsigned long>
0x49 <Writer::Chunk*, unsigned long long>
0x49 <char [128], char const*>
0x49 <char const*, Entity_Id>
0x49 <char const*, Vec2i>
0x49 <char const*, Vec3>
0x49 <char const*, char [80]>
0x49 <char const*, char const*>
0x49 <char const*, char*>
0x49 <char const*, unsigned int>
0x49 <int, char [32]>
0x49 <int, char const*>
0x49 <int, int>
0x4e <Writer::Chunk*, unsigned long long, char const*>
0x4e <char [32], char const*, char const*>
0x4e <char const*, int, char const*>
0x4e <char const*, int, int>
0x4e <int, int, char const*>
0x4e <int, int, int>
0x53 <char const*, int, int, char const*>
0x53 <int, char const*, int, int>
0x5d <char const*, int, int, unsigned int, char const*>
0x5d <int, int, int, int, char const*>
```

:::

::::

That's 2611 bytes of machine code, and summing it up with the space taken up by lookup tables, comes out at 3187 bytes in the executable.
That's *61.7%* the size of the previous version---quite a hefty save!

And this would only multiply in larger codebases.
Imagine the megabytes of disk space saved if a refactor of this scale were done on the Unreal Engine...


### C version

It feels to me like with a bit of elbow grease, this library could be adapted to plain old C as well.

You would have to replace the template bits with macros and `...` / `va_list` though, and I'm not entirely sure how to resolve types into functions---feels like C11 `_Generic` could be of help.

The worst part would probably be emulating the parameter packs, because the preprocessor doesn't make it easy to transform the individual elements in a `__VA_ARGS__` list---but it [seems possible](https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms), if a bit horrid.

Maybe in another post.

---

Thank you once again to my friend Tori for giving feedback on a draft of this post!