fmt2: draft
parent
ab4d77780e
commit
530c137223
1 changed files with 350 additions and 0 deletions
350
content/fmt2.dj
Normal file
@@ -0,0 +1,350 @@
title = "A simple string formatting library, round two"

+++

_This is a draft post.
Please don't share it around just yet!_

---

After implementing [a simple string formatting library in 65 lines of code][page:fmt.dj], I thought I was done.
The API was nice, it did what I needed it to do, so: _project complete_.

One piece of functionality that's commonly available in more complete libraries was missing though, and that was _positional arguments_.

I got a [comment on Lobsters](https://lobste.rs/c/bbwxbz) about it at the time, though I waved it away thinking I wouldn't need positional arguments anyways.
So far, I've been right---there still haven't been any cases in my game where I _needed_ positional arguments---but without them, the library feels a bit... incomplete.

So I started thinking about how I could improve on that.
I went back and forth trying to come up with something sensible, but nothing _simple_ ever came to mind.\
Until today.

This write-up describes the alternative version of the library in detail.


## Usage

Usage is very similar to the original library.
The main difference is the format string: instead of using `{}` for holes, it uses `%n`, where `n` is a digit between `0` and `9`.

```cpp
fmt::format(buf, "Hello, %0!", "world");
assert(strcmp(str, "Hello, world!") == 0);

fmt::format(buf, "[%0] [%1] %2", "main", "info", "Hewwo :3");
assert(strcmp(str, "[main] [info] Hewwo :3") == 0);
```

`{}` no longer needs escaping, but the `%0` pattern does---and it's done in a manner similar to that of libc `printf`.
To print a literal `%` character followed by a digit, double the `%`, like so:

```cpp
fmt::format(buf, "%%0");
assert(strcmp(str, "%0") == 0);
```

Any other patterns are interpreted verbatim, and do not need escaping---though you may choose to double the `%`, for consistency with cases where it does need escaping.

```cpp
fmt::format(buf, "%0% complete", "3.14");
assert(strcmp(str, "3.14% complete") == 0);
```

The big advantage of this library is that arguments can be reordered or repeated.

```cpp
fmt::format(
    buf,
    "Words of the day: '%2', '%1', '%0'. You chose '%1'.",
    "matrix", "crystal", "rivulet");
assert(strcmp(
    str,
    "Words of the day: 'rivulet', 'crystal', 'matrix'. You chose 'crystal'."
) == 0);
```

Other than that, the same invariants are upheld as in the previous library: the library never writes past the end of the output buffer, and never reads arguments out of bounds.

As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
This library does the same thing, though using the new format string syntax.

```cpp
fmt::format(buf, "%0% complete" /* no arguments */);
assert(strcmp(str, "%0% complete") == 0);
```


## Implementation walkthrough

The library starts off with the same boilerplate as [the first one][page:fmt.dj].
If you haven't read the original write-up, I would strongly recommend doing that right now.

```cpp
#include <cstring>

struct String_Buffer
{
    char* str;
    int cap;
    int len = 0;
};

static void write(String_Buffer& buf, const char* str, int len)
{
    int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
    int write_len = len > remaining_cap ? remaining_cap : len;
    if (write_len > 0)
        memcpy(buf.str + buf.len, str, write_len);
    buf.len += len; // counts the full length, even if the copy was cut short
}
```
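
To see the clamping in `write` at work, here is a minimal sketch that can be used to write more bytes than a buffer holds. `was_truncated` is a hypothetical helper added purely for illustration and is not part of the library. The sketch also assumes the caller zero-initialises the backing array, since `write` never stores the terminating NUL itself.

```cpp
#include <cstring>

struct String_Buffer
{
    char* str;
    int cap;
    int len = 0;
};

static void write(String_Buffer& buf, const char* str, int len)
{
    int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
    int write_len = len > remaining_cap ? remaining_cap : len;
    if (write_len > 0)
        memcpy(buf.str + buf.len, str, write_len);
    buf.len += len; // keeps counting past the capacity
}

// Hypothetical helper (not in the original listing): the output was cut short
// if more bytes were requested than fit in front of the reserved NUL byte.
static bool was_truncated(const String_Buffer& buf)
{
    return buf.len > buf.cap - 1;
}
```

Writing ten bytes into an eight-byte buffer copies only the first seven, while `len` still ends up at 10, so the caller can tell how large the buffer would have needed to be.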

The signatures for functions writing values into the output string stay the same.
This library uses function overloading to resolve which function a value should be printed with, based on the value's type---just like the first library did.

```cpp
void write_value(String_Buffer& buf, const char* value)
{
    write(buf, value, strlen(value));
}

// You may choose to add more overloads here.
```
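
As an illustration of what such an extra overload might look like, here is a minimal sketch for `int`. This overload is an assumption made for the example and is not taken from the original listing; it leans on `snprintf` to produce the digits, and assumes an `int` prints in at most 15 characters.

```cpp
#include <cstdio>
#include <cstring>

struct String_Buffer
{
    char* str;
    int cap;
    int len = 0;
};

static void write(String_Buffer& buf, const char* str, int len)
{
    int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
    int write_len = len > remaining_cap ? remaining_cap : len;
    if (write_len > 0)
        memcpy(buf.str + buf.len, str, write_len);
    buf.len += len;
}

void write_value(String_Buffer& buf, const char* value)
{
    write(buf, value, (int)strlen(value));
}

// Hypothetical overload for this sketch: formats the integer into a small
// scratch array, then reuses write() so the usual bounds checks apply.
void write_value(String_Buffer& buf, int value)
{
    char digits[16]; // assumed large enough for sign, digits, and NUL
    int len = snprintf(digits, sizeof digits, "%d", value);
    write(buf, digits, len);
}
```

Overload resolution then picks the right function from the argument's type alone, so calling `write_value(buf, -42)` needs no format specifier.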

The difference lies in what drives parsing the string and writing out format arguments.

Previously, parsing the string was driven by the expansion of a parameter pack.
It allowed the library to remain compact and simple, but it's also what prevented reordering or repeating arguments.
Expanding a parameter pack is a compile-time operation, so there is no way to reorder it using run-time values.

This time around, arguments are passed into the library through an array of *type-erased pointers to the values*, alongside an array of *functions that format the values behind those pointers.*

```cpp
using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);

void format_untyped(
    String_Buffer& buf,
    const char* fstr,
    int nargs,
    const void* const* values,
    Format_Function_Untyped* const* ffuncs);
```

`format` then constructs those arrays in a type-safe manner with the help of a template function, which _erases_ its argument's type into `const void*`---allowing its instantiations to be stuffed into an array of function pointers of the same type.

```cpp
template<typename T>
void write_value_erased(String_Buffer& buf, const void* value)
{
    write_value(buf, *(const T*)value);
}

template<typename... Args>
void format(String_Buffer& buf, const char* fstr, const Args&... args)
{
    static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
    const void* const values[] = {&args...};
    static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...};
    format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
}
```

This trick of using a template to erase `T*` into `void*` is really useful in general for writing polymorphic code.
If there's one piece of knowledge worth remembering from this article, it's this technique.
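
To show the same erasure trick outside of string formatting, here is a minimal sketch that sums the areas of unrelated shape types without virtual functions. Everything in it (`Square`, `Rect`, `area`, `Any_Shape`, `erase_shape`, `total_area`) is hypothetical and made up for this example.

```cpp
struct Square { int side; };
struct Rect { int w, h; };

// Ordinary overloads, resolved by type just like write_value.
int area(const Square& s) { return s.side * s.side; }
int area(const Rect& r) { return r.w * r.h; }

using Area_Function = int(const void* shape);

// One instantiation per shape type; all of them share the same signature,
// so their addresses fit into a single function pointer type.
template<typename T>
int area_erased(const void* shape)
{
    return area(*(const T*)shape);
}

// A non-owning pair: pointer to the value, plus the function that knows its type.
struct Any_Shape
{
    const void* ptr;
    Area_Function* area_fn;
};

template<typename T>
Any_Shape erase_shape(const T& shape)
{
    return {&shape, &area_erased<T>};
}

int total_area(const Any_Shape* shapes, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += shapes[i].area_fn(shapes[i].ptr);
    return total;
}
```

`Any_Shape` only borrows the value, so the original object has to outlive it; passing a temporary to `erase_shape` would leave a dangling pointer.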

Aside from that, we once again make use of parameter packs.
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).

Among the `const` soup, you may notice the `ffuncs` array being `static const`.
This little trick reduces code size a bit, because it makes the compiler emit a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack.

Finally, we get to `format_untyped`, which parses the format string, writes out the verbatim parts, and calls the appropriate format function whenever a hole is encountered.

```cpp
void format_untyped(
    String_Buffer& buf,
    const char* fstr,
    int nargs,
    const void* const* values,
    Format_Function_Untyped* const* ffuncs)
{
    const char* start = fstr;
    while (*fstr != 0) {
        if (*fstr == '%') {
            write(buf, start, fstr - start); // flush the verbatim run before the %
            start = fstr; // keep the % in the next run unless it's consumed below
            ++fstr;
            if (*fstr >= '0' && *fstr <= '9' && *fstr - '0' < nargs) {
                int index = *fstr - '0';
                (ffuncs[index])(buf, values[index]);
                ++fstr;
                start = fstr; // the %n sequence has been replaced by the value
            } else if (*fstr == '%') {
                start = fstr; // %% prints a single %: resume the run at the second one
                ++fstr;
            }
            // otherwise the % (or an out-of-range %n) is included verbatim
        } else {
            ++fstr;
        }
    }
    write(buf, start, fstr - start);
}
```
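
Since `format_untyped` only sees arrays, it can also be driven by hand, which makes the parsing rules easy to poke at. The sketch below does that for two string arguments. `write_c_string` and `format_pair` are hypothetical helpers for this example, and here the erased pointer is assumed to point directly at the characters. The parser is written out again, in a form that keeps a lone `%` and collapses `%%`, so that the sketch compiles on its own.

```cpp
#include <cstring>

struct String_Buffer
{
    char* str;
    int cap;
    int len = 0;
};

static void write(String_Buffer& buf, const char* str, int len)
{
    int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
    int write_len = len > remaining_cap ? remaining_cap : len;
    if (write_len > 0)
        memcpy(buf.str + buf.len, str, write_len);
    buf.len += len;
}

using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);

void format_untyped(
    String_Buffer& buf,
    const char* fstr,
    int nargs,
    const void* const* values,
    Format_Function_Untyped* const* ffuncs)
{
    const char* start = fstr;
    while (*fstr != 0) {
        if (*fstr == '%') {
            write(buf, start, (int)(fstr - start));
            start = fstr; // the % stays verbatim unless consumed below
            ++fstr;
            if (*fstr >= '0' && *fstr <= '9' && *fstr - '0' < nargs) {
                int index = *fstr - '0';
                (ffuncs[index])(buf, values[index]);
                ++fstr;
                start = fstr;
            } else if (*fstr == '%') {
                start = fstr; // %% prints a single %
                ++fstr;
            }
        } else {
            ++fstr;
        }
    }
    write(buf, start, (int)(fstr - start));
}

// Hypothetical formatter: treats the erased pointer as a C string.
static void write_c_string(String_Buffer& buf, const void* value_ptr)
{
    const char* value = (const char*)value_ptr;
    write(buf, value, (int)strlen(value));
}

// Hypothetical helper: formats `fstr` with two string arguments into `out`.
static void format_pair(char* out, int cap, const char* fstr, const char* a, const char* b)
{
    memset(out, 0, cap); // write() never stores the NUL itself
    String_Buffer buf = {out, cap};
    const void* const values[] = {a, b};
    Format_Function_Untyped* const ffuncs[] = {&write_c_string, &write_c_string};
    format_untyped(buf, fstr, 2, values, ffuncs);
}
```

With arguments `"a"` and `"b"`, the format string `"%1-%0-%1"` produces `b-a-b`, and a hole such as `%2` that has no matching argument is copied through unchanged.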

The full code listing, split into a header and an implementation file, and packed into a namespace, is available below.

```cpp
#pragma once

struct String_Buffer
{
    char* str;
    int cap;
    int len = 0;
};

namespace fmt {

void write_value(String_Buffer& buf, const char* value);

using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);

void format_untyped(
    String_Buffer& buf,
    const char* fstr,
    int nargs,
    const void* const* values,
    Format_Function_Untyped* const* ffuncs);

template<typename T>
void write_value_erased(String_Buffer& buf, const void* value)
{
    write_value(buf, *(const T*)value);
}

template<typename... Args>
void format(String_Buffer& buf, const char* fstr, const Args&... args)
{
    static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
    const void* const values[] = {&args...};
    static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...};
    format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
}

}
```

```cpp
#include "format.hpp"

#include <cstring>

namespace fmt {

static void write(String_Buffer& buf, const char* str, int len)
{
    int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
    int write_len = len > remaining_cap ? remaining_cap : len;
    if (write_len > 0)
        memcpy(buf.str + buf.len, str, write_len);
    buf.len += len;
}

void write_value(String_Buffer& buf, const char* value)
{
    write(buf, value, strlen(value));
}

void format_untyped(
    String_Buffer& buf,
    const char* fstr,
    int nargs,
    const void* const* values,
    Format_Function_Untyped* const* ffuncs)
{
    const char* start = fstr;
    while (*fstr != '\0') {
        if (*fstr == '%') {
            write(buf, start, fstr - start);
            start = fstr; // keep the % in the next run unless it's consumed below
            ++fstr;
            if (*fstr >= '0' && *fstr <= '9' && *fstr - '0' < nargs) {
                int index = *fstr - '0';
                (ffuncs[index])(buf, values[index]);
                ++fstr;
                start = fstr;
            } else if (*fstr == '%') {
                start = fstr; // %% prints a single %
                ++fstr;
            }
        } else {
            ++fstr;
        }
    }
    write(buf, start, fstr - start);
}

}
```

The same [extra goodies][page:fmt#Extras] as in the previous post can be used (implementations of `write_value` for various types, functions improving ergonomics).


## Remarks


### Source code length

This library is slightly larger than the previous one, being 73 lines of code long.
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.

I only really flaunted the 65 LOC figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.

And `printf`.
Don't forget about `printf`.


### `%` for holes

I chose `%n` as the hole syntax this time, because it was about as simple to parse as the previous choice of `{}`.
If I went with `{n}` instead, I would have to add extra code checking that there's only one digit inside the braces.
The downside of course is that you need to write out the indices manually even when they simply count up in order (`%0, %1, %2`), as there is no natural syntax for an indexless hole (like there is with `{}`).

Also, due to the `%n` syntax only parsing one character, the library is limited to accepting 10 format arguments (`%0`--`%9`).
That should be more than enough for 99.999% of your use cases, though.
It would be easier to add support for multiple digits with the `{n}` syntax, but the runtime cost would be bigger than supporting only a single digit.
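
To give a rough idea of the extra work a multi-digit `{n}` hole would take, here is a minimal sketch of a parser for it. `parse_braced_index` is a hypothetical helper, not something the library contains, and the cap on the index is an arbitrary choice made here to rule out integer overflow.

```cpp
// Hypothetical helper: parses a hole of the form {n}, where n is one or more
// decimal digits, starting at `s`. On success, stores the index in *out_index
// and returns a pointer just past the closing brace. On failure, returns
// nullptr and leaves *out_index untouched.
static const char* parse_braced_index(const char* s, int* out_index)
{
    if (*s != '{')
        return nullptr;
    ++s;
    if (*s < '0' || *s > '9')
        return nullptr; // at least one digit is required, so {} is rejected
    int index = 0;
    while (*s >= '0' && *s <= '9') {
        if (index > 1000)
            return nullptr; // arbitrary cap, keeps the multiplication from overflowing
        index = index * 10 + (*s - '0');
        ++s;
    }
    if (*s != '}')
        return nullptr; // unterminated hole or stray character
    *out_index = index;
    return s + 1;
}
```

Compared to the single comparison needed for `%n`, every hole now costs a loop, a multiplication per digit, and a check for the closing brace.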

If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-based by tweaking the digit parsing code in `format_untyped`.
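
One way that tweak could look is sketched below, pulled out into a standalone function so it can be read in isolation. `hole_index_1_based` is a hypothetical name; in the library itself, the equivalent change would go into the digit check inside `format_untyped`.

```cpp
// Hypothetical helper: maps the character following a % to a 0-based argument
// index, using 1-based numbering in the format string (%1 is the first
// argument). Returns -1 when the character does not name an available argument.
static int hole_index_1_based(char digit, int nargs)
{
    if (digit < '1' || digit > '9')
        return -1; // %0 is no longer a hole
    int index = digit - '1';
    return index < nargs ? index : -1;
}
```

Note that 1-based numbering with a single digit leaves only `%1` through `%9`, so the argument limit enforced by the `static_assert` in `format` would need to drop from 10 to 9.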


### Code size

The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.

This is a good thing for embedded use cases.
You should prefer this implementation over the previous one for that.

I couldn't get clang to place references to `write_value` directly into the function table, though.
There's always an intermediary function generated from the instantiation of `write_value_erased`.

This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased<char [6]>`, `write_value_erased<char [7]>`, etc.

I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard.
You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it.

It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either.


### C version

It feels to me like with a bit of elbow grease, this library could be adapted to plain old C as well.

You would have to replace the template bits with macros and `...` / `va_list` though, and I'm not entirely sure how to resolve types into functions---feels like C11 `_Generic` could be of help.

The worst part would probably be emulating the parameter packs, because the preprocessor doesn't make it easy to transform the individual elements in a `__VA_ARGS__` list---but it [seems possible](https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms), if a bit horrid.

Maybe in another post.