Compare commits: 31e99f3137...530c137223 (2 commits)

Commits: 530c137223, ab4d77780e

2 changed files with 357 additions and 1 deletions
							
								
								
									
content/fmt2.dj (Normal file, 350 additions)
									
								
							
							
						
						
									
										350
									
								
								content/fmt2.dj
									
										
									
									
									
										Normal file
									
								
							| 
						 | 
				
			
			@ -0,0 +1,350 @@
 | 
			
		|||
title = "A simple string formatting library, round two"
 | 
			
		||||
 | 
			
		||||
+++
 | 
			
		||||
 | 
			
		||||
_This is a draft post.
 | 
			
		||||
Please don't share it around just yet!_
 | 
			
		||||
 | 
			
		||||
---
 | 
			
		||||
 | 
			
		||||
After implementing [a simple string formatting library in 65 lines of code][page:fmt.dj], I thought I was done.
 | 
			
		||||
The API was nice, it did what I needed it to do, so: _project complete_.
 | 
			
		||||
 | 
			
		||||
One piece of functionality that's commonly available in more complete libraries was missing though, and that was _positional arguments_.
 | 
			
		||||
 | 
			
		||||
I got a [comment on Lobsters](https://lobste.rs/c/bbwxbz) about it at the time, though I waved it away thinking I wouldn't need positional arguments anyways.
 | 
			
		||||
So far, I've been right---there still haven't been any cases in my game where I _needed_ positional arguments---but without them, the library feels a bit... incomplete.
 | 
			
		||||
 | 
			
		||||
So I started thinking about how I could improve on that.
 | 
			
		||||
I went back and forth trying to come up with something sensible, but nothing _simple_ ever came to mind.\
 | 
			
		||||
Until today.
 | 
			
		||||
 | 
			
		||||
This write-up describes this alternative version of the library in detail.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
## Usage
 | 
			
		||||
 | 
			
		||||
Usage is very similar to the original library's.
 | 
			
		||||
The main difference comes from the format string: instead of using `{}` for holes, it uses `%n`, where `n` is a digit between `0` and `9`.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
fmt::format(buf, "Hello, %0!", "world");
 | 
			
		||||
assert(strcmp(str, "Hello, world!") == 0);
 | 
			
		||||
 | 
			
		||||
fmt::format(buf, "[%0] [%1] %2", "main", "info", "Hewwo :3");
 | 
			
		||||
assert(strcmp(str, "[main] [info] Hewwo :3") == 0);
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
`{}` no longer needs escaping, but the `%0` pattern does---and it's done in a manner similar to that of libc `printf`.
 | 
			
		||||
To print a single `%` character followed by a digit, double the `%`, like so:
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
fmt::format(buf, "%%0");
 | 
			
		||||
assert(strcmp(str, "%0") == 0);
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Any other patterns are interpreted verbatim, and do not need escaping---though you may choose to double the `%`, for consistency with cases where it does need escaping.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
fmt::format(buf, "%0% complete", "3.14");
 | 
			
		||||
assert(strcmp(str, "3.14% complete") == 0);
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The big advantage of this library comes from the fact that arguments can be reordered or repeated.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
fmt::format(
 | 
			
		||||
	buf,
 | 
			
		||||
	"Words of the day: '%2', '%1', '%0'. You chose '%1'.",
 | 
			
		||||
	"matrix", "crystal", "rivulet");
 | 
			
		||||
assert(strcmp(
 | 
			
		||||
	str,
	"Words of the day: 'rivulet', 'crystal', 'matrix'. You chose 'crystal'."
 | 
			
		||||
) == 0);
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Other than that, the same invariants are upheld as in the previous library: the library never writes past the end of the buffer it is given, and never reads arguments out of bounds.
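
To make the first invariant concrete, here is a small sketch that reuses `String_Buffer` and `write` exactly as they appear in the walkthrough, and feeds them more text than an 8-byte buffer can hold. `demo_truncation` is a made-up name for the example, and the sentinel byte is only there to show that nothing is written past the capacity.

```cpp
#include <cassert>
#include <cstring>

struct String_Buffer
{
	char* str;
	int cap;
	int len = 0;
};

// Same clamping logic as the library's `write`, repeated so the sketch stands alone.
static void write(String_Buffer& buf, const char* str, int len)
{
	int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
	int write_len = len > remaining_cap ? remaining_cap : len;
	if (write_len > 0)
		std::memcpy(buf.str + buf.len, str, write_len);
	buf.len += len;
}

// Hypothetical demo: `storage` must hold at least 9 zero-initialised bytes.
// Only the first 8 are handed to the buffer; storage[8] acts as a sentinel.
int demo_truncation(char* storage)
{
	storage[8] = '#';
	String_Buffer buf;
	buf.str = storage;
	buf.cap = 8;
	write(buf, "Hello, ", 7); // fits: 7 characters plus the reserved NUL byte
	write(buf, "world!", 6);  // no room left, so nothing is copied
	assert(storage[8] == '#'); // the byte past the buffer is untouched
	return buf.len;            // keeps counting, much like snprintf's return value
}
```

Calling `demo_truncation` on a zeroed 9-byte array leaves `"Hello, "` in it and returns 13, the length the full text would have needed.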
 | 
			
		||||
 | 
			
		||||
As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`).
 | 
			
		||||
This library does the same thing, though using the new format string syntax.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
fmt::format(buf, "%0% complete" /* no arguments */);
 | 
			
		||||
assert(strcmp(str, "%0% complete") == 0);
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
## Implementation walkthrough
 | 
			
		||||
 | 
			
		||||
The library starts off with the same boilerplate as [the first one][page:fmt.dj].
 | 
			
		||||
If you haven't read the original write-up, I would strongly recommend doing that right now.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
#include <cstring>
 | 
			
		||||
 | 
			
		||||
struct String_Buffer
 | 
			
		||||
{
 | 
			
		||||
	char* str;
 | 
			
		||||
	int cap;
 | 
			
		||||
	int len = 0;
 | 
			
		||||
};
 | 
			
		||||
 | 
			
		||||
static void write(String_Buffer& buf, const char* str, int len)
 | 
			
		||||
{
 | 
			
		||||
	int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
 | 
			
		||||
	int write_len = len > remaining_cap ? remaining_cap : len;
 | 
			
		||||
	if (write_len > 0)
 | 
			
		||||
		memcpy(buf.str + buf.len, str, write_len);
 | 
			
		||||
	buf.len += len;
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The signatures for functions writing values into the output string stay the same.
 | 
			
		||||
This library uses function overloading to resolve which function a value should be printed with, based on the value's type---just like the first library did.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
void write_value(String_Buffer& buf, const char* value)
 | 
			
		||||
{
 | 
			
		||||
	write(buf, value, strlen(value));
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
// You may choose to add more overloads here.
 | 
			
		||||
```
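
As an example of such an overload, here is a sketch for `int`. It leans on `snprintf` for the digit conversion, which is my assumption for brevity and not necessarily how the extras from the previous post do it; `demo_overloads` is likewise a made-up name.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

struct String_Buffer
{
	char* str;
	int cap;
	int len = 0;
};

// Same as the library's `write`, repeated so the sketch stands alone.
static void write(String_Buffer& buf, const char* str, int len)
{
	int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
	int write_len = len > remaining_cap ? remaining_cap : len;
	if (write_len > 0)
		std::memcpy(buf.str + buf.len, str, write_len);
	buf.len += len;
}

void write_value(String_Buffer& buf, const char* value)
{
	write(buf, value, (int)std::strlen(value));
}

// Assumed extra overload: formats the integer into a scratch array first.
void write_value(String_Buffer& buf, int value)
{
	char digits[16]; // enough for a 32-bit int, its sign, and the NUL
	int len = std::snprintf(digits, sizeof digits, "%d", value);
	assert(len > 0 && len < (int)sizeof digits);
	write(buf, digits, len);
}

// Overload resolution picks the function matching each argument's type.
int demo_overloads(char* out, int cap)
{
	String_Buffer buf;
	buf.str = out;
	buf.cap = cap;
	write_value(buf, "x = ");
	write_value(buf, -42);
	return buf.len;
}
```

Because `format` only ever calls `write_value` through overload resolution, adding a type this way requires no changes to the rest of the library.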
 | 
			
		||||
 | 
			
		||||
The difference comes from how parsing the string and writing out format arguments is driven.
 | 
			
		||||
 | 
			
		||||
Previously, parsing the string was driven by an expansion of a parameter pack.
 | 
			
		||||
It allowed the library to remain compact and simple, but it's what prevented reordering or repeating of arguments.
 | 
			
		||||
There is no way to reorder the expansion of a parameter pack (a compile-time operation) using run-time values.
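
A tiny sketch of that limitation, using made-up names (`expand_in_order`, `pick`) and plain `int` arguments: the pack always expands first to last, whereas an array built from the pack can be indexed with whatever value shows up at run time.

```cpp
#include <cassert>

// The pack expands in declaration order; no run-time value can change that.
// Returns the number of values written to `out`.
template<typename... Args>
int expand_in_order(int* out, const Args&... args)
{
	int n = 0;
	// Brace-enclosed initialisers are evaluated left to right.
	int unused[] = {0, (out[n++] = args, 0)...};
	(void)unused;
	return n;
}

// Copying the pack into an array first makes run-time indexing possible.
// Requires at least one argument and a valid index.
template<typename... Args>
int pick(int index, const Args&... args)
{
	const int values[] = {args...};
	assert(index >= 0 && index < (int)sizeof...(args));
	return values[index];
}
```

`pick(2, 10, 20, 30)` yields 30 and `pick(0, 10, 20, 30)` yields 10, with the index free to come from a parsed format string.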
 | 
			
		||||
 | 
			
		||||
This time around, arguments are passed into the library through an array of *type-erased pointers to the values*, as well as *functions that format the values behind those pointers.*
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);
 | 
			
		||||
 | 
			
		||||
void format_untyped(
 | 
			
		||||
	String_Buffer& buf,
 | 
			
		||||
	const char* fstr,
 | 
			
		||||
	int nargs,
 | 
			
		||||
	const void* const* values,
 | 
			
		||||
	Format_Function_Untyped* const* ffuncs);
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
`format` then constructs those arrays in a type-safe manner with the help of a function template, which _erases_ its argument's type into `const void*`---allowing its instantiations to be stuffed into an array of function pointers of the same type.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
template<typename T>
 | 
			
		||||
void write_value_erased(String_Buffer& buf, const void* value)
 | 
			
		||||
{
 | 
			
		||||
	write_value(buf, *(const T*)value);
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
template<typename... Args>
 | 
			
		||||
void format(String_Buffer& buf, const char* fstr, const Args&... args)
 | 
			
		||||
{
 | 
			
		||||
	static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
 | 
			
		||||
	const void* const values[] = {&args..., nullptr}; // trailing nullptr: never a zero-length array
	static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>..., nullptr};
 | 
			
		||||
	format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
This trick with a template erasing `T*` into `void*` is actually really useful in general for writing polymorphic code.
 | 
			
		||||
If there's any knowledge worth remembering from this article, it would be this technique.
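
To show the technique outside of string formatting, here is a minimal sketch with entirely made-up types and names (`Counter`, `Accumulator`, `poke`, `Erased`): two unrelated types end up in one array, each paired with a function that remembers how to cast the pointer back.

```cpp
#include <cassert>

struct Counter { int count = 0; };
struct Accumulator { int total = 0; };

void poke(Counter& c) { ++c.count; }
void poke(Accumulator& a) { a.total += 10; }

// One function type shared by every instantiation below.
using Poke_Function_Untyped = void(void* object);

template<typename T>
void poke_erased(void* object)
{
	poke(*(T*)object); // recover the type that was erased at the call site
}

struct Erased
{
	void* object;
	Poke_Function_Untyped* poke;
};

template<typename T>
Erased erase(T& object)
{
	return {&object, &poke_erased<T>};
}

int demo_erasure()
{
	Counter c;
	Accumulator a;
	Erased things[] = {erase(c), erase(a), erase(c)};
	for (const Erased& thing : things)
		thing.poke(thing.object);
	assert(c.count == 2);
	return c.count * 100 + a.total;
}
```

The pointer and the function always travel together, which is what keeps the cast inside `poke_erased` safe.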
 | 
			
		||||
 | 
			
		||||
Aside from that, we once again make use of parameter packs.
 | 
			
		||||
This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers).
 | 
			
		||||
 | 
			
		||||
Among the `const` soup, you may notice the `ffuncs` array being `static const`.
 | 
			
		||||
This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack.
 | 
			
		||||
 | 
			
		||||
Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
void format_untyped(
	String_Buffer& buf,
	const char* fstr,
	int nargs,
	const void* const* values,
	Format_Function_Untyped* const* ffuncs)
{
	const char* start = fstr;
	while (*fstr != 0) {
		if (*fstr == '%') {
			write(buf, start, fstr - start); // flush the verbatim text before the `%`
			++fstr;
			if (*fstr == '%') {
				start = fstr; // `%%` escape: the second `%` opens the next verbatim run
				++fstr;
			} else if (*fstr >= '0' && *fstr <= '9' && *fstr - '0' < nargs) {
				int index = *fstr - '0';
				(ffuncs[index])(buf, values[index]);
				++fstr;
				start = fstr;
			} else {
				start = fstr - 1; // not a usable hole: keep the `%` verbatim
			}
		} else {
			++fstr;
		}
	}
	write(buf, start, fstr - start);
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The full code listing, split into a header and an implementation file, and packed into a namespace, is available below.
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
#pragma once
 | 
			
		||||
 | 
			
		||||
struct String_Buffer
 | 
			
		||||
{
 | 
			
		||||
	char* str;
 | 
			
		||||
	int cap;
 | 
			
		||||
	int len = 0;
 | 
			
		||||
};
 | 
			
		||||
 | 
			
		||||
namespace fmt {
 | 
			
		||||
 | 
			
		||||
void write_value(String_Buffer& buf, const char* value);
 | 
			
		||||
 | 
			
		||||
using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr);
 | 
			
		||||
 | 
			
		||||
void format_untyped(
 | 
			
		||||
	String_Buffer& buf,
 | 
			
		||||
	const char* fstr,
 | 
			
		||||
	int nargs,
 | 
			
		||||
	const void* const* values,
 | 
			
		||||
	Format_Function_Untyped* const* ffuncs);
 | 
			
		||||
 | 
			
		||||
template<typename T>
 | 
			
		||||
void write_value_erased(String_Buffer& buf, const void* value)
 | 
			
		||||
{
 | 
			
		||||
	write_value(buf, *(const T*)value);
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
template<typename... Args>
 | 
			
		||||
void format(String_Buffer& buf, const char* fstr, const Args&... args)
 | 
			
		||||
{
 | 
			
		||||
	static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported");
 | 
			
		||||
	const void* const values[] = {&args..., nullptr}; // trailing nullptr: never a zero-length array
	static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>..., nullptr};
 | 
			
		||||
	format_untyped(buf, fstr, sizeof...(args), values, ffuncs);
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
```cpp
 | 
			
		||||
#include "format.hpp"
 | 
			
		||||
 | 
			
		||||
#include <cstring>
 | 
			
		||||
 | 
			
		||||
namespace fmt {
 | 
			
		||||
 | 
			
		||||
static void write(String_Buffer& buf, const char* str, int len)
 | 
			
		||||
{
 | 
			
		||||
	int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL
 | 
			
		||||
	int write_len = len > remaining_cap ? remaining_cap : len;
 | 
			
		||||
	if (write_len > 0)
 | 
			
		||||
		memcpy(buf.str + buf.len, str, write_len);
 | 
			
		||||
	buf.len += len;
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
void write_value(String_Buffer& buf, const char* value)
 | 
			
		||||
{
 | 
			
		||||
	write(buf, value, strlen(value));
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
void format_untyped(
	String_Buffer& buf,
	const char* fstr,
	int nargs,
	const void* const* values,
	Format_Function_Untyped* const* ffuncs)
{
	const char* start = fstr;
	while (*fstr != '\0') {
		if (*fstr == '%') {
			write(buf, start, fstr - start); // flush the verbatim text before the `%`
			++fstr;
			if (*fstr == '%') {
				start = fstr; // `%%` escape: the second `%` opens the next verbatim run
				++fstr;
			} else if (*fstr >= '0' && *fstr <= '9' && *fstr - '0' < nargs) {
				int index = *fstr - '0';
				(ffuncs[index])(buf, values[index]);
				++fstr;
				start = fstr;
			} else {
				start = fstr - 1; // not a usable hole: keep the `%` verbatim
			}
		} else {
			++fstr;
		}
	}
	write(buf, start, fstr - start);
}
 | 
			
		||||
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The same [extra goodies][page:fmt#Extras] as in the previous post can be used (implementations of `write_value` for various types, functions improving ergonomics).
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
## Remarks
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### Source code length
 | 
			
		||||
 | 
			
		||||
This library is slightly longer than the previous one, at 73 lines of code.
 | 
			
		||||
I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though.
 | 
			
		||||
 | 
			
		||||
I only really flaunted the 65 LOC figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries.
 | 
			
		||||
 | 
			
		||||
And `printf`.
 | 
			
		||||
Don't forget about `printf`.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### `%` for holes
 | 
			
		||||
 | 
			
		||||
I chose `%n` as the hole syntax this time, because it was about as simple to parse as the previous choice of `{}`.
 | 
			
		||||
If I went with `{n}` instead, I would have to add extra code checking that there's only one digit inside the braces.
 | 
			
		||||
The downside, of course, is that you need to write out the indices manually even when they form a simple increasing sequence (`%0, %1, %2`), as there is no natural syntax for an indexless hole (like there is with `{}`).
 | 
			
		||||
 | 
			
		||||
Also, due to the `%n` syntax only parsing one character, the library is limited to accepting 10 format arguments `%0`--`%9`.
 | 
			
		||||
That should be more than enough for 99.999% of your use cases, though.
 | 
			
		||||
It would be easier to add support for multiple digits with the `{n}` syntax, but the runtime cost would be bigger than supporting only a single digit.
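
To give a rough idea of the extra work involved, here is a sketch of a multi-digit `{n}` hole parser. `parse_braced_index` is a hypothetical helper that the library does not have, and the cap on the index is an arbitrary choice of mine to keep the arithmetic from overflowing.

```cpp
#include <cassert>

// Hypothetical helper: parses `{n}` at `s`, where n is one or more digits.
// On success, stores the index and returns a pointer just past the `}`.
// Returns nullptr if `s` does not start with a well-formed hole.
const char* parse_braced_index(const char* s, int* out_index)
{
	if (*s != '{')
		return nullptr;
	++s;
	if (*s < '0' || *s > '9')
		return nullptr; // `{}` and `{x` are not holes in this sketch
	int index = 0;
	while (*s >= '0' && *s <= '9') {
		if (index >= 1000)
			return nullptr; // more than four digits: reject instead of overflowing
		index = index * 10 + (*s - '0');
		++s;
	}
	if (*s != '}')
		return nullptr;
	*out_index = index;
	return s + 1;
}
```

Compared to the single comparison that `%n` needs, this is a loop plus three failure paths for every hole.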
 | 
			
		||||
 | 
			
		||||
If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-based by tweaking the digit parsing code in `format_untyped`.
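
The tweak could look roughly like the sketch below. `hole_index_1_based` is a made-up helper that isolates the digit arithmetic; in the library itself the same logic would sit inline in `format_untyped`.

```cpp
#include <cassert>

// Maps the character after `%` to an argument index, treating `%1` as the
// first argument. Returns -1 when there is no argument for it, which
// includes `%0` and any non-digit character.
int hole_index_1_based(char digit, int nargs)
{
	if (digit < '1' || digit > '9')
		return -1;
	int index = digit - '1';
	return index < nargs ? index : -1;
}
```

Note that this leaves only `%1` to `%9` as valid holes, so the argument limit drops from 10 to 9.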
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### Code size
 | 
			
		||||
 | 
			
		||||
The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly.
 | 
			
		||||
Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values.
 | 
			
		||||
 | 
			
		||||
This is a good thing for embedded use cases.
 | 
			
		||||
You should prefer this implementation over the previous one for that.
 | 
			
		||||
 | 
			
		||||
I couldn't get clang to inline references to `write_value` into the function table, though.
 | 
			
		||||
There's always an intermediary function generated from the instantiation of `write_value_erased`.
 | 
			
		||||
 | 
			
		||||
This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased<char [6]>`, `write_value_erased<char [7]>`, etc.
 | 
			
		||||
 | 
			
		||||
I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard.
 | 
			
		||||
You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it.
 | 
			
		||||
 | 
			
		||||
It's not an insurmountable task, but skipping it and letting the compiler do any necessary ABI shuffling---at the cost of an extra `jmp` when no shuffling is needed---probably isn't a big performance cost either.
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
### C version
 | 
			
		||||
 | 
			
		||||
It feels to me like with a bit of elbow grease, this library could be adapted to plain old C as well.
 | 
			
		||||
 | 
			
		||||
You would have to replace the template bits with macros and `...` / `va_list` though, and I'm not entirely sure how to resolve types into functions---feels like C11 `_Generic` could be of help.
 | 
			
		||||
 | 
			
		||||
The worst part would probably be emulating the parameter packs, because the preprocessor doesn't make it easy to transform the individual elements in a `__VA_ARGS__` list---but it [seems possible](https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms), if a bit horrid.
 | 
			
		||||
 | 
			
		||||
Maybe in another post.
 | 
			
		||||
 | 
			
		||||
| 
						 | 
				
			
			@ -151,9 +151,15 @@ impl Config {
 | 
			
		|||
     }
 
     pub fn page_url(&self, page: &str) -> String {
+        let (page, hash) = page.split_once('#').unwrap_or((page, ""));
+
         // We don't want .dj appearing in URLs, though it exists as a disambiguator in [page:] links.
         let page = page.strip_suffix(".dj").unwrap_or(page);
-        format!("{}/{}", self.site, page)
+        if !hash.is_empty() {
+            format!("{}/{page}#{hash}", self.site)
+        } else {
+            format!("{}/{page}", self.site)
+        }
     }
 
     pub fn pic_url(&self, pics_dir: &dyn Dir, id: &str) -> String {
 | 
			
		||||
| 
						 | 
				
			
			
 | 
			
		|||