diff --git a/content/fmt2.dj b/content/fmt2.dj deleted file mode 100644 index 5399a67..0000000 --- a/content/fmt2.dj +++ /dev/null @@ -1,350 +0,0 @@ -title = "A simple string formatting library, round two" - -+++ - -_This is a draft post. -Please don't share it around just yet!_ - ---- - -After implementing [a simple string formatting library in 65 lines of code][page:fmt.dj], I thought I was done. -The API was nice, it did what I needed it to do, so: _project complete_. - -One piece of functionality that's commonly available in more complete libraries was missing though, and that was _positional arguments_. - -I got a [comment on Lobsters](https://lobste.rs/c/bbwxbz) about it at the time, though I waved it away thinking I wouldn't need positional arguments anyways. -So far, I've been right---there still haven't been any cases in my game where I _needed_ positional arguments---but without them, the library feels a bit... incomplete. - -So I started thinking about how I could improve on that. -I went back and forth trying to come up with something sensible, but nothing _simple_ was ever coming to mind.\ -Until today. - -This write-up describes this alternative version of the library in detail. - - -## Usage - -This library has usage very similar to the original. -The main difference comes from the format string: instead of using `{}` for holes, it uses `%n`, where `n` is a digit between `0` and `9`. - -```cpp -fmt::format(buf, "Hello, %0!", "world"); -assert(strcmp(str, "Hello, world!") == 0); - -fmt::format(buf, "[%0] [%1] %2", "main", "info", "Hewwo :3"); -assert(strcmp(str, "[main] [info] Hewwo :3") == 0); -``` - -`{}` no longer needs escaping, but the `%0` pattern does---and it's done in a manner similar to that of libc `printf`. 
-To print a single `%` character followed by a digit, double the `%`, like so: - -```cpp -fmt::format(buf, "%%0"); -assert(strcmp(str, "%0") == 0); -``` - -Any other patterns are interpreted verbatim, and do not need escaping---though you may choose to double the `%`, for consistency with cases where it does need escaping. - -```cpp -fmt::format(buf, "%0% complete", "3.14"); -assert(strcmp(str, "3.14% complete") == 0); -``` - -The big advantage of this library comes from the fact that arguments can be reordered or repeated. - -```cpp -fmt::format( - buf, - "Words of the day: '%2', '%1', '%0'. You chose '%1'.", - "matrix", "crystal", "rivulet"); -assert(strcmp( - str, - "Words of the day: 'rivulet', 'crystal', 'matrix'. You chose 'crystal'." -) == 0); -``` - -Other than that, the same invariants are upheld as in the previous library: the library never writes past the input buffer, and never allows reading arguments out of bounds. - -As a reminder, the previous library printed holes without corresponding arguments verbatim (as the string `{}`). -This library does the same thing, though using the new format string syntax. - -```cpp -fmt::format(buf, "%0% complete" /* no arguments */); -assert(strcmp(str, "%0% complete") == 0); -``` - - -## Implementation walkthrough - -The library starts off with the same boilerplate as [the first one][page:fmt.dj]. -If you haven't read the original write-up, I would strongly recommend doing that right now. - -```cpp -#include <string.h> - -struct String_Buffer -{ - char* str; - int cap; - int len = 0; -}; - -static void write(String_Buffer& buf, const char* str, int len) -{ - int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL - int write_len = len > remaining_cap ? remaining_cap : len; - if (write_len > 0) - memcpy(buf.str + buf.len, str, write_len); - buf.len += len; -} -``` - -The signatures for functions writing values into the output string stay the same. 
-This library uses function overloading to resolve which function a value should be printed with, based on the value's type---just like the first library did. - -```cpp -void write_value(String_Buffer& buf, const char* value) -{ - write(buf, value, strlen(value)); -} - -// You may choose to add more overloads here. -``` - -The difference comes from how parsing the string and writing out format arguments is driven. - -Previously, parsing the string was driven by an expansion of a parameter pack. -It allowed the library to remain compact and simple, but it's what prevented reordering or repeating of arguments. -There is no way you can reorder the expansion of a parameter pack (or, a compile-time operation) using run-time values. - -This time around, arguments are passed into the library through an array of *type-erased pointers to the values*, as well as *functions that format the values behind those pointers.* - -```cpp -using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr); - -void format_untyped( - String_Buffer& buf, - const char* fstr, - int nargs, - const void* const* values, - Format_Function_Untyped* const* ffuncs); -``` - -`format` then constructs those arrays in a type-safe manner, helping itself with a template function, which _erases_ its argument's type into `const void*`---allowing its instantiations to be stuffed into an array of function pointers of the same type. - -```cpp -template<typename T> -void write_value_erased(String_Buffer& buf, const void* value) -{ - write_value(buf, *(const T*)value); -} - -template<typename... Args> -void format(String_Buffer& buf, const char* fstr, const Args&... 
args) -{ - static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported"); - const void* const values[] = {&args...}; - static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...}; - format_untyped(buf, fstr, sizeof...(args), values, ffuncs); -} -``` - -This trick with a template erasing `T*` into `void*` is actually really useful in general for writing polymorphic code. -If there's any knowledge worth remembering from this article, it would be this technique. - -Aside from that, we once again make use of parameter packs. -This time not with [fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), but with an ordinary [expansion inside a brace-enclosed initialiser](https://en.cppreference.com/w/cpp/language/parameter_pack.html#Brace-enclosed_initializers). - -Among the `const` soup, you may notice the `ffuncs` array being `static const`. -This little trick reduces code size a bit, because it will make the compiler generate a lookup table in the executable's `.rodata` section instead of copying the function pointers onto the stack. - -Finally, we get to `format_untyped`, which parses the format string, writing out the verbatim parts, and calling the appropriate format function whenever a hole is encountered. - -```cpp -void format_untyped( - String_Buffer& buf, - const char* fstr, - int nargs, - const void* const* values, - Format_Function_Untyped* const* ffuncs) -{ - const char* start = fstr; - while (*fstr != 0) { - if (*fstr == '%') { - write(buf, start, fstr - start); - ++fstr; - start = fstr - 1; // by default, include the % (and any %n without an argument) verbatim - if (*fstr >= '0' && *fstr <= '9') { - int index = *fstr - '0'; - if (index < nargs) { - (ffuncs[index])(buf, values[index]); - ++fstr; - start = fstr; - } - } else if (*fstr == '%') { - start = fstr++; // %% collapses into a single % - } - } else { - ++fstr; - } - } - write(buf, start, fstr - start); -} -``` - -The full code listing, split into a header and an implementation file, and packed into a namespace, is available below. 
- -```cpp -#pragma once - -struct String_Buffer -{ - char* str; - int cap; - int len = 0; -}; - -namespace fmt { - -void write_value(String_Buffer& buf, const char* value); - -using Format_Function_Untyped = void(String_Buffer& buf, const void* value_ptr); - -void format_untyped( - String_Buffer& buf, - const char* fstr, - int nargs, - const void* const* values, - Format_Function_Untyped* const* ffuncs); - -template<typename T> -void write_value_erased(String_Buffer& buf, const void* value) -{ - write_value(buf, *(const T*)value); -} - -template<typename... Args> -void format(String_Buffer& buf, const char* fstr, const Args&... args) -{ - static_assert(sizeof...(args) <= 10, "a maximum of 10 arguments is supported"); - const void* values[] = {&args...}; - static Format_Function_Untyped* const ffuncs[] = {&write_value_erased<Args>...}; - format_untyped(buf, fstr, sizeof...(args), values, ffuncs); -} - -} -``` - -```cpp -#include "format.hpp" - -#include <string.h> - -namespace fmt { - -static void write(String_Buffer& buf, const char* str, int len) -{ - int remaining_cap = buf.cap - buf.len - 1; // leave one byte for NUL - int write_len = len > remaining_cap ? 
remaining_cap : len; - if (write_len > 0) - memcpy(buf.str + buf.len, str, write_len); - buf.len += len; -} - -void write_value(String_Buffer& buf, const char* value) -{ - write(buf, value, strlen(value)); -} - -void format_untyped( - String_Buffer& buf, - const char* fstr, - int nargs, - const void* const* values, - Format_Function_Untyped* const* ffuncs) -{ - const char* start = fstr; - while (*fstr != '\0') { - if (*fstr == '%') { - write(buf, start, fstr - start); - ++fstr; - start = fstr - 1; // by default, include the % (and any %n without an argument) verbatim - if (*fstr >= '0' && *fstr <= '9') { - int index = *fstr - '0'; - if (index < nargs) { - (ffuncs[index])(buf, values[index]); - ++fstr; - start = fstr; - } - } else if (*fstr == '%') { - start = fstr++; // %% collapses into a single % - } - } else { - ++fstr; - } - } - write(buf, start, fstr - start); -} - -} -``` - -The same [extra goodies][page:fmt#Extras] as in the previous post can be used (implementations of `write_value` for various types, functions improving ergonomics). - - -## Remarks - - -### Source code length - -This library is slightly larger than the previous, being 73 lines of code long. -I think the extra bit of functionality is useful enough that it's a worthy tradeoff, though. - -I only really flaunted the 65 loc figure in the title of the last post to poke fun at the complexity of popular template-heavy string formatting libraries. - -And `printf`. -Don't forget about `printf`. - - -### `%` for holes - -I chose `%n` as the hole syntax this time, because it was about as simple to parse as the previous choice of `{}`. -If I went with `{n}` instead, I would have to add extra code checking that there's only one digit inside the braces. -The downside of course is that you need to write out the indices manually in case they're a monotonic sequence (`%1, %2, %3`), as there is no natural syntax for an indexless hole (like there is with `{}`). - -Also, due to the `%n` syntax only parsing one character, the library is limited to accepting 10 format arguments `%0`--`%9`. 
-That should be more than enough for 99.999% of your use cases, though. -It would be easier to add support for multiple digits with the `{n}` syntax, but the runtime cost would be bigger than supporting only a single digit. - -If you find the 0-based indexing unnatural, it's easy enough to switch it to 1-based by tweaking the digit parsing code in `format_untyped`. - - -### Code size - -The assembly for the formatting code comes out a lot more compact, because the compiler no longer has to generate potentially very long and repetitive code for calling `next_hole` and `write_value` repeatedly. -Instead, it initialises the lookup table with value pointers on the stack, and passes it to `format_untyped`, along with a static table containing pointers to functions which can print the values. - -This is a good thing for embedded use cases. -You should prefer this implementation over the previous one for that. - -I couldn't get clang to inline references to `write_value` into the function table, though. -There's always an intermediary function generated from the instantiation of `write_value_erased`. - -This can get pretty bad when passing array types into format arguments (e.g. string literals)---generating many redundant variants of the function, like `write_value_erased`, `write_value_erased`, etc. - -I believe this is due to function pointers of different types not always being interchangeable according to the C++ standard. -You could work around this little inefficiency by casting between function pointers, and it _does_ [seem](https://stackoverflow.com/a/559671) [safe](https://stackoverflow.com/q/11647220) to me in this case (the function pointers have the same return type, the same number of arguments, and the types of arguments `const void*` and `const char*` are compatible)---but I don't really think it's worth it. 
- -It's not an insurmountable task, but _not_ doing it and letting the compiler do all the necessary ABI shuffling---at the expense of an extra `jmp` in case it's not needed---is probably not a big performance cost, either. - - -### C version - -It feels to me like with a bit of elbow grease, this library could be adapted to plain old C as well. - -You would have to replace the template bits with macros and `...` / `va_list` though, and I'm not entirely sure how to resolve types into functions---feels like C11 `_Generic` could be of help. - -The worst part would probably be emulating the parameter packs, because the preprocessor doesn't make it easy to transform the individual elements in a `__VA_ARGS__` list---but it [seems possible](https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms), if a bit horrid. - -Maybe in another post. - diff --git a/src/config.rs b/src/config.rs index 2c7baf7..a7f2479 100644 --- a/src/config.rs +++ b/src/config.rs @@ -151,15 +151,9 @@ impl Config { } pub fn page_url(&self, page: &str) -> String { - let (page, hash) = page.split_once('#').unwrap_or((page, "")); - // We don't want .dj appearing in URLs, though it exists as a disambiguator in [page:] links. let page = page.strip_suffix(".dj").unwrap_or(page); - if !hash.is_empty() { - format!("{}/{page}#{hash}", self.site) - } else { - format!("{}/{page}", self.site) - } + format!("{}/{}", self.site, page) } pub fn pic_url(&self, pics_dir: &dyn Dir, id: &str) -> String {