cpp26-reflection · part 11

One codegen, many wire formats


Through posts 8–10 we grew reflect_json: a serializer driven by reflection and annotations. Its output code was specific to JSON: it knew about braces, quotes, and commas. That is not fundamental. The reflection walk (“visit each non-skipped member, apply the annotation-derived key, emit the value”) is the structural part. What braces look like is a lexical decision that can be swapped.

If we factor out the syntax and leave the walk in place, the same library serializes to JSON, YAML, XML, TOML, MessagePack, or anything else you can describe as key-value over structural recursion.

Same User struct, three outputs:

--- JSON ---
{"userName":"filip","id":42,"email":"filip@example.com","isAdmin":true,
 "homeAddress":{"city":"Warsaw","postal_code":12345}}

--- YAML ---
userName: filip
id: 42
email: filip@example.com
isAdmin: true
homeAddress: 
  city: Warsaw
  postal_code: 12345

--- XML ---
<userName>filip</userName><id>42</id><email>filip@example.com</email>
<isAdmin>true</isAdmin>
<homeAddress><city>Warsaw</city><postal_code>12345</postal_code></homeAddress>

The factoring

Identify what changes between formats. For JSON / YAML / XML / TOML / MessagePack, it’s a small list:

  • How an object opens and closes ({} vs. indentation vs. <tag> vs. nothing vs. length-prefix byte).
  • How fields are separated (, vs. newline vs. nothing).
  • How a key is rendered ("k": vs. k: vs. <k> vs. k = vs. length-prefixed bytes).
  • How a leaf value is rendered (quoted-escaped vs. bare vs. tagged vs. typed-binary).
  • Whether there’s a post-field hook (XML needs to close </k>).

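Eight methods cover that list: begin_object, end_object, field_separator, end_field, emit_key, and one emitter each for strings, booleans, and numbers. Collect them into a struct, a format policy, and pass its type as the first template parameter to the serializer.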
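The contract can also be checked at compile time. Below is a minimal sketch using the C++17 detection idiom; the trait name is_text_format and the two toy policies are hypothetical and not part of the library. It only verifies that each hook is callable with a std::string buffer.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Primary template: F does not satisfy the contract.
template <typename F, typename = void>
struct is_text_format : std::false_type {};

// Specialization chosen only when every hook is callable on a std::string buffer.
template <typename F>
struct is_text_format<F, std::void_t<
    decltype(std::declval<F&>().begin_object(std::declval<std::string&>())),
    decltype(std::declval<F&>().end_object(std::declval<std::string&>())),
    decltype(std::declval<F&>().field_separator(std::declval<std::string&>())),
    decltype(std::declval<F&>().end_field(std::declval<std::string&>())),
    decltype(std::declval<F&>().emit_key(std::declval<std::string&>(), std::string_view{})),
    decltype(std::declval<F&>().emit_string(std::declval<std::string&>(), std::string_view{})),
    decltype(std::declval<F&>().emit_bool(std::declval<std::string&>(), true)),
    decltype(std::declval<F&>().emit_number(std::declval<std::string&>(), 0))
>> : std::true_type {};

// A policy that provides every hook and emits nothing.
struct null_format {
    void begin_object(std::string&)    {}
    void end_object(std::string&)      {}
    void field_separator(std::string&) {}
    void end_field(std::string&)       {}
    void emit_key(std::string&, std::string_view)    {}
    void emit_string(std::string&, std::string_view) {}
    void emit_bool(std::string&, bool)               {}
    template <typename N> void emit_number(std::string&, N) {}
};

// A policy that forgot most of the contract.
struct broken_format {
    void begin_object(std::string&) {}
};

static_assert(is_text_format<null_format>::value, "null_format provides every hook");
static_assert(!is_text_format<broken_format>::value, "broken_format is rejected");
```

A static_assert on this trait at the top of the serializer would turn a missing hook into one readable error instead of a failure deep inside the walk.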

Three format policies

#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// JSON: stateless. Braces, commas, quoted keys.
struct json_format {
    void begin_object(std::string& o)    { o += '{'; }
    void end_object(std::string& o)      { o += '}'; }
    void field_separator(std::string& o) { o += ','; }
    void end_field(std::string&)         {}
    // No escaping yet: assumes keys and values contain no quotes,
    // backslashes, or control characters.
    void emit_key(std::string& o, std::string_view k) {
        o += '"'; o += k; o += "\":";
    }
    void emit_string(std::string& o, std::string_view v) { o += '"'; o += v; o += '"'; }
    void emit_bool  (std::string& o, bool v)             { o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { o += std::to_string(v); }
};

struct yaml_format {
    int depth = 0;
    void begin_object(std::string& o) {
        if (depth > 0) o += '\n';
        ++depth;
    }
    void end_object(std::string&)        { --depth; }
    void field_separator(std::string& o) { o += '\n'; }
    void end_field(std::string&)         {}
    void emit_key(std::string& o, std::string_view k) {
        for (int i = 0; i < depth - 1; ++i) o += "  ";
        o += k; o += ": ";
    }
    // Bare scalar: assumes v needs no YAML quoting (no ": ", no "#", not a keyword such as true).
    void emit_string(std::string& o, std::string_view v) { o += v; }
    void emit_bool  (std::string& o, bool v)             { o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { o += std::to_string(v); }
};

struct xml_format {
    std::vector<std::string_view> tags;
    void begin_object(std::string&)      {}
    void end_object(std::string&)        {}
    void field_separator(std::string&)   {}
    void end_field(std::string& o) {
        o += "</"; o += tags.back(); o += '>';
        tags.pop_back();
    }
    void emit_key(std::string& o, std::string_view k) {
        tags.push_back(k);
        o += '<'; o += k; o += '>';
    }
    // No entity escaping yet: assumes v contains no <, >, or &.
    void emit_string(std::string& o, std::string_view v) { o += v; }
    void emit_bool  (std::string& o, bool v)             { o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { o += std::to_string(v); }
};

Each is stateful where needed (YAML has depth, XML has a tag stack) and stateless where not (JSON). The contract is simple; compliance is easy.
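One thing the policies above skip is escaping: json_format::emit_string appends the value verbatim, so a quote inside a string would produce invalid JSON. Here is a minimal sketch of the kind of helper a policy could delegate to. The name append_json_string is hypothetical; it covers the escapes JSON requires (quote, backslash, control characters) and passes other bytes through unchanged.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: appends v to out as a quoted, escaped JSON string.
inline void append_json_string(std::string& out, std::string_view v) {
    static constexpr char hex[] = "0123456789abcdef";
    out += '"';
    for (unsigned char c : v) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {               // remaining control characters
                    out += "\\u00";
                    out += hex[c >> 4];
                    out += hex[c & 0x0F];
                } else {                      // printable ASCII and UTF-8 bytes
                    out += static_cast<char>(c);
                }
        }
    }
    out += '"';
}
```

With this in place, emit_string and emit_key in json_format shrink to a single call each. XML and YAML would need their own equivalents (entity escaping and scalar quoting, respectively).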

The format-agnostic core

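#include <meta>         // std::meta reflection facilities (C++26)
#include <type_traits>

// Forward declaration: emit_value recurses into serialize_to for nested structs.
template <typename F, typename T>
void serialize_to(F& fmt, std::string& out, T const& obj);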

template <typename F, typename V>
void emit_value(F& fmt, std::string& out, V const& v) {
    if constexpr (std::is_same_v<V, bool>)                       fmt.emit_bool(out, v);
    else if constexpr (std::is_arithmetic_v<V>)                  fmt.emit_number(out, v);
    else if constexpr (std::is_convertible_v<V, std::string_view>) fmt.emit_string(out, v);
    else                                                          serialize_to(fmt, out, v);
}

template <typename F, typename T>
void serialize_to(F& fmt, std::string& out, T const& obj) {
    fmt.begin_object(out);
    bool first = true;
    constexpr auto ctx = std::meta::access_context::unchecked();
    template for (constexpr auto m
                  : std::define_static_array(
                      std::meta::nonstatic_data_members_of(^^T, ctx))) {
        if constexpr (!std::meta::annotation_of_type<skip>(m).has_value()) {
            if (!first) fmt.field_separator(out);
            first = false;
            fmt.emit_key(out, key_of<m>());
            emit_value(fmt, out, obj.[: m :]);
            fmt.end_field(out);
        }
    }
    fmt.end_object(out);
}

This is the same template for loop from post 8 and the same key_of<m>() from post 9. The format is the new axis.
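To watch the policy hooks fire without a reflection-capable compiler, the walk can be written out by hand. The sketch below is a stand-in, not the library: the User and Address types are trimmed-down assumptions (no email, no skipped member), and field() and render() are hypothetical helpers. The per-member call sequence mirrors serialize_to above.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Two policies from this post, repeated so the sketch compiles on its own.
struct json_format {
    void begin_object(std::string& o)    { o += '{'; }
    void end_object(std::string& o)      { o += '}'; }
    void field_separator(std::string& o) { o += ','; }
    void end_field(std::string&)         {}
    void emit_key(std::string& o, std::string_view k) { o += '"'; o += k; o += "\":"; }
    void emit_string(std::string& o, std::string_view v) { o += '"'; o += v; o += '"'; }
    void emit_bool(std::string& o, bool v) { o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { o += std::to_string(v); }
};

struct xml_format {
    std::vector<std::string_view> tags;
    void begin_object(std::string&)    {}
    void end_object(std::string&)      {}
    void field_separator(std::string&) {}
    void end_field(std::string& o) { o += "</"; o += tags.back(); o += '>'; tags.pop_back(); }
    void emit_key(std::string& o, std::string_view k) { tags.push_back(k); o += '<'; o += k; o += '>'; }
    void emit_string(std::string& o, std::string_view v) { o += v; }
    void emit_bool(std::string& o, bool v) { o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { o += std::to_string(v); }
};

// Trimmed-down stand-ins for the series' types (assumed member names).
struct Address { std::string city; int postal_code; };
struct User    { std::string userName; int id; bool isAdmin; Address homeAddress; };

template <typename F, typename V>
void emit_value(F& fmt, std::string& out, V const& v);

// One step of the walk: separator, key, value, post-field hook.
template <typename F, typename V>
void field(F& fmt, std::string& out, bool& first, std::string_view key, V const& v) {
    if (!first) fmt.field_separator(out);
    first = false;
    fmt.emit_key(out, key);
    emit_value(fmt, out, v);
    fmt.end_field(out);
}

// Hand-written member lists replace the reflected loop.
template <typename F>
void serialize_to(F& fmt, std::string& out, Address const& a) {
    fmt.begin_object(out);
    bool first = true;
    field(fmt, out, first, "city", a.city);
    field(fmt, out, first, "postal_code", a.postal_code);
    fmt.end_object(out);
}

template <typename F>
void serialize_to(F& fmt, std::string& out, User const& u) {
    fmt.begin_object(out);
    bool first = true;
    field(fmt, out, first, "userName", u.userName);
    field(fmt, out, first, "id", u.id);
    field(fmt, out, first, "isAdmin", u.isAdmin);
    field(fmt, out, first, "homeAddress", u.homeAddress);
    fmt.end_object(out);
}

// Same leaf dispatch as the engine above.
template <typename F, typename V>
void emit_value(F& fmt, std::string& out, V const& v) {
    if constexpr (std::is_same_v<V, bool>)                           fmt.emit_bool(out, v);
    else if constexpr (std::is_arithmetic_v<V>)                      fmt.emit_number(out, v);
    else if constexpr (std::is_convertible_v<V, std::string_view>)   fmt.emit_string(out, v);
    else                                                             serialize_to(fmt, out, v);
}

// Convenience wrapper: one fresh policy instance per call.
template <typename F, typename T>
std::string render(T const& obj) {
    F fmt;
    std::string out;
    serialize_to(fmt, out, obj);
    return out;
}
```

Calling render<json_format>(u) and render<xml_format>(u) on the same object runs identical walk code. Only the policy type differs.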

TOML in 20 lines

Adding a new format is one small struct:

struct toml_format {
    std::string section_path;
    void begin_object(std::string& o) {
        if (!section_path.empty()) {
            o += "["; o += section_path; o += "]\n";
        }
    }
    void end_object(std::string&) {}
    void field_separator(std::string& o) { o += '\n'; }
    void end_field(std::string&) {}
    void emit_key(std::string& o, std::string_view k) { o += k; o += " = "; }
    void emit_string(std::string& o, std::string_view v) { o += '"'; o += v; o += '"'; }
    void emit_bool  (std::string& o, bool v)             { o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { o += std::to_string(v); }
};

Nested tables need a small addition — tracking section path across recursive calls — but the shape doesn’t change.
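One way to track the section path without touching the engine is to defer the key: emit_key only remembers it, and the next hook decides whether it becomes a "k = " prefix or a [section] header. The sketch below is an assumption about how that could look, not the library's code. toml_nested_format and toml_demo are hypothetical names, and the demo drives the hooks by hand in the order the walk would call them.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct toml_nested_format {
    std::vector<std::string> path;    // enclosing table names, outermost first
    std::vector<bool> opened;         // per open object: did it push onto path?
    std::string pending_key;          // key seen by emit_key, not yet written

    void begin_object(std::string& o) {
        bool nested = !pending_key.empty();   // an object that is the value of a key
        if (nested) {
            path.push_back(std::move(pending_key));
            pending_key.clear();
            o += '[';
            for (std::size_t i = 0; i < path.size(); ++i) {
                if (i) o += '.';
                o += path[i];
            }
            o += "]\n";
        }
        opened.push_back(nested);
    }
    void end_object(std::string&) {
        if (opened.back()) path.pop_back();
        opened.pop_back();
    }
    void field_separator(std::string& o) { o += '\n'; }
    void end_field(std::string&) {}
    void emit_key(std::string&, std::string_view k) { pending_key = std::string(k); }
    void emit_string(std::string& o, std::string_view v) { flush_key(o); o += '"'; o += v; o += '"'; }
    void emit_bool(std::string& o, bool v) { flush_key(o); o += v ? "true" : "false"; }
    template <typename N> void emit_number(std::string& o, N v) { flush_key(o); o += std::to_string(v); }

private:
    // A leaf value follows, so the deferred key is an ordinary "k = " prefix.
    void flush_key(std::string& o) { o += pending_key; o += " = "; pending_key.clear(); }
};

// Drives the hooks by hand in the order the reflection walk would.
inline std::string toml_demo() {
    toml_nested_format f;
    std::string o;
    f.begin_object(o);
    f.emit_key(o, "userName");    f.emit_string(o, "filip");  f.end_field(o);
    f.field_separator(o);
    f.emit_key(o, "id");          f.emit_number(o, 42);       f.end_field(o);
    f.field_separator(o);
    f.emit_key(o, "homeAddress"); f.begin_object(o);
    f.emit_key(o, "city");        f.emit_string(o, "Warsaw"); f.end_field(o);
    f.field_separator(o);
    f.emit_key(o, "postal_code"); f.emit_number(o, 12345);    f.end_field(o);
    f.end_object(o);              f.end_field(o);
    f.end_object(o);
    return o;
}
```

One caveat of this approach: in TOML every key after a [section] header belongs to that section, so scalar members have to be emitted before nested structs. The sketch relies on homeAddress being last; a real engine would need to order the walk or buffer sub-tables.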

MessagePack (binary output)

Binary formats drop in too:

struct msgpack_format {
    // n is the number of non-skipped members. fixmap only covers n <= 15.
    void begin_object(std::vector<std::uint8_t>& o, std::size_t n) {
        o.push_back(0x80 | static_cast<std::uint8_t>(n));  // fixmap tag
    }
    // fixstr only covers keys of up to 31 bytes.
    void emit_key(std::vector<std::uint8_t>& o, std::string_view k) {
        o.push_back(0xa0 | static_cast<std::uint8_t>(k.size()));  // fixstr tag
        o.insert(o.end(), k.begin(), k.end());
    }
    /* ... emit_bool/number/string writing MsgPack bytes ... */
};

Two extensions to the engine to make binary work:

  1. Output buffer is a template parameter: std::string for text formats, std::vector<uint8_t> for binary.
  2. begin_object receives the field count: MessagePack’s map header carries the number of entries up front. A consteval helper counts surviving (non-skipped) members before the walk.

Same reflection walk. Different buffer type. Same engine, still under 100 lines.
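For reference, here is a sketch of the byte-level pieces a MessagePack policy needs once maps exceed 15 entries or keys exceed 31 bytes. The function names are hypothetical; the tag bytes are the ones defined by the MessagePack specification. Signed integers and floats are left out to keep it short.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Appends the low n bytes of v in big-endian order.
inline void put_be(bytes& o, std::uint64_t v, int n) {
    for (int shift = 8 * (n - 1); shift >= 0; shift -= 8)
        o.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Map header: entry count comes first, which is why begin_object needs n.
inline void put_map_header(bytes& o, std::size_t n) {
    if (n <= 15)          { o.push_back(static_cast<std::uint8_t>(0x80 | n)); }  // fixmap
    else if (n <= 0xffff) { o.push_back(0xde); put_be(o, n, 2); }                // map 16
    else                  { o.push_back(0xdf); put_be(o, n, 4); }                // map 32
}

// String: length-prefixed raw bytes, used for both keys and string values.
inline void put_str(bytes& o, std::string_view s) {
    std::size_t n = s.size();
    if (n <= 31)          { o.push_back(static_cast<std::uint8_t>(0xa0 | n)); }  // fixstr
    else if (n <= 0xff)   { o.push_back(0xd9); put_be(o, n, 1); }                // str 8
    else if (n <= 0xffff) { o.push_back(0xda); put_be(o, n, 2); }                // str 16
    else                  { o.push_back(0xdb); put_be(o, n, 4); }                // str 32
    o.insert(o.end(), s.begin(), s.end());
}

inline void put_bool(bytes& o, bool v) {
    o.push_back(static_cast<std::uint8_t>(v ? 0xc3 : 0xc2));
}

// Unsigned integers: smallest encoding that fits.
inline void put_uint(bytes& o, std::uint64_t v) {
    if (v <= 0x7f)            { o.push_back(static_cast<std::uint8_t>(v)); }     // positive fixint
    else if (v <= 0xff)       { o.push_back(0xcc); put_be(o, v, 1); }            // uint 8
    else if (v <= 0xffff)     { o.push_back(0xcd); put_be(o, v, 2); }            // uint 16
    else if (v <= 0xffffffff) { o.push_back(0xce); put_be(o, v, 4); }            // uint 32
    else                      { o.push_back(0xcf); put_be(o, v, 8); }            // uint 64
}
```

A policy would forward begin_object to put_map_header, emit_key and emit_string to put_str, and so on. The object {"id": 42} comes out as the five bytes 81 a2 69 64 2a.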

Per format, what changes

Format               | State carried                | Unique pattern                 | Output type
---------------------|------------------------------|--------------------------------|---------------------
JSON                 | none                         | braces + commas + quoted keys  | std::string
YAML                 | int depth                    | newline + per-depth indent     | std::string
XML                  | vector<string_view> tag stack| open-tag / close-tag pairing   | std::string
TOML                 | section path                 | flat keys + [section] headers  | std::string
MessagePack          | field count                  | length-prefixed bytes          | std::vector<uint8_t>
Protobuf wire format | field numbers via annotation | tag/length/value               | std::vector<uint8_t>

What changes per format is narrow. The reflection walk is the same across all of them.
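The Protobuf row has no policy in this post, so here is a sketch of just its tag/length/value encoding, as documented in the Protocol Buffers encoding guide. Function names are hypothetical. In a policy, emit_key would take the field number from an annotation instead of writing the member name.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

using pb_bytes = std::vector<std::uint8_t>;

// Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
inline void put_varint(pb_bytes& o, std::uint64_t v) {
    while (v >= 0x80) {
        o.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    o.push_back(static_cast<std::uint8_t>(v));
}

enum class wire_type : std::uint8_t { varint = 0, len = 2 };

// Tag = (field_number << 3) | wire_type, itself encoded as a varint.
inline void put_tag(pb_bytes& o, std::uint32_t field_number, wire_type t) {
    put_varint(o, (std::uint64_t{field_number} << 3) | static_cast<std::uint8_t>(t));
}

// Unsigned integer field: tag, then the value as a varint.
inline void put_uint_field(pb_bytes& o, std::uint32_t field_number, std::uint64_t v) {
    put_tag(o, field_number, wire_type::varint);
    put_varint(o, v);
}

// String field: tag, byte length as a varint, then the raw bytes.
inline void put_string_field(pb_bytes& o, std::uint32_t field_number, std::string_view s) {
    put_tag(o, field_number, wire_type::len);
    put_varint(o, s.size());
    o.insert(o.end(), s.begin(), s.end());
}
```

Field 1 with value 150 encodes as 08 96 01, and field 2 with the string "testing" as 12 07 followed by the seven ASCII bytes.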

What you ship

reflect_serial/ as a single header-only library:

  • The engine (~80 lines).
  • Five format policies (~20 lines each).
  • The annotation vocabulary (~50 lines, shared across formats).
  • A dozen helpers (case conversion, string escapes, etc).

Total: ~300 lines for five formats. Compare that with the equivalent in C++ today: nlohmann/json + yaml-cpp + pugixml + toml11 + msgpack-c. That is five separate libraries, each running from thousands to tens of thousands of lines, with disjoint annotation conventions.

Cross-language angle

Language       | One model, many formats
---------------|------------------------
Rust serde     | Separate crates (serde_json, serde_yaml, toml, rmp_serde). Each implements the Serializer trait. #[derive(Serialize)] generates one impl that is generic over the Serializer, monomorphized per format used.
Jackson (Java) | ObjectMapper, YAMLMapper, XmlMapper, CBORMapper. Same annotated classes; swap the mapper. Runtime polymorphism.
.NET           | JSON only in System.Text.Json. XML via System.Xml.Serialization, YAML via third-party libraries, each with its own attribute dialect. No unified model.
Go             | encoding/json, encoding/xml, gopkg.in/yaml.v3. Each reads struct tags. Tags must be duplicated per format.
Python         | Pydantic for JSON + dicts, third-party for YAML/TOML.
C++26          | One policy struct per format. Shared reflection walk. Annotations apply across formats.

C++ lands at the same architectural outcome as serde and Jackson: one data model, polymorphic over a serializer. As with serde, and unlike Jackson, the whole dispatch is resolved at compile time. For each format, the generated code should match what a hand-written serializer would compile to.

Try it

User u{"filip", 42, "filip@example.com", "hash", true, {"Warsaw", 12345}};
std::println("{}", rserial::to_json(u));
std::println("{}", rserial::to_yaml(u));
std::println("{}", rserial::to_xml(u));

What’s next

Arc 3 ends. Arc 4 starts: CLI parsing (the clap-derive port), a tiny ORM, autowired dependency injection, auto-mocks.