.. Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. .. include:: ../common-defs.rst .. highlight:: cpp .. default-domain:: cpp .. |BW| replace:: :code:`BufferWriter` .. |BWF| replace:: :code:`BufferWriter` formatting .. _bw-format: *********************** BufferWriter Formatting *********************** Synopsis ******** :code:`#include "swoc/bwf_base.h"` Formatted output was added to :class:`BufferWriter` for several reasons. * Type safe formatted output in addition to buffer safe formatted output. Rather than non-obvious cleverness with :code:`snprintf` and :libswoc:`BufferWriter::commit`, build the formatting in directly. * Specialized output functions for complex types, to have the class provide the formatting logic instead of cut and pasted code in multiple locations. This also avoids breaking modularity to get the data needed for good formatting. This also enables formatting wrappers which can provide generic and simple ways to do specific styles of output beyond formatting codes (e.g. :libswoc:`As_Hex`). * Argument naming, both for ordering, repeating, and for "global" names which can be used without arguments. This is also intended for use where there are context dependent names, e.g. 
for printing in the context of an HTTP header, the header field names could be made so their use is replaced by the value of that field. * The ability to pass arbitrary "extra" data to formatting functions for special, type dependent purposes. The formatting style is the "prefix" or "printf" style - the format is specified first and then all the arguments. The syntax is based on `Python formatting `__. This contrasts with the "infix" or "streaming" style where formatting, literals, and arguments are intermixed in the order of output. There are various arguments for both styles but conversations within the Traffic Server community indicated a clear preference for the prefix style. Therefore creating formatted output consists of a :term:`format string`, containing literal text and :term:`format specifier`\ s, which are replaced with generated text, usually based on the values of arguments to the print function. The design is optimized for formatted output to fixed buffers. This is by far the dominant style in the expected use cases and during the design phase I was told any performance loss compared to :code:`snprintf` must be minimal. While work has been and will be done to extend :class:`BufferWriter` to operate on non-fixed buffers, such use is secondary to operating directly on contiguous buffers. .. important:: The overriding design goal is to provide the type specific formatting and flexibility of C++ stream operators with the performance of :code:`snprintf` and :code:`memcpy`. Usage ***** As noted |BWF| is modeled on Python string formatting because the Traffic Server project uses quite a bit of Python. It seemed a good model for prefix style formatting, mapping easily into the set of desired features. The primary divergences are * Names do not refer to in scope variables, but to output generators local to the print context via `Name Binding`_. * The addition of a third colon separated field to provide extension data to the formatting logic.
The primary entry point for this is :libswoc:`BufferWriter::print`. A format string consists of literal text in which format specifiers are embedded. Each specifier marks a place where generated output will be placed. The specifier is marked by paired braces and is divided in to three fields, separated by colons. These fields are optional - if default output is acceptable, a pair of braces will suffice. In a sense, ``{}`` serves the same function for output as :code:`auto` does for programming - the compiler knows the type, it should be able to do something reasonable without the programmer needing to be explicit. The fields are used in the less common cases where greater control of the output is required. Format Specifier Grammar ======================== This is the grammar for the fields inside a format specifier. .. productionList:: spec specifier: "{" [name] [":" [style] [":" extension]] "}" name: index | ICHAR+ index: non-negative integer extension: ICHAR* ICHAR: a printable ASCII character except for '{', '}', ':' style: formatting instructions. The three fields are :token:`~spec:name`, :token:`~spec:style`, and :token:`~spec:extension`. :token:`~spec:name` The :token:`~spec:name` of the argument to use. This can be a non-negative integer in which case it is the zero based index of the argument to the method call. E.g. ``{0}`` means the first argument and ``{2}`` is the third argument after the format. ``bw.print("{0} {1}", 'a', 'b')`` => ``a b`` ``bw.print("{1} {0}", 'a', 'b')`` => ``b a`` The :token:`~spec:name` can be omitted in which case it is treated as an index in parallel to the position in the format string relative to other argument based specifiers. Only the position in the format string matters, not what arguments other format specifiers may have used. ``bw.print("{0} {2} {}", 'a', 'b', 'c')`` => ``a c c`` ``bw.print("{0} {2} {2}", 'a', 'b', 'c')`` => ``a c c`` Note an argument can be printed more than once if the name is used more than once. 
``bw.print("{0} {} {0}", 'a', 'b')`` => ``a b a`` ``bw.print("{0} {1} {0}", 'a', 'b')`` => ``a b a`` Alphanumeric names refer to values in a :term:`format context` table. These will be described in more detail someday. Such names do not count in terms of default argument indexing. These rules are designed to be natural, but any ambiguity can be eliminated by explicit indexing in the specifiers. :token:`~spec:style` Basic formatting control. .. productionList:: fmt style: [[fill]align][sign]["#"]["0"][[min][.precision][,max][type]] fill: fill-char | URI-char URI-char: "%" hex-digit hex-digit fill-char: printable character except "{", "}", ":", "%" align: "<" | ">" | "=" | "^" sign: "+" | "-" | " " min: non-negative integer precision: positive integer max: non-negative integer type: "g" | "s" | "S" | "x" | "X" | "d" | "o" | "b" | "B" | "p" | "P" hex-digit: "0" .. "9" | "a" .. "f" | "A" .. "F" The output is placed in a field that is at least :token:`~fmt:min` wide and no more than :token:`~fmt:max` wide. If the output is narrower than :token:`~fmt:min` then * The :token:`~fmt:fill` character is used for the extra space required. This can be an explicit character or a URI encoded one (to allow otherwise reserved characters). * The output is shifted according to the :token:`!fmt:align`. < Align to the left, fill to the right. > Align to the right, fill to the left. ^ Align in the middle, fill to left and right. = Numerically align, putting the fill between the sign character (left aligned) and the value (right aligned). The output is clipped by :token:`~fmt:max` width characters and by the end of the buffer. :token:`~fmt:precision` is used by floating point values to specify the number of places of precision. :token:`~fmt:type` is used to indicate type specific formatting. For integers it indicates the output radix and if ``#`` is present the radix prefix is generated (one of ``0b``, ``0``, ``0x``).
Format types of the same letter are equivalent, varying only in the character case used for output. Most commonly 'x' prints values in lower cased hexadecimal (:code:`0x1337beef`) while 'X' prints in upper case hexadecimal (:code:`0X1337BEEF`). Note there is no upper case decimal or octal type because case is irrelevant for those. = =============== g generic, default. b binary B Binary d decimal o octal x hexadecimal X Hexadecimal p pointer (hexadecimal address) P Pointer (Hexadecimal address) s string S String (upper case) = =============== For several specializations the hexadecimal format is taken to indicate printing the value as if it were a hexadecimal value, in effect providing a hex dump of the value. This is the case for :code:`std::string_view` and therefore a hex dump of an object can be done by creating a :code:`std::string_view` covering the data and then printing it with :code:`{:x}`. The string type ('s' or 'S') is generally used to cause alphanumeric output for a value that would normally use numeric output. For instance, a :expr:`bool` is normally ``0`` or ``1``. Using the type 's' yields ``true`` or ``false``. The upper case form, 'S', applies only in these cases where the formatter generates the text, it does not apply to normally text based values unless specifically noted. Therefore a :code:`bool` printed with the type 'S' yields ``TRUE`` or ``FALSE``. This is frequently done with formatting for enumerations, printing the numeric value by default and printing a text equivalent for format 's' or 'S'. :token:`~spec:extension` Text (excluding braces) passed to the type specific formatter function. This can be used to provide extensions for specific argument types (e.g., IP addresses). It is never examined by |BWF|, it is only effective in type specific formatting overloads. When a format specifier is parsed, the result is placed in an instance of :libswoc:`bwf::Spec `.
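To make the three field layout concrete, here is a minimal, self-contained sketch of splitting the text between the braces into its name, style, and extension parts. It is not the libswoc parser: :code:`SpecFields` and :code:`split_spec` are hypothetical names used only for illustration, and the style field is returned unparsed rather than decoded into fill, alignment, and so on.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical holder for the three raw fields of a format specifier.
struct SpecFields {
  std::string_view name;      // index or alphanumeric name, may be empty
  std::string_view style;     // raw style text, not decoded in this sketch
  std::string_view extension; // passed through untouched to the formatter
};

// Split the text found between '{' and '}' at the first two colons.
// Fields that are absent are left as empty views.
inline SpecFields split_spec(std::string_view text) {
  SpecFields fields;
  auto first = text.find(':');
  fields.name = text.substr(0, first);
  if (first == std::string_view::npos) {
    return fields; // only a name (or nothing at all)
  }
  text.remove_prefix(first + 1);
  auto second = text.find(':');
  fields.style = text.substr(0, second);
  if (second == std::string_view::npos) {
    return fields; // name and style, no extension
  }
  fields.extension = text.substr(second + 1);
  return fields;
}
```

With this sketch the specifier text ``0:>20:=a`` splits into the name ``0``, the style ``>20``, and the extension ``=a``, while an empty specifier yields three empty fields.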
Examples ======== Some examples, comparing :code:`snprintf` and :libswoc:`BufferWriter::print`. :: if (len > 0) { auto n = snprintf(buff, len, "count %d", count); len -= n; buff += n; } bw.print("count {}", count); // -- if (len > 0) { auto n = snprintf(buff, len, "Size %" PRId64 " bytes", sizeof(thing)); len -= n; buff += n; } bw.print("Size {} bytes", sizeof(thing)); // -- if (len > 0) { auto n = snprintf(buff, len, "Number of items %ld", thing->count()); len -= n; buff += n; } bw.print("Number of items {}", thing->count()); Enumerations become easier. Note in this case argument indices are used in order to print both a name and a value for the enumeration. A key benefit here is the lack of need for a developer to know the specific free function or method needed to do the name lookup. In this case, :code:`HttpDebugNames::get_server_state_name`. Rather than every developer having to memorize the association between the type and the name lookup function, or grub through the code hoping for an example, the compiler is told once and henceforth does the lookup. The implementation of the formatter is described in `an example `. A sample of code previously used to output an error message using this enumeration. :: if (len > 0) { auto n = snprintf(buff, len, "Unexpected event %d in state %s[%d] for %.*s", event, HttpDebugNames::get_server_state_name(t_state.current.state), t_state.current.state, static_cast<int>(host_len), host); buff += n; len -= n; } Using |BW| :: bw.print("Unexpected event {0} in state {1}[{1:d}] for {2}", event, t_state.current.state, std::string_view{host, host_len}); Adapting to use of :code:`std::string_view` illustrates the advantage of a formatter overload knowing how to get the size from the object and not having to deal with restrictions on the numeric type (e.g., that :code:`%.*s` requires an :code:`int`, not a :code:`size_t`).
:: if (len > 0) { len -= snprintf(buff, len, "%.*s", static_cast<int>(s.size()), s.data()); } vs :: bw.print("{}", s); or even bw.write(s); The difference is even more stark when dealing with IP addresses. There are two big advantages here. One is not having to know the conversion function name. The other is the lack of having to declare local variables and having to remember what the appropriate size is. Not requiring local variables can be particularly nice in the context of a :code:`switch` statement where local variables for a :code:`case` mean having to add extra braces, or declare the temporaries at an outer scope. :: char ip_buff1[INET6_ADDRPORTSTRLEN]; char ip_buff2[INET6_ADDRPORTSTRLEN]; ats_ip_nptop(ip_buff1, sizeof(ip_buff1), addr1); ats_ip_nptop(ip_buff2, sizeof(ip_buff2), addr2); if (len > 0) { snprintf(buff, len, "Connecting to %s from %s", ip_buff1, ip_buff2); } vs :: bw.print("Connecting to {} from {}", addr1, addr2); User Defined Formatting ======================= To get the full benefit of type safe formatting it is necessary to provide type specific formatting functions which are called when a value of that type is formatted. This is how type specific knowledge such as the names of enumeration values are encoded in a single location. The special formatting for IP address data is done by providing default formatters, it is not built in to the base formatting logic. Most of the support for this is in the nested namespace :code:`bwf`. The format style is stored in an instance of :libswoc:`bwf::Spec `. .. namespace-push:: bwf .. class:: Spec Format specifier data. :libswoc:`Reference `. .. namespace-pop:: Additional type specific formatting can be provided via the :token:`~spec:extension` field. This provides another option for tweaking formatted output vs. using wrapper classes. To provide a formatter for a type :code:`V` the function :code:`bwformat` is overloaded.
The signature would look like this:: swoc::BufferWriter& swoc::bwformat( swoc::BufferWriter& w , swoc::bwf::Spec const& spec , V const& v ) :arg:`w` is the output and :arg:`spec` the :libswoc:`parsed format specifier `, including the name and extension (if any). The calling framework will handle basic alignment as per :arg:`spec` therefore the overload normally does not need to do so. In some cases, however, the alignment requirements are more detailed (e.g. integer alignment operations) or performance is critical. In the latter case the formatter should make sure to use at least the :libswoc:`minimum width ` in order to disable any framework alignment operation. It is important to note a formatter can call another formatter. For example, the formatter for :code:`std::string` looks like .. literalinclude:: ../../code/include/swoc/bwf_base.h :lines: 987-990 A more complex example of this which illustrates other mechanisms is formatting a character pointer. .. literalinclude:: ../../code/include/swoc/bwf_base.h :start-after: // Char pointer formatting :end-before: // doc end This checks the format and if it's a pointer or hex format, delegates to generic pointer formatting. Otherwise if it's not :code:`nullptr` then it's treated as a C-string and delegated to the :code:`string_view` formatter. If it is :code:`nullptr` then it's delegated to the formatter for :code:`nullptr_t`. The implementation for generic pointer formatting is .. literalinclude:: ../../code/src/bw_format.cc :start-after: // Generic poiner formatting :end-before: // doc end The code first copies the format specification and forces a leading radix. Next it does special handling for :code:`nullptr`. If the pointer is valid, the code checks if the type ``p`` or ``P`` was used in order to select the appropriate case, then delegates the actual rendering to the :libswoc:`integer formatter ` with a type of ``x`` or ``X`` as appropriate. 
In turn other formatters, if given the type ``p`` or ``P`` can cast the value to :code:`const void*` and call :code:`bwformat` on that to output the value as a pointer. The difference between calling :code:`bwformat` vs. :libswoc:`BufferWriter::write` is the ability to pass the format specifier instance. If all of the formatting is handled directly, then direct |BW| methods are a good choice. If the formatter wants to use the built in formatting then :code:`bwformat` is the right choice. This is what is done with the pointer example above - the format specifier is copied and tweaked, and then passed on so that any formatting provided from the original format string remains valid. To help reduce duplication, the output stream operator :code:`operator<<` on a :code:`BufferWriter` is defined to call :code:`bwformat` with a default constructed :libswoc:`bwf::Spec` instance. This makes :: w << thing; identical to :: bwformat(w, swoc::bwf::Spec::DEFAULT, thing); which is also the same as :: w.print("{}", thing); Enum Example ------------ .. _bwf-http-debug-name-example: For a specific example of using |BWF| to make debug messages easier, consider the case of :code:`HttpDebugNames` in the Traffic Server code base. This is a class that serves as a namespace to provide various methods that convert state machine related enumerations into descriptive strings. Currently this is undocumented (and uncommented) and is therefore used infrequently, as that requires either blind cut and paste, or tracing through header files to understand the code. The result is much less useful diagnostics. This can be greatly simplified by adding formatters to :file:`proxy/http/HttpDebugNames.h` :: inline swoc::BufferWriter & bwformat(swoc::BufferWriter &w, swoc::bwf::Spec const &spec, HttpTransact::ServerState_t state) { if (spec.has_numeric_type()) { // allow the user to force numeric output with '{:d}' or other numeric type. 
return bwformat(w, spec, static_cast<uintmax_t>(state)); } else { return bwformat(w, spec, HttpDebugNames::get_server_state_name(state)); } } With this in place, the code to print the name of the server state enumeration is :: bw.print("{}", t_state.current.state); There is no need to remember names like :code:`HttpDebugNames` nor which method in it does the conversion. The developer making the :code:`HttpDebugNames` class or equivalent can take care of that in the same header file that provides the type. The type specific formatting is incorporated in to the general printing mechanism and from that point on works without any local code required, or memorization by the developer. Argument Forwarding ------------------- It will frequently be useful for other libraries to support formatting for input strings. For such use cases the class methods will need to take variable arguments and then forward them on to the formatter. :class:`BufferWriter` provides :libswoc:`BufferWriter::print_v` for this purpose. Instead of taking C style variable arguments, these overloads take a reference to a :code:`std::tuple` of arguments. Such a tuple is easily created with `std::forward_as_tuple `__. An example of this is a container of messages. The message class is .. literalinclude:: ../../unit_tests/ex_IntrusiveDList.cc :lines: 37-48,62 :emphasize-lines: 10 The container class has a :code:`debug` method to append :code:`Message` instances using |BWF|. .. literalinclude:: ../../unit_tests/ex_IntrusiveDList.cc :lines: 81-82,89,98 The implementation is simple. .. literalinclude:: ../../unit_tests/ex_IntrusiveDList.cc :lines: 122-131 :emphasize-lines: 6 This gathers the arguments (generally references to the arguments) in to a single tuple which is then passed by reference, to avoid restacking the arguments for every nested function call. In essence references to the arguments are put on the stack (inside the tuple) once and a reference to that stack is passed to nested functions.
This replaces the C style :code:`va_list` and provides not just arguments but also complete type information. The example code uses :libswoc:`bwprint_v` to print to a :code:`std::string`. There is a corresponding method, :libswoc:`BufferWriter::print_v`, which takes a tuple instead of an explicit list of arguments when working with |BW| instances. Internally, of course, :libswoc:`bwprint_v` is implemented using a local :libswoc:`FixedBufferWriter` instance and :libswoc:`BufferWriter::print_v`. Default Type Specific Formatting ================================ |BWF| has a number of user defined formatting overloads built in, primarily for types used inside the |BWF| implementation, to avoid circular reference problems. There is also support for formatting `IP addresses <_ip_addr_fmt>`_ via an additional include file. Specific types -------------- :code:`std::string_view` Generally the contents of the view. 'x' or 'X' A hexadecimal dump of the contents of the view in lower ('x') or upper ('X') case. 'p' or 'P' The pointer and length value of the view in lower ('p') or upper ('P') case. 's' The string in (forced) lower case. 'S' The string in (forced) upper case. For printing substrings, views are sufficiently cheap to do this in the arguments. For instance, printing the 10th through 20th characters of the view :code:`text` means passing :code:`text.substr(9,11)` instead of :code:`text`. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 43-44,57-58 However, for those terminally addicted to C style formatting, this can also be done by setting the precision. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 59-60,44,47-50 :libswoc:`TextView` Because this is a subclass of :code:`std::string_view`, all of the formatting for that works the same for this class. .. _ip_addr_fmt: :code:`sockaddr const *` :code:`#include "swoc/bwf_ip.h"` The IP address is printed. Fill is used to fill in address segments if provided, not to the minimum width if specified.
:libswoc:`IPEndpoint` and :libswoc:`IPAddr` are supported with the same formatting. The formatting support in this case is extensive because of the commonality and importance of IP address data. Type overrides 'p' or 'P' The pointer address is printed as hexadecimal lower ('p') or upper ('P') case. The extension can be used to control which parts of the address are printed. These can be in any order, the output is always address, port, family. The default is the equivalent of "ap". In addition, the character '=' ("numeric align") can be used to internally right justify the elements. 'a' The address. 'p' The port (host order). 'f' The IP address family. '=' Internally justify the numeric values. This must be the first or second character. If it is the second the first character is treated as the internal fill character. If omitted '0' (zero) is used. E.g. :: void func(sockaddr const* addr) { bw.print("To {}", addr); // -> "To 172.19.3.105:4951" bw.print("To {0::a} on port {0::p}", addr); // -> "To 172.19.3.105 on port 4951" bw.print("To {::=}", addr); // -> "To 172.019.003.105:04951" bw.print("Using address family {::f}", addr); bw.print("{::a}",addr); // -> "172.19.3.105" bw.print("{::=a}",addr); // -> "172.019.003.105" bw.print("{::0=a}",addr); // -> "172.019.003.105" bw.print("{:: =a}",addr); // -> "172. 19. 3.105" bw.print("{:>20:a}",addr); // -> " 172.19.3.105" bw.print("{:>20:=a}",addr); // -> " 172.019.003.105" bw.print("{:>20: =a}",addr); // -> " 172. 19. 3.105" } Format Classes -------------- Although the extension for a format can be overloaded to provide additional features, this can become too confusing and complex to use if it is used for fundamentally different semantics on the same base type. In that case it is better to provide a format wrapper class that holds the base type but can be overloaded to produce different (wrapper class based) output.
The classic example is :code:`errno` which is an integral type but frequently should be formatted with additional information such as the descriptive string for the value. To do this the format wrapper class :code:`swoc::bwf::Errno` is provided. Using it is simple:: w.print("File not open - {}", swoc::bwf::Errno(errno)); which will produce output that looks like "File not open - EACCES: Permission denied [13]" For :code:`errno` this is handy in another way as :code:`swoc::bwf::Errno` will preserve the value of :code:`errno` across other calls that might change it. E.g.:: swoc::bwf::Errno last_err(errno); // some other code generating diagnostics that might tweak errno. w.print("File not open - {}", last_err); This can also be useful for user defined data types. For instance, in the HostDB component of Traffic Server the type of the entry is printed in multiple places and each time this code is repeated :: "%s%s %s", r->round_robin ? "Round-Robin" : "", r->reverse_dns ? "Reverse DNS" : "", r->is_srv ? "SRV" : "DNS" This could be wrapped in a class, :code:`HostDBFmt` such as :: struct HostDBFmt { HostDBInfo* _r { nullptr }; HostDBFmt(HostDBInfo* r) : _r(r) {} }; Then define a formatter for the wrapper :: swoc::BufferWriter& bwformat( swoc::BufferWriter& w , swoc::bwf::Spec const& , HostDBFmt const& wrap ) { return w.print("{}{} {}", wrap._r->round_robin ? "Round-Robin" : "", wrap._r->reverse_dns ? "Reverse DNS" : "", wrap._r->is_srv ? "SRV" : "DNS"); } Now all of the cut and paste formatting code is replaced with :: w.print("{}", HostDBFmt(r)); These are the existing format classes in header file ``bfw_std_format.h``. All are in the :code:`swoc::bwf` namespace. .. namespace-push:: bwf .. class:: Errno Formatting for :code:`errno`. Generically the formatted output is the short name, the description, and the numeric value. A format type of ``d`` will generate just the numeric value, while a format type of ``s`` will generate the short name and description without a number.
For more detailed output, the extension can be used to pick just the short or long name. For non-numeric format codes, if the extension has the character 's' then the short name is output, and if it contains the character 'l' the long name is output. Examples: ========== ============================================== Format Result ========== ============================================== ``:n`` [13] ``:s`` EACCES: Permission denied ``:s:sl`` EACCES: Permission denied ``:s:s`` EACCES ``:s:l`` Permission denied ``::s`` EACCES [13] ========== ============================================== :libswoc:`Reference `. .. class:: Date Date formatting in the :code:`strftime` style. An instance can be constructed with a :code:`strftime` compatible format, or with a :code:`time_t` and format string. When used the format specification can take an extension of "local" which formats the time as local time. Otherwise it is GMT. ``w.print("{}", Date("%H:%M"));`` will print the hour and minute as GMT values. ``w.print("{::local}", Date("%H:%M"));`` will print the hour and minute in the local time zone. ``w.print("{::gmt}", ...);`` will output in GMT if additional explicitness is desired. :libswoc:`Reference `. .. function:: template < typename ... Args > FirstOf(Args && ... args) Print the first non-empty string in an argument list. All arguments must be convertible to :code:`std::string_view`. By far the most common case is the two argument case used to print a special string if the base string is null or empty. For instance, something like this:: w.print("{}", name != nullptr ? name : "") This could also be done like:: w.print("{}", swoc::bwf::FirstOf(name, "")); If the first argument is a local variable that exists only to do the empty check, that variable can be eliminated entirely. const char * name = thing.get_name(); w.print("{}", name != nullptr ?
name : "") can be simplified to w.print("{}", swoc::bwf::FirstOf(thing.get_name(), "")); In general avoiding ternary operators in the print argument list makes the code cleaner and easier to understand. :libswoc:`Reference `. .. class:: Optional A wrapper for optional output generation. This wraps a format string and a set of arguments and generates output conditionally, either the format string with the arguments applied, or nothing. This is useful for output data that requires additional delimiters if present, but nothing if not. A common pattern for this is something like :: printf("Text: %d%s%s", count, data ? " " : "", data ? data : ""); or something like :: printf("Text: %d", count); if (data) { printf(" %s", data); } In both cases, the leading space separating :arg:`data` from the previous output is printed iff :arg:`data` is not :code:`nullptr`. Using :code:`Optional` with |BWF| this is done with something like :: w.print("Text: {}{}", count, swoc::bwf::Optional(data != nullptr, " {}", data)); The first argument is a conditional, which determines if output is generated, followed by a format string and then arguments for the format string. The number of specifiers in the format string and the number of arguments must agree. Because the case where the argument and the conditional are effectively the same is so common, there is a specialization of :code:`Optional` which takes just a format string and an argument. This requires the format string to take only one parameter, and the argument to either * Have the method :code:`empty` which returns :code:`false` if there is content. * Be convertible to :code:`bool` such that the argument converts to :code:`true` if there is content. This enables the example to be further reduced to :: w.print("Text: {}{}", count, swoc::bwf::Optional(" {}", data)); Note this works with raw C strings, the STL string classes, and :code:`TextView`. The more general form can be used if this specialization doesn't suffice. :libswoc:`Reference `.
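The "delimiter only if there is content" rule that :code:`Optional` packages up can be shown with the standard library alone. The following is a rough sketch under stated assumptions, not the libswoc implementation: :code:`has_content` and :code:`append_optional` are hypothetical names, a :code:`std::string` stands in for the |BW| instance, and the content test mirrors the two rules listed above (non-null for raw C strings, :code:`empty` for string classes).

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Raw C strings: content is present when the pointer converts to true.
inline bool has_content(const char *text) { return text != nullptr; }

// String classes and views: content is present when empty() is false.
inline bool has_content(std::string_view text) { return !text.empty(); }

// Append `prefix` followed by `value` to `out`, but only when `value`
// has content. Otherwise `out` is left unchanged.
template <typename T>
std::string &append_optional(std::string &out, std::string_view prefix, T const &value) {
  if (has_content(value)) {
    out.append(prefix);
    out.append(value);
  }
  return out;
}
```

Starting from ``Text: 5``, appending a null C string with the prefix ``" "`` leaves the text unchanged, while appending ``"data"`` produces ``Text: 5 data``.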
Writing a Format Class ---------------------- Writing additional format classes is designed to be easy, taking two or three steps. For example, consider a wrapper to output a string in `rot13 `__. The first step is to declare the wrapper class. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 651-652,655 This class simply stores the :code:`std::string_view` for later use. Next the formatting for the wrapper class must be provided by overloading :code:`bwformat`. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 657-664 This uses :libswoc:`transform_view_of` to do the character rotation. The lambda to perform the per character transform is defined separately for code cleanliness, it could just as easily have been defined directly as an argument. That's all that is strictly required - this code now works as expected. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 685-689 Note the universal initializer must be used because there is no constructor. That is easily fixed. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 651-655 and now this works as expected. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 691-692 Obviously other constructors can be provided for different ways to use the wrapper. An optional third step is to use free functions, rather than constructors, to access the wrapper. This is useful in some circumstances, one example being that it is desirable other classes can overload the format class construction, which is not possible using only constructors. In this case, a wrapper function could be done as .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 666-670 and used .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 694-695 Now, if there was a struct that needed Rot13 support .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 672-675 then the wrapper could be overloaded with .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 677-681 and used ..
literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 697-699 In general, provide wrapper class constructors unless there is a specific need for using free functions instead. Care should be used with the content of the format class to avoid expensive copies. In this case a :code:`std::string_view` is very cheap to copy and the style of the wrapper takes advantage of `return value optimization `__. .. namespace-pop:: Working with standard I/O ========================= For convenience a stream operator for :code:`std::ostream` is provided to make the use more natural. :: std::cout << bw; std::cout << bw.view(); // identical effect as the previous line. Using a :class:`BufferWriter` with :code:`printf` is straightforward by use of the sized string format code if necessary (generally using C++ IO streams is a better choice). :: swoc::LocalBufferWriter<256> bw; bw.print("Failed to connect to {}", addr1); printf("%.*s\n", int(bw.size()), bw.data()); Alternatively the output can be null terminated in the formatting to avoid having to pass the size. :: swoc::LocalBufferWriter<256> bw; printf("%s\n", bw.print("Failed to connect to {}\0", addr1).data()); When using C++ stream I/O, writing to a stream can be done without any local variables at all. :: std::cout << swoc::LocalBufferWriter<256>().print("Failed to connect to {}", addr1) << std::endl; If done repeatedly, a :code:`using` improves the look :: using LBW = swoc::LocalBufferWriter<256>; // ... std::cout << LBW().print("Failed to connect to {}", addr1) << std::endl; This is handy for temporary debugging messages as it avoids having to clean up local variable declarations later, particularly when the types involved themselves require additional local declarations (such as in this example, an IP address which would normally require a local text buffer for conversion before printing). As noted previously this is particularly useful inside a :code:`case` where local variables are more annoying to set up.
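The sized string technique used with :code:`printf` above is not specific to |BW|. Any text exposed as a pointer plus a length can be handed to the :code:`printf` family the same way. The sketch below uses only the standard library. :code:`render_line` is a hypothetical helper, and a :code:`std::string_view` stands in for the data and size of a |BW| instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Format text that is not null terminated by passing its length as the
// precision for "%.*s". The precision argument must be an int, so the
// size is cast explicitly.
inline std::string render_line(std::string_view text) {
  char buff[64];
  int n = std::snprintf(buff, sizeof(buff), "[%.*s]",
                        static_cast<int>(text.size()), text.data());
  if (n < 0) {
    return {}; // snprintf reported an error
  }
  auto len = static_cast<std::size_t>(n);
  if (len >= sizeof(buff)) {
    len = sizeof(buff) - 1; // output was truncated to fit the buffer
  }
  return std::string(buff, len);
}
```

Passing the first six characters of a longer view, e.g. ``whole.substr(0, 6)`` for a view of "Failed to connect", yields ``[Failed]`` even though no null terminator follows those characters in the underlying storage.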
Name Binding ============ The first part of each format specifier is a name. This was originally done to be more compliant with Python formatting and is most commonly left blank, although sometimes it is used to format arguments out of order or use them multiple times. To make this a more useful feature, |BWF| supports :term:`name binding` which binds names to text generator functors. The generator is expected to write output to a |BW| instance to replace the specifier, rather than a formatting argument. The base formatting logic is passed a functor by constant reference which provides the name binding service. The functor is expected to have the signature :: unspecified_type (BufferWriter & w, bwf::Spec const& spec) const As the format string is processed, if a format specifier has a name that is not numeric, the formatting logic calls the functor, ignoring the return value (which can therefore be of any type, including :code:`void`). :arg:`w` is the output buffer and :arg:`spec` is the specifier that caused the functor to be invoked. The binding functor is expected to generate text in :arg:`w` in accordance with the format specifier :arg:`spec`. Generally this involves looking up a functor based on the name and calling that in turn to generate the text. The name for the binding is contained in the :libswoc:`Spec::_name` member of :arg:`spec`. The class :libswoc:`NameBinding` is provided as a base class for supporting name binding. It * Forces a virtual destructor. * Provides a pure virtual declaration to ensure the correct function operator is implemented. * Provides a standardized :libswoc:`"missing name" method `. This class is handy but not required. |BWF| provides support for two use cases. External Generators ------------------- The first use case is for an "external generator" which generates text based on static or global data. An example would be a "timestamp" generator which generates a timestamp based on the current time. 
This could be associated with the name "timestamp" and used like .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 140 to generate output such as "Nov 16 12:21:05.545 Test Started". Context Generators ------------------ The second is a "context generator" which generates text based on a context object. This use case presumes a set of generators which access parts of a context object for text generation such that the output of the generator depends on the state of the context object. For example, the context object might be an HTTP request and the generators field accessors, each of which outputs the value for a specific field of the request. Because the name is handed to the name binding object, an implementation could subclass :libswoc:`ContextNames` and override the function operator to check the name first against fields in the request, and only if that doesn't match, do a lookup for a generator. :libswoc:`ContextNames` provides an implementation for storing and using name bindings. Global Names ------------ The external name generator support is used to create a set of default global names. A global singleton instance of an external name binding, :libswoc:`ExternalNames`, is used by default when generating formatting output. Generators assigned to this instance are therefore available in the default printing context. Here are a couple of examples for illustration of how this can be used. A "timestamp" name was used as an example of a name useful to implement, so the example here will start by doing that. First, the generator is defined. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 71-83 This generates a time stamp with the month through seconds, dropping the leading year and clipping everything past the seconds. It then adds milliseconds. Sample output looks like "Nov 16 11:40:20.833". This is then attached to the default global name binding in an initialization function called during process startup. .. 
literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 125-132 :emphasize-lines: 4 Because the test code is statically linked to the library, this must be done via a function called from :code:`main` to be sure the library statics have been fully initialized. That taken care of, using the global name is trivial. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 140 The output from a run is "Nov 16 12:21:05.545 Test Started". Note that because this is a format specifier, all of the supported format styles work without additional effort. That's not very useful with a timestamp but consider printing the epoch time. Again, the generator is defined. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 85-89 The generator is then assigned to the name "now". .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 128-131 :emphasize-lines: 2 And used with various styles. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 143 Sample output from a run is "Time is 1542393187 5bef0d63 5BEF0D63 0x5bef0d63". Context Binding Example ----------------------- Context name binding is useful for front ends to |BW|, not for direct use. The expected use case is a format string provided by an external agent, with format specifiers to pull data from a context object where explicitly naming the context object isn't possible. As an example use case consider a Traffic Server plugin that provides a cookie manipulation function. When setting a cookie value, it is useful to access transaction specific data such as the URL, portions of the URL (e.g. the path), HTTP field values, some other cookie item value, etc. This can be provided easily by setting up a context binding which binds a request context, and binds the various names to the appropriate elements in the context. To start the example, a *very* simplified context will be used - it is hardwired for comprehensibility, in production code the elements would be initialized for each transaction. ..
literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 105-121 This holds the interesting information. Next up is a context name binding class that binds an instance of :code:`Context`. This can be done with the template :libswoc:`ContextNames`. The template class provides both a map of names to generators and the subclass of :libswoc:`NameBinding` to pass to the formatter. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 150 For each supported name a function is defined to extract that data. For fields and cookies, the extension will hold the field name and so the generator needs to look up the name from the extension in the specifier. The field generators are done as local lambda functions. The other generators are done as in place lambdas, since they simply pass a member of :code:`Context` to :code:`bwformat`. In production code this might be done with lambdas, or file scope functions, or via methods in :code:`Context`. For writing the example, lambdas were easiest and so those were used. First the field generators, as those are more complex. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 158-165 .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 167-174 :code:`NA` is a constant string used to indicate a missing field / cookie. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 67 With the field generators in place, time to hook up the generators. For the direct member ones, just define a lambda in place. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 177-187 In production code, :code:`cb` would be a process static, initialized at process start up, as the relationship between the names and the generators doesn't change. Time to try it out. This test gets the "YRP" field. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 189-190 This test reconstructs the URL without the query parameters. ..
literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 192-193 That's a minimalist approach, using as little additional code as possible. But it's a bit funky to require the field names in the extension. There are various alternative approaches that could be used. The one considered here is to do more parsing work to make it easier for the users, by making the names more structured in the form "cookie.name" which means the value of the cookie element with the name "name". The two implementations shown here were chosen to demonstrate features of |BWF|. One type of implementation is to change how names are handled by the context binding (`example `__). Note the base formatting logic does not do name lookup; it only passes the name (embedded in the specifier) to the binding. By subclassing the binding this lookup can be intercepted and done differently, specifically by checking for names of the format "A.B" and using A to select the table in which to look up B. The other alternative is to change the parsing of the format string so that a field name such as "{cookie.name}" is parsed as if it had been "{cookie::name}" (`example `__). Both of these approaches require understanding the core formatting logic and how to customize it, as explained in `Custom Formatting`_. Custom Formatting ================= The internals of |BWF| are designed to enable using other format syntax. The one described in this document is simply the one implemented by default. Any format which can be used to generate literal output along with :libswoc:`bwf::Spec` instances can be made to work. Along with support for binding names, this makes it relatively easy to create custom format styles for use in specialized applications, particularly with formatting user input, e.g. for user defined diagnostic messages. This starts with the :libswoc:`BufferWriter::print_nfv` method. This is the formatted output implementation, all of the other variants serving as shims to call this method.
The method has three arguments. :arg:`names` This is a container for bound names. If a specifier has a name that is not numeric, the specifier is passed to the name binding for output. :arg:`ex` The :term:`format extractor`. This is a functor that detects end of input and extracts literals and specifiers. It has two required overloads and one optional. .. class:: Extractor .. function:: explicit operator bool () const :return: :code:`true` if there is more format string to process, otherwise :code:`false`. .. function:: bool operator () (std::string_view &literal, bwf::Spec &spec) :return: :code:`true` if a specifier was parsed and :arg:`spec` updated, otherwise :code:`false`. Extract the next literal and/or specifier. It may be assumed both :arg:`literal` and :arg:`spec` are initialized as if default constructed. If no literal is available :arg:`literal` should be unmodified, otherwise it should be set to the literal. If a specifier is found, :arg:`spec` must be updated to the parsed value of the specifier. If a specifier is found the method must return :code:`true` otherwise it must return :code:`false`. The method must always return at least one of :arg:`literal` or :arg:`spec` if the extractor is not empty. .. function:: void capture(BufferWriter & w, const bwf::Spec & spec, std::any && value) This is an optional method used to capture an argument. A pointer to the argument is placed in :arg:`value` with full type information. The method may generate output but this is not required. If this method is not present and the extractor returns a specifier with the type :libswoc:`Spec::CAPTURE_TYPE`, an exception will be thrown. :arg:`args` A tuple containing the arguments to be formatted. The formatting logic in :libswoc:`BufferWriter::print_nfv` is ..
uml:: :align: center title Core Formatting start while (ex()) is (not empty) :ex(literal, spec); if (literal) then (not empty) :w.write(literal); endif if (spec) then (found) if (spec._name) then (numeric or empty) :format arg[spec]; else :names(spec); endif endif endwhile (empty) stop If the name in :arg:`spec` is not empty and not numeric, rather than selecting a member of :arg:`args` the specifier is passed to the name binding, which presumably generates the appropriate output. The name is embedded in the specifier :arg:`spec` in the :libswoc:`Spec::_name` member for use by the name binding. Otherwise, an empty or numeric name means an argument is selected and passed to a :code:`bwformat` overload, the specific overload selected based on the type of the argument. For examples of this, the `Context Binding Example`_ will be redone in two different ways, each illustrating a different approach to customizing output formatting. .. _example-custom-parsing: Parsing Example --------------- For this case, the parsing of the format specifier is overridden and if the name is of the form "A.B" it is treated as "A::B", that is "A" is put in the :arg:`_name` member and "B" is put in the :arg:`_ext` member. Any extension is ignored. In addition, to act more like a Traffic Server plugin (and illustrate how to use alternate specifier formats), the parser requires format specifiers to be of the form "**%{**\ *name*\ **:**\ *style*\ **}**\ ". A double percent "%%" will mark a percent that is not part of a format specifier. The first step is to declare a class that will be the extractor functor. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 295-305 This will be used only as a temporary passed to :libswoc:`BufferWriter::print_nfv` and is therefore always constructed with the format string. The format string left to parse is kept in :arg:`_fmt` which means the empty check is really just a check on that. .. 
literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 311-314 The function operator, which parses the format string to extract literals and specifiers, is a bit more complex. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 316-352 The rough logic is * Search for a '%' - if not found, it's all literal, return that. * Make sure the '%' isn't '%%' - if it is, need to return just a literal with the leading '%' and skip the trailing '%', doing more parsing on the next call. * Check for an open brace, and if found find the close brace, then parse the internals into a specifier. Because the same style format as the default is used, the parser for :libswoc:`bwf::Spec` can be used. Otherwise if something different were needed that parsing logic would replace .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 336 * If a specifier was found, check the name for a period. If found, split it and put the prefix in the name and the suffix in the extension. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 341-346 A name binding .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 359-360 is declared and names are assigned in the usual way. In addition to assigning context related names, external generators can also be assigned to the name binding, which can be a useful feature to inject external names in addition to the context specific ones. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 395 After that, everything is ready to try it out. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 397-408 .. _example-custom-name-dispatch: Name Binding Example -------------------- Another approach is to override how name lookup is done in the binding. Because the field handling will be done in the override, methods are added to the :code:`Context` to do the generation for structured names, rather than placing that logic in the binding. .. 
literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 204-225 Next a subclass of :libswoc:`ContextNames` is created which binds to an :code:`ExContext` object. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 229-232 Inside the class the function operator is overloaded to handle name look up. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 237-257 The incoming name is taken from the specifier and split on a period. If that yields a non-empty result it is checked against the two valid structure names and the appropriate method on :code:`ExContext` called to generate the output. Otherwise the normal name look up is done to find the direct access generators. An instance is constructed and the direct access names assigned .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 261-270 and it's time to try it out. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :lines: 272-288 This tests structured names, direct access names, external names ("version"), and some formatting. C Style ------- The formatting is sufficiently flexible to emulate C style or "printf" formatting. Given that a major motivation for this work was the inadequacy of C style formatting, it's a bit odd to have this example but it was done to show that even emulating :code:`printf`, it's still better. I must note that although this works reasonably well, it's still an example and not suitable for production code. There are still some edge cases not handled, but as a proof of concept it's not worth fixing every detail. The first step is creating a format extractor, since the format string syntax is completely different from the default. This is done by creating a class to perform the extraction and hold state, although it will only be used as a temporary passed to :libswoc:`BufferWriter::print_nfv`. The state is required to track "captured" arguments.
These are used to emulate the '*' marker for integers in format specifiers, which indicates that the value is in an argument, not the format string. This can be done both for maximum size and precision, so both of them must be capturable. The basic logic is to keep a :libswoc:`bwf::Spec` in the class to hold the captured values, along with flags indicating the capture state (it may be necessary to do two captures, if both the maximum size and precision are variable). .. literalinclude:: ../../unit_tests/ex_bw_format.cc :start-at: class C_Format :end-before: // class C_Format The empty indicator needs to be a bit different in that even if the format is empty, if the last part of the format string had a capture (indicated by :arg:`_saved_p` being :code:`true`) a non-empty state needs to be returned to get an invocation to output that last specifier. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :start-after: C_Format operator bool :end-before: C_Format operator bool The capture logic takes advantage of the fact that only integers can be captured, and in fact :code:`printf` itself requires exactly an :code:`int`. This logic is a bit more flexible, accepting :code:`unsigned` and :code:`size_t` also, but otherwise is fairly restrictive. It should also generate an error instead of silently returning on a bad type, but you can't have everything. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :start-after: C_Format capture :end-before: C_Format capture The set up for the capture passes the capture element in the extension of the return specifier, which this logic checks to know where to stash the captured value. The actual parsing logic will be skipped - it's in the example file :swoc:git:`src/unit_tests/ex_bw_format.cc` in the function operator method. ..
literalinclude:: ../../unit_tests/ex_bw_format.cc :start-after: C_Format parsing :end-before: { :lineno-match: This handles all the basics of C style formatting including sign control, minimum and maximum widths, precision, and leading radix support. One thing of note is that integer size indicators (such as "l" in "%ld") are ignored - the type is known, therefore the sizing information is redundant at best and wrong at worst, so it is parsed and discarded. If a capture is needed, state is set in the extractor instance and the specifier type is set to :libswoc:`bwf::Spec::CAPTURE_TYPE` which will cause the formatting logic to call the extractor method :code:`capture` with the corresponding argument. The specifier name is always empty, as strict in order processing is mandatory. Some example uses, along with verification of the results. .. literalinclude:: ../../unit_tests/ex_bw_format.cc :start-after: C_Format tests :end-before: C_Format tests Summary ------- These examples show that changing the format style and/or syntax can be done with relatively little code. Even the C style formatting takes less than 100 lines of code to be mostly complete, even though it can't take advantage of the parsing in :libswoc:`bwf::Spec` and must handle captures. This makes adopting |BWF| a low hurdle for existing projects whose already defined syntax differs from the default. Design Notes ************ This is essentially my own work but I want to call out Uthira Mohan, who was there at the start of what became |BWF|, a joint quickie project to play with variadic templates and formatting. This code is based directly on that project, rather excessively extended, as is my wont. Alan Wang contributed the floating point support, along with useful comments on the code and API while he was an intern. Thanks, Uthira and Alan! Type safe formatting has two major benefits - * No mismatch between the format specifier and the argument.
Although some modern compilers do better at catching this at compile time, there is still risk (especially with non-constant format strings) and divergence between operating systems such that there is no `universally correct choice `__. In addition the number of arguments can be verified to be correct which is often useful. * Formatting can be customized per type or even per partial type (e.g. :code:`T*` for generic :code:`T`). This enables embedding common formatting work in the format system once, rather than duplicating it in many places (e.g. converting enum values to names). This makes it easier for developers to make useful error messages. See :ref:`this example ` for more detail. As a result of these benefits there has been other work on similar projects, to replace :code:`printf` with a better mechanism. Unfortunately most of these are rather project specific and don't suit the use case in Traffic Server. The two best options, `Boost.Format `__ and `fmt `__, while good, are also not quite close enough to outweigh the benefits of a version specifically tuned for Traffic Server. ``Boost.Format`` is not acceptable because of the Boost footprint. ``fmt`` has the problem of depending on C++ stream operators and therefore not having the required level of performance or memory characteristics. Its main benefit, of reusing stream operators, doesn't apply to Traffic Server because of the nigh non-existence of such operators. The possibility of using C++ stream operators was investigated but changing those to use pre-existing buffers not allocated internally was very difficult, judged worse than building a relatively simple implementation from scratch. The actual core implementation of formatted output for :class:`BufferWriter` is not very large - most of the overall work will be writing formatters, work which would need to be done in any case but in contrast to current practice, only done once.
This code has undergone multiple large scale revisions, some driven by use (the most recent only triggered by trying to write the examples in this document and finding some rough edges) and others by a need for additional functionality (the format extractor support). I think it's close to its final form and I am quite pleased with it. The most recent revisions to the alternate formatting support have made it rather simple to retrofit this work into existing / legacy applications. I do expect to have some ongoing work on the documentation, which I consider currently basically a first pass.