Chapter 40. A tokenizer helper template

The x::tokenizer template takes one parameter, a class that defines a static constexpr is_token() method, like this:

#include <x/tokens.H>

class is_alphabetic {

public:

    static constexpr bool is_token(char c)
    {
        return (c >= 'A' && c <= 'Z') ||
               (c >= 'a' && c <= 'z');
    }
};

typedef x::tokenizer<is_alphabetic> alphabetic_tokens_t;

x::tokenizer implements common algorithms for constructing and handling grammars and protocols whose elements consist of characters from a defined token class. In the following descriptions, token_class_t is the class passed as the template parameter, and the token class is the set of characters for which its constexpr is_token() function returns true.

bool flag=x::tokenizer<token_class_t>::is_token(b, e);

b and e are iterators that define an input sequence. This overload of is_token() returns true if every character in the input sequence belongs to the token class.
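The check described above can be sketched with std::all_of. This is a minimal illustration and not the library's implementation: the helper name all_in_token_class is hypothetical, and treating an empty sequence as a match is an assumption of this sketch, not documented behavior of x::tokenizer.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Same character class as the is_alphabetic example earlier in this chapter.
class is_alphabetic {

public:

    static constexpr bool is_token(char c)
    {
        return (c >= 'A' && c <= 'Z') ||
               (c >= 'a' && c <= 'z');
    }
};

// Hypothetical helper, not part of x/tokens.H: returns true if every
// character in [b, e) belongs to token_class_t.
// Assumption: an empty sequence yields true, because std::all_of does.
template<typename token_class_t, typename iter_t>
bool all_in_token_class(iter_t b, iter_t e)
{
    return std::all_of(b, e, [](char c) {
        return token_class_t::is_token(c);
    });
}

// Usage sketch: "Hello" is entirely alphabetic, "Hello, world" is not.
inline void all_in_token_class_example()
{
    const std::string word = "Hello";
    const std::string phrase = "Hello, world";

    assert(all_in_token_class<is_alphabetic>(word.begin(), word.end()));
    assert(!all_in_token_class<is_alphabetic>(phrase.begin(), phrase.end()));
}
```

Because the per-character test is a static constexpr function, the character class costs nothing at run time beyond the comparisons themselves.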

bool flag=x::tokenizer<token_class_t>::is_token(c);

Here c is a container (anything that std::begin() and std::end() accept). This overload is equivalent to x::tokenizer<token_class_t>::is_token(std::begin(c), std::end(c)).

output_iter_t o=x::tokenizer<token_class_t>::emit_token_or_quoted_word(p, b, e);

b and e must be, at a minimum, forward iterators. The first parameter, p, is an output iterator. If is_token(b, e) returns true, the sequence defined by the forward iterators is copied to the output iterator unchanged. Otherwise the output consists of a double quote character ("), the sequence defined by the forward iterators, and a closing double quote, with a backslash (\) inserted before each double quote or backslash that appears in the sequence.

In all cases, emit_token_or_quoted_word() returns the final value of the output iterator. An exception gets thrown if the forward iterator sequence contains a \r, \n, or \0 character, none of which can belong to the token class.
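The quoting rules above can be sketched as a two-pass algorithm, which also shows why forward iterators are the minimum requirement: the sequence is read once to classify it and a second time to copy it. This is a hedged sketch written from the description, not the library's code. The names emit_sketch and quote_example are hypothetical, std::invalid_argument stands in for the exception type, which this chapter does not name, and copying nothing for an empty sequence is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Same character class as the is_alphabetic example earlier in this chapter.
class is_alphabetic {

public:

    static constexpr bool is_token(char c)
    {
        return (c >= 'A' && c <= 'Z') ||
               (c >= 'a' && c <= 'z');
    }
};

// Hypothetical stand-in for emit_token_or_quoted_word().
template<typename token_class_t, typename output_iter_t, typename iter_t>
output_iter_t emit_sketch(output_iter_t p, iter_t b, iter_t e)
{
    bool all_token = true;

    // First pass: reject forbidden characters and classify the sequence.
    for (iter_t i = b; i != e; ++i)
    {
        const char c = *i;

        // Assumption: the real exception type is not documented here.
        if (c == '\r' || c == '\n' || c == '\0')
            throw std::invalid_argument("CR, LF, or NUL in word");

        if (!token_class_t::is_token(c))
            all_token = false;
    }

    // A pure token is copied unchanged.
    if (all_token)
        return std::copy(b, e, p);

    // Second pass: wrap in double quotes, escaping " and \ with a backslash.
    *p++ = '"';
    for (; b != e; ++b)
    {
        if (*b == '"' || *b == '\\')
            *p++ = '\\';
        *p++ = *b;
    }
    *p++ = '"';
    return p;
}

// Hypothetical convenience wrapper that collects the output in a string.
inline std::string quote_example(const std::string &word)
{
    std::string out;

    emit_sketch<is_alphabetic>(std::back_inserter(out),
                               word.begin(), word.end());
    return out;
}

// Usage sketch: a token passes through, anything else gets quoted.
inline void quote_example_checks()
{
    assert(quote_example("token") == "token");
    assert(quote_example("two words") == "\"two words\"");
}
```

Returning the advanced output iterator lets a caller chain several calls to build up a larger header or protocol line.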

output_iter_t o=x::tokenizer<token_class_t>::emit_token_or_quoted_word(p, c);

Here c is a container. This overload is equivalent to x::tokenizer<token_class_t>::emit_token_or_quoted_word(p, std::begin(c), std::end(c)).

x/tokens.H defines the following classes that are suitable for use as the template parameter to x::tokenizer:

is_http_token

The characters that belong to the token element defined in section 2.2 of RFC 2616.
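For reference, section 2.2 of RFC 2616 defines a token as one or more US-ASCII characters other than control characters and separators. The class below is a hypothetical sketch of a token class with that character set, written in the same shape as is_alphabetic. The name http_token_sketch is an assumption for illustration, and the library's is_http_token may be implemented differently.

```cpp
// Hypothetical illustration of the RFC 2616 section 2.2 token characters.
// Not the library's is_http_token.
class http_token_sketch {

public:

    // separators = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\"
    //            | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT
    static constexpr bool is_separator(char c)
    {
        return c == '(' || c == ')' || c == '<' || c == '>' || c == '@' ||
               c == ',' || c == ';' || c == ':' || c == '\\' || c == '"' ||
               c == '/' || c == '[' || c == ']' || c == '?' || c == '=' ||
               c == '{' || c == '}' || c == ' ' || c == '\t';
    }

    // token = 1*<any CHAR except CTLs or separators>
    // CHAR is US-ASCII 0-127; CTLs are 0-31 and 127 (DEL).
    static constexpr bool is_token(char c)
    {
        return c > 32 && c < 127 && !is_separator(c);
    }
};

// Compile-time checks: header-name characters qualify, delimiters do not.
static_assert(http_token_sketch::is_token('a'), "letters are token chars");
static_assert(http_token_sketch::is_token('-'), "hyphen is a token char");
static_assert(!http_token_sketch::is_token(':'), "colon is a separator");
```

The range test also rejects bytes outside US-ASCII, whether char is signed or unsigned on the target platform.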