After combining x::mime::newline_iter, x::mime::bodystart_iter, and x::mime::header_iter, use the following iterators to form a complete parser for a non-compound MIME section (with some assistance from an x::mime::header_collector):
#include <x/mime/sectiondecoder.H>

std::string content_transfer_encoding;

x::mime::section_decoder decoder=
    x::mime::section_decoder::create(content_transfer_encoding,
                                     std::ostreambuf_iterator<char>(std::cout));
The first parameter to x::mime::section_decoder's create() is the value of the MIME “Content-Transfer-Encoding” header, such as “quoted-printable” or “base64” (any other value results in a non-transformative decoder that passes its input through unchanged). The second parameter is an output iterator over chars.
x::mime::section_decoder is an output iterator over ints. Its input comes from, at a minimum, an x::mime::newline_iter, which produces an output sequence of ints demarcated by newlines, with a trailing eof(). The output iterator instance received by create() gets iterated over the chars that were decoded using the specified transfer encoding.
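To make the “quoted-printable” case concrete, here is a minimal standalone sketch of what decoding that transfer encoding involves. It is not the library's implementation and does not use x::mime::section_decoder: decode_quoted_printable() and hex_value() are hypothetical helpers, and the sketch assumes the whole encoded body is available as one std::string rather than as an output sequence of ints.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of x::mime:
// the value of one hexadecimal digit, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper, not part of x::mime:
// decodes quoted-printable text held in a single string.
std::string decode_quoted_printable(const std::string &in)
{
    std::string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != '=')
        {
            out.push_back(in[i]); // Literal character, copied as is.
            continue;
        }

        // "=" followed by a line break is a soft line break: drop both.
        if (i + 1 < in.size() && in[i + 1] == '\n')
        {
            ++i;
            continue;
        }
        if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n')
        {
            i += 2;
            continue;
        }

        // "=XX" encodes a single byte as two hexadecimal digits.
        if (i + 2 < in.size())
        {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);

            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;
                continue;
            }
        }

        out.push_back('='); // Malformed escape: pass it through unchanged.
    }
    return out;
}
```

Under these assumptions, decode_quoted_printable("Hello=A0world!") yields “Hello”, a single 0xA0 byte, and “world!”. The bytes are still in the MIME entity's character set at this point.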
std::string content_transfer_encoding;
std::string charset;

x::mime::section_decoder decoder=
    x::mime::section_decoder::create(content_transfer_encoding,
                                     std::ostreambuf_iterator<char>(std::cout),
                                     charset, "UTF-8");
create() takes two optional parameters. For text MIME entities, the first optional parameter is the MIME entity's character set, from the “Content-Type” header. The second optional parameter is the application's character set. In addition to decoding the output sequence, x::mime::section_decoder transcodes the chars from the MIME entity's character set to the application's character set.
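As an illustration of the transcoding step alone, the following sketch converts iso-8859-1 bytes to UTF-8 by hand. It covers only this one pair of character sets and is not how x::mime::section_decoder does it, since the decoder takes both character sets as parameters. latin1_to_utf8() is a hypothetical helper that does not exist in the library.

```cpp
#include <string>

// Hypothetical helper, not part of x::mime:
// transcodes a string of iso-8859-1 bytes to UTF-8.
std::string latin1_to_utf8(const std::string &in)
{
    std::string out;

    for (char ch : in)
    {
        unsigned char c = static_cast<unsigned char>(ch);

        if (c < 0x80)
        {
            // US-ASCII is encoded identically in both character sets.
            out.push_back(ch);
        }
        else
        {
            // Each iso-8859-1 byte is the Unicode code point with the
            // same value, and 0x80-0xFF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, the iso-8859-1 non-breaking space, the byte 0xA0, comes out as the two bytes 0xC2 0xA0.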
#include <x/mime/entityparser.H>

x::outputrefiterator<int> processor=x::mime::make_entity_parser(
    x::mime::header_collector::create(
        [] (const std::string &name,
            const std::string &name_lc,
            const std::string &value)
        {
            // ...
        }),
    []
    {
        return body_iterator();
    },
    x::mime::sectioninfo::create());
x::mime::make_entity_parser() combines x::mime::bodystart_iter, x::mime::header_iter, and a few other odds and ends. It instantiates an output iterator that expects to be iterated over a single MIME entity. x::mime::make_entity_parser() returns an instance of a template class: an output iterator that's convertible to an x::outputrefiterator<int>, and which iterates over an output sequence of int values produced by x::mime::newline_iter.
x::mime::make_entity_parser() takes three parameters, and returns an output iterator over ints. The first parameter becomes an output iterator that gets iterated over the header portion of the MIME entity. The iterator constructed by x::mime::header_collector is a popular choice for the header iterator, since x::mime::make_entity_parser() already constructs an intermediate x::mime::header_iter anyway.
When x::mime::make_entity_parser()'s iterator iterates over an x::mime::body_start, the header iterator iterates over an x::mime::eof value and the second parameter to x::mime::make_entity_parser() gets invoked. The second parameter is a functor or a lambda that returns another output iterator over ints, which ends up iterating over the rest of the output sequence, after the x::mime::body_start.
x::mime::make_entity_parser() encapsulates the typical control flow of collecting the headers of a MIME entity, then figuring out how to parse the entity's body. The standard approach is to have the header iterator collect the MIME entity's headers, then have the functor/lambda figure out what to do with this entity, and return an output iterator that implements what's to be done.
The third and final parameter is an x::mime::sectioninfo. The output iterator returned by x::mime::make_entity_parser() updates this object as it iterates over its output sequence.
x::mime::make_entity_parser() returns an output iterator, and that's pretty much it. The show starts only after it actually iterates over something that resembles a MIME entity. This has a couple of implications.
What the two functors or lambdas capture, and whether by reference or by value, needs careful thought. The functors/lambdas do not get invoked by x::mime::make_entity_parser(). They get invoked, as appropriate, when the resulting output iterator actually iterates over something. This usually means capturing by value, preferably a reference to a reference-counted object.
The values in the x::mime::sectioninfo also get updated only when the show gets on the road. x::mime::make_entity_parser() takes the x::mime::sectioninfo object, and saves it as part of the returned output iterator, which updates the MIME entity metadata in the x::mime::sectioninfo as the iterator iterates over the output sequence. The iterator's reference on the x::mime::sectioninfo object gets released only after the output iterator goes out of scope and gets destroyed. The values in the x::mime::sectioninfo may be used only after the output iterator iterates over the eof value.
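Both implications can be sketched with nothing but the standard library. The example below is only an analogy: std::shared_ptr stands in for the library's reference-counted objects, std::function stands in for the output iterator, and stats and make_counter() are hypothetical names that do not exist in x::mime.

```cpp
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical stand-in for the metadata object that gets updated
// while the sequence is being iterated over.
struct stats {
    std::size_t char_cnt = 0;
};

// Hypothetical stand-in for a factory that returns an output iterator.
// Nothing gets counted here. The returned callable captures the shared
// pointer by value, so it holds its own reference on the stats object.
std::function<void (int)> make_counter(const std::shared_ptr<stats> &s)
{
    return [s] (int)
    {
        ++s->char_cnt;
    };
}
```

In this sketch, char_cnt remains 0 after make_counter() returns, and changes only when the returned callable gets invoked. The callable's reference on the stats object goes away only when the callable itself gets destroyed. Capturing a reference to a local std::shared_ptr instead would leave a dangling reference as soon as that local went out of scope.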
The following example shows how to decode a non-compound MIME entity.
#include <x/mime/newlineiter.H>
#include <x/mime/headeriter.H>
#include <x/mime/bodystartiter.H>
#include <x/mime/headercollector.H>
#include <x/mime/sectiondecoder.H>
#include <x/mime/entityparser.H>
#include <x/mime/structured_content_header.H>
#include <x/chrcasecmp.H>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

int main()
{
    std::string content_transfer_encoding;
    std::string content_type="text";
    std::string charset;

    auto info=x::mime::sectioninfo::create();

    auto processor=
        x::mime::make_entity_parser
        (x::mime::header_collector::create
         ([&]
          (const std::string &name,
           const std::string &name_lc,
           const std::string &value)
          {
              // Only two headers matter to this example.
              x::chrcasecmp::str_equal_to cmp;

              if (cmp(name, x::mime::structured_content_header
                      ::content_transfer_encoding))
              {
                  content_transfer_encoding=
                      x::mime::structured_content_header(value).value;
              }

              if (cmp(name, x::mime::structured_content_header
                      ::content_type))
              {
                  x::mime::structured_content_header hdr(value);

                  content_type=hdr.mime_content_type();
                  charset=hdr.charset("iso-8859-1");
              }
          }),
         [&]
         {
             // Invoked when the body starts: construct the body iterator.
             typedef std::ostreambuf_iterator<char> dump_iter_t;

             dump_iter_t dump_to_stdout(std::cout);

             return content_type == "text"
                 ? x::mime::section_decoder
                 ::create(content_transfer_encoding,
                          dump_to_stdout, charset, "UTF-8")
                 : x::mime::section_decoder
                 ::create(content_transfer_encoding,
                          dump_to_stdout);
         },
         info);

    typedef x::mime::newline_iter<decltype(processor)> newline_iter_t;

    // Iterate over standard input, then over the trailing eof value.
    std::copy(std::istreambuf_iterator<char>(std::cin),
              std::istreambuf_iterator<char>(),
              newline_iter_t::create(processor))
        .get()->eof();

    std::cout << info->header_char_cnt << " bytes in the header, "
              << info->body_char_cnt << " bytes in the body." << std::endl
              << info->header_line_cnt << " lines in the header, "
              << info->body_line_cnt << " lines in the body." << std::endl;

    if (info->no_trailing_newline)
        std::cout << "No trailing newline" << std::endl;
    return 0;
}
bodydecoder.C constructs a processor using a header collector that only cares about the “Content-Transfer-Encoding” and “Content-Type” headers, using a case-insensitive string comparison. Proper parsing of these structured MIME headers requires an x::mime::structured_content_header, even for the “Content-Transfer-Encoding”. This makes sure that any whitespace in the headers gets properly ignored.
Once the MIME entity's body begins, the body iterator construction lambda instantiates an x::mime::section_decoder that outputs to a std::ostreambuf_iterator to std::cout. If “Content-Type” indicates that this is a text MIME entity, it also gets transcoded to UTF-8.
$ cat bodydecoder.txt
Subject: test
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hello=A0world!
$ ./bodydecoder <bodydecoder.txt
Hello world!
104 bytes in the header, 15 bytes in the body.
4 lines in the header, 1 lines in the body.
bodydecoder.C reads a non-compound MIME entity on standard input, and writes its decoded body to standard output.
In this example, it's ok for the lambdas to capture the stack-scoped objects, content_transfer_encoding, content_type, and charset, by reference. Everything gets iterated, and everything goes out of scope and gets destroyed, before main() returns. In most situations, it will be somewhat difficult to capture much by reference, and the lambdas will capture everything by value. Reference-counted objects come in very handy under these circumstances.
The blank line that separates the header from the body is considered to be a part of the header portion of the MIME entity.
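The counts in the sample output can be checked by hand with a small standalone sketch of this rule. It is not how x::mime::sectioninfo gets computed. It assumes that the entity is held in one std::string and uses bare LF line endings. section_counts and count_section() are hypothetical names, with members named after the fields that bodydecoder.C prints.

```cpp
#include <cstddef>
#include <string>

// Hypothetical structure, with members named after the
// x::mime::sectioninfo fields that bodydecoder.C prints.
struct section_counts {
    std::size_t header_char_cnt = 0;
    std::size_t body_char_cnt = 0;
    std::size_t header_line_cnt = 0;
    std::size_t body_line_cnt = 0;
};

// Hypothetical helper: counts bytes and lines in the header and in the
// body, treating the separating blank line as the header's last line.
section_counts count_section(const std::string &entity)
{
    section_counts c;
    bool in_body = false;
    bool at_line_start = true;

    for (char ch : entity)
    {
        if (in_body)
        {
            ++c.body_char_cnt;
            if (ch == '\n')
                ++c.body_line_cnt;
            continue;
        }

        ++c.header_char_cnt;

        if (ch == '\n')
        {
            ++c.header_line_cnt;

            // A newline at the start of a line is the blank line. It
            // was just counted as header, and the body follows it.
            if (at_line_start)
                in_body = true;
            at_line_start = true;
        }
        else
        {
            at_line_start = false;
        }
    }
    return c;
}
```

Applied to the contents of bodydecoder.txt, with one blank line between “Content-Transfer-Encoding” and the body, this sketch arrives at the same 104 and 15 bytes, and 4 and 1 lines, that bodydecoder reports.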