bodydecoder.C and bodydecoder2.C lack the logic for handling compound MIME entities. x::mime::make_multipart_parser() and x::mime::make_message_rfc822_parser() are section processor factories that wrap other section processor factories and return output iterators for parsing compound MIME sections. They get invoked either from the section processor factory that's passed to x::mime::make_document_entity_parser, or from another section processor factory that was previously wrapped by one of these functions. This results in an open-ended framework for recursively parsing compound MIME documents:
#include <x/mime/newlineiter.H>
#include <x/mime/headeriter.H>
#include <x/mime/bodystartiter.H>
#include <x/mime/headercollector.H>
#include <x/mime/sectiondecoder.H>
#include <x/mime/entityparser.H>
#include <x/mime/structured_content_header.H>
#include <x/mime/contentheadercollector.H>
#include <x/chrcasecmp.H>
#include <iostream>

x::outputrefiterator<int> parse_section(const x::headersbase &,
                                        const x::mime::sectioninfo &);

x::outputrefiterator<int> create_parser(const x::mime::sectioninfo &info,
                                        bool is_message_rfc822)
{
    auto header_iter=
        x::mime::contentheader_collector::create(is_message_rfc822);

    auto headers=header_iter.get();

    return x::mime::make_entity_parser
        (header_iter,
         [headers, info]
         {
             return parse_section(headers->content_headers, info);
         }, info);
}

x::outputrefiterator<int> parse_section(const x::headersbase &headers,
                                        const x::mime::sectioninfo &info)
{
    x::mime::structured_content_header
        content_type(headers,
                     x::mime::structured_content_header::content_type);

    if (content_type.is_message())
        return x::mime::make_message_rfc822_parser
            (create_parser, info);

    if (content_type.is_multipart())
        return x::mime::make_multipart_parser
            (content_type.boundary(), create_parser, info);

    typedef std::ostreambuf_iterator<char> dump_iter_t;

    dump_iter_t dump_to_stdout(std::cout);

    std::string content_transfer_encoding=
        x::mime::structured_content_header
        (headers,
         x::mime::structured_content_header::content_transfer_encoding)
        .value;

    return content_type.mime_content_type() == "text"
        ? x::mime::section_decoder::create(content_transfer_encoding,
                                           dump_to_stdout,
                                           content_type
                                           .charset("iso-8859-1"),
                                           "UTF-8")
        : x::mime::section_decoder::create(content_transfer_encoding,
                                           dump_to_stdout);
}

void dump(const x::mime::const_sectioninfo &info)
{
    std::cout << "MIME section " << info->index_name()
              << " starts at character offset "
              << info->starting_pos << std::endl
              << " " << info->header_char_cnt
              << " bytes in the header, "
              << info->body_char_cnt
              << " bytes in the body." << std::endl
              << " " << info->header_line_cnt
              << " lines in the header, "
              << info->body_line_cnt
              << " lines in the body." << std::endl;

    if (info->no_trailing_newline)
        std::cout << " No trailing newline" << std::endl;

    for (const auto &child:info->children)
    {
        std::cout << std::endl;
        dump(child);
    }
}

int main()
{
    x::mime::sectioninfoptr top_level_info;

    std::copy(std::istreambuf_iterator<char>(std::cin),
              std::istreambuf_iterator<char>(),
              x::mime::make_document_entity_parser
              ([&top_level_info]
               (const x::mime::sectioninfo &info,
                bool is_message_rfc822)
               {
                   top_level_info=info;
                   return create_parser(info, is_message_rfc822);
               }))
        .get()->eof();

    if (top_level_info.null())
    {
        std::cerr << "How did we get here?" << std::endl;
        return 0;
    }
    dump(top_level_info);
    return 0;
}
x::mime::contentheader_collector is an output iterator that's similar to x::mime::header_collector, except that it collects all headers whose names start with “Content-” into an x::headersbase. The output iterator's get method returns a reference to a reference-counted object with a content_headers member, which is an x::headersbase container for the “Content-” headers.
Additionally, x::mime::contentheader_collector's constructor takes a bool flag. If true, the “Mime-Version: 1.0” header must be present, otherwise no “Content-” headers get collected (content_headers will be empty). “Mime-Version: 1.0” can appear after the “Content-” headers.
x::mime::contentheader_collector collects all “Content-” headers as it iterates over the header portion of a MIME entity. At the end of the output sequence, the accumulated headers in content_headers get cleared if the bool flag is true but “Mime-Version: 1.0” was absent:
$ cat bodydecoder.txt
Subject: test
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hello=A0world!
$ ./bodydecoder3 <bodydecoder.txt
Hello=A0world!
MIME section 1 starts at character offset 0
104 bytes in the header, 15 bytes in the body.
4 lines in the header, 1 lines in the body.
In the absence of the “Mime-Version: 1.0” header, this is parsed as a non-MIME message, so the quoted-printable transfer encoding is not applied, producing “Hello=A0world!” on output.
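The collection rule described above can be sketched in standalone C++. This is only an illustration of the logic, not the library's implementation: header_list and collect_content_headers are hypothetical names, and a plain vector of name/value pairs stands in for x::headersbase.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a header container: (name, value) pairs.
using header_list = std::vector<std::pair<std::string, std::string>>;

// Lowercase an ASCII string, so header names compare case-insensitively.
static std::string ascii_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c)
                   { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Collect every "Content-" header. If require_mime_version is set and no
// "Mime-Version: 1.0" header was seen anywhere, discard what was collected.
header_list collect_content_headers(const header_list &all_headers,
                                    bool require_mime_version)
{
    header_list collected;
    bool mime_version_seen = false;

    for (const auto &h : all_headers)
    {
        const std::string name = ascii_lower(h.first);

        if (name.compare(0, 8, "content-") == 0)
            collected.push_back(h);
        else if (name == "mime-version" && h.second == "1.0")
            mime_version_seen = true;
    }

    // The decision is deferred to the end of the headers, which is why
    // "Mime-Version: 1.0" may follow the "Content-" headers.
    if (require_mime_version && !mime_version_seen)
        collected.clear();

    return collected;
}
```

With the headers of bodydecoder.txt and the flag set, this sketch returns an empty list. Adding “Mime-Version: 1.0” anywhere in the list makes it return both “Content-” headers.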
$ cat bodydecoder2.txt
Subject: test
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0

Hello=A0world!
$ ./bodydecoder3 <bodydecoder2.txt
Hello world!
MIME section 1 starts at character offset 0
122 bytes in the header, 15 bytes in the body.
5 lines in the header, 1 lines in the body.
Now the MIME headers are in effect.
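To make the difference between the two runs concrete, here is a minimal standalone sketch of the two transformations that take effect once the MIME headers are honored: quoted-printable decoding, followed by conversion from iso-8859-1 to UTF-8. The function names are hypothetical, the decoder handles only LF line endings, and neither function is the library's x::mime::section_decoder.

```cpp
#include <cstddef>
#include <string>

// Value of a hexadecimal digit, or -1 if c is not one.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Simplified quoted-printable decoder: "=XX" becomes one byte, "=" directly
// followed by "\n" is a soft line break, everything else is copied as is.
std::string decode_quoted_printable(const std::string &in)
{
    std::string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != '=')
        {
            out += in[i];
            continue;
        }
        if (i + 1 < in.size() && in[i + 1] == '\n')
        {
            ++i; // Soft line break: drop "=" and the newline.
            continue;
        }
        if (i + 2 < in.size())
        {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);

            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += '='; // Malformed escape: keep it literally.
    }
    return out;
}

// ISO-8859-1 bytes map directly to Unicode code points of the same value,
// so bytes 0x80-0xFF become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string &in)
{
    std::string out;

    for (unsigned char c : in)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Here “=A0” decodes to the byte 0xA0, the iso-8859-1 no-break space, which the second step encodes as the UTF-8 bytes 0xC2 0xA0. A terminal displays it like a space, which is why the output reads “Hello world!”.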
In bodydecoder3.C, create_parser is the section processor factory functor/lambda that's passed to x::mime::make_document_entity_parser, as in bodydecoder.C and bodydecoder2.C (with a small wrapper that captures the top level x::mime::sectioninfo object).
create_parser constructs a new x::mime::contentheader_collector object, and passes it to x::mime::make_entity_parser() as the header iterator. The body iterator factory captures, by value, the reference to the reference-counted object that holds content_headers. This keeps the object alive after create_parser() returns, so that it is still available some time later, when the iteration over the header portion concludes.
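The capture-by-value idiom can be shown in isolation. The sketch below uses std::shared_ptr as a stand-in for the library's reference-counted references; collected_headers, parser_pieces and create_parser_sketch are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the reference-counted object that holds
// content_headers.
struct collected_headers {
    std::vector<std::string> content_headers;
};

// The two pieces handed to the parser: a sink that gets filled in while the
// headers are read, and a factory that runs only after the headers end.
struct parser_pieces {
    std::shared_ptr<collected_headers> header_sink;
    std::function<std::size_t()> body_factory;
};

parser_pieces create_parser_sketch()
{
    auto headers = std::make_shared<collected_headers>();

    // Capturing "headers" by value copies the shared_ptr into the lambda,
    // so the object outlives this function. Capturing the local variable
    // by reference would leave the lambda with a dangling reference.
    return { headers,
             [headers] { return headers->content_headers.size(); } };
}
```

In this sketch the factory reports how many headers were collected by the time it finally runs, even if the caller has already dropped its own pointer to the sink.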
The body iterator factory parameter given to x::mime::make_entity_parser() looks at the collected headers.
x::mime::make_message_rfc822_parser() takes a section processor functor/lambda and the x::mime::sectioninfo of a message/rfc822 MIME entity. It returns an output iterator that invokes the section processor functor/lambda with an x::mime::sectioninfo for the body of the message/rfc822 MIME entity. bodydecoder3.C passes the same create_parser() functor, effecting recursive parsing of these MIME entities.
x::mime::make_multipart_parser() takes three parameters: the boundary delimiter of a multipart compound MIME entity, a section processor factory functor/lambda, and the multipart entity's x::mime::sectioninfo. It returns an output iterator that invokes the functor/lambda for every entity that the multipart entity contains.
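As a rough picture of what "invoking the functor for every contained entity" involves, the following standalone sketch splits a multipart body at its boundary delimiter and calls a callback once per part. It is a simplification, not the library's parser: for_each_part is a hypothetical name, it works on a complete in-memory string rather than as an output iterator, it assumes LF line endings, and it drops a final part that is not terminated by a delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Invoke on_part once for each part of a multipart body and return the
// number of parts. Text before the first delimiter and after the closing
// "--boundary--" delimiter is ignored.
std::size_t for_each_part(const std::string &body,
                          const std::string &boundary,
                          const std::function<void(const std::string &)> &on_part)
{
    assert(!boundary.empty());

    const std::string delim = "--" + boundary;
    const std::string closing = delim + "--";

    std::size_t count = 0;
    std::size_t pos = 0;
    bool in_part = false;
    std::string part;

    while (pos < body.size())
    {
        std::size_t eol = body.find('\n', pos);

        if (eol == std::string::npos)
            eol = body.size();

        const std::string line = body.substr(pos, eol - pos);

        pos = eol < body.size() ? eol + 1 : eol;

        const bool is_delim = line == delim;
        const bool is_closing = line == closing;

        if (is_delim || is_closing)
        {
            if (in_part)
            {
                // The newline before a delimiter belongs to the delimiter,
                // not to the part that precedes it.
                if (!part.empty() && part.back() == '\n')
                    part.pop_back();
                on_part(part);
                ++count;
            }
            part.clear();
            in_part = is_delim;
            if (is_closing)
                break;
            continue;
        }

        if (in_part)
        {
            part += line;
            part += '\n';
        }
    }
    return count;
}
```

Because the newline in front of a delimiter is treated as part of the delimiter, a part whose last line is immediately followed by the next delimiter ends without a trailing newline.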
For non-compound MIME entities, the body iterator factory returns an x::mime::section_decoder to decode the non-compound entity, converting text MIME entities to the UTF-8 character set:
$ cat bodydecoder3.txt
Subject: test
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="xxx"

--xxx
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hello=A0
--xxx
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

world!

--xxx--
$ ./bodydecoder3 <bodydecoder3.txt
Hello world!
MIME section 1 starts at character offset 0
79 bytes in the header, 217 bytes in the body.
4 lines in the header, 12 lines in the body.

MIME section 1.1 starts at character offset 85
90 bytes in the header, 8 bytes in the body.
3 lines in the header, 1 lines in the body.
No trailing newline

MIME section 1.2 starts at character offset 190
90 bytes in the header, 7 bytes in the body.
3 lines in the header, 1 lines in the body.
bodydecoder3.C decodes each MIME entity in the document, one at a time, concatenating their contents. The first part of the multipart entity does not end with a trailing newline, so the result of the concatenation is a single line of text.
examples/mime/bodydecoder4.C is an alternative version of bodydecoder3.C that uses x::mime::make_parser() to replace the logic in the first half of parse_section().
The first parameter to x::mime::make_parser() is an x::mime::structured_content_header with the value of the “Content-Type” header. The second parameter is the x::mime::sectioninfo for the MIME section where this “Content-Type” header came from. If it's a compound MIME section, x::mime::make_parser() uses x::mime::make_multipart_parser() or x::mime::make_message_rfc822_parser() to take care of it, with the section processor factory passed as the third parameter. The fourth parameter is a functor or a lambda that gets invoked if the MIME section is not a compound section. It receives one argument, the x::mime::sectioninfo.
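The dispatch that x::mime::make_parser() takes over can be pictured with a small standalone sketch. It is not the library's code: dispatch_section is a hypothetical name, the media type is passed as a plain string with its parameters already stripped, and treating only message/rfc822 (rather than every message subtype) as compound is an assumption made for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>

// Choose between the handler for compound sections and the handler for
// non-compound (leaf) sections, based on the media type alone.
std::string dispatch_section(std::string media_type,
                             const std::function<std::string()> &compound_handler,
                             const std::function<std::string()> &leaf_handler)
{
    // Media types are case-insensitive, so normalize before comparing.
    std::transform(media_type.begin(), media_type.end(), media_type.begin(),
                   [](unsigned char c)
                   { return static_cast<char>(std::tolower(c)); });

    const bool compound =
        media_type.compare(0, 10, "multipart/") == 0 ||
        media_type == "message/rfc822";

    return compound ? compound_handler() : leaf_handler();
}
```

In this sketch the two handlers play the roles of the third and fourth parameters described above: one continues the recursion into a compound section, the other decodes a leaf section.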