Chapter 33. URIs



A x::uriimpl is a class that represents a URI as defined in RFC 3986.

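#include <x/uriimpl.H>
#include <iostream>
#include <iterator>

x::uriimpl u("http://uid:pw@host/path?query#fragment");

// Assumes that authority_t, like x::uriimpl, has a toString()
// that writes to an output iterator.
std::string authority;

if (u.getAuthority())
	u.getAuthority().toString(std::back_inserter(authority));
else
	authority="(null)";

std::cout << "scheme: " << u.getScheme() << std::endl
	  << "authority: " << authority << std::endl
	  << "path: " << u.getPath() << std::endl
	  << "query: " << u.getQuery() << std::endl
	  << "fragment: " << u.getFragment() << std::endl;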

A x::uriimpl may also be constructed from an input sequence defined by a beginning and an ending iterator.

std::string str="http://host/path";
x::uriimpl u(str.begin(), str.end());

The constructors throw an exception if the passed string or sequence cannot be parsed as a URI. getScheme(), getAuthority(), getPath(), getQuery(), and getFragment() retrieve the corresponding parts of the URI.

getAuthority() returns a reference to a x::uriimpl::authority_t, a class that converts to a bool indicating whether the URI includes an authority part. The other get methods return a std::string, which is empty if the URI does not have the corresponding part.
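The difference between an absent part and an empty one can be seen in a standalone sketch. It splits a URI reference with the regular expression given in RFC 3986, Appendix B. This does not use LibCXX. The names uri_parts and split_uri() are hypothetical, and std::optional stands in for the bool conversion of authority_t.

```cpp
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical result type. An absent authority (std::nullopt) differs
// from an empty one; the other parts are empty strings when missing.
struct uri_parts {
    std::string scheme;
    std::optional<std::string> authority;
    std::string path, query, fragment;
};

// Splits a URI reference using the regular expression from RFC 3986,
// Appendix B. It separates the five parts but does not validate them.
uri_parts split_uri(const std::string &uri)
{
    static const std::regex re{
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)"};
    std::smatch m;

    if (!std::regex_match(uri, m, re))
        throw std::invalid_argument("cannot parse: " + uri);

    uri_parts p;
    p.scheme=m[2].str();          // unmatched groups yield ""
    if (m[3].matched)
        p.authority=m[4].str();   // may be present but empty
    p.path=m[5].str();
    p.query=m[7].str();
    p.fragment=m[9].str();
    return p;
}
```

For example, "file:///etc/hosts" has an authority that is present but empty. "../images" has no authority at all.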

x::uriimpl::authority_t has three fields:

userinfo
The userinfo portion of the authority.

has_userinfo
This bool is true if the authority specifies a userinfo part.

An empty string in userinfo does not necessarily mean that the authority had no userinfo part, because the strict syntax allows an empty userinfo. If has_userinfo is true and userinfo is an empty string, the authority had a @ character with nothing to its left.

The third field holds the host portion of the authority, with an optional :port suffix.
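The userinfo and has_userinfo distinction can be sketched without LibCXX. The struct below mirrors the three fields. userinfo and has_userinfo follow the text above, while hostport and split_authority() are hypothetical names. The split relies on RFC 3986 forbidding an unencoded @ in both the userinfo and the host.

```cpp
#include <string>

// Hypothetical mirror of authority_t. userinfo and has_userinfo match
// the fields described above; "hostport" is a made-up name.
struct authority_fields {
    std::string userinfo;
    bool has_userinfo=false;
    std::string hostport;
};

// RFC 3986 allows no unencoded "@" in either the userinfo or the
// host, so the first "@" (if any) ends the userinfo.
authority_fields split_authority(const std::string &authority)
{
    authority_fields a;
    auto at=authority.find('@');

    if (at == std::string::npos)
    {
        a.hostport=authority;
        return a;
    }

    a.userinfo=authority.substr(0, at);
    a.has_userinfo=true;     // true even when userinfo is empty
    a.hostport=authority.substr(at + 1);
    return a;
}
```

With "@host", has_userinfo is true and userinfo is empty. With plain "host", both indicate that there is no userinfo.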

setScheme(), setAuthority(), setPath(), setQuery(), and setFragment() replace the corresponding part of the URI. Each one takes a std::string with the new value; this includes setAuthority(), which also receives the authority as a string. An exception gets thrown if the string contains characters that are not allowed in that part of the URI.
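The kind of check a setter might perform can be illustrated with the RFC 3986 grammar for queries and fragments. This is an assumption about what "not allowed" means, not LibCXX's validation code, and check_query_or_fragment() is a hypothetical name.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// RFC 3986 grammar for the query and fragment parts:
//   query = fragment = *( pchar / "/" / "?" )
//   pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
// Throws std::invalid_argument at the first character it rejects.
void check_query_or_fragment(const std::string &s)
{
    // unreserved punctuation, sub-delims, and ":" "@" "/" "?"
    static const std::string allowed{"-._~!$&'()*+,;=:@/?"};

    for (std::string::size_type i=0; i < s.size(); ++i)
    {
        unsigned char c=s[i];

        // In the default "C" locale, isalnum() accepts only ASCII letters and digits
        if (std::isalnum(c) || allowed.find(static_cast<char>(c)) != std::string::npos)
            continue;

        // pct-encoded = "%" HEXDIG HEXDIG
        if (c == '%' && i + 2 < s.size()
            && std::isxdigit(static_cast<unsigned char>(s[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(s[i + 2])))
        {
            i += 2;
            continue;
        }

        throw std::invalid_argument("character not allowed at position "
                                    + std::to_string(i));
    }
}
```

A space or a lone "%" is rejected, while "%2F" and "?" are accepted.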

A x::uriimpl may hold an absolute or a relative URI. The += and + operators combine two URIs, resolving the URI on the right-hand side against the one on the left-hand side.

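#include <x/uriimpl.H>
#include <iterator>
#include <string>

x::uriimpl absuri("");

absuri += x::uriimpl("../images");

std::string str;

absuri.toString(std::back_inserter(str));

toString() formats the URI as a string and writes it to the given output iterator.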
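Combining a base URI with a relative one follows the reference resolution algorithm in RFC 3986, section 5.2. The sketch below implements that algorithm on its own, assuming a strict parser. The names uri_ref, parse_ref(), remove_dot_segments(), merge_paths(), resolve(), and recompose() are hypothetical, and nothing here is claimed to match the internals of x::uriimpl's + operator.

```cpp
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical representation: an absent component differs from an empty one.
struct uri_ref {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// Component split from RFC 3986, Appendix B
uri_ref parse_ref(const std::string &s)
{
    static const std::regex re{
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)"};
    std::smatch m;
    if (!std::regex_match(s, m, re))
        throw std::invalid_argument("cannot parse: " + s);
    uri_ref r;
    if (m[1].matched) r.scheme=m[2].str();
    if (m[3].matched) r.authority=m[4].str();
    r.path=m[5].str();
    if (m[6].matched) r.query=m[7].str();
    if (m[8].matched) r.fragment=m[9].str();
    return r;
}

// RFC 3986, section 5.2.4
std::string remove_dot_segments(std::string in)
{
    std::string out;
    auto starts=[&](const char *p) { return in.rfind(p, 0) == 0; };
    auto pop=[&] { auto s=out.rfind('/'); out.erase(s == out.npos ? 0 : s); };

    while (!in.empty())
    {
        if (starts("../")) in.erase(0, 3);
        else if (starts("./")) in.erase(0, 2);
        else if (starts("/./")) in.erase(0, 2);
        else if (in == "/.") in="/";
        else if (starts("/../")) { in.erase(0, 3); pop(); }
        else if (in == "/..") { in="/"; pop(); }
        else if (in == "." || in == "..") in.clear();
        else
        {   // move the first segment, with any leading "/", to the output
            auto next=in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}

// RFC 3986, section 5.2.3
std::string merge_paths(const uri_ref &base, const std::string &path)
{
    if (base.authority && base.path.empty())
        return "/" + path;
    auto slash=base.path.rfind('/');
    return slash == std::string::npos ? path : base.path.substr(0, slash + 1) + path;
}

// RFC 3986, section 5.2.2, for a strict parser
uri_ref resolve(const uri_ref &base, const uri_ref &ref)
{
    uri_ref t=ref;              // fragment, and usually query, come from ref
    if (!ref.scheme)
        t.scheme=base.scheme;
    if (ref.scheme || ref.authority)
        t.path=remove_dot_segments(ref.path);
    else
    {
        t.authority=base.authority;
        if (ref.path.empty())
        {
            t.path=base.path;
            if (!ref.query)
                t.query=base.query;
        }
        else
            t.path=remove_dot_segments(ref.path[0] == '/' ? ref.path
                                       : merge_paths(base, ref.path));
    }
    return t;
}

// RFC 3986, section 5.3
std::string recompose(const uri_ref &u)
{
    std::string s;
    if (u.scheme) s += *u.scheme + ":";
    if (u.authority) s += "//" + *u.authority;
    s += u.path;
    if (u.query) s += "?" + *u.query;
    if (u.fragment) s += "#" + *u.fragment;
    return s;
}
```

The RFC's example base URI is http://a/b/c/d;p?q. Against it, this sketch resolves ../g to http://a/b/g and #s to http://a/b/c/d;p?q#s.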

x::uriimpl defines all comparison operators, so this class may be used as a key in an associative container. As specified by the RFC, the URI scheme and the host component of the authority are case insensitive. The comparison has no knowledge of scheme-specific semantics, so all other parts of a URI are compared case-sensitively.
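The ordering rule can be illustrated with a small hypothetical key type. Its operator< lowercases only the scheme and host before comparing. This is a sketch of the rule described above, not the x::uriimpl comparison operators; uri_key, to_lower(), and uri_map are made-up names.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, already-split URI; only the comparison rule matters here.
struct uri_key {
    std::string scheme, userinfo, host, port, path, query, fragment;
};

std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Scheme and host ignore case (RFC 3986, sections 3.1 and 3.2.2);
// every other part is compared exactly.
bool operator<(const uri_key &a, const uri_key &b)
{
    auto key=[](const uri_key &u) {
        return std::make_tuple(to_lower(u.scheme), u.userinfo, to_lower(u.host),
                               u.port, u.path, u.query, u.fragment);
    };
    return key(a) < key(b);
}

// A strict weak ordering is all std::map needs from its key type.
template<typename T>
using uri_map=std::map<uri_key, T>;
```

Under this rule, "HTTP://Example.COM/Path" and "http://example.com/Path" map to the same entry. "http://example.com/path" maps to a different one.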

#include <x/uriimpl.H>
#include <x/http/form.H>
#include <iostream>

x::uriimpl u("http://host/path?parameter=value");

auto form=u.getForm();

for (const auto &param: *form)
    std::cout << param.first << "=" << param.second << std::endl;

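getForm() takes the query string that getQuery() returns, decodes its parameters, and returns a x::http::form::parameters object. As the example shows, dereferencing that object gives a container of name/value pairs.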
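Decoding a query string involves a few steps, shown here in a standalone sketch. It splits on & and =, then decodes + and %XX escapes the way HTML form submissions encode them. It returns a std::multimap because a parameter name may repeat. Whether x::http::form decodes in exactly the same way is an assumption not confirmed here, and form_decode() and parse_query() are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <string>

// Decodes application/x-www-form-urlencoded text: "+" becomes a space
// and "%XX" becomes the byte 0xXX. A malformed "%" escape is kept as-is.
std::string form_decode(const std::string &s)
{
    std::string out;

    for (std::string::size_type i=0; i < s.size(); ++i)
    {
        if (s[i] == '+')
            out += ' ';
        else if (s[i] == '%' && i + 2 < s.size()
                 && std::isxdigit(static_cast<unsigned char>(s[i + 1]))
                 && std::isxdigit(static_cast<unsigned char>(s[i + 2])))
        {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;
        }
        else
            out += s[i];
    }
    return out;
}

// Splits "a=1&b=2&a=3" into decoded name/value pairs. A multimap keeps
// repeated names; a name without "=" gets an empty value.
std::multimap<std::string, std::string> parse_query(const std::string &query)
{
    std::multimap<std::string, std::string> params;
    std::string::size_type start=0;

    while (start < query.size())
    {
        auto end=query.find('&', start);

        if (end == std::string::npos)
            end=query.size();

        std::string item=query.substr(start, end - start);

        if (!item.empty())
        {
            auto eq=item.find('=');

            if (eq == std::string::npos)
                params.emplace(form_decode(item), "");
            else
                params.emplace(form_decode(item.substr(0, eq)),
                               form_decode(item.substr(eq + 1)));
        }
        start=end + 1;
    }
    return params;
}
```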

Using international domain names

#include <x/uriimpl.H>
#include <x/locale.H>
#include <iostream>

x::uriimpl u("http://привет", x::locale::base::utf8());

std::cout << u.getHostPort().first << std::endl;

To construct a URI that uses an international domain name, pass a second parameter to x::uriimpl's constructor. That parameter specifies the locale whose character set encodes the domain name. An optional third parameter specifies LibIDN conversion flags: IDNA_ALLOW_UNASSIGNED and IDNA_USE_STD3_ASCII_RULES.

The international domain name is stored in its ASCII-compatible encoding, so the above example prints the host name's ASCII-compatible ("xn--") form rather than the original Cyrillic characters.

std::string uri_utf8=u.toStringi18n(x::locale::base::utf8());

toStringi18n() returns the URI as a string. An international domain name in the URI gets converted from its ASCII-compatible encoded representation to the character set specified by the locale parameter.

#include <x/idn.H>
#include <x/locale.H>

std::string i18n;
std::string str=x::idn::to_ascii(i18n, x::locale::base::environment());

i18n=x::idn::from_ascii(str, x::locale::base::environment());

idn.H defines low-level functions for converting strings to and from the ASCII-compatible encoding used by international domain names. The overloaded to_ascii() functions convert an international domain name encoded in the locale's character set to its ASCII-compatible encoding, and from_ascii() does the reverse.
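An ASCII-compatible encoded label is "xn--" followed by the label's Punycode form (RFC 3492). The sketch below encodes Unicode code points with the RFC 3492 algorithm. It is not LibIDN and not x::idn. It skips the nameprep step of IDNA ToASCII (RFC 3490) and the overflow checks of RFC 3492. The names encode(), host_to_ascii(), adapt(), and digit() are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Parameters from RFC 3492, section 5
constexpr std::uint32_t base=36, tmin=1, tmax=26, skew=38, damp=700;
constexpr std::uint32_t initial_bias=72, initial_n=128;

// Bias adaptation, RFC 3492 section 6.1
std::uint32_t adapt(std::uint32_t delta, std::uint32_t numpoints, bool first)
{
    delta = first ? delta / damp : delta / 2;
    delta += delta / numpoints;
    std::uint32_t k=0;
    for (; delta > ((base - tmin) * tmax) / 2; k += base)
        delta /= base - tmin;
    return k + (base - tmin + 1) * delta / (delta + skew);
}

// Digit values 0..25 map to 'a'..'z', 26..35 to '0'..'9'
char digit(std::uint32_t d)
{
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Encoder from RFC 3492, section 6.3, without its overflow checks
std::string encode(const std::u32string &input)
{
    std::string out;
    for (char32_t c : input)      // copy the basic (ASCII) code points
        if (c < 0x80)
            out += static_cast<char>(c);
    std::uint32_t h=out.size(), b=h;
    std::uint32_t n=initial_n, delta=0, bias=initial_bias;
    if (b > 0)
        out += '-';
    while (h < input.size())
    {
        std::uint32_t m=UINT32_MAX;   // smallest code point >= n
        for (char32_t c : input)
            if (c >= n && c < m)
                m=c;
        delta += (m - n) * (h + 1);
        n=m;
        for (char32_t c : input)
        {
            if (c < n)
                ++delta;
            if (c != n)
                continue;
            std::uint32_t q=delta;    // emit delta as variable-length digits
            for (std::uint32_t k=base; ; k += base)
            {
                std::uint32_t t = k <= bias ? tmin
                    : k >= bias + tmax ? tmax : k - bias;
                if (q < t)
                    break;
                out += digit(t + (q - t) % (base - t));
                q = (q - t) / (base - t);
            }
            out += digit(q);
            bias=adapt(delta, h + 1, h == b);
            delta=0;
            ++h;
        }
        ++delta;
        ++n;
    }
    return out;
}

// Hypothetical helper: each non-ASCII label becomes "xn--" plus its
// Punycode form. IDNA ToASCII would also apply nameprep first.
std::string host_to_ascii(const std::u32string &host)
{
    std::string result;
    for (std::u32string::size_type start=0; ; )
    {
        auto dot=host.find(U'.', start);
        auto label=host.substr(start, dot == host.npos ? dot : dot - start);
        bool ascii=true;
        for (char32_t c : label)
            ascii = ascii && c < 0x80;

        if (!ascii)
            result += "xn--" + encode(label);
        else
            for (char32_t c : label)
                result += static_cast<char>(c);

        if (dot == host.npos)
            return result;
        result += '.';
        start=dot + 1;
    }
}
```

For example, the label bücher encodes to bcher-kva, so the host bücher.example becomes xn--bcher-kva.example.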