zuzu-rust 0.5.0

=encoding utf8

=head1 NAME

std/string/encode - Character-encoding conversions between Strings and BinaryStrings.

=head1 SYNOPSIS

  from std/string/encode import *;

  let bytes := encode( "héllo", ENCODING_UTF16 );
  let text := decode( bytes, "UTF-16" );

=head1 IMPLEMENTATION SUPPORT

This module is supported by all implementations of ZuzuScript.

=head1 DESCRIPTION

This module converts between text (C<String>) and encoded bytes
(C<BinaryString>).

All implementations support UTF-8, UTF-16, UTF-32, and ISO-8859-1
(Latin-1). Implementations are encouraged to support additional
encodings where the host platform makes that practical; programs that
need to run on every implementation should restrict themselves to the
four required encodings.

Encoding names are matched case-insensitively, so C<"utf-8"> and
C<"UTF-8"> are equivalent.
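
As a short illustration that uses only the functions documented here
(the variable names are illustrative), the two C<decode> calls below
name the same encoding with different capitalisation and should produce
the same text:

  from std/string/encode import *;

  let bytes := encode( "héllo", "utf-8" );
  let lower := decode( bytes, "utf-8" );
  let upper := decode( bytes, "UTF-8" );

Both C<lower> and C<upper> are expected to hold C<"héllo">.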

For UTF-16 and UTF-32, C<encode> produces the canonical form: big-endian
with no byte order mark (BOM). The output is therefore deterministic and
identical across implementations. C<decode> honours a leading byte order
mark, consuming it and switching to little-endian if the mark indicates
that byte order; without a mark it assumes big-endian input.
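
For example, under the rules above, encoding C<"héllo"> as UTF-16 should
yield ten bytes with no byte order mark, in the order
C<00 68 00 E9 00 6C 00 6C 00 6F>. Passing those bytes back to C<decode>
should recover the original text, because input without a byte order
mark is read as big-endian. A minimal sketch, with illustrative
variable names:

  from std/string/encode import *;

  let bytes := encode( "héllo", ENCODING_UTF16 );
  let text := decode( bytes, ENCODING_UTF16 );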

Invalid input raises an exception. This covers unknown encoding names,
bytes that do not form valid text in the requested encoding, and
characters that the target encoding cannot represent (for example,
encoding C<"😀"> as ISO-8859-1).
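
As a sketch of the unrepresentable-character case, the first call below
should succeed, since UTF-8 can represent C<"😀"> (as the four bytes
C<F0 9F 98 80>), while the second is expected to throw rather than
return a C<BinaryString>. Catching the exception is left out, because
exception-handling syntax is outside the scope of this module; the
variable names are illustrative:

  from std/string/encode import *;

  let ok := encode( "😀", ENCODING_UTF8 );
  let bad := encode( "😀", ENCODING_LATIN );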

=head1 EXPORTS

=head2 Functions

=over

=item * C<encode(String text, String encoding)>

Parameters: C<text> is the text to encode; C<encoding> names the target
encoding and defaults to C<"UTF-8">. Returns: C<BinaryString>. Encodes
C<text> as bytes. Throws a C<TypeException> if C<text> is not a C<String>,
and an exception if the encoding is unknown or cannot represent a
character in C<text>.

=item * C<decode(BinaryString bytes, String encoding)>

Parameters: C<bytes> is the encoded input; C<encoding> names the source
encoding and defaults to C<"UTF-8">. Returns: C<String>. Decodes
C<bytes> into text. Throws a C<TypeException> if C<bytes> is not a
C<BinaryString>, and an exception if the encoding is unknown or the
bytes are not valid for it.

=back
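
A minimal sketch of the default described above: omitting C<encoding>
should behave the same as passing C<"UTF-8"> explicitly, so the two
encoded values below are expected to contain the same six bytes (the
C<é> takes two bytes in UTF-8). The variable names are illustrative:

  from std/string/encode import *;

  let short_form := encode( "héllo" );
  let long_form := encode( "héllo", ENCODING_UTF8 );
  let text := decode( short_form );

Here C<text> should again be C<"héllo">, since C<decode> also defaults
to UTF-8.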

=head2 Constants

=over

=item C<ENCODING_UTF8>

Type: C<String>. The value C<"UTF-8">.

=item C<ENCODING_UTF16>

Type: C<String>. The value C<"UTF-16">.

=item C<ENCODING_UTF32>

Type: C<String>. The value C<"UTF-32">.

=item C<ENCODING_LATIN>

Type: C<String>. The value C<"ISO-8859-1">.

=back

=head1 COPYRIGHT AND LICENCE

B<< std/string/encode >> is copyright Toby Inkster.

It is free software; you may redistribute it and/or modify it under
the terms of either the Artistic License 1.0 or the GNU General Public
License version 2.