nested-text 0.1.0

A fully spec-compliant NestedText v3.8 parser and serializer
Documentation
Official NestedText Test Suite
==============================

Version: 3.8

As of NestedText v3.8 the official test suite was replaced by the current test 
suite.


Issues with Previous Test Suite
-------------------------------

The previous test suite had the following issues.

1. It contained symbolic links, which caused problems on Windows.

2. The tests contain invisible characters in the form of tabs, end-of-line 
   spaces, unicode white spaces, control characters, and line termination 
   characters.  These invisible characters were often confusing and could easily 
   be lost.

3. The tests were too tailored to the specific behavior of the Python 
   *NestedText* implementation.

4. The tests were categorized and numbered.  Both the categorization and 
   numbering were problematic.  Each test often fit into many categories but 
   could only be placed in one.  The numbering implied an order where no natural 
   order exists.

5. Important tests were missing from the test suite.

6. The template directories assumed a case-sensitive file system.


The New Test Suite
------------------

These test cases are focused on assuring that valid *NestedText* input is 
properly read and invalid NestedText is properly identified by an implementation 
of *NestedText*.  No attempt is made to assure that the implementation produces 
valid *NestedText*.  There is considerably flexibility in the way that 
*NestedText* may be generated.  In light of this flexibility the best way to 
test a *NestedText* writer is to couple it with a *NestedText* reader and 
perform a round trip through both, which can be performed with these test cases.


tests.nt
--------

The test cases are contained in *tests.nt*.  The *convert* command converts 
these test cases into JSON (*tests.json*).  It is expected this JSON file is 
what is used to test *NestedText* implementations.  The *NestedText* file of 
test cases, *tests.nt*, is used to generate *tests.json*, and is only needed if 
you plan to add or modify tests.  Do not modify the JSON file directly, as any 
changes will be overridden whenever *convert* is run.

Each test case in *tests.nt* is a dictionary entry.  The key is used as the name 
of the test.  The keys must be unique and are largely chosen at random, but any 
words that are expected to be found within the test cases are avoided.  This 
allows test cases to be quickly found by searching for their name.

The fields that may be specified are:

description (str):
    Simply describes the test.  Is optional and unused.

string_in (str):
    This is a string that will be fed into a *NestedText* load function.  This 
    string contains a *NestedText* document, though that document may contain 
    errors.  This string may contain Unicode characters and Unicode escape 
    sequences.  It is generally recommended that Unicode white space be given 
    using escape sequences so that there presence is obvious.

bytes_in (str):
    This is an alternate to *string_in*.  In this case the *NestedText* document 
    is constrained to consist of ASCII characters and escape sequences such as 
    \\t, \\r, \\n, etc.  Also supported are binary escape sequences, \\x00 to 
    \\xFF.

encoding (str):
    The desired encoding for the *NestedText* document.  The default encoding is 
    UTF-8.  If you specify *bytes* as the encoding, no encoding is used.

load_out (None | str | list | dict):
    The expected output from the *NestedText* load function if no errors are 
    expected.  If a string is given and if the first character in the string is 
    ‘!’, then the mark is removed and what remains is evaluated by Python, with 
    the result becoming the expected output.

load_err (dict):
    Details about the error that is expected to be found and reported by the 
    *NestedText* load function.  It consists of 4 fields:

    message (str):
        The error message that is emitted by the Python implementation of 
        *NestedText*.

    line (str):
        The text of the line in the *NestedText* input from *load_in* or 
        *bytes_in* that contains the error.

    lineno (None, int):
        The line number of the line that contains the error.  The first line 
        is line 0.

    colno (None, int):
        The number of the column where the error is likely to be.  The first 
        column is column 0.

Here is an example of a test with a valid *NestedText* document::

    pundit:
        description: a single-level dictionary
        load_in:
            > key 1: value 1
            > key 2: value 2
            >
        load_out:
            key 1: value 1
            key 2: value 2

And here is an example of a test with an invalid *NestedText* document::

    foundling:
        load_in:
            > ingredients:
            >   - 3 green chilies
            >     - 3 red chilies
        load_err:
            message: invalid indentation
            line:     - 3 red chilies
            lineno: 2
            colno: 2

The test keys (*pundit* and *foundling*) are arbitrary.

Control characters and Unicode white space characters are expected to be 
backslash escaped in *load_in*, *bytes_in*, *load_out*, and *load_err.lines*.  
Here are some specific cases where backslash escapes should be used:

**Line Terminations**  Newlines are replaced by line feed (LF) characters unless 
the newline is preceded by either \\r or \\n or both.  The \\r is replaced by 
a carriage return (CR) and the \\n is replaced by a line feed (LF).  In this way 
the line termination characters can be specified explicitly on a per line basis.  
For example::

    key 1: this line ends with CR & LF\r\n
    key 2: this line ends with CR\r
    key 3: this line ends with LF\n
    key 4: this line also ends with LF
    key 5: this line, being the last, has no line termination character

**White Space**  All white space other than ASCII spaces and newlines should be 
made explicit by using backslash escape sequences.  Specifically tabs should be 
specified as \\t and the Unicode white spaces should be specified using their 
\\x or \\u code (ex. \\xa0 or \\u00a0 for the no-break space).  In addition, end 
of line spaces are optionally made explicit by replacing them with \\x20 if they 
are important and there is concern that they may be accidentally lost.

**Other Special Characters**  Backslash escape codes should also be used for 
control codes (\\a for bell, \\b for backspace, \\x7f for delete, \\x1b for 
escape, etc) and for backslash itself (\\\\).


tests.json
----------

The *convert* command creates *tests.json*, but if you do not wish to add or 
modify the tests, you can simply use *tests.json* from the GitHub repository.

*tests.json* is a file suitable for use with `parametrize_from_file 
<https://parametrize-from-file.readthedocs.io/en/latest/api/parametrize_from_file.html>`_, 
which is a *pytest* plugin suitable for testing Python projects (*test_nt.py* 
uses *parametrize_from_file* to apply *tests.json* to the Python implementation 
of *NestedText*).  However, you can use *tests.json* directly to implement tests 
for any *NestedText* implementation in any language.

It contains dictionary with a single key, *load_tests*.  The value of this key 
is a nested dictionary where each key-value pair is one test.  The key is the 
name of the test and the value is the test.  The test consists of the following 
fields:

load_in:
    This is a string that contains the *NestedText* document to be loaded for 
    the test.  The string is a base64 encoded string of bytes.

load_out:
    The expected output from the *NestedText* loader if no error is expected.  
    The structure of this value is dependent on the *NestedText* document 
    encoded in *load_in*.  It may be a nested collection of lists, dictionaries 
    and strings, or it may be *null*.

load_err:
    Details about an expected error.  *load_err* supports the following 
    subfields:

    message:
        The message generated by the Python implementation of *NestedText* for 
        the expected error.

    line:
        The line in the input document where the error occurs.

    lineno:
        The line number of the line where the error occurs.  0 represents the 
        first line in the document.  Is *null* or missing if the line number is 
        unknown.

    colno:
        The column number where the error occurs.  0 represents the first 
        column.  Is *null* or missing if the column number is unknown.

encoding:
    The encoding for the *NestedText* document.  The default encoding is UTF-8.

types:
    A dictionary of line-type counts.  It contains the count of each type of 
    line contained in the input document.  These counts can be used to filter 
    the tests if desired.

    The line types are::

        blank
        comment
        dict item
        inline dict
        inline list
        key item
        list item
        string item
        unrecognized

    As an example, if the implementation you are testing is limited to Minimal 
    NestedText, you can filter out tests that use the more advance features of 
    NestedText by simply ignoring those test cases that contain *inline dict*, 
    *inline list* or *key item*.