rustling 0.8.0

.. _chat_transcriptions:

Transcriptions and Annotations
==============================

Conversational data formatted in CHAT provides transcriptions with rich
annotations for both linguistic and extra-linguistic information.
``rustling.chat`` is designed to extract data and annotations in CHAT and expose them
in Python data structures for flexible data analyses and modeling work.
This page explains how ``rustling.chat`` represents CHAT data and annotations.

CHAT Format
-----------

To see how the CHAT format translates to ``rustling.chat``, let's look at the very first
two utterances in Eve's data in the American English
`Brown <https://childes.talkbank.org/access/Eng-NA/Brown.html>`_
dataset on CHILDES (data file: ``Brown/Eve/010600a.cha``),
where apparently Eve demands cookies in the first utterance
and her mother responds with a question for confirmation in the second utterance:

.. code-block:: text

    *CHI:	more cookie . [+ IMP]
    %mor:	adj|more-Cmp-S1 noun|cookie .
    %gra:	1|2|AMOD 2|2|ROOT 3|2|PUNCT
    %int:	distinctive , loud
    *MOT:	you 0v more cookies ?
    %mor:	pron|you-Prs-Acc-S2 adj|more-Cmp-S1 noun|cookie-Plur ?
    %gra:	1|3|NSUBJ 2|3|AMOD 3|3|ROOT 4|3|PUNCT

``rustling.chat`` handles CHAT data by paying attention to the following:

* **Participants:**
  The two participants are ``CHI`` and ``MOT``.
  In CHILDES, it is customary to denote the target child (i.e., Eve in this example)
  by ``CHI`` and the child's mother by ``MOT``.
  The asterisk ``*`` that comes just before the participant code signals
  a transcription line, known as the main tier in CHAT.
  Each utterance must begin with this main tier.

* **Transcriptions:**
  The two main tiers are ``more cookie . [+ IMP]`` from Eve
  and ``you 0v more cookies ?`` from her mother.
  The transcriptions are word-segmented by spaces
  (even for languages whose orthography, unlike that of English, does not separate words with spaces).
  Punctuation marks are also treated as "words".
  Annotations such as ``[+ IMP]`` and ``0v`` here can be found in transcriptions.

* **Dependent tiers:**
  Between one utterance and the next, there are often what are known
  as dependent tiers, signaled by ``%`` and
  associated with the transcription line immediately above.
  Eve's utterance has the dependent tiers ``%mor``
  (morphological information), ``%gra`` (grammatical relations),
  and ``%int`` (intonation),
  whereas Eve's mother's has only ``%mor`` and ``%gra``.
  Although certain dependent tiers are more standardized and more commonly found
  in CHILDES datasets (especially ``%mor`` and ``%gra``),
  none of the dependent tiers are obligatory in a CHAT utterance.

* **The %mor tier:**
  The morphological information aligns one-to-one with the segmented words
  (including punctuation marks) in the main tier;
  annotations in the main tier are ignored.
  In each item of ``%mor``, the part-of-speech tag is on the left of the pipe ``|``,
  e.g., ``adj`` for an adjective in ``adj|more-Cmp-S1`` aligned to ``more`` in Eve's utterance.
  Inflectional and derivational information is on the right of ``|``,
  e.g., ``you-Prs-Acc-S2`` for the second-person, singular, accusative, personal pronoun
  in ``pron|you-Prs-Acc-S2`` aligned to ``you`` in Eve's mother's line.

* **The %gra tier:**
  CHAT represents grammatical relations in terms of heads and dependents in
  dependency grammar.
  Every item on the ``%gra`` tier corresponds one-to-one to the segmented words
  in the transcription (and therefore one-to-one to the ``%mor`` items as well).
  In Eve's mother's ``%gra``, ``2|3|AMOD`` means ``more`` at position 2 of the utterance
  is a dependent of the word ``cookies`` at position 3 as the head,
  and that the relation is one of adjectival modification.

* **Other tiers:**
  Apart from ``%mor`` and ``%gra``, other dependent tiers may appear in CHAT data files.
  Some of them contain more linguistic information, e.g., ``%int`` for intonation
  in Eve's utterance here, and others contain contextual information about the
  utterance or recording session.
  Many of these tiers are used only as needed (``%int`` is not used in Eve's mother's
  utterance in this example).
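
As a rough illustration of the item formats described above, the following plain-Python
sketch splits the ``%mor`` and ``%gra`` items of Eve's mother's utterance by hand.
It does not use ``rustling.chat``, and the helper names ``split_mor_item`` and
``split_gra_item`` are hypothetical; the library's own parser covers many more cases.

.. code-block:: python

    def split_mor_item(item):
        """Split a %mor item into (pos, mor); punctuation items have no pipe."""
        pos, sep, mor = item.partition("|")
        if not sep:
            # No pipe found: treat the whole item as punctuation with an empty POS.
            return "", item
        return pos, mor


    def split_gra_item(item):
        """Split a %gra item into (dependent position, head position, relation)."""
        dep, head, rel = item.split("|")
        return int(dep), int(head), rel


    # Tier contents copied from the example above.
    mor_tier = "pron|you-Prs-Acc-S2 adj|more-Cmp-S1 noun|cookie-Plur ?"
    gra_tier = "1|3|NSUBJ 2|3|AMOD 3|3|ROOT 4|3|PUNCT"

    mor_items = [split_mor_item(item) for item in mor_tier.split()]
    gra_items = [split_gra_item(item) for item in gra_tier.split()]

    print(mor_items[1])  # ('adj', 'more-Cmp-S1')
    print(gra_items[1])  # (2, 3, 'AMOD')

Both tiers yield four items, whereas the main tier ``you 0v more cookies ?`` has five
space-separated items, because the ``0v`` annotation is not aligned with ``%mor`` or ``%gra``.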

Once you have a :class:`~rustling.chat.CHAT` object,
several methods are available for accessing the transcriptions and annotations.
Which method suits your needs best depends on which level of information you need.
The following sections introduce these :class:`~rustling.chat.CHAT` methods.
As an example, let's work with the
`Brown <https://childes.talkbank.org/access/Eng-NA/Brown.html>`_
dataset of American English on CHILDES
(see :ref:`chat_read` for how to download and read this dataset):

.. code-block:: python

    import rustling
    brown = rustling.read_chat("path/to/your/local/Brown.zip")


Filtering by File
-----------------

The Brown dataset contains data for the three children Adam, Eve, and Sarah.
Let's first take a look at how the Brown dataset is structured,
because we need to separate the children's data for analysis:

.. code-block:: python

    brown.n_files
    # 214
    brown.file_paths
    # ['Brown/Adam/020304.cha',
    #  'Brown/Adam/020318.cha',
    #  ...
    #  'Brown/Eve/010600a.cha',
    #  'Brown/Eve/010600b.cha',
    #  ...
    #  'Brown/Sarah/020305.cha',
    #  'Brown/Sarah/020307.cha',
    #  ...]

The three children's data is organized in subdirectories under their respective names.
The :meth:`~rustling.chat.CHAT.filter` method can be used to create a new :class:`~rustling.chat.CHAT`
from the data matching a subdirectory path:

.. code-block:: python

    eve = brown.filter(files="Eve")
    eve.n_files
    # 20
    eve.head()
    # *CHI:  more             cookie       .
    # %mor:  adj|more-Cmp-S1  noun|cookie  .
    # %gra:  1|2|AMOD         2|2|ROOT     3|2|PUNCT
    # %int:  distinctive , loud

    # *MOT:  you                  more             cookies           ?
    # %mor:  pron|you-Prs-Acc-S2  adj|more-Cmp-S1  noun|cookie-Plur  ?
    # %gra:  1|3|NSUBJ            2|3|AMOD         3|3|ROOT          4|3|PUNCT

    # *MOT:  how_about      another              graham        cracker       ?
    # %mor:  intj|howabout  det|another-Def-Ind  noun|graham   noun|cracker  ?
    # %gra:  1|4|DISCOURSE  2|4|DET              3|4|COMPOUND  4|4|ROOT      5|4|PUNCT

    # *MOT:  would            that           do             just        as          well       ?
    # %mor:  aux|would-Fin-S  pron|that-Dem  verb|do-Inf-S  adv|just    adv|as      adv|well   ?
    # %gra:  1|3|AUX          2|3|NSUBJ      3|6|ROOT       4|5|ADVMOD  5|3|ADVMOD  6|5|FIXED  7|3|PUNCT

    # *MOT:  here      .
    # %mor:  adv|here  .
    # %gra:  1|1|ROOT  2|1|PUNCT

The string ``"Eve"`` appears in the file paths for Eve's data,
which is what we've passed in to the ``files`` keyword argument of :meth:`~rustling.chat.CHAT.filter`
for filtering. There are 20 CHAT data files for Eve in Brown.


Filtering by Participant
------------------------

To filter by participant, use the ``participants`` keyword argument.
Let's further filter ``eve`` into child speech and child-directed speech:

.. code-block:: python

    eve_chi = eve.filter(participants="CHI")  # child speech
    eve_chi.head()
    # *CHI:  more             cookie       .
    # %mor:  adj|more-Cmp-S1  noun|cookie  .
    # %gra:  1|2|AMOD         2|2|ROOT     3|2|PUNCT
    # %int:  distinctive , loud

    # *CHI:  more             cookie       .
    # %mor:  adj|more-Cmp-S1  noun|cookie  .
    # %gra:  1|2|AMOD         2|2|ROOT     3|2|PUNCT
    # %int:  distinctive , loud

    # *CHI:  more             juice       ?
    # %mor:  adj|more-Cmp-S1  noun|juice  ?
    # %gra:  1|2|AMOD         2|2|ROOT    3|2|PUNCT

    # *CHI:  Fraser        .
    # %mor:  propn|Fraser  .
    # %gra:  1|1|ROOT      2|1|PUNCT
    # %com:  pronounces Fraser as fr&jdij .

    # *CHI:  Fraser        .
    # %mor:  propn|Fraser  .
    # %gra:  1|1|ROOT      2|1|PUNCT


    eve_cds = eve.filter(participants="^(?!CHI$)")  # child-directed speech, regex ^(?!CHI$) for "not CHI"
    eve_cds.head()
    # *MOT:  you                  more             cookies           ?
    # %mor:  pron|you-Prs-Acc-S2  adj|more-Cmp-S1  noun|cookie-Plur  ?
    # %gra:  1|3|NSUBJ            2|3|AMOD         3|3|ROOT          4|3|PUNCT

    # *MOT:  how_about      another              graham        cracker       ?
    # %mor:  intj|howabout  det|another-Def-Ind  noun|graham   noun|cracker  ?
    # %gra:  1|4|DISCOURSE  2|4|DET              3|4|COMPOUND  4|4|ROOT      5|4|PUNCT

    # *MOT:  would            that           do             just        as          well       ?
    # %mor:  aux|would-Fin-S  pron|that-Dem  verb|do-Inf-S  adv|just    adv|as      adv|well   ?
    # %gra:  1|3|AUX          2|3|NSUBJ      3|6|ROOT       4|5|ADVMOD  5|3|ADVMOD  6|5|FIXED  7|3|PUNCT

    # *MOT:  here      .
    # %mor:  adv|here  .
    # %gra:  1|1|ROOT  2|1|PUNCT

    # *MOT:  here      you                  go                       .
    # %mor:  adv|here  pron|you-Prs-Nom-S2  verb|go-Fin-Ind-Pres-S2  .
    # %gra:  1|3|ROOT  2|3|NSUBJ            3|1|ADVCL-RELCL          4|1|PUNCT

The ``participants`` argument of :meth:`~rustling.chat.CHAT.filter` supports
regex matching (which is also true for the ``files`` argument, though not illustrated here).
We've taken advantage of this capability to filter Eve's data down to
child-directed speech, by the regular expression ``"^(?!CHI$)"``
for "not CHI".
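
To see what the negative lookahead in ``"^(?!CHI$)"`` does, here is a small sketch
that uses Python's built-in ``re`` module as a stand-in.
This page does not say which regex engine or matching mode ``rustling.chat`` uses internally,
so treat this only as an illustration of the pattern itself.
The participant codes other than ``CHI`` and ``MOT`` are made up for the example.

.. code-block:: python

    import re

    # "^" anchors at the start; "(?!CHI$)" rejects strings that are exactly "CHI".
    pattern = re.compile(r"^(?!CHI$)")

    codes = ["CHI", "MOT", "FAT", "CHILD"]  # "FAT" and "CHILD" are illustrative only
    not_chi = [code for code in codes if pattern.search(code)]

    print(not_chi)  # ['MOT', 'FAT', 'CHILD']

Only the exact code ``CHI`` is excluded; a longer code that merely starts with ``CHI``
still passes because of the ``$`` inside the lookahead.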


Words
-----

The :class:`~rustling.chat.CHAT` method :meth:`~rustling.chat.CHAT.words`
returns the transcriptions as segmented words.

Calling :meth:`~rustling.chat.CHAT.words` with no arguments gives a
flat list of all the words:

.. code-block:: python

    eve_chi.words()[:9]
    # ['more', 'cookie', '.', 'more', 'cookie', '.', 'more', 'juice', '?']
    len(eve_chi.words())
    # 44119
    eve_cds.words()[:9]
    # ['you', 'more', 'cookies', '?', 'how_about', 'another', 'graham', 'cracker', '?']
    len(eve_cds.words())
    # 76198

To preserve the utterance-level structure, pass in ``by_utterance=True``
so that an inner list is created around the words from each utterance:

.. code-block:: python

    eve_chi.words(by_utterance=True)[:5]
    # [['more', 'cookie', '.'],
    #  ['more', 'cookie', '.'],
    #  ['more', 'juice', '?'],
    #  ['Fraser', '.'],
    #  ['Fraser', '.']]
    len(eve_chi.words(by_utterance=True))
    # 12113
    eve_cds.words(by_utterance=True)[:5]
    # [['you', 'more', 'cookies', '?'],
    #  ['how_about', 'another', 'graham', 'cracker', '?'],
    #  ['would', 'that', 'do', 'just', 'as', 'well', '?'],
    #  ['here', '.'],
    #  ['here', 'you', 'go', '.']]
    len(eve_cds.words(by_utterance=True))
    # 14807

Eve's data comes from 20 CHAT data files.
To get the file-level structure, pass in ``by_file=True``.
Each inner list contains the flat words from one file:

.. code-block:: python

    eve_chi_by_file = eve_chi.words(by_file=True)
    len(eve_chi_by_file)
    # 20
    eve_chi_by_file[0][:9]
    # ['more', 'cookie', '.', 'more', 'cookie', '.', 'more', 'juice', '?']
    eve_cds_by_file = eve_cds.words(by_file=True)
    len(eve_cds_by_file)
    # 20
    eve_cds_by_file[0][:9]
    # ['you', 'more', 'cookies', '?', 'how_about', 'another', 'graham', 'cracker', '?']

Passing both ``by_utterance=True`` and ``by_file=True`` gives a list of files,
where each file is a list of utterances, and each utterance is a list of words:

.. code-block:: python

    eve_chi_both = eve_chi.words(by_utterance=True, by_file=True)
    len(eve_chi_both)
    # 20
    len(eve_chi_both[0])
    # 741
    eve_chi_both[0][:5]
    # [['more', 'cookie', '.'],
    #  ['more', 'cookie', '.'],
    #  ['more', 'juice', '?'],
    #  ['Fraser', '.'],
    #  ['Fraser', '.']]
    eve_cds_both = eve_cds.words(by_utterance=True, by_file=True)
    len(eve_cds_both)
    # 20
    len(eve_cds_both[0])
    # 847
    eve_cds_both[0][:5]
    # [['you', 'more', 'cookies', '?'],
    #  ['how_about', 'another', 'graham', 'cracker', '?'],
    #  ['would', 'that', 'do', 'just', 'as', 'well', '?'],
    #  ['here', '.'],
    #  ['here', 'you', 'go', '.']]
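
Because the results are plain Python lists of strings, standard-library tools apply directly.
The sketch below counts word frequencies and utterance lengths on a hard-coded sample
copied from the ``by_utterance=True`` output above, so it runs without the Brown dataset.
The ``PUNCTUATION`` set is an assumption made for this example, not something defined by ``rustling``.

.. code-block:: python

    from collections import Counter

    # Sample copied from the eve_chi.words(by_utterance=True)[:5] output above.
    utterances = [
        ["more", "cookie", "."],
        ["more", "cookie", "."],
        ["more", "juice", "?"],
        ["Fraser", "."],
        ["Fraser", "."],
    ]

    # Assumed set of utterance terminators to exclude from the counts.
    PUNCTUATION = {".", "?", "!"}

    word_counts = Counter(
        word for utt in utterances for word in utt if word not in PUNCTUATION
    )
    lengths = [sum(1 for word in utt if word not in PUNCTUATION) for utt in utterances]
    mean_length = sum(lengths) / len(lengths)

    print(word_counts.most_common(2))  # [('more', 3), ('cookie', 2)]
    print(mean_length)  # 1.6

Note that this is a mean length in words; a mean length of utterance in morphemes
would need the ``%mor`` information instead.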


Tokens
------

While :meth:`~rustling.chat.CHAT.words` gives you transcriptions as plain strings,
:meth:`~rustling.chat.CHAT.tokens` gives you the ``%mor`` and ``%gra``
annotations bundled with each word:

.. code-block:: python

    eve_chi.tokens()[:3]
    # [Token(word='more', pos='adj', mor='more-Cmp-S1', gra=Gra(dep=1, head=2, rel='AMOD')),
    #  Token(word='cookie', pos='noun', mor='cookie', gra=Gra(dep=2, head=2, rel='ROOT')),
    #  Token(word='.', pos='', mor='.', gra=Gra(dep=3, head=2, rel='PUNCT'))]

Each element is a :class:`~rustling.chat.Token` object
with the attributes ``word``, ``pos``, ``mor``, and ``gra``:

.. code-block:: python

    first_token = eve_chi.tokens()[0]
    first_token.word
    # 'more'
    first_token.pos
    # 'adj'
    first_token.mor
    # 'more-Cmp-S1'
    first_token.gra
    # Gra(dep=1, head=2, rel='AMOD')

The ``gra`` attribute is a :class:`~rustling.chat.Gra` object,
with the attributes
``dep`` (the position of the word in the utterance),
``head`` (position of the head word),
and ``rel`` (the grammatical relation):

.. code-block:: python

    first_token.gra.dep
    # 1
    first_token.gra.head
    # 2
    first_token.gra.rel
    # 'AMOD'

Like :meth:`~rustling.chat.CHAT.words`,
:meth:`~rustling.chat.CHAT.tokens` also accepts
``by_utterance`` and ``by_file`` to organize the results
at the utterance and file level, respectively.
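
The ``dep`` and ``head`` positions make it straightforward to pair each word with its head.
The following sketch uses ``namedtuple`` stand-ins that only mimic the attributes shown above,
so it runs without ``rustling``; the token values are written out by hand from
Eve's mother's ``%mor`` and ``%gra`` tiers and are assumed to match what the library would return.

.. code-block:: python

    from collections import namedtuple

    # Stand-ins for illustration only; the real classes live in rustling.chat.
    Gra = namedtuple("Gra", ["dep", "head", "rel"])
    Token = namedtuple("Token", ["word", "pos", "mor", "gra"])

    utterance = [
        Token("you", "pron", "you-Prs-Acc-S2", Gra(1, 3, "NSUBJ")),
        Token("more", "adj", "more-Cmp-S1", Gra(2, 3, "AMOD")),
        Token("cookies", "noun", "cookie-Plur", Gra(3, 3, "ROOT")),
        Token("?", "", "?", Gra(4, 3, "PUNCT")),
    ]

    # Positions are 1-based, so index the tokens by their own position first.
    by_position = {token.gra.dep: token for token in utterance}

    # (dependent word, relation, head word) for every non-root token.
    pairs = [
        (token.word, token.gra.rel, by_position[token.gra.head].word)
        for token in utterance
        if token.gra.rel != "ROOT"
    ]

    print(pairs)
    # [('you', 'NSUBJ', 'cookies'), ('more', 'AMOD', 'cookies'), ('?', 'PUNCT', 'cookies')]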

Clitics
^^^^^^^

In CHAT, clitics are morphemes that attach to a host word but carry their own
part-of-speech and morphological information on the ``%mor`` tier.
Postclitics are marked with ``~`` and preclitics with ``$``.
For example, the contraction *that's* is annotated as
``pro:dem|that~cop|be&3S`` -- the demonstrative pronoun *that* followed by
the postclitic copula *be*.
When ``rustling.chat`` parses such forms, the host word's :class:`~rustling.chat.Token`
receives the transcribed word (e.g., ``"that's"``), while clitic tokens
get an empty string for their ``word`` attribute but retain their ``pos``,
``mor``, and ``gra`` annotations.
This means the number of tokens in an utterance can exceed the number of words,
because each clitic produces its own :class:`~rustling.chat.Token`:

.. code-block:: python

    from rustling.chat import CHAT

    # "that's good ." with %mor: pro:dem|that~cop|be&3S adj|good .
    chat_str = (
        "@UTF8\n@Begin\n"
        "@Participants:\tCHI Target_Child\n"
        "*CHI:\tthat's good .\n"
        "%mor:\tpro:dem|that~cop|be&3S adj|good .\n"
        "@End\n"
    )
    reader = CHAT.from_strs([chat_str])
    tokens = reader.tokens(by_utterance=True)[0]
    len(tokens)
    # 4 (three words, but four tokens because of the postclitic)
    tokens[0].word, tokens[0].pos
    # ("that's", 'pro:dem')
    tokens[1].word, tokens[1].pos
    # ('', 'cop')           # postclitic: empty word, but POS is retained
    tokens[2].word, tokens[2].pos
    # ('good', 'adj')
    tokens[3].word, tokens[3].pos
    # ('.', '')
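
If you need the word count rather than the token count, one option is to skip tokens
whose ``word`` is empty. The sketch below shows the idea on a stand-in ``namedtuple``
trimmed to two attributes, with values copied from the output above;
it is an illustration rather than a ``rustling`` feature.

.. code-block:: python

    from collections import namedtuple

    # Stand-in for illustration only, keeping just ``word`` and ``pos``.
    Token = namedtuple("Token", ["word", "pos"])

    tokens = [
        Token("that's", "pro:dem"),
        Token("", "cop"),  # postclitic: empty word
        Token("good", "adj"),
        Token(".", ""),
    ]

    # Tokens with a non-empty word correspond to the transcribed words.
    words = [token.word for token in tokens if token.word]
    # Tokens with an empty word are the clitics.
    clitic_pos = [token.pos for token in tokens if not token.word]

    print(words)       # ["that's", 'good', '.']
    print(clitic_pos)  # ['cop']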


Utterances
----------

The :meth:`~rustling.chat.CHAT.utterances` method returns
:class:`~rustling.chat.Utterance` objects that bundle together
the participant, tokens, original tiers, and time marks for each utterance:

.. code-block:: python

    eve_chi.utterances()[0]
    # Utterance(participant='CHI', tokens=[...3 tokens], time_marks=None)

Each :class:`~rustling.chat.Utterance` object has the following attributes:

* ``participant`` -- the speaker code (e.g., ``'CHI'``, ``'MOT'``).
* ``tokens`` -- a list of :class:`~rustling.chat.Token` objects,
  the same kind introduced in the Tokens section above.
* ``audible`` -- the audibly faithful transcription of this utterance,
  with CHAT coding conventions stripped out while preserving
  repetitions and retracings as they were heard; ``None`` for changeable headers.
* ``tiers`` -- a dictionary of the original, unparsed tier lines.
* ``time_marks`` -- a tuple of ``(start, end)`` in milliseconds, or ``None``.
* ``changeable_header`` -- a :class:`~rustling.chat.ChangeableHeader` object
  if this entry is a mid-file header, or ``None`` for regular utterances.

Let's inspect these attributes on the first utterance of Eve's child speech:

.. code-block:: python

    u = eve_chi.utterances()[0]
    u.participant
    # 'CHI'
    u.tokens
    # [Token(word='more', pos='adj', mor='more-Cmp-S1', gra=Gra(dep=1, head=2, rel='AMOD')),
    #  Token(word='cookie', pos='noun', mor='cookie', gra=Gra(dep=2, head=2, rel='ROOT')),
    #  Token(word='.', pos='', mor='.', gra=Gra(dep=3, head=2, rel='PUNCT'))]
    u.tokens[0].word
    # 'more'
    u.tokens[0].pos
    # 'adj'
    u.audible
    # 'more cookie .'
    u.time_marks is None
    # True

The ``tokens`` here are exactly the same :class:`~rustling.chat.Token` objects
returned by :meth:`~rustling.chat.CHAT.tokens` --
each with the ``word``, ``pos``, ``mor``, and ``gra`` attributes
as described in the Tokens section above.

Like :meth:`~rustling.chat.CHAT.words` and :meth:`~rustling.chat.CHAT.tokens`,
:meth:`~rustling.chat.CHAT.utterances` accepts ``by_file``
to organize the results at the file level:

.. code-block:: python

    len(eve_chi.utterances())  # number of utterances in Eve's child speech data
    # 12113
    eve_chi_by_file = eve_chi.utterances(by_file=True)
    len(eve_chi_by_file)  # number of files in Eve's child speech data
    # 20
    len(eve_chi_by_file[0])  # number of utterances in the first file of Eve's child speech data
    # 741


Audibly Faithful Transcription
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``audible`` attribute of an :class:`~rustling.chat.Utterance` object
gives you a transcription that faithfully represents what was audibly spoken,
with CHAT coding conventions (e.g., ``[+ IMP]``) stripped out
while preserving repetitions and retracings as they were heard:

.. code-block:: python

    from rustling.chat import CHAT

    # Repetitions marked with [x N] are expanded:
    data1 = CHAT.from_strs(["*CHI:\tno [x 3] ."])
    data1.utterances()[0].audible
    # 'no no no .'

    # Retracings are kept as spoken:
    data2 = CHAT.from_strs(["*CHI:\tI want [/] I want cookie ."])
    data2.utterances()[0].audible
    # 'I want I want cookie .'

This transcription is useful for tasks that model
the actual speech signal, such as automatic speech recognition (ASR)
and forced alignment, where the text should match what was audibly produced as closely as possible.
For changeable header entries, ``audible`` is ``None``.


Changeable Headers
^^^^^^^^^^^^^^^^^^

CHAT data files can contain mid-file headers marked by ``@`` in the source data,
such as ``@G``, ``@Comment``, ``@Date``, and ``@Situation``.
These "changeable headers" (as they're called in the official CHAT documentation)
signal metadata changes within a recording session
(as opposed to the file-level headers that appear at the top of a CHAT file).

When :meth:`~rustling.chat.CHAT.utterances` encounters a mid-file header,
it includes it in the returned list as an :class:`~rustling.chat.Utterance` object
whose ``changeable_header`` attribute is set
(while ``participant``, ``tokens``, and ``tiers`` are all ``None``):

.. code-block:: python

    eve = brown.filter(files="Eve")
    utts = eve.utterances()
    headers = [u for u in utts if u.changeable_header is not None]
    len(headers)
    # 49
    h = headers[0]
    h.changeable_header
    # <builtins.ChangeableHeader_Date object at ...>
    h.changeable_header.value
    # '17-OCT-1962'
    h.participant is None
    # True
    h.tokens is None
    # True
    h.tiers is None
    # True

You can use ``isinstance`` with :class:`~rustling.chat.ChangeableHeader` variants
to classify the headers you find.
For example, to collect all dates, comments, and situations from Eve's data:

.. code-block:: python

    from rustling.chat import ChangeableHeader
    found_dates = []
    found_comments = []
    found_situations = []
    for u in eve.utterances():
        if u.changeable_header is not None:
            ch = u.changeable_header
            if isinstance(ch, ChangeableHeader.Date):
                found_dates.append(ch.value)
            elif isinstance(ch, ChangeableHeader.Comment):
                found_comments.append(ch.value)
            elif isinstance(ch, ChangeableHeader.Situation):
                found_situations.append(ch.value)
    found_dates[:5]
    # ['17-OCT-1962', '31-OCT-1962', '28-NOV-1962', '10-DEC-1962', '12-DEC-1962']
    found_comments[:3]
    # ['end of episode', '15:00-16:00', '30-JAN-1963 , 10:45-11:45']
    found_situations[:2]
    # ['Eve is playing with large wooden beads. she sorts them by colors , although she often fails to use color names appropriately.',
    #  'Father is going to have apple']


Time Marks
^^^^^^^^^^

Many of the more recent CHILDES datasets (especially starting from the 1990s)
come with digitized audio and video data associated with the text-based CHAT data files.
In these datasets, an utterance in the CHAT file has time marks to indicate
its start and end time (in milliseconds) in the corresponding audio and/or video data.
If the information is available, the ``time_marks`` attribute of an
:class:`~rustling.chat.Utterance` object is a tuple of two integers,
e.g., ``(0, 1073)``, for ``·0_1073·`` found at the end of the CHAT main tier.
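
Since ``time_marks`` is either a ``(start, end)`` tuple in milliseconds or ``None``,
code that uses it should handle both cases.
A minimal sketch, with a hypothetical helper named ``duration_seconds``:

.. code-block:: python

    def duration_seconds(time_marks):
        """Return the utterance duration in seconds, or None without time marks."""
        if time_marks is None:
            return None
        start, end = time_marks
        return (end - start) / 1000


    print(duration_seconds((0, 1073)))  # 1.073
    print(duration_seconds(None))       # None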


Original Tiers
^^^^^^^^^^^^^^

You may sometimes need the original, unparsed transcription lines,
because they contain information (e.g., annotations for pauses) that is dropped
when :class:`~rustling.chat.Token` objects are constructed
from the cleaned-up words aligned with ``%mor`` and ``%gra``.
Or you may need access to other ``%`` tiers,
e.g., ``%int`` for intonation or ``%com`` for comments.
The ``tiers`` attribute of an :class:`~rustling.chat.Utterance` object
gives you a dictionary of all the original tiers of the utterance
for your custom needs:

.. code-block:: python

    u = eve_chi.utterances()[0]
    u.tiers
    # {'%gra': '1|2|AMOD 2|2|ROOT 3|2|PUNCT',
    #  '%int': 'distinctive , loud',
    #  'CHI': 'more cookie . [+ IMP]',
    #  '%mor': 'adj|more-Cmp-S1 noun|cookie .'}

The dictionary keys include the participant code (``'CHI'``) for the main tier
and the dependent tier names (``'%mor'``, ``'%gra'``, ``'%int'``, etc.).
Notice that the main tier retains the original transcription ``'more cookie . [+ IMP]'``,
including the ``[+ IMP]`` annotation that is not part of the parsed tokens.


.. _chat_from_utterances:

Creating a ``CHAT`` Object from ``Utterance`` Objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a list of :class:`~rustling.chat.Utterance` objects
(e.g., after filtering or transforming utterances programmatically),
you can construct a new :class:`~rustling.chat.CHAT` reader from them
using the :meth:`~rustling.chat.CHAT.from_utterances` classmethod.
The resulting reader behaves like any other :class:`~rustling.chat.CHAT` object,
so you can call :meth:`~rustling.chat.CHAT.words`, :meth:`~rustling.chat.CHAT.tokens`,
and other methods on it as usual:

.. code-block:: python

    from rustling.chat import CHAT

    eve_chi = eve.filter(participants="CHI")
    utts = eve_chi.utterances()

    # Create a new reader from the first 10 utterances
    subset = CHAT.from_utterances(utts[:10])
    subset.words()[:9]
    # ['more', 'cookie', '.', 'more', 'cookie', '.', 'more', 'juice', '?']
    len(subset.utterances())
    # 10

    # Round-trip: reconstructing a reader preserves all data
    reconstructed = CHAT.from_utterances(utts)
    reconstructed.words() == eve_chi.words()
    # True