pyoxidizer 0.24.0

Package self-contained Python applications
.. py:currentmodule:: starlark_pyoxidizer

===============
Technical Notes
===============

CPython Initialization
======================

Most code lives in ``pylifecycle.c``.

Call tree with Python 3.7::

    ``Py_Initialize()``
      ``Py_InitializeEx()``
        ``_Py_InitializeFromConfig(_PyCoreConfig config)``
          ``_Py_InitializeCore(PyInterpreterState, _PyCoreConfig)``
            Sets up allocators.
            ``_Py_InitializeCore_impl(PyInterpreterState, _PyCoreConfig)``
              Does most of the initialization.
              Runtime, new interpreter state, thread state, GIL, built-in types.
              Initializes sys module and sets up sys.modules.
              Initializes builtins module.
              ``_PyImport_Init()``
                Copies ``interp->builtins`` to ``interp->builtins_copy``.
              ``_PyImportHooks_Init()``
                Sets up ``sys.meta_path``, ``sys.path_importer_cache``,
                ``sys.path_hooks`` to empty data structures.
              ``initimport()``
                ``PyImport_ImportFrozenModule("_frozen_importlib")``
                ``PyImport_AddModule("_frozen_importlib")``
                ``interp->importlib = importlib``
                ``interp->import_func = interp->builtins.__import__``
                ``PyInit__imp()``
                  Initializes ``_imp`` module, which is implemented in C.
                ``sys.modules["_imp"] = imp``
                ``importlib._install(sys, _imp)``
                ``_PyImportZip_Init()``

          ``_Py_InitializeMainInterpreter(interp, _PyMainInterpreterConfig)``
            ``_PySys_EndInit()``
              ``sys.path = XXX``
              ``sys.executable = XXX``
              ``sys.prefix = XXX``
              ``sys.base_prefix = XXX``
              ``sys.exec_prefix = XXX``
              ``sys.base_exec_prefix = XXX``
              ``sys.argv = XXX``
              ``sys.warnoptions = XXX``
              ``sys._xoptions = XXX``
              ``sys.flags = XXX``
              ``sys.dont_write_bytecode = XXX``
            ``initexternalimport()``
              ``interp->importlib._install_external_importers()``
            ``initfsencoding()``
              ``_PyCodec_Lookup(Py_FilesystemDefaultEncoding)``
                ``_PyCodecRegistry_Init()``
                  ``interp->codec_search_path = []``
                  ``interp->codec_search_cache = {}``
                  ``interp->codec_error_registry = {}``
                  # This is the first non-frozen import during startup.
                  ``PyImport_ImportModuleNoBlock("encodings")``
                ``interp->codec_search_cache[codec_name]``
                ``for p in interp->codec_search_path: p[codec_name]``
            ``initsigs()``
            ``add_main_module()``
              ``PyImport_AddModule("__main__")``
            ``init_sys_streams()``
              ``PyImport_ImportModule("encodings.utf_8")``
              ``PyImport_ImportModule("encodings.latin_1")``
              ``PyImport_ImportModule("io")``
              Consults ``PYTHONIOENCODING`` and gets encoding and error mode.
              Sets up ``sys.__stdin__``, ``sys.__stdout__``, ``sys.__stderr__``.
            Sets warning options.
            Sets ``_PyRuntime.initialized``, which is what ``Py_IsInitialized()``
            returns.
            ``initsite()``
              ``PyImport_ImportModule("site")``

CPython Importing Mechanism
===========================

``Lib/importlib`` defines importing mechanisms and is 100% Python.

``Programs/_freeze_importlib.c`` is a program that takes a path to an input
``.py`` file and path to output ``.h`` file. It initializes a Python interpreter
and compiles the ``.py`` file to marshalled bytecode. It writes out a ``.h``
file with an inline ``const unsigned char _Py_M__importlib`` array containing
bytecode.

``Lib/importlib/_bootstrap_external.py`` is compiled to
``Python/importlib_external.h`` with ``_Py_M__importlib_external[]``.

``Lib/importlib/_bootstrap.py`` is compiled to
``Python/importlib.h`` with ``_Py_M__importlib[]``.

``Python/frozen.c`` has ``_PyImport_FrozenModules[]`` effectively mapping
``_frozen_importlib`` to ``importlib._bootstrap`` and
``_frozen_importlib_external`` to ``importlib._bootstrap_external``.
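
The aliasing can be observed from a running interpreter. The following is a
small check, assuming a stock CPython build; the variable names are
illustrative only:

.. code-block:: python

    import importlib  # ensures the source-level aliases are registered
    import sys

    frozen = sys.modules["_frozen_importlib"]
    frozen_external = sys.modules["_frozen_importlib_external"]

    # importlib/__init__.py re-registers the frozen modules under their
    # source names, so each pair of names refers to one module object.
    same_bootstrap = sys.modules["importlib._bootstrap"] is frozen
    same_external = (
        sys.modules["importlib._bootstrap_external"] is frozen_external
    )
    print(same_bootstrap, same_external)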

``initimport()`` calls ``PyImport_ImportFrozenModule("_frozen_importlib")``,
effectively ``import importlib._bootstrap``. Module import doesn't appear
to have meaningful side-effects.

``importlib._bootstrap.__import__`` is installed as ``interp->import_func``.

The C-implemented ``_imp`` module is initialized.

``importlib._bootstrap._install(sys, _imp)`` is called. It calls
``_setup(sys, _imp)`` and adds ``BuiltinImporter`` and ``FrozenImporter``
to ``sys.meta_path``.

``_setup()`` defines globals ``_imp`` and ``sys``. Populates ``__name__``,
``__loader__``, ``__package__``, ``__spec__``, ``__path__``, ``__file__``,
``__cached__`` on all ``sys.modules`` entries. Also loads builtins
``_thread``, ``_warnings``, and ``_weakref``.

Later during interpreter initialization, ``initexternalimport()`` effectively
calls ``importlib._bootstrap._install_external_importers()``. This runs
``import _frozen_importlib_external``, which is effectively
``import importlib._bootstrap_external``. This module handle is aliased to
``importlib._bootstrap._bootstrap_external``.

``importlib._bootstrap_external`` import doesn't appear to have significant
side-effects.

``importlib._bootstrap_external._install()`` is called with a reference to
``importlib._bootstrap``. ``_setup()`` is called.

``importlib._bootstrap_external._setup()`` imports the builtins ``_io``,
``_warnings``, ``builtins``, and ``marshal``. Either ``posix`` or ``nt`` is
imported, depending on the OS. Various module-level attributes describing the
run-time environment are set. This includes ``_winreg``. ``SOURCE_SUFFIXES``
and ``EXTENSION_SUFFIXES`` are updated accordingly.

``importlib._bootstrap_external._get_supported_file_loaders()`` returns
various loaders. ``ExtensionFileLoader`` is configured from
``_imp.extension_suffixes()``, ``SourceFileLoader`` from ``SOURCE_SUFFIXES``,
and ``SourcelessFileLoader`` from ``BYTECODE_SUFFIXES``.

``FileFinder.path_hook()`` is called with all loaders and the result is added
to ``sys.path_hooks``. ``PathFinder`` is added to ``sys.meta_path``.
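
The end state of this sequence can be inspected from Python. The sketch
below assumes a stock CPython; ``default_finder_positions`` is a
hypothetical helper name, not part of any library:

.. code-block:: python

    import sys
    from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder


    def default_finder_positions():
        """Return each default finder's index in sys.meta_path, or None."""
        positions = {}
        for finder in (BuiltinImporter, FrozenImporter, PathFinder):
            try:
                positions[finder.__name__] = sys.meta_path.index(finder)
            except ValueError:
                positions[finder.__name__] = None
        return positions


    positions = default_finder_positions()
    print(positions)

    # Path hooks are callables; printing them shows which loaders
    # PathFinder can hand directory and archive entries to.
    for hook in sys.path_hooks:
        print(hook)

Other tools may insert additional finders, so the indexes are not guaranteed
to be 0, 1, and 2, but the relative order of the three defaults should match
the installation order described above.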

``sys.modules`` After Interpreter Init
======================================

============================== ========== ================================
Module                         Type       Source
============================== ========== ================================
``__main__``                              ``add_main_module()``
``_abc``                       builtin    ``abc``
``_codecs``                    builtin    ``initfsencoding()``
``_frozen_importlib``          frozen     ``initimport()``
``_frozen_importlib_external`` frozen     ``initexternalimport()``
``_imp``                       builtin    ``initimport()``
``_io``                        builtin    ``importlib._bootstrap_external._setup()``
``_signal``                    builtin    ``initsigs()``
``_thread``                    builtin    ``importlib._bootstrap._setup()``
``_warnings``                  builtin    ``importlib._bootstrap._setup()``
``_weakref``                   builtin    ``importlib._bootstrap._setup()``
``_winreg``                    builtin    ``importlib._bootstrap._setup()``
``abc``                        py
``builtins``                   builtin    ``_Py_InitializeCore_impl()``
``codecs``                     py         ``encodings`` via ``initfsencoding()``
``encodings``                  py         ``initfsencoding()``
``encodings.aliases``          py         ``encodings``
``encodings.latin_1``          py         ``init_sys_streams()``
``encodings.utf_8``            py         ``init_sys_streams()`` + ``initfsencoding()``
``io``                         py         ``init_sys_streams()``
``marshal``                    builtin    ``importlib._bootstrap_external._setup()``
``nt``                         builtin    ``importlib._bootstrap_external._setup()``
``posix``                      builtin    ``importlib._bootstrap_external._setup()``
``readline``                   builtin
``sys``                        builtin    ``_Py_InitializeCore_impl()``
``zipimport``                  builtin    ``initimport()``
============================== ========== ================================
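
A table like the one above can be regenerated for a given interpreter. This
is a rough sketch, assuming a stock CPython; ``classify`` is a hypothetical
helper, and its labels are chosen to mirror the *Type* column:

.. code-block:: python

    import sys
    from importlib.machinery import (
        BuiltinImporter,
        ExtensionFileLoader,
        FrozenImporter,
        SourceFileLoader,
        SourcelessFileLoader,
    )


    def classify(name):
        """Label a loaded module by the kind of loader that produced it."""
        module = sys.modules[name]
        spec = getattr(module, "__spec__", None)
        loader = getattr(spec, "loader", None)
        if loader is BuiltinImporter:
            return "builtin"
        if loader is FrozenImporter:
            return "frozen"
        if isinstance(loader, ExtensionFileLoader):
            return "extension"
        if isinstance(loader, (SourceFileLoader, SourcelessFileLoader)):
            return "py"
        return "other"


    for name in sorted(sys.modules):
        print(f"{name:30} {classify(name)}")

The output depends on the Python version and on what has already been
imported, so it will not match the table exactly.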

Modules Imported by ``site.py``
===============================

* ``_collections_abc``
* ``_sitebuiltins``
* ``_stat``
* ``atexit``
* ``genericpath``
* ``os``
* ``os.path``
* ``posixpath``
* ``rlcompleter``
* ``site``
* ``stat``
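
A list like this can be reproduced by comparing a fresh interpreter started
with and without ``-S``. The sketch below assumes ``sys.executable`` points
at a regular ``python`` binary; ``startup_modules`` is a hypothetical helper:

.. code-block:: python

    import subprocess
    import sys


    def startup_modules(*flags):
        """Return the names in sys.modules of a freshly started interpreter."""
        code = "import sys; print('\\n'.join(sorted(sys.modules)))"
        result = subprocess.run(
            [sys.executable, *flags, "-c", code],
            check=True,
            capture_output=True,
            text=True,
        )
        return set(result.stdout.split())


    with_site = startup_modules()
    without_site = startup_modules("-S")
    site_only = sorted(with_site - without_site)
    print(site_only)

The exact difference varies by platform and Python version.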

Random Notes
============

Frozen importer iterates an array looking for module names. On each item, it
calls ``_PyUnicode_EqualToASCIIString()``, which verifies the search name is
ASCII. Performing an O(n) scan for every frozen module if there are a large
number of frozen modules could contribute performance overhead. A better frozen
importer would use a map/hash/dict for lookups. This *may* require CPython
API breakages, as the ``PyImport_FrozenModules`` data structure is documented
as part of the public API and its value could be updated dynamically at
run-time.
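
The difference between the two lookup strategies can be illustrated in
Python. This is only a model of the idea: ``FROZEN_TABLE`` and both functions
are hypothetical stand-ins, not CPython's data structures:

.. code-block:: python

    # Stand-in for the frozen module array: (name, bytecode) pairs.
    FROZEN_TABLE = [
        ("_frozen_importlib", b"<bytecode 1>"),
        ("_frozen_importlib_external", b"<bytecode 2>"),
        ("__hello__", b"<bytecode 3>"),
    ]


    def find_frozen_linear(name):
        """O(n): compare the name against every entry, as the array scan does."""
        for candidate, code in FROZEN_TABLE:
            if candidate == name:
                return code
        return None


    # Built once; must be rebuilt if the table is replaced at run-time.
    FROZEN_INDEX = dict(FROZEN_TABLE)


    def find_frozen_indexed(name):
        """O(1) on average: a single hash lookup."""
        return FROZEN_INDEX.get(name)

The need to rebuild the index whenever the table changes is the part that
clashes with a publicly mutable array.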

``importlib._bootstrap`` cannot call ``import`` because the global import
hook isn't registered until after ``initimport()``.

``importlib._bootstrap_external`` is the best place to monkeypatch because
of the limited run-time functionality available during ``importlib._bootstrap``.

It's a bit wonky that ``Py_Initialize()`` will import modules from the
standard library and it doesn't appear possible to disable this. If
``site.py`` is disabled, non-extension builtins are limited to
``codecs``, ``encodings``, ``abc``, and whatever ``encodings.*`` modules
are needed by ``initfsencoding()`` and ``init_sys_streams()``.

An attempt was made to freeze the set of standard library modules loaded
during initialization. However, the built-in extension importer doesn't
set all of the module attributes that the module system expects.
The ``from . import aliases`` in ``encodings/__init__.py`` is confused
without these attributes. And relative imports seemed to have issues as
well. One would think it would be possible to run an embedded interpreter
with all standard library modules frozen, but this doesn't work.

Desired Changes from Python to Aid PyOxidizer
=============================================

As part of implementing PyOxidizer, we've encountered numerous shortcomings
in Python that have made implementation more difficult. This section attempts
to capture those along with our desired outcomes.

General Lack of Clear Specifications
------------------------------------

PyOxidizer has had to implement a lot of low-level functionality, notably
around interpreter initialization and module/resource importing. We have
also had to reinvent aspects of packaging so it can be performed in Rust.

Various Python functionality is not defined in specifications. Rather, it
is defined by PEPs plus implementations in code. And when there are PEPs,
often there isn't a single PEP outlining the clear current state of the
world: many PEPs are stated like *builds on top of PEP XYZ*. Often the
only canonical source of how something works is the implementation in
code. And when there are questions for clarification, it isn't clear whether
code or a PEP is wrong because oftentimes there isn't a single PEP that
is the canonical source of truth.

It would be highly preferred for Python to publish clear specifications
for how various mechanisms work. A PEP would be a diff to a specification
(possibly creating a new specification) and a discussion around it. That
way there would be a clear specification that can be consulted as the
source of truth for how things should behave.

``__file__`` Ambiguity
----------------------

It isn't clear whether ``__file__`` is actually required and what all
is derived from existence of ``__file__``. It also isn't clear what
``__file__`` should be set to if it wouldn't be a concrete filesystem
path. Can ``__file__`` be virtual? Can it refer to a binary/archive
containing the module?

Semantics of ``__file__`` need more clarification.

``importlib.metadata`` Documentation Deficiencies
-------------------------------------------------

See https://bugs.python.org/issue38594.

``importlib`` Resources Directory Ambiguity
-------------------------------------------

See https://bugs.python.org/issue36128,
https://gitlab.com/python-devs/importlib_resources/issues/58,
and https://gitlab.com/python-devs/importlib_resources/-/issues/90.

Standardizing a Python Distribution Format
------------------------------------------

PyOxidizer consumes Python distributions and repackages them. e.g. it
takes an archive containing a Python executable, standard library,
support libraries, etc and transforms them into new binaries or
distributable artifacts.

There is no standard for representing a Python distribution. This is
something that PyOxidizer has had to invent itself via the
``python-build-standalone`` project and its ``PYTHON.json`` files.

Should Python have a standardized way of describing Python distribution
archives and should CPython distribute said distributions, it would make
PyOxidizer largely agnostic of the distributor flavor being consumed
and allow PyOxidizer (and other Python packaging tools) to more easily
target other distribution flavors. e.g. you could swap out CPython for
PyPy and tooling largely wouldn't care.

Ability to Install Meta Path Importers Before ``Py_Initialize()``
-----------------------------------------------------------------

``Py_Initialize()`` will import some standard library modules during
its execution. It does so using the default meta path importers available
to the distribution. This means that standard library modules must come
from the filesystem (``PathFinder``), frozen modules, built-in extension
modules, or zip files (via ``PathFinder``).

This restriction prevents importing the entirety of the standard library
from the binary containing Python, in effect preventing the use of
self-contained executables. PyOxidizer works around this by patching
the ``importlib._bootstrap`` and ``importlib._bootstrap_external`` source
code, compiling that to bytecode, and making said bytecode available as
a frozen module. The patched code (which runs as part of ``Py_Initialize()``)
installs a ``sys.meta_path`` importer which imports modules from memory.
This solution is extremely hacky, but is necessary to achieve single file
executables with all imports serviced from memory.

In order for this to work, PyOxidizer needs a copy of these ``importlib``
modules so it can patch them and compile them to bytecode. This is
problematic in some cases because e.g. the Windows embeddable Python
distributions ship only the bytecode of these modules in a ``pythonXY.zip``
file. So PyOxidizer needs to find the source code from another location
when consuming these distributions.

But patching the ``importlib`` bootstrap modules is hacky itself. It would
be better if PyOxidizer didn't need to do this at all. This could be
achieved by splitting up the interpreter initialization APIs to give embedding
applications the opportunity to muck with ``sys.meta_path`` before any
``import`` is performed. It could also be achieved by introducing an
initialization config option to somehow inject code at the right point
during startup to register the ``sys.meta_path`` importer. This
could be done by importing a named module (presumably serviced by the
frozen or built-in importer) and having that module run code to modify
``sys.meta_path`` as a side-effect of module evaluation at import time.
A variation would be to define a callable in said module to call after the
module is imported. Whatever the solution, there needs to be a way to
somehow inject a ``sys.meta_path`` importer before any ``import`` not
serviced by the frozen or built-in importers is performed.
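
For reference, the kind of importer that would need to be registered can be
sketched in pure Python. This is not PyOxidizer's implementation;
``InMemoryFinder`` and the module name ``hello_from_memory`` are hypothetical,
and the sketch registers the finder after startup, which is exactly the
limitation described above:

.. code-block:: python

    import importlib.abc
    import importlib.util
    import sys


    class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
        """Serve modules from a dict mapping module names to source text."""

        def __init__(self, sources):
            self._sources = dict(sources)

        def find_spec(self, fullname, path=None, target=None):
            if fullname not in self._sources:
                return None  # let the next finder on sys.meta_path try
            return importlib.util.spec_from_loader(
                fullname, self, origin="memory"
            )

        def create_module(self, spec):
            return None  # use the default module creation

        def exec_module(self, module):
            source = self._sources[module.__name__]
            code = compile(source, f"<memory:{module.__name__}>", "exec")
            exec(code, module.__dict__)


    sys.meta_path.insert(0, InMemoryFinder({"hello_from_memory": "GREETING = 'hi'\n"}))

    import hello_from_memory

    print(hello_from_memory.GREETING)

Because no ``__file__`` is set on the resulting module, code that assumes a
filesystem location will not work with it.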

Lacking Support for Statically Linked Builds
--------------------------------------------

Python really wants you to be using shared libraries for ``libpython``
and extension modules seem to strongly insist on this.

On Windows, there is no official Visual Studio project configuration
for static builds. Actually achieving one requires a lot of hacks to
the build system (see ``python-build-standalone`` project).

There is ~0 support for building statically linked extension modules
in packaging tools, from the build step itself all the way up to
distribution. (PyOxidizer's approach is to hack ``distutils`` to
record and save the object files that were compiled and then ``PyOxidizer``
manually links these object files into the final binary.)

To achieve a statically linked executable containing ``libpython`` and
extension modules, you effectively need to build everything from source.
And if you want to distribute that executable, you often need to build
with special toolchains to ensure binary portability.

There is tons of room for Python to better support static linking.
A possible good place to start would be for packaging tools to support
building extension modules which don't rely on a dynamic ``libpython``.
If artifacts containing the raw object files designed for static
linking were made available on PyPI, PyOxidizer could download
pre-built binaries and link them directly into an executable or custom
``libpython``. This would avoid having to recompile said extension
modules at repackaging time. The compatibility guarantees would likely
look a lot like existing binary wheels.

On a related front, it would be nice if musl libc based binary wheels were
standardized. There are some concerns about the performance and compatibility
of musl libc when it comes to Python. But musl libc is a valid deploy
target nonetheless and it would be nice if Python officially supported
it. (FWIW the performance concerns seem to stem from memory allocator
performance and PyOxidizer supports using jemalloc as the allocator,
bypassing this problem.)

Windows Embeddable Distributions Missing Functionality
------------------------------------------------------

The Windows embeddable zip file distributions of CPython are missing
certain functionality.

The distributions do not contain source code for Python modules in the
standard library. This means PyOxidizer can't easily bundle sources from
these distributions.

The ``ensurepip`` module is not present in the distribution. So there is
no way to install ``pip`` using the distribution itself.

The ``venv`` module is also not present in the distribution. So there's
no way to create virtualenvs using the distribution itself.

The Python C development headers are not part of the distribution, so
even if you install packaging tools, you can't build C extensions.

Extension Module / Shared Library Filename Ambiguity
----------------------------------------------------

On some platforms, Python extension modules and shared libraries have
the same filename extension. e.g. on Linux, both are named ``foo.so``.

PyOxidizer's packaging functionality needs to classify files as
specific resource types (source modules, bytecode modules, resource
files, extension modules, shared libraries, etc). Because certain file
patterns (like ``.so``) are ambiguous, PyOxidizer cannot perform this
classification trivially.

It would be much preferred if there were unique file extensions that
distinguished Python extension modules from regular shared libraries.

On Windows, this is already the case with the ``.pyd`` extension.
However, POSIX platforms aren't so fortunate.

Ambiguous File Classification
-----------------------------

This is somewhat related to the previous section but is more generic.

Python's default path-based importer dynamically looks for presence
of various files on the filesystem and loads the first type variant
(extension module, bytecode, source, etc) discovered.

PyOxidizer's importer indexes resources during packaging and its
import-time resource resolution is static: the type of resource is
baked into the definition of the resource.

These approaches are somewhat at odds with each other. The path-based
importer is dynamic in nature: it defers answering questions until
a specific resource is requested. PyOxidizer's importer is static /
pre-compiled: it must classify a resource based on its filename/path
so it can bake that knowledge into an immutable data structure. It
does not have knowledge of what names will be requested at run-time.

Bridging this divide has revealed various ambiguities and corner cases
in the filenames of Python resources.

The Python extension module or shared library ambiguity is described
above.

There is also an ambiguity with extra files that aren't part of
a known Python package. If you attempt to classify every file in
a ``sys.path`` directory, it is tempting to classify a file as a
Python module (``.py``, ``.pyc``, or extension module), package
resource (``importlib.resources``), or package metadata (e.g.
``.dist-info`` files accessed via ``importlib.metadata``). However,
there exists the possibility that a file is not obviously classified
as one of these.

For example, a file ``foo/libfoo.so`` without the presence of a
``foo/__init__.py`` file is ambiguous. We could say this is an
extension module (``foo.libfoo``) due to the extension module /
shared library ambiguity. We could also consider this a package
resource ``foo:libfoo.so`` or ``"":foo/libfoo.so``. Although the
latter case of using an empty string for the package name doesn't
make much sense. And we arguably shouldn't consider it a resource
of ``foo`` because no obvious ``foo`` Python package exists!

This is relevant in the real world because various Python packages
rely on installing arbitrary files in ``sys.path`` directories.
For example, ``numpy`` installs files like
``numpy.libs/libz-eb09ad1d.so.1.2.3``, where the ``numpy.libs``
directory only contains file extensions ``*.so[.*]``. Note that
this example is particularly confusing because the directory names
in ``sys.path`` directories are typically split on ``.`` and
correspond to Python [sub-]packages.

Because there is no unambiguous way to classify all files in
a ``sys.path`` directory and because Python packaging tools allow
the presence of files not contained within a known Python package
(identified by the presence of an ``__init__`` file/module), this
externalizes the requirement to introduce an *other* classification
of files. And because a specific file can't easily be classified
as a specific type, this effectively prevents the use of *resource*
loading techniques not involving explicit filesystem I/O without
significant smarts. I.e. because PyOxidizer cannot easily
unambiguously identify file X as a specific type, it is forced to
materialize that file at a similar location on the run-time system.
However, if runtimes like PyOxidizer were able to identify the
type of a file by its file extension and/or presence of other files,
it would know exactly how to load/treat the file at run-time without
having to resort to heuristics.

This ambiguity effectively means that PyOxidizer needs to:

* Determine if a file is a shared library or not (because shared
  libraries are treated specially and we can't unambiguously identify
  a shared library from its file extension).
* Examine symbols within shared libraries to see if a Python extension
  module is present (via presence of ``PyInit_*`` symbols).
* Preserve *extra* files not present in a Python package. (In the case
  of numpy, there are no *obvious* links to the shared libraries in the
  ``numpy.libs`` directory: this relative path is encoded within the
  extension module shared library via e.g. ``DT_NEEDED``.)
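
The second step can be approximated with a byte search. This is a deliberately
crude sketch: a robust implementation would parse the ELF, Mach-O, or PE
symbol table rather than scan raw bytes. Both helper names and the byte
strings are made up for illustration, and only ASCII module names are handled:

.. code-block:: python

    from pathlib import PurePosixPath


    def expected_init_symbol(filename):
        """Return the init symbol an extension with this filename would export."""
        # "foo.cpython-38-x86_64-linux-gnu.so" -> module name "foo".
        module_name = PurePosixPath(filename).name.split(".", 1)[0]
        return b"PyInit_" + module_name.encode("ascii")


    def looks_like_extension_module(filename, data):
        """Crude check: does the file content mention the expected symbol?"""
        return expected_init_symbol(filename) in data


    # Fabricated contents standing in for real shared libraries.
    fake_extension = b"\x7fELF...\x00PyInit_foo\x00..."
    fake_library = b"\x7fELF...\x00inflate\x00deflate\x00..."

    is_extension = looks_like_extension_module(
        "foo/foo.cpython-38-x86_64-linux-gnu.so", fake_extension
    )
    is_plain_library = not looks_like_extension_module(
        "numpy.libs/libz-eb09ad1d.so.1.2.3", fake_library
    )
    print(is_extension, is_plain_library)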

The most robust mitigation to this ambiguity is for all files
associated with an installable Python package/distribution to be
annotated with their type and for Python package installers to refuse
to process files that aren't identified. This could be achieved by
having a ``.dist-info/`` file annotating the *role* of each file.

Push Harder for Wheels
----------------------

Wheels are superior for Python packaging distribution because they
are more *static* and follow a finite set of rules for how they
should be installed. In theory, one could write code to install a
wheel in any programming language. Non-wheel distributions, however,
are a different matter entirely. A ``.tar.gz`` source distribution
often relies on running a ``setup.py`` file, which requires a Python
interpreter.

In the ideal world, PyOxidizer doesn't care about how a package is
built: just the files that comprise the installed package. So wheels
are a more desirable distribution format. In fact, PyOxidizer has
Rust code for extracting wheels and repackaging their contents: no
Python necessary. This means PyOxidizer can do things like download
wheels targeting non-native architectures and it *just works*.
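
Since a wheel is a zip archive with a ``.dist-info`` directory, its contents
can be enumerated without running any package code. PyOxidizer does this in
Rust; the Python sketch below only illustrates the idea. The ``demo`` wheel
is fabricated and incomplete (a real wheel also carries a ``RECORD`` file),
and both function names are hypothetical:

.. code-block:: python

    import io
    import zipfile


    def build_demo_wheel():
        """Create a tiny, fabricated wheel in memory for demonstration."""
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as archive:
            archive.writestr("demo/__init__.py", "VALUE = 1\n")
            archive.writestr(
                "demo-1.0.dist-info/WHEEL",
                "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
            )
            archive.writestr(
                "demo-1.0.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n",
            )
        buffer.seek(0)
        return buffer


    def split_wheel(fileobj):
        """Partition archive members into package files and metadata files."""
        package_files, metadata_files = [], []
        with zipfile.ZipFile(fileobj) as archive:
            for name in archive.namelist():
                top_level = name.split("/", 1)[0]
                if top_level.endswith(".dist-info"):
                    metadata_files.append(name)
                else:
                    package_files.append(name)
        return package_files, metadata_files


    package_files, metadata_files = split_wheel(build_demo_wheel())
    print(package_files, metadata_files)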

As good as wheels are, they are not universal in Python land. There are
tons of packages that don't have wheel distributions and continue to
offer the older ``.tar.gz`` distribution format.

We would like to see a concerted effort to push harder for the
presence of wheels. For example, PyPI could encourage/nag package
maintainers to upload wheels.

No Way to Hook ``open()``
-------------------------

``oxidized_importer`` wants to load Python modules and resource data
from memory, without using files.

There is a convention of using virtual paths to express paths within
some other entity. e.g. the zip importer uses ``/path/to/archive.zip/foo.py``
to refer to the path ``foo.py`` within the ``/path/to/archive.zip`` zip file.
It is also common to use the current executable's path to refer to
entities within the current executable. e.g. ``/path/to/myapp/foo.py``
would refer to a ``foo.py`` somehow embedded in the ``/path/to/myapp``
executable.

These virtual paths are a great idea. You can even implement ``pathlib.Path``
around these paths and have a custom ``Path.open()`` that does custom I/O.
However, it is really easy for these paths to *leak* and to get fed in to
``io.open()`` or similar APIs for operating on filesystem paths. For example,
someone does ``open(foo.__path__, "rb")`` instead of ``foo.__path__.open("rb")``.
If this happens, you'll likely get an I/O error since virtual paths aren't
real filesystem paths.
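
The leak is easy to reproduce with the standard library's ``zipfile.Path``
(Python 3.8+), used here as a stand-in for any virtual path implementation.
The archive and file names are made up for the example:

.. code-block:: python

    import os
    import tempfile
    import zipfile

    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "archive.zip")
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr("foo.py", "print('hello')\n")

        # A path object that understands the archive performs custom I/O.
        with zipfile.ZipFile(archive_path) as archive:
            via_path_object = zipfile.Path(archive, "foo.py").read_bytes()

        # The same location as a plain string is not a real filesystem path.
        leaked = os.path.join(archive_path, "foo.py")
        try:
            with open(leaked, "rb") as handle:
                handle.read()
            leak_failed = False
        except OSError:
            leak_failed = True

    print(via_path_object, leak_failed)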

It would be really nice if Python had some abstraction around filesystem
I/O that allowed custom paths to be registered. This is what schemes in URIs
and URLs are for. e.g. ``file:///path/to/file``. However, schemes aren't
paths per se. So if we want to preserve compatibility with a path based
API and allow ``io.open()`` to work with virtual paths, we need a mechanism
to register a hook to intercept ``io.open()`` (and possibly other I/O
operations like ``stat()``) so we can plumb in a custom I/O implementation.

PEP 578 almost does this with ``PyFile_SetOpenCodeHook()`` and the
``io.open_code()`` mechanism. But ``io.open_code()`` is only for a limited
use case and isn't generally usable.

``sys.executable`` is a String Instead of List
----------------------------------------------

Python applications often want to invoke a new Python interpreter process.
Generally, you use ``sys.executable`` to find the filesystem path to
``python`` then run that executable.

This is all fine for traditional Python interpreter install layouts that have
a ``python`` executable. However, in embedded contexts, there may not be
a ``python`` executable. Rather, the application embedding Python may provide
a more advanced way to invoke a Python interpreter. e.g. ``myapp python
<interpreter arguments>``.

Since ``sys.executable`` is a string and is often fed directly into ``exec()``,
it isn't possible to express a multi-argument *run a Python interpreter* value
through ``sys.executable``.

To do this robustly while maintaining backwards compatibility, we need a new
attribute somewhere that defines a list of arguments for invoking a Python
interpreter. In traditional Python install environments, this would be
``[sys.executable]``.
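
A consumer of such an attribute might look like the sketch below. The
attribute name ``_interpreter_argv`` is purely hypothetical and does not
exist in Python; it stands in for whatever name such a proposal would
settle on:

.. code-block:: python

    import sys


    def interpreter_command(extra_args=()):
        """Build an argument list that starts a new Python interpreter."""
        # Hypothetical attribute: fall back to the traditional single path.
        base = getattr(sys, "_interpreter_argv", [sys.executable])
        return [*base, *extra_args]


    command = interpreter_command(["-c", "print('hello')"])
    print(command)

An embedding application could set the attribute to something like
``["/path/to/myapp", "python"]``, and callers passing ``command`` to
``subprocess.run()`` would not need to change.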

This idea was proposed at
https://mail.python.org/archives/list/python-ideas@python.org/thread/O66N56PB4U6AGICGBSRFD2OWA5JWMFC6/#O66N56PB4U6AGICGBSRFD2OWA5JWMFC6.