pyembed/technotes.rs
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

/*!
Technical Implementation Notes

When trying to understand the code, a good place to start is
`MainPythonInterpreter::new()`. It initializes the CPython runtime, and
Python initialization is where most of the magic occurs.

A lot of initialization code revolves around mapping
`OxidizedPythonInterpreterConfig` members to C API calls. This functionality is
rather straightforward, with nothing novel or complicated, so we won't cover
it here.

# Python Memory Allocators

There exist several
[CPython APIs for memory management](https://docs.python.org/3/c-api/memory.html).
CPython defines multiple memory allocator *domains*, and it is possible to
use a custom memory allocator for each using the `PyMem_SetAllocator()` API.

See the documentation in the `pyalloc` module for more on this topic.

# Module Importing

The module importing mechanisms provided by this crate are among the
most complicated parts of the crate. This section aims to explain how they
work. But before we go into the technical details, we need an understanding
of how Python module importing works.

## High Level Python Importing Overview

A *meta path importer* is a Python object implementing
the [importlib.abc.MetaPathFinder](https://docs.python.org/3.7/library/importlib.html#importlib.abc.MetaPathFinder)
interface and is registered on [sys.meta_path](https://docs.python.org/3.7/library/sys.html#sys.meta_path).
Essentially, when the `__import__` function is called or an `import` statement
is executed, Python's importing internals traverse the entries in
`sys.meta_path` and ask each *finder* whether it can locate the module. The
first *meta path importer* that knows about the module is used.
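To make the finder protocol concrete, here is a minimal sketch of a meta path
importer written in plain Python. The `InMemoryFinder` class and the
`hello_demo` module name are hypothetical and exist only for illustration; this
is not how this crate's importer is implemented.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader serving modules from a dict of source text."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Returning None tells the import system to ask the next finder.
        if fullname not in self.sources:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def exec_module(self, module):
        # Run the stored source inside the new module's namespace.
        exec(self.sources[module.__name__], module.__dict__)


finder = InMemoryFinder({"hello_demo": "GREETING = 'hi'"})
sys.meta_path.insert(0, finder)

import hello_demo

print(hello_demo.GREETING)
```

Names the finder does not recognize fall through to the remaining entries on
`sys.meta_path`, so ordinary imports keep working.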

By default, Python configures 3 *meta path importers*: one each for
built-in extension modules (`BuiltinImporter`), frozen modules
(`FrozenImporter`), and filesystem-based modules (`PathFinder`). You can
see these on a fresh Python interpreter:

```text
$ python3.7 -c 'import sys; print(sys.meta_path)'
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
```

These types are all implemented in Python code in the Python standard
library, specifically in the `importlib._bootstrap` and
`importlib._bootstrap_external` modules.

Built-in extension modules are compiled into the Python library. These are often
extension modules required by core Python (such as the `_codecs`, `_io`, and
`_signal` modules). But it is possible for other extensions - such as those
provided by Python's standard library or 3rd party packages - to exist as
built-in extension modules as well.

For importing built-in extension modules, there's a global `PyImport_Inittab`
array containing members defining the extension/module name and a pointer to
its C initialization function. There are undocumented functions exported to
Python (such as `_imp.exec_builtin()`) that allow Python code to call into C code
which knows how to, for example, instantiate these extension modules. The
`BuiltinImporter` calls into these C-backed functions to service imports of
built-in extension modules.
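As a rough illustration, the standard `importlib.machinery.BuiltinImporter`
can be queried directly from Python. This sketch assumes a stock CPython build
in which `_io` is compiled in as a built-in extension module; other builds may
differ.

```python
import importlib.machinery
import sys

# Ask the built-in importer directly, bypassing the rest of sys.meta_path.
spec = importlib.machinery.BuiltinImporter.find_spec("_io")

# Built-in modules have no file on disk, so the spec origin is a fixed marker
# rather than a filesystem path.
print(spec.origin)

# The names compiled into this interpreter are also listed on the sys module.
print("_io" in sys.builtin_module_names)
```

On a stock CPython build the origin is expected to be the string `built-in`.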

Frozen modules are Python modules that have their bytecode backed by memory.
There is a global `PyImport_FrozenModules` array that - like
`PyImport_Inittab` - defines module names and a pointer to bytecode data. The
`FrozenImporter` calls into undocumented C functions exported to Python to try
to service import requests for frozen modules.
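The same kind of query works for frozen modules. This sketch assumes a stock
CPython build, where `_frozen_importlib` is always present in the frozen
modules table.

```python
import importlib.machinery

# Ask the frozen importer directly for a module known to be frozen in CPython.
spec = importlib.machinery.FrozenImporter.find_spec("_frozen_importlib")

# Frozen modules are loaded from bytecode held in memory, so the origin is a
# fixed marker rather than a filesystem path.
print(spec.origin)
print(spec.loader is importlib.machinery.FrozenImporter)
```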

Path-based module loading via the `PathFinder` meta path importer is what
most people are likely familiar with. It uses `sys.path` and a handful of
other settings to traverse filesystem paths, looking for modules in those
locations. For example, if `sys.path` contains
`['', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/lib/python3/dist-packages']`,
`PathFinder` will look for `.py`, `.pyc`, and compiled extension modules
(`.so`, `.pyd`, etc.) in each of those paths to service an import request.
Path-based module loading is a complicated beast, as it deals with all
kinds of complexity like caching bytecode `.pyc` files, differentiating
between Python modules and extension modules, namespace packages, finding
search locations in registry entries, etc. Altogether, there are 1500+ lines
constituting path-based importing logic in `importlib._bootstrap_external`!
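A small sketch of the path-based lookup follows. It calls
`importlib.machinery.PathFinder.find_spec()` with an explicit search path
instead of `sys.path`; the temporary directory and the `demo_mod` module name
are made up for the example.

```python
import importlib.machinery
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway module on the filesystem.
    pathlib.Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")

    # Search only the temporary directory, as PathFinder would search each
    # sys.path entry in turn.
    found = importlib.machinery.PathFinder.find_spec("demo_mod", [tmp])
    missing = importlib.machinery.PathFinder.find_spec("no_such_demo_mod", [tmp])

    found_origin = found.origin

print(found_origin)
print(missing)
```

The spec for a found module carries the path of the file that would be
loaded, while an unknown name yields `None` so the import system can move on.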

## Default Initialization of Python Importing Mechanism

CPython's internals go through a convoluted series of steps to initialize
the importing mechanism. This is because there's a bit of a chicken-and-egg
scenario going on. The *meta path importers* are implemented as Python
modules using Python source code (`importlib._bootstrap` and
`importlib._bootstrap_external`). But in order to execute Python code you
need an initialized Python interpreter. And in order to execute a Python
module you need to import it. And how do you do any of this if the importing
functionality is implemented as Python source code and as a module?!

A few tricks are employed.

At Python build time, the source code for `importlib._bootstrap` and
`importlib._bootstrap_external` is compiled into bytecode. This bytecode is
registered in the global `PyImport_FrozenModules` array under the
`_frozen_importlib` and `_frozen_importlib_external` module names,
respectively. This means the bytecode is available for Python to load
from memory and the original `.py` files are not needed.

During interpreter initialization, Python initializes some special
built-in extension modules using its internal import mechanism APIs. These
bypass the Python-based APIs like `__import__`. This limited set of
modules includes `_imp` and `sys`, which are both completely implemented in
C.

During initialization, the interpreter also knows to explicitly look for
and load the `_frozen_importlib` module from its frozen bytecode. It creates
a new module object by hand without going through the normal import mechanism.
It then calls the `_install()` function in the loaded module. This function
executes Python code on the partially bootstrapped Python interpreter, which
culminates with `BuiltinImporter` and `FrozenImporter` being registered on
`sys.meta_path`. At this point, the interpreter can import compiled
built-in extension modules and frozen modules. From here on, interpreter
initialization uses the initialized importing mechanism to
import modules via normal import means.

Later during interpreter initialization, the `_frozen_importlib_external`
frozen module is loaded from bytecode and its `_install()` is also called.
This self-installation adds `PathFinder` to `sys.meta_path`. At this point,
modules can be imported from the filesystem. This includes `.py` based modules
from the Python standard library as well as any 3rd party modules.
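Some traces of this bootstrap sequence can be observed from a running
interpreter. The sketch below assumes a stock CPython process and only
inspects state; it does not reproduce the bootstrap itself.

```python
import sys

bootstrap_modules = ("_frozen_importlib", "_frozen_importlib_external")
c_modules = ("_imp", "sys")

# The frozen bootstrap modules are loaded during initialization, so they are
# already present in sys.modules before any user code runs.
for name in bootstrap_modules:
    print(name, "loaded:", name in sys.modules)

# The modules initialized through the internal C APIs are built-in extensions.
for name in c_modules:
    print(name, "built-in:", name in sys.builtin_module_names)

# The finders registered by the two _install() calls end up on sys.meta_path.
finder_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]
print(finder_names)
```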

Interpreter initialization continues on to do other things, such as initialize
signal handlers, initialize the filesystem encoding, set up the `sys.std*`
streams, etc. This involves importing various `.py` backed modules (from the
filesystem). Eventually interpreter initialization is complete and the
interpreter is ready to execute the user's Python code!

## Our Importing Mechanism

We use the multi-phase initialization mechanism provided by CPython 3.8+
(PEP 587) to import `oxidized_importer` and inject its `OxidizedFinder`
onto `sys.meta_path` during interpreter initialization.

Essentially:

1. Add `oxidized_importer` to `PyImport_Inittab` so it can be serviced by
   `BuiltinImporter`.
2. Enable multi-phase initialization by setting `PyConfig._init_main = 0`.
3. Call `Py_InitializeFromConfig()` to initialize Python up to the point
   where `.py` based modules need to be loaded.
4. Construct an `OxidizedFinder` and install it on `sys.meta_path`. This entails
   loading resources data, indexing built-ins and frozen modules, and clearing out
   `sys.meta_path` of the default meta path importers.
5. Call `_Py_InitializeMain()` to finish Python initialization. `OxidizedFinder` is
   able to service the Python standard library imports this requires.
6. Remove from `sys.meta_path` and `sys.path_hooks` any unwanted changes made as
   part of initializing "external" importers.

By injecting `OxidizedFinder` at `sys.meta_path[0]`, we effectively make it the
highest priority importer. And if it has indexed everything needed as part of
Python interpreter initialization, it essentially preempts the other standard
library importers from doing anything.
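The priority effect can be sketched in plain Python. `TaggingFinder` and the
`priority_demo` module name are hypothetical stand-ins; the real
`OxidizedFinder` is installed during interpreter initialization rather than
from user code like this.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class TaggingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder that records which instance served a module."""

    def __init__(self, tag, names):
        self.tag = tag
        self.names = set(names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.names:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def exec_module(self, module):
        module.SERVED_BY = self.tag


low = TaggingFinder("low-priority", {"priority_demo"})
high = TaggingFinder("high-priority", {"priority_demo"})

original_meta_path = list(sys.meta_path)
try:
    sys.meta_path.append(low)      # consulted last
    sys.meta_path.insert(0, high)  # consulted first
    module = importlib.import_module("priority_demo")
finally:
    # Restore global import state so the demo has no lasting effect.
    sys.meta_path[:] = original_meta_path
    sys.modules.pop("priority_demo", None)

print(module.SERVED_BY)
```

Both finders know the module, but only the one at index 0 is ever asked to
load it, which is the property the injection relies on.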
164
165*/