pyoxidizer 0.24.0

Package self-contained Python applications
.. py:currentmodule:: starlark_pyoxidizer

.. _packaging_resources:

================================
Managing How Resources are Added
================================

An important concept in PyOxidizer packaging is how to manage *resources*
that are added to built applications.

A *resource* is some entity that will be packaged and distributed. Examples
of *resources* include Python module source and bytecode, Python
extension modules, and arbitrary files on the filesystem.

*Resources* are represented by a dedicated Starlark type for each
resource flavor (see :ref:`packaging_resource_types`).

During evaluation of PyOxidizer's Starlark configuration files,
*resources* are created and *added* to another Starlark type (a *resource
collector*, such as :py:class:`PythonExecutable`) whose job is to collect
all desired *resources* and then package them.
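
As a rough illustration, the Starlark sketch below treats a
:py:class:`PythonExecutable` as the collector: the resources emitted by its
``pip_install()`` method are added one at a time with
:py:meth:`PythonExecutable.add_python_resource`. The package name
``my_package`` is a hypothetical placeholder.

.. code-block:: python

   def make_exe():
       dist = default_python_distribution()
       exe = dist.to_python_executable(name = "myapp")

       # pip_install() only emits resource objects; nothing is packaged yet.
       # "my_package" is a placeholder package name.
       for resource in exe.pip_install(["my_package"]):
           # Adding a resource hands it to the collector (the executable).
           exe.add_python_resource(resource)

       return exe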

.. _packaging_resources_classified_files:

Classified Resources Versus Files
=================================

All resources in PyOxidizer are ultimately derived from or representable
by a file or a file-like primitive. For example, a
:py:class:`PythonModuleSource` is derived from or could be
manifested as a ``.py`` file.

Various PyOxidizer functionality works by scanning existing files and
turning those files into *resources*.

This file scanning functionality has two modes of operation: *classified*
and *files*. In *files* mode, PyOxidizer simply emits resources corresponding
to the raw files it encounters. In *classified* mode, PyOxidizer attempts to
*classify* a file as a particular resource and emit a strongly-typed
resource like :py:class:`PythonModuleSource` or :py:class:`PythonExtensionModule`.

*Classified* mode is more powerful because PyOxidizer is able to build
an *index* of typed resources at packaging time and make this *index*
available to :ref:`oxidized_importer` at run-time to facilitate faster
loading of resources.

However, the main downside to *classified* mode is that it relies on
identifying files correctly, which is unreliable. Python file layouts are
under-specified and there are many edge cases where PyOxidizer fails to
properly classify a file. See :ref:`cli_find_resources` for how to diagnose
classification problems.

In *files* mode, PyOxidizer simply indexes and manages a named file
and its content. There is far less potential for PyOxidizer to make
mistakes about a file's type and how it is handled. This means that
*files* mode often *just works* when *classified* mode doesn't. The main
downside to *files* mode is that :ref:`oxidized_importer` doesn't have a
rich index embedded in the built binary, so you will have to rely on
Python's default filesystem-based importer, which is slower than
``oxidized_importer``.
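
To make the difference between the two modes concrete, here is a deliberately
simplified, hypothetical classifier written in plain Python. It is *not*
PyOxidizer's actual scanning logic; it only shows the shape of the problem. In
*files* mode a path is passed through untouched, while in *classified* mode it
has to be mapped to a typed resource by inspecting names and suffixes.

.. code-block:: python

   from pathlib import PurePosixPath
   from typing import Tuple

   # Hypothetical suffix list. Real extension suffixes vary by platform and
   # Python version (see importlib.machinery.EXTENSION_SUFFIXES).
   EXTENSION_SUFFIXES = (".so", ".pyd")


   def classify(relative_path: str, mode: str) -> Tuple[str, str]:
       """Return (resource_type, name) for a path relative to a package root."""
       path = PurePosixPath(relative_path)
       if mode == "files":
           # Files mode: no interpretation, just the path itself.
           return ("File", str(path))

       # Files under *.dist-info / *.egg-info describe a distribution.
       if any(part.endswith((".dist-info", ".egg-info")) for part in path.parts[:-1]):
           return ("PythonPackageDistributionResource", path.name)

       if path.suffix == ".py":
           parts = list(path.with_suffix("").parts)
           if parts[-1] == "__init__":
               parts.pop()  # "pkg/__init__.py" is the module "pkg"
           return ("PythonModuleSource", ".".join(parts))

       if path.suffix in EXTENSION_SUFFIXES:
           # Drop any ABI tag: "foo.cpython-311-x86_64-linux-gnu.so" -> "foo".
           stem = path.name.split(".", 1)[0]
           return ("PythonExtensionModule", ".".join([*path.parts[:-1], stem]))

       # Anything else is guessed to be a package data file.
       return ("PythonPackageResource", str(path))

Every branch after the ``"files"`` check is a guess. A stray ``.py`` file in a
data directory or an unfamiliar extension suffix would be misclassified,
whereas *files* mode never has to guess.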

.. _packaging_resource_packaging_policy:

Packaging Policies and Adding Resources
=======================================

The exact mechanism by which *resources* are emitted and added to *resource
collectors* is influenced by a *packaging policy* (represented by the
:py:class:`PythonPackagingPolicy` Starlark type) and by attributes on
each resource object that influence how it is added.

When *resources* are created, the *packaging policy* determines whether
emitted resources are *classified* or simply *files*. The *packaging
policy* is also applied to each created resource to populate the initial values
for the various ``add_*`` attributes on the Starlark *resource* types.

When a resource is added (e.g. by calling
:py:meth:`PythonExecutable.add_python_resource`), these ``add_*``
attributes are consulted and used to determine exactly how that
*resource* is added and packaged.

For example, a :py:class:`PythonModuleSource` can set attributes
indicating that source code should be excluded and only bytecode generated
at a specific optimization level. Or a :py:class:`PythonExtensionModule`
can set attributes expressing a preference for compiling it into the built
binary or for materializing it as a standalone dynamic extension module
(e.g. ``my_ext.so`` or ``my_ext.pyd``).
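
A hedged Starlark sketch of the first case follows. It assumes the
``add_source`` and ``add_bytecode_optimization_level_*`` attributes on
:py:class:`PythonModuleSource`, and sets them from a resource callback (see
:ref:`packaging_resource_callback`). The callback name ``strip_source`` and the
package name ``my_package`` are hypothetical.

.. code-block:: python

   def strip_source(policy, resource):
       # These add_* attributes live on PythonModuleSource resources.
       if type(resource) == "PythonModuleSource" and (
           resource.name == "my_package" or resource.name.startswith("my_package.")
       ):
           # Don't ship the .py source...
           resource.add_source = False
           # ...and ship bytecode only at optimization level 2.
           resource.add_bytecode_optimization_level_zero = False
           resource.add_bytecode_optimization_level_two = True

   def make_exe():
       dist = default_python_distribution()
       policy = dist.make_python_packaging_policy()
       policy.register_resource_callback(strip_source)
       return dist.to_python_executable(name = "myapp", packaging_policy = policy)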

.. _packaging_resource_types:

Resource Types
==============

The following Starlark types represent individual resources:

:py:class:`PythonModuleSource`
   Source code for a Python module. Roughly equivalent to a ``.py`` file.

   This type can also be converted to Python bytecode (roughly equivalent
   to a ``.pyc``) when added to a resource collector.

:py:class:`PythonExtensionModule`
   A Python module defined through compiled, machine-native code. On Linux,
   these are typically ``.so`` files; on Windows, ``.pyd`` files.

:py:class:`PythonPackageResource`
   A non-module *resource file* loadable by Python resources APIs, such as
   those in ``importlib.resources``.

:py:class:`PythonPackageDistributionResource`
   A non-module *resource file* defining metadata for a Python package.
   Typically accessed via ``importlib.metadata``. This is how files in
   ``*.dist-info`` or ``*.egg-info`` directories are represented.

:py:class:`File`
   Represents a filesystem path and its content.

:py:class:`starlark_tugger.FileContent`
   Represents the content of a filesystem file.

   This is different from :py:class:`File` in that it only
   represents file content and doesn't have an associated path. (It is
   likely these two types will be merged someday.)
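
For orientation, the standard-library sketch below shows how ordinary Python
code reaches the data behind a package resource and a distribution resource.
The package ``demo_pkg``, its ``greeting.txt`` file, and its ``.dist-info``
metadata are hypothetical and are created on the fly in a temporary directory;
in a PyOxidizer build, the same files would be represented as a
:py:class:`PythonPackageResource` and a
:py:class:`PythonPackageDistributionResource`.

.. code-block:: python

   import importlib.metadata
   import importlib.resources
   import sys
   import tempfile
   from pathlib import Path

   # Build a throwaway package plus a minimal .dist-info directory.
   root = Path(tempfile.mkdtemp())
   pkg = root / "demo_pkg"
   pkg.mkdir()
   (pkg / "__init__.py").write_text("")
   (pkg / "greeting.txt").write_text("hello")
   dist_info = root / "demo_pkg-1.0.dist-info"
   dist_info.mkdir()
   (dist_info / "METADATA").write_text(
       "Metadata-Version: 2.1\nName: demo_pkg\nVersion: 1.0\n"
   )

   sys.path.insert(0, str(root))

   # Package resource: read through importlib.resources, not via __file__.
   greeting = importlib.resources.files("demo_pkg").joinpath("greeting.txt").read_text()

   # Distribution resource: read through importlib.metadata.
   version = importlib.metadata.version("demo_pkg")

The temporary directory is not cleaned up; the point is only the two access
paths, which do not care where the files physically live.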

There are also Starlark types that are logically containers for multiple
resources:

:py:class:`starlark_tugger.FileManifest`
   Holds a mapping of relative filesystem paths to
   :py:class:`starlark_tugger.FileContent` instances. This type effectively
   allows modeling a directory tree.

:py:class:`PythonEmbeddedResources`
   Holds a collection of Python resources of various types. (This type is often
   hidden away, e.g. inside a :py:class:`PythonExecutable` instance.)

.. _packaging_resource_locations:

Resource Locations
==================

Resources have the concept of a *location*. A resource's *location*
determines where the data for that resource is packaged and how that
resource is loaded at run-time.

In-Memory
---------

When a Python resource is placed in the *in-memory* location, the content
behind the resource will be embedded in a built binary and loaded from there
by the Python interpreter.

Python modules imported from memory do not have the ``__file__`` attribute
set. This can cause compatibility issues if Python code relies on the
existence of this attribute. See :ref:`no_file` for more.
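
Code that must work both from memory and from the filesystem can avoid
assuming ``__file__`` exists. The helper below is a hypothetical sketch, not a
PyOxidizer API: it returns the module's directory when ``__file__`` is available
and ``None`` otherwise, so callers can fall back to ``importlib.resources``. A
bare ``types.ModuleType`` stands in for an in-memory module because it also
lacks ``__file__``.

.. code-block:: python

   import os
   import types
   from typing import Optional


   def module_directory(module: types.ModuleType) -> Optional[str]:
       """Return the directory containing ``module``, or None if unknown."""
       # The attribute may be missing entirely or set to None; treat both alike.
       path = getattr(module, "__file__", None)
       if path is None:
           return None
       return os.path.dirname(os.path.abspath(path))


   # Stand-in for a module imported from memory: no __file__ attribute.
   in_memory_like = types.ModuleType("in_memory_like")

   # Stand-in for a filesystem-backed module with a known location.
   with_file = types.ModuleType("with_file")
   with_file.__file__ = os.path.join(os.getcwd(), "pkg", "mod.py")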

Filesystem-Relative
-------------------

When a Python resource is placed in the *filesystem-relative* location,
the resource will be materialized as a file next to the produced entity.
For example, a *filesystem-relative* :py:class:`PythonModuleSource` for the
``foo.bar`` Python module added to a :py:class:`PythonExecutable` will be
materialized as the file ``foo/bar.py`` (or ``foo/bar/__init__.py`` if
``foo.bar`` is a package) in a directory next to the built executable.
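
The mapping from module name to file path is mechanical. The hypothetical
helper below (not a PyOxidizer API) spells it out, assuming the resource is
placed under an optional ``prefix`` directory such as the ``lib`` in
``filesystem-relative:lib``.

.. code-block:: python

   from pathlib import PurePosixPath


   def module_source_path(module_name: str, is_package: bool, prefix: str = "") -> str:
       """Return where a filesystem-relative module source would be written."""
       parts = module_name.split(".")
       if is_package:
           # A package is a directory holding __init__.py.
           relative = PurePosixPath(*parts, "__init__.py")
       else:
           # A plain module is <last component>.py inside its parent packages.
           relative = PurePosixPath(*parts[:-1], parts[-1] + ".py")
       return str(PurePosixPath(prefix) / relative) if prefix else str(relative)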

Resources added to *filesystem-relative* locations should be materialized
under paths that preserve the semantics of standard Python file layouts. For
example, for Python source and bytecode modules, it should be possible to
point ``sys.path`` of any Python interpreter at the destination directory and
load the modules.
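
That property can be checked with the standard library alone. The sketch
below hand-builds a ``foo/bar.py`` layout in a temporary ``lib`` directory (the
module's contents are made up for illustration) and imports it with the
ordinary importer, with no PyOxidizer involvement.

.. code-block:: python

   import importlib
   import sys
   import tempfile
   from pathlib import Path

   # Recreate a "filesystem-relative:lib" style layout by hand.
   lib = Path(tempfile.mkdtemp()) / "lib"
   (lib / "foo").mkdir(parents=True)
   (lib / "foo" / "__init__.py").write_text("")
   (lib / "foo" / "bar.py").write_text("ANSWER = 42\n")

   # Point sys.path at the destination directory...
   sys.path.insert(0, str(lib))
   importlib.invalidate_caches()

   # ...and the standard importer can load the module.
   bar = importlib.import_module("foo.bar")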

During packaging, PyOxidizer *indexes* all *filesystem-relative* resources
and embeds metadata about them in the built binary. While the files on the
filesystem may look like a standard Python install layout, loading them is
serviced by PyOxidizer's custom importer, not the standard importer that
Python uses by default.

.. _packaging_resource_custom_policies:

Customizing Python Packaging Policies
=====================================

As described in :ref:`packaging_resource_packaging_policy`, an instance of
the :py:class:`PythonPackagingPolicy` Starlark type is bound to every
entity that creates *resource* instances, and this *packaging policy* is
used to derive the default ``add_*`` attributes, which influence
what happens when a resource is added to some entity.

:py:class:`PythonPackagingPolicy` instances can be customized to influence
what the default values of the ``add_*`` attributes are.

The primary mechanisms for doing this are:

1. Modifying the :py:class:`PythonPackagingPolicy` instance's internal
   state. See :py:class:`PythonPackagingPolicy` for the full
   list of object attributes and methods that can be set or called.
2. Registering a function that will be called whenever a resource
   is created. This enables custom Starlark code to perform
   arbitrarily complex logic to influence settings and enables
   application developers to devise packaging strategies more
   advanced than what PyOxidizer provides out-of-the-box.

The following sections give examples of customized packaging
policies.

.. _packaging_resources_resources_mode:

Changing the Resource Handling Mode
-----------------------------------

As documented in :ref:`packaging_resources_classified_files`, PyOxidizer
can operate on *classified* resources or *files*-based resources.

:py:meth:`PythonPackagingPolicy.set_resource_handling_mode`
exists to change the operating mode of a :py:class:`PythonPackagingPolicy`
instance.

.. code-block:: python

   def make_exe():
       dist = default_python_distribution()

       policy = dist.make_python_packaging_policy()

       # Set policy attributes to only operate on "classified" resource types.
       # (This is the default.)
       policy.set_resource_handling_mode("classify")

       # Set policy attributes to only operate on `File` resource types.
       policy.set_resource_handling_mode("files")

:py:meth:`PythonPackagingPolicy.set_resource_handling_mode` is
just a convenience method for manipulating a collection of attributes on
:py:class:`PythonPackagingPolicy` instances. If you don't like the behavior of
its pre-defined modes, feel free to adjust attributes to suit your needs.
You can even configure things to emit both *classified* and *files*
variants simultaneously!
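
For instance, the following Starlark sketch asks the file scanner to emit both
variants. It assumes the ``file_scanner_classify_files``,
``file_scanner_emit_files``, ``include_classified_resources``, ``allow_files``
and ``include_file_resources`` attributes described on
:py:class:`PythonPackagingPolicy`; verify them against the reference for your
PyOxidizer version before relying on them.

.. code-block:: python

   def make_exe():
       dist = default_python_distribution()
       policy = dist.make_python_packaging_policy()

       # Have the file scanner emit typed resources *and* File resources.
       policy.file_scanner_classify_files = True
       policy.file_scanner_emit_files = True

       # Package classified resources by default...
       policy.include_classified_resources = True
       # ...and accept File resources, but leave them excluded unless a
       # resource callback sets add_include = True on them.
       policy.allow_files = True
       policy.include_file_resources = False

       return dist.to_python_executable(name = "myapp", packaging_policy = policy)

Including both variants by default would package each file twice, which is why
this sketch only includes the classified ones.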

.. _packaging_resource_default_resource_location:

Customizing Default Resource Locations
--------------------------------------

The :py:attr:`PythonPackagingPolicy.resources_location` and
:py:attr:`PythonPackagingPolicy.resources_location_fallback` attributes define
the primary and fallback locations to which resources should be added.
These effectively define the default values for the ``add_location``
and ``add_location_fallback`` attributes on individual resource objects.

The accepted values are:

``in-memory``
   Load resources from memory.

``filesystem-relative:prefix``
   Load resources from the filesystem at a path relative to some entity
   (typically the binary being built), under the directory named by ``prefix``.

Additionally, :py:attr:`PythonPackagingPolicy.resources_location_fallback` can be
set to ``None`` to remove a fallback location.

Here is how you would manage these values in Starlark:

.. code-block:: python

   def make_exe():
       dist = default_python_distribution()

       # Only allow resources to be added to the in-memory location.

       policy = dist.make_python_packaging_policy()
       policy.resources_location = "in-memory"
       policy.resources_location_fallback = None

       exe = dist.to_python_executable(
           name = "myapp",
           packaging_policy = policy,
       )

       # Only allow resources to be added to the filesystem-relative location under
       # a "lib" directory.

       policy = dist.make_python_packaging_policy()
       policy.resources_location = "filesystem-relative:lib"
       policy.resources_location_fallback = None

       exe = dist.to_python_executable(
           name = "myapp",
           packaging_policy = policy,
       )

       # Try to add resources to in-memory first. If that fails, add them to a
       # "lib" directory relative to the built executable.

       policy = dist.make_python_packaging_policy()
       policy.resources_location = "in-memory"
       policy.resources_location_fallback = "filesystem-relative:lib"

       exe = dist.to_python_executable(
           name = "myapp",
           packaging_policy = policy,
       )

       # Only the last executable is returned; the others show alternatives.
       return exe
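
The primary/fallback semantics can be modeled in a few lines of Python. This is
a hypothetical model for illustration, not PyOxidizer's implementation: the
``supported`` set stands in for whichever locations a given resource and build
configuration can actually use (see
:ref:`python_extension_module_location_compatibility` for why that set varies).

.. code-block:: python

   from typing import Optional, Set


   def choose_location(primary: str, fallback: Optional[str], supported: Set[str]) -> str:
       """Pick the location a resource would land in."""

       # "in-memory" or "filesystem-relative:<prefix>"; only the part before
       # the colon decides whether the location is usable.
       def kind(location: str) -> str:
           return location.split(":", 1)[0]

       if kind(primary) in supported:
           return primary
       if fallback is not None and kind(fallback) in supported:
           return fallback
       # With the fallback set to None there is nowhere else to go.
       raise ValueError(f"cannot add resource to {primary!r} (fallback: {fallback!r})")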

.. _packaging_resource_callback:

Using Callbacks to Influence Resource Attributes
------------------------------------------------

The :py:meth:`PythonPackagingPolicy.register_resource_callback` method will
register a function to be called when resources are created. This function
receives as arguments the active :py:class:`PythonPackagingPolicy` and the newly
created resource.

Functions registered as resource callbacks are called after the
``add_*`` attributes are derived for a resource but before the resource
is otherwise made available to other Starlark code. This means that
these callbacks provide a hook point where resources can be modified as
soon as they are created.

``register_resource_callback()`` can be called multiple times to register
multiple callbacks. Registered functions will be called in order of
registration.
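
The ordering guarantee can be pictured with a small Python model. The
``Policy`` class, its ``create_resource()`` method, and the dict-based
resources below are hypothetical stand-ins rather than PyOxidizer internals;
they only mirror the sequence described above: derive ``add_*`` defaults, run
callbacks in registration order, then hand the resource to the rest of the
configuration.

.. code-block:: python

   from typing import Callable, Dict, List

   Resource = Dict[str, object]
   Callback = Callable[["Policy", Resource], None]


   class Policy:
       def __init__(self, default_location: str) -> None:
           self.default_location = default_location
           self._callbacks: List[Callback] = []

       def register_resource_callback(self, func: Callback) -> None:
           # Callbacks accumulate; registering one never replaces another.
           self._callbacks.append(func)

       def create_resource(self, name: str) -> Resource:
           # 1. Derive add_* defaults from the policy.
           resource: Resource = {"name": name, "add_location": self.default_location}
           # 2. Run every callback, in registration order.
           for func in self._callbacks:
               func(self, resource)
           # 3. Only now does other code get to see the resource.
           return resource


   policy = Policy("in-memory")
   seen = []
   policy.register_resource_callback(lambda p, r: seen.append(("first", r["add_location"])))
   policy.register_resource_callback(lambda p, r: r.update(add_location="filesystem-relative:lib"))
   policy.register_resource_callback(lambda p, r: seen.append(("third", r["add_location"])))
   res = policy.create_resource("foo.bar")

The third callback observes the change made by the second, because callbacks
run sequentially against the same resource.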

Resource callbacks can be used to consolidate all resource packaging logic in
a single place, making your Starlark configuration files easier to reason
about.

Here's an example showing how to route all resources belonging to
a single package to a ``filesystem-relative`` location and everything
else to memory:

.. code-block:: python

   def resource_callback(policy, resource):
       if type(resource) in ("PythonModuleSource", "PythonPackageResource", "PythonPackageDistributionResource"):
           if resource.package == "my_package":
               resource.add_location = "filesystem-relative:lib"
           else:
               resource.add_location = "in-memory"

   def make_exe():
       dist = default_python_distribution()

       policy = dist.make_python_packaging_policy()
       policy.register_resource_callback(resource_callback)

       exe = dist.to_python_executable(
           name = "myapp",
           packaging_policy = policy,
       )

       exe.add_python_resources(exe.pip_install(["my_package"]))

       return exe
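
The equality test ``resource.package == "my_package"`` matches only the
package named exactly ``my_package``; resources belonging to a subpackage such
as ``my_package.sub`` would report a different package name and fall through to
the ``else`` branch. If subpackages should be routed the same way, a prefix
check is one option. The predicate is shown in plain Python so it can be
tested; the helper name ``in_package_tree`` is hypothetical, and the same
``==``/``startswith()`` expression is also valid Starlark.

.. code-block:: python

   def in_package_tree(package: str, root: str) -> bool:
       """True if ``package`` is ``root`` itself or any subpackage of it."""
       # Compare against "root." so that e.g. "my_package_extras" is not matched.
       return package == root or package.startswith(root + ".")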

.. _python_extension_module_location_compatibility:

``PythonExtensionModule`` Location Compatibility
================================================

Many resources *just work* in any available location. This is not the case for
:py:class:`PythonExtensionModule` instances!

While only a single :py:class:`PythonExtensionModule` type exists to represent
Python extension modules, extension modules themselves come in various flavors.
Examples of flavors include:

* A module that is part of a Python *distribution* and is compiled into
  ``libpython`` (a *builtin* extension module).
* A module that is part of a Python *distribution* that is compiled as a
  standalone shared library (e.g. a ``.so`` or ``.pyd`` file).
* A non-*distribution* module that is compiled as a standalone shared library.
* A non-*distribution* module that is compiled as a static library.

Not all extension module *flavors* are compatible with all Python
*distributions*. Furthermore, not all *flavors* are compatible with all
build configurations.

Here are some of the rules governing extension modules and their locations:

* A *builtin* extension module that's part of a Python *distribution* will
  always be statically linked into ``libpython``.
* A Windows Python distribution with a statically linked ``libpython``
  (e.g. the ``standalone_static`` *distribution flavor*) is not capable
  of loading extension modules defined as shared libraries and only supports
  loading *builtin* extension modules statically linked into the binary.
* A Windows Python distribution with a dynamically linked ``libpython``
  (e.g. the ``standalone_dynamic`` *distribution flavor*) is capable of
  loading shared-library-backed extension modules from the *in-memory*
  location. Other operating systems do not support the *in-memory* location
  for loading shared library extension modules.
* If the current build configuration targets Linux MUSL-libc, shared library
  extension modules are not supported and all extensions must be statically
  linked into the binary.
* If the object files for the extension module are available, the extension
  module may be statically linked into the produced binary.
* If loading extension modules from memory is supported, the
  extension module will have its dynamic library embedded in the binary.
* Otherwise, the extension module will be materialized as a file next to the
  produced binary and will be loaded from the filesystem. (This is how Python
  extension modules typically work.)
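
One way to read the last three rules is as an order of preference. The Python
function below is a hypothetical model of that reading, written only to make
the precedence explicit. It is not PyOxidizer's selection code, and its boolean
parameters are invented inputs rather than real configuration options.

.. code-block:: python

   def extension_module_strategy(
       *,
       builtin: bool,
       has_object_files: bool,
       shared_library_supported: bool,
       in_memory_shared_library_supported: bool,
   ) -> str:
       """Return how an extension module would be packaged under the rules above."""
       # Builtin distribution modules are always part of libpython.
       if builtin:
           return "static-link"
       # Object files available: the module can be linked into the binary.
       if has_object_files:
           return "static-link"
       # e.g. musl-libc targets or static Windows distributions cannot load
       # shared library extension modules at all.
       if not shared_library_supported:
           raise ValueError("shared library extension modules are not supported")
       # e.g. Windows dynamic distributions can load the library from memory.
       if in_memory_shared_library_supported:
           return "in-memory"
       # Otherwise, fall back to a file next to the binary.
       return "filesystem-relative"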

.. note::

   Extension module handling is one of the more nuanced aspects of PyOxidizer.
   There are likely many subtle bugs and room for improvement. If you
   experience problems handling extension modules, please consider
   `filing an issue <https://github.com/indygreg/PyOxidizer/issues>`_.