
.. py:currentmodule:: starlark_pyoxidizer

.. _rust_porting:

====================================
Porting a Python Application to Rust
====================================

PyOxidizer can be used to gradually port a Python application to Rust.
By this we mean that Python code in the application is rewritten in Rust
piece by piece, while the application keeps working throughout.

Overview
========

When porting a Python application to Rust, the goal is to port Python
code - and possibly Python C extension code - to Rust. Parts of the Rust
code will presumably need to call into Python code and vice-versa.

When porting code to Rust, there are essentially two *flavors* of Rust
code that will be written and executed:

1. *Vanilla* Rust code
2. *Python-flavored* Rust code

*Vanilla* Rust code is standard Rust code. It is what you would write if
authoring a Rust-only project.

*Python-flavored* Rust code is Rust code that interacts with the Python C
API. It is regular Rust code, of course, but it is littered with references
to ``PyObject`` and function calls into the Python C API (although these
function calls may be abstracted so you don't have to use ``unsafe``).

These different *flavors* of Rust code dictate different approaches to
porting. Both *flavors*/approaches can be used simultaneously when porting
an application to Rust.

*Vanilla* Rust code will supplement the boilerplate Rust code that PyOxidizer
uses to define and build a standalone executable embedding Python. See
:ref:`extending_rust_projects` for more.

*Python-flavored* Rust code typically involves writing Python extension
modules in Rust. In this approach, you create Python extension modules
implemented in Rust and then make them available to the Python interpreter,
which is managed by a Rust project.

.. _extending_rust_projects:

Extending Rust Projects
=======================

When building an application from a standalone ``pyoxidizer.bzl`` file,
PyOxidizer creates and builds a temporary, boilerplate Rust project behind
the scenes. This Rust project has just enough code to initialize and run an
embedded Python interpreter. That's the extent of the Rust code.

PyOxidizer also supports persistent Rust projects. In this mode, you have
full control over the Rust project: you can add custom Rust code to it and
run that code independently of the Python interpreter.

Supplementing the Rust code contained in your executable gives you the power
to run arbitrary Rust code however you see fit. Here are some common scenarios
this can enable:

* Implementing argument parsing in Rust instead of Python. This could allow you
  to parse out the sub-command being invoked and dispatch to pure Rust code
  paths if possible, falling back to running Python code only if necessary.
* Running a *forking* server, which doesn't start a Python interpreter until an
  event occurs.
* Starting a thread with a high-performance application component implemented in
  Rust. For example, you could run a thread servicing a high-performance logging
  subsystem or HTTP server implemented in Rust and have that thread interact with
  a Python interpreter via a pipe or some other handle.
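
As a rough illustration of the first scenario, the sketch below inspects the
sub-command in Rust and only falls back to Python when no Rust implementation
exists. The ``Action`` enum, the ``dispatch()`` function, and the ``version``
sub-command are hypothetical names invented for this example; they are not
part of PyOxidizer.

.. code-block:: rust

    /// What the executable should do for a given command line.
    /// (Hypothetical type for illustration; not part of PyOxidizer.)
    #[derive(Debug, PartialEq)]
    enum Action {
        /// Handled entirely in Rust; no Python interpreter is started.
        PrintVersion,
        /// No Rust implementation exists, so defer to the Python application.
        RunPython(Vec<String>),
    }

    /// Decide how to handle the arguments that follow the program name.
    fn dispatch(args: &[String]) -> Action {
        match args.first().map(String::as_str) {
            Some("version") => Action::PrintVersion,
            _ => Action::RunPython(args.to_vec()),
        }
    }

In ``main()``, you could call ``dispatch()`` on
``std::env::args().skip(1).collect::<Vec<String>>()`` and only run the
interpreter code for ``Action::RunPython``.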

Getting Started
---------------

To extend a Rust project with custom Rust code, you'll first want to
materialize the boilerplate Rust project used by PyOxidizer::

    $ pyoxidizer init-rust-project myapp

See :ref:`rust_projects` for details on the files materialized by this command.

If you are using version control, now would be a good time to add the created
files to version control. e.g.::

    $ git add myapp
    $ git commit -m 'create boilerplate PyOxidizer project'

From here, your next steps are to modify the Rust project to do something
new and different.

The auto-generated ``src/main.rs`` file contains the ``main()`` function used
as the entrypoint for the Rust executable. The default file will simply
instantiate a Python interpreter from a configuration, run that interpreter,
then exit the process.

To extend your application with custom Rust code, simply add custom code to
``main()``. e.g.

.. code-block:: rust

   fn main() {
       println!("hello from Rust!");

       // Code auto-generated by ``pyoxidizer init-rust-project`` goes here.
       // ...
   }

That is literally all there is to it!
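
For the thread-based scenario listed earlier, a minimal sketch using only the
Rust standard library might look like the following. The ``spawn_logger()``
helper is a hypothetical name; it starts a worker thread that collects log
lines sent over a channel, independent of any Python interpreter.

.. code-block:: rust

    use std::sync::mpsc::{channel, Sender};
    use std::thread::{self, JoinHandle};

    /// Start a worker thread that receives log lines over a channel.
    /// Returns the sending half and a handle yielding every line received.
    /// (Hypothetical helper for illustration; not part of PyOxidizer.)
    fn spawn_logger() -> (Sender<String>, JoinHandle<Vec<String>>) {
        let (tx, rx) = channel::<String>();
        let handle = thread::spawn(move || {
            let mut lines = Vec::new();
            // The loop ends once every `Sender` has been dropped.
            for line in rx {
                lines.push(line);
            }
            lines
        });
        (tx, handle)
    }

A real logging subsystem would write each line somewhere instead of
collecting it, but the shape is the same: ``main()`` spawns the thread
before running the interpreter code and drops the sender on shutdown.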

``pyoxidizer build`` is the most robust way to build your custom Rust
project, but it is also possible to use ``cargo build``.

What Can Go Wrong
-----------------

``pyoxidizer`` Not Found or Rust Code Version Mismatch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using ``cargo build``, the ``pyoxidizer`` executable will be invoked behind
the scenes. This requires that executable to be on ``PATH`` and for the version
to be compatible with the Rust code you are trying to build. (The Rust APIs do
change from time to time.)

If the ``pyoxidizer`` executable is not on ``PATH`` or its version doesn't
match the Rust code, you can forcefully tell the Rust build system which
``pyoxidizer`` executable to use::

    $ PYOXIDIZER_EXE=/path/to/pyoxidizer cargo build
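
Conceptually, the lookup behaves like the sketch below: an explicit
``PYOXIDIZER_EXE`` value wins, otherwise the bare name ``pyoxidizer`` is left
to be resolved through ``PATH``. This is an illustration built on that
assumption rather than the project's actual build script, and
``resolve_pyoxidizer_exe()`` is a hypothetical function.

.. code-block:: rust

    use std::path::PathBuf;

    /// Pick the `pyoxidizer` executable to run.
    /// `env_value` is the value of `PYOXIDIZER_EXE`, if it is set.
    /// (Hypothetical helper; the real build script may differ.)
    fn resolve_pyoxidizer_exe(env_value: Option<String>) -> PathBuf {
        match env_value {
            // An explicit, non-empty override always wins.
            Some(path) if !path.is_empty() => PathBuf::from(path),
            // Otherwise rely on the operating system searching `PATH`.
            _ => PathBuf::from("pyoxidizer"),
        }
    }

A caller could pass ``std::env::var("PYOXIDIZER_EXE").ok()`` as the argument.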

``thread 'main' panicked at 'jemalloc is not available in this build configuration'``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you see this error, the problem is that the Python interpreter configuration
says to use *jemalloc* as the memory allocator but the Rust project was built
without *jemalloc* support. This is likely because the default features in the
Rust project's ``Cargo.toml`` don't include *jemalloc*.
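
The mismatch can be pictured with the following sketch. It is not the code
PyOxidizer uses: ``check_allocator()`` is a hypothetical function, and the
``allocator-jemalloc`` feature name is the one used later in this section.

.. code-block:: rust

    /// Verify that the allocator requested by the Python configuration was
    /// compiled into the binary. `jemalloc_built` would typically come from
    /// `cfg!(feature = "allocator-jemalloc")`.
    /// (Hypothetical helper for illustration.)
    fn check_allocator(requested: &str, jemalloc_built: bool) -> Result<(), String> {
        if requested == "jemalloc" && !jemalloc_built {
            return Err("jemalloc is not available in this build configuration".to_string());
        }
        Ok(())
    }

The request comes from configuration evaluated at run time, while the
feature is fixed at compile time, which is why the two can disagree.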

You can resolve this issue by either disabling jemalloc in the Python
configuration or by enabling jemalloc in Rust.

To disable jemalloc, open your ``pyoxidizer.bzl`` file and find where
``allocator_backend`` is set. You can set it to ``"default"`` so Python uses
the system memory allocator instead of jemalloc.

To enable jemalloc, you have a few options.

First, you could build the Rust project with jemalloc support::

    $ cargo build --features allocator-jemalloc

Or you can modify ``Cargo.toml`` so the *allocator-jemalloc* feature is
enabled by default:

.. code-block:: toml

   [features]
   default = ["build-mode-pyoxidizer-exe", "allocator-jemalloc"]

*jemalloc* is typically a faster allocator than the system allocator. So if
you care about performance, you may want to use it.

Implementing Python Extension Modules in Rust
=============================================

If you want to port a Python application to Rust, chances are that you
will need to have Rust and Python code interact with each other. A common
way to do this is to implement Python extensions in Rust so that Rust code
will be invoked as a Python interpreter is running.

There are two ways Rust-implemented Python extension modules can be
consumed by PyOxidizer:

1. Define them via Python packaging tools (e.g. via a ``setup.py`` file
   for your Python package).
2. Define them in Rust code and register them as a *built-in* extension
   module.

Python Built Rust Extension Modules
-----------------------------------

If you've defined a Rust Python extension module via a Python package
build tool (e.g. inside a ``setup.py``), PyOxidizer should automatically
detect said extension module as part of packaging the corresponding Python
package: there is no need to take special action to tell PyOxidizer it is
a Rust extension, as this is all handled by Python packaging tools invoked
as part of processing your ``pyoxidizer.bzl`` file.

See :ref:`packaging` for more.

The topic of authoring Python extension modules implemented in Rust is
arguably outside the scope of this documentation. A web search for
``Rust Python extension`` should set you on the right track.

Built-in Rust Extension Modules
-------------------------------

A Python extension module is defined as a ``PyInit_<name>`` function which
is called to initialize an extension module. Typically, Python extension
modules are compiled as standalone shared libraries, which are then loaded
into a process, after which their ``PyInit_<name>`` function is called.

But Python has an additional mechanism for defining extension modules:
*built-ins*. A *built-in* extension module is simply an extension module
whose ``PyInit_<name>`` function is already present in the process address
space. Typically, these are extensions that are part of the Python distribution
itself and are compiled directly into ``libpython``.

When you instantiate a Python interpreter, you give it a list of the
available *built-in* Python extension modules. And PyOxidizer's ``pyembed``
crate allows you to supplement the default list with custom extensions.
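
The idea of a list of built-ins can be modeled in plain Rust. The sketch below
is a simplified stand-in, not the ``pyembed`` API: ``BuiltinModule``,
``find_builtin()``, and ``init_example()`` are hypothetical names, and
``*mut c_void`` stands in for a pointer to a Python module object.

.. code-block:: rust

    use std::ffi::c_void;

    /// Signature shared by module initialization functions in this model.
    type InitFn = extern "C" fn() -> *mut c_void;

    /// One entry in the table of built-in modules: a name plus the address
    /// of an init function that is already linked into the process.
    struct BuiltinModule {
        name: &'static str,
        init: InitFn,
    }

    /// Placeholder init function; a real one would create a module object.
    extern "C" fn init_example() -> *mut c_void {
        std::ptr::null_mut()
    }

    /// Look up a built-in by name, as an importer would before calling `init`.
    fn find_builtin<'a>(table: &'a [BuiltinModule], name: &str) -> Option<&'a BuiltinModule> {
        table.iter().find(|m| m.name == name)
    }

No shared library is loaded at any point: the table only holds function
pointers, which is what distinguishes a *built-in* from a standalone
extension module.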

To use *built-in* extension modules implemented in Rust, you'll need to
implement said extension module in Rust, either as part of your application's
Rust crate or as part of a different crate. Either way, you'll need to
extend the boilerplate Rust project code (see :ref:`extending_rust_projects`)
and tell it about additional *built-in* extension modules. See
:ref:`pyembed_extension_modules` for instructions on how to do this.

The tricky part here is implementing your Rust extension module.

You probably want to use the `cpython <https://crates.io/crates/cpython>`_
or `PyO3 <https://crates.io/crates/PyO3>`_ Rust crates for interfacing with the
CPython API, as these provide an interface that is more ergonomic and doesn't
require use of ``unsafe { }``. Use of these crates is beyond the scope of the
PyOxidizer documentation.

If you attempt to use the ``cpython`` or ``PyO3`` macros for defining a
Python extension module, you'll likely run into problems because these assume
that extension modules are standalone shared libraries, which isn't the case for
*built-in* extension modules!

If you attempt to use a separate Rust crate to define your extension module,
you may run into Python symbol issues at link time because the build system
for the ``cpython`` and ``PyO3`` crates will use their own logic for locating
a Python interpreter and that interpreter may not have a configuration that
is compatible with the one embedded in your PyOxidizer binary!

At the end of the day, all you need to register a *built-in* extension module
with PyOxidizer is an ``extern "C" fn () -> *mut python3_sys::PyObject``. Here
is the boilerplate for defining a Python extension module in Rust (this uses
the ``cpython`` crate).

.. code-block:: rust

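    use cpython::{PyErr, PyModule, PyObject, PyResult, Python};
    use python3_sys as pyffi;

    /// Placeholder for per-module state. Its size is reported via `m_size`.
    struct ModuleState {}

    /// Placeholder: populate the module with its functions and types here.
    fn module_init(_py: Python, _module: &PyModule) -> PyResult<()> {
        Ok(())
    }

    static mut MODULE_DEF: pyffi::PyModuleDef = pyffi::PyModuleDef {
        m_base: pyffi::PyModuleDef_HEAD_INIT,
        // The name and docs are filled in on first use by the init function.
        m_name: std::ptr::null(),
        m_doc: std::ptr::null(),
        m_size: std::mem::size_of::<ModuleState>() as isize,
        m_methods: std::ptr::null_mut(),
        m_slots: std::ptr::null_mut(),
        m_traverse: None,
        m_clear: None,
        m_free: None,
    };

    #[allow(non_snake_case)]
    pub extern "C" fn PyInit_my_module() -> *mut pyffi::PyObject {
        // Python calls module init functions with the GIL held.
        let py = unsafe { Python::assume_gil_acquired() };

        unsafe {
            if MODULE_DEF.m_name.is_null() {
                // The C API expects NUL-terminated strings.
                MODULE_DEF.m_name = "my_module\0".as_ptr() as *const _;
                MODULE_DEF.m_doc = "usage docs\0".as_ptr() as *const _;
            }
        }

        let module = unsafe { pyffi::PyModule_Create(std::ptr::addr_of_mut!(MODULE_DEF)) };

        if module.is_null() {
            return module;
        }

        // Wrap the raw pointer so the `cpython` crate manages the reference.
        let module = match unsafe { PyObject::from_owned_ptr(py, module).cast_into::<PyModule>(py) } {
            Ok(m) => m,
            Err(e) => {
                PyErr::from(e).restore(py);
                return std::ptr::null_mut();
            }
        };

        match module_init(py, &module) {
            // Hand ownership of the module back to the Python interpreter.
            Ok(()) => module.into_object().steal_ptr(),
            Err(e) => {
                e.restore(py);
                std::ptr::null_mut()
            }
        }
    }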

If you want a concrete example of what this looks like and how to do things like
define Python types and have Python functions implemented in Rust, do a search for
``PyInit_oxidized_importer`` in the source code of the ``pyembed`` crate (which
is part of the PyOxidizer repository) and go from there.

The documentation for authoring Python extension modules and using the Python
C API is well beyond the scope of this document. A good place to start is the
`official documentation <https://docs.python.org/3/extending/index.html>`_.