Statically linking Python with Cython generated modules and packages!

Aug 19, 2011

Programming, Python


My latest adventures with SDL bindings eventually led me to Cython, a tool I can recommend if you are looking to squeeze a little more performance out of your Python app, or just want to obscure your source a bit. It compiles almost any Python code as is, and it adds language extensions, such as statically typed variables, that allow even faster generated code. Cython can also compile your .py/.pyx modules automatically, and you load the results dynamically with the familiar import statement. Using the imported module is exactly the same as using a native .py module or a compiled C module.

At this point, it’s important to mention that the generated .c files and their compiled versions depend on the Python runtime; you can’t make a standalone executable out of them…at least not easily. Now, let’s suppose you didn’t read the “not easily” part I just mentioned, and that you want to integrate this module (or any other module you wrote in C from scratch) into a statically linked Python interpreter. How would you go about it?

The following instructions were tested under 64-bit Ubuntu Natty. First, download the Python source (this post targets Python 2; the C snippets further down use Python 2 APIs such as Py_InitModule). Extract it, copy Modules/Setup.dist to Modules/Setup and run configure with the following parameters:

./configure LDFLAGS="-Wl,--no-export-dynamic -static-libgcc -static" CPPFLAGS="-static -fPIC" LINKFORSHARED=" " DYNLOADFILE="dynload_stub.o" --disable-shared --prefix="/path/to/where/you/want/it/installed"

 

Followed by the all too familiar make && make install

You will see A LOT of errors that you can ¿safely? ignore, mostly related to the fact that the C modules that ship with Python won’t compile in static mode without some help. Once this craziness stops, you’ll have a static Python interpreter (you can check with ldd ./python, which should report that it is not a dynamic executable).

Now, this Python interpreter is severely lacking in content, and no one wants to reinvent the wheel, especially such a fine wheel as the one Python provides… Go to that Modules/Setup file and take a look: search for the #*shared* line and replace it with *static* (with no # sign), then look for some notable modules and uncomment them. Run the process again (configure and make) and this time you’ll end up with some builtin modules that you can import.
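If you would rather script that edit than do it by hand, here is a minimal sketch. It assumes the commented-out entries in Modules/Setup look like "#name source.c ..." on a single line, which is how I remember the Python 2 file; the function name enable_static_modules is made up for this post, and multi-line entries (the ones continued with a backslash) are not handled.

```python
import re


def enable_static_modules(setup_text, wanted):
    """Return setup_text with '#*shared*' switched to '*static*' and the
    single-line entries for the modules named in `wanted` uncommented."""
    wanted = set(wanted)
    out = []
    for line in setup_text.splitlines():
        if line.strip() == "#*shared*":
            out.append("*static*")
            continue
        # Assumption: a disabled module entry looks like "#name source.c ..."
        match = re.match(r"#(\w+)\s+\S+\.c\b", line)
        if match and match.group(1) in wanted:
            line = line[1:]  # drop the leading '#'
        out.append(line)
    return "\n".join(out) + "\n"
```

You would read Modules/Setup, pass its contents through this function with the list of modules you want, and write the result back before running configure and make again.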

By now, you are probably catching my drift… Let’s suppose you have a module test.py. Run “cython test.py” on it and you’ll get a test.c file. Copy it to Modules under the Python source, and edit Modules/Setup, adding a line:

 

test test.c

Do the configure and make dance again, and now you should be able to do “import test” in the new Python interpreter, which will load the module as builtin. Neat, right?
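A quick way to confirm that a module really was compiled in, rather than found somewhere on disk, is to look at sys.builtin_module_names from inside the new interpreter. A tiny sketch (the helper name is_compiled_in is mine, not part of Python):

```python
import sys


def is_compiled_in(name):
    """True if `name` is in the interpreter's table of builtin modules."""
    return name in sys.builtin_module_names
```

Run inside the freshly built interpreter, is_compiled_in("test") should be true if the Modules/Setup line was picked up.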

If you go further down the rabbit hole and start depending on 3rd party libraries (or your own!), you will need to pay attention to how dependencies are specified in Modules/Setup. In short, you put whatever compiler and linker directives you need after the source files for the module.
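As an illustration, this is roughly how such a line could be put together from a script. The helper setup_entry and the /opt/foo paths are hypothetical; the only thing taken from the Setup format is the order: module name, then source files, then compiler and linker flags.

```python
def setup_entry(name, sources, flags=()):
    """Build one Modules/Setup line: module name, sources, then flags."""
    return " ".join([name] + list(sources) + list(flags))


# Hypothetical module that needs headers and a library from /opt/foo
line = setup_entry(
    "test",
    ["test.c"],
    ["-I/opt/foo/include", "-L/opt/foo/lib", "-lfoo"],
)
print(line)
```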

This is all fine and dandy, but we haven’t broken anything yet… Let’s try something more advanced: imagine you have a full Python package already made (as in a full hierarchy of modules arranged in folders and subfolders, etc.), and you want to do the same Cython fueled embedding with it. After hitting your head on the wall for a looong while, you’ll figure out that you actually can’t (easily) do it, basically because the Python interpreter’s builtin system is not geared towards packages, but rather towards shallow modules.

So, there are two ways around it (that I know of). The first one is to use a series of shallow modules and string them into a package-like structure by importing the submodules from the parent modules…

main.py
import submod1 as _submod1
submod1 = _submod1

This is boring, error prone and requires a lot of glue code, and it doesn’t play well with your module structure if you want it to also work in non-compiled mode.

The alternative is hacking the Python code just a little bit. Namely, in the Python/import.c file, look for the find_module function and add:

    /* fullname can be NULL (e.g. when called through imp.find_module) */
    if (fullname != NULL && is_builtin(fullname)) {
        strcpy(buf, fullname);
        return &fd_builtin;
    }

Place this code near the top of the function; right above the “if (path != NULL && PyString_Check(path)) {” line seems like a good place. What it does is check the full module name (package1.package2.module) to see if it is builtin. The official Python code doesn’t do this; it checks only the bare module name, for the reasons stated above.

Besides this little patch, you have to alter the Cython generated code just a bit. Look for the “Py_InitModule4” line and replace the module name with the fully qualified name (if the module is package1.package2.module, that line will only say “module”; you need to replace it with the whole enchilada). Doing this by hand is a PITA, but a simple find+sed command takes care of it swiftly. Also, while you are unleashing your sed kung fu, take care of the init??? functions: if a module is at package1.package2.mymodule, replace initmymodule with initpackage1_package2_mymodule (the reason why you have to do this will become clear later…or maybe not and I’m just making this stuff up).
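If sed is not your thing, the same two substitutions can be sketched in Python. This is only a sketch under assumptions: the function name qualify_module is made up, and it assumes the generated file mentions the short module name as a quoted string on the Py_InitModule4 line and defines init&lt;shortname&gt;, which is what I saw with the Cython version I used. Check your own generated .c files before trusting it.

```python
import re


def qualify_module(c_source, dotted_name):
    """Rewrite Cython generated C so the module registers under its full
    dotted name and its init function gets a unique, flattened name."""
    short = dotted_name.rsplit(".", 1)[-1]
    init_name = "init" + dotted_name.replace(".", "_")
    out = []
    for line in c_source.splitlines(True):
        if "Py_InitModule4" in line:
            # "module" -> "package1.package2.module"
            line = line.replace('"%s"' % short, '"%s"' % dotted_name)
        # initmodule -> initpackage1_package2_module
        line = re.sub(r"\binit%s\b" % re.escape(short), init_name, line)
        out.append(line)
    return "".join(out)
```

You would call it once per generated file, passing the file contents and the dotted name that matches the file’s place in the package.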

Now, you have to go back to Modules/Setup and edit the module line you added, appending all your sources (seems like a good job for a Python script, right?). If you run configure and make at this point, you’ll see that…it doesn’t quite work. Why? Because Python depends on a __path__ variable to figure out which module is a package and which one is just a module. Yes, you need to add those…

This is simple enough: in every package’s __init__.py file, add a __path__ = ['package1/package2/…'] line with the right path for the location of the file.
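Adding that line by hand to every __init__.py gets old quickly, so here is a rough sketch of a script for it. The function name add_package_paths is hypothetical, and it assumes the path you want is the package directory relative to the parent of the top-level package (so package1/package2 for the nested package); adjust that if your layout differs. Do this before running cython on the files.

```python
import os


def add_package_paths(root):
    """Append a __path__ line to every __init__.py under `root`.
    Returns the list of files that were changed."""
    changed = []
    parent = os.path.dirname(os.path.abspath(root))
    for dirpath, dirnames, filenames in os.walk(root):
        if "__init__.py" not in filenames:
            continue
        # Assumption: path relative to the parent of the top-level package
        rel = os.path.relpath(os.path.abspath(dirpath), parent)
        rel = rel.replace(os.sep, "/")
        init_py = os.path.join(dirpath, "__init__.py")
        with open(init_py) as f:
            text = f.read()
        if "__path__" in text:
            continue  # already patched, leave it alone
        if text and not text.endswith("\n"):
            text += "\n"
        with open(init_py, "w") as f:
            f.write(text + "__path__ = [%r]\n" % rel)
        changed.append(init_py)
    return changed
```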

And finally, you are ready…well, not yet. There are two more things you need to do. First, as the Python build system is geared towards shallow modules, you’ll have a problem if files in different subpackages have the same name, as they’ll end up overwriting each other when they are compiled (this will certainly happen for the __init__.py files), so you have to figure out a way to flatten your structure before adding the files to Modules/Setup. What I do is scan the whole structure and copy the *.c files to a separate folder, replacing the ‘/’ directory separator with a ‘+’ sign. This way package1/package2/module.c becomes package1+package2+module.c. Then add all these files to the same line in Modules/Setup, and then comes the final piece of glue:
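The flattening step could look something like this. It is a sketch, not the exact script I use: the function name flatten_c_sources is made up, and it assumes the output folder lives outside the package tree so the copies are not picked up again by the scan.

```python
import os
import shutil


def flatten_c_sources(package_root, out_dir):
    """Copy every .c file under package_root into out_dir, replacing the
    directory separator with '+'. Returns the sorted flat file names."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    parent = os.path.dirname(os.path.abspath(package_root))
    flat = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        for filename in filenames:
            if not filename.endswith(".c"):
                continue
            src = os.path.join(dirpath, filename)
            # package1/package2/module.c -> package1+package2+module.c
            rel = os.path.relpath(os.path.abspath(src), parent)
            flat_name = rel.replace(os.sep, "+")
            shutil.copy(src, os.path.join(out_dir, flat_name))
            flat.append(flat_name)
    return sorted(flat)
```

Joining the returned names with spaces gives you the list of sources to put on the Modules/Setup line.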

If your overall package is called…let’s say “test” to be creative, create a test.c file with something like this:

 

#include "Python.h"

static PyMethodDef nomethods[] = { {NULL, NULL} };

/* Renamed init functions of the Cython generated modules */
extern void inittest_module1(void);
extern void inittest_package1(void);
extern void inittest_package1_submodule(void);

PyMODINIT_FUNC
inittest(void)
{
    PyObject* package_test;
    PyObject* __path__;

    /* Create the top level "test" module */
    package_test = Py_InitModule("test", nomethods);
    if (package_test == NULL)
        return;

    /* Add a __path__ attribute so Python knows that this is a package */
    __path__ = PyList_New(1);
    PyList_SetItem(__path__, 0, PyString_FromString("test"));
    PyModule_AddObject(package_test, "__path__", __path__);

    /* Register every submodule in the builtin table under its full name */
    PyImport_AppendInittab("test.module1", inittest_module1);
    PyImport_AppendInittab("test.package1", inittest_package1);
    PyImport_AppendInittab("test.package1.submodule", inittest_package1_submodule);
}

Append this file to the Modules/Setup line as well. What this file does is create a “test” package, set up the __path__ variable accordingly, and append all of our modules to Python’s internal builtin table. Now the reason for renaming the init functions earlier should become clear (just nod even if you got lost at Statically linking Python…)
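With more than a handful of modules, writing the extern declarations and the PyImport_AppendInittab calls by hand is asking for typos, so they can be generated. A minimal sketch, assuming the init functions were renamed with the init + dotted_name_with_underscores convention described earlier (the helper name inittab_lines is hypothetical):

```python
def inittab_lines(dotted_names):
    """Return (externs, calls): the C declarations and the
    PyImport_AppendInittab calls for the given dotted module names."""
    externs, calls = [], []
    for name in dotted_names:
        func = "init" + name.replace(".", "_")
        externs.append("extern void %s(void);" % func)
        calls.append('    PyImport_AppendInittab("%s", %s);' % (name, func))
    return externs, calls


externs, calls = inittab_lines(["test.package1", "test.package1.submodule"])
print("\n".join(externs))
print("\n".join(calls))
```

Paste the two chunks into the glue file, or extend the script to write the whole file if you feel like it.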

Finally, run configure and make for the last time and your builtin package should be there…or not. There are literally a hundred places where things could go wrong, and the online documentation on the subject is quite sparse; that’s why I’m leaving this here for those brave souls that wish to try it. If something or everything in the process is not clear enough, let me know in the comments, and good luck! (You’ll definitely need it.)