Wednesday, December 9, 2020

Python Platform Ecosystem - Enabling execution of user code in a platform

When working on a multi-tenant SaaS platform, one of the key aspects is enabling an ecosystem around it. Building an ecosystem involves allowing user code to be executed in the platform, so that users can augment the platform with complementary functionality written in their own code. This enables versatility, growth and adaptation of the platform; however, it comes with its own challenges.

Allowing user code to be executed inside the platform raises the following challenges.

  • The user code should not interfere with the core platform or with other users
  • The packages used by the user code need to be installed programmatically
  • These packages need to be kept separate from the packages used by the core platform or by other users
  • The user code must actually use the packages installed for it

It is a huge task to ensure all of the above. Searching the web, I found the information on how to achieve this scattered across many places. Let us look in detail at how each of these can be achieved.

Segregation of code
Each user gets an exclusive area where their code resides, along with the libraries needed to execute it. This is easily achieved by giving each user a directory structure, which the platform creates and populates automatically. Since these behind-the-scenes activities are performed by the platform rather than by the user, setting up this structure does not by itself raise a security concern.

Identify packages to be installed by user
Each user's project structure will have a pythonlib sub-folder where that user's Python libraries are placed.

How do we find what packages are already available and what is needed?

The pkg_resources library (which ships with setuptools) has a WorkingSet class that lists the packages that are available.
  • It can be created without any parameters to get the packages available globally
  • We can pass a list of paths to find the packages that have been installed in those paths

import pkg_resources

# Packages installed in the user's own library directory
local_packages = pkg_resources.WorkingSet(["/home/ubuntu/<user>/pythonlib"])
local_package_list = [i.key for i in local_packages]

# Packages available globally (everything found on sys.path)
global_packages = pkg_resources.WorkingSet()
global_package_list = [i.key for i in global_packages]

With these two lists we can identify which packages are already available globally and which still need to be installed. Any package that needs to be installed goes into /home/ubuntu/<user>/pythonlib, so it will not clash with the global packages or with other users' packages.
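
The comparison described above can be sketched as a small helper. Newer setuptools releases mark pkg_resources as deprecated, so this sketch assumes Python 3.8+ and uses the standard-library importlib.metadata instead. The function names installed_names and packages_to_install are made up for illustration, and name normalization (for example hyphens versus underscores) is not handled.

```python
from importlib import metadata


def installed_names(paths=None):
    """Return lower-cased distribution names found on `paths`.

    With paths=None the default search path (sys.path) is used,
    which corresponds to the globally available packages.
    """
    if paths is None:
        dists = metadata.distributions()
    else:
        dists = metadata.distributions(path=list(paths))
    names = set()
    for dist in dists:
        name = dist.metadata.get("Name")
        if name:
            names.add(name.lower())
    return names


def packages_to_install(required, local_names, global_names):
    """Return the required packages found neither locally nor globally."""
    available = set(local_names) | set(global_names)
    return [p for p in required if p.lower() not in available]
```

A call might look like packages_to_install(["yfinance", "pandas"], installed_names(["/home/ubuntu/<user>/pythonlib"]), installed_names()), with the path taken from the example above.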

Installing packages programmatically 

This is one of the easier tasks: invoke pip through subprocess, with the -t (target) option pointing to the user's library path.

import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "-t", "/user/pythonlib/path", "packageIneed"])
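
When several packages are missing, the same call can be wrapped in a loop. The following is only a sketch: build_pip_command and install_packages are hypothetical helper names, and the choice to collect failures instead of stopping at the first one is an assumption, not something the platform is known to do.

```python
import subprocess
import sys


def build_pip_command(package, target_dir):
    """Build the pip invocation that installs `package` into `target_dir`."""
    return [sys.executable, "-m", "pip", "install", "-t", target_dir, package]


def install_packages(packages, target_dir):
    """Install each package into target_dir and return the names that failed."""
    failed = []
    for package in packages:
        try:
            subprocess.check_call(build_pip_command(package, target_dir))
        except subprocess.CalledProcessError:
            # pip exited with a non-zero status for this package
            failed.append(package)
    return failed
```

Returning the failed names lets the platform report them back to the user before any of the user's code is executed.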

Ensure the code uses the packages available in /user/pythonlib/path 

This is one of the trickier things to do. What I have done in my platform is to make the local package directory a subdirectory of the user's own folder structure.

Consider the following directory structure

/home/ubuntu/user1 -> this is an exclusive path for user1
    /usermodules -> this directory has all the python user modules
        /localpackages -> this is the directory where all the local packages are installed
            __init__.py -> empty file to ensure that localpackages is considered as a package
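
A minimal sketch of how the platform could create this structure for a new user is shown here. The function name create_user_area and its parameters are assumptions made for illustration; only the directory names come from the layout above.

```python
from pathlib import Path


def create_user_area(base_dir, user):
    """Create <base_dir>/<user>/usermodules/localpackages and return its path."""
    local_packages = Path(base_dir) / user / "usermodules" / "localpackages"
    # parents=True also creates the user and usermodules directories
    local_packages.mkdir(parents=True, exist_ok=True)
    # Empty __init__.py so that localpackages is treated as a package
    (local_packages / "__init__.py").touch(exist_ok=True)
    return local_packages
```

Calling create_user_area("/home/ubuntu", "user1") would produce the layout shown above, and calling it again for an existing user leaves the existing files untouched.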

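The trick is to modify the import statements just before execution so that they refer to the localpackages directory. Because the modified statements are relative imports, the user module has to be loaded as part of the usermodules package rather than run as a standalone script.

Following is an example where yfinance and pandas are locally installed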

Original Code:
import yfinance 
from pandas import read_csv

Modified Code:

from .localpackages import yfinance
from .localpackages.pandas import read_csv

I currently have this stopgap program to do the above. I am still looking for a cleaner solution that does this using lexical parsing.

filename = "usermodule.py"  # placeholder; the platform supplies the real path
packages = ["yfinance", "pandas"]  # packages installed under localpackages

# Read the user module line by line
with open(filename, "r") as pyobj:
    pycodelines = pyobj.read().splitlines()

# Write it back, rewriting the imports of locally installed packages
with open(filename, "w") as pyobj:
    for line in pycodelines:
        token = line.split(" ")
        stripped_token = line.lstrip(" ").split(" ")
        new_line = line
        # Only touch import lines whose first module is a local package
        if (len(stripped_token) > 1
                and stripped_token[0] in ["import", "from"]
                and stripped_token[1] in packages):
            new_token = []
            from_set = False
            for t in token:
                if t == "from":
                    from_set = True
                elif t == "import" and not from_set:
                    # import pkg -> from .localpackages import pkg
                    t = "from .localpackages " + t
                elif from_set and t in packages:
                    # from pkg import x -> from .localpackages.pkg import x
                    t = ".localpackages." + t
                new_token.append(t)
            new_line = " ".join(new_token)
        print(new_line)
        pyobj.write(new_line + "\n")
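
One possible direction for the cleaner solution, sketched here as an assumption rather than as what the platform actually does, is to let Python's own ast module parse the file instead of splitting lines on spaces. The names LocalImportRewriter and rewrite_imports are made up for illustration. The sketch needs Python 3.9+ for ast.unparse, it drops comments and original formatting when regenerating the source, and it leaves dotted imports such as "import a.b" unchanged.

```python
import ast


class LocalImportRewriter(ast.NodeTransformer):
    """Point imports of locally installed packages at .localpackages."""

    def __init__(self, packages, local_pkg="localpackages"):
        self.packages = set(packages)
        self.local_pkg = local_pkg

    def visit_Import(self, node):
        # One "import a, b" statement may mix local and global packages,
        # so emit one statement per imported name.
        new_nodes = []
        for alias in node.names:
            if alias.name in self.packages:
                # import pkg -> from .localpackages import pkg
                new = ast.ImportFrom(module=self.local_pkg, names=[alias], level=1)
            else:
                # Global packages and dotted names stay as plain imports
                new = ast.Import(names=[alias])
            new_nodes.append(ast.copy_location(new, node))
        return new_nodes

    def visit_ImportFrom(self, node):
        # Only absolute imports (level 0) of a local top-level package change
        if (node.level == 0 and node.module
                and node.module.split(".")[0] in self.packages):
            node.module = f"{self.local_pkg}.{node.module}"
            node.level = 1
        return node


def rewrite_imports(source, packages):
    """Return `source` with imports of `packages` made relative."""
    tree = LocalImportRewriter(packages).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Because the parser understands the syntax, this variant also copes with aliases ("import pandas as pd") and with imports nested inside functions, which the line-splitting version does not handle reliably.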
