This article shows how to resolve Python dependencies using Python's Abstract Syntax Trees (AST). There are other, possibly better, ways to understand the scope of your Python dependencies; the aim here is to show how the AST can be applied to this kind of problem.

The use-case#

In a complex Python repository, there are many modules that import objects from each other.

An increase in the number of cross module imports leads to multiple issues in a large codebase:

  • Increase in time taken to run unit tests
  • Increase in app startup time
  • Cascading effects on upstream changes

To decouple these cross-module imports, I plan to pick high-contention objects and move them, along with their dependencies, into a separate library of their own.

This functionality already comes built in with build tools such as https://www.pantsbuild.org/. This article, however, uses ASTs to achieve the same goal.

Solution#

The solution proposed here for the cross module imports is to pick a high contention object/function, and then copy over all the dependencies of the module to a destination folder.

The input should be as follows:

  • Provide the file path and the object name. The object can be of any kind, i.e. a class, function, variable, etc.
  • Provide the output folder, i.e. the folder path where the dependency resolver should write the object and its dependencies
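
As a minimal sketch of the first input, the file path and object name can be combined into one string separated by `::`, which is the form the resolver code later in this article builds for each dependency. The `ResolveTarget` class and the `parse_target` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ResolveTarget:
    """Hypothetical container for one "file path + object name" input."""

    file_path: Path
    object_name: str


def parse_target(spec: str) -> ResolveTarget:
    # rpartition splits on the last "::", so the path part is left intact.
    file_path, sep, object_name = spec.rpartition("::")
    if not sep or not file_path or not object_name:
        raise ValueError(f"expected '<file path>::<object name>', got {spec!r}")
    return ResolveTarget(Path(file_path), object_name)


target = parse_target("app/models/user.py::User")
print(target.file_path, target.object_name)
```

Validating the input up front keeps malformed targets from surfacing later as confusing file or lookup errors.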

The solution operates based on the following steps:

  • Read the file and collect the imports already present in it
  • Copy the text of the object to the provided output folder. The path inside the folder should be created dynamically. Ensure that the text is appended to the file at that path, not written over it
  • Get the text of the object and convert it into tokens
  • Match the tokens of the object against the list of imported names. If there are cross-module imports, convert each imported object to a file path and object name combination similar to the input
  • Repeat the steps above for each imported object found
  • Continue until no new cross-module imports are found, which should ensure all the dependencies are picked up
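
The steps above can be sketched end to end on a tiny in-memory repository. This is an illustration under assumptions, not the article's implementation: `SOURCES`, `imported_names`, `find_definition`, `name_tokens`, and `resolve` are hypothetical names, a worklist replaces recursion, and a `seen` set is added to guard against import cycles. Import aliases (`import x as y`) and copying to disk are left out.

```python
import ast
import io
import tokenize

# Hypothetical in-memory "repository": module name -> source code.
SOURCES = {
    "app.views": (
        "from app.models import User\n"
        "from app.utils import slugify\n"
        "\n"
        "def show(u):\n"
        "    return User(u)\n"
    ),
    "app.models": "from app.db import Base\n\nclass User(Base):\n    pass\n",
    "app.db": "class Base:\n    pass\n",
    "app.utils": "def slugify(s):\n    return s.lower()\n",
}


def imported_names(tree: ast.Module) -> dict[str, str]:
    """Map each name from `from X import name` to its module X."""
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                mapping[alias.name] = node.module
    return mapping


def find_definition(tree: ast.Module, name: str):
    """Return the top-level class or function node called `name`, if any."""
    for node in tree.body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)) and node.name == name:
            return node
    return None


def name_tokens(code: str) -> set[str]:
    readline = io.BytesIO(code.encode("utf-8")).readline
    return {tok.string for tok in tokenize.tokenize(readline) if tok.type == tokenize.NAME}


def resolve(module: str, obj: str) -> list[tuple[str, str]]:
    """Collect the (module, object) pairs reachable from the starting object."""
    pending, seen, order = [(module, obj)], set(), []
    while pending:
        current = pending.pop()
        if current in seen or current[0] not in SOURCES:
            continue
        seen.add(current)
        mod, name = current
        source = SOURCES[mod]
        tree = ast.parse(source)
        node = find_definition(tree, name)
        if node is None:
            continue
        order.append(current)
        # Only the object's own text is tokenized, not the whole file.
        code = ast.get_source_segment(source, node)
        imports = imported_names(tree)
        for used in sorted(name_tokens(code) & imports.keys()):
            pending.append((imports[used], used))
    return order


print(resolve("app.views", "show"))
# [('app.views', 'show'), ('app.models', 'User'), ('app.db', 'Base')]
```

Note that `slugify` is imported by `app.views` but never used by `show`, so it is not picked up. Matching tokens of the object rather than all imports of the file is what keeps the extracted dependency set small.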

Keeping the above steps in mind, we define the following components:

  • DependencyResolverManager
  • FileVisitor
  • ImportAnalyzer
  • DependencyResolverIO

Based on the steps above, we define the DependencyResolverManager class, which takes the object information, tokenizes the object's code, and checks whether any of the names it uses were imported from another module. If it finds matching imports, it recurses into each imported object and resolves its imports in turn. The recursion continues until no more cross-module imports are found.

import traceback


class DependencyResolverManager:
    def _resolve_matching_objects(self, object_code: str, imported_objects: dict[str, str]) -> None:
        # Names used in the object's code that were also imported from another module.
        tokenizer = Tokenizer(code=object_code)
        matched_names = set(imported_objects).intersection(tokenizer.tokenize())
        for matched_object in matched_names:
            matched_module = imported_objects[matched_object]
            try:
                file_path = ModuleHelper.get_module_file_from_name(matched_object, matched_module)
                # Recurse: the imported object becomes a new "file path::object" input.
                resolver = DependencyResolverManager(
                    resolve_for_obj=f"{file_path}::{matched_object}",
                    resolve_count=self.resolve_count,
                )
                resolver.process_obj()
            except Exception as e:
                # Report the failure and move on to the next imported object.
                print(e, matched_object, matched_module)
                traceback.print_exc()
                continue
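
The ModuleHelper class is not shown in this article. As a rough sketch of what `get_module_file_from_name` might do, the standard library's `importlib.util.find_spec` can map a module name to its source file. The function below is a hypothetical stand-in, written as a plain function, and assumes the module is importable from the current environment.

```python
import importlib.util
from typing import Optional


def get_module_file_from_name(object_name: str, module_name: str) -> Optional[str]:
    """Hypothetical stand-in for ModuleHelper.get_module_file_from_name.

    `object_name` is unused here; it is kept only to mirror the call above.
    """
    try:
        # For dotted names, find_spec imports the parent packages first.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    # Built-in, frozen, and namespace modules have no single source file.
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return None
    return spec.origin


print(get_module_file_from_name("dumps", "json"))
```

Returning None instead of raising lets the caller decide whether an unresolved module is an error or simply a third-party dependency to skip.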

The Tokenizer is a thin wrapper class around Python's tokenize module. It returns the set of NAME tokens (identifiers and keywords) found in the code.

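from io import BytesIO
from tokenize import NAME, tokenize


class Tokenizer:
    def __init__(self, *args, **kwargs):
        self.code: str = kwargs["code"]

    def tokenize(self) -> set[str]:
        # tokenize() expects a readline callable that yields bytes.
        tokens = tokenize(BytesIO(self.code.encode("utf-8")).readline)
        token_set = set()
        for toknum, tokval, _, _, _ in tokens:
            # Keep only NAME tokens, i.e. identifiers and keywords.
            if toknum == NAME:
                token_set.add(tokval)
        return token_set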
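
To make the behaviour of the tokenize module concrete, here is a small standalone sketch. It shows that words inside comments and string literals are not reported as NAME tokens, which is the advantage over a plain substring search. Dropping keywords with `keyword.iskeyword` is an optional extra step, not something the Tokenizer class above does.

```python
import keyword
from io import BytesIO
from tokenize import NAME, tokenize

code = (
    "def greet(user):\n"
    "    # Account is mentioned only in this comment\n"
    "    return 'Profile: ' + format_name(user)\n"
)

tokens = tokenize(BytesIO(code.encode("utf-8")).readline)
names = {tok.string for tok in tokens if tok.type == NAME}

# Keywords such as "def" and "return" are NAME tokens too; drop them if unwanted.
identifiers = {name for name in names if not keyword.iskeyword(name)}
print(sorted(identifiers))
# ['format_name', 'greet', 'user']
```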

Now we enter the most interesting part of the article: the AST module.

We use the AST module here to find the imported objects and to find the definitions of classes, functions, variables, and anything else that can be imported.

Here is the ImportAnalyzer class, which inherits from the ast.NodeVisitor class.

import ast


class ImportAnalyzer(ast.NodeVisitor):
    def __init__(self, *args, **kwargs):
        self.imported_modules: set[str] = set()
        self.module_name: str = kwargs["module_name"]
        self.object_name: str = kwargs.get("object_name", "")
        # Imported name -> module it was imported from.
        self.object_module_mapping: dict[str, str] = {}
        # Defined name -> AST node that defines it.
        self.defined_objects: dict[str, ast.AST] = {}

    def _filter_modules(self, node_module: str) -> bool:
        # Hook for excluding modules from the analysis; as written, nothing is excluded.
        return False

    def visit_ImportFrom(self, node) -> None:
        # node.module is None for relative imports such as `from . import x`.
        if node is None or node.module is None:
            return
        if self._filter_modules(node.module):
            return

        self.imported_modules.add(node.module)
        for node_name in node.names:
            self.object_module_mapping[node_name.name] = node.module

    def visit_FunctionDef(self, node) -> None:
        self.defined_objects[node.name] = node

    def visit_ClassDef(self, node) -> None:
        self.defined_objects[node.name] = node

    @staticmethod
    def _target_name(target) -> str:
        # `x = 1` has an ast.Name target; `x.y = 1` and `x[0] = 1` wrap the name in `.value`.
        if isinstance(target, ast.Name):
            return target.id
        value = getattr(target, "value", None)
        # Other targets (e.g. tuple unpacking) yield "" and are skipped by the callers.
        return value.id if isinstance(value, ast.Name) else ""

    def visit_Assign(self, node) -> None:
        for assigned_obj in node.targets:
            curr_value = self._target_name(assigned_obj)
            if curr_value:
                self.defined_objects[curr_value] = node

    def visit_AnnAssign(self, node) -> None:
        curr_value = self._target_name(node.target)
        if curr_value:
            self.defined_objects[curr_value] = node

The main methods to look at here are those prefixed with visit_. The ast.NodeVisitor base class calls these methods as callbacks whenever a node of the corresponding type is encountered. The list of node types can be found in the documentation of the ast module.
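
The callback dispatch can be seen in a small standalone sketch. `DefinitionLister` is a hypothetical visitor written only for this illustration. It records each callback in the order it fires, and it shows that a visit_* method which does not call `generic_visit()` stops the traversal from descending into that node's children.

```python
import ast


class DefinitionLister(ast.NodeVisitor):
    """Illustrative visitor: records imports and top-level definitions."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        for alias in node.names:
            self.events.append(f"import {alias.name} from {node.module}")

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        # generic_visit() is not called, so the class body is never visited.
        self.events.append(f"class {node.name}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.events.append(f"function {node.name}")


source = (
    "from app.db import Base\n"
    "\n"
    "class User(Base):\n"
    "    def save(self):\n"
    "        pass\n"
    "\n"
    "def make_user():\n"
    "    return User()\n"
)

lister = DefinitionLister()
lister.visit(ast.parse(source))
print(lister.events)
# ['import Base from app.db', 'class User', 'function make_user']
```

The method `save` does not appear in the output because it lives inside the class body. For dependency resolution this is useful: only names that another module could import directly are recorded.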

The last step is taken care of by the DependencyResolverIO class, which recreates the original repository's folder structure inside the output folder.
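
The DependencyResolverIO class is not shown in this article, so the following is only a sketch of how such a step might work. The `append_object` function and its parameters are hypothetical. It extracts the source text of one top-level class or function with `ast.get_source_segment` and appends it to a file at the same relative path under the output folder.

```python
import ast
import tempfile
from pathlib import Path


def append_object(source_file: Path, repo_root: Path, output_root: Path, object_name: str) -> Path:
    """Append the source of `object_name` to the mirrored file under `output_root`."""
    source = source_file.read_text(encoding="utf-8")
    for node in ast.parse(source).body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)) and node.name == object_name:
            segment = ast.get_source_segment(source, node)
            break
    else:
        raise LookupError(f"{object_name} not found in {source_file}")

    # Mirror the path relative to the repository root inside the output folder.
    destination = output_root / source_file.relative_to(repo_root)
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends, so objects copied earlier into the same file are kept.
    with destination.open("a", encoding="utf-8") as handle:
        handle.write(segment + "\n\n")
    return destination


# Usage with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    repo, out = Path(tmp) / "repo", Path(tmp) / "out"
    module = repo / "app" / "models.py"
    module.parent.mkdir(parents=True)
    module.write_text("class User:\n    pass\n\n\ndef helper():\n    return 1\n", encoding="utf-8")
    append_object(module, repo, out, "User")
    written = append_object(module, repo, out, "helper")
    copied = written.read_text(encoding="utf-8")
    print(copied)
```

This sketch copies only the object's own text. The import statements the object needs are not written, so a fuller version would also have to emit those lines in the destination file.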

The code for the above solution can be found here.

I hope you liked the article. Please let me know if you have any queries regarding the article. Happy reading!!

References#