Using Python AST to resolve dependencies
This article covers how to resolve Python dependencies using Python's Abstract Syntax Tree (AST) module. There are other, and perhaps better, ways to understand the scope of your Python dependencies; this article shows how ASTs can be applied to the problem across different scenarios.
The use-case#
In a complex Python repository, modules frequently import objects from one another.
As the number of cross-module imports grows, a large codebase suffers from several issues:
- Increase in time taken to run unit tests
- Increase in app startup time
- Cascading effects on upstream changes
To decouple these cross-module imports, I plan to pick objects of high contention and then start an effort to move each of them into a library of its own.
Most build tools, such as https://www.pantsbuild.org/, ship this functionality out of the box. This article, however, leverages ASTs to achieve the same goal.
Solution#
The solution proposed here for the cross-module imports is to pick a high-contention object or function and then copy all of its dependencies over to a destination folder.
The input should be as follows:
- Provide the file path and the object name. The object can be of any kind: a class, a function, a variable, etc.
- Provide the folder path where the dependency resolver should reconstruct the object and its dependencies, i.e. the output folder
The solution operates based on the following steps:
- Read the file to collect the imports already present in it
- Copy the object's source text to the provided output folder. The path inside the folder should be created dynamically; ensure the text is appended to any existing file at that path rather than overwriting it
- Get the source text of the object and convert it into tokens
- Intersect the object's tokens with the imported names. For every cross-module import found, convert the imported object into a file-path-and-object pair in the same format as the input
- Repeat the processing steps for each such pair
- This ensures all transitive dependencies are picked up
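The steps above can be sketched as a breadth-first worklist over `"file.py::object"` entries. The function and parameter names below are illustrative, and the import and token lookups are passed in as callables rather than implemented here:

```python
from collections import deque
from typing import Callable

def resolve_dependencies(
    entry: str,
    get_imports: Callable[[str], dict[str, str]],
    get_tokens: Callable[[str], set[str]],
) -> list[str]:
    """Walk "file.py::object" entries until no new cross-module imports remain.

    get_imports maps an entry to {imported name -> "file.py::name" entry};
    get_tokens returns the NAME tokens found in the object's source.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([entry])
    while queue:
        target = queue.popleft()
        if target in seen:  # avoid cycles between modules
            continue
        seen.add(target)
        order.append(target)
        imports = get_imports(target)          # step 1: imports in the file
        tokens = get_tokens(target)            # step 3: tokens of the object
        for name in tokens & imports.keys():   # step 4: cross-module matches
            queue.append(imports[name])        # step 5: repeat for the match
    return order
```

Steps 2 (copying the object text into the output folder) would slot into the loop body; it is omitted here to keep the sketch focused on the traversal.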
With these steps in mind, we define the following components and their responsibilities:
- DependencyResolverManager
- FileVisitor
- ImportAnalyzer
- DependencyResolverIO
Based on the proposed steps, we define the DependencyResolverManager class, which takes the object information, tokenizes the object's code, and determines whether any of the names were imported from another module. For each matching import it recurses into that object and resolves its imports in turn, continuing until no more cross-module imports are found.
```python
import traceback

class DependencyResolverManager:
    def _resolve_matching_objects(self, object_code: str, imported_objects: dict[str, str]):
        imported_objects_set: set[str] = set(imported_objects.keys())
        tokenizer: "Tokenizer" = Tokenizer(code=object_code)
        # Names that appear both in the object's code and in the import list
        for matched_object in imported_objects_set.intersection(tokenizer.tokenize()):
            matched_module: str = imported_objects[matched_object]
            try:
                # Map the dotted module name back to a source file on disk
                file_path: str = ModuleHelper.get_module_file_from_name(
                    matched_object, matched_module
                )
                resolve_for_obj: str = f"{file_path}::{matched_object}"
                resolver: "DependencyResolverManager" = DependencyResolverManager(
                    resolve_for_obj=resolve_for_obj, resolve_count=self.resolve_count
                )
                resolver.process_obj()  # recurse into the matched object
            except Exception as exc:
                print(exc, matched_object, matched_module)
                traceback.print_exc()
                continue
```
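The ModuleHelper class is not shown in the article. One way to map a dotted module name to its source file is `importlib.util.find_spec`; the sketch below is an assumption about what such a helper might look like, not necessarily the article's implementation:

```python
import importlib.util

def get_module_file_from_name(object_name: str, module_name: str) -> str:
    """Map a dotted module name (e.g. "json.decoder") to its source file path.

    object_name is only used for the error message; find_spec locates the
    module without importing it.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ImportError(f"cannot locate source for {module_name} ({object_name})")
    return spec.origin
```

In a real repository this lookup would also need to filter out third-party and built-in modules, since only first-party files should be copied to the output folder.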
The Tokenizer class is a thin wrapper around Python's tokenize module:
```python
from io import BytesIO
from tokenize import NAME, tokenize

class Tokenizer:
    def __init__(self, *args, **kwargs):
        self.code: str = kwargs["code"]

    def tokenize(self) -> set[str]:
        # tokenize() expects a readline callable yielding bytes
        tokens = tokenize(BytesIO(self.code.encode("utf-8")).readline)
        token_set: set[str] = set()
        for toknum, tokval, _, _, _ in tokens:
            if toknum == NAME:  # identifiers (and keywords) only
                token_set.add(tokval)
        return token_set
```
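For example, running tokenize over a small snippet yields the set of NAME tokens that are later intersected with the imported names. Note that Python keywords also surface as NAME tokens, but they will never match an import, so they are harmless here:

```python
from io import BytesIO
from tokenize import NAME, tokenize

code = "def area(r):\n    return math.pi * r * r\n"
names = {
    tok.string
    for tok in tokenize(BytesIO(code.encode("utf-8")).readline)
    if tok.type == NAME
}
# 'math' is the token that would match an `import math` entry in the file
print(sorted(names))
```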
Now we come to the most interesting part of the article: the ast module.
We use it here to find the imported objects, and to find definitions of classes, functions, variables, and anything else that can be imported.
Here is the ImportAnalyzer class, which inherits from the ast.NodeVisitor class:
```python
import ast

class ImportAnalyzer(ast.NodeVisitor):
    def __init__(self, *args, **kwargs):
        self.imported_modules: set[str] = set()
        self.module_name: str = kwargs["module_name"]
        self.object_name: str = kwargs.get("object_name", "")
        self.defined_objects: dict[str, ast.AST] = {}
        self.object_module_mapping: dict[str, str] = {}

    def _filter_modules(self, node_module: str) -> bool:
        # Hook for subclasses to skip e.g. third-party modules
        return False

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node is None or node.module is None:
            return
        if self._filter_modules(node.module):
            return
        self.imported_modules.add(node.module)
        for node_name in node.names:
            self.object_module_mapping[node_name.name] = node.module

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.defined_objects[node.name] = node

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        self.defined_objects[node.name] = node

    def visit_Assign(self, node: ast.Assign) -> None:
        for assigned_obj in node.targets:
            if isinstance(assigned_obj, ast.Name):
                self.defined_objects[assigned_obj.id] = node
            elif isinstance(getattr(assigned_obj, "value", None), ast.Name):
                # Attribute/subscript targets such as obj.attr = ...
                self.defined_objects[assigned_obj.value.id] = node

    def visit_AnnAssign(self, node: ast.AnnAssign) -> None:
        if isinstance(node.target, ast.Name):
            self.defined_objects[node.target.id] = node
        elif isinstance(getattr(node.target, "value", None), ast.Name):
            self.defined_objects[node.target.value.id] = node
```
The main methods to look at are those prefixed with visit_. The ast module invokes these methods as callbacks whenever a node of the corresponding type is encountered while walking the tree. The list of node types can be found here.
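As a quick illustration of the callback mechanism, the minimal standalone visitor below (not the article's class) records which visit_* methods fire when walking a parsed module:

```python
import ast

class NameCollector(ast.NodeVisitor):
    """Record which visit_* callbacks fire for a small module."""

    def __init__(self):
        self.events: list[str] = []

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        self.events.append(f"import from {node.module}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.events.append(f"function {node.name}")

source = "from os import path\ndef main():\n    pass\n"
collector = NameCollector()
collector.visit(ast.parse(source))  # walks the tree, triggering the callbacks
print(collector.events)
```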
The last step is handled by the DependencyResolverIO class, which recreates the folder structure of the original repository under the output folder.
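A minimal sketch of what such an IO helper might do, mirroring the source file's relative path under the output root and appending to the mirrored file rather than overwriting it (the function name is illustrative, not the article's API):

```python
from pathlib import Path

def append_to_output(source_path: str, output_root: str, object_code: str) -> Path:
    """Recreate source_path's directory structure under output_root and
    append object_code to the mirrored file (never overwrite)."""
    destination = Path(output_root) / Path(source_path)
    destination.parent.mkdir(parents=True, exist_ok=True)  # build dirs on demand
    with destination.open("a", encoding="utf-8") as handle:  # "a" = append
        handle.write(object_code + "\n")
    return destination
```

Opening the file in append mode is what satisfies the earlier requirement that an object's text is added to an existing path rather than overriding it.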
The code for the above solution can be found here.
I hope you liked the article. Please let me know if you have any questions about it. Happy reading!