It's time to open up that Python code you wrote a while back and add a few new features. Or maybe you just finished some new code and want to give it a once-over before releasing it to production. Either way, it’s time for some Python refactoring.
In this post, we’re going to cover four techniques for Python refactoring. Some of them are specific to the Python language, and one is more general. We’ll include samples with each technique to demonstrate it in action and show you how to apply it to your code.
What Is Refactoring?
Before we look at the techniques, let’s make sure we agree on what refactoring means.
Refactoring is altering source code without changing its behavior. We do this to improve the code’s design so it’s easier to maintain, update, and test.
Refactoring has been around for a long time. Martin Fowler wrote a well-known book on the subject, Refactoring: Improving the Design of Existing Code, and his definition is worth a look.
Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time
Fowler introduces a few key concepts here. Refactoring consists of small steps. It’s not rewriting code; it’s making small, gradual improvements. It’s also an ongoing process, not a series of big sweeping changes.
Refactoring Python Code
Use Enumerate() Instead of Loop Variables
Let’s go through a list and print the items with a counter.
    i = 0
    for item in things:
        print('{}: {}'.format(i, item))
        i = i + 1
The counter doesn’t help with readability at all here. It’s initialized outside the loop, and incrementing it requires an extra step. In Java or C, we could use the increment operator inside the print statement, but it doesn’t exist in Python. The increment operator, to be honest, wouldn’t do much to make this code more readable either.
But there’s an easier and more Pythonic solution than adding a variable and incrementing it inside the loop.
    for i, item in enumerate(things):
        print('{}: {}'.format(i, item))
Python’s enumerate() does the work for you by pairing each item with a counter as the loop runs; the list itself is not modified. This code is half as long and much easier to understand.
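As a small extension of the example above, enumerate() also accepts an optional start argument, which can help when the counter should begin at 1 rather than 0. The things list below is made-up sample data.

```python
# Hypothetical sample data; enumerate() works with any iterable.
things = ['apple', 'banana', 'cherry']

# enumerate() yields (index, item) pairs; start=1 makes the count one-based.
lines = ['{}: {}'.format(i, item) for i, item in enumerate(things, start=1)]

for line in lines:
    print(line)
```

Leaving out start gives the zero-based numbering used in the loop above.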
Inline Variables When Possible
Refactoring isn’t only about reducing the number of lines in your source code, even though that’s a common side effect.
    def get_value(self, key):
        if key in self._jinja_vars:
            value = self._jinja_vars[key]
            if value is not None:
                return value
            else:
                return None
        else:
            return None
In this example, value is assigned and then falls out of use almost immediately. We have two opportunities to shorten this code and make it more readable at the same time: drop the redundant check for None, and return the dictionary entry directly.
    def get_value(self, key):
        if key in self._jinja_vars:
            return self._jinja_vars[key]
        else:
            return None
When we get rid of the check for None, we don’t need to create value at all. Dropping the check is safe: if the stored value is None, returning it directly still yields None, which is exactly what the original code returned. The important check, verifying that the key is in the dictionary, is still in place.
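If the lookup really is this simple, the standard dict.get() method may shorten it further, since it returns None by default when the key is missing. This is a sketch only: the TemplateContext class and its constructor are assumptions made so the snippet runs on its own. Only get_value and _jinja_vars come from the example above.

```python
class TemplateContext:
    """Hypothetical wrapper class, invented to make the method runnable."""

    def __init__(self, jinja_vars):
        self._jinja_vars = jinja_vars

    def get_value(self, key):
        # dict.get() returns the stored value, or None when the key is absent.
        return self._jinja_vars.get(key)


context = TemplateContext({'title': 'Report', 'author': None})
print(context.get_value('title'))    # stored value
print(context.get_value('author'))   # stored value happens to be None
print(context.get_value('missing'))  # key is absent, so None
```

Whether this reads better than the explicit if/else is a judgment call. The explicit version makes the membership check more visible to readers who don't know get().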
Use Values() To Traverse Dictionaries
Python's tools for manipulating its data structures make it easy to write readable code. But if you're coming to Python from another language, you might not be familiar with them.
Let's look at a function that searches a dictionary of dictionaries for a specific key/value pair.
    def get_item_by_color(self, color):
        for key in items:
            item = items[key]
            if item['color'] == color:
                return item
At first glance, this function looks like another candidate for the previous refactoring. We could factor out the creation of item on line 3 and use the dictionary key instead. But would that really make this code easier to read?
There's a better way. The dictionary's values() method lets the loop iterate over each value directly instead of over the keys.
    def get_item_by_color(self, color):
        for item in items.values():
            if item['color'] == color:
                return item
Since we can work directly on the item provided by the loop, this change shortens the code and makes it easier to understand at the same time.
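To see the refactored loop run end to end, here is a standalone sketch. The items dictionary and its contents are invented sample data, and the functions are written without self so they can be called directly. The second function, get_name_by_color, is a hypothetical addition showing items(), which may be the better fit when the key is needed along with the value.

```python
# Hypothetical sample data: a dictionary of dictionaries keyed by item name.
items = {
    'mug': {'color': 'blue', 'price': 8},
    'pen': {'color': 'red', 'price': 2},
    'hat': {'color': 'red', 'price': 15},
}


def get_item_by_color(color):
    # values() iterates over the inner dictionaries directly.
    for item in items.values():
        if item['color'] == color:
            return item
    return None  # made explicit: no item has the requested color


def get_name_by_color(color):
    # items() yields (key, value) pairs when the key matters too.
    for name, item in items.items():
        if item['color'] == color:
            return name
    return None


print(get_item_by_color('red'))
print(get_name_by_color('red'))
```

Both functions return the first match in insertion order, which dictionaries preserve in Python 3.7 and later.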
DRY: Don’t Repeat Yourself
Many developers learn Python as a utility language for writing command-line scripts. They’re accustomed to seeing large blocks of code in a single source file, with few or no functions. The files grow line-by-line over time, often via cutting and pasting.
But even in the simplest scripts, don’t repeat yourself (DRY) is a programming guideline you should follow.
Let’s look at a code snippet that downloads information from a REST API. We can imagine that it started out retrieving just one dictionary from the server and grew over time. Each time the developer added a new function, they copied and pasted the previous one and made a few modifications.
Each function repeats the code to query the web server. If we want to switch to a different authentication method, add request logging, or enhance error recovery, we’ll have to modify each one of them.
    # Methods excerpted from a class; the imports they rely on are shown first.
    import json
    import sys

    import requests
    from requests.auth import HTTPBasicAuth
    from tqdm import tqdm

    def _get_field_database(self, jira_host):
        url = '{}{}'.format(jira_host, '/rest/api/3/field')
        result = requests.get(url, auth=HTTPBasicAuth(self._jira_user, self._jira_passwd))
        schema = json.loads(result.text)
        fields_by_name = {}
        for field in tqdm(schema, file=sys.stdout, desc="Fields database"):
            fields_by_name[field['name']] = field
        return fields_by_name

    def _get_user_database(self, jira_host):
        url = '{}{}'.format(jira_host, '/rest/api/3/users?maxResults=500')
        result = requests.get(url, auth=HTTPBasicAuth(self._jira_user, self._jira_passwd))
        users = json.loads(result.text)
        user_by_name = {}
        user_by_email = {}
        for field in tqdm(users, file=sys.stdout, desc="User database"):
            if 'displayName' in field:
                user_by_name[field['displayName']] = field
                # Fall back to the display name when no email address is present.
                if 'emailAddress' in field:
                    user_by_email[field['emailAddress']] = field
                else:
                    user_by_email[field['displayName']] = field
        return user_by_name, user_by_email

    def _get_projects_database(self, jira_host):
        url = '{}{}'.format(jira_host, '/rest/api/3/project/search')
        result = requests.get(url, auth=HTTPBasicAuth(self._jira_user, self._jira_passwd))
        projects = json.loads(result.text)
        project_by_name = {}
        for field in projects['values']:
            project_by_name[field['name']] = field
        return project_by_name
So let’s factor the REST API query into a separate function.
    def _get_field_database(self, jira_host):
        schema = self._exec_query('/rest/api/3/field', jira_host)
        fields_by_name = {}
        for field in tqdm(schema, file=sys.stdout, desc="Fields database"):
            fields_by_name[field['name']] = field
        return fields_by_name

    def _exec_query(self, query_string, jira_host):
        url = '{}{}'.format(jira_host, query_string)
        result = requests.get(url, auth=HTTPBasicAuth(self._jira_user, self._jira_passwd))
        return json.loads(result.text)

    def _get_user_database(self, jira_host):
        users = self._exec_query('/rest/api/3/users?maxResults=500', jira_host)
        user_by_name = {}
        user_by_email = {}
        for field in tqdm(users, file=sys.stdout, desc="User database"):
            if 'displayName' in field:
                user_by_name[field['displayName']] = field
                if 'emailAddress' in field:
                    user_by_email[field['emailAddress']] = field
                else:
                    user_by_email[field['displayName']] = field
        return user_by_name, user_by_email

    def _get_projects_database(self, jira_host):
        projects = self._exec_query('/rest/api/3/project/search', jira_host)
        project_by_name = {}
        for field in projects['values']:
            project_by_name[field['name']] = field
        return project_by_name
Now each function calls _exec_query() to get the data it needs from the server. The code scans more easily, and modifying the REST query code will be simpler when the time comes because there is only one place to change.
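To illustrate why a single query function pays off, the sketch below adds request logging in exactly one place. It is not the original class: JiraClient, the fetch parameter, fake_fetch, the request_log attribute, and the example host are all hypothetical stand-ins, chosen so the example runs without a network connection or the requests library.

```python
import json


class JiraClient:
    """Hypothetical client; `fetch` stands in for the real HTTP call."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: url -> response body (str)
        self.request_log = []    # every queried URL is recorded here

    def _exec_query(self, query_string, jira_host):
        url = '{}{}'.format(jira_host, query_string)
        self.request_log.append(url)   # logging added in a single place
        return json.loads(self._fetch(url))

    def _get_field_database(self, jira_host):
        schema = self._exec_query('/rest/api/3/field', jira_host)
        return {field['name']: field for field in schema}

    def _get_projects_database(self, jira_host):
        projects = self._exec_query('/rest/api/3/project/search', jira_host)
        return {project['name']: project for project in projects['values']}


def fake_fetch(url):
    # Canned responses (invented data) instead of a real web server.
    if url.endswith('/field'):
        return json.dumps([{'name': 'Summary', 'id': 'summary'}])
    return json.dumps({'values': [{'name': 'Demo', 'key': 'DEMO'}]})


client = JiraClient(fake_fetch)
fields = client._get_field_database('https://jira.example.com')
projects = client._get_projects_database('https://jira.example.com')
print(sorted(fields), sorted(projects), client.request_log)
```

Because both lookup methods go through _exec_query(), neither had to change to gain logging. The same would likely hold for a new authentication method or retry logic.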
Python Refactoring in Pair Programming
Birgitta Böckeler and Nina Siessegger briefly discuss refactoring in a post about pair programming on Martin Fowler's blog. The post ties refactoring to pair programming with test-driven development (TDD). Refactoring is a critical part of the TDD process, since you take the opportunity to refactor new code each time it passes a test.
The post recommends a "ping-pong" methodology, where pairs take turns writing a test, writing code to pass the test, then refactoring the code together. Having a second set of eyes to review code and make sure it's readable as soon as it's verified with a unit test makes for better code.
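One round of ping-pong might look like the following sketch. The function and test names are hypothetical, and plain assert statements stand in for a real test framework to keep the example short.

```python
# Step 1 (first partner): write a test that states the expected behavior.
def test_label_items():
    assert label_items(['a', 'b']) == ['0: a', '1: b']
    assert label_items([]) == []


# Step 2 (second partner): write just enough code to pass the test.
def label_items_first_draft(things):
    labels = []
    i = 0
    for item in things:
        labels.append('{}: {}'.format(i, item))
        i = i + 1
    return labels


# Step 3 (together): refactor; the behavior must stay the same.
def label_items(things):
    return ['{}: {}'.format(i, item) for i, item in enumerate(things)]


# Rerunning the test after the refactoring confirms nothing changed.
test_label_items()
assert label_items_first_draft(['a', 'b']) == label_items(['a', 'b'])
print('all checks passed')
```

The test written in step 1 is what makes step 3 safe: if the refactored version behaved differently, the assertions would fail right away.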
Get Started With Python Refactoring
We talked about what refactoring is and what its goals are. Then we covered four simple refactoring techniques that will help you make your Python code easier to read and maintain. We also discussed how you could fit refactoring into a pair programming setting.
Take the next step and see how you can apply these techniques to your Python code. But don't stop there! There are plenty more refactoring techniques you can use to up your Python game.
This post was written by Eric Goebelbecker. Eric has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective!).