Day 2: Pandas Flavor Chains
Unlock the sweet taste of idiomatic pandas
Python Advent Calendar: Day 2!
Every day until Christmas we'll open a new door to Python secrets. Behind today's door you will discover a whole new way to use pandas by combining the principles of method chaining with the power of pandas_flavor. Who doesn't love a good chain¹?
1️⃣ Modern Pandas: An idiomatic guide by a pandas maintainer
Written: April 2016
One of our top recommended teaching resources at Coefficient is Tom Augspurger's blog post on method chaining in pandas. Adapting a concept Tom borrowed from Jeff Allen of dplyr, here's a story we might tell in Python:
    come_to(
        find_out(
            check(make(santa_claus, "list"), n=2),
            "naughty_or_nice",
        ),
        "town",
    )
and here's how we can use the "pipe" operator in R, which feeds the thing on the left into the first argument of the function on the right:
    santa_claus %>%
        make("list") %>%
        check(n=2) %>%
        find_out("naughty_or_nice") %>%
        come_to("town")
Hopefully you're not writing Python "inside out" like the first example here, but it is very common (especially for pandas users) to create an entire variable name just for a temporary transformation step. I call this the "hype man" pandas style:
    made_list = make(santa_claus, "list")
    checked_twice = check(made_list, n=2)
    found_out = find_out(checked_twice, "naughty_or_nice")
    santas_in_town = come_to(found_out, "town")
Unless you really like inventing variable names, this style of code can be avoided using method chains to create cleaner, more readable, and in some cases more performant code:
    santa_claus = pd.DataFrame()
    (
        santa_claus.pipe(make, "list")
        .pipe(check, n=2)
        .pipe(find_out, "naughty_or_nice")
        .pipe(come_to, "town")
    )
There are some good arguments for "hype man" style, for example when first writing the pipeline, testing the outputs of each stage, debugging or performance optimisation, but in the long term our code should be "written for people to read, and only incidentally for machines to execute" (Harold Abelson, Structure and Interpretation of Computer Programs).
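One way to keep some of those debugging benefits inside a chain is a pass-through helper used with .pipe(). The sketch below is only an illustration: the helper name peek and the sample data are made up for this example and are not part of pandas or of Tom's post.

```python
import pandas as pd


def peek(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Print the shape of `df`, then return it unchanged so the chain continues."""
    print(f"{label}: {df.shape[0]} rows x {df.shape[1]} columns")
    return df


# Hypothetical data, purely for illustration.
children = pd.DataFrame(
    {"name": ["Ada", "Bob", "Cy", "Di"], "good_deeds": [5, 1, 3, 0]}
)

nice_list = (
    children
    .pipe(peek, "raw")
    .assign(nice=lambda d: d["good_deeds"] >= 3)  # add a boolean column
    .pipe(peek, "labelled")
    .loc[lambda d: d["nice"]]  # keep only the rows flagged as nice
    .pipe(peek, "nice only")
)
```

Each peek call reports the intermediate shape without needing a named variable for that stage, and the calls can be deleted once the pipeline is trusted.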
It would be nice, however, if we didn't need those .pipe() calls. Here's what Tom said about this back in 2016:
Monkeypatching on your own methods is fragile. It's not easy to correctly subclass pandas' DataFrame to extend it with your own methods. Composition, where you create a class that holds onto a DataFrame internally, may be fine for your own code, but it won't interact well with the rest of the ecosystem so your code will be littered with lines extracting and repacking the underlying DataFrame.
With this concept in place, let us introduce today's package…
2️⃣ pandas_flavor: DIY custom DataFrame methods
Last updated: July 2023
Downloads: 53,651/week
License: MIT
PyPI | GitHub Stars: 288
What is it?
pandas 0.23 added a simple API for registering custom accessors on DataFrames and Series. pandas_flavor builds on that API so you can register plain methods as well. This makes it easy to add your own custom functionality to any DataFrame, or even to share a custom DataFrame analytics suite within your team by registering your analytics class under a single namespaced accessor.
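To make the "single namespaced accessor" idea concrete, here is a minimal sketch using pandas' own pd.api.extensions.register_dataframe_accessor decorator. The namespace "santa", the class SantaAccessor and the sample data are all hypothetical names chosen for this example. pandas_flavor exposes a decorator with the same name (pf.register_dataframe_accessor), which should be usable in the same way.

```python
import pandas as pd


# "santa" is a hypothetical namespace chosen for this example.
@pd.api.extensions.register_dataframe_accessor("santa")
class SantaAccessor:
    """Groups related helpers under df.santa.<method>()."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def nice(self, threshold: int = 3) -> pd.DataFrame:
        """Rows whose good_deeds count reaches the threshold."""
        return self._df[self._df["good_deeds"] >= threshold]

    def naughty(self, threshold: int = 3) -> pd.DataFrame:
        """Rows whose good_deeds count falls below the threshold."""
        return self._df[self._df["good_deeds"] < threshold]


# Hypothetical data, purely for illustration.
children = pd.DataFrame({"name": ["Ada", "Bob", "Cy"], "good_deeds": [5, 1, 3]})

nice_children = children.santa.nice()
naughty_children = children.santa.naughty()
```

Keeping a team's helpers behind one namespace avoids name clashes with built-in DataFrame methods and makes it obvious which calls are custom.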
Install
    pip install pandas-flavor
Use
Because it's Christmas (in 23 days!), let's make a working version of our Santa Claus example. As always, you can find the full notebook in the GitHub repo for this advent calendar. Note that we're using the wonderful Faker package to generate some random names.
    # Imports & setup
    import pandas as pd
    from faker import Faker

    fake = Faker()

    # A normal Python function, for now...
    def make(df: pd.DataFrame, item: str, n=4) -> pd.DataFrame:
        """Add a column called `item` to a pandas DataFrame, with n rows."""
        # **{item: ...} names the column after the value of `item`;
        # assign(item=...) would create a column literally called "item".
        return df.assign(**{item: [fake.name() for _ in range(n)]})
We can call this function on an empty DataFrame…
    santa_claus = pd.DataFrame()
    make(santa_claus, item="names", n=5)
…or we can do this using the .pipe() method:
    santa_claus.pipe(make, item="list")
Time for some pandas_flavor magic: let's add a custom pandas method called .make() by adding a single decorator to our function:
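    import pandas_flavor as pf

    # The decorator makes the function available as DataFrame.make()
    @pf.register_dataframe_method
    def make(df, item, n=4):
        # **{item: ...} names the column after the value of `item`
        return df.assign(**{item: [fake.name() for _ in range(n)]})

    santa_claus.make(item="list", n=2)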
Finally, here's a full working demo; you can find the code in this GitHub repo.
First, a little revision… here's what we've seen so far:
Now let's write functions for make(), check(), find_out() and come_to():
And finally, let's unlock that sweet taste of idiomatic pandas:
Stay Tuned for More!
If you're enjoying this Python Advent Calendar so far, think to yourself: who else would like this? Share it with them! Forward this email, share it via WhatsApp with your family, tag us on Twitter (@CoefficientData) or LinkedIn, whatever works. Don't forget we also have the python-advent-calendar GitHub repo full of code examples from every newsletter.
See you tomorrow!
Your Python Advent Calendar Team
Python Advent Calendar is brought to you by Coefficient, a data consultancy with expertise in data science, software engineering, devops, machine learning and other AI-related services. We code, we teach, we speak, we're part of the PyData London Meetup team, and we love giving back to the community. If you'd like to work with us, just email [email protected] and we'll set up a call to say hello.
P.S. We love feedback! Did you like today's content? Did we miss a good Python package? Is there a package or tool or top tip you think we should feature? Whatever it is, reach out to us. Here's a link to our privacy policy.
¹ Other good chains: the All Saints BBC Radio 2 live version, and (for the band name alone) the cover by Meetwood Flac.