Day 2: Pandas Flavor Chains

Unlock the sweet taste of idiomatic pandas

🎄 Python Advent Calendar: Day 2! 🎄

Every day until Christmas we’ll open a new door to Python secrets. Behind today’s door you will discover a whole new way to use pandas by combining the principles of method chaining with the power of pandas_flavor. Who doesn’t love a good chain ¹?

1️⃣ Modern Pandas: An idiomatic guide by a pandas maintainer

📆 Written: April 2016

One of our top recommended teaching resources at Coefficient is Tom Augspurger’s blog post on method chaining in pandas. Adapting a concept Tom borrowed from Jeff Allen of dplyr, here’s a story we might tell in Python:

come_to(
    find_out(
        check(make(santa_claus, "list"), n=2),
        "naughty_or_nice"
    ),
    "town"
)

and here’s how we can use the “pipe” operator in R, which feeds the thing on the left into the first argument of the function on the right:

santa_claus %>%
    make("list") %>%
    check(n=2) %>%
    find_out("naughty_or_nice") %>%
    come_to("town")

Hopefully you’re not writing Python “inside out” like the first example here, but it is very common (especially for pandas users) to create an entire variable name just for a temporary transformation step. I call this the “hype man” pandas style:

made_list = make(santa_claus, "list")
checked_twice = check(made_list, n=2)
found_out = find_out(checked_twice, "naughty_or_nice")
santas_in_town = come_to(found_out, "town")

Unless you really like inventing variable names, this style of code can be avoided using method chains to create cleaner, more readable, and in some cases more performant code:

santa_claus = pd.DataFrame()
(
    santa_claus.pipe(make, "list")
    .pipe(check, n=2)
    .pipe(find_out, "naughty_or_nice")
    .pipe(come_to, "town")
)
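If .pipe() is new to you, here’s a minimal runnable sketch of what it does: df.pipe(f, ...) simply calls f(df, ...), so a chain of pipes should give the same result as the nested calls. The add_column() and keep_first() helpers are made-up stand-ins for illustration, not part of pandas:

```python
import pandas as pd


def add_column(df, name, value):
    """Hypothetical helper: return a copy of df with one extra column."""
    return df.assign(**{name: value})


def keep_first(df, n=2):
    """Hypothetical helper: keep only the first n rows."""
    return df.head(n)


df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"]})

# "Inside out" style: read from the innermost call outwards
nested = keep_first(add_column(df, "checked", True), n=2)

# Chained style: each .pipe() passes the DataFrame as the first argument
chained = df.pipe(add_column, "checked", True).pipe(keep_first, n=2)

print(chained.equals(nested))
```

Both versions run the same functions in the same order; the chain just lets you read them top to bottom.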

There are some good arguments for “hype man” style, for example when first writing the pipeline, testing the outputs of each stage, debugging or performance optimisation, but in the long term our code should be “written for people to read, and only incidentally for machines to execute” (Harold Abelson, Structure and Interpretation of Computer Programs).

It would be nice, however, if we didn’t need those .pipe() calls. Here’s what Tom said about this back in 2016:

Monkeypatching on your own methods is fragile. It’s not easy to correctly subclass pandas’ DataFrame to extend it with your own methods. Composition, where you create a class that holds onto a DataFrame internally, may be fine for your own code, but it won’t interact well with the rest of the ecosystem so your code will be littered with lines extracting and repacking the underlying DataFrame.

With this concept in place, let us introduce today’s package…

2️⃣ pandas_flavor: DIY custom DataFrame methods

📆 Last updated: July 2023
⬇️ Downloads: 53,651/week
⚖️ License: MIT
🐍 PyPI | ⭐ GitHub Stars: 288

🔍 What is it?

pandas 0.23 added a simpler API for registering custom accessors on DataFrames and Series. pandas_flavor builds on that API so you can also register plain methods. This library makes it easy to add your own custom functionality to any DataFrame, or even to share a custom DataFrame analytics suite within your team by registering your analytics class under a single namespaced accessor.
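As a rough sketch of the “single namespaced accessor” idea, the example below uses pandas’ own pd.api.extensions.register_dataframe_accessor decorator (pandas_flavor exposes a decorator with the same name). The santa namespace, the SantaAccessor class and its two methods are hypothetical names invented for this illustration:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("santa")
class SantaAccessor:
    """Hypothetical analytics suite, reachable as df.santa.<method>()."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was called on
        self._obj = pandas_obj

    def check(self, n=2):
        """Record how many times the list has been checked."""
        return self._obj.assign(times_checked=n)

    def nice_only(self, column="naughty_or_nice"):
        """Keep only the rows labelled 'nice'."""
        return self._obj[self._obj[column] == "nice"]


df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "naughty_or_nice": ["nice", "naughty"]}
)

# Every method lives under the one `santa` namespace
nice = df.santa.check(n=2).santa.nice_only()
print(nice)
```

Keeping everything under one namespace means a team’s custom methods can’t collide with pandas’ built-in DataFrame attributes.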

📦 Install

pip install pandas-flavor

🛠️ Use

Because it’s Christmas (in 23 days!), let’s make a working version of our Santa Claus example. As always, you can find the full notebook in the GitHub repo for this advent calendar. Note that we’re using the wonderful Faker package to generate some random names.

# Imports & setup
import pandas as pd
from faker import Faker

fake = Faker()


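# A normal Python function, for now...
def make(df: pd.DataFrame, item: str, n: int = 4) -> pd.DataFrame:
    """Add a column called `item` to a pandas DataFrame, with n rows."""
    # Unpack a dict so the column is named by the value of `item`;
    # assign(item=...) would create a column literally called "item".
    return df.assign(**{item: [fake.name() for _ in range(n)]})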

We can call this function on an empty DataFrame…

santa_claus = pd.DataFrame()
make(santa_claus, item="names", n=5)

…or we can do this using the .pipe() method:

santa_claus.pipe(make, item="list")

Time for some 🎩 pandas_flavor magic 🪄: let’s add a custom pandas method called .make() by adding a single decorator to our function:

import pandas_flavor as pf


@pf.register_dataframe_method
def make(df, item, n=4):
    # Name the new column after the value of `item`, not the literal "item"
    return df.assign(**{item: [fake.name() for _ in range(n)]})

santa_claus.make(item="list", n=2)

Finally, here’s a full working demo; you can find the code in this GitHub repo.

First, a little revision… here’s what we’ve seen so far:

Now let’s write functions for make(), check(), find_out() and come_to()

And finally, let’s unlock that sweet taste of 🐼 idiomatic pandas 🐼

📺 Stay Tuned for More! 👀

If you’re enjoying this Python Advent Calendar so far, think to yourself: who else would like this? Share it with them! Forward this email, share it via WhatsApp with your family, tag us on Twitter (@CoefficientData) or LinkedIn, whatever works. Don’t forget we also have the python-advent-calendar GitHub repo full of code examples from every newsletter.

See you tomorrow! 🐍

Your Python Advent Calendar Team 😃

🤖 Python Advent Calendar is brought to you by Coefficient, a data consultancy with expertise in data science, software engineering, devops, machine learning and other AI-related services. We code, we teach, we speak, we’re part of the PyData London Meetup team, and we love giving back to the community. If you’d like to work with us, just email [email protected] and we’ll set up a call to say hello. ☎️

P.S. We love feedback! Did you like today's content? Did we miss a good Python package? Is there a package or tool or top tip you think we should feature? Whatever it is, reach out to us. Here’s a link to our privacy policy.

¹ Other good chains: the All Saints BBC Radio 2 live version, and (for the band name alone) the cover by Meetwood Flac.