Day 4: Everyone's utils.py
"If it isn't built-in, it's a bolton"
Python Advent Calendar: Day 4!
Yesterday, we covered PyJanitor, a library that extends pandas with a variety of handy methods for data cleaning. Behind today's door we introduce you to its Python-only equivalent, boltons. With 250+ helper functions available, it's likely that there's a function somewhere inside that you've written yourself, and some other function that you'll need soon. Don't reinvent the wheel, see if it exists already, and import it instead!
Boltons: "everyone's utils.py"
Last updated: December 2023
Downloads: 687,184/week
License: BSD-2 Clause
PyPI | GitHub Stars: 6.3k
What is it?
The goal of Boltons is to include functionality that "should be in the standard library" but isn't. It contains 250+ utility functions across 26 modules:
Enhancements to the standard library modules such as itertools (iterutils), functools (funcutils), datetime (timeutils), and urllib (urlutils).
Convenience methods for specific types such as dictionaries (dictutils), strings (formatutils and strutils), lists (listutils), 2D tables (tableutils), containers (namedutils), and sets (setutils).
Helper utils for working with the filesystem (fileutils and pathutils), I/O (ioutils), JSON (jsonutils), object caching (cacheutils), and type handling (typeutils).
Specialised functions for mathematics (mathutils) and statistics (statsutils).
Advanced libraries for debugging (debugutils), garbage collection (gcutils), mailboxes (mboxutils), priority queues (queueutils), sockets (socketutils), and tracebacks (tbutils).
Install
pip install boltons
Use
1. Small batch processing
If you've read The Lean Startup by Eric Ries, you'll know the most efficient way to prepare and send Christmas cards is in small batches. It's a good time of year to test how "lean" your family & friends are by asking them this question:
What's the fastest way to take 100 Christmas letters and, for each one, fold it, put it in an envelope, seal it, address it, and stamp it?
A) The whole process, one at a time.
B) Fold all the letters, put them in envelopes, seal them, and so on.
C) Option B, but in small batches of 5-10 letters.
The most common answer is B (batch process each step), yet multiple studies have confirmed that option A (do one letter at a time) gets there faster. If you're sceptical, watch this video. My own hard-won advice from 13 years at the frontline of data science & software engineering is that:
Working in small batches is faster. This is a key principle of lean manufacturing: to reduce "WIP" (Work In Progress); otherwise, you end up spending valuable time moving around piles of WIP and discussing the WIP on your Jira board.
Adopt a "pipe cleaning" development approach. Eric Ries makes the excellent point that you may find out that the letters don't fit in the envelopes. Would you rather find that out immediately or wait until you've spent time folding all the letters first?
Boltons doesn't help with the philosophy of work management, but it does have a handy function for chunking large lists into batches. Use cases include writing your own minibatch logic for deep learning, or batching requests for API calls.
    from boltons.iterutils import chunked
    from faker import Faker

    fake = Faker()
    christmas_cards = [fake.name() for _ in range(20)]

    for batch in chunked(christmas_cards, 3):
        print(f"Mailing batch: {batch}")

    # Mailing batch: ['Diane Mcdowell', 'Tony Cross', 'Robert Yu']
    # Mailing batch: ['Thomas Jones', 'Lisa Johnson', 'Ralph Munoz']
    # Mailing batch: ['Peggy Berry', 'Marcus Martinez', 'Kathleen Walton']
    # Mailing batch: ['Daniel Stevens', 'Michael Cobb', 'Ashley Blair']
    # Mailing batch: ['Kathryn Jones', 'James Rodriguez', 'Christine Espinoza']
    # Mailing batch: ['Todd Kaiser', 'Ashley Pace', 'Tammy Dougherty']
    # Mailing batch: ['Evelyn Delgado', 'Scott Wang']
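If you're curious what chunking boils down to, here's a rough standard-library sketch of the same idea applied to batched API calls. It is not how boltons implements chunked; chunk_list and send_batch are hypothetical names made up for this illustration, and send_batch only pretends to call an API.

```python
def chunk_list(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Slice at a fixed stride; the final batch may be shorter than `size`.
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_batch(batch):
    # Hypothetical stand-in for a real API call: it only reports the batch size.
    return f"sent {len(batch)} cards"


recipients = [f"friend_{n}" for n in range(7)]
results = [send_batch(batch) for batch in chunk_list(recipients, 3)]
print(results)
# ['sent 3 cards', 'sent 3 cards', 'sent 1 cards']
```

Note that the last batch is simply shorter than the rest, just like the two-name batch in the output above.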
2. Window functions
Windowing operations come built in with pandas but not with Python itself. Applications include calculating statistics such as a moving average, time series modelling, and natural language processing (NLP) techniques such as n-gram detection. Here's a simple approach for detecting n-grams in text:
    from boltons.iterutils import windowed

    lyrics = "I don't want a lot for Christmas"

    for combo in windowed(lyrics.split(), 2):
        print(combo)

    # ('I', "don't")
    # ("don't", 'want')
    # ('want', 'a')
    # ('a', 'lot')
    # ('lot', 'for')
    # ('for', 'Christmas')
Let's calculate the top 2-grams across the whole song:

    import pandas as pd

    # Assumes `lyrics` now holds the full song text, not just the first line
    words = lyrics.lower().split()
    two_grams = windowed(words, 2)

    (
        pd.Series(two_grams)
        .value_counts(ascending=True)
        .tail()
        .plot(kind='barh', color='red')
    )
The most popular 2-grams in the most popular Christmas song
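The moving-average use case mentioned at the start of this section follows the same pattern: slide a fixed-size window along the data and summarise each window. Here's a minimal standard-library sketch, not boltons' own implementation; sliding_windows, moving_average, and the sample data are hypothetical names and values for illustration.

```python
from collections import deque


def sliding_windows(values, size):
    """Yield each run of `size` consecutive items as a tuple."""
    window = deque(maxlen=size)  # the oldest item drops out automatically
    for value in values:
        window.append(value)
        if len(window) == size:
            yield tuple(window)


def moving_average(values, size):
    """Average each full window; shorter leading windows are skipped."""
    return [sum(window) / size for window in sliding_windows(values, size)]


daily_cards_sent = [4, 8, 6, 10, 14]
print(list(sliding_windows(daily_cards_sent, 3)))
# [(4, 8, 6), (8, 6, 10), (6, 10, 14)]
print(moving_average(daily_cards_sent, 3))
# [6.0, 8.0, 10.0]
```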
3. Unique values
You may already be a fan of using set(), np.unique(), or pd.Series.unique() to identify the unique values within an iterable. The boltons unique() function is handy for when you're not using numpy/pandas, and want to preserve the order in which each element is encountered.
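    from boltons.iterutils import unique

    # set() does not preserve order, so this output can differ between runs
    print("".join(set(lyrics)))
    # kpei-)dlrs'HoTSbaYD,NIcMPnuvOj?yhgfCw W(tmA

    # unique() keeps each character in the order it was first seen
    print("".join(unique(lyrics)))
    # "I don'twalfrChismTejugcbpyMvkAY()DS,ONPHW?-"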
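For the curious, order-preserving de-duplication needs little more than a set of things already seen. The sketch below is a standard-library illustration of the idea, not the boltons source; unique_in_order is a hypothetical name, and its optional key argument is an assumption added to show how "sameness" can be customised.

```python
def unique_in_order(items, key=None):
    """Return items with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        # `key` decides what counts as "the same" item (e.g. case-insensitive).
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


gifts = ["Azul", "Lego", "azul", "Lego", "Socks"]
print(unique_in_order(gifts))
# ['Azul', 'Lego', 'azul', 'Socks']
print(unique_in_order(gifts, key=str.lower))
# ['Azul', 'Lego', 'Socks']

# For hashable items, dict.fromkeys gives the same effect in one line.
print("".join(dict.fromkeys("mississippi")))
# misp
```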
4. Flattening lists of lists
If you're like me, you keep lists upon lists of gift ideas. Lists of the best board games, soothing music, and other lists of cool stuff. Use the flatten() function from boltons.iterutils to flatten your lists-inside-lists-inside-lists into a single shopping list ready for prioritisation and allocation.
    from boltons.iterutils import flatten

    gift_ideas = [
        ["Lego lighthouse", "Azul board game"],
        [
            "Programmer socks",
            [
                "Python scarf",
                [
                    "Type-hinted tshirt",
                    "Suntrap album",
                ],
            ],
        ],
        "Raspberry Pi",
    ]

    print(list(flatten(gift_ideas)))
    # ['Lego lighthouse', 'Azul board game',
    #  'Programmer socks', 'Python scarf',
    #  'Type-hinted tshirt', 'Suntrap album',
    #  'Raspberry Pi']
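Flattening is a nice small exercise in recursion. Here's a rough sketch of the idea using only built-ins; it is not the boltons implementation, flatten_nested is a hypothetical name, and it only unpacks lists and tuples so that strings stay whole.

```python
def flatten_nested(items):
    """Recursively collapse nested lists/tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers and splice their contents in.
            flat.extend(flatten_nested(item))
        else:
            # Strings (and anything else) are treated as single items.
            flat.append(item)
    return flat


print(flatten_nested(["Lego", ["Socks", ["Scarf", ("Album",)]], "Pi"]))
# ['Lego', 'Socks', 'Scarf', 'Album', 'Pi']
```

The isinstance check matters: strings are iterable too, so a naive "is it iterable?" test would split every gift idea into single characters.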
5. Exponential backoff
Perhaps you'll be using the ChatGPT API to write customised cards and poems for everyone this Christmas? (We made a tutorial if you want to learn how.) For those of you with thousands of friends, you may need to add some defensive time.sleep() lines into your code alongside an "exponential backoff" within your retry logic. The boltons.iterutils.backoff function handles the number crunching for you, returning a list of geometrically increasing floating-point numbers, capped at the stop value:
    from boltons.iterutils import backoff

    times = backoff(
        start=1,
        stop=10,
        count=5,
        factor=3,
        jitter=True,
    )
    print(times)
    # Example output (jitter is random, so yours will differ):
    # [0.5496827240064363, 2.6675852562747178,
    #  7.326622231605776, 0.14987974382568758,
    #  9.861013287132742]

    print(backoff(start=1, stop=30, count=5, factor=4))
    # [1.0, 4.0, 16.0, 30.0, 30.0]
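A list of delays is only half the story, so here's a minimal sketch of how such delays might be wired into retry logic. Everything in it is an assumption for illustration: backoff_delays is a standard-library approximation of the no-jitter case, retry and flaky_request are hypothetical helpers, and the sleep function is swapped for a recorder so the example runs instantly.

```python
import time


def backoff_delays(start, stop, count, factor=2.0):
    """Geometrically increasing delays, capped at `stop` (no jitter)."""
    delays = []
    current = float(start)
    for _ in range(count):
        delays.append(min(current, float(stop)))
        current *= factor
    return delays


def retry(func, delays, sleep=time.sleep):
    """Call func, sleeping for the next delay after each failure."""
    for delay in delays:
        try:
            return func()
        except Exception:  # real code should catch specific error types
            sleep(delay)
    return func()  # final attempt: let any error propagate


attempts = {"count": 0}


def flaky_request():
    # Hypothetical API call that fails twice before succeeding.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated failure")
    return "card sent"


waits = []  # record the delays instead of actually sleeping
delays = backoff_delays(start=1, stop=30, count=5, factor=4)
print(retry(flaky_request, delays, sleep=waits.append))
# card sent
print(waits)
# [1.0, 4.0]
```

In production you'd leave sleep at its default of time.sleep, and most likely turn on jitter so that many clients don't all retry at the same moment.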
6. Create an in-memory cache
There are situations where you want to store information in-memory, but have to be selective about what you keep. When your kitchen fridge runs out of space, what do you discard? The items you haven't touched for the longest time ("Least Recently Used"), or the items that have been in there the longest ("Least Recently Inserted")?
Boltons provides a few utilities for caching, including the LRU Cache ("Least Recently Used") and the LRI Cache ("Least Recently Inserted"). The idea is to cache a preset number of items (say, 5 items) and, when the cache is full, to discard the item that was least recently used or least recently inserted. Here's how the LRU Cache works:
    from boltons.cacheutils import LRU

    recipe_cache = LRU(max_size=2)
    recipe_cache["Gingerbread"] = (
        "Flour, Ginger, Molasses, Sugar, "
        "Butter, Baking soda, Cinnamon, Cloves"
    )
    recipe_cache["Mince Pie"] = (
        "Mincemeat (dried fruits & spices), "
        "Pastry dough, Sugar (for dusting, optional)"
    )

    # What's in our recipe cache?
    print(dict(recipe_cache).keys())
    # dict_keys(['Gingerbread', 'Mince Pie'])

    # We can access the gingerbread ingredients, which also
    # marks Gingerbread as the most recently used entry
    print(recipe_cache["Gingerbread"])
    # Flour, Ginger, Molasses, Sugar, Butter, Baking soda, Cinnamon, Cloves

    # The cache is full, so inserting a new entry discards the least
    # recently used one: Mince Pie (Gingerbread was just accessed)
    recipe_cache["Christmas Pudding"] = (
        "Currants, Raisins, Suet, Brown sugar, "
        "Breadcrumbs, Flour, Mixed spice, "
        "Candied peel, Eggs, Stout"
    )

    print(recipe_cache.get("Mince Pie"))
    # None
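To make the eviction rule concrete, here's a toy LRU cache built on collections.OrderedDict. It's a sketch of the concept only, not the boltons implementation, and TinyLRU is a hypothetical class with none of the thread-safety or statistics a real cache would offer.

```python
from collections import OrderedDict


class TinyLRU:
    """Toy least-recently-used cache: oldest-used entries are evicted first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()  # ordered from least to most recently used

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)  # writing counts as a "use"
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # reading counts as a "use" too
        return value

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def keys(self):
        return list(self._data)


recipes = TinyLRU(max_size=2)
recipes["Gingerbread"] = "Flour, Ginger, Molasses"
recipes["Mince Pie"] = "Mincemeat, Pastry dough"
recipes["Gingerbread"]  # touch Gingerbread so Mince Pie becomes the oldest
recipes["Christmas Pudding"] = "Currants, Raisins, Suet"

print(recipes.keys())
# ['Gingerbread', 'Christmas Pudding']
print(recipes.get("Mince Pie"))
# None
```

Removing the move_to_end call in __getitem__ would turn this into a "Least Recently Inserted" cache, where reads no longer protect an entry from eviction.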
7. Memoization
Whether you're mass-producing Christmas gifts with generative AI, or implementing your own travelling salesman solver to optimise your Christmas party travel plans, developing against any API can be a slow experience unless you can cache your API calls. The concept of memoization is incredibly valuable to learn and apply:
Memoization is an optimization technique used to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again.
Boltons provides utilities for memoization that use your caching tool of choice. Here's an example using a "slow" function that takes a long time to wrap presents. When it sees the same input, it remembers this and immediately returns the output calculated previously.
    import time

    from tqdm import tqdm
    from boltons.cacheutils import cached, LRU

    my_cache = LRU()

    @cached(my_cache)
    def wrap_gifts(total):
        # Assume some complex calculation
        for _ in tqdm(range(total)):
            time.sleep(0.5)
        return "🎁" * total

    print(wrap_gifts(20))  # slow the first time (about 10 seconds)
    print(wrap_gifts(20))  # same input again: returned straight from the cache
boltons also provides tools for memoizing methods with cachedmethod and class properties with cachedproperty.
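If decorators like these feel like magic, it may help to see how little is needed for a basic version. The sketch below is a hand-rolled illustration, not how boltons does it: memoize is a hypothetical decorator that only handles hashable positional arguments, and the calls list exists purely to prove the function body ran once.

```python
import functools


def memoize(func):
    """Cache results keyed by the positional arguments of each call."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # only computed on a cache miss
        return cache[args]

    return wrapper


calls = []  # records every time the real function body runs


@memoize
def wrap_gifts(total):
    calls.append(total)
    return "*" * total


print(wrap_gifts(3))
# ***
print(wrap_gifts(3))
# ***
print(len(calls))
# 1
```

The standard library's functools.lru_cache offers a more complete take on the same idea if you don't need to supply your own cache object.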
8. String utils
boltons includes a wide variety of helper utilities for so many things I've handwritten myself. Here's a non-exhaustive selection, from slugifiers to pluralizers to a nicer solution for string replacement than a long chain of .replace(a, b).replace(c, d).replace(e, f)!
    from boltons.strutils import (
        camel2under,
        under2camel,
        slugify,
        split_punct_ws,
        ordinalize,
        pluralize,
        singularize,
        find_hashtags,
        MultiReplace,
    )

    camel2under("AllIWantForChristmasIsYou")
    # 'all_i_want_for_christmas_is_you'

    under2camel("all_i_want_for_christmas_is_you")
    # 'AllIWantForChristmasIsYou'

    slugify("Day 4: Everyone's utils.py")
    # 'day_4_everyone_s_utils_py'

    split_punct_ws("Day 4: Everyone's utils.py")
    # ['Day', '4', 'Everyone', 's', 'utils', 'py']

    ordinalize(4)
    # '4th'

    pluralize("python")
    # 'pythons'

    pluralize("sheep")
    # 'sheep'

    pluralize("util")
    # 'utils'

    pluralize("utils")  # already plural, so the result is a bit odd
    # 'utilses'

    singularize("pythons")
    # 'python'

    singularize("FEET")
    # 'FOOT'

    # Using .replace()
    "Day 4: Everyone's utils.py".replace(
        "Day 4", "Today"
    ).replace("Everyone's", "Your new")
    # 'Today: Your new utils.py'

    # Using MultiReplace from boltons
    MultiReplace(
        {
            "Day 4": "Today",
            "Everyone's": "Your new",
        }
    ).sub("Day 4: Everyone's utils.py")
    # 'Today: Your new utils.py'
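One reason to prefer a single multi-replace over chained .replace() calls is that chained calls can trip over each other's output. The sketch below shows the problem and a one-pass alternative using the standard re module; multi_replace is a hypothetical helper written for this illustration, and we're not claiming it mirrors how MultiReplace works internally.

```python
import re


def multi_replace(text, replacements):
    """Apply all replacements in one pass over the original text."""
    if not replacements:
        return text
    # Try longer keys first so overlapping keys match greedily.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


text = "naughty or nice"
swaps = {"naughty": "nice", "nice": "naughty"}

# Chained replace: the second call rewrites the output of the first.
print(text.replace("naughty", "nice").replace("nice", "naughty"))
# naughty or naughty

# One pass: each match is replaced exactly once.
print(multi_replace(text, swaps))
# nice or naughty
```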
What are your favourite boltons?
That's it for today's update! There's so much included in boltons that we didn't cover. If there's a boltons feature you think we should have included, why not share the link to this newsletter on LinkedIn or Twitter and tell us about it!
If you're enjoying this series, or have feedback or ideas on how we can make it better, please reach out to us via [email protected] or @CoefficientData on Twitter.
See you tomorrow!
Your Python Advent Calendar Team
Python Advent Calendar is brought to you by Coefficient, a data consultancy with expertise in data science, software engineering, devops, machine learning and other AI-related services. We code, we teach, we speak, we're part of the PyData London Meetup team, and we love giving back to the community. If you'd like to work with us, just email [email protected] and we'll set up a call to say hello.