Python Concurrency: A Guide to Threads, Processes, and Asyncio | Chandrashekhar Kachawa | Tech Blog

Python Concurrency: A Guide to Threads, Processes, and Asyncio


Your Python script is slow, and you suspect it could be faster if it did multiple things at once. But how? Python offers a rich but sometimes confusing landscape of concurrency and parallelism tools. Should you use threads, processes, or asyncio?

Choosing the right tool for the job is the key to writing efficient, scalable code. This guide will walk you through the three main concurrency models in Python, explaining what they are, how to use them, and when to choose each one.

The Foundation: concurrent.futures

For traditional, blocking code, Python’s concurrent.futures module provides a beautiful, high-level API for managing pools of threads and processes. It introduces the “Executor” pattern, where you submit jobs to a pool and retrieve the results.
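Besides map, the Executor API exposes submit, which schedules a single call and returns a Future you can query later. A minimal sketch (square is just an illustrative function, not part of the module):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x: int) -> int:
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules one call and immediately returns a Future
    future = executor.submit(square, 7)
    # result() blocks until the job finishes, then returns its value
    print(future.result())  # 49
```

This submit/Future pair is the lower-level building block that map is built on top of.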

1. ThreadPoolExecutor: For I/O-Bound Work

When to use it: When your task spends most of its time waiting for external resources. This is called I/O-bound work. Examples include:

  • Making network requests (e.g., calling APIs, scraping websites).
  • Reading or writing from a slow disk.
  • Querying a database.

Due to Python’s Global Interpreter Lock (GIL), threads are not suitable for speeding up CPU-heavy tasks, but they are perfect for I/O-bound tasks because they can “wait” concurrently.

Syntax Example: Let’s download the content of several web pages.

import requests
from concurrent.futures import ThreadPoolExecutor

URLS = [
    "https://www.python.org/",
    "https://www.djangoproject.com/",
    "https://flask.palletsprojects.com/",
]

def fetch_url(url: str):
    print(f"Fetching {url}...")
    response = requests.get(url)
    print(f"Fetched {url} with status {response.status_code}")
    return len(response.content)

with ThreadPoolExecutor(max_workers=5) as executor:
    # The map function runs `fetch_url` for each item in URLS
    results = executor.map(fetch_url, URLS)

for url, length in zip(URLS, results):
    print(f"URL: {url}, Length: {length}")

The thread pool allows all three requests.get() calls to happen concurrently, dramatically speeding up the total execution time.
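When you want each result as soon as its download finishes, rather than in input order, the usual pattern is submit combined with as_completed. A sketch with a simulated fetch (fake_fetch and the URLs are placeholders standing in for real network calls):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_fetch(url: str) -> tuple[str, int]:
    # Simulated I/O wait standing in for a real network request
    time.sleep(0.1)
    return url, len(url)

urls = ["https://a.example", "https://bb.example", "https://ccc.example"]

results = {}
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns one Future per job; map each Future back to its URL
    futures = {executor.submit(fake_fetch, u): u for u in urls}
    # as_completed yields futures in completion order, not submission order
    for future in as_completed(futures):
        url, length = future.result()
        results[url] = length

print(results)
```

Unlike executor.map, this lets slow requests finish last without holding up the results of fast ones.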

2. ProcessPoolExecutor: For CPU-Bound Work

When to use it: When your task is doing heavy computation and maxing out a CPU core. This is called CPU-bound work. Examples include:

  • Complex mathematical calculations.
  • Image or video processing.
  • Data analysis on large datasets.

Processes run in separate memory spaces, each with its own Python interpreter, which allows them to bypass the GIL and run on different CPU cores in true parallelism.

Syntax Example: Let’s perform a heavy calculation on a list of numbers.

from concurrent.futures import ProcessPoolExecutor

def heavy_calculation(num: int):
    print(f"Calculating for {num}...")
    # A silly, CPU-intensive task
    return sum(i * i for i in range(num))

numbers_to_process = [100_000, 150_000, 200_000]

with ProcessPoolExecutor() as executor:
    results = executor.map(heavy_calculation, numbers_to_process)

for num, result in zip(numbers_to_process, results):
    print(f"Calculation for {num} returned {result}")

Notice the syntax is nearly identical to the thread pool example! concurrent.futures makes it easy to switch between them.
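Because both pools implement the same Executor interface, the pool class itself can even be passed around as a value. A minimal sketch (run_all and work are illustrative names, not part of concurrent.futures):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def work(n: int) -> int:
    return n * n

def run_all(executor_cls, items):
    # Any Executor subclass works here; the API is identical
    with executor_cls() as executor:
        return list(executor.map(work, items))

print(run_all(ThreadPoolExecutor, [1, 2, 3]))  # [1, 4, 9]
# ProcessPoolExecutor is a drop-in swap, provided `work` is a picklable
# top-level function and the script runs under `if __name__ == "__main__":`
# print(run_all(ProcessPoolExecutor, [1, 2, 3]))
```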

3. The Third Model: asyncio

What if your code is already async and uses libraries like httpx or asyncpg? In this case, you don’t need threads or processes. asyncio has its own, more efficient tools for managing I/O-bound concurrency.

Managing Async Tasks: gather and TaskGroup

Let’s say we have a simple async function:

import asyncio

async def say_after(delay: int, what: str):
    await asyncio.sleep(delay)
    print(what)

asyncio.gather

gather is a high-level utility to run multiple awaitable objects concurrently.

Gathering Coroutines (most common): You can pass coroutine objects directly to gather. It will automatically schedule them as tasks.

import asyncio

async def main():
    # Runs both `say_after` calls concurrently: ~2s total instead of 3s
    await asyncio.gather(
        say_after(1, "hello"),
        say_after(2, "world"),
    )

asyncio.run(main())

Gathering Tasks: You can also create tasks explicitly with asyncio.create_task and then gather them. This gives you more control if you need to interact with the Task objects before they are complete.

async def main():
    task1 = asyncio.create_task(say_after(1, "hello"))
    task2 = asyncio.create_task(say_after(2, "world"))
    await asyncio.gather(task1, task2)

asyncio.run(main())
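For example, holding the Task object lets you name, inspect, or cancel it before awaiting. A small sketch (slow_job is an illustrative coroutine, not a library function):

```python
import asyncio

async def slow_job():
    await asyncio.sleep(10)
    return "done"

async def main():
    # Creating the task explicitly lets us inspect and cancel it later
    task = asyncio.create_task(slow_job(), name="slow")
    await asyncio.sleep(0)   # yield control so the task actually starts
    print(task.get_name(), task.done())
    task.cancel()            # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"

result = asyncio.run(main())
print(result)
```

None of this bookkeeping is possible when you pass bare coroutines straight to gather.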

asyncio.TaskGroup (The Modern, Safe Way)

Introduced in Python 3.11, TaskGroup is a modern and safer way to manage concurrent tasks. It uses a context manager (async with) to create a scope, guaranteeing that all tasks created within it are awaited before the block is exited.

Benefits:

  • Structured Concurrency: No more “forgotten” tasks that run in the background.
  • Superior Exception Handling: If any task in the group fails, all other tasks are automatically cancelled.

Syntax Example:

async def main():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(say_after(1, "hello"))
        tg.create_task(say_after(2, "world"))
    # Execution only reaches here once every task in the group is done
    print("Both tasks have now completed.")

asyncio.run(main())

For new asyncio code, TaskGroup is generally preferred over gather.

Summary: When to Use Which?

| Is your task… | And are you using… | Your best tool is… |
| --- | --- | --- |
| CPU-bound | Any Python code | ProcessPoolExecutor |
| I/O-bound | Blocking libraries (e.g., requests, psycopg2) | ThreadPoolExecutor |
| I/O-bound | Async libraries (e.g., httpx, asyncpg) | asyncio (TaskGroup or gather) |

Conclusion

Python provides powerful tools for every concurrency and parallelism need. The key is to correctly identify the nature of your task. By choosing the right model—Processes for CPU-bound work, Threads for blocking I/O, and Asyncio for non-blocking I/O—you can write efficient, scalable, and high-performance applications.

Enjoyed this article? Follow me on X for more content and updates!
