Python's Free-Threaded Mode: What the GIL Removal Actually Means for Your Code

Python 3.13 shipped an experimental mode that removes the Global Interpreter Lock. Here's what the GIL actually does, what free-threaded Python changes, and what it still doesn't fix.

Anurag Verma

The Global Interpreter Lock has been Python’s most complained-about feature for 30 years. It prevents more than one thread from executing Python bytecode at a time, which means CPU-bound multi-threaded programs in CPython don’t actually run in parallel. They take turns.

Python 3.13, released in October 2024, ships with an experimental free-threaded build that removes the GIL entirely. Python 3.14 continues stabilizing it. The change is real, it’s in the official CPython distribution, and it affects how you should think about concurrency in Python going forward.

What the GIL Actually Does

The GIL is a mutex that protects access to Python objects. CPython manages memory with reference counts, and updating those counts is not thread-safe on its own, so the GIL ensures only one thread touches interpreter state at a time.

The consequence: two threads running CPU-bound Python code don’t run simultaneously on two cores. One acquires the GIL, runs a chunk of bytecode, releases it, and the other thread gets a turn. The OS might schedule them on separate cores, but the GIL forces them to take turns at the Python layer.

This isn’t a problem for I/O-bound work. When a thread waits on a network socket or file read, it releases the GIL, letting other threads run Python bytecode freely. asyncio, threads with httpx, and database calls all perform fine under the GIL because waiting for I/O doesn’t hold it.
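This is easy to see on any build. The sketch below uses time.sleep as a stand-in for a blocking network call (an assumption for illustration; sleep releases the GIL while waiting, much like a blocking socket read). The names fake_io and run_overlapped are made up for this example.

```python
import threading
import time

def fake_io(delay):
    # time.sleep releases the GIL while waiting, like a blocking socket read
    time.sleep(delay)

def run_overlapped(worker_count=4, delay=0.2):
    """Start worker_count threads that each wait `delay` seconds; return wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(worker_count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_overlapped()
print(f"4 waits of 0.2s finished in {elapsed:.2f}s")
```

Even with the GIL on, the four waits overlap, so the total should land near 0.2s rather than 0.8s.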

CPU-bound parallelism was the problem. Encoding video, running numerical computations without NumPy, parsing large JSON files, training a model in pure Python: none of these benefited from threading. The standard workaround was multiprocessing, which uses separate processes (each with its own GIL) at the cost of memory duplication and inter-process serialization overhead.

Free-Threaded Python 3.13

Python 3.13 ships two separate builds:

  • The standard CPython build, with the GIL intact
  • A free-threaded build (the t variant), with the GIL disabled

On most systems, you install the free-threaded build separately, and its executable is usually named python3.13t. Python.org installers include an option for the free-threaded build. On macOS, Homebrew packages it as a separate formula (check brew search python for the current name). On Linux, pyenv can install it with pyenv install 3.13t.

To check if the GIL is enabled at runtime:

import sys

if hasattr(sys, "_is_gil_enabled"):
    # Available on 3.13+ in both builds; returns True on the standard build
    print("GIL enabled:", sys._is_gil_enabled())
else:
    print("Python older than 3.13 (GIL always on)")

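On a free-threaded build the GIL is off by default, but CPython turns it back on automatically when it imports an extension module that hasn't declared free-threading support. You can control this explicitly at startup. Setting PYTHON_GIL=0 (or passing -X gil=0) forces the GIL off regardless, and PYTHON_GIL=1 re-enables it if you need that for compatibility: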

PYTHON_GIL=0 python3.13t your_script.py

With the GIL off, multiple threads can execute Python bytecode simultaneously on separate cores.
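Two questions are worth keeping separate: was this interpreter built free-threaded, and is the GIL actually off right now? A minimal sketch, assuming CPython; describe_gil is a hypothetical helper name, while sysconfig.get_config_var("Py_GIL_DISABLED") and sys._is_gil_enabled() are the documented checks.

```python
import sys
import sysconfig

def describe_gil():
    """Return (free_threaded_build, gil_enabled_now) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always have the GIL.
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = checker() if checker is not None else True
    return free_threaded_build, gil_enabled_now

build, enabled = describe_gil()
print(f"free-threaded build: {build}, GIL enabled right now: {enabled}")
```

A free-threaded build can still report the GIL as enabled, for example after PYTHON_GIL=1 or after importing an extension that doesn't support running without it.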

What Actually Changes

Here’s a direct comparison. Take a CPU-bound function that does nothing but count in a loop, and run it on 4 threads:

import threading
import time

def count_up(n):
    total = 0
    for i in range(n):
        total += i
    return total

N = 50_000_000
thread_count = 4

start = time.perf_counter()
threads = [threading.Thread(target=count_up, args=(N,)) for _ in range(thread_count)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"{thread_count} threads, elapsed: {elapsed:.2f}s")

On standard CPython 3.13 with 4 threads, this takes roughly as long as running the four calls one after another on a single thread. The GIL serializes the threads. On the free-threaded build with PYTHON_GIL=0, all 4 threads can run on separate cores, and the wall-clock time drops toward a quarter of that, assuming at least 4 cores are free.
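To put a number on it for your own machine, one option is to time the same work serially and threaded in one script. This is a rough sketch, not a rigorous benchmark; time_serial and time_threaded are hypothetical helper names, and N is reduced so it finishes quickly.

```python
import threading
import time

def count_up(n):
    total = 0
    for i in range(n):
        total += i
    return total

def time_serial(n, repeats):
    """Run count_up(n) `repeats` times back to back on one thread."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_up(n)
    return time.perf_counter() - start

def time_threaded(n, thread_count):
    """Run count_up(n) once on each of `thread_count` threads."""
    threads = [threading.Thread(target=count_up, args=(n,)) for _ in range(thread_count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

N = 2_000_000  # smaller than the 50_000_000 used above, so the sketch runs quickly
serial = time_serial(N, 4)
threaded = time_threaded(N, 4)
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s  speedup: {serial / threaded:.2f}x")
```

Expect a speedup near 1x with the GIL on. With it off, the number depends on how many cores are actually available.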

For multiprocessing users doing CPU-bound parallelism, this is significant. Threads are lighter than processes: no need to serialize data across process boundaries with pickle, no separate memory space, no fork overhead. Shared-memory parallelism becomes practical in pure Python for the first time.
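As an illustration of the shared-memory point, here is a small sketch in which every thread reads the same list in place and writes its partial result into its own slot of a shared results list. The function names are invented for this example; nothing is pickled or copied between workers.

```python
import threading

def sum_squares_range(data, start, stop, results, slot):
    # Each thread reads the shared list in place; no slicing, no pickling.
    total = 0
    for i in range(start, stop):
        total += data[i] * data[i]
    results[slot] = total  # each thread writes only its own slot

def parallel_sum_squares(data, thread_count=4):
    results = [0] * thread_count
    step = -(-len(data) // thread_count)  # ceiling division so no element is dropped
    threads = []
    for slot in range(thread_count):
        start = slot * step
        stop = min(start + step, len(data))
        t = threading.Thread(target=sum_squares_range, args=(data, start, stop, results, slot))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return sum(results)

data = list(range(1_000))
print(parallel_sum_squares(data))
```

Because each thread owns exactly one slot and only reads the input, no lock is needed here. The same layout with a process pool would have to ship the data to every worker.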

What It Doesn’t Fix

Free-threaded Python is not a silver bullet.

Single-threaded code gets slower. The GIL simplified CPython’s memory management. Without it, CPython must use fine-grained per-object locking and other synchronization primitives to stay thread-safe. In 3.13 the cost is substantial: the official documentation puts the single-threaded overhead at roughly 40% on the pyperformance suite, largely because the specializing adaptive interpreter is disabled in the free-threaded build. The CPython team is actively working to close this gap in 3.14 and beyond, with a stated target of 10% or less.

Not all C extensions are thread-safe. Thousands of Python packages include C extensions. Those extensions were written assuming the GIL would protect them. When the GIL is off, an extension that uses Python’s C API without its own locking can corrupt memory or crash. Extension authors need to audit and update their code. The ecosystem is catching up, but it’s not complete.

NumPy, for example, released partial support for no-GIL operation. Cython added flags for free-threaded support. Check the status of any critical dependency before depending on free-threaded mode in production.
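One rough smoke test relies on how CPython treats unprepared extensions. On a free-threaded build, importing an extension module that hasn't declared free-threading support re-enables the GIL and emits a RuntimeWarning, unless PYTHON_GIL=0 or -X gil=0 overrides it. The sketch below imports a module and reports the GIL state afterward. The helper gil_enabled_after_import is hypothetical, and "json" is only a placeholder for the dependency you care about.

```python
import importlib
import sys

def gil_enabled_after_import(module_name):
    """Import module_name, then report whether the GIL is on in this process."""
    importlib.import_module(module_name)
    checker = getattr(sys, "_is_gil_enabled", None)
    # Before 3.13 the function does not exist and the GIL is always on.
    return checker() if checker is not None else True

# "json" is only a stand-in; substitute the dependency you care about.
print("GIL enabled after import:", gil_enabled_after_import("json"))
```

A False result only shows that the import didn't switch the GIL back on. It doesn't prove the extension is free of data races, so the project's own compatibility notes still matter.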

I/O-bound code sees no benefit. If your bottleneck is network calls, database queries, or file I/O, the GIL was already releasing during those waits. asyncio remains the right model for I/O concurrency. Free-threaded mode doesn’t change that calculus.

Race conditions are now your problem. The GIL provided a kind of accidental thread safety: many operations that touched Python objects were atomic at the bytecode level because only one thread ran at a time. With the GIL off, you need actual locks for shared mutable state:

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1  # Without the lock, counter += 1 is a race condition

The counter += 1 operation is not atomic. It’s a read, an add, and a write. Two threads can interleave those steps and lose increments. The GIL used to make this unlikely (though not impossible). Without it, threads really do run those steps at the same time, so lost updates show up readily under contention.
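Taking the lock on every iteration is correct but slow under contention. A common alternative, sketched here under the assumption that only the final total matters, is to accumulate in a thread-local variable and merge once per thread. The name increment_batched is made up for this example.

```python
import threading

counter = 0
lock = threading.Lock()

def increment_batched(n):
    global counter
    local = 0
    for _ in range(n):
        local += 1  # thread-private, no lock needed
    with lock:
        counter += local  # one locked update per thread

threads = [threading.Thread(target=increment_batched, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

The shared counter is still only touched under the lock, so the result is exact, but each thread takes the lock once instead of 100,000 times.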

Using concurrent.futures with Free-Threaded Python

The concurrent.futures.ThreadPoolExecutor is the cleanest way to use threads for CPU-bound work:

from concurrent.futures import ThreadPoolExecutor, as_completed

def process_chunk(data_chunk):
    # CPU-bound work: sum of squares over one slice of the data
    return sum(x * x for x in data_chunk)

data = list(range(10_000_000))
chunk_size = len(data) // 4
chunks = [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(process_chunk, chunk) for chunk in chunks]
    results = [f.result() for f in as_completed(futures)]

total = sum(results)
print(f"Total: {total}")

On standard CPython, this gets no speedup over a single-threaded loop. On free-threaded CPython with PYTHON_GIL=0, all 4 workers run in parallel.

The same code, no changes, different behavior. That’s the appeal.

When to Actually Use It

For most production Python work today, the recommendation is to wait. The 3.13 free-threaded mode is marked experimental. Extension compatibility is incomplete. The single-threaded overhead is real.

The cases where it makes sense to experiment now:

  • Pure Python CPU-bound workloads with no C extension dependencies. Text processing pipelines, algorithmic work, data transformation in standard library types.
  • Libraries you control where you can audit for thread safety yourself.
  • Internal tools where a crash is annoying, not a production incident.

For production services handling user traffic, stick with the GIL build until Python 3.14 stabilizes free-threaded mode further and the ecosystem catches up. The CPython team’s roadmap aims for free-threaded to be the default eventually, but “eventually” isn’t 2026.

For multiprocessing users doing heavy CPU parallelism, it’s worth benchmarking your specific workload on 3.13t now. If your critical path is pure Python with minimal extension use, you might find free-threaded threads faster than process pools while being significantly easier to manage.
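If the same code has to run on both builds while you benchmark, one possible approach is to pick the executor at runtime. This is a heuristic sketch, not an established pattern. pick_executor_class and run_chunks are hypothetical names, and the check relies on sys._is_gil_enabled(), which exists on 3.13 and later.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def pick_executor_class():
    """Prefer threads when the GIL is off, processes otherwise (a heuristic only)."""
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = checker() if checker is not None else True
    return ProcessPoolExecutor if gil_enabled else ThreadPoolExecutor

def run_chunks(func, chunks, max_workers=4):
    """Map func over chunks with whichever executor fits the running build."""
    with pick_executor_class()(max_workers=max_workers) as executor:
        return list(executor.map(func, chunks))

print("would use:", pick_executor_class().__name__)
```

When the process pool is selected, func must be a picklable top-level function, and run_chunks should be called under an if __name__ == "__main__": guard. The thread pool has neither restriction.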

The Longer View

Removing the GIL is one of the most significant structural changes to CPython since the Python 2 to 3 transition. It took years of design work (PEP 703, by Sam Gross), careful reference-counting rewrites, and a lot of correctness testing to land.

The full benefit won’t arrive in one release. The ecosystem needs time to update extensions, tooling needs to understand free-threaded builds, and the CPython interpreter itself will keep getting faster in this mode with each release.

What’s already true: for the first time, threading is a credible choice for CPU-bound parallel work in Python. That changes how experienced Python developers will structure compute-heavy code, and it makes the language more competitive in workloads where threads were previously non-starters.
