Threading
About
Threading in Python is a way to achieve concurrency by running multiple threads within a single process. Python's threading module provides tools to create and manage threads, allowing multiple tasks to be performed concurrently.
Thread
A thread is a lightweight unit of execution within a process, and the smallest unit the operating system can schedule. Threads in the same process share its memory space, which allows for efficient communication but also introduces challenges like race conditions. For simplicity, we can think of a thread as one part of a process :))
Important Notes
While it might be tempting to think of threading as running the same program on two or more processors simultaneously, in CPython threads DO NOT actually execute Python code in parallel within a single process.
Threads do not truly run simultaneously in Python
- Only one thread executes Python bytecode at a time in CPython
- This is caused by the Global Interpreter Lock (GIL), a lock that ensures only one thread can execute Python code at a time, even on a multi-core processor
- Threads are actually just taking turns executing Python code
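To see the "taking turns" behaviour, we can time the same CPU-bound work run sequentially and run in two threads. This is only a rough sketch: the function names and the workload size are made up for illustration, and the exact numbers depend on the machine and the Python build. On a standard GIL-enabled CPython build the two timings usually come out similar, because the threads cannot run the loop at the same time.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    """Run count_down `repeats` times back to back; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats=2):
    """Run count_down in `repeats` threads at once; return elapsed seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size, chosen only for illustration
    print("sequential: {:.2f}s".format(time_sequential(n)))
    print("threaded:   {:.2f}s".format(time_threaded(n)))
```

If the threaded run is not noticeably faster than the sequential one, that is the GIL at work.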
Parallelism requires special assistance
- If we want multiple tasks to truly run at the same time (parallelism), we have to bypass the GIL. This can be achieved with alternative Python implementations that have no GIL (Jython, etc.) or with the multiprocessing module, which creates separate processes instead of threads. Each process has its own Python interpreter and memory space, so they can run in parallel without being affected by the GIL
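A minimal sketch of the multiprocessing route, assuming a simple CPU-bound function. The names sum_of_squares and run_in_processes are hypothetical helpers for this example, not part of the module; multiprocessing.Pool and its map method are the real API.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound work: sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def run_in_processes(inputs, workers=2):
    """Compute sum_of_squares for each input in a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # Each input is handled by a separate process with its own GIL
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    # The guard matters: child processes may re-import this module,
    # and without it they would try to start pools of their own.
    print(run_in_processes([10, 100, 1000]))
    # prints [285, 328350, 332833500]
```

Starting processes costs more than starting threads, and arguments and results have to be pickled to move between processes, so this usually pays off only for heavier CPU-bound work.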
Implementation
Creating A Thread
To start a separate thread, we create a Thread instance and then call its .start() method
Example from Codecademy
import threading, time, random

# simulates waiting time (e.g., an API call/response)
def slow_function(thread_index):
    time.sleep(random.randint(1, 10))
    print("Thread {} done!".format(thread_index))

def run_threads():
    threads = []
    for thread_index in range(5):
        individual_thread = threading.Thread(target=slow_function, args=(thread_index,))
        threads.append(individual_thread)
        individual_thread.start()
    # at this point threads are running independently from the main flow of application and each other
    print("Main flow of application")
    # This ensures that all threads finish before the main flow of application continues
    for individual_thread in threads:
        individual_thread.join()
    print("All threads are done")

run_threads()
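This results in output like the following. The order of the "Thread N done!" lines changes from run to run, because each thread sleeps for a random time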
Main flow of application
Thread 1 done!
Thread 4 done!
Thread 3 done!
Thread 2 done!
Thread 0 done!
All threads are done
Synchronization
Threads share the same memory space, so data inconsistencies may arise. Thread synchronization acts as a mechanism which ensures that two or more concurrent threads do not simultaneously execute some particular program segment known as the critical section.
In Python, synchronization tools like Lock help manage access to shared resources
import threading

lock = threading.Lock()

def safe_increment(counter):
    # only one thread at a time can be inside this block
    with lock:
        counter[0] += 1
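To try the lock out, here is a small sketch that has several threads call safe_increment on the same counter. The worker and run_counter_demo helpers, and the thread and iteration counts, are made up for this example. Because every increment happens while holding the lock, the final value always equals threads × increments.

```python
import threading

lock = threading.Lock()


def safe_increment(counter):
    # The read-modify-write below is the critical section
    with lock:
        counter[0] += 1


def worker(counter, times):
    """Increment the shared counter `times` times."""
    for _ in range(times):
        safe_increment(counter)


def run_counter_demo(num_threads=4, times=10_000):
    """Start num_threads workers on one shared counter; return the final value."""
    counter = [0]  # a one-element list, so all threads mutate the same object
    threads = [
        threading.Thread(target=worker, args=(counter, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_counter_demo())  # prints 40000
```

Without the lock, two threads could read the same old value and both write back old + 1, losing an update. Whether that actually shows up in a given run depends on timing and on the Python version.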
Daemon Threads
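A daemon thread runs in the background and does not keep the program alive: once all non-daemon threads (including the main thread) have finished, the program exits and any daemon threads are stopped abruptly. Because of that, daemon threads are not a good fit for work that must finish cleanly, such as writing a file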
import threading
import time

def background_task():
    while True:
        print("Running in the background...")
        time.sleep(2)

# daemon=True means this thread will not block program exit
daemon_thread = threading.Thread(target=background_task, daemon=True)
daemon_thread.start()
Use Cases
- I/O-Bound Tasks: Network requests, file I/O, or database operations that spend most of their time waiting for external resources
- Non-CPU-Bound Operations: Tasks where the GIL won't significantly impact performance, because a thread releases the GIL while it waits on blocking I/O or time.sleep(), so other threads can run in the meantime
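As a rough illustration of the I/O-bound case, the sketch below uses time.sleep() to stand in for a blocking network or file call. The helper names, the number of tasks, and the delay are all assumptions for this example. Since a sleeping thread does not hold the GIL, the waits overlap and the total time is close to one delay rather than the sum of all of them.

```python
import threading
import time


def fake_io_task(delay, results, index):
    """Pretend to wait on an external resource, then record a result."""
    time.sleep(delay)  # stands in for a blocking network/file/database call
    results[index] = "task {} done".format(index)


def run_io_tasks(num_tasks=5, delay=0.2):
    """Run num_tasks fake I/O tasks in threads; return (results, elapsed seconds)."""
    results = [None] * num_tasks  # each thread writes only to its own slot
    threads = [
        threading.Thread(target=fake_io_task, args=(delay, results, i))
        for i in range(num_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_io_tasks()
    print(results)
    print("elapsed: {:.2f}s".format(elapsed))
```

Running the five tasks one after another would take about 1 second with these values; with threads, the elapsed time should be much closer to 0.2 seconds.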
Multithreading
Multithreading in Python means running more than one thread concurrently within a single process. Threading is the broader term for working with threads at all; multithreading is the specific case where several threads are active at once.