Implementation:Turboderp org Exllamav2 ThreadPool
| Knowledge Sources | |
|---|---|
| Domains | Concurrency, C_Extension |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Header-only C++ library providing a ThreadPool class for asynchronous task execution and a Barrier class for thread synchronization, used throughout ExLlamaV2's C++ extension layer.
Description
threadpool.h defines two concurrency primitives:
ThreadPool implements a classic thread pool pattern with:
- A configurable number of worker threads created at construction time via ThreadPool(size_t threads).
- A task queue protected by a std::mutex and signaled via a std::condition_variable.
- A templated enqueue() method that accepts any callable with arguments and returns a std::future for the result, allowing callers to submit work and retrieve results asynchronously.
- Worker threads that loop, waiting on the condition variable for new tasks, and exit cleanly once stop is set to true and the queue is drained.
- A destructor that sets the stop flag, notifies all workers, and joins all threads to ensure clean shutdown.
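The pattern described above can be sketched as follows. This is a minimal illustration and not the code in threadpool.h: the class name SketchPool, its member names, the use of std::packaged_task to produce the future, and the helper sketch_pool_demo() are all assumptions made for the example, and enqueue() is simplified to take a zero-argument callable.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration of the thread pool pattern; not the real ThreadPool.
class SketchPool
{
public:
    explicit SketchPool(std::size_t threads)
    {
        // Workers are created once, at construction time.
        for (std::size_t i = 0; i < threads; ++i)
            workers.emplace_back([this] { worker_loop(); });
    }

    ~SketchPool()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stop = true;                      // set the stop flag under the lock
        }
        cv.notify_all();                      // wake every worker
        for (auto& w : workers) w.join();     // wait for all of them to finish
    }

    // Simplified enqueue: wraps the callable so its result reaches a future.
    template <class F>
    auto enqueue(F f) -> std::future<decltype(f())>
    {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mtx);
            tasks.emplace([task] { (*task)(); });
        }
        cv.notify_one();                      // signal one waiting worker
        return result;
    }

private:
    void worker_loop()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stop || !tasks.empty(); });
                if (stop && tasks.empty()) return;   // exit only once drained
                job = std::move(tasks.front());
                tasks.pop();
            }
            job();                            // run the task outside the lock
        }
    }

    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex mtx;
    std::condition_variable cv;
    bool stop = false;
};

// Hypothetical helper: submits eight tasks and sums their results.
int sketch_pool_demo()
{
    SketchPool pool(4);
    std::vector<std::future<int>> results;
    for (int i = 0; i < 8; ++i)
        results.push_back(pool.enqueue([i] { return i * i; }));
    int sum = 0;
    for (auto& r : results) sum += r.get();   // get() blocks until the task is done
    return sum;                               // 0 + 1 + 4 + ... + 49 = 140
}
```

Running each task outside the lock keeps the mutex held only for queue operations, so a long-running task does not block other workers from picking up work.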
Barrier implements a reusable synchronization barrier with:
- arrive_and_wait() -- Each thread increments a counter; when the counter reaches num_threads, the generation is advanced and all waiting threads are released via cv.notify_all(). Threads that arrive early wait on a condition variable whose predicate checks the generation counter, so a spurious wakeup cannot release a thread early and the barrier can be reused for the next phase.
- reset(int new_num_threads) -- Dynamically changes the thread count, resets the counter, advances the generation to unblock any currently waiting threads, and notifies all.
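A generation-based barrier of the kind described above might look like the sketch below. It is an illustration under assumptions, not the code in threadpool.h: the class name SketchBarrier, the private member names, resetting the counter when the last thread arrives, and the helper sketch_barrier_demo() are all hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of a reusable, generation-based barrier.
class SketchBarrier
{
public:
    explicit SketchBarrier(int n) : num_threads(n) {}

    void arrive_and_wait()
    {
        std::unique_lock<std::mutex> lock(mtx);
        const int my_generation = generation;
        if (++count == num_threads)
        {
            // Last arrival: start a new generation and release everyone.
            ++generation;
            count = 0;
            cv.notify_all();
        }
        else
        {
            // A spurious wakeup re-checks the predicate and goes back to sleep.
            cv.wait(lock, [&] { return generation != my_generation; });
        }
    }

    void reset(int new_num_threads)
    {
        std::lock_guard<std::mutex> lock(mtx);
        num_threads = new_num_threads;
        count = 0;
        ++generation;          // unblocks threads still waiting on the old generation
        cv.notify_all();
    }

private:
    std::mutex mtx;
    std::condition_variable cv;
    int num_threads;
    int count = 0;
    int generation = 0;
};

// Hypothetical helper: four threads run three phases in lockstep.
bool sketch_barrier_demo()
{
    const int n = 4;
    SketchBarrier barrier(n);
    std::atomic<int> arrived{0};
    std::atomic<int> violations{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < n; ++t)
        threads.emplace_back([&] {
            for (int phase = 1; phase <= 3; ++phase)
            {
                arrived.fetch_add(1);
                barrier.arrive_and_wait();
                // Every thread must have finished this phase, and none the next.
                if (arrived.load() != phase * n) violations.fetch_add(1);
                barrier.arrive_and_wait();   // keep phases from overlapping
            }
        });
    for (auto& th : threads) th.join();
    return violations.load() == 0 && arrived.load() == 3 * n;
}
```

Capturing the generation before waiting is what allows the same object to be used for consecutive phases: a thread that races ahead into the next phase waits on the new generation value rather than the one that was just released.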
Usage
The ThreadPool is used by ExtTPContext (tensor parallelism context) to dispatch parallel operations across multiple GPU devices. The Barrier is used for cross-device synchronization points during tensor-parallel inference, ensuring all devices have completed a phase before proceeding.
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: exllamav2/exllamav2_ext/cpp/threadpool.h
- Lines: 1-124
Signature
class ThreadPool
{
public:
    ThreadPool(size_t threads);
    ~ThreadPool();

    template<class F, class... Args>
    auto enqueue(F&& f, Args&&... args)
        -> std::future<typename std::result_of<F(Args...)>::type>;
};

class Barrier
{
public:
    Barrier(int num_threads);
    void arrive_and_wait();
    void reset(int new_num_threads);
};
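The trailing return type on enqueue() means the future's value type is whatever the callable returns when invoked with the given arguments. Note that std::result_of is deprecated in C++17 and removed in C++20, where std::invoke_result is the replacement. The sketch below illustrates the same deduction with decltype and std::declval so that it compiles under any standard from C++11 onward; the alias call_result_t, the sample callables, and packaged_add() are hypothetical names introduced only for this example.

```cpp
#include <future>
#include <type_traits>
#include <utility>

// Hypothetical helper: the type produced by calling F with Args...
template <class F, class... Args>
using call_result_t = decltype(std::declval<F>()(std::declval<Args>()...));

// Sample callables (illustrative only).
inline int add_ints(int a, int b) { return a + b; }
inline void log_only(int) {}
struct Scale { double operator()(float x) const { return x * 2.0; } };

// A callable returning int corresponds to std::future<int>.
static_assert(std::is_same<call_result_t<decltype(&add_ints), int, int>, int>::value,
              "function pointer returning int");
// A functor returning double corresponds to std::future<double>.
static_assert(std::is_same<call_result_t<Scale, float>, double>::value,
              "functor returning double");
// A callable returning void corresponds to std::future<void>.
static_assert(std::is_same<call_result_t<decltype(&log_only), int>, void>::value,
              "function pointer returning void");

// The deduced type is what ties a task to its future.
int packaged_add(int a, int b)
{
    using R = call_result_t<decltype(&add_ints), int, int>;
    std::packaged_task<R(int, int)> task(&add_ints);
    std::future<R> result = task.get_future();
    task(a, b);            // run synchronously here; a pool would run it on a worker
    return result.get();
}
```

Because the type is deduced at compile time, a caller that stores the returned future in a mismatched type, such as std::future<float> for a callable returning double, gets a compile error rather than a silent conversion.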
Import
#include "threadpool.h"
I/O Contract
| Class | Method | Input | Output | Description |
|---|---|---|---|---|
| ThreadPool | constructor | size_t threads | ThreadPool instance | Creates pool with specified number of worker threads |
| ThreadPool | enqueue(f, args...) | Callable + arguments | std::future<return_type> | Submits task, returns future for asynchronous result retrieval |
| ThreadPool | destructor | -- | -- | Sets stop flag, notifies all workers, joins all threads |
| Barrier | constructor | int num_threads | Barrier instance | Creates barrier for the specified number of participating threads |
| Barrier | arrive_and_wait() | -- | -- | Blocks until all threads have arrived; the generation counter keeps a spurious wakeup from releasing a thread early |
| Barrier | reset(new_num_threads) | int new_num_threads | -- | Resets barrier for a new thread count, unblocks any waiting threads |
Usage Examples
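#include <future>
#include <vector>
#include "threadpool.h"

// run_tasks is an illustrative name, not part of threadpool.h
void run_tasks()
{
    // Create a pool with 4 worker threads
    ThreadPool pool(4);

    // Submit tasks and collect futures
    std::vector<std::future<int>> results;
    for (int i = 0; i < 8; i++) {
        results.push_back(pool.enqueue([i] {
            // perform work on device i % 4
            return i * i;
        }));
    }

    // Collect results; get() blocks until the corresponding task has finished
    for (auto& f : results) {
        int result = f.get();
        (void)result; // use the result here
    }
}

// Barrier shared by 4 participating threads
Barrier barrier(4);

// worker_phase is an illustrative name; each of the 4 threads calls it
void worker_phase()
{
    barrier.arrive_and_wait(); // blocks until all 4 arrive
}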
Related Pages
- Turboderp_org_Exllamav2_Ext_TP_H -- Tensor parallelism context that uses ThreadPool and Barrier