Universal GPU Acceleration for C++ Parallel Algorithms
Zero code changes. Any GPU. Pure Vulkan.
```cpp
// Use parallax::allocator for GPU-accessible memory
std::vector<float, parallax::allocator<float>> data(1'000'000, 1.0f);

// Just use std::execution::par - it runs on the GPU automatically
std::for_each(std::execution::par,
              data.begin(), data.end(),
              [](float& x) { x *= 2.0f; });

// Memory coherence is automatic - no explicit sync needed
// Works on AMD, NVIDIA, Intel - anything with Vulkan
```
- Works on AMD, NVIDIA, Intel, Qualcomm, ARM Mali - any GPU with Vulkan 1.2+
- Direct Vulkan compute: no translation layers, no runtime interpretation
- Unified memory with automatic host-device synchronization
- Apache 2.0 licensed, community-driven development
Benchmarks measured on an NVIDIA GTX 980M (December 2025)
| Dataset Size | std::for_each (M elem/s) | std::transform (M elem/s) | Efficiency |
|---|---|---|---|
| 1K elements | 0.18 | 0.38 | Low (launch overhead dominates) |
| 10K elements | 36.77 | 36.63 | Good |
| 100K elements | 228 | 243 | Excellent |
| 1M elements | 744 | 732 | Excellent |