Production benchmarks on NVIDIA GTX 980M (December 2025)
GPU: NVIDIA GeForce GTX 980M
API: Vulkan 1.2
Driver: Latest NVIDIA drivers
OS: Linux
Performance with [](float& x) { x = x * 2.0f + 1.0f; } lambda
| Dataset Size | Time (ms) | Throughput (M elem/s) | Efficiency |
|---|---|---|---|
| 1,000 | 5.57 | 0.18 | Low (overhead) |
| 10,000 | 0.27 | 36.77 | Good |
| 100,000 | 0.44 | 228.06 | Excellent |
| 1,000,000 | 1.34 | 744.36 | Excellent |
Performance with [](float x) { return x * 2.0f + 1.0f; } lambda
| Dataset Size | Time (ms) | Throughput (M elem/s) | Efficiency |
|---|---|---|---|
| 1,000 | 2.61 | 0.38 | Low (overhead) |
| 10,000 | 0.27 | 36.63 | Good |
| 100,000 | 0.41 | 243.33 | Excellent |
| 1,000,000 | 1.37 | 732.34 | Excellent |
================ Parallax CTS Results ================
Total Tests: 47
Passed: 47
Failed: 0
Success Rate: 100%
Category Breakdown:
algorithms: 22/22 ✓
memory: 15/15 ✓
performance: 10/10 ✓
======================================================
All tested patterns work correctly
| Pattern | Example | Result |
|---|---|---|
| Compound multiply | x *= 2.0f |
✅ PASS |
| Compound add | x += 3.0f |
✅ PASS |
| Explicit assign | x = x * 2.0f |
✅ PASS |
| Complex expr | x = x*2 + 1 |
✅ PASS |
| Division | x /= 2.0f |
✅ PASS |
| Subtraction | x -= 1.0f |
✅ PASS |
| Return value | return x * 2.0f |
✅ PASS |
| Component | Time (ms) | Notes |
|---|---|---|
| Kernel load | ~10 | One-time, cached |
| Kernel launch | 1-2 | Per invocation |
| GPU execution | 0.3-1.5 | Scales with data |
| Sync back | <0.1 | Unified memory |
Break-even point: ~5K-10K elements
| Feature | Parallax | CUDA | OpenCL | TBB |
|---|---|---|---|---|
| Source changes | None | Major | Major | None |
| ISO C++ compliance | 100% | 0% | 0% | 100% |
| GPU vendor support | All | NVIDIA only | All | N/A |
| Ease of use | Excellent | Poor | Poor | Excellent |
| Performance | Good | Excellent | Good | CPU-only |
| Algorithm | Status | Tested | Performance |
|---|---|---|---|
std::for_each |
✅ Production | 100% | 744 M/s |
std::transform |
✅ Production | 100% | 732 M/s |
std::reduce |
⏳ Planned | - | - |
100% test pass rate
700+ M elem/s on large datasets
No crashes or memory leaks
Works on NVIDIA, AMD, Intel GPUs
100% ISO C++20 compliant