rocPRIM: HIP parallel primitives for developing performant GPU-accelerated code on ROCm
https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocprim