Efficient and performance-portable vector software

CPUs provide SIMD/vector instructions that apply the same operation to multiple data items at once. Because fewer instructions are executed, this can reduce energy usage, for example fivefold. We also often see 5-10x speedups.

Code: https://github.com/google/highway
Paper: https://arxiv.org/abs/2205.05982v1
Testing: github.com/google/…ocess.md

@ai_machinelearning_big_data