Explicit Vector Intrinsics
One solution to this problem is to use vector intrinsics, which look like normal C functions but map one-to-one onto vector instructions. This isn’t a new concept: on architectures such as x86, compilers typically generate the square root instruction in the same way.
Using vector intrinsics has a significant disadvantage, however: vector instruction sets differ between architectures. Because each intrinsic maps onto a single instruction, code written with intrinsics is restricted to one architecture. It also won’t run on older chips that don’t have a vector unit.
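To make this concrete, here is a minimal sketch that assumes an x86 target with SSE and a compiler, such as GCC or Clang, that defines the __SSE__ macro. The helper name add4 is hypothetical and chosen only for illustration. The preprocessor guard shows the portability cost: on any other target the intrinsics are unavailable, so a scalar path has to be written separately.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; only available on x86 */
#endif

/* Hypothetical helper: adds four floats from a and b, writing to out. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* Each intrinsic below corresponds to one SSE instruction. */
    __m128 va = _mm_loadu_ps(a);       /* unaligned load of four floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);  /* four additions in one instruction */
    _mm_storeu_ps(out, vsum);          /* unaligned store of four floats */
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Supporting a second vector instruction set, such as AltiVec or NEON, would mean adding another guarded branch written with that architecture's own intrinsics.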
What’s really needed is a method of writing vector code that isn’t tied to a specific instruction set. Sadly, the C specification doesn’t provide such a mechanism. Fortunately, the GNU Compiler Collection (GCC) does.
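As a rough preview, the sketch below uses GCC's vector_size type attribute to declare a 16-byte vector of four floats and adds two such vectors with the ordinary + operator. The type name v4sf and the helper add4_portable are hypothetical names picked for this example, and the code assumes GCC or a compatible compiler such as Clang. The values are copied in and out with memcpy so the function works on plain float arrays with no alignment requirement.

```c
#include <string.h>

/* A GCC vector type: 16 bytes wide, holding four floats.
   The name v4sf is an arbitrary choice for this sketch. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: adds four floats from a and b, writing to out. */
void add4_portable(const float *a, const float *b, float *out)
{
    v4sf va, vb, vsum;

    /* Copy the inputs into vector variables. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    /* Element-wise addition; the compiler picks the instructions. */
    vsum = va + vb;

    memcpy(out, &vsum, sizeof vsum);
}
```

The source names no particular instruction set. The compiler is free to use a vector add where the target has one and to fall back to scalar code where it doesn't.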