Testing Equality
One omission that’s particularly irritating in current versions is that you can’t test vectors for equality. This is a shame, because it’s surprisingly difficult to write this test yourself. With AltiVec, the comparison instructions set a condition register, which can be branched on directly, while SSE requires a bit more ingenuity.
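SSE doesn’t have instructions for testing equality on integers, only on floating-point values. Fortunately, we can use these anyway in most cases, because for ordinary values two floating-point quantities are equal exactly when their bits are all the same, just as for integers. There are two exceptions to keep in mind: a lane whose bits happen to encode a NaN never compares equal, even to an identical lane, and the bit patterns for +0.0 and -0.0 (0x00000000 and 0x80000000) compare equal even though they differ.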
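The gap between bit equality and floating-point equality can be checked with ordinary scalar C. The following is a minimal sketch, not part of the original example code; the helper names bits_to_float and float_compare_equal are made up for illustration, and it assumes a 32-bit IEEE 754 float:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   memcpy avoids the aliasing problems of a pointer cast. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: compare two 32-bit patterns the way a
   floating-point equality test would, rather than bit by bit. */
int float_compare_equal(uint32_t a, uint32_t b)
{
    return bits_to_float(a) == bits_to_float(b);
}
```

With these helpers, float_compare_equal(0x7FC00000u, 0x7FC00000u) is false, because the pattern is a NaN, while float_compare_equal(0x00000000u, 0x80000000u) is true, because the two patterns are positive and negative zero. If your integer data can contain such patterns, a floating-point compare is not a safe substitute for an integer one.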
When we perform a comparison with SSE, we get a vector of four 32-bit lanes, each of which has either all bits set to 1 (the lanes were equal) or all bits set to 0 (they were not). We can take the top bit from each lane and pack the four bits into a scalar mask. The vectors are equal only if all four bits are set, that is, if the mask is 0xF; a mask that is merely nonzero means only that at least one lane matched. With AltiVec, the effort is a little easier, because we can just use the vec_all_eq intrinsic to test whether all of the values are the same.
Finally, we have to implement a scalar version, in case someone tries to compile it on a machine without AltiVec or SSE. We end up with the following function:
    INLINE int equal(v4si v1, v4si v2)
    {
    #if defined(__ALTIVEC__)
        return vec_all_eq((vector int)v1, (vector int)v2);
    #elif defined(__SSE__)
        /* Each lane becomes all ones if equal, all zeros otherwise. */
        v4si compare = (v4si)__builtin_ia32_cmpeqps((v4sf)v1, (v4sf)v2);
        /* Pack the four top bits; 0xF means every lane matched. */
        return __builtin_ia32_movmskps((v4sf)compare) == 0xF;
    #else
        int * s1 = (int*)&v1;
        int * s2 = (int*)&v2;
        return (
            s1[0] == s2[0] &&
            s1[1] == s2[1] &&
            s1[2] == s2[2] &&
            s1[3] == s2[3] );
    #endif
    }
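To see how the packed mask from a lane-wise compare behaves, it can help to emulate the two SSE steps in plain C. This is only a scalar sketch of the idea; compare_lanes and move_mask are hypothetical stand-ins written for illustration, not the real built-ins:

```c
#include <stdint.h>

/* Hypothetical stand-in for a lane-wise equality compare: each output
   lane is all ones (-1) if the inputs match, all zeros otherwise. */
void compare_lanes(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (a[i] == b[i]) ? -1 : 0;
}

/* Hypothetical stand-in for a move-mask: bit i of the result is the
   top bit of lane i. */
int move_mask(const int32_t lanes[4])
{
    int mask = 0;
    for (int i = 0; i < 4; i++)
        mask |= (int)(((uint32_t)lanes[i] >> 31) << i);
    return mask;
}
```

Comparing {1, 2, 3, 4} with itself gives a mask of 0xF, while comparing it with {1, 0, 0, 0} gives 0x1. The second mask is nonzero even though the vectors differ, which is why an all-lanes test has to compare the mask against 0xF rather than treat it as a truth value.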
We can use this function anywhere we would otherwise use ==, if we were dealing with scalar quantities. The equal.c example uses this function; try changing the values of the two vectors to make sure it works:
    $ gcc -std=c99 -msse equal.c
    $ ./a.out
    $ gcc -arch ppc -std=c99 -faltivec equal.c
    $ ./a.out
    $ gcc -std=c99 equal.c && ./a.out
In each instance, foo and bar had the same values.
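The source of equal.c is not reproduced here. As a rough idea of what such a test might look like, the sketch below uses only the scalar code path so that it builds without any SIMD flags. It assumes the GCC/Clang vector_size extension for v4si, and the names equal_scalar, vectors_match_demo, and baz are invented for this illustration:

```c
#include <string.h>

/* Four-int vector type via the GCC/Clang vector extension (assumed). */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar-only variant of the equality test: copy the lanes out and
   compare them one by one. */
int equal_scalar(v4si v1, v4si v2)
{
    int s1[4], s2[4];
    memcpy(s1, &v1, sizeof s1);
    memcpy(s2, &v2, sizeof s2);
    return s1[0] == s2[0] && s1[1] == s2[1]
        && s1[2] == s2[2] && s1[3] == s2[3];
}

/* Hypothetical check: foo and bar match in every lane, baz differs
   from foo in the last lane only. Returns 1 if both results are right. */
int vectors_match_demo(void)
{
    v4si foo = {1, 2, 3, 4};
    v4si bar = {1, 2, 3, 4};
    v4si baz = {1, 2, 3, 5};
    return equal_scalar(foo, bar) && !equal_scalar(foo, baz);
}
```

Including a pair that differs in a single lane is a useful habit when testing each code path, since it catches an implementation that reports equality when only some of the lanes match.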
Note that the -arch switch is used only when cross-compiling. In this example, I compiled the AltiVec/PowerPC code path on an Intel Mac, and it then ran in emulation under Rosetta. This approach is quite convenient for testing multiple code paths on the same machine, but emulation won’t give an accurate performance metric, so it’s always better to measure code like this on a real system.