> What sort of performance gains did you see from switching to the BCA
> format, and which CPUs specifically had this issue, by the way?

I think currently few CPUs have this issue, and probably I do not have
access to any of them anyway, so I did not bother to measure. But note
that the change in our case was completely neutral in every other respect
(the code is neither more complicated, nor larger, nor anything at
all). So, with all other things being completely equal, why not
choose the option that may give you (even with a very low probability) a
slight advantage?

(If I had known I would have to justify the change, I probably would not
have done it, since just the job of explaining it is enough to offset
those frail gains :-)

-- Roberto