He says his program uses FMA operations and that this is the program that crashes, and he goes on to explain that other programs that use FMA operations do not crash. This is why I attributed it to programmer error or compiler error.
What if it's neither of those things? The crash could be caused by a design fault in the CPU, something which may or may not be possible to work around in software/firmware without a serious performance penalty, or perhaps at all. This is not purely theoretical: it has actually happened in practice, in a real-world program which was not specifically designed to misbehave. Luckily there was a pain-free workaround in that case.
Using optimizations for an architecture other than the one you're currently working with is known to cause problems.
It should never happen in theory, and in practice it happens only exceptionally rarely, assuming "problems" doesn't include slowdowns. Those can easily happen: optimizing for one processor can inadvertently deoptimize the code for another. But optimizations which work for one processor should never cause another to crash.
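For what it's worth, here is a minimal sketch of how a program can keep an FMA code path from depending on which machine it was built on, by checking for FMA support at run time. It assumes GCC or Clang on x86-64; `mul_add`, `mul_add_fma`, `mul_add_generic`, `cpu_has_fma` and `HAVE_X86_FMA_PATH` are names made up for illustration and are not taken from the program being discussed.

```c
#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_X86_FMA_PATH 1   /* hypothetical macro for this sketch */
#else
#define HAVE_X86_FMA_PATH 0
#endif

/* Portable path: separate multiply and add, valid on any CPU. */
static double mul_add_generic(double a, double b, double c)
{
    return a * b + c;
}

#if HAVE_X86_FMA_PATH
/* Only this function is compiled with FMA enabled, so FMA instructions
   are confined to a path we call only after checking the CPU. */
__attribute__((target("fma")))
static double mul_add_fma(double a, double b, double c)
{
    __m128d r = _mm_fmadd_sd(_mm_set_sd(a), _mm_set_sd(b), _mm_set_sd(c));
    return _mm_cvtsd_f64(r);
}

/* Asks the CPU (via the GCC/Clang builtins) whether it supports FMA. */
static int cpu_has_fma(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("fma") ? 1 : 0;
}
#endif

/* Computes a*b + c, using the FMA path only when the CPU reports it. */
double mul_add(double a, double b, double c)
{
#if HAVE_X86_FMA_PATH
    if (cpu_has_fma())
        return mul_add_fma(a, b, c);
#endif
    return mul_add_generic(a, b, c);
}
```

Note that the two paths can differ in the last bit of the result, since FMA rounds once instead of twice, so this is a sketch of the dispatch idea rather than a drop-in replacement.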