bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
probably the best choice for any system is:
bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
-mfpmath=both
only works with gcc, not clang. I'm not sure what the default -O2
or -O3
setting is. gcc -O3
enables auto-vectorization, but that won't always help, and could make it slower.
What this does: --copt
for bazel build
passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization)
x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
-march=native
enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2
redundant. (Also, -mavx2
already enables -mavx
and -msse4.2
, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma
would make a binary that faults with illegal instructions.
TensorFlow's ./configure
defaults to enabling -march=native
, so using that should avoid needing to specify compiler options manually.
-march=native
enables -mtune=native
, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.
This all applies to gcc, clang, or ICC.