Intel 386 and AMD x86-64 Options
2012-12-31 08:30
These `-m' options are defined for the i386 and x86-64 family of computers:
-mtune=cpu-type
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are:
generic
Produce code optimized for the most common IA32/AMD64/EM64T processors. If you know the CPU on which your code will run, use the corresponding -mtune option instead of -mtune=generic. If you do not know exactly what CPU users of your application will have, use this option.
As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, the code generated by this option will change to reflect the processors that were most common when that version of GCC was released.
There is no -march=generic option because -march indicates the instruction set the compiler can use, and there is no generic instruction set applicable to all processors. In contrast, -mtune indicates the processor (or, in this case, collection of processors) for which the code is optimized.
native
This selects the CPU to tune for at compilation time by determining the processor type of the compiling machine. Using -mtune=native will produce code optimized for the local machine under the constraints of the selected instruction set. Using -march=native will enable all instruction subsets supported by the local machine (hence the result might not run on different machines).
i386
Original Intel i386 CPU.
i486
Intel i486 CPU. (No scheduling is implemented for this chip.)
i586, pentium
Intel Pentium CPU with no MMX support.
pentium-mmx
Intel PentiumMMX CPU based on Pentium core with MMX instruction set support.
pentiumpro
Intel PentiumPro CPU.
i686
Same as generic, but when used as the -march option, the PentiumPro instruction set will be used, so the code will run on all i686 family chips.
pentium2
Intel Pentium2 CPU based on PentiumPro core with MMX instruction set support.
pentium3, pentium3m
Intel Pentium3 CPU based on PentiumPro core with MMX and SSE instruction set support.
pentium-m
Low power version of Intel Pentium3 CPU with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks.
pentium4, pentium4m
Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support.
prescott
Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support.
nocona
Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support.
core2
Intel Core2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
corei7
Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 instruction set support.
corei7-avx
Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.
core-avx-i
Intel Core CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction set support.
atom
Intel Atom CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
k6
AMD K6 CPU with MMX instruction set support.
k6-2, k6-3
Improved versions of AMD K6 CPU with MMX and 3DNow! instruction set support.
athlon, athlon-tbird
AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and SSE prefetch instructions support.
athlon-4, athlon-xp, athlon-mp
Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and full SSE instruction set support.
k8, opteron, athlon64, athlon-fx
AMD K8 core based CPUs with x86-64 instruction set support. (This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow! and 64-bit instruction set extensions.)
k8-sse3, opteron-sse3, athlon64-sse3
Improved versions of k8, opteron and athlon64 with SSE3 instruction set support.
amdfam10, barcelona
AMD Family 10h core based CPUs with x86-64 instruction set support. (This supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit instruction set extensions.)
bdver1
AMD Family 15h core based CPUs with x86-64 instruction set support. (This supersets FMA4, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
bdver2
AMD Family 15h core based CPUs with x86-64 instruction set support. (This supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
btver1
AMD Family 14h core based CPUs with x86-64 instruction set support. (This supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit instruction set extensions.)
winchip-c6
IDT Winchip C6 CPU, handled in the same way as i486 with additional MMX instruction set support.
winchip2
IDT Winchip2 CPU, handled in the same way as i486 with additional MMX and 3DNow! instruction set support.
c3
Via C3 CPU with MMX and 3DNow! instruction set support. (No scheduling is implemented for this chip.)
c3-2
Via C3-2 CPU with MMX and SSE instruction set support. (No scheduling is implemented for this chip.)
geode
Embedded AMD CPU with MMX and 3DNow! instruction set support.
While picking a specific cpu-type will schedule things appropriately for that particular chip, the compiler will not generate any code that cannot run on the default machine type unless the -march=cpu-type option is used. For example, if GCC is configured for i686-pc-linux-gnu then -mtune=pentium4 will generate code that is tuned for Pentium4 but will still run on i686 machines.
-march=cpu-type
Generate instructions for the machine type cpu-type. The choices for cpu-type are the same as for -mtune. Moreover, specifying -march=cpu-type implies -mtune=cpu-type.
-mcpu=cpu-type
A deprecated synonym for -mtune.
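
Because -march changes which instructions may be emitted while -mtune does not, a program built with an aggressive -march setting can fail on an older CPU. The following is a minimal sketch of a start-up check, not part of the GCC manual. It assumes a compiler that provides the __builtin_cpu_init and __builtin_cpu_supports built-ins (GCC 4.8 or later, which is newer than the release described here) and the usual predefined macros such as __SSE2__ and __AVX__. The names isa_matches_host and require_matching_host are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if every instruction set extension this
   file was compiled to use is also reported by the CPU it is running on.
   On non-x86 targets there is nothing to check, so it returns 1. */
int isa_matches_host(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
#ifdef __SSE2__
    if (!__builtin_cpu_supports("sse2"))
        return 0;
#endif
#ifdef __SSE4_2__
    if (!__builtin_cpu_supports("sse4.2"))
        return 0;
#endif
#ifdef __AVX__
    if (!__builtin_cpu_supports("avx"))
        return 0;
#endif
#endif
    return 1;
}

/* Example guard that a program could call early in start-up. */
void require_matching_host(void)
{
    assert(isa_matches_host());
}
```

Such a check is only a courtesy: if the compiler has already used an unsupported instruction before the check runs, the program will fault first. Code that must run on unknown machines is usually built with a conservative -march and a more specific -mtune.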
-mfpmath=unit
Generate floating-point arithmetic for selected unit unit. The choices for unit are:
`387'
Use the standard 387 floating-point coprocessor present on the majority of chips and emulated otherwise. Code compiled with this option runs almost everywhere. The temporary results are computed in 80-bit precision instead of the precision specified by the type, resulting in slightly different results compared to most other chips. See -ffloat-store for a more detailed description.
This is the default choice for the i386 compiler.
`sse'
Use scalar floating-point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips and, in the AMD line, by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of the SSE instruction set supports only single-precision arithmetic, thus the double and extended-precision arithmetic are still done using 387. A later version (SSE2), present in Pentium4 and AMD x86-64 chips, supports double-precision arithmetic too.
For the i386 compiler, you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.
The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.
This is the default choice for the x86-64 compiler.
`sse,387'
`sse+387'
`both'
Attempt to utilize both instruction sets at once. This effectively doubles the number of available registers and, on chips with separate execution units for 387 and SSE, the execution resources too. Use this option with care, as it is still experimental, because the GCC register allocator does not model separate functional units well, resulting in unstable performance.
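
The difference between 387 and SSE arithmetic is visible from C through the standard FLT_EVAL_METHOD macro in <float.h>. The sketch below is an illustration, not text from the GCC manual; the helper names fp_eval_description and check_fp_eval_description are hypothetical. With GCC the macro is typically 2 when intermediates are kept in 80-bit 387 registers and 0 when SSE2 scalar arithmetic is used, but the exact value depends on the target and language standard selected.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: describes how floating-point intermediates are
   evaluated, based on the C99 FLT_EVAL_METHOD macro. */
const char *fp_eval_description(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "operations use the precision of their own type";
    case 1:
        return "float operations are evaluated as double";
    case 2:
        return "all operations are evaluated as long double";
    default:
        return "evaluation method is indeterminate";
    }
#else
    return "FLT_EVAL_METHOD is not available";
#endif
}

/* Example use: the helper always yields a non-empty description. */
void check_fp_eval_description(void)
{
    assert(fp_eval_description()[0] != '\0');
}
```

Printing the returned string from builds made with -mfpmath=387 and -mfpmath=sse is a quick way to confirm which unit a given set of flags actually selects.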
-masm=dialect
Output asm instructions using selected dialect. Supported choices are `intel' or `att' (the default one). Darwin does not support `intel'.
-mieee-fp
-mno-ieee-fp
Control whether or not the compiler uses IEEE floating-point comparisons. These correctly handle the case where the result of a comparison is unordered.
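
A comparison is unordered when at least one operand is a NaN. The sketch below, with hypothetical helper names, shows the behavior that IEEE comparisons guarantee: every ordered relation is false for an unordered pair. It assumes a C99 <math.h> that provides NAN and isunordered, and it does not demonstrate what code compiled with -mno-ieee-fp does.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: counts how many of the four ordered relations
   (<, <=, >, >=) hold between a and b. */
int ordered_relations_true(double a, double b)
{
    return (a < b) + (a <= b) + (a > b) + (a >= b);
}

/* Hypothetical helper: 1 if the pair is unordered (a NaN is involved). */
int pair_is_unordered(double a, double b)
{
    return isunordered(a, b) != 0;
}

/* Example use: a NaN operand makes every ordered relation false. */
void check_unordered_behavior(void)
{
    assert(ordered_relations_true(NAN, 1.0) == 0);
    assert(pair_is_unordered(NAN, 1.0));
}
```

Code that relies on `!(a < b)` meaning `a >= b` silently breaks for unordered operands, which is the case these comparisons are designed to handle.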
-msoft-float
Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.
On machines where a function returns floating-point results in the 80387 register stack, some floating-point opcodes may be emitted even if -msoft-float is used.
-mno-fp-ret-in-387
Do not use the FPU registers for return values of functions.
The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU.
The option -mno-fp-ret-in-387 causes such values to be returned in ordinary CPU registers instead.
-mno-fancy-math-387
Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is the default on FreeBSD, OpenBSD and NetBSD. This option is overridden when -march indicates that the target CPU will always have an FPU and so the instruction will not need emulation. As of revision 2.6.1, these instructions are not generated unless you also use the -funsafe-math-optimizations switch.
-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long variables on a two-word boundary or a one-word boundary. Aligning double variables on a two-word boundary produces code that runs somewhat faster on a `Pentium' at the expense of more memory.
On x86-64, -malign-double is enabled by default.
Warning: if you use the -malign-double switch, structures containing the above types will be aligned differently than the published application binary interface specifications for the 386 and will not be binary compatible with structures in code compiled without that switch.
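
The binary incompatibility mentioned in the warning comes from changed member offsets. A minimal sketch, using a hypothetical structure and helper names, makes the layout observable. On a 32-bit x86 System V target the offset is expected to be 4 by default and 8 with -malign-double; on x86-64 it is 8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure: one byte followed by a double, so the padding
   before 'value' reveals the alignment the compiler applies to double. */
struct sample {
    char tag;
    double value;
};

/* Hypothetical helper: offset of the double member within the structure. */
size_t double_member_offset(void)
{
    return offsetof(struct sample, value);
}

/* Example use: the structure must be large enough to hold the member
   at the reported offset. */
void check_sample_layout(void)
{
    assert(sizeof(struct sample) >= double_member_offset() + sizeof(double));
}
```

Two object files that disagree on this offset will read and write different bytes for the same member, which is why all modules sharing such structures need the same setting.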
-m96bit-long-double
-m128bit-long-double
These switches control the size of the long double type. The i386 application binary interface specifies the size to be 96 bits, so -m96bit-long-double is the default in 32-bit mode.
Modern architectures (Pentium and newer) prefer long double to be aligned to an 8- or 16-byte boundary. In arrays or structures conforming to the ABI, this is not possible. So specifying -m128bit-long-double aligns long double to a 16-byte boundary by padding the long double with an additional 32-bit zero.
In the x86-64 compiler, -m128bit-long-double is the default choice as its ABI specifies that long double is to be aligned on a 16-byte boundary.
Notice that neither of these options enables any extra precision over the x87 standard of 80 bits for a long double.
Warning: if you override the default value for your target ABI, the structures and arrays containing long double variables will change their size, and the calling convention for functions taking long double will be modified. Hence they will not be binary compatible with arrays or structures in code compiled without that switch.
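
The distinction between storage size and precision can be checked directly. In the sketch below (hypothetical helper names), the storage size follows the selected option, while LDBL_MANT_DIG from <float.h> reports the significand width. On x86 the expected values are 96 or 128 bits of storage and a 64-bit significand in both cases; other targets will report different numbers.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: bits of storage occupied by one long double,
   including any padding added by the ABI. */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Hypothetical helper: significand width of long double in bits. */
int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Example use: padding can only add storage, never precision. */
void check_long_double_sizes(void)
{
    assert(long_double_storage_bits()
           >= (size_t)long_double_significand_bits());
}
```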
-mlarge-data-threshold=number
When -mcmodel=medium is specified, data larger than threshold are placed in the large data section. This value must be the same across all objects linked into the binary, and defaults to 65535.
-mrtd
Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there.
You can specify that an individual function is called with this calling sequence with the function attribute `stdcall'. You can also override the -mrtd option by using the function attribute `cdecl'. See Function Attributes.
Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler.
Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code will be generated for calls to those functions.
In addition, seriously incorrect code will result if you call a function with too many arguments. (Normally, extra arguments are harmlessly ignored.)
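
The per-function attributes mentioned above can be sketched as follows. This is an illustration only: the macro and function names are hypothetical, and the attributes are applied only on 32-bit x86, because GCC ignores them on other targets. The fixed-argument function uses the callee-pops convention, while the variadic function is explicitly kept on the caller-pops convention and is always declared with a prototype.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical macros: expand to the attributes only on 32-bit x86. */
#if defined(__GNUC__) && defined(__i386__)
#define CALLEE_POPS __attribute__((stdcall))
#define CALLER_POPS __attribute__((cdecl))
#else
#define CALLEE_POPS
#define CALLER_POPS
#endif

/* Fixed argument count: eligible for the callee-pops convention. */
CALLEE_POPS int add_fixed(int a, int b)
{
    return a + b;
}

/* Variable argument count: the caller must pop the arguments. */
CALLER_POPS int sum_variadic(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Example use: both conventions return the same results to the caller. */
void check_calling_conventions(void)
{
    assert(add_fixed(2, 3) == 5);
    assert(sum_variadic(3, 1, 2, 3) == 6);
}
```

Marking individual functions this way keeps the rest of the program on the platform's usual convention, which avoids the library incompatibility described in the warning.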
-mregparm=num
Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute `regparm'. See Function Attributes.
Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.
-msseregparm
Use SSE register passing conventions for float and double arguments and return values. You can control this behavior for a specific function by using the function attribute `sseregparm'. See Function Attributes.
Warning: if you use this switch then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.
-mvect8-ret-in-mem
Return 8-byte vectors in memory instead of MMX registers. This is the default on Solaris 8 and 9 and VxWorks to match the ABI of the Sun Studio compilers until version 12. Later compiler versions (starting with Studio 12 Update 1) follow the ABI used by other x86 targets, which is the default on Solaris 10 and later. Only use this option if you need to remain compatible with existing code produced by those previous compiler versions or older versions of GCC.
-mpc32
-mpc64
-mpc80
Set 80387 floating-point precision to 32, 64 or 80 bits. When -mpc32 is specified, the significands of results of floating-point operations are rounded to 24 bits (single precision); -mpc64 rounds the significands of results of floating-point operations to 53 bits (double precision) and -mpc80 rounds the significands of results of floating-point operations to 64 bits (extended double precision), which is the default. When this option is used, floating-point operations in higher precisions are not available to the programmer without setting the FPU control word explicitly.
Setting the rounding of floating-point operations to less than the default 80 bits can speed some programs by 2% or more. Note that some mathematical libraries assume that extended-precision (80-bit) floating-point operations are enabled by default; routines in such libraries could suffer significant loss of accuracy, typically through so-called "catastrophic cancellation", when this option is used to set the precision to less than extended precision.
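
The effect of the precision-control setting can be observed by measuring how many significand bits long double arithmetic actually delivers at run time. The sketch below uses hypothetical helper names and halves a value until adding it to 1 no longer changes the result. On x86 with the default -mpc80 setting this is expected to report 64, and 53 when the FPU is set to double precision, although those figures were not verified against a particular compiler.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: counts the significand bits that long double
   addition delivers at run time. The volatile qualifiers force each
   intermediate to be stored, so the compiler cannot fold the loop. */
int measured_significand_bits(void)
{
    volatile long double one = 1.0L;
    volatile long double eps = 1.0L;
    volatile long double sum;
    int bits = 0;

    do {
        eps = eps / 2.0L;   /* eps is now 2^-(bits + 1) */
        sum = one + eps;
        bits++;
    } while (sum != one);   /* stops once 1 + eps rounds back to 1 */

    return bits;
}

/* Example use: run-time precision never exceeds what the type declares. */
void check_measured_precision(void)
{
    assert(measured_significand_bits() <= LDBL_MANT_DIG);
}
```

A library routine that was written assuming 64 significand bits loses accuracy when this function reports fewer, which is the failure mode the note above describes.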
-mstackrealign
Realign the stack at entry. On the Intel x86, the -mstackrealign option will generate an alternate prologue and epilogue that realigns the run-time stack if necessary. This supports mixing legacy code that keeps a 4-byte aligned stack with modern code that keeps a 16-byte aligned stack for SSE compatibility. See also the attribute force_align_arg_pointer, applicable to individual functions.
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
-mincoming-stack-boundary=num
Assume the incoming stack is aligned to a 2 raised to num byte boundary. If -mincoming-stack-boundary is not specified, the one specified by -mpreferred-stack-boundary will be used.
On Pentium and PentiumPro, double and long double values should be aligned to an 8-byte boundary (see -malign-double) or suffer significant run-time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16-byte aligned.
To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
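The relationship between num and the resulting boundary ("2 raised to num" bytes) can be sketched as follows. This is only an arithmetic illustration with hypothetical helper names; it does not inspect the real stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Boundary in bytes for a given -mpreferred-stack-boundary=num value. */
static size_t boundary_bytes(unsigned num)
{
    return (size_t)1 << num;
}

/* Returns 1 if addr lies on a 2^num byte boundary, 0 otherwise. */
static int is_aligned(uintptr_t addr, unsigned num)
{
    return (addr & (boundary_bytes(num) - 1)) == 0;
}
```

With the default num of 4 the boundary is 16 bytes; with num set to 2 it is 4 bytes, so an address such as 0x1004 satisfies the reduced setting but not the default.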
-mmmx
-mno-mmx
-msse
-mno-sse
-msse2
-mno-sse2
-msse3
-mno-sse3
-mssse3
-mno-ssse3
-msse4.1
-mno-sse4.1
-msse4.2
-mno-sse4.2
-msse4
-mno-sse4
-mavx
-mno-avx
-mavx2
-mno-avx2
-maes
-mno-aes
-mpclmul
-mno-pclmul
-mfsgsbase
-mno-fsgsbase
-mrdrnd
-mno-rdrnd
-mf16c
-mno-f16c
-mfma
-mno-fma
-msse4a
-mno-sse4a
-mfma4
-mno-fma4
-mxop
-mno-xop
-mlwp
-mno-lwp
-m3dnow
-mno-3dnow
-mpopcnt
-mno-popcnt
-mabm
-mno-abm
-mbmi
-mbmi2
-mno-bmi
-mno-bmi2
-mlzcnt
-mno-lzcnt
-mtbm
-mno-tbm
These switches enable or disable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM, BMI, BMI2, LZCNT or 3DNow! extended instruction sets. These extensions are also available as built-in functions: see X86 Built-in Functions, for details of the functions enabled and disabled by these switches.
To have SSE/SSE2 instructions generated automatically from floating-point code (as opposed to 387 instructions), see -mfpmath=sse.
GCC does not emit SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or the AVX equivalents of SSEx instructions when needed.
These options will enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate
flags. In particular, the file containing the CPU detection code should be compiled without these options.
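The run-time dispatch pattern described above might look like the following sketch. It assumes a GCC-compatible compiler that provides the __builtin_cpu_init and __builtin_cpu_supports built-ins, which may not exist in the GCC release this text describes. The function names are hypothetical, and sum_avx2_build merely stands in for a routine that would live in a separate file compiled with -mavx2.

```c
/* Returns 1 if the CPU reports AVX2, 0 if not, and -1 if the check is
 * unavailable for this compiler or target (assumption: GCC-style built-ins). */
static int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}

typedef long (*sum_fn)(const int *, int);

/* Baseline version: belongs in a file compiled without extension flags. */
static long sum_generic(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Placeholder for a version built separately with -mavx2. */
static long sum_avx2_build(const int *v, int n)
{
    return sum_generic(v, n);
}

/* The detection and selection code itself stays free of extension flags. */
static sum_fn select_sum(void)
{
    return cpu_has_avx2() == 1 ? sum_avx2_build : sum_generic;
}
```

Keeping cpu_has_avx2 and select_sum in a file built without the extension flags matters because the compiler could otherwise emit extended instructions before the check has run.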
-mcld
This option instructs GCC to emit a cld instruction in the prologue of functions that use string instructions. String instructions depend on the DF flag to select between autoincrement or autodecrement mode. While the ABI specifies the DF flag to be cleared on function entry, some operating systems violate this specification by not clearing the DF flag in their exception dispatchers. The exception handler can be invoked with the DF flag set, which leads to wrong direction mode when string instructions are used. This option can be enabled by default on 32-bit x86 targets by configuring GCC with the --enable-cld configure option. Generation of cld instructions can be suppressed with the -mno-cld compiler option in this case.
-mvzeroupper
This option instructs GCC to emit a vzeroupper instruction before a transfer of control flow out of the function to minimize the AVX to SSE transition penalty as well as remove unnecessary zeroupper intrinsics.
-mcx16
This option enables GCC to use the CMPXCHG16B instruction in generated code. CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword) data types. This is useful for high-resolution counters that could be updated by multiple processors (or cores). This instruction is generated as part of atomic built-in functions: see __sync Builtins or __atomic Builtins for details.
-msahf
This option enables GCC to use the SAHF instruction in generated 64-bit code. Early Intel CPUs with Intel 64 lacked the LAHF and SAHF instructions supported by AMD64 until the introduction of the Pentium 4 G1 step in December 2005. LAHF and SAHF are load and store instructions, respectively, for certain status flags. In 64-bit mode, the SAHF instruction is used to optimize the fmod, drem or remainder built-in functions: see Other Builtins for details.
-mmovbe
This option enables GCC to use the movbe instruction to implement __builtin_bswap32 and __builtin_bswap64.
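As a sketch of where these built-ins typically appear, the following reads a big-endian 32-bit value from a byte buffer. The helper name load_be32 is hypothetical, and the sketch assumes a GCC-compatible compiler. Whether the swap is emitted as movbe, bswap or something else depends on the target options and the compiler's choices.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit big-endian value from memory into host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);          /* alignment-safe load */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    v = __builtin_bswap32(v);         /* swap only on little-endian hosts */
#endif
    return v;
}
```

On x86, which is little-endian, the load followed by __builtin_bswap32 is the kind of pair that a single movbe instruction can replace.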
-mcrc32
This option enables the built-in functions __builtin_ia32_crc32qi, __builtin_ia32_crc32hi, __builtin_ia32_crc32si and __builtin_ia32_crc32di to generate the crc32 machine instruction.
-mrecip
This option enables GCC to use the RCPSS and RSQRTSS instructions (and their vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase precision instead of DIVSS and SQRTSS (and their vectorized variants) for single-precision floating-point arguments. These instructions are generated only when -funsafe-math-optimizations is enabled together with -finite-math-only and -fno-trapping-math. Note that while the throughput of the sequence is higher than the throughput of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already with -ffast-math (or the above option combination), and doesn't need -mrecip.
Also note that GCC emits the above sequence with an additional Newton-Raphson step for vectorized single-float division and vectorized sqrtf(x) already with -ffast-math (or the above option combination), and doesn't need -mrecip.
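The Newton-Raphson refinement mentioned above can be illustrated in plain C. This is a sketch under stated assumptions: crude_reciprocal is a hypothetical stand-in for a hardware estimate such as RCPSS (here, the exact reciprocal with a deliberate relative error of 2^-12), and the code is not the sequence GCC actually emits.

```c
/* Stand-in for a low-precision hardware reciprocal estimate. */
static float crude_reciprocal(float a)
{
    return (1.0f / a) * (1.0f + 1.0f / 4096.0f);
}

/* One Newton-Raphson step for f(x) = 1/x - a: x1 = x0 * (2 - a * x0). */
static float refine_reciprocal(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* Absolute value of a * x - 1, a measure of how far x is from 1/a. */
static float residual(float a, float x)
{
    float e = a * x - 1.0f;
    return e < 0.0f ? -e : e;
}
```

Each step roughly squares the relative error, so an estimate that is off by about 2^-12 ends up close to single-precision accuracy after one refinement.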
-mrecip=opt
This option controls which reciprocal estimate instructions may be used. opt is a comma-separated list of options, each of which may be preceded by a ! to invert it:
all: enable all estimate instructions,
default: enable the default instructions, equivalent to -mrecip,
none: disable all estimate instructions, equivalent to -mno-recip,
div: enable the approximation for scalar division,
vec-div: enable the approximation for vectorized division,
sqrt: enable the approximation for scalar square root,
vec-sqrt: enable the approximation for vectorized square root.
So for example, -mrecip=all,!sqrt would enable all of the reciprocal approximations, except for square root.
-mveclibabi=type
Specifies the ABI type to use for vectorizing intrinsics using an external library. Supported types are svml for the Intel short vector math library and acml for the AMD math core library style of interfacing.
When -mveclibabi=svml is used, GCC will currently emit calls to vmldExp2, vmldLn2, vmldLog102, vmldPow2, vmldTanh2, vmldTan2, vmldAtan2, vmldAtanh2, vmldCbrt2, vmldSinh2, vmldSin2, vmldAsinh2, vmldAsin2, vmldCosh2, vmldCos2, vmldAcosh2, vmldAcos2, vmlsExp4, vmlsLn4, vmlsLog104, vmlsPow4, vmlsTanh4, vmlsTan4, vmlsAtan4, vmlsAtanh4, vmlsCbrt4, vmlsSinh4, vmlsSin4, vmlsAsinh4, vmlsAsin4, vmlsCosh4, vmlsCos4, vmlsAcosh4 and vmlsAcos4 for the corresponding function type.
When -mveclibabi=acml is used, GCC will currently emit calls to __vrd2_sin, __vrd2_cos, __vrd2_exp, __vrd2_log, __vrd2_log2, __vrd2_log10, __vrs4_sinf, __vrs4_cosf, __vrs4_expf, __vrs4_logf, __vrs4_log2f, __vrs4_log10f and __vrs4_powf for the corresponding function type.
Both -ftree-vectorize and -funsafe-math-optimizations have to be enabled. An SVML or ACML ABI-compatible library must be specified at link time.
-mabi=name
Generate code for the specified calling convention. Permissible values are: `sysv' for the ABI used on GNU/Linux and other systems and `ms' for the Microsoft ABI. The default is to use the Microsoft ABI when targeting Windows. On all other systems, the default is the SYSV ABI. You can control this behavior for a specific function by using the function attribute `ms_abi'/`sysv_abi'. See Function Attributes.
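A per-function override with the attributes named above might look like this sketch. The MS_ABI and SYSV_ABI macros and the function names are hypothetical; the macros expand to nothing on other targets so the file still compiles there.

```c
#if defined(__GNUC__) && defined(__x86_64__)
#define MS_ABI   __attribute__((ms_abi))
#define SYSV_ABI __attribute__((sysv_abi))
#else
#define MS_ABI    /* attribute not applicable on this target */
#define SYSV_ABI
#endif

/* Uses the Microsoft calling convention regardless of -mabi. */
static int MS_ABI add_ms(int a, int b)
{
    return a + b;
}

/* Uses the System V calling convention regardless of -mabi. */
static int SYSV_ABI add_sysv(int a, int b)
{
    return a + b;
}

/* The compiler handles the convention at each call site. */
static int call_both(int a, int b)
{
    return add_ms(a, b) + add_sysv(a, b);
}
```

The attribute is part of the function's type, so a declaration in a header should carry the same attribute as the definition.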
-mtls-dialect=type
Generate code to access thread-local storage using the `gnu' or `gnu2' conventions. `gnu' is the conservative default; `gnu2' is more efficient, but it may add compile- and run-time requirements that cannot be satisfied on all systems.
-mpush-args
-mno-push-args
Use PUSH operations to store outgoing parameters. This method is shorter and usually as fast as the method using SUB/MOV operations, and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when the preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.
-mthreads
Support thread-safe exception handling on `Mingw32'. Code that relies on thread-safe exception handling must compile and link all code with the -mthreads option. When compiling, -mthreads defines -D_MT; when linking, it links in a special thread helper library -lmingwthrd which cleans up per-thread exception handling data.
-mno-align-stringops
Do not align the destination of inlined string operations. This switch reduces code size and improves performance when the destination is already aligned but GCC doesn't know about it.
-minline-all-stringops
By default GCC inlines string operations only when the destination is known to be aligned to at least a 4-byte boundary. This option enables more inlining, which increases code size but may improve performance of code that depends on fast memcpy, strlen and memset for short lengths.
-minline-stringops-dynamically
For string operations of unknown size, use run-time checks with inline code for small blocks and a library call for large blocks.
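The strategy behind -minline-stringops-dynamically can be pictured in C as follows. This is a hand-written analogy, not the code GCC generates; the function name and the 64-byte threshold are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { SMALL_BLOCK_LIMIT = 64 };   /* hypothetical cut-off */

/* Copy n bytes, choosing the method at run time based on n. */
static void *copy_dynamic(void *dst, const void *src, size_t n)
{
    if (n < SMALL_BLOCK_LIMIT) {
        /* Small block: simple inline byte loop, no call overhead. */
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }
    /* Large block: defer to the library routine. */
    return memcpy(dst, src, n);
}
```

The run-time check costs one compare and branch, which is the price paid for not knowing the size at compile time.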
-mstringop-strategy=alg
Override the internal decision heuristic about which algorithm to use when inlining a string operation. The allowed values are rep_byte, rep_4byte, rep_8byte for expanding using the i386 rep prefix of the specified size; byte_loop, loop, unrolled_loop for expanding an inline loop; and libcall for always expanding a library call.
-momit-leaf-frame-pointer
Don't keep the frame pointer in a register for leaf functions. This avoids the instructions to save, set up and restore frame pointers and makes an extra register available in leaf functions. The option -fomit-frame-pointer removes the frame pointer for all functions, which might make debugging harder.
-mtls-direct-seg-refs
-mno-tls-direct-seg-refs
Controls whether TLS variables may be accessed with offsets from the TLS segment register (%gs for 32-bit, %fs for 64-bit), or whether the thread base pointer must be added. Whether or not this is legal depends on the operating system, and whether it maps the segment to cover the entire TLS area.
For systems that use GNU libc, the default is on.
-msse2avx
-mno-sse2avx
Specify that the assembler should encode SSE instructions with the VEX prefix. The option -mavx turns this on by default.
-mfentry
-mno-fentry
If profiling is active (-pg), put the profiling counter call before the prologue. Note: On x86 architectures the attribute ms_hook_prologue isn't possible at the moment for -mfentry and -pg.
-m8bit-idiv
-mno-8bit-idiv
On some processors, like Intel Atom, 8-bit unsigned integer divide is much faster than 32-bit/64-bit integer divide. This option generates a run-time check. If both dividend and divisor are within the range 0 to 255, 8-bit unsigned integer divide is used instead of 32-bit/64-bit integer divide.
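The run-time check described above can be sketched in C. This only mirrors the logic; writing it by hand does not guarantee that the compiler emits an 8-bit divide instruction. The function name is hypothetical, and the caller is assumed to pass a non-zero divisor.

```c
#include <stdint.h>

/* Unsigned divide with a fast path when both operands fit in 8 bits. */
static uint32_t div_u32(uint32_t dividend, uint32_t divisor)
{
    /* OR the operands: if no bit above bit 7 is set, both are <= 255. */
    if (((dividend | divisor) & ~(uint32_t)0xFF) == 0)
        return (uint8_t)((uint8_t)dividend / (uint8_t)divisor);

    /* Otherwise fall back to the full-width divide. */
    return dividend / divisor;
}
```

Combining both operands with OR lets a single test cover the dividend and the divisor at once.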
-mavx256-split-unaligned-load
-mavx256-split-unaligned-store
Split 32-byte AVX unaligned load and store.
These `-m' switches are supported in addition to the above on AMD x86-64 processors in 64-bit environments.
-m32
-m64
-mx32
Generate code for a 32-bit or 64-bit environment. The -m32 option sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The -m64 option sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. The -mx32 option sets int, long and pointer to 32 bits and generates code for AMD's x86-64 architecture. For Darwin only, the -m64 option turns off the -fno-pic and -mdynamic-no-pic options.
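The type sizes listed above can be checked from within a program. The sketch below reports which data model the current build uses; the function name and the returned strings are hypothetical.

```c
#include <stddef.h>

/* Classify the build by the sizes of int, long and pointers. */
static const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32 (as produced by -m32 or -mx32)";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64 (as produced by -m64)";
    return "other";
}
```

Note that -m32 and -mx32 give the same sizes here; they differ in the instruction set and registers available to the generated code, which sizeof cannot reveal.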
-mno-red-zone
Do not use a so-called red zone for x86-64 code. The red zone is mandated by the x86-64 ABI; it is a 128-byte area beyond the location of the stack pointer that will not be modified by signal or interrupt handlers and therefore can be used for temporary data without adjusting the stack pointer. The flag -mno-red-zone disables this red zone.
-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.
-mcmodel=medium
Generate code for the medium model: the program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or bss sections and can be located above 2 GB. Programs can be statically or dynamically linked.
-mcmodel=large
Generate code for the large model: this model makes no assumptions about addresses and sizes of sections.