
Studying note of GCC-3.4.6 source (21)

2010-03-25 11:42

3.3. Handling switches

3.3.1. Options related to optimization

Back in decode_options, at line 480, the initialize_diagnostics hook in lang_hooks points, for C++, to cxx_initialize_diagnostics. It sets up the diagnostics facility, which is responsible for emitting accurate and adequate error messages. We skip it here as it is not closely related to compilation itself.

decode_options (continued)

489   /* Scan to see what optimization level has been specified. That will
490      determine the default value of many flags. */
491   for (i = 1; i < argc; i++)
492     {
493       if (!strcmp (argv[i], "-O"))
494         {
495           optimize = 1;
496           optimize_size = 0;
497         }
498       else if (argv[i][0] == '-' && argv[i][1] == 'O')
499         {
500           /* Handle -Os, -O2, -O3, -O69, ... */
501           const char *p = &argv[i][2];
502
503           if ((p[0] == 's') && (p[1] == 0))
504             {
505               optimize_size = 1;
506
507               /* Optimizing for size forces optimize to be 2. */
508               optimize = 2;
509             }
510           else
511             {
512               const int optimize_val = read_integral_parameter (p, p - 2, -1);
513               if (optimize_val != -1)
514                 {
515                   optimize = optimize_val;
516                   optimize_size = 0;
517                 }
518             }
519         }
520     }
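
One consequence of this loop is that the last -O switch on the command line wins. The following is a minimal sketch of that behavior, not GCC's own code: struct opt_level, parse_level and scan_opt_level are hypothetical names, and parse_level is only a stand-in for read_integral_parameter, assumed here to return the default value whenever the text is not made up entirely of digits.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; GCC itself stores these values in the
   globals optimize and optimize_size. */
struct opt_level
{
  int optimize;
  int optimize_size;
};

/* Stand-in for read_integral_parameter (an assumption of this sketch):
   return DEFVAL unless P consists only of decimal digits. */
static int
parse_level (const char *p, int defval)
{
  const char *q;

  for (q = p; *q != '\0'; q++)
    if (!isdigit ((unsigned char) *q))
      return defval;
  return atoi (p);
}

/* Mimic the loop at lines 491-520: every -O switch overrides the
   effect of the previous one. */
static struct opt_level
scan_opt_level (int argc, const char **argv)
{
  struct opt_level lvl = { 0, 0 };
  int i;

  for (i = 1; i < argc; i++)
    {
      if (strcmp (argv[i], "-O") == 0)
        {
          lvl.optimize = 1;
          lvl.optimize_size = 0;
        }
      else if (argv[i][0] == '-' && argv[i][1] == 'O')
        {
          const char *p = &argv[i][2];

          if (p[0] == 's' && p[1] == '\0')
            {
              /* -Os implies level 2 plus size optimization. */
              lvl.optimize_size = 1;
              lvl.optimize = 2;
            }
          else
            {
              int val = parse_level (p, -1);

              if (val != -1)
                {
                  lvl.optimize = val;
                  lvl.optimize_size = 0;
                }
            }
        }
    }
  return lvl;
}
```

In this sketch the arguments -O3 -Os give optimize 2 with optimize_size 1, while -Os -O1 give optimize 1 with optimize_size 0. A malformed switch such as -Ofoo leaves both values unchanged.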

In GCC, the following switches select the level of optimization [8]:
-O: The compiler attempts to reduce both code size and execution time, but avoids modifications that would make debugging difficult. Turns off optimization for size and turns on -fdefer-pop, -fthread-jumps, -fguess-branch-probability, -fcprop-registers, and -fdelayed-branch. The -fomit-frame-pointer flag is set only if the debugger is able to work without it on this platform.
-O0: The default. Disables all optimizations. Turns off all size optimization and sets -fno-merge-constants.
-O1: The same as -O.
-O2: This level turns on all optimizations that do not involve size and speed trade-offs. In addition to the options turned on for -O, this level turns on -foptimize-sibling-calls, -fcse-follow-jumps, -fcse-skip-blocks, -fgcse, -fexpensive-optimizations, -fstrength-reduce, -frerun-loop-opt, -fschedule-insns, -fdelete-null-pointer-checks, -fschedule-insns2, -frerun-cse-after-loop, -fpeephole2, -fforce-mem, -fcaller-saves, -fstrict-aliasing, -fregmove, and -freorder-blocks. This level does no loop unrolling, function inlining, or register renaming.
-O3: In addition to the options turned on for -O2, this level turns on -finline-functions and -frename-registers.
-Os: Optimizes for size. All of the -O2 option flags are set. -falign-loops, -falign-jumps, -falign-labels, and -falign-functions are all set to 1, which prevents any space from being inserted for alignment.

decode_options (continued)

522   if (!optimize)
523     {
524       flag_merge_constants = 0;
525     }
526
527   if (optimize >= 1)
528     {
529       flag_defer_pop = 1;
530       flag_thread_jumps = 1;
531 #ifdef DELAY_SLOTS
532       flag_delayed_branch = 1;
533 #endif
534 #ifdef CAN_DEBUG_WITHOUT_FP
535       flag_omit_frame_pointer = 1;
536 #endif
537       flag_guess_branch_prob = 1;
538       flag_cprop_registers = 1;
539       flag_loop_optimize = 1;
540       flag_if_conversion = 1;
541       flag_if_conversion2 = 1;
542     }
543
544   if (optimize >= 2)
545     {
546       flag_crossjumping = 1;
547       flag_optimize_sibling_calls = 1;
548       flag_cse_follow_jumps = 1;
549       flag_cse_skip_blocks = 1;
550       flag_gcse = 1;
551       flag_expensive_optimizations = 1;
552       flag_strength_reduce = 1;
553       flag_rerun_cse_after_loop = 1;
554       flag_rerun_loop_opt = 1;
555       flag_caller_saves = 1;
556       flag_force_mem = 1;
557       flag_peephole2 = 1;
558 #ifdef INSN_SCHEDULING
559       flag_schedule_insns = 1;
560       flag_schedule_insns_after_reload = 1;
561 #endif
562       flag_regmove = 1;
563       flag_strict_aliasing = 1;
564       flag_delete_null_pointer_checks = 1;
565       flag_reorder_blocks = 1;
566       flag_reorder_functions = 1;
567       flag_unit_at_a_time = 1;
568     }
569
570   if (optimize >= 3)
571     {
572       flag_inline_functions = 1;
573       flag_rename_registers = 1;
574       flag_unswitch_loops = 1;
575       flag_web = 1;
576     }
577
578   if (optimize < 2 || optimize_size)
579     {
580       align_loops = 1;
581       align_jumps = 1;
582       align_labels = 1;
583       align_functions = 1;
584
585       /* Don't reorder blocks when optimizing for size because extra
586          jump insns may be created; also barrier may create extra padding.
587
588          More correctly we should have a block reordering mode that tried
589          to minimize the combined size of all the jumps. This would more
590          or less automatically remove extra jumps, but would also try to
591          use more short jumps instead of long jumps. */
592       flag_reorder_blocks = 0;
593     }
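
The cumulative nature of these blocks, and the way the last block undoes part of the level 2 settings for -Os, can be condensed into a small sketch. This is an illustration only: struct defaults and level_defaults are hypothetical names, only a handful of the flags are modelled, and the starting values are assumptions of the sketch rather than GCC's actual initializers.

```c
/* Hypothetical subset of the flags assigned at lines 522-593. */
struct defaults
{
  int merge_constants;
  int defer_pop;
  int gcse;
  int inline_functions;
  int reorder_blocks;
  int align_loops;      /* 0 stands for "not decided here" */
};

static struct defaults
level_defaults (int optimize, int optimize_size)
{
  /* Assumed starting values, chosen for illustration only. */
  struct defaults d = { 1, 0, 0, 0, 0, 0 };

  if (!optimize)
    d.merge_constants = 0;

  if (optimize >= 1)
    d.defer_pop = 1;

  if (optimize >= 2)
    {
      d.gcse = 1;
      d.reorder_blocks = 1;
    }

  if (optimize >= 3)
    d.inline_functions = 1;

  /* Mirrors lines 578-593: low levels and -Os suppress alignment
     padding and block reordering. */
  if (optimize < 2 || optimize_size)
    {
      d.align_loops = 1;
      d.reorder_blocks = 0;
    }

  return d;
}
```

Calling level_defaults (2, 1), the -Os case, keeps gcse enabled but ends with reorder_blocks cleared and align_loops set to 1, whereas level_defaults (2, 0) leaves reorder_blocks enabled.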

At line 531 above, the macro DELAY_SLOTS is emitted by the tool genattr when a define_delay pattern is present in the machine description file. INSN_SCHEDULING at line 558 is defined by genattr too.
Besides, there is a long list of variables whose usage we need to understand first.
flag_merge_constants (-fmerge-constants, -fmerge-all-constants), if nonzero, the compiler attempts to merge identical constants across constant sections: if 1, only string constants and constants from the constant pool are merged; if 2, constant variables are merged as well.
flag_defer_pop (-fdefer-pop), if nonzero, the arguments that were pushed onto the stack to make a function call are not popped off immediately after the return of the function, but are allowed to accumulate along with the arguments of several function calls, and the stack is later cleared of them all.
flag_thread_jumps (-fthread-jumps), if nonzero, when a conditional jump leads to a location where a second test, whose outcome is already determined by the first, causes another jump, the original jump is redirected to the final destination.
flag_omit_frame_pointer (-fomit-frame-pointer), if nonzero, doesn’t store the frame pointer in a register for functions that don’t need one, thus omitting the code to store and retrieve the address as well as making another register available for general use. This flag is automatically set for all levels of -O optimization, but only if the debugger can be run without a frame pointer. If the debugger cannot be run with this setting you will have to set it explicitly. Some platforms have no frame pointer and this flag will have no effect.
flag_guess_branch_prob, if nonzero, will try to guess branch probabilities.
flag_cprop_registers, after register allocation and post-register allocation instruction splitting, we perform a copy-propagation pass to try to reduce scheduling dependencies and occasionally eliminate the copy.
flag_rename_registers, if nonzero, registers should be renamed.
flag_loop_optimize, if nonzero, means run the loop optimizer.
flag_if_conversion, if nonzero, means perform if conversion.
flag_if_conversion2, if nonzero, means perform if conversion after reload.
flag_crossjumping, if nonzero, means perform crossjumping.
flag_optimize_sibling_calls, if nonzero, allows GCC to optimize sibling and tail recursive calls.
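
As an illustration of what tail-recursion optimization amounts to, the two functions below compute the same result. This is a hand-written sketch with hypothetical names, not compiler output: the first makes its recursive call as the very last action, and the second is the loop form that such a call can in principle be reduced to, so that no new stack frame is needed per step.

```c
/* Tail-recursive form: nothing remains to be done after the
   recursive call returns. */
static unsigned
gcd_recursive (unsigned a, unsigned b)
{
  if (b == 0)
    return a;
  return gcd_recursive (b, a % b);
}

/* Equivalent loop form: the call is replaced by updating the
   arguments and jumping back to the top. */
static unsigned
gcd_loop (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;

      a = b;
      b = t;
    }
  return a;
}
```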
flag_cse_follow_jumps (-fcse-follow-jumps), if nonzero, when the target of a jump cannot be reached any other way except by the jump being taken, the common subexpression elimination scan follows the path of the jump. That is, any values that exist before the jump is taken will always exist at the point of the destination of the jump and can be used there. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-cse-follow-jumps.
flag_cse_skip_blocks (-fcse-skip-blocks), if nonzero, if the body of an if statement is simple enough that it does not contain code that would disrupt the previously calculated values, the common subexpression analysis flow skips over the if statement and is applied to the statements that follow it. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-cse-skip-blocks.
flag_gcse, if nonzero, means perform global common subexpression elimination (CSE).
flag_expensive_optimizations (-fexpensive-optimizations), if nonzero, enables a few optimizations that are effective but cost in terms of compile time. For example, common subexpression elimination is run again following global common subexpression elimination. Some of the other optimizations are carried out in more depth when this flag is set. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-expensive-optimizations.
flag_strength_reduce (-fstrength-reduce), if nonzero, performs loop strength reduction and elimination of iteration variables used inside loops. This is the process of replacing time-consuming operations, such as multiply and divide, with simpler and faster operations, such as add and subtract. This option is always set by -funroll-loops and -funroll-all-loops. It is also set by -O2, -O3, and -Os, but can be overridden by -fno-strength-reduce. As a simple example, the following loop uses a temporary variable to contain a calculated index:
for(int i=0; i<10; i++) {
  index = i * 2;
  frammis(valarr[index]);
}
The internal variable index can be eliminated, and the multiplication can be changed to a simple shift resulting in the following:
for(int i=0; i<10; i++) {
  frammis(valarr[i << 1]);
}
Shifting the loop counter one position to the left effectively doubles it, and the value is then used directly as the index on the array without being stored in a temporary variable.
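
Another common way to picture strength reduction is to keep a second induction variable that is advanced by addition, so that no multiplication or shift is needed per iteration. The sketch below is written by hand under that assumption: the function names are hypothetical, and the call to frammis is replaced by a summation so that the two forms can be compared.

```c
#define N 10

/* Original form: one multiplication per iteration. The array must
   hold at least 2 * N - 1 elements. */
static int
sum_even_mul (const int *valarr)
{
  int sum = 0;
  int i;

  for (i = 0; i < N; i++)
    sum += valarr[i * 2];
  return sum;
}

/* Strength-reduced form: the product i * 2 is tracked by INDEX,
   which is advanced by 2 on every iteration. */
static int
sum_even_add (const int *valarr)
{
  int sum = 0;
  int i, index;

  for (i = 0, index = 0; i < N; i++, index += 2)
    sum += valarr[index];
  return sum;
}
```

Both functions visit elements 0, 2, ..., 18 and therefore return the same sum for the same array.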
flag_rerun_cse_after_loop (-frerun-cse-after-loop), if nonzero, will cause the common subexpression optimization to be applied again following loop optimizations. This is done because it is possible that loop optimization creates the presence of new subexpressions. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-rerun-cse-after-loop. This increases compilation time about 20% and picks up a few more common expressions.
flag_rerun_loop_opt (-frerun-loop-opt), if nonzero, runs the loop optimization twice. The second time does not unroll loops, but it does analyze the loops again with the instructions from the first optimization pass removed. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-rerun-loop-opt.
flag_caller_saves (-fcaller-saves), if nonzero, extra instructions are included to save registers before a function call and then restore them afterward. The registers can then be used in the function call and inside the function itself. Only registers that contain useful values are saved, and then only if it seems better to save and restore than it does to reload the value later, when it is needed again. This option is enabled by default on some machines and is always enabled by -O2, -O3, and -Os, but can be overridden by -fno-caller-saves.
flag_force_mem (-fforce-mem), if nonzero, values must be copied into registers to have arithmetic performed on them. This improves the generated code because values needed will often have been previously loaded into a register and do not need to be loaded again. This flag is set by -O2, -O3, and -Os.
flag_peephole2 (-fpeephole2), if nonzero, enables RTL peephole optimization after registers have been allocated but before scheduling. The optimization is a machine specific translation of one specific set of instructions into another. This option is platform dependent and may have no effect. There is no effect unless optimization is also specified. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-peephole2.
flag_schedule_insns (-fschedule-insns), if nonzero, on machines that have relatively slow floating point or memory access operations when compared to other operations, and on machines that support the execution of more than one instruction at a time, an attempt is made to change the order of the instructions to eliminate stalling. Other instructions are executed during the time the slower instruction is being executed. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-schedule-insns.
flag_schedule_insns_after_reload (-fschedule-insns2), if nonzero, this is the same as -fschedule-insns except that it is performed after allocation of both the global registers and the local registers for each function. This can be effective on machines with a small number of registers and relatively slow instructions to load registers. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-schedule-insns2.
flag_regmove (-foptimize-register-move, -fregmove), if nonzero, register allocation is optimized by changing the assignment of registers used in operations that move data from one memory location to another. This is especially effective on machines that have instructions that can move data directly from one memory location to another. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-optimize-register-move.
flag_strict_aliasing (-fstrict-aliasing), if nonzero, the strictest aliasing rules are applied depending on the language being compiled. With strict aliasing in C, for example, an int cannot be the alias of a double or a pointer, but it can be the alias of an unsigned int. Even with strict aliasing there is no problem with union members as long as the references are made through the union and not through a pointer to the address of a union member. The following code could cause a problem:
int *iptr;
union {
  int ivalue;
  double dvalue;
} migs;
. . .
migs.ivalue = 45;
iptr = &migs.ivalue;
frammis(*iptr);
migs.dvalue = 88.6;
frammis(*iptr);
In this example it is possible that, under strict aliasing, the compiler would not recognize that the value pointed to by iptr had changed between the two function calls. However, referring to the union members directly would not cause a problem.
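
When the bytes of one type really have to be read as another type, a widely used way to stay within the aliasing rules is to copy them with memcpy instead of dereferencing a pointer of the wrong type. The sketch below is not taken from the GCC source; float_bits is a hypothetical helper, and it assumes that float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Return the object representation of F as an unsigned integer.
   Assumes sizeof (float) == sizeof (uint32_t). No pointer of an
   incompatible type is ever dereferenced, so strict aliasing is
   not violated. */
static uint32_t
float_bits (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

On a platform that uses the IEEE 754 single-precision format, float_bits (1.0f) yields 0x3F800000.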
flag_delete_null_pointer_checks (-fdelete-null-pointer-checks), if nonzero, the code that checks for an attempt to dereference a null pointer is removed if dataflow analysis indicates that the pointer cannot be null. In some environments it is possible to process the result of an attempt to dereference a null pointer, so this option should not be used in programs that rely on these checks. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-delete-null-pointer-checks.
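
A typical candidate for this optimization is a test that comes after the pointer has already been dereferenced. The function below is a hypothetical illustration of such a pattern: since the load from p happens first, the compiler may assume that p is not null at the point of the test and drop the test together with the early return.

```c
#include <stddef.h>

static int
first_element (const int *p)
{
  int value = *p;   /* the dereference happens first */

  if (p == NULL)    /* may be treated as always false and removed */
    return -1;
  return value;
}
```

Code that wants the check to be effective has to test the pointer before using it.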
flag_reorder_blocks, if nonzero, basic blocks should be reordered.
flag_reorder_functions, if nonzero, functions should be reordered.
flag_unit_at_a_time, if nonzero, we perform whole unit at a time compilation.
flag_inline_functions (-finline-functions), if nonzero, the compiler is allowed to select certain simple functions to be expanded in line at the point of the function call. If the function is declared in such a way that all calls to it are known (for example, a static function in a C source file cannot be addressed from outside the file) the body of the function is omitted because it is never actually called. This option is automatically turned on by -O3 unless the -fno-inline-functions flag is specified.
flag_unswitch_loops, if nonzero, enables loop unswitching.
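
Loop unswitching moves a test that does not change inside the loop out of it and duplicates the loop for each outcome. The pair of functions below sketches the idea by hand; the names are hypothetical and the second function only shows the shape of the transformed code, not actual compiler output.

```c
/* Before: the loop-invariant test on FLAG is evaluated on every
   iteration. */
static void
scale_or_negate (int *a, int n, int flag)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (flag)
        a[i] *= 2;
      else
        a[i] = -a[i];
    }
}

/* After unswitching: the test is done once and each branch gets
   its own copy of the loop. */
static void
scale_or_negate_unswitched (int *a, int n, int flag)
{
  int i;

  if (flag)
    for (i = 0; i < n; i++)
      a[i] *= 2;
  else
    for (i = 0; i < n; i++)
      a[i] = -a[i];
}
```

The transformed version trades code size for fewer branches per iteration, which fits its placement at -O3 in the listing above.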
flag_web, if nonzero, means perform the web construction pass.
align_loops (-falign-loops[=number]), aligns the top of loops to a boundary that is a power of 2 equal to or greater than number, but only if it is not necessary to skip more than number bytes to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. This option could make code larger because of the insertion of dummy instructions to bring about alignment, but, depending on the machine, the loop could execute faster because of branching to an aligned location from the bottom of each iteration. If number is not specified, the machine default is used, which is normally 1. Specifying number as 1 is equivalent to -fno-align-loops, and no alignment takes place.
align_jumps (-falign-jumps[=number]), aligns branch targets that cannot be reached any other way to a boundary that is a power of 2 equal to or greater than number, but only if it is necessary to skip no more than number bytes to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. Unlike the similar option -falign-labels, this option does not require the insertion of dummy instructions before the branch target. If number is not specified, the machine default is used, which is normally 1. Specifying number as 1 is equivalent to -fno-align-jumps, and no alignment takes place.
align_labels (-falign-labels[=number]), aligns the targets of all branches to a boundary that is a power of 2 equal to or greater than number, but only if it is necessary to skip no more than number bytes to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. This option can make code slower and larger because of the insertion of dummy instructions before the branch target. For a similar, but cheaper, version of this option see -falign-jumps. If -falign-loops or -falign-jumps is used with a greater value than number, the greater value is used here. If number is not specified, the machine default is used, which is normally 1. Specifying number as 1 is equivalent to -fno-align-labels, and no alignment takes place.
align_functions (-falign-functions[=number]), aligns the starting address of functions on a boundary that is a power of 2 equal to or greater than number, but only if it is necessary to skip no more than number bytes to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. Setting number to a power of 2 causes all functions to be aligned to the boundary. If number is not specified, the machine default is used. For some machines the number is rounded up to a power of 2, thus aligning all functions. Specifying number as 1 is equivalent to -fno-align-functions, and no alignment takes place.
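
The rule shared by the four alignment options can be written down as a small calculation. The sketch below follows the description given above literally (boundary is the next power of 2, and padding is inserted only if it does not exceed number); it is not derived from the assembler output code in GCC, next_pow2 and place_aligned are hypothetical names, and number is assumed to be at least 1 and small enough that the shift cannot overflow.

```c
/* Smallest power of two that is greater than or equal to N (N >= 1). */
static unsigned long
next_pow2 (unsigned long n)
{
  unsigned long p = 1;

  while (p < n)
    p <<= 1;
  return p;
}

/* Return the address at which an item currently at ADDR would be
   placed: the next boundary if it can be reached by skipping at most
   NUMBER bytes, otherwise ADDR itself. */
static unsigned long
place_aligned (unsigned long addr, unsigned long number)
{
  unsigned long boundary = next_pow2 (number);
  unsigned long padding = (boundary - addr % boundary) % boundary;

  return padding <= number ? addr + padding : addr;
}
```

With number equal to 20, an item at address 50 moves to 64 because only 14 bytes are skipped, while an item at address 40 stays where it is because reaching 64 would take 24 bytes. With number equal to 1 the boundary is 1 and nothing ever moves, which matches the statement that 1 disables alignment.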

decode_options (continued)

595   /* Initialize whether `char' is signed. */
596   flag_signed_char = DEFAULT_SIGNED_CHAR;
597 #ifdef DEFAULT_SHORT_ENUMS
598   /* Initialize how much space enums occupy, by default. */
599   flag_short_enums = DEFAULT_SHORT_ENUMS;
600 #endif
601
602   /* Initialize target_flags before OPTIMIZATION_OPTIONS so the latter can
603      modify it. */
604   target_flags = 0;
605   set_target_switch ("");
606
607   /* Unwind tables are always present in an ABI-conformant IA-64
608      object file, so the default should be ON. */
609 #ifdef IA64_UNWIND_INFO
610   flag_unwind_tables = IA64_UNWIND_INFO;
611 #endif
612
613 #ifdef OPTIMIZATION_OPTIONS
614   /* Allow default optimizations to be specified on a per-machine basis. */
615   OPTIMIZATION_OPTIONS (optimize, optimize_size);
616 #endif

At line 596, DEFAULT_SIGNED_CHAR is defined as 1 if `char' should be signed by default, and as 0 otherwise. DEFAULT_SHORT_ENUMS is defined only for the DSP1600 chip.
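
From the user's side, the effect of this default can be observed in a compiled program through the standard header limits.h. The helper below is a hypothetical illustration, not part of GCC: it reports whether plain char is signed in the program being compiled, which reflects the target default unless it was overridden with -fsigned-char or -funsigned-char.

```c
#include <limits.h>

/* Return 1 if plain char is a signed type in this compilation,
   0 otherwise. CHAR_MIN is 0 exactly when char is unsigned. */
static int
char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```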