
AT&T Assembly Language Note

2015-06-28 14:54

1. Some Tools

Unlike high-level languages, for which you can purchase a complete development environment, assembly language often requires you to piece together your own development environment. At a minimum you should have the following:

An assembler
A linker
A debugger

Additionally, to create assembly routines for use in high-level language programs, you should also have these tools:

A compiler for the high-level language
An object code disassembler
A profiling tool for optimization

1.1 The GNU Assembler

The assembler produces the instruction codes from source code created by the programmer. There are three components to an assembly language source code program:

Opcode mnemonics
Data sections
Directives

The opcode mnemonics are closely related to processor instruction codes.
Directives help the programmer define specific functions, such as declaring data types, and define memory regions within the program. The directives instruct the assembler how to construct the instruction code program.

An example of converting the assembly language program test.s to the object file test.o would be as follows:
as -o test.o test.s
This creates an object file test.o containing the instruction codes for the assembly language program.

1.2 The GNU Linker

The GNU linker, ld, is used to link object files into either executable program files or library files.
For the simplest case, to create an executable file from an object file generated from the assembler, you would use the following command:
ld -o test test.o
This command creates the executable file test from the object code file test.o.

1.3 The GNU Compiler

Most professional programmers attempt to write as much of the application as possible in a high-level language, such as C or C++, and concentrate on optimizing the trouble spots using assembly language programming. To do this, you must have the proper compiler for your high-level language.
The GNU Compiler Collection (gcc) is the most popular development system for UNIX systems.
Some command-line parameters worth noting:

-c Compile or assemble the code, but do not link
-S Stop after compiling, but do not assemble
-E Stop after preprocessing, but do not compile
-o Specify the output filename to use
-g Produce debugging information
-pg Produce extra code used by gprof for profiling
-I Specify directories for include files
-L Specify directories for library files
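A small C file is enough to try these options. In the sketch below, the file name scale.c, the SCALE macro, and the scale() function are hypothetical. With it, gcc -E scale.c prints the preprocessed source (with SCALE already replaced by 3), gcc -S scale.c writes the generated AT&T-syntax assembly to scale.s, and gcc -c scale.c produces the object file scale.o without linking.

```c
/* scale.c: a hypothetical file for trying gcc's -E, -S and -c options. */
#include <assert.h>

/* After "gcc -E scale.c", every use of SCALE appears as the literal 3. */
#define SCALE 3

/* Small enough that the assembly in scale.s (from "gcc -S scale.c") is easy to read. */
int scale(int x)
{
    return x * SCALE;
}
```

Comparing scale.s with the output of objdump -d scale.o is a quick way to see how the assembler turns mnemonics into raw instruction codes.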

1.4 The GNU Debugger Program

The GNU debugger command-line program is called gdb.
To use the debugger, the executable file must have been compiled or assembled with the -gstabs option, which adds the information the debugger needs to relate each instruction code to its location in the source code file. (Newer GCC releases have dropped support for the STABS format; with them, use -g instead.)
gcc -gstabs -o test test.c
gdb test

1.5 The GNU Objdump Program

Often it is necessary to view the instruction codes generated by the compiler in the object code files. The objdump program displays not only the assembly language code, but the raw instruction codes generated as well.
For the assembly language programmer, the -d parameter is the most interesting, as it displays the disassembled object code file. For example:
objdump -d test.o



Some notes on how C-style functions use the stack:

The C solution for passing input values to functions is to use the stack.
The stack consists of memory locations reserved at the end of the memory area allocated to the program. The ESP register is used to point to the top of the stack in memory. Data can be placed only on the top of the stack, and it can be removed only from the top of the stack.
#1 C style requires placing parameters on the stack in reverse order from the prototype for the function.
#2 When the CALL instruction is executed, it also places the return address of the calling program onto the top of the stack.
#3 The stack pointer (ESP) now points to the top of the stack, where the return address is located. Because ESP changes every time data is pushed onto or popped off the stack, offsets measured from it would keep shifting. To keep the indirect addressing offsets for the parameters fixed, it is common practice to copy the ESP register value to the EBP register when entering the function.
Because the calling program may also be using EBP, its original value is pushed onto the stack before ESP is copied, so that it can be restored before the function returns.
EBP now points to this saved copy of the old EBP value, the return address is at 4(%ebp), and the first input parameter from the main program is located at the indirect addressing location 8(%ebp).
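The frame layout in #1 to #3 can be checked with a small simulation. This is a sketch under stated assumptions, not machine code: it models a 32-bit stack as an array of 4-byte slots with hypothetical esp and ebp fields, pushes two arguments in reverse order, pushes a made-up return address as CALL would, performs the pushl %ebp / movl %esp, %ebp prologue, and then reads the parameters back from 8(%ebp) and 12(%ebp).

```c
#include <assert.h>
#include <stdint.h>

/* A toy model of a 32-bit x86 stack: 4-byte slots, growing toward lower addresses. */
enum { STACK_BYTES = 64 };

struct machine {
    uint32_t mem[STACK_BYTES / 4]; /* mem[addr / 4] holds the 32-bit value at byte address addr */
    uint32_t esp;                  /* byte address of the top of the stack */
    uint32_t ebp;
};

static void push(struct machine *m, uint32_t value)            /* models pushl value */
{
    assert(m->esp >= 4);           /* guard against overflowing the toy stack */
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

static uint32_t load(const struct machine *m, uint32_t addr)   /* models movl addr, reg */
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return m->mem[addr / 4];
}

/* Simulates calling f(a, b) C style and returns the values found at 8(%ebp) and 12(%ebp). */
void simulate_cdecl_frame(uint32_t a, uint32_t b, uint32_t *param1, uint32_t *param2)
{
    /* Empty stack; 0xDEADBEEF is a hypothetical EBP value belonging to the caller. */
    struct machine m = { .esp = STACK_BYTES, .ebp = 0xDEADBEEF };

    push(&m, b);                   /* #1: parameters are pushed in reverse order... */
    push(&m, a);                   /*     ...so the first parameter ends up on top */
    push(&m, 0x08048000);          /* #2: CALL pushes the return address (made-up value) */

    push(&m, m.ebp);               /* #3: pushl %ebp     saves the caller's EBP */
    m.ebp = m.esp;                 /*     movl %esp, %ebp */

    assert(load(&m, m.ebp) == 0xDEADBEEF);     /* 0(%ebp): the saved EBP */
    assert(load(&m, m.ebp + 4) == 0x08048000); /* 4(%ebp): the return address */
    *param1 = load(&m, m.ebp + 8);             /* 8(%ebp): first parameter */
    *param2 = load(&m, m.ebp + 12);            /* 12(%ebp): second parameter */
}
```

Calling simulate_cdecl_frame(111, 222, &p1, &p2) yields p1 == 111 and p2 == 222: the first parameter is 8 bytes above EBP because the saved EBP and the return address occupy the two slots below it.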

The technique of using the stack to reference input data for the function has produced a standard set of instructions found in every function written in the C function style. The following instructions are used at the start and end of the function code:
function:
pushl %ebp
movl %esp, %ebp
# function code goes here
movl %ebp, %esp
popl %ebp
ret



1.6 The GNU Profiler Program

The GNU profiler (gprof) is another program included in the binutils package. It is used to analyze program execution and determine where the "hot spots" are in the application.
The application hot spots are the functions that require the most processing time as the program runs. Often they are the most mathematically intensive functions, but that is not always the case: functions that are I/O intensive can also increase processing time.

Summary
The assembler is used to convert the assembly language code into instruction code for the specific processor that runs the application. The linker then turns that instruction code into an executable program, combining it with any necessary libraries and resolving any memory references.
The debugger enables you to step through your program, watching how each instruction modifies registers and memory locations. The disassembler enables you to view the instruction codes in an object file generated from either an assembly language program or a high-level language program.
If you plan on using high-level languages with your assembly code, you will also need a compiler to build the executable code from the high-level language source code.
A final useful tool is the profiler, which analyzes the performance of an application. By examining which functions consume the most processing time, you can determine which ones are worth trying to optimize to increase the performance of the application.

2. Some Knowledge

2.1 Defining the starting point

When the assembly language program is converted to an executable file, the linker must know where the starting point of your instruction code is. To solve this problem, the GNU linker looks for a default label, or identifier, to use as the entry point of the application: the _start label indicates the instruction from which the program should start running. The label must be declared global (with the .globl directive) so that the linker can see it.

2.2 Assembling using a compiler

With the assembly language source code program saved as cpuid.s, you can build the executable program using the GNU assembler and GNU linker as follows:
as -o cpuid.o cpuid.s
ld -o cpuid cpuid.o
Because gcc (the GNU Compiler Collection) uses the GNU assembler to assemble the code it generates from C source, you can also use it to assemble and link your assembly language program in a single step.
There is one problem when using gcc to assemble your programs. While the GNU linker looks for the _start label to determine the beginning of the program, gcc looks for the main label (which you might recognize from C or C++ programming), because the C startup code it links in provides _start itself and then calls main. You must change both the _start label and the .global directive that declares it to main.

2.3 Debugging the Program

In order to debug the assembly language program, you must first reassemble the source code using the -gstabs parameter:
as -gstabs -o cpuid.o cpuid.s
ld -o cpuid cpuid.o
Because the -gstabs parameter adds extra information to the executable program file, the resulting file is larger than it needs to be just to run the application.

2.3 Linking with C library functions

When you use C library functions in your assembly language program, you must link the C library files with the program object code.
In order to link the C function libraries, they must be available on your system. On Linux systems, there are two ways to link C functions to your assembly language program.
The first method is called static linking. Static linking links the function object code directly into your application's executable program file. This creates huge executable programs and wastes memory if multiple instances of the program are run at the same time (each instance has its own copy of the same functions).
The second method is called dynamic linking. Dynamic linking uses libraries that enable programmers to reference the functions in their applications without linking the function code into the executable program file. Instead, the operating system loads the dynamic libraries when the program runs, and they can be shared by multiple programs.
On Linux systems, the standard C dynamic library is located in the file libc.so.x, where x is a value representing the version of the library.
The gcc compiler links this file into C programs automatically, but when you link an assembly language program with ld, you must link it to your program object code manually for the C functions to work. To link the libc.so file, you use the -l parameter of the GNU linker.
When using the -l parameter, you do not need to specify the complete library name. The linker assumes that the library will be in a file named:
/lib/libx.so
where x is the library name specified on the command-line parameter, in this case the letter c. Thus the command to assemble and link the program would be as follows:
as -o cpuid.o cpuid.s
ld -dynamic-linker /lib/ld-linux.so.2 -o cpuid -lc cpuid.o
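The -dynamic-linker parameter tells ld which dynamic loader will load the C library when the program runs; /lib/ld-linux.so.2 is the loader for 32-bit x86 Linux programs.
It is also possible to use the gcc compiler to assemble and link the assembly language program and C library functions. In fact, in this case it's a lot easier: gcc automatically links in the necessary C libraries without you having to do anything special.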

2.4 Defining static symbols

Although the data section is intended primarily for defining variable data, you can also declare static data symbols there. The .equ directive assigns a constant value to a symbol that can then be used in the text section, as shown in the following examples:
.equ factor, 3
.equ LINUX_SYS_CALL, 0x80
Once set, the data symbol's value cannot be changed while the program runs. The .equ directive can appear anywhere in the data section.
To reference the static data symbol, you must place a dollar sign before the label name. For example, the instruction
movl $LINUX_SYS_CALL, %eax
moves the value assigned to LINUX_SYS_CALL (0x80) into the EAX register.
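One way to try .equ without writing a complete program is GCC-style inline assembly, which uses AT&T syntax by default. This sketch assumes GCC or Clang targeting x86 or x86-64, and the function names are hypothetical. A .equ symbol is an assembler constant that occupies no memory, so the directive works here even though the inline assembly lands in the text section rather than the data section; the dollar sign turns each symbol into an immediate operand.

```c
#include <assert.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch assumes an x86 or x86-64 target"
#endif

/* Defines LINUX_SYS_CALL with .equ, then loads its value as an immediate. */
int load_syscall_vector(void)
{
    int value;
    __asm__(".equ LINUX_SYS_CALL, 0x80\n\t" /* the definition from the text above */
            "movl $LINUX_SYS_CALL, %0"     /* $ = use the symbol's value, not memory */
            : "=r"(value));
    return value;
}

/* Multiplies x by the .equ constant factor using imull with an immediate operand. */
int multiply_by_factor(int x)
{
    __asm__(".equ factor, 3\n\t"
            "imull $factor, %0"
            : "+r"(x)
            :
            : "cc");                        /* imull modifies the flags */
    return x;
}
```

Because .equ is a synonym for .set in the GNU assembler, repeating the definition (for example, when the compiler inlines one of these functions) is accepted as long as the value stays constant.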

2.5 The bss section

Defining data elements in the bss section is somewhat different from defining them in the data section. Instead of declaring specific data types, you just declare raw segments of memory that are reserved for whatever purpose you need them for.
The format for the .comm directive is
.comm symbol, length
where symbol is a label assigned to the memory area, and length is the number of bytes contained in the memory area.
One benefit of declaring data in the bss section is that the data is not included in the executable program. When data is defined in the data section, it must be included in the executable program, since it must be initialized with a specific value. Because the data areas declared in the bss section are not initialized with program data, the memory areas are reserved at runtime and do not have to be included in the final program.
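The .comm directive can be tried from C through a top-level asm statement. This is a sketch assuming a GCC-compatible compiler and the GNU assembler (the directive itself is not x86-specific). The symbol demo_buffer is a hypothetical name, and the __asm__("demo_buffer") label makes the C declaration refer to exactly that assembler symbol, even on platforms that add an underscore to C names. The 64 bytes take up no space in the executable file; they are reserved when the program is loaded and start out zeroed.

```c
#include <assert.h>
#include <stddef.h>

/* Reserve 64 bytes with the directive from the text: .comm symbol, length */
__asm__(".comm demo_buffer, 64");

/* Tell the C compiler about the assembler symbol so C code can use it. */
extern unsigned char demo_buffer[64] __asm__("demo_buffer");

/* Returns 1 if every byte of the reserved area is still zero, 0 otherwise. */
int demo_buffer_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof demo_buffer; i++)
        if (demo_buffer[i] != 0)
            return 0;
    return 1;
}
```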

2.6 Moving data elements

Because data elements are located in memory and many processor instructions operate on registers, the first step in handling data elements is being able to move them between memory and registers.
The GNU assembler, which uses AT&T syntax, adds another dimension to the MOV instruction: the size of the data element moved must be declared. The size is declared by adding an additional character to the MOV mnemonic. Thus the instruction becomes
movx
where x can be the following:

l for a 32-bit long word value
w for a 16-bit word value
b for an 8-bit byte value

There are very specific rules for using the MOV instruction: only certain combinations of source and destination are allowed. For example, a single MOV instruction cannot move data directly from one memory location to another.
When the source is an immediate data value, it must be preceded by a dollar sign, as in movl $0, %eax.
Moving data from one processor register to another is the quickest way to move data with the processor. It is often advisable to keep data in processor registers as much as possible to decrease the amount of time spent accessing memory locations.
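The size suffix is easiest to see when the destination is memory: movb, movw, and movl write 1, 2, and 4 bytes. The sketch below assumes GCC-style inline assembly on x86 or x86-64, and its function names are hypothetical. It stores into a 32-bit variable with each suffix; because x86 is little-endian, the shorter stores replace only the low-order bytes. The last function shows an immediate value with its dollar sign.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch assumes an x86 or x86-64 target"
#endif

/* movb: writes only the lowest byte of *dst ("q" asks for a register with an 8-bit name). */
void store_byte(uint32_t *dst, uint8_t v)
{
    __asm__("movb %1, %0" : "+m"(*dst) : "q"(v));
}

/* movw: writes only the two lowest bytes of *dst. */
void store_word(uint32_t *dst, uint16_t v)
{
    __asm__("movw %1, %0" : "+m"(*dst) : "r"(v));
}

/* movl: writes all four bytes of *dst. */
void store_long(uint32_t *dst, uint32_t v)
{
    __asm__("movl %1, %0" : "=m"(*dst) : "r"(v));
}

/* An immediate value needs the dollar sign: $0x80 is the number itself, not an address. */
uint32_t load_immediate(void)
{
    uint32_t r;
    __asm__("movl $0x80, %0" : "=r"(r));
    return r;
}
```

Starting from 0x11223344, store_byte with 0xBB gives 0x112233BB, while store_word with 0xAAAA gives 0x1122AAAA.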

2.7 Indexed memory locations

When referencing data in an array, you must use an index system to determine which value you are accessing.
The way this is done is called indexed memory mode. The memory location is determined by the following:

A base address
An offset address to add to the base address
The size of the data element
An index to determine which data element to select

The format of the expression is

base_address(offset_address, index, size)
The data value retrieved is located at
base_address + offset_address + index * size
If any of the values are zero, they can be omitted (but the commas are still required as placeholders). The offset_address and index values must be registers, while the size value must be a numeric constant: 1, 2, 4, or 8.
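The GNU assembler documentation calls the same parts displacement(base, index, scale). In this sketch (GCC-style inline assembly on x86 or x86-64 assumed; function names hypothetical), the array pointer serves as offset_address, the index register as index, and 4 as size for 32-bit elements. The second function adds a constant base_address of 8, so it reads the element two positions further along. The "memory" clobber tells the compiler that the assembly reads memory not listed among its operands.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch assumes an x86 or x86-64 target"
#endif

/* (offset_address, index, size): address = array + index * 4, i.e. array[index] */
int32_t load_element(const int32_t *array, intptr_t index)
{
    int32_t value;
    __asm__("movl (%1,%2,4), %0"
            : "=r"(value)
            : "r"(array), "r"(index)
            : "memory");
    return value;
}

/* base_address = 8: address = 8 + array + index * 4, i.e. array[index + 2] */
int32_t load_element_plus_two(const int32_t *array, intptr_t index)
{
    int32_t value;
    __asm__("movl 8(%1,%2,4), %0"
            : "=r"(value)
            : "r"(array), "r"(index)
            : "memory");
    return value;
}
```

For int32_t values[] = {10, 20, 30, 40, 50}, load_element(values, 3) and load_element_plus_two(values, 1) both return 40.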

2.8 Cleaning out the stack

There is just one more detail to consider when using C style function calling. Before the function is called, the calling program places all of the required input values onto the stack. When the function returns, those values are still on the stack, since the function accessed them without popping them off. If the main program uses the stack for other things, it will most likely want to remove the old input values to get the stack back to where it was before the function call.
While you can use the POP instruction to do this, you can also just move the ESP stack pointer back to its original location from before the function call. You do this by using the ADD instruction to add the total size of the pushed data elements back to ESP; for example, after pushing two 4-byte parameters, addl $8, %esp restores the stack pointer.
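The two cleanup options can be compared with a toy model. This is an illustrative sketch, not real stack manipulation: a hypothetical esp field stands in for the stack pointer over a 32-bit stack of 4-byte slots. The caller pushes its arguments, and after the call it either pops each one or adds their total size back to ESP in one step; both end with ESP where it started.

```c
#include <assert.h>
#include <stdint.h>

/* A toy 32-bit stack model: 4-byte slots, growing toward lower byte offsets. */
enum { SLOTS = 16 };

struct toy_stack {
    uint32_t slot[SLOTS];
    uint32_t esp;                /* byte offset of the top; SLOTS * 4 means "empty" */
};

static void push32(struct toy_stack *s, uint32_t v)    /* models pushl */
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->slot[s->esp / 4] = v;
}

static uint32_t pop32(struct toy_stack *s)             /* models popl */
{
    assert(s->esp < SLOTS * 4);
    uint32_t v = s->slot[s->esp / 4];
    s->esp += 4;
    return v;
}

/* Pushes nargs 4-byte arguments C style, then cleans up with one ADD to ESP
   (use_add != 0) or with one popl per argument. Returns ESP after cleanup. */
uint32_t call_and_clean_up(const uint32_t *args, int nargs, int use_add)
{
    struct toy_stack s = { .esp = SLOTS * 4 };

    for (int i = nargs - 1; i >= 0; i--)  /* the last argument is pushed first */
        push32(&s, args[i]);

    /* The CALL would happen here; CALL and RET together leave ESP unchanged. */

    if (use_add)
        s.esp += 4u * (uint32_t)nargs;     /* like addl $8, %esp for two arguments */
    else
        for (int i = 0; i < nargs; i++)
            (void)pop32(&s);               /* like popl into a scratch register */

    return s.esp;
}
```

The ADD form needs one instruction no matter how many arguments were pushed and does not overwrite any register, whereas each POP needs a destination register.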

2.9 Using indirect addressing with registers

Besides holding data, registers can also be used to hold memory addresses. When a register holds a memory address, it is referred to as a pointer. Accessing the data stored in the memory location using the pointer is called indirect addressing.
While using a label references the data value contained in the memory location, you can get the memory location address of the data value by placing a dollar sign($) in front of the label in the instruction.
movl $output, %edi
This instruction moves the memory address of the output label to the EDI register. The dollar sign ($) before the label name instructs the assembler to use the memory address, and not the data value located at the address.
movl %ebx, (%edi)
Without the parentheses around the EDI register, the instruction would just load the value in the EBX register into the EDI register. With the parentheses around the EDI register, the instruction instead moves the value in the EBX register to the memory location contained in the EDI register.
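In GCC-style inline assembly (x86 or x86-64 assumed; function names hypothetical), a C pointer placed in a register plays the part of EDI in movl %ebx, (%edi). The sketch lets the compiler choose the registers rather than naming EDI and EBX, and it takes the address from a C pointer rather than $output, because an absolute-address immediate such as $output generally cannot be linked into a 64-bit position-independent executable.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch assumes an x86 or x86-64 target"
#endif

/* With parentheses: store value at the memory address held in the pointer register. */
void store_through_pointer(uint32_t *ptr, uint32_t value)
{
    __asm__("movl %1, (%0)"
            :                        /* no register outputs */
            : "r"(ptr), "r"(value)
            : "memory");             /* the memory at *ptr is modified */
}

/* Without parentheses: the same mnemonic just copies one register into another. */
uint32_t copy_register(uint32_t value)
{
    uint32_t out;
    __asm__("movl %1, %0" : "=r"(out) : "r"(value));
    return out;
}
```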

2.10 FPU (Floating-point Unit)

The FPU is a self-contained unit that handles floating-point operations using a set of registers that are separate from the standard processor registers. The additional FPU registers include eight 80-bit data registers and three 16-bit registers called the control, status, and tag registers.
Because the FPU is independent of the main processor, it does not normally use the EFLAGS register to indicate results and determine behavior.
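The FPU data registers are used as a stack: a load pushes a value into ST0, and a store-and-pop removes it again. This sketch (GCC-style inline assembly on x86 or x86-64 assumed; fpu_add is a hypothetical name) adds two doubles entirely on the FPU and leaves its register stack as it found it. Note that for FPU instructions the l suffix means a 64-bit double-precision operand, not a 32-bit long word as it does for movl.

```c
#include <assert.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch assumes an x86 or x86-64 target"
#endif

/* Adds a and b with x87 instructions, keeping the FPU register stack balanced. */
double fpu_add(double a, double b)
{
    double result;
    __asm__("fldl %1\n\t"    /* push a onto the FPU register stack (it becomes ST0) */
            "faddl %2\n\t"   /* ST0 = ST0 + b */
            "fstpl %0"       /* store ST0 into result and pop the FPU stack */
            : "=m"(result)
            : "m"(a), "m"(b));
    return result;
}
```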