
Sun Compiler Options: A Linking Problem

2012-06-30 22:40


Reducing Symbol Scope with Sun Studio C/C++


By Giri Mandalika, May, 2005 (revised March 22, 2006)

Hiding the non-interface symbols of a library within the library makes the library more robust and less vulnerable to symbol collisions from outside it. This symbol scope reduction also improves the performance of an application by reducing the runtime relocation costs of the dynamic libraries. To indicate the appropriate linker scoping in a source program, you can now use the language extensions built into the Sun Studio C/C++ compilers, as described here.
Contents:
Linker Scoping

Linker Scoping with Sun Studio Compilers

Examples

Windows Compatibility With __declspec

Automatic Data Imports

Benefits of Linker Scoping

Appendix

Resources


Introduction

Until the release of the Sun Studio 8 compilers, linker mapfiles were the only way to change the default symbol processing by the linker. With the help of mapfiles, all non-interface[1] symbols of an object can be hidden within a load module[2], thereby making the object more robust and less vulnerable to symbol collisions. This symbol scope reduction helps improve the performance of an application by reducing the runtime relocation costs of the dynamic objects. The other reason for symbol scoping is to ensure that clients use only the intended interface to the library, and not the functions that are internal to the library.
The mapfile mechanism is useful with languages such as C, but difficult to exploit with languages such as C++. There are two major hurdles:

The link-editor[3] only processes symbols in their mangled form. For example, even a simple interface such as void printstring(char *str) has a C++ symbolic representation something like __1cLprintstring6Fpc_v_. As no tool other than the compilers themselves can determine a symbol's mangled name, establishing definitions of this sort within a mapfile is not a simple task.
Also, a change to a function's signature, or to a typedef that a function signature uses, can invalidate the mapfile that was produced, because the mangled names of the symbols may change. For versioned libraries this invalidation is a good thing, because the function signature has changed; in contrast, the fact that mapfiles survive changes in parameter types in C is a problem.

Compilers can generate some implicit symbol definitions. These implementation interfaces must often remain global within a group of similar dynamic objects, as one interface must interpose on all the others for
the correct execution of the application. As users generally are not aware of what implementation symbols the compiler creates, they can blindly demote these symbols to local when applying any interface definitions with a mapfile.

We can avoid the specification problems in mapfiles by specifying the linker symbol scope within the source program. Sun Studio 8 introduced new syntax for specifying this scope, and a new option for controlling
default scope behavior. With these new features, programmers do not need mapfiles for linker scoping.
There are reasons programmers might still need mapfiles, the primary one being library versioning; see the Linker and Libraries Guide for the full list of mapfile capabilities. Compiler-assisted linker scoping also helps with the construction of mapfiles, because the globally visible symbols in a candidate library are already exactly the intended set of globally visible symbols, and the only remaining task is to assign symbols to versions.
This article introduces linker scoping with simple examples, and outlines the benefits of this feature for developing end-user applications. All the content of this article is equally applicable to C and C++, unless
otherwise specified. Note that the terms shared library, dynamic library, load module and dynamic module are used interchangeably throughout the article.


Linker Scoping

Sun added linker scoping as a language extension with the release of the Sun Studio 8 C/C++ compilers. Using this feature, the programmer can indicate the appropriate symbol
scoping within the source program. The following paragraphs briefly explain the need for such a feature.
Default Behavior of the Solaris Linker
With Solaris (and UNIX in general), external names (symbols) have global scope by default in a dynamic object. This is because, in the absence of any linker scoping mechanism, the static linker makes all symbols global in scope. That is, it puts all the symbols into the dynamic symbol table of the resulting binary object, so other binary modules can access those symbols. Such symbols are called external or exported symbols.
At program startup, the dynamic linker[4] (also referred to as the runtime linker) loads all dynamic libraries specified at link time before starting execution of the application. Because shared libraries are not available to the executable until runtime, shared library calls get special treatment in executable objects. To bind them, the dynamic linker maintains a linked list of the link maps in the address space of the executing process, one for each dynamically linked object. The symbol search mechanism traverses this list to bind the objects of an application. The Procedure Linkage Table (PLT) facilitates this binding.
Relocation Processing
The PLT can be used to redirect function calls between the executable and a shared object, or between different shared objects, and is purely an optimization strategy designed to permit lazy symbol resolution at run time.
Once all the dependencies of the executable have been loaded, the runtime linker updates the memory image of the executable and its dependencies to reflect the real addresses for data and function references. This is also known as relocation processing.
The dynamic relocations that the dynamic linker performs are only necessary for the global (sometimes referred to as external or exported) symbols. The static linker resolves references to local
symbols (for example, names of static functions) statically when it links the binary. So, when an application is made out of dynamic libraries with the default global scoping, it will pay some penalty during application startup time and the performance may
suffer during runtime due to the overhead of PLT processing.
A considerable amount of startup time is spent performing symbolic relocations[5].
Generally a lot more time is spent relocating symbols from dependency objects than relocating symbols from the executable itself. To gain noticeable reduction in startup time, we have to somehow decrease the amount of relocation processing.
As stated earlier, the dynamic linker maintains a linked list of the link maps in the memory of the executing process, one for each dynamically linked object. So the symbol search mechanism requires the runtime linker to traverse the whole link-map list, looking in each object's symbol table to find the required symbol definition. This is known as a symbolic relocation. Because there can be many link maps containing many symbols, symbolic relocations are time-consuming and expensive. At startup, symbol values need to be looked up only for symbolic relocations that reference data; symbolic entries from the .plt section are not relocated at startup because they are relocated on demand. Non-symbolic relocations, by contrast, do not require a lookup, so they are cheap and do not affect application startup time. Because relocation processing can be the most expensive operation during application startup, it is desirable to have fewer symbols that require relocation. See the Appendix for instructions on estimating the number of relocations in a library.
It can be summarized as follows:

Each global symbol has a run-time overhead for binding the symbol. This overhead may occur for all symbols at program startup, or it may occur only for referenced symbols upon first reference. In addition, each use of a symbol will have a run-time overhead
for the indirection of the binding tables.
A symbol that needs binding is visible in the library as a relocation. Reducing the number of relocations will reduce both forms of overhead, and yield faster libraries.
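The cost of a symbolic relocation can be illustrated with a toy model of the link-map search described above. Everything in this sketch (struct toy_link_map, toy_lookup, the probe counter) is hypothetical: the real runtime linker uses a hash table in each object rather than a linear scan, but it still has to visit the objects in load order until one of them defines the symbol. Because the first definition found wins, the model also shows why a global symbol can be interposed by an object that is loaded earlier.

```c
#include <stddef.h>
#include <string.h>

/* Toy link map: one node per loaded object, in load order.
   Hypothetical structure for illustration only. */
struct toy_link_map {
    const char *object;              /* e.g. "a.out", "libfoo.so" */
    const char *const *symbols;      /* names of its global symbols */
    size_t nsymbols;
    const struct toy_link_map *next; /* next object in load order */
};

/* Model of a symbolic relocation: walk the whole list and return the first
   object that defines `name` (global scope binds to the first definition).
   *probes counts name comparisons as a rough proxy for lookup cost. */
const struct toy_link_map *
toy_lookup(const struct toy_link_map *head, const char *name, size_t *probes)
{
    for (const struct toy_link_map *lm = head; lm != NULL; lm = lm->next) {
        for (size_t i = 0; i < lm->nsymbols; i++) {
            ++*probes;
            if (strcmp(lm->symbols[i], name) == 0)
                return lm;
        }
    }
    return NULL; /* unresolved */
}
```

With two objects that both define malloc, a lookup of malloc stops at the first object, while a lookup of a symbol defined only in the last object (or not at all) pays for a visit to every object on the list.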
Reducing the number of relocations
One way of reducing the relocations is to have fewer symbols visible outside the application or library. This can be done by declaring locally used functions and global data private to the application or library. Declaring a function with the static keyword in a C/C++ program makes the function local to its source file (translation unit), and the symbol will not appear in the dynamic symbol table (.dynsym). The elfdump[6] or nm[7] utilities can be used to examine the symbol table of an object file.
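A minimal sketch of the static technique. The names compute_tax and round_cents are hypothetical and not part of the article's later examples; only compute_tax is meant to be part of the library interface.

```c
/* Internal helper: static gives it internal linkage, so the symbol stays
   local to this translation unit and is never exported through .dynsym. */
static float round_cents(float value)
{
    /* Round to the nearest cent (assumes non-negative amounts). */
    return (float)((long)(value * 100.0f + 0.5f)) / 100.0f;
}

/* Interface function: external linkage, exported by default. */
float compute_tax(float amount, int rate_percent)
{
    return round_cents(amount * rate_percent / 100.0f);
}
```

In a shared library built from this file, compute_tax shows up as a GLOB entry in .dynsym, whereas round_cents, if it survives inlining at all, appears only as a LOCL entry in .symtab and needs no dynamic relocation.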
Another way is to use a mapfile to control the scope of functions and symbols. But because of the overhead, explained earlier, of keeping mapfiles in step with changes in the source code and in compiler versions, it is not a preferred scheme for C++ applications.
Yet another way is to indicate the appropriate linker scoping within the source program with the help of language extensions in the Sun Studio C/C++ compilers. The following paragraphs explain the linker scoping
in detail with some examples.


Linker Scoping with Sun Studio Compilers

With the release of the Sun Studio 8 compilers, C and C++ are now capable of describing symbol visibility. Although the symbol visibility is specified in the source file, it actually defines how a symbol can be accessed once it has become part of an executable or shared object. The default visibility of a symbol is specified by the symbol's binding type.

By using a combination of linker scope specifier directives and command-line options, the programmer can define the runtime interface of a C/C++ object. These definitions are then encoded in the symbol table, and used by the link-editor in a similar manner to reading definitions from a mapfile. With this interface definition technique, the compilation method can greatly reduce the number of symbols that would normally be employed in runtime relocations. In addition, as the compiler knows which implementation symbols must remain global within the final object, these symbols are given the appropriate visibility attribute to ensure their correct usage.
Language/Compiler Extensions
There is a new compiler flag: -xldscope={global|symbolic|hidden}

-xldscope accepts one of the values global, symbolic, or hidden. This command-line option sets the default linker scope for user-defined external symbols. The compiler issues an error if you specify -xldscope without an argument. Multiple instances of this option on the command line override each other until the rightmost instance is reached. Symbols with explicit linker scope qualifiers, declarations of external symbols, static symbols, and local symbols are not affected by the -xldscope option.

There is a new C/C++ source language interface: the __global, __symbolic, and __hidden declaration specifiers were introduced to specify symbol visibility at declarations and definitions of external symbols and class types. These specifiers are applicable to external functions, variables, and classes, and they take precedence over the command-line (-xldscope) option.

With no specifier, the symbol's linker scoping remains unchanged from any prior declarations. If the symbol has no prior declaration, the symbol has the default linker scoping.

Global Scoping
The __global specifier can be used to make the symbol definition global in linker scope. This is the default scoping for extern symbols.
With global scope, all references to the symbol bind to the definition in the first dynamic load module (shared library) that defines the symbol. To make all symbols global in scope, the programmer need not use any special flags, as it is the default. Note that -xldscope=global is the default assumed by the compiler, so specifying -xldscope=global explicitly on the command line has no additional effect beyond overriding a previous -xldscope on the same command line.
Symbolic Scoping
Symbolic scoping (also known as protected) is more restrictive than global linker scoping: all references within a library that match definitions within the library bind to those definitions. Outside of the library, the symbol appears as though it were global. That is, the link-editor first tries to find the definition of the symbol within the library. If found, the symbol is bound to that definition at link time; otherwise the search continues outside the library, as is the case with global symbols. For variables, there is the extra complication of copy relocations[8].

Symbolic scoping ensures that the library uses its own versions of specific functions, no matter what might appear elsewhere in the program. There are times when symbolic scoping of a set of symbols is exactly what we want. For instance, symbolic scoping fits well in a scenario where there is an encryption function that must not be overridden by any function from any other library, irrespective of the link order of those libraries.

On the downside, we lose the flexibility of library interposition, as the resulting symbols are non-interposable. Library interposition is a useful technique for tuning performance, collecting runtime statistics, or debugging applications. For example, if libc were built with symbolic scoping, we could not take advantage of faster memory allocator libraries such as libmtmalloc.so for multithreaded applications simply by preloading libmtmalloc to interpose on malloc. To do so, the symbol malloc must be interposable, with global binding.

With the __symbolic specifier, symbol definitions have symbolic linker scope. With -xldscope=symbolic on the command line and no linker scoping specifiers in the source code, all the symbols of the library get symbolic scoping. This linker scoping corresponds to the linker option -Bsymbolic.

Be aware that with symbolic scoping, you can wind up with multiple copies of an object or function in a program when only one should be present. For example, suppose a symbol X is defined in library L with symbolic scope. If X is also defined in the main program or in another library that is linked ahead of L, library L will use its own copy of X, but everything else in the program will use a different copy of X. When using the -Bsymbolic linker option or the -xldscope=symbolic compiler option, this potential problem extends to every symbol defined in the library, not just the ones you intend to be symbolic.
Which one to choose: -Bsymbolic or the compiler-supported symbolic mechanism?

Some interfaces created by languages such as C++ provide implementation details of the language itself. These implementation interfaces often must remain global within a group of similar dynamic objects, as one interface must interpose on all the others for the correct execution of the application. As users generally are not aware of which implementation symbols are created, they can blindly demote these symbols to local with options like -Bsymbolic. For this reason, -Bsymbolic has never been supported with C++, and its use with C++ has been discouraged.
Using linker scoping specifiers is the preferred way to give a symbol symbolic scope. If source code changes are not feasible, compile the source with -xldscope=symbolic. -xldscope=symbolic is considerably safer than -Bsymbolic at link time. -Bsymbolic is a big hammer that affects every non-local symbol. With the compiler option, certain compiler-generated symbols that need to be global remain global. Also, the compiler option does not break exception handling, whereas the linker -Bsymbolic option can.

A linker mapfile is an alternative solution. See the introductory paragraphs for the problems associated with linker mapfiles.
Hidden Scoping
Symbols with the __hidden specifier have hidden linker scoping. Hidden linker scoping is the most restrictive scope of all. All references within a dynamic load module bind to a definition within that module, and the symbol is not visible outside of the module. That is, the symbol is local to the library in which it was defined, and other libraries do not know that it exists.

Using -xldscope=hidden requires using at least one __global or __symbolic declaration specifier; otherwise the result is a library that is completely unusable. The mixed use of -xldscope=hidden and __symbolic yields the same effect as __declspec(dllexport) in DLLs on Windows (explained later in the article).
Summary of linker scoping:

Declaration Specifier   -xldscope value   Reference Binding   Visibility of Definitions
__global                global            First Module        All Modules
__symbolic              symbolic          Same Module         All Modules
__hidden                hidden            Same Module         Same Module only
The linker will choose the most restrictive scoping specified for all definitions.
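The table and the "most restrictive wins" rule can be modeled in a few lines of plain C. This is a hypothetical encoding (enum ld_scope, scope_table and effective_scope are invented for this sketch), not anything the compiler or linker exposes; it assumes, per the text above, that explicit specifiers override the -xldscope default.

```c
#include <stddef.h>

/* Linker scopes ordered from least to most restrictive. */
enum ld_scope { SCOPE_GLOBAL = 0, SCOPE_SYMBOLIC = 1, SCOPE_HIDDEN = 2 };

struct scope_row {
    const char *specifier;  /* declaration specifier in the source */
    const char *xldscope;   /* equivalent -xldscope value */
    const char *binding;    /* where references to the symbol bind */
    const char *visibility; /* which modules can see the definition */
};

/* The summary table, indexed by enum ld_scope. */
const struct scope_row scope_table[] = {
    [SCOPE_GLOBAL]   = { "__global",   "global",   "First Module", "All Modules" },
    [SCOPE_SYMBOLIC] = { "__symbolic", "symbolic", "Same Module",  "All Modules" },
    [SCOPE_HIDDEN]   = { "__hidden",   "hidden",   "Same Module",  "Same Module only" },
};

/* Effective scope of one symbol. `specified` holds the specifiers seen on
   its declarations and definitions; if there are none, the -xldscope
   default applies. Otherwise the most restrictive specifier wins. */
enum ld_scope effective_scope(enum ld_scope xldscope_default,
                              const enum ld_scope *specified, size_t n)
{
    if (n == 0)
        return xldscope_default;
    enum ld_scope result = specified[0];
    for (size_t i = 1; i < n; i++)
        if (specified[i] > result)
            result = specified[i];
    return result;
}
```

For example, a symbol declared __global in one place and __hidden in another ends up hidden, while a symbol with no specifier at all simply takes whatever -xldscope selected.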
Linker scoping specifiers are applicable to struct, class, and union declarations and definitions. Consider the following example:

__hidden struct __symbolic BinaryTree node;

The declaration specifier before the struct keyword applies to the variable node. The class-key modifier after the struct keyword applies to the type BinaryTree.
Quick Note:

Make sure that the compilers have all the latest compiler patches from SunSolve applied. Many linker scoping bugs have been fixed and distributed as patches since the release of Sun Studio 8. All these patches are freely downloadable from the SunSolve web site.
Rules for using these specifiers

A symbol definition may be redeclared with a more restrictive specifier, but may not be redeclared with a less restrictive specifier. This corresponds well with the ELF[9] definition, which says that the symbol scoping chosen is the most restrictive.
A symbol may not be declared with a different specifier once the symbol has been defined. This is because C++ class members cannot be redeclared. In C++, an entity must be defined exactly once; repeated definitions of the same entity in separate translation units result in an error.
All virtual functions must be visible to all compilation units that include the class definition, because the declaration of virtual functions affects the construction and interpretation of virtual tables.
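One possible reading of the first two rules, written as a small checker. This is a hypothetical model (enum spec and redeclaration_allowed are invented names), not compiler code, and it does not cover the third rule about virtual functions.

```c
#include <stdbool.h>

/* SPEC_NONE means "no specifier on this declaration"; the others are
   ordered from least to most restrictive. */
enum spec { SPEC_NONE = -1, SPEC_GLOBAL, SPEC_SYMBOLIC, SPEC_HIDDEN };

/* current: specifier in force so far (SPEC_NONE if none was given)
   next:    specifier on the new declaration
   defined: whether the symbol has already been defined */
bool redeclaration_allowed(enum spec current, enum spec next, bool defined)
{
    if (next == SPEC_NONE)
        return true;              /* no specifier: scope stays as it was */
    if (defined)
        return next == current;   /* rule 2: no change after the definition */
    return next >= current;       /* rule 1: may tighten, never loosen */
}
```

Under this reading, declaring a function __global and later __hidden is accepted before the definition, while going from __hidden back to __symbolic, or changing the specifier after the definition, is rejected.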

Additional Notes:

Declaration specifiers apply to all declarations as well as definitions.
Function and variable declarations are unaffected by the -xldscope flag; only the definitions are affected.
With the Sun Studio 8 compilers, out-of-line inline functions were static, and thus always hidden. With Sun Studio 9, out-of-line inline functions are global by default, and are affected by linker scoping specifiers and -xldscope.

C does not have (or need) struct linker scoping.
Library functions declared with the __hidden or __symbolic specifiers can be generated inline when building the library. They are not supposed to be overridden by clients. If you intend to allow a client to override a function in a library, you must ensure that the function is not generated inline in the library.

The compiler inlines a function if you:

specify the function name with -xinline,
compile at -xO4 or higher, in which case inlining can happen automatically,
use the inline specifier, or
use #pragma inline

Library functions declared with the __global specifier should not be declared inline, and should be protected from inlining by use of the -xinline compiler option.
The -xldscope option does not apply to tentative[10] definitions; tentative definitions continue to have global scope.
If a source file with static symbols is compiled with -xldscope=symbolic, and the same object file is used in building more than one library, then dynamically loading/unloading those libraries and referencing the common symbols from them may lead to a crash at run time, due to possible symbol conflicts. This is caused by the globalization of static symbols to support "fix and continue" debugging. These global names must be interposable for "fix and continue" to work.

If the same object file, say x.o, has to be linked into more than one library, use an object file (x.o) with a different timestamp each time you build a new library; that is, compile the original source again just before building each new library. Alternatively, compile the original source to create object files with different names, say x_1.o, x_2.o, etc., and use those unique object file names in building the new libraries.
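To make the note on tentative definitions concrete: in C, a file-scope object declared without an initializer (and without extern) is a tentative definition. The snippet contrasts it with an initialized definition; the comments restate the notes above and are not output from any tool. The names counter, limit and bump are hypothetical. One practical consequence, inferred from those notes rather than stated by them, is that to give such a variable a non-global scope you either initialize it or put an explicit specifier on it.

```c
int counter;      /* tentative definition: per the notes above, -xldscope
                     does not apply, so it stays global; it may also be
                     emitted as a COMMON symbol */
int counter;      /* a repeated tentative definition is legal C */

int limit = 10;   /* initialized: an ordinary definition, so it takes
                     the -xldscope default */

/* Uses both objects so the example does something observable. */
int bump(void)
{
    if (counter < limit)
        ++counter;
    return counter;
}
```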

The scoping restraints that we specify for a static archive or an object file will not take effect until the file is linked into a shared library or an executable. This behavior can be seen in the following C program
with mixed specifiers:
% cat employee.c
__global const float lversion = 1.2;
__symbolic int taxrate;

__hidden struct employee {
 int empid;
 char *name;
} Employee;

__global void createemployee(int id, char *name) { }
__symbolic void deleteemployee(int id) { }
__hidden void modifyemployee(int id) { }

% cc -c employee.c
% elfdump -s employee.o | egrep -i "lver|tax|empl" | grep -v "employee.c"
 [5] 0x00000004 0x00000004 OBJT GLOB P 0 COMMON taxrate
 [6] 0x00000004 0x00000008 OBJT GLOB H 0 COMMON Employee
 [7] 0x00000068 0x00000018 FUNC GLOB H 0 .text modifyemployee
 [8] 0x00000040 0x00000018 FUNC GLOB P 0 .text deleteemployee
 [9] 0x00000010 0x0000001c FUNC GLOB D 0 .text createemployee
 [10] 0x00000000 0x00000004 OBJT GLOB D 0 .rodata lversion

In this example, although a different visibility was specified for each symbol, the scoping restraints are not yet in effect in the ELF relocatable object. As a result, all symbols have global (GLOB) binding. However, the object file holds the corresponding ELF symbol visibility attribute for each symbol, according to the specifier used.
Variable lversion and function createemployee have attribute D, which stands for DEFAULT visibility (that is, __global). So those two symbols are visible outside of the defining component, the executable file or shared object.
taxrate and deleteemployee have attribute P, which stands for PROTECTED visibility (__symbolic). A symbol that is defined in the current component is protected if the symbol is visible in other components but cannot be preempted. Any reference to such a symbol from within the defining component must be resolved to the definition in that component. This resolution must occur even if a symbol definition exists in another component that would interpose by the default rules.
Function modifyemployee and variable Employee are HIDDEN, with attribute H (__hidden). A symbol that is defined in the current component is hidden if its name is not visible to other components. Such a symbol is necessarily protected. This attribute is used to control the external interface of a component. An object named by such a symbol can still be referenced from another component if its address is passed outside.
A hidden symbol contained in a relocatable object is either removed or converted to local (LOCL) binding when the object is included in an executable file or shared object. This can be seen in the following example:
% cc -G -o libempl.so employee.o
% elfdump -sN.dynsym libempl.so | egrep -i "lver|tax|empl"
[5] 0x00000298 0x00000018 FUNC GLOB P 0 .text deleteemployee
[6] 0x00010360 0x00000004 OBJT GLOB P 0 .bss taxrate
[9] 0x000002f4 0x00000004 OBJT GLOB D 0 .rodata lversion
[11] 0x00000268 0x0000001c FUNC GLOB D 0 .text createemployee

% elfdump -sN.symtab libempl.so | egrep -i "lver|tax|empl" \
| grep -v "libempl.so" | grep -v "employee.c"
[19] 0x000002c0 0x00000018 FUNC LOCL H 0 .text modifyemployee
[20] 0x00010358 0x00000008 OBJT LOCL H 0 .bss Employee
[36] 0x00000298 0x00000018 FUNC GLOB P 0 .text deleteemployee
[37] 0x00010360 0x00000004 OBJT GLOB P 0 .bss taxrate
[40] 0x000002f4 0x00000004 OBJT GLOB D 0 .rodata lversion
[42] 0x00000268 0x0000001c FUNC GLOB D 0 .text createemployee

Because of the __hidden specifier, Employee and modifyemployee were locally bound (LOCL) with hidden (H) visibility and did not show up in the dynamic symbol table. Hence neither needs a runtime symbol lookup (modifyemployee, for instance, can never be reached through the procedure linkage table (PLT)), and the runtime linker needs to deal with only four of the six symbols.
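The D, P and H letters in the elfdump output come from the ELF symbol table itself: ELF keeps a symbol's visibility in the low two bits of its st_other field, using the gABI STV_* values. The decoder below is a sketch; the VIS_* names and both helper functions are hypothetical, and only the numeric values and the "hidden becomes local" rule come from the ELF specification.

```c
/* Numeric values are the gABI STV_* constants; the VIS_* names are local
   to this sketch so that no system header is needed. */
enum elf_visibility {
    VIS_DEFAULT   = 0,  /* STV_DEFAULT:   __global   (elfdump "D") */
    VIS_INTERNAL  = 1,  /* STV_INTERNAL:  not produced by these specifiers */
    VIS_HIDDEN    = 2,  /* STV_HIDDEN:    __hidden   (elfdump "H") */
    VIS_PROTECTED = 3   /* STV_PROTECTED: __symbolic (elfdump "P") */
};

/* Extract the visibility from a symbol's st_other byte. */
enum elf_visibility symbol_visibility(unsigned char st_other)
{
    return (enum elf_visibility)(st_other & 0x3);
}

/* Binding that a global symbol from a relocatable object ends up with once
   it is linked into an executable or shared object: hidden and internal
   symbols are removed or demoted to local; the others stay global. */
const char *linked_binding(enum elf_visibility vis)
{
    return (vis == VIS_HIDDEN || vis == VIS_INTERNAL) ? "LOCL" : "GLOB";
}
```

Applied to the listings above, this reproduces the change from GLOB H in employee.o to LOCL H in libempl.so, while the D and P symbols stay GLOB.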

Default Scope
At this point, it is worth looking at the default scope of symbols without any linker scoping mechanism in force, to observe in practice what we have learned so far:
% cat employee.c
const float lversion = 1.2;
int taxrate;

struct employee {
int empid;
char *name;
} Employee;

void createemployee(int id, char *name) { }
void deleteemployee(int id) { }
void modifyemployee(int id) { }

% cc -c employee.c
% elfdump -s employee.o | egrep -i "lver|tax|empl" | grep -v "employee.c"
[5] 0x00000004 0x00000004 OBJT GLOB D 0 COMMON taxrate
[6] 0x00000004 0x00000008 OBJT GLOB D 0 COMMON Employee
[7] 0x00000068 0x00000018 FUNC GLOB D 0 .text modifyemployee
[8] 0x00000040 0x00000018 FUNC GLOB D 0 .text deleteemployee
[9] 0x00000010 0x0000001c FUNC GLOB D 0 .text createemployee
[10] 0x00000000 0x00000004 OBJT GLOB D 0 .rodata lversion

% cc -G -o libempl.so employee.o
% elfdump -sN.dynsym libempl.so | egrep -i "lver|tax|empl"
[1] 0x00000344 0x00000004 OBJT GLOB D 0 .rodata lversion
[4] 0x000103a8 0x00000008 OBJT GLOB D 0 .bss Employee
[6] 0x000002e8 0x00000018 FUNC GLOB D 0 .text deleteemployee
[9] 0x00000310 0x00000018 FUNC GLOB D 0 .text modifyemployee
[11] 0x000103b0 0x00000004 OBJT GLOB D 0 .bss taxrate
[13] 0x000002b8 0x0000001c FUNC GLOB D 0 .text createemployee

% elfdump -sN.symtab libempl.so | egrep -i "lver|tax|empl" \
| grep -v "libempl.so" | grep -v "employee.c"

[30] 0x00000344 0x00000004 OBJT GLOB D 0 .rodata lversion
[33] 0x000103a8 0x00000008 OBJT GLOB D 0 .bss Employee
[35] 0x000002e8 0x00000018 FUNC GLOB D 0 .text deleteemployee
[38] 0x00000310 0x00000018 FUNC GLOB D 0 .text modifyemployee
[40] 0x000103b0 0x00000004 OBJT GLOB D 0 .bss taxrate
[42] 0x000002b8 0x0000001c FUNC GLOB D 0 .text createemployee

From the above elfdump output, all six symbols have global binding and appear in the dynamic symbol table, so the runtime linker has to deal with all six of them.
Suggestions on establishing an object interface

Define all interface symbols using the __global directive, and reduce all other symbols to local using the -xldscope=hidden compiler option. This model provides the most flexibility. All global symbols are interposable, and allow any copy relocations to be processed correctly. Or,

Define all interface functions using the __symbolic directive and interface data objects using the __global directive, and reduce all other symbols to local using the -xldscope=hidden compiler option. Symbolic symbols are globally visible, but references to them have already been bound internally. This means that these symbols do not require symbolic runtime relocation, but cannot be interposed upon or have copy relocations against them. Note that the problem of copy relocations applies only to data, not to functions. This mixed model, in which functions are symbolic and data objects are global, yields more optimization opportunities in the compiler.

In short: if we do not want users to interpose upon our interfaces, and we don't export data items, the second (mixed) model with __symbolic and __global is the best. If in doubt, stick to the more flexible use of __global (the first model).
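A minimal sketch of the second (mixed) model, assuming the file is compiled under Sun Studio with -xldscope=hidden (for example cc -c -KPIC -xldscope=hidden lib.c). LIB_FUNC, LIB_DATA, lib_version, lib_scale, scaled and lib_compute are hypothetical names. The guard uses __SUNPRO_C/__SUNPRO_CC, which the Sun Studio C and C++ compilers predefine; on other compilers the macros expand to nothing, an assumption made only so that the sketch compiles anywhere (those symbols then keep default global scope).

```c
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#  define LIB_FUNC __symbolic  /* exported, bound internally, not interposable */
#  define LIB_DATA __global    /* exported data: interposable, copy-relocatable */
#else
#  define LIB_FUNC             /* other compilers: no specifier (assumption) */
#  define LIB_DATA
#endif

LIB_DATA int lib_version = 2;  /* interface data object */

int lib_scale = 3;             /* no specifier: hidden by -xldscope=hidden */

int scaled(int x)              /* no specifier: hidden helper */
{
    return x * lib_scale;
}

LIB_FUNC int lib_compute(int x)  /* interface function */
{
    return scaled(x) + lib_version;
}
```

For the first (all-__global) model, define LIB_FUNC as __global instead; nothing else in the file needs to change.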


Examples

Exporting of symbols in dynamic libraries can be controlled with the help of the __global, __symbolic, and __hidden declaration specifiers. Look at the following header file:
% cat tax.h

int taxrate = 33;
float calculatetax(float);
If taxrate is not needed by any code outside of the module, we can hide it with the __hidden specifier and compile with the -xldscope=global option. Or we can leave taxrate at the default scope, make calculatetax() visible outside of the module by adding the __global or __symbolic specifier, and compile the code with the -xldscope=hidden option. Let's have a look at both approaches.
First Approach:

% more tax.h

__hidden int taxrate = 33;
float calculatetax(float);

% more tax.c

#include "tax.h"

float calculatetax(float amount) {
    return ((float) ((amount * taxrate)/100));
}

% cc -c -KPIC tax.c

% cc -G -o libtax.so tax.o

% elfdump -s tax.o | egrep "tax"
[8] 0x00000010 0x00000068 FUNC GLOB D 0 .text calculatetax
[9] 0x00000000 0x00000004 OBJT GLOB H 0 .data taxrate

% elfdump -s libtax.so | egrep "tax"
[6] 0x00000240 0x00000068 FUNC GLOB D 0 .text calculatetax
[23] 0x00010350 0x00000004 OBJT LOCL H 0 .data taxrate
[42] 0x00000240 0x00000068 FUNC GLOB D 0 .text calculatetax

Second Approach:
% more tax.h

int taxrate = 33;
__global float calculatetax(float);

% more tax.c

#include "tax.h"

float calculatetax(float amount) {
    return ((float) ((amount * taxrate)/100));
}

% cc -c -xldscope=hidden -Kpic tax.c

% cc -G -o libtax.so tax.o

% elfdump -s tax.o | egrep "tax"
[8] 0x00000010 0x00000068 FUNC GLOB D 0 .text calculatetax
[9] 0x00000000 0x00000004 OBJT GLOB H 0 .data taxrate

% elfdump -s libtax.so | egrep "tax"
[6] 0x00000240 0x00000068 FUNC GLOB D 0 .text calculatetax
[23] 0x00010350 0x00000004 OBJT LOCL H 0 .data taxrate
[42] 0x00000240 0x00000068 FUNC GLOB D 0 .text calculatetax
Now it is clear that the same symbol visibility can be achieved by changing either the specifier or the command-line option -xldscope. (Appendix (1) shows the binding types for all possible combinations of source interfaces (specifiers) and the command-line option -xldscope.)
Let's try to build a driver that invokes the calculatetax() function. But first, let's modify tax.h slightly to make calculatetax() non-interposable (refer to bullet 2 in the "Suggestions on establishing an object interface" section), and build libtax.so:
% cat tax.h
int taxrate = 33;
__symbolic float calculatetax(float);

% cc -c -xldscope=hidden -Kpic tax.c
% cc -G -o libtax.so tax.o
% cat driver.c
#include <stdio.h>
#include "tax.h"

int main() {
printf("** Tax on $2525 = %0.2f **\n", calculatetax(2525));
return (0);
}

% cc -R. -L. -o driver driver.c -ltax
Undefined first referenced
symbol in file
calculatetax driver.o (symbol scope specifies local binding)
ld: fatal: Symbol referencing errors. No output written to driver

% elfdump -s driver.o | egrep "calc"

[7] 0x00000000 0x00000000 FUNC GLOB P 0 UNDEF calculatetax
Building the driver program failed because the client program (driver.c) is also trying to export (__symbolic) the definition of calculatetax(), instead of importing it (__global).
The declarations within header files shared between the library and its clients must ensure that clients and the implementation see different linker scoping for public symbols. So the simple fix is to export the symbol while the library is being built, and import it when a client program needs it. This can be done either by copying tax.h to another file and changing the specifier to __global, or by using preprocessor conditionals (with the -D option) to alter the declaration depending on whether the header file is used in building the library or by a client.

Using separate header files for clients and the library leads to code maintenance problems. Even though using a preprocessor macro eases the pain of writing and maintaining two header files, it unfortunately places lots of implementation details in the public header file. The following example illustrates this by introducing the macro BUILDLIB for building the library.
% more tax.h

int taxrate = 33;

#ifdef BUILDLIB
__symbolic float calculatetax(float);
#else
__global float calculatetax(float);
#endif
When the library is built, BUILDLIB is defined on the compile line, so the symbol calculatetax is exported (__symbolic). When a client program is built with the same header file, BUILDLIB is not defined, and calculatetax is made available to the client as __global; that is, the symbol is imported. Note that #ifdef tests only whether BUILDLIB is defined, not its value.

You may want to emulate this scheme by defining macros for your own libraries. This implies that you have to define a compiler switch (analogous to BUILDLIB) yourself, which can be done with the -D option of the Sun Studio C/C++ compilers. Using the -D option on the command line is equivalent to including a #define directive at the beginning of the source. Define the switch when you build your library, and leave it undefined when clients compile against your published headers.
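A minimal sketch of such a switch, written in C. TAX_API is a hypothetical macro name, not something the Sun Studio compilers provide: with Sun Studio it expands to __symbolic when BUILDLIB is defined and to __global otherwise. The GCC branch, which uses __attribute__((visibility("default"))), is an added assumption so the sketch also compiles outside Sun Studio. Header and library source are shown in one listing for brevity, and unlike the tax.h above, taxrate is kept out of the header so clients do not get their own copy.

```c
/* tax.h: one header shared by libtax and its clients.
 * TAX_API is a hypothetical macro name used for illustration. */
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#  ifdef BUILDLIB
#    define TAX_API __symbolic   /* building libtax.so: defined and bound here */
#  else
#    define TAX_API __global     /* client build: imported from libtax.so */
#  endif
#elif defined(__GNUC__)
   /* Assumed GCC analogue, only so the sketch compiles outside Sun Studio. */
#  define TAX_API __attribute__((visibility("default")))
#else
#  define TAX_API                /* unknown compiler: no scoping keyword */
#endif

TAX_API float calculatetax(float amount);

/* tax.c: compiled with -DBUILDLIB -xldscope=hidden, so anything not
 * marked TAX_API stays inside libtax.so. */
static int taxrate = 33;         /* internal detail, not part of the interface */

TAX_API float calculatetax(float amount) {
    return (amount * taxrate) / 100;
}
```

The library would then be compiled with cc -c -DBUILDLIB -xldscope=hidden -Kpic tax.c, while clients include the same header without -DBUILDLIB.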
Let's continue with the example by defining BUILDLIB (-DBUILDLIB) on the line that compiles tax.c for libtax.so. The -D option affects only compilation; passing it to the cc -G link step would have no effect.
% make

Compiling tax.c ..
cc -c -xldscope=hidden -Kpic -DBUILDLIB tax.c

Building libtax library ..
cc -G -o libtax.so tax.o

Building driver program ..
cc -R. -L. -o driver driver.c -ltax

Executing driver ..
./driver
** Tax on $2525 = 833.25 **
The following is an alternative implementation of the above example that keeps a simple interface in the public header file. The idea behind this approach is to use a second, private header file that redeclares symbols with a more restrictive linker scope for use within the library.
% cat tax_public.h
float calculatetax(float);

% cat tax_private.h
#include "tax_public.h"
int taxrate = 33;

% cat tax_private.c
#include "tax_private.h"

__symbolic float calculatetax(float amount) {
 return ((float) ((amount * taxrate)/100));
}

This code makes calculatetax symbolic when compiling tax_private.c, and the compiler can optimize knowing that calculatetax is symbolic, but that knowledge is available only within this one object file. To make these optimizations available to the entire library, redeclare the function in the private header:
% cat tax_private.h
#include "tax_public.h"
int taxrate = 33;
__symbolic float calculatetax(float);

To export calculatetax, use the private header when building the library:
% cat tax_private.c
#include "tax_private.h"

float calculatetax(float amount) {
return ((float) ((amount * taxrate)/100));
}

% cc -c -xldscope=hidden -KPIC tax_private.c
% cc -G -o libtax_private.so tax_private.o

Use the public header when building the client program. There calculatetax is declared without a linker scoping specifier, so it has global visibility and the client can access it.
% cat driver.c
#include <stdio.h>
#include "tax_public.h"

int main() {
 printf("** Tax on $2525 = %0.2f **\n", calculatetax(2525));
 return (0);
}

% cc -R. -L. -o driver driver.c -ltax_private
% ./driver
** Tax on $2525 = 833.25 **

The trade-off with this alternate approach is that we need two sets of header files, one for exporting the symbols and the other for importing them.


Windows compatibility with __declspec

The Sun Studio 9 compilers introduced a new keyword, __declspec, that supports the dllexport and dllimport storage-class attributes (or specifiers) to ease porting of applications developed with Microsoft Windows compilers to Solaris.

Syntax:
storage... __declspec( dllimport ) type declarator...
 storage... __declspec( dllexport ) type declarator...

On Windows, these attributes define the symbols exported (the library as a provider) and imported (the library as a client).
On Solaris, __declspec(dllimport) maps to the __global specifier and __declspec(dllexport) maps to the __symbolic specifier. Note that the semantics of these keywords differ somewhat between the Microsoft and Sun platforms, so applications developed natively on the Sun platform are strongly encouraged to use the Sun linker scoping syntax rather than the Microsoft-specific extensions to C/C++.

Purely to illustrate the syntax, the following sections on __declspec(dllexport) and __declspec(dllimport) use the __declspec keyword in place of the linker scoping specifiers.
__declspec(dllexport)
While building a shared library, all global symbols of the library should be explicitly exported with the __declspec keyword.
To export a symbol, the declaration looks like:
__declspec(dllexport) type name

where "__declspec(dllexport)" is literal, and type and name declare the symbol.

For example:
__declspec(dllexport) char *printstring();
 class __declspec(dllexport) MyClass {...}

Data, functions, classes, or class member functions from a shared library can be exported using the __declspec(dllexport) keyword.
When building a library, we typically create a header file that contains the function prototypes and/or classes we are exporting, and add __declspec(dllexport) to the declarations in that header file.
Sun Studio compilers map __declspec(dllexport) to __symbolic; hence the following two declarations are equivalent:
__symbolic void printstring();
__declspec(dllexport) void printstring();

__declspec(dllimport)
To import the symbols that were exported with __declspec(dllexport), a client that wants to use the library must reverse the declaration by replacing dllexport with dllimport.

For example:
__declspec(dllimport) char *printstring();
 class __declspec(dllimport) MyClass{...}

A program that uses public symbols defined by a shared library is said to import them. Header files that applications use to build against the library should declare the public symbols with __declspec(dllimport).
Sun Studio compilers map __declspec(dllimport) to __global; hence the following two declarations are equivalent:
__global void printstring();
__declspec(dllimport) void printstring();


Automatic data imports

Windows C/C++ compilers may accept code that declares a symbol with __declspec(dllexport) even though the symbol is actually imported. Such code does not build with Sun Studio compilers, because the dllexport/dllimport attributes must be correct. This constraint increases the effort needed to port existing Windows code to Solaris, especially for large C++ libraries and applications.

For example, the following code may compile and run on Windows, but fails to link on Solaris.
% cat util.h
__declspec(dllexport) long multiply (int, int);

% cat util.cpp
#include "util.h"

long multiply (int x, int y) {
 return (x * y);
}

% cat test.cpp
#include <stdio.h>
#include "util.h"

int main() {
 printf(" 25 * 25 = %ld", multiply(25, 25));
 return (0);
}

% CC -G -o libutil.so util.cpp

% CC -o test test.cpp -L. -R. -lutil 
Undefined first referenced
 symbol in file
multiply test.o (symbol scope specifies local binding)
ld: fatal: Symbol referencing errors. No output written to test

% elfdump -CsN.symtab test.o | grep multiply
 [3] 0x00000000 0x00000000 FUNC GLOB P 0 UNDEF long multiply(int,int)

The normal mapping for __declspec(dllexport) is __symbolic, which requires that the symbol be defined inside the load module being built; that is not true for imported symbols. The correct solution is for customers to change their Microsoft code to declare imported symbols with __declspec(dllimport), as discussed in the previous sections.
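A minimal sketch of that source change applied to util.h and util.cpp above, written so it is valid as C or C++. UTIL_API and BUILDING_UTIL are hypothetical names, and only the build of util.cpp defines BUILDING_UTIL. It relies on the mapping described earlier (Sun Studio 9 and later treat __declspec(dllexport) as __symbolic and __declspec(dllimport) as __global); the GCC fallback is an assumption added only so the sketch builds elsewhere.

```c
/* util.h: the exporter and its importers see different declarations.
 * UTIL_API and BUILDING_UTIL are hypothetical names. */
#if defined(_WIN32) || defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#  ifdef BUILDING_UTIL
#    define UTIL_API __declspec(dllexport)   /* Sun Studio: __symbolic */
#  else
#    define UTIL_API __declspec(dllimport)   /* Sun Studio: __global */
#  endif
#elif defined(__GNUC__)
   /* Assumed fallback so the sketch compiles with GCC or Clang. */
#  define UTIL_API __attribute__((visibility("default")))
#else
#  define UTIL_API
#endif

UTIL_API long multiply(int x, int y);

/* util.cpp, compiled with -DBUILDING_UTIL: the exported definition. */
UTIL_API long multiply(int x, int y) {
    return (long)x * y;   /* widen before multiplying to avoid int overflow */
}
```

With this header, the library would be built with CC -G -DBUILDING_UTIL -o libutil.so util.cpp, and test.cpp would see the dllimport (__global) declaration, so it should link without any extra option.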

To port such code without changing the source base, Sun provides an undocumented compiler option, -xldscoperef, which accepts two values: global or keyword. This option is available by default with Sun Studio 9 and later versions, and for Sun Studio 8 through patches 112760-01 (or later) and 112761-01 (or later) for the SPARC and x86 platforms, respectively.
-xldscoperef={global|keyword}

The value keyword indicates that a symbol will have the linker scoping specified by any keywords given with the symbol's declarations. This value is the current behavior and the default. The value global indicates that all undefined symbols should have global linkage (see Footnoted Definitions), even if a linker scoping keyword says otherwise. Note that __declspec(dllimport) and __declspec(dllexport) still map to __global and __symbolic, respectively; only undefined symbols are affected by the -xldscoperef=global option.
So, compiling the code from the above example with -xldscoperef=global succeeds and produces the desired result. Since -xldscoperef is not a first-class option, it has to be passed to the compiler front end with the -Qoption option of the C++ compiler, or with the -W option of the C compiler: -Qoption ccfe -xldscoperef=global for C++, and -W0,-xldscoperef=global for C.
% CC -Qoption ccfe -xldscoperef=global -o test test.cpp -L. -R. -lutil

% ./test
 25 * 25 = 625

% elfdump -CsN.symtab test.o | grep multiply
 [3] 0x00000000 0x00000000 FUNC GLOB D 0 UNDEF long multiply(int,int)

Let's conclude by stating some of the benefits of reduced linker scoping.


Benefits of Linker Scoping

The following paragraphs explain some of the benefits of the linker scoping feature. We can take advantage of most of these benefits simply by reducing the scope of all or most of the symbols in our application from global to local.

Less chance for name collisions with other libraries:

With C++, namespaces are the preferred method for avoiding name collisions. But applications that rely heavily on C-style programming and do not use the namespace mechanism are vulnerable to name collisions.

Name collisions are hard to detect and debug. Third-party libraries can create havoc when some of their symbol names coincide with those in the application. For example, if a third-party shared library uses a global symbol with the same name as a global symbol in one of the application's shared libraries, the symbol from the third-party library may interpose on ours and silently change the behavior of the application. With symbolic scoping, we can make it hard to interpose on symbols and ensure that the intended definition is used at run time.

Improved performance:

Reducing the exported interfaces of shared objects greatly reduces the runtime overhead of processing these objects and improves both application startup time and runtime performance. With reduced symbol visibility, the symbol count drops, so there is less overhead in runtime symbol lookup; and the relocation count drops, so there is less overhead in fixing up the objects before they are used.

Thread-Local Storage (TLS):

Access to thread-local storage can be significantly faster because the compiler knows the inter-object, intra-load-module relationship between a reference to a symbol and the definition of that symbol. If the back end knows that a symbol will not be exported from a dynamic library or executable, it can perform optimizations that it could not perform when it only knew the scope relative to the relocatable object being built.
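A minimal sketch of a thread-local variable with reduced scope, assuming the __thread storage class (accepted by Sun Studio and GCC) and assuming Sun Studio accepts __hidden together with __thread. LIB_HIDDEN, calls_this_thread, and count_call() are hypothetical names, and the GCC visibility("hidden") branch is an assumed analogue of __hidden. Whether the compiler actually emits a cheaper TLS access sequence depends on the compiler and its options; the sketch only shows how the declaration conveys the scope.

```c
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#  define LIB_HIDDEN __hidden
#elif defined(__GNUC__)
   /* Assumed GCC analogue of Sun Studio's __hidden. */
#  define LIB_HIDDEN __attribute__((visibility("hidden")))
#else
#  define LIB_HIDDEN
#endif

/* Per-thread call counter: each thread gets its own copy. With hidden
 * scope, no other load module can define or reference it, so the
 * compiler may treat the definition as local to this module. */
LIB_HIDDEN __thread int calls_this_thread = 0;

/* Interface function: returns how many times the calling thread
 * has called it so far. */
int count_call(void) {
    return ++calls_this_thread;
}
```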

Position Independent Code with -Kpic:

With most symbols hidden, there are fewer symbols in the library, and the library may be able to use the more efficient -Kpic rather than the less efficient -KPIC.

PIC-compiled code allows the linker to keep a read-only version of the text (code) segment of a shared library, and the dynamic linker can share this text segment among all running processes that use the library. PIC also helps reduce the number of relocations.

Improved Security:

The strip(1) utility is not enough to hide the names of the application's routines and data items; stripping eliminates the local symbols but not the global symbols.

Dynamically linked binaries (both executables and shared libraries) use two symbol tables: the static symbol table and the dynamic symbol table. The dynamic symbol table is used by the runtime linker. It has to be present even in stripped executables, or else the dynamic linker cannot find the symbols it needs. The strip utility can only remove the static symbol table.

By making most of the application's symbols local in scope, the symbol information for those local symbols is really gone from a stripped binary and is not available at runtime, so no one can extract it.

Note that although linker scoping is an easier mechanism to use, it is not the only one; the same result can be achieved with mapfiles.

Better alignment with the supported interface of the library:

By reducing scope of symbols, the linker symbols that the client can link to are aligned with the supported interface of the library; and the client cannot link to functions that are not supported and may do damage to the operation of the library.

Reduced application binary sizes:

ELF's exported symbol table format is quite a space hog. With reduced linker scope, there is a noticeable drop in the sizes of the binaries being built.

Overcoming 64-bit PLT limit of 32768 (Solaris 8 or previous versions only):

In 64-bit mode, the linker on Solaris 8 and earlier versions has a limitation: it can handle only up to 32768 PLT entries. This means that very large shared libraries cannot be linked in 64-bit mode. The linker fails with the following error message if the limit is exceeded:
Assertion failed: pltndx < 0x8000

The linker needs PLT entries only for the global symbols. If we use linker scoping to reduce the scope of most of the symbols to local, this limitation is likely to become irrelevant.

An alternative to the difficult-to-use mapfile mechanism:

Using linker mapfiles for linker scoping is difficult with C++ because mapfiles require linker (mangled) names, which are not the same names used in the program source (as explained in the introductory paragraphs). Linker scoping is a viable alternative to mapfiles for reducing the scope of symbols. With linker scoping, the header files of the library need not change: the source files may be compiled with the -xldscope flag to set the default linker scoping, and individual symbols that need a different linker scoping are marked in the source.

Note that linker mapfiles provide many features beyond linker scoping, including assigning addresses to symbols and internal library versioning.


Appendix

Linker scope specifiers (__global, __symbolic, and __hidden) take priority over the command-line -xldscope option.
The following table shows the resulting binding and visibility for each combination of declaration specifier and command-line option:

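-----------------------------------------------------------------------------------
 Specifier     -xldscope=global  -xldscope=symbolic  -xldscope=hidden  no -xldscope
-----------------------------------------------------------------------------------
 __global      GLOB DEFAULT      GLOB DEFAULT        GLOB DEFAULT      GLOB DEFAULT
 __symbolic    GLOB PROTECTED    GLOB PROTECTED      GLOB PROTECTED    GLOB PROTECTED
 __hidden      LOCL HIDDEN       LOCL HIDDEN         LOCL HIDDEN       LOCL HIDDEN
 no specifier  GLOB DEFAULT      GLOB PROTECTED      LOCL HIDDEN       GLOB DEFAULT
-----------------------------------------------------------------------------------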
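The table reduces to two rules: an explicit specifier always wins, and without one the -xldscope value decides, with global scope when the option is absent. A minimal C sketch of that lookup follows; the enum and function names are hypothetical and exist only to restate the table, not to model any compiler internals.

```c
/* Hypothetical encodings of the table's rows, columns, and cells. */
enum specifier { SPEC_NONE, SPEC_GLOBAL, SPEC_SYMBOLIC, SPEC_HIDDEN };
enum ldscope   { OPT_NONE, OPT_GLOBAL, OPT_SYMBOLIC, OPT_HIDDEN };
enum scope     { GLOB_DEFAULT, GLOB_PROTECTED, LOCL_HIDDEN };

/* Binding and visibility as elfdump reports them. */
const char *scope_name(enum scope s) {
    static const char *const names[] = {
        "GLOB DEFAULT", "GLOB PROTECTED", "LOCL HIDDEN"
    };
    return names[s];
}

enum scope resulting_scope(enum specifier spec, enum ldscope opt) {
    /* Rule 1: an explicit specifier takes priority over -xldscope. */
    switch (spec) {
    case SPEC_GLOBAL:   return GLOB_DEFAULT;
    case SPEC_SYMBOLIC: return GLOB_PROTECTED;
    case SPEC_HIDDEN:   return LOCL_HIDDEN;
    case SPEC_NONE:     break;
    }
    /* Rule 2: without a specifier, the command-line option decides. */
    switch (opt) {
    case OPT_SYMBOLIC:  return GLOB_PROTECTED;
    case OPT_HIDDEN:    return LOCL_HIDDEN;
    case OPT_GLOBAL:
    case OPT_NONE:      break;
    }
    return GLOB_DEFAULT;   /* -xldscope=global, or no -xldscope at all */
}
```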
Consider a library with a narrow external interface but a wide internal implementation. It would typically be compiled with -xldscope=hidden, and its interface functions defined with __global or __symbolic.
% cat external.h
extern void non_library_function();
inline void non_library_inline() {
non_library_function();
}

% cat public.h
extern void interposable();
extern void non_interposable();
struct container {
virtual void method();
void non_virtual();
};

% cat private.h
extern void inaccessible();

% cat library.c
#include "external.h"
#include "public.h"
#include "private.h"
__global void interposable() { }
__symbolic void non_interposable() { }
__symbolic void container::method() { }
__hidden void container::non_virtual() { }
void inaccessible() {
non_library_inline();
}

Compiling library.c with -xldscope=hidden, as described above, results in the following linker scopings in library.o.
------------------------------------------------
 function name linker scoping 
 ------------------------------------------------
 non_library_function undefined
 non_library_inline hidden
 interposable global
 non_interposable symbolic
 container::method symbolic
 container::non_virtual hidden
 inaccessible hidden
 ------------------------------------------------


The following example interface shows the use of linker scoping specifiers with a class template. With the __symbolic specifier in the template class definition, all members of all instances of class Stack will have symbolic scope, unless overridden.
% cat stack.cpp
template <class Type>
class __symbolic Stack
{
private:
    Type items[25];
    int top;
public:
    Stack();
    bool isempty();
    bool isfull();
    bool push(const Type & item);
    bool pop(Type & item);
};

To specify linker scoping on a template class definition, the Sun Studio 8 compilers on the SPARC and x86 platforms must be patched with patches 113817-01 (or later) and 113819-01 (or later), respectively. This facility is available by default with Sun Studio 9 and later versions.

A trivial C++ example showing an accidental symbol collision with a third-party symbol:
% cat mylib_public.h

float getlibversion();
int checklibversion();

% cat mylib_private.h

#include "mylib_public.h"
const float libversion = 2.2;

% cat mylib.cpp

#include "mylib_private.h"

float getlibversion() {
    return (libversion);
}

int checklibversion() {
    return ((getlibversion() < 2.0) ? 1 : 0);
}

% CC -G -o libmylib.so mylib.cpp

% cat thirdpartylib.h

const float libversion = 1.5;

float getlibversion();

% cat thirdpartylib.cpp

#include "thirdpartylib.h"

float getlibversion() {
    return (libversion);
}

% CC -G -o libthirdparty.so thirdpartylib.cpp

% cat versioncheck.cpp

#include <stdio.h>
#include "mylib_public.h"

int main() {
    if (checklibversion()) {
        printf("\n** Obsolete version being used .. Can't proceed further! **\n");
    } else {
        printf("\n** Met the library version requirement .. Good to Go! ** \n");
    }
    return (0);
}

% CC -o vercheck versioncheck.cpp -L. -R. -lthirdparty -lmylib

% ./vercheck

** Obsolete version being used .. Can't proceed further! **
Since checklibversion() and getlibversion() are within the same load module, checklibversion() in the mylib library expects to call getlibversion() from mylib. However, the call was bound to getlibversion() in the thirdparty library, because that library was linked before mylib when the executable was built.

To avoid failures like this, bind symbols to their definitions within the module itself with symbolic scoping. Compiling the mylib library's source with -xldscope=symbolic makes all the symbols of the module symbolic. This produces the desired behavior and makes symbol collisions much less likely, by ensuring that the library uses its own definition of the routine rather than a definition that occurs earlier in the link order:
% CC -G -xldscope=symbolic -o libmylib.so mylib.cpp

% CC -o vercheck versioncheck.cpp -L. -R. -lthirdparty -lmylib

% ./vercheck

** Met the library version requirement .. Good to Go! **
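The same protection can also be applied to individual functions rather than a whole module. A minimal C sketch of mylib, assuming a hypothetical MYLIB_SYMBOLIC macro that expands to __symbolic under Sun Studio; the GCC branch uses visibility("protected"), which is an assumed analogue of symbolic scope, not something this article prescribes. Marked this way, the call from checklibversion() should bind to mylib's own getlibversion() even when a library earlier in the link order defines the same name.

```c
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#  define MYLIB_SYMBOLIC __symbolic
#elif defined(__GNUC__)
   /* Assumed GCC analogue: exported, but bound within this module. */
#  define MYLIB_SYMBOLIC __attribute__((visibility("protected")))
#else
#  define MYLIB_SYMBOLIC
#endif

/* mylib_private.h: internal constant, never exported. */
static const float libversion = 2.2f;

/* mylib.c: both interfaces are exported but bound within libmylib.so,
 * so another library's getlibversion() cannot interpose on the call below. */
MYLIB_SYMBOLIC float getlibversion(void) {
    return libversion;
}

/* Returns 1 if the library is older than 2.0, 0 otherwise. */
MYLIB_SYMBOLIC int checklibversion(void) {
    return (getlibversion() < 2.0f) ? 1 : 0;
}
```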

Estimating the number of relocations
To get the number of relocations that the linker may perform, run the following commands:
For the total number of relocations:
% elfdump -r <DynamicObject> | grep -v NONE | grep -c R_

For the number of non-symbolic relocations:
% elfdump -r <DynamicObject> | grep -c RELATIVE

For example:
% elfdump -r /usr/lib/libc.so | grep -v NONE | grep -c R_

2562

% elfdump -r /usr/lib/libc.so | grep -c RELATIVE

1868
The number of symbolic relocations is calculated by subtracting the number of non-symbolic relocations from the total number of relocations. This number also includes the relocations in the procedure linkage table.
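For the libc.so output above, that gives 2562 - 1868 = 694 symbolic relocations. The same counting can be sketched in C by applying the grep filters to the text of elfdump -r output. count_relocations() and struct reloc_counts are hypothetical names; the sketch assumes each relocation appears on its own line, as the grep pipelines above also assume, and it does not model the exact elfdump output format.

```c
#include <string.h>

/* Counts derived from the text of `elfdump -r` output. */
struct reloc_counts {
    int total;      /* lines containing "R_" but not "NONE" */
    int relative;   /* lines containing "RELATIVE" (non-symbolic) */
    int symbolic;   /* total - relative, PLT relocations included */
};

/* Applies the same filters as the grep pipelines, line by line.
 * Lines longer than the buffer are truncated before matching. */
struct reloc_counts count_relocations(const char *text) {
    struct reloc_counts c = {0, 0, 0};
    char line[512];

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, n);
        line[n] = '\0';

        if (strstr(line, "R_") != NULL && strstr(line, "NONE") == NULL)
            c.total++;
        if (strstr(line, "RELATIVE") != NULL)
            c.relative++;

        text += len;            /* move to the newline (or the end) */
        if (*text == '\n')
            text++;
    }
    c.symbolic = c.total - c.relative;
    return c;
}
```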

Footnoted Definitions

An interface (API) is a specification of the functions and usage of a software module. In short, it tells other programmers which classes, methods, and so on can be used from the module that provides the interface.

A source file contains one or more variables, function declarations, function definitions, or similar items logically grouped together. From a source file, the compiler generates an object module, which contains machine code for the target system. Object modules are linked with other modules to create a load module, a program in machine-language form that is ready to run on the system.

The static linker, ld(1), also called the link-editor, creates load modules from object modules.

The dynamic linker, ld.so.1(1), performs the runtime linking of dynamic executables and shared libraries. It brings shared libraries into an executing application and handles the symbols in those libraries as well as in the dynamic executable images; that is, the dynamic linker creates a process image from load modules.

A symbolic relocation is a relocation that requires a lookup in the symbol table. The runtime linker optimizes symbol lookup by caching successive duplicate symbols. These cached relocations are called "cached symbolic" relocations, and are faster than plain symbolic relocations. A non-symbolic relocation is a simple relative relocation that requires only the base address at which the object is mapped. Non-symbolic relocations do not require a lookup in the symbol table.

The elfdump(1) utility can be used to dump selected parts of an object file, such as the symbol table, ELF header, or global offset table.

The nm(1) utility displays the symbol table of an ELF object file.

Copy relocations are a technique used to allow references from non-PIC code to external data items while maintaining the read-only permission of a typical text segment. The use and overhead of these relocations can be avoided by designing shared objects that do not export data interfaces.
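A minimal sketch of that design, reusing the tax example's taxrate as illustrative data: the variable stays private to the library and clients read it through a function, so no data symbol is exported and a non-PIC client needs no copy relocation for it. The accessor name get_taxrate() is a hypothetical choice.

```c
/* Instead of exporting data from the library, e.g.
 *     int taxrate = 33;        (clients reference the variable directly)
 * keep the data private and export only functions. */

static int taxrate = 33;        /* internal linkage: never exported */

/* Exported accessor: the only taxrate interface clients link against. */
int get_taxrate(void) {
    return taxrate;
}

float calculatetax(float amount) {
    return (amount * get_taxrate()) / 100;
}
```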

Executable and Linkable Format (ELF) is a portable object file format supported by most UNIX vendors. ELF helps developers by providing a set of binary interface definitions that are cross-platform, and
by making it easier for tool vendors to port to multiple platforms. Having a standard object file format also makes porting of object-manipulating programs easier. Compilers, debuggers, and linkers are some examples of tools that use the ELF format.

Tentative symbols are those symbols that have been created within a file but have not yet been sized, or allocated in storage. These symbols appear as uninitialized C symbols.

The ability of a name in one translation unit to be used as the definition of the same name in another translation unit is called linkage. Linkage can be internal or external. Internal linkage means a
definition can only be used in the translation unit in which it is found. External linkage means the definition can be used in other translation units as well; in other words it can be linked into outside translation units.
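In C, the keyword static selects internal linkage at file scope; a short illustration with hypothetical names follows. Note that linkage is a language-level property, while the linker scoping discussed in this article controls visibility across load modules: a __hidden symbol still has external linkage within its load module.

```c
/* External linkage: other translation units can refer to this
 * definition through a matching `extern int shared_total;`. */
int shared_total = 0;

/* Internal linkage: `static` limits the name to this translation unit. */
static int private_total = 0;

/* Functions have external linkage by default, so other units can call these. */
void add_to_totals(int amount) {
    shared_total += amount;
    private_total += amount;
}

int get_private_total(void) {
    return private_total;
}
```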


Resources:

Sun Studio Interface Specification for Linker Scoping
Sun Studio C++ User's Guide, Language extensions
Solaris Linker and Libraries Guide
Programming Languages - C++, ISO/IEC 14882 International Standard
Enhancing Applications by Directing Linker Symbol Processing by Greg Nakhimovsky
Interface Creation Using Compilers by Rod Evans
Reducing Application Startup Time by Neelakanth Nadgir


Acknowledgments

Thanks to Lawrence Crowl and Steve Clamage of Sun Microsystems, for the extensive feedback on this article.


About The Author

Giri Mandalika is an engineering consultant at Sun Microsystems working with independent software vendors to make sure their products run well on Sun platform. He holds a Master's
degree in Computer Science from The University of Texas at Dallas.