
Converting 32-bit Applications Into 64-bit Applications: Things to Consider


By The Sun Studio Team, January 2005

The principal cause of problems when converting 32-bit applications to 64-bit applications is that the long and pointer types change in size relative to the int type. When converting 32-bit programs to 64-bit programs, only long types and pointer types change in size from 32 bits to 64 bits; integers of type int stay at 32 bits in size. This can cause trouble with data truncation when assigning pointer or long types to int types.
Also, problems with sign extension can occur when assigning expressions using types shorter than the size of an int to an unsigned long or a pointer. This article discusses how to avoid or eliminate these problems.

Consider the Differences Between the 32-bit and 64-bit Data Models

The biggest difference between the 32-bit and the 64-bit compilation environments is the change in data-type models. The C data-type model for 32-bit applications is the ILP32 model, so named because the int and long types, and pointers, are 32-bit data types. The data-type model for 64-bit applications is the LP64 data model, so named because long and pointer types grow to 64 bits. The remaining C integer types and the floating-point types are the same in both data-type models.
It is not unusual for current 32-bit applications to assume that the int type, long type, and pointers are the same size. Because the sizes of long and pointer types change in the LP64 data model, this assumption is the principal cause of ILP32-to-LP64 conversion problems.
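A quick way to see which data model a compiler targets is to print the sizes involved. The following is a minimal sketch; the helper names data_model_name and print_data_model are made up for illustration, and the check assumes 8-bit bytes.

```c
#include <stdio.h>

/* Hypothetical helper: names the data model the compiler is targeting. */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Prints the three sizes that distinguish ILP32 from LP64. */
void print_data_model(void)
{
    printf("int=%zu long=%zu pointer=%zu (%s)\n",
           sizeof(int), sizeof(long), sizeof(void *), data_model_name());
}
```

Calling print_data_model() from a program built once as 32-bit and once as 64-bit should show only the long and pointer sizes changing.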

Use the lint Utility to Detect Problems with 64-bit long and Pointer Types

Use lint to check code that is written for both the 32-bit and the 64-bit compilation environments. Specify the -errchk=longptr64 option to generate LP64 warnings. This option checks portability to an environment in which long integers and pointers are 64 bits and plain integers are 32 bits. It checks assignments of pointer expressions and long integer expressions to plain integers, even when explicit casts are used.
Use the -errchk=longptr64,signext option to find code where the normal ISO C value-preserving rules allow the extension of the sign of a signed-integral value in an expression of unsigned-integral type. Use the -xarch=v9 option of lint when you want to check code that you intend to run in the Solaris 64-bit SPARC compilation environment only. Use -xarch=amd64 when you want to check code that you intend to run in the x86 64-bit environment.
When lint generates warnings, it prints the line number of the offending code, a message that describes the problem, and whether or not a pointer is involved. The warning message also indicates the sizes of the data types involved. When you know that a pointer is involved and you know the sizes of the data types, you can find specific 64-bit problems and set aside the pre-existing problems between 32-bit and smaller types.
You can suppress the warning for a given line of code by placing a comment of the form "NOTE(LINTED(<optional message>))" on the previous line. This is useful when you want lint to ignore certain lines of code such as casts and assignments. Exercise extreme care when you use the "NOTE(LINTED(<optional message>))" comment because it can mask real problems. When you use NOTE, also add #include <note.h> to the source file.
Refer to the lint man page for more information.

Check for Changes of Pointer Size With Respect to the Size of Plain Integers

Since plain integers and pointers are the same size in the ILP32 compilation environment, 32-bit code commonly relies on this assumption. Pointers are often cast to int or unsigned int for address arithmetic. You can cast your pointers to unsigned long instead, because long and pointer types are the same size in both the ILP32 and LP64 data-type models. However, rather than explicitly using unsigned long, use uintptr_t because it expresses your intent more closely and makes the code more portable, insulating it against future changes. To use the uintptr_t and intptr_t types, you need to #include <inttypes.h>.
Consider the following example:

char *p;
p = (char *) ((int)p & PAGEOFFSET);

% cc ..
warning: conversion of pointer loses bits

The following version will function correctly when compiled to both 32-bit and 64-bit targets:

char *p;
p = (char *) ((uintptr_t)p & PAGEOFFSET);
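As a self-contained sketch of the same idea, the helpers below do page arithmetic through uintptr_t. The names page_offset and page_base and the 4096-byte page size are assumptions for illustration; real code would take the page size and mask from the platform headers.

```c
#include <stdint.h>

/* Assumed page size for illustration only. */
#define EXAMPLE_PAGESIZE   ((uintptr_t)4096)
#define EXAMPLE_PAGEOFFSET (EXAMPLE_PAGESIZE - 1)

/* Offset of p within its page; uintptr_t keeps every bit of the pointer. */
uintptr_t page_offset(const void *p)
{
    return (uintptr_t)p & EXAMPLE_PAGEOFFSET;
}

/* Address of the start of the page that contains p. */
const void *page_base(const void *p)
{
    return (const void *)((uintptr_t)p & ~EXAMPLE_PAGEOFFSET);
}
```

Because the mask is built from uintptr_t as well, the complement in page_base covers the full pointer width in both data models.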

Check for Changes in Size of Long Integers With Respect to the Size of Plain Integers

Because integers and longs are never really distinguished in the ILP32 data-type model, your existing code probably uses them indiscriminately. Modify any code that uses integers and longs interchangeably so that it conforms to the requirements of both the ILP32 and LP64 data-type models. While an integer and a long are both 32 bits in the ILP32 data-type model, a long is 64 bits in the LP64 data-type model.
Consider the following example:

int waiting;
long w_io;
long w_swap;
...
waiting = w_io + w_swap;

% cc
warning: assignment of 64-bit integer to 32-bit integer
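The simplest fix is usually to declare the destination as long. Where the destination has to stay an int, one option is to narrow through a checked helper. This is a minimal sketch; long_to_int is a hypothetical name, not something from the article.

```c
#include <limits.h>

/*
 * Hypothetical helper: stores v in *out only if it fits in an int.
 * Returns 0 on success, -1 if the value would be truncated.
 */
int long_to_int(long v, int *out)
{
    if (v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}
```

In the ILP32 model the range check can never fail, so the same source behaves identically there and only rejects values in the LP64 model.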

Check for Sign Extensions

Sign extension is a common problem when you convert to the 64-bit compilation environment because the type conversion and promotion rules are somewhat obscure. To prevent sign-extension problems, use explicit casting
to achieve the intended results.
To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign extension problems between the 32-bit and the 64-bit compilation
environment come into effect during the following operations:

Integral promotion
You can use a char, short, enumerated type, or bit-field, whether signed or unsigned, in any expression that calls for an integer. If an integer can hold all possible values of the original type, the value is converted
to an integer; otherwise, the value is converted to an unsigned integer.

Conversion between signed and unsigned integers
When an integer with a negative sign is promoted to an unsigned integer of the same or larger type, it is first promoted to the signed equivalent of the larger type, then converted to the unsigned value.

When the following example is compiled as a 64-bit program, the addr variable becomes sign-extended, even though both addr and a.base are unsigned types.

% cat test.c
#include <stdio.h>

struct foo {
    unsigned int base:19, rehash:13;
};

int main(int argc, char *argv[])
{
    struct foo a;
    unsigned long addr;

    a.base = 0x40000;
    addr = a.base << 13; /* Sign extension here! */
    printf("addr 0x%lx\n", addr);

    addr = (unsigned int)(a.base << 13); /* No sign extension here! */
    printf("addr 0x%lx\n", addr);

    return 0;
}
This sign extension occurs because the conversion rules are applied as follows:

The structure member a.base is converted from an unsigned int bit field to an int because of the integral promotion rule. In other words, because the unsigned 19-bit field fits within a 32-bit integer, the bit field is promoted to an integer rather than an unsigned integer. Thus, the expression a.base << 13 is of type int. If the result were assigned to an unsigned int, this would not matter because no sign extension has yet occurred.

The expression a.base << 13 is of type int, but it is converted to a long and then to an unsigned long before being assigned to addr, because of signed and unsigned integer promotion rules. The sign extension occurs when performing the int to long conversion.

Thus, when compiled as a 64-bit program, the result is as follows:

% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%

When compiled as a 32-bit program, the size of an unsigned long is the same as the size of an int, so there is no sign extension.

% cc -o test test.c
% ./test
addr 0x80000000
addr 0x80000000
%
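The two conversion paths can be isolated in small functions. This is a sketch under the assumption of a 32-bit int; the function names are made up for illustration. The third helper shows an alternative to the cast used in test.c: widening the operand before the shift, so the shift is never evaluated in int at all.

```c
#include <limits.h>

/* Implicit widening: the int is sign-extended to long, then made unsigned. */
unsigned long widen_signed(int v)
{
    return v;
}

/* Explicit cast: made unsigned at int width first, so widening zero-extends. */
unsigned long widen_unsigned(int v)
{
    return (unsigned int)v;
}

/* Alternative: widen before shifting; count must be less than the width of long. */
unsigned long shift_wide(unsigned int base, unsigned int count)
{
    return (unsigned long)base << count;
}
```

For a negative int, widen_signed and widen_unsigned agree in the ILP32 model and differ in the LP64 model, which mirrors the two lines of output shown above.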

Check Structure Packing

Check the internal data structures in an application for holes; that is, extra padding appearing between fields in the structure to meet alignment requirements. This extra padding is allocated when long or pointer fields grow to 64 bits for the LP64 data-type model and appear after an int that remains at 32 bits in size. Since long and pointer types are 64-bit aligned in the LP64 data-type model, padding appears between the int and the long or pointer type. In the following example, member p is 64-bit aligned, and so padding appears between member k and member p.

struct bar {
    int i;
    long j;
    int k;
    char *p;
}; /* sizeof (struct bar) = 32 bytes */

Also, a structure is aligned to the alignment of its most strictly aligned member, and each member is placed at a multiple of its own alignment. Thus, in the above structure, padding also appears between member i and member j.
When you repack a structure, follow the simple rule of moving the long and pointer fields to the beginning of the structure. Consider the following structure definition:

struct bar {
    char *p;
    long j;
    int i;
    int k;
}; /* sizeof (struct bar) = 24 bytes */
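The holes can be made visible with sizeof and offsetof. The sketch below renames the two layouts to bar_padded and bar_packed (names chosen here so both can coexist in one file) and prints where each member lands.

```c
#include <stddef.h>
#include <stdio.h>

/* Original field order from the article. */
struct bar_padded {
    int   i;
    long  j;
    int   k;
    char *p;
};

/* Repacked order: pointer and long first, ints last. */
struct bar_packed {
    char *p;
    long  j;
    int   i;
    int   k;
};

/* Prints the size and member offsets of both layouts. */
void print_bar_layouts(void)
{
    printf("padded: size=%zu i@%zu j@%zu k@%zu p@%zu\n",
           sizeof(struct bar_padded),
           offsetof(struct bar_padded, i), offsetof(struct bar_padded, j),
           offsetof(struct bar_padded, k), offsetof(struct bar_padded, p));
    printf("packed: size=%zu p@%zu j@%zu i@%zu k@%zu\n",
           sizeof(struct bar_packed),
           offsetof(struct bar_packed, p), offsetof(struct bar_packed, j),
           offsetof(struct bar_packed, i), offsetof(struct bar_packed, k));
}
```

On an LP64 target the sizes should match the 32 and 24 bytes quoted in the comments above; on an ILP32 target both layouts would normally be the same size, since every member is 32 bits.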

Check for Unbalanced Size of Union Members

Be sure to check the members of unions because their fields can change size between the ILP32 and the LP64 data-type models, making the size of the members different. In the following union, member _d and member array _l are the same size in the ILP32 model, but different in the LP64 model because long types grow to 64 bits in the LP64 model, but double types do not.

typedef union {
    double _d;
    long _l[2];
} llx_t;
The size of the members can be rebalanced by changing the type of the _l array member from type long to type int.
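A sketch of the rebalanced union follows. The type name llx_int_t and the helper llx_members_match are hypothetical, and the balance holds only under the assumption that int is 32 bits and double is 64 bits, which is true in both ILP32 and LP64.

```c
/* Rebalanced union: the array element type is int instead of long. */
typedef union {
    double _d;
    int    _l[2];
} llx_int_t;

/* Returns 1 if both members occupy the same number of bytes. */
int llx_members_match(void)
{
    llx_int_t u;
    return sizeof(u._d) == sizeof(u._l);
}
```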

Make Sure Constant Types are Used in Constant Expressions

A lack of precision can cause the loss of data in some constant expressions. Be explicit when you specify the data types in your constant expression. Specify the type of each integer constant by adding some combination of {u,U,l,L}. You can also use casts to specify the type of a constant expression. Consider the following example:

int i = 32;
long j = 1 << i; /* RHS is a 32-bit int expression, so j does not get 0x100000000 */

The above code can be made to work as intended in the LP64 data-type model by appending the type to the constant, 1, as follows:

int i = 32;
long j = 1L << i; /* now j will get 0x100000000, as intended */
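Wrapped in a function, the corrected expression might look like the sketch below. The name long_bit is made up for illustration, and the assert documents that the shift count must stay below the width of long, so a count of 32 is only meaningful where long is 64 bits.

```c
#include <assert.h>
#include <limits.h>

/* Returns the long value that has only bit i set. */
long long_bit(int i)
{
    /* Shifting by the full width, or into the sign bit, is not valid. */
    assert(i >= 0 && i < (int)(sizeof(long) * CHAR_BIT) - 1);
    return 1L << i;   /* 1L makes the whole shift happen at long width */
}
```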

Check Format String Conversions

Make sure the format strings for printf(3S), sprintf(3S), scanf(3S), and sscanf(3S) can accommodate long or pointer arguments. For pointer arguments, the conversion operation given in the format string should be %p to work in both the 32-bit and 64-bit compilation environments. For long arguments, the long size specification, l, should be prepended to the conversion operation character in the format string.
Also, check to be sure that buffers passed as the first argument to sprintf contain enough storage to accommodate the expanded number of digits used to convey long and pointer values. For example, a pointer is expressed by 8 hex digits in the ILP32 data model but expands to 16 in the LP64 data model.
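The following sketch formats a long and a pointer into a caller-supplied buffer. It uses snprintf rather than sprintf so that an undersized buffer is truncated instead of overrun; the helper names and the buffer size macro are assumptions for illustration.

```c
#include <stdio.h>

/* Room for "0x" plus 16 hex digits, or a sign plus 19 decimal digits, plus '\0'. */
#define VALUE_BUF_SIZE 32

/* Formats a long with the l size specification; returns the length written. */
int format_long(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "%ld", value);
}

/* Formats a pointer with %p, which works in both data models. */
int format_pointer(char *buf, size_t size, void *ptr)
{
    return snprintf(buf, size, "%p", ptr);
}
```

The exact text produced by %p is implementation-defined, so code should not depend on its length beyond sizing the buffer generously.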

Type Returned by sizeof() Operator is an unsigned long

In the LP64 data-type model, sizeof() has the effective type of an unsigned long. If sizeof() is passed to a function expecting an argument of type int, or assigned or cast to an int, the truncation could cause a loss of data. This is only likely to be problematic in large database programs containing extremely long arrays.
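Where a size has to cross an int-based interface, a small guard can confirm that nothing is lost. This is a minimal sketch; size_fits_int is a hypothetical name.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if n can be converted to int without truncation, 0 otherwise. */
int size_fits_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}
```

A caller would test size_fits_int(sizeof(obj)) before casting, and otherwise keep the value in a size_t.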

Use Portable Data Types or Fixed Integer Types for Binary Interface Data

For data structures that are shared between 32-bit and 64-bit versions of an application, stick with data types that have a common size between ILP32 and LP64 programs. Avoid using long data types and pointers. Also, avoid using derived data types that change in size between 32-bit and 64-bit applications. For example, the following types defined in <sys/types.h> change in size between the ILP32 and LP64 data models:

clock_t, which represents the system time in clock ticks
dev_t, which is used for device numbers
off_t, which is used for file sizes and offsets
ptrdiff_t, which is the signed integral type for the result of subtracting two pointers
size_t, which reflects the size, in bytes, of objects in memory
ssize_t, which is used by functions that return a count of bytes or an error indication
time_t, which counts time in seconds

Using the derived data types in <sys/types.h> is a good idea for internal data, because it helps to insulate the code from data-model changes. However, precisely because the sizes of these types are prone to change with the data model, using them is not recommended in data that is shared between 32-bit and 64-bit applications, or in other situations where the data size must remain fixed. Nevertheless, as with the sizeof() operator discussed above, before making any changes to the code, consider whether the loss of precision will actually have any practical impact on the program.
For binary interface data, consider using the fixed-width integer types in <inttypes.h>. These types are good for explicit binary representations of the following:

Binary interface specifications
On-disk data
Data sent over the wire
Hardware registers
Binary data structures
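As an illustration, a record shared between 32-bit and 64-bit builds could be declared with fixed-width fields only. The structure below is hypothetical; its field names are not from the article. Fixed widths settle the size and layout, but byte order would still have to be agreed on separately.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record header: every field has a fixed width. */
struct disk_record {
    uint32_t magic;    /* format identifier */
    uint32_t flags;
    int64_t  offset;   /* used instead of off_t */
    int64_t  mtime;    /* used instead of time_t */
};

/* Size of the record as laid out by this compiler. */
size_t disk_record_size(void)
{
    return sizeof(struct disk_record);
}
```

The 32-bit fields are paired ahead of the 64-bit ones so that no padding is needed in either data model.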

Check for Side Effects

Be aware that a type change in one area can result in an unexpected 64-bit conversion in another area. For example, check all the callers of a function that previously returned an int and now returns an ssize_t.

Consider the Effect of long Arrays on Performance

Large arrays of long or unsigned long types can cause serious performance degradation in the LP64 data-type model as compared to arrays of int or unsigned int types. Large arrays of long types cause significantly more cache misses and consume more memory. Therefore, if int works just as well as long for the application's purposes, it is better to use int rather than long. This is also an argument for using arrays of int types instead of arrays of pointers. Some C applications suffer from serious performance degradation after conversion to the LP64 data-type model because they rely on many large arrays of pointers.
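The memory cost is simple arithmetic, sketched below. The helper names are made up, and the 64-byte cache line is an assumption used only to show how many elements share one line.

```c
#include <stddef.h>

/* Assumed cache line size in bytes, for illustration only. */
#define EXAMPLE_CACHE_LINE 64

/* Bytes occupied by an array of n ints. */
size_t int_array_bytes(size_t n)
{
    return n * sizeof(int);
}

/* Bytes occupied by an array of n longs. */
size_t long_array_bytes(size_t n)
{
    return n * sizeof(long);
}

/* Number of elements of the given size that fit in one assumed cache line. */
size_t elems_per_cache_line(size_t elem_size)
{
    return EXAMPLE_CACHE_LINE / elem_size;
}
```

With a 32-bit int and a 64-bit long, the long array takes twice the memory and each cache line holds half as many elements.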