
The Dex File Structure of an APK and How Its Method Count Is Computed

2017-07-06 18:58

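A brief introduction to the dex file structure

Android does not use the Class file format directly. Early Android phones had little memory and storage, and the Class format leaves obvious room for optimization. For example, every Class file has its own constant pool that stores strings, so an identical string may well appear in the constant pools of many different Class files. Dex and Class files differ in plenty of other ways too, but they carry the same information. So if you understand the Class file (the format standardized by the official Java VM spec), the Dex file is just a variant of it. As for how deep to go: unless you plan to parse Dex files yourself, or decompile or modify them, I think a rough understanding of the Dex structure is enough. Figure 1 shows an overview of the Dex file structure: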



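One point needs explaining: the traditional Java toolchain produces a separate .class file for each class, whereas Android merges and optimizes all Class files into a single final classes.dex. Strings that are duplicated across several Class files therefore need to be stored only once in the dex file.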

The magic value in the dex header is “dex\n035\0”.
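As a small illustration of the magic value, the sketch below checks the first 8 bytes of a file and extracts the three-digit version. The class name DexMagicCheck and the method dexVersion are hypothetical names made up for this example; they are not part of dexdump or dex-method-counts.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class DexMagicCheck {

    /** Returns the version digits (e.g. "035"), or null if the bytes are not a dex magic. */
    static String dexVersion(byte[] header) {
        if (header == null || header.length < 8) {
            return null;
        }
        // Bytes 0-3 must be 'd' 'e' 'x' '\n' and byte 7 must be the terminating NUL.
        if (header[0] != 'd' || header[1] != 'e' || header[2] != 'x'
                || header[3] != '\n' || header[7] != 0) {
            return null;
        }
        // Bytes 4-6 are the ASCII version digits.
        for (int i = 4; i < 7; i++) {
            if (header[i] < '0' || header[i] > '9') {
                return null;
            }
        }
        return new String(header, 4, 3, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] magic = {0x64, 0x65, 0x78, 0x0a, 0x30, 0x33, 0x35, 0x00};
        System.out.println(dexVersion(magic)); // prints 035
    }
}
```

Accepting any three digits rather than only "035" is a design choice of this sketch, since newer dex files carry higher version numbers.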

proto_ids: describes method prototypes, that is, the return type and the parameter types. For example, the “()V” part of “test:()V”.

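method_ids: describes methods, including the owning class and the corresponding proto. For example, in “Lcom.test.TestMain. test:()V”, the part before the “.” separator is the class, and the part after it is the method name together with its proto.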
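To make the relationship concrete: in the dex format each method_ids entry is 8 bytes, made of a 16-bit class index into type_ids, a 16-bit prototype index into proto_ids, and a 32-bit name index into string_ids. The following is a minimal sketch of decoding one entry, assuming little-endian data; the class MethodIdSketch is a hypothetical name used only here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical value class for illustration; not taken from dex-method-counts.
final class MethodIdSketch {
    final int classIdx; // index into type_ids (the owning class)
    final int protoIdx; // index into proto_ids (return type and parameters)
    final int nameIdx;  // index into string_ids (the method name)

    MethodIdSketch(int classIdx, int protoIdx, int nameIdx) {
        this.classIdx = classIdx;
        this.protoIdx = protoIdx;
        this.nameIdx = nameIdx;
    }

    /** Decodes the 8-byte entry that starts at the given offset (little-endian assumed). */
    static MethodIdSketch read(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int classIdx = buf.getShort(offset) & 0xffff;     // unsigned 16-bit
        int protoIdx = buf.getShort(offset + 2) & 0xffff; // unsigned 16-bit
        int nameIdx = buf.getInt(offset + 4);
        return new MethodIdSketch(classIdx, protoIdx, nameIdx);
    }
}
```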

You can analyze a Dex file with the dexdump tool; dexdump.exe is located under build-tools in the Android SDK.

First, look at dexdump's command-line arguments:

dexdump: [-c] [-d] [-f] [-h] [-i] [-l layout] [-m] [-t tempfile] dexfile…

-c : verify checksum and exit
-d : disassemble code sections
-f : display summary information from file header
-h : display file header details
-i : ignore checksum failures
-l : output layout, either ‘plain’ or ‘xml’
-m : dump register maps (and nothing else)
-t : temp file name (defaults to /sdcard/dex-temp-*)


The Header structure of a dex file, as printed with the -h option, is shown in the figure below:



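So how do you count the methods in a dex file? The tool most commonly used for this today is dex-method-counts.

Let's walk through the dex-method-counts source code:

First it parses the input arguments, which yields the dex or APK files to process.

It then loops over each APK or dex file.

An APK file is opened as a zip; every classes*.dex entry in it is read out as a RandomAccessFile and added to a list. A .dex file is opened directly as a RandomAccessFile and added to the same list.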
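The zip step can be sketched as follows. This is not the tool's own code: ApkDexLister, isDexEntry and listDexEntries are hypothetical names, and the sketch only collects the matching entry names instead of turning each entry into a RandomAccessFile.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
final class ApkDexLister {

    /** True for top-level entries named classes.dex, classes2.dex, classes3.dex, ... */
    static boolean isDexEntry(String entryName) {
        return entryName.matches("classes\\d*\\.dex");
    }

    /** Opens the APK as a zip archive and returns the names of its dex entries, sorted. */
    static List<String> listDexEntries(String apkPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(apkPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isDexEntry(name)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Matching the whole entry name keeps files such as lib/classes.dex or classes.dex.bak out of the result.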

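It then loops over each RandomAccessFile.

Each Dex file is parsed into a DexData structure; the main parsing logic lives in the load method of the DexData class.

The load method calls the following methods: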

parseHeaderItem();
loadStrings();
loadTypeIds();
loadProtoIds();
loadFieldIds();
loadMethodIds();
loadClassDefs();
markInternalClasses();


parseHeaderItem

This parses the dex header. The tool defines a static nested class, HeaderItem, whose fields mirror the information in the Dex header:

/**
 * Holds the contents of a header_item.
 */
static class HeaderItem {
    public int fileSize;
    public int headerSize;
    public int endianTag;
    public int stringIdsSize, stringIdsOff;
    public int typeIdsSize, typeIdsOff;
    public int protoIdsSize, protoIdsOff;
    public int fieldIdsSize, fieldIdsOff;
    public int methodIdsSize, methodIdsOff;
    public int classDefsSize, classDefsOff;

    /* expected magic values */
    public static final byte[] DEX_FILE_MAGIC = {
        0x64, 0x65, 0x78, 0x0a, 0x30, 0x33, 0x36, 0x00 };
    public static final byte[] DEX_FILE_MAGIC_API_13 = {
        0x64, 0x65, 0x78, 0x0a, 0x30, 0x33, 0x35, 0x00 };
    public static final int ENDIAN_CONSTANT = 0x12345678;
    public static final int REVERSE_ENDIAN_CONSTANT = 0x78563412;
}


The magic is read from the first 8 bytes. Then
seek(8+4+20);  // magic, checksum, signature
jumps to the file_size field of the Dex file, where parsing of the HeaderItem begins.

mHeaderItem.fileSize = readInt();
mHeaderItem.headerSize = readInt();
/*mHeaderItem.endianTag =*/ readInt();
/*mHeaderItem.linkSize =*/ readInt();
/*mHeaderItem.linkOff =*/ readInt();
/*mHeaderItem.mapOff =*/ readInt();
mHeaderItem.stringIdsSize = readInt();
mHeaderItem.stringIdsOff = readInt();
mHeaderItem.typeIdsSize = readInt();
mHeaderItem.typeIdsOff = readInt();
mHeaderItem.protoIdsSize = readInt();
mHeaderItem.protoIdsOff = readInt();
mHeaderItem.fieldIdsSize = readInt();
mHeaderItem.fieldIdsOff = readInt();
mHeaderItem.methodIdsSize = readInt();
mHeaderItem.methodIdsOff = readInt();
mHeaderItem.classDefsSize = readInt();
mHeaderItem.classDefsOff = readInt();
/*mHeaderItem.dataSize =*/ readInt();
/*mHeaderItem.dataOff =*/ readInt();


/**
 * Reads a signed 32-bit integer, byte-swapping if necessary.
 */
int readInt() throws IOException {
    mDexFile.readFully(tmpBuf, 0, 4);

    if (isBigEndian) {
        return (tmpBuf[3] & 0xff) | ((tmpBuf[2] & 0xff) << 8) |
               ((tmpBuf[1] & 0xff) << 16) | ((tmpBuf[0] & 0xff) << 24);
    } else {
        return (tmpBuf[0] & 0xff) | ((tmpBuf[1] & 0xff) << 8) |
               ((tmpBuf[2] & 0xff) << 16) | ((tmpBuf[3] & 0xff) << 24);
    }
}
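The same byte-order handling can be expressed with java.nio.ByteBuffer, which may be easier to follow than the manual shifts. The sketch below is an alternative formulation rather than the tool's code; EndianSketch and readIntAt are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class EndianSketch {

    /** Reads a signed 32-bit integer at the given offset in the requested byte order. */
    static int readIntAt(byte[] data, int offset, boolean bigEndian) {
        ByteOrder order = bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        return ByteBuffer.wrap(data).order(order).getInt(offset);
    }

    public static void main(String[] args) {
        // The endian tag as it is laid out in a little-endian dex file.
        byte[] tag = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readIntAt(tag, 0, false))); // prints 12345678
        System.out.println(Integer.toHexString(readIntAt(tag, 0, true)));  // prints 78563412
    }
}
```

The two results correspond to ENDIAN_CONSTANT and REVERSE_ENDIAN_CONSTANT in HeaderItem: decoding the tag in the wrong byte order yields the reversed constant.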


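Once the header has been parsed, the method count and field count of the dex are already known. The method count is stored in methodIdsSize. It is the number of entries in the method_ids table, so it covers every method the dex references, not only the ones it defines.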
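Because every header field read above is 4 bytes wide, the position of each count follows from the read order: file_size sits at offset 32, field_ids_size at offset 80, and method_ids_size at offset 88. A minimal sketch that pulls the two counts straight out of the header bytes, assuming a little-endian file, could look like this (DexHeaderCounts is a hypothetical name):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; assumes a little-endian dex header.
final class DexHeaderCounts {
    static final int FIELD_IDS_SIZE_OFFSET = 80;  // 32 + 12 * 4
    static final int METHOD_IDS_SIZE_OFFSET = 88; // 32 + 14 * 4

    /** Number of entries in the field_ids table. */
    static int fieldIdsSize(byte[] header) {
        return intAt(header, FIELD_IDS_SIZE_OFFSET);
    }

    /** Number of entries in the method_ids table. */
    static int methodIdsSize(byte[] header) {
        return intAt(header, METHOD_IDS_SIZE_OFFSET);
    }

    private static int intAt(byte[] header, int offset) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }
}
```

To try it on a real file, the first 112 bytes of a classes.dex (for example, taken from the result of java.nio.file.Files.readAllBytes) can be passed in as the header.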

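To analyze the classes in the Dex in more detail, the tool then prints the method counts of all classes as a tree organized by package name.

Strings are loaded as follows: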

void loadStrings() throws IOException {
    int count = mHeaderItem.stringIdsSize;
    int stringOffsets[] = new int[count];

    //System.out.println("reading " + count + " strings");

    seek(mHeaderItem.stringIdsOff);
    for (int i = 0; i < count; i++) {
        stringOffsets[i] = readInt();
    }

    mStrings = new String[count];

    seek(stringOffsets[0]);
    for (int i = 0; i < count; i++) {
        seek(stringOffsets[i]);         // should be a no-op
        mStrings[i] = readString();
        //System.out.println("STR: " + i + ": " + mStrings[i]);
    }
}


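This method is simple: starting at mHeaderItem.stringIdsOff, it first reads the offsets of all strings, then loops again to load every string into an array.

The other load methods work the same way as the one above, so they are not described in detail here.

What does need extra explanation is how internal classes are marked: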
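The readString call is not shown in the excerpt above. In the dex format, each string offset points at a string_data_item: a ULEB128-encoded length in UTF-16 code units, followed by the string bytes and a terminating NUL. The sketch below shows roughly what such a reader has to do. It is not the tool's implementation, StringDataSketch is a hypothetical name, and it decodes the bytes as plain UTF-8, which matches the dex MUTF-8 encoding only for ordinary text without embedded NULs or supplementary characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class StringDataSketch {

    /** Decodes an unsigned LEB128 value; returns {value, offset of the next byte}. */
    static int[] readUleb128(byte[] data, int offset) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = data[offset++] & 0xff;
            result |= (b & 0x7f) << shift; // the low 7 bits carry the payload
            shift += 7;
        } while ((b & 0x80) != 0);         // the high bit means "more bytes follow"
        return new int[] {result, offset};
    }

    /** Reads the NUL-terminated string that follows the ULEB128 length prefix. */
    static String readStringData(byte[] data, int offset) {
        int start = readUleb128(data, offset)[1]; // skip the length prefix
        int end = start;
        while (data[end] != 0) {
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }
}
```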

/**
 * Sets the "internal" flag on type IDs which are defined in the
 * DEX file or within the VM (e.g. primitive classes and arrays).
 */
void markInternalClasses() {
    for (int i = mClassDefs.length -1; i >= 0; i--) {
        mTypeIds[mClassDefs[i].classIdx].internal = true;
    }

    for (int i = 0; i < mTypeIds.length; i++) {
        String className = mStrings[mTypeIds[i].descriptorIdx];

        if (className.length() == 1) {
            // primitive class
            mTypeIds[i].internal = true;
        } else if (className.charAt(0) == '[') {
            mTypeIds[i].internal = true;
        }

        //System.out.println(i + " " +
        //    (mTypeIds[i].internal ? "INTERNAL" : "external") + " - " +
        //    mStrings[mTypeIds[i].descriptorIdx]);
    }
}


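The internal flag records which types are internal, meaning they are defined in this DEX file or provided by the VM (primitive types and arrays). These are not inner classes in the Java sense.

Next, the DexData object is used to generate the structured report. A command-line argument selects whether field data or method data is reported. Taking method data as an example: DexMethodCounts is a subclass of DexCount.

DexCount is shown below: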
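The rule in markInternalClasses can be restated as a small predicate over type descriptors. The sketch below is a simplified restatement, not the tool's code: InternalTypeSketch is a hypothetical name, and the set of descriptors defined in the dex is passed in directly instead of being derived from class_defs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only.
final class InternalTypeSketch {

    /**
     * A type counts as internal if it is defined in the dex itself, is a primitive
     * (single-character descriptor such as "I" or "V"), or is an array (leading '[').
     */
    static boolean isInternal(String descriptor, Set<String> definedInDex) {
        if (definedInDex.contains(descriptor)) {
            return true;
        }
        return descriptor.length() == 1
                || (!descriptor.isEmpty() && descriptor.charAt(0) == '[');
    }

    public static void main(String[] args) {
        Set<String> defined = new HashSet<>(Arrays.asList("Lcom/test/TestMain;"));
        System.out.println(isInternal("Lcom/test/TestMain;", defined)); // prints true
        System.out.println(isInternal("Ljava/lang/String;", defined));  // prints false
    }
}
```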

public abstract class DexCount {

    static final PrintStream out = System.out;
    final OutputStyle outputStyle;
    final Node packageTree;
    final Map<String, IntHolder> packageCount;
    int overallCount = 0;

    DexCount(OutputStyle outputStyle) {
        this.outputStyle = outputStyle;
        packageTree = this.outputStyle == OutputStyle.TREE ? new Node() : null;
        packageCount = this.outputStyle == OutputStyle.FLAT
                ? new TreeMap<String, IntHolder>() : null;
    }

    public abstract void generate(
            DexData dexData, boolean includeClasses, String packageFilter, int maxDepth, Filter filter);

    class IntHolder {

        int value;
    }

    enum Filter {
        ALL,
        DEFINED_ONLY,
        REFERENCED_ONLY
    }

    enum OutputStyle {
        TREE {
            @Override
            void output(DexCount counts) {
                counts.packageTree.output("");
            }
        },
        FLAT {
            @Override
            void output(DexCount counts) {
                for (Map.Entry<String, IntHolder> e : counts.packageCount.entrySet()) {
                    String packageName = e.getKey();
                    if (packageName == "") {
                        packageName = "<no package>";
                    }
                    System.out.printf("%6s %s\n", e.getValue().value, packageName);
                }
            }
        };

        abstract void output(DexCount counts);
    }

    void output() {
        outputStyle.output(this);
    }

    int getOverallCount() {
        return overallCount;
    }

    static class Node {

        int count = 0;
        NavigableMap<String, Node> children = new TreeMap<String, Node>();

        void output(String indent) {
            if (indent.length() == 0) {
                out.println("<root>: " + count);
            }
            indent += "    ";
            for (String name : children.navigableKeySet()) {
                Node child = children.get(name);
                out.println(indent + name + ": " + child.count);
                child.output(indent);
            }
        }
    }

}


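It mainly defines Node, which stores the package-name information of the methods as a multi-way tree. DexCount holds the root of that tree, Node packageTree, and defines the enum OutputStyle, which provides two output styles: TREE and FLAT.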
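The multi-way tree is easier to picture with a stripped-down version. The sketch below keeps only the counting idea: each segment of a package name becomes a child node, and every node on the path is incremented. PackageTreeSketch and its add method are hypothetical names, and the maxDepth limit and the default-package handling of the real tool are left out.

```java
import java.util.TreeMap;

// Hypothetical, simplified model of the package tree; for illustration only.
final class PackageTreeSketch {

    static final class Node {
        int count = 0;
        final TreeMap<String, Node> children = new TreeMap<>();
    }

    /** Adds one method declared in the given package, incrementing every node on the path. */
    static void add(Node root, String packageName) {
        Node node = root;
        node.count++;
        for (String piece : packageName.split("\\.")) {
            node = node.children.computeIfAbsent(piece, key -> new Node());
            node.count++;
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, "com.test");
        add(root, "com.test");
        add(root, "com.other");
        System.out.println(root.count);                    // prints 3
        System.out.println(root.children.get("com").count); // prints 3
    }
}
```

Using a TreeMap for the children keeps sibling packages sorted by name, which is what makes the printed tree stable from run to run.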

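The generate method of DexMethodCounts first takes the MethodRef entries that DexData parsed with the help of the header, filters them according to the command-line arguments, and returns them (with no filter, they are returned unchanged). It then iterates over the MethodRef array: if outputStyle is TREE, a Node is created along the package path of each method and added to the tree; if it is FLAT, only the package name and the number of methods under it are recorded. The code is as follows: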

public class DexMethodCounts extends DexCount {

    DexMethodCounts(OutputStyle outputStyle) {
        super(outputStyle);
    }

    @Override
    public void generate(DexData dexData, boolean includeClasses, String packageFilter, int maxDepth, Filter filter) {
        MethodRef[] methodRefs = getMethodRefs(dexData, filter);

        for (MethodRef methodRef : methodRefs) {
            String classDescriptor = methodRef.getDeclClassName();
            String packageName = includeClasses ?
                    Output.descriptorToDot(classDescriptor).replace('$', '.') :
                    Output.packageNameOnly(classDescriptor);
            if (packageFilter != null &&
                    !packageName.startsWith(packageFilter)) {
                continue;
            }
            overallCount++;
            if (outputStyle == OutputStyle.TREE) {
                String packageNamePieces[] = packageName.split("\\.");
                Node packageNode = packageTree;
                for (int i = 0; i < packageNamePieces.length && i < maxDepth; i++) {
                    packageNode.count++;
                    String name = packageNamePieces[i];
                    if (packageNode.children.containsKey(name)) {
                        packageNode = packageNode.children.get(name);
                    } else {
                        Node childPackageNode = new Node();
                        if (name.length() == 0) {
                            // This method is declared in a class that is part of the default package.
                            // Typical examples are methods that operate on arrays of primitive data types.
                            name = "<default>";
                        }
                        packageNode.children.put(name, childPackageNode);
                        packageNode = childPackageNode;
                    }
                }
                packageNode.count++;
            } else if (outputStyle == OutputStyle.FLAT) {
                IntHolder count = packageCount.get(packageName);
                if (count == null) {
                    count = new IntHolder();
                    packageCount.put(packageName, count);
                }
                count.value++;
            }
        }
    }

    private static MethodRef[] getMethodRefs(DexData dexData, Filter filter) {
        MethodRef[] methodRefs = dexData.getMethodRefs();
        out.println(methodRefs.length);
        out.println("Read in " + methodRefs.length + " method IDs.");
        if (filter == Filter.ALL) {
            return methodRefs;
        }

        ClassRef[] externalClassRefs = dexData.getExternalReferences();
        out.println("Read in " + externalClassRefs.length +
                " external class references.");
        Set<MethodRef> externalMethodRefs = new HashSet<MethodRef>();
        for (ClassRef classRef : externalClassRefs) {
            Collections.addAll(externalMethodRefs, classRef.getMethodArray());
        }
        out.println("Read in " + externalMethodRefs.size() +
                " external method references.");
        List<MethodRef> filteredMethodRefs = new ArrayList<MethodRef>();
        for (MethodRef methodRef : methodRefs) {
            boolean isExternal = externalMethodRefs.contains(methodRef);
            if ((filter == Filter.DEFINED_ONLY && !isExternal) ||
                    (filter == Filter.REFERENCED_ONLY && isExternal)) {
                filteredMethodRefs.add(methodRef);
            }
        }
        out.println("Filtered to " + filteredMethodRefs.size() + " " +
                (filter == Filter.DEFINED_ONLY ? "defined" : "referenced") +
                " method IDs.");
        return filteredMethodRefs.toArray(
                new MethodRef[filteredMethodRefs.size()]);
    }
}
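The FLAT branch boils down to a sorted map from package name to method count. The following sketch isolates that idea. It is not the tool's code: FlatCountSketch, packageOf and countByPackage are hypothetical names, packageOf is a stand-in written for this example rather than the tool's Output.packageNameOnly, and every input is assumed to be a class descriptor of the form "Lpkg/Name;".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class FlatCountSketch {

    /** Turns "Lcom/test/TestMain;" into "com.test"; the default package becomes "". */
    static String packageOf(String classDescriptor) {
        // Strip the leading 'L' and the trailing ';'.
        String name = classDescriptor.substring(1, classDescriptor.length() - 1);
        int lastSlash = name.lastIndexOf('/');
        return lastSlash < 0 ? "" : name.substring(0, lastSlash).replace('/', '.');
    }

    /** Counts one method per descriptor, grouped by package and sorted by package name. */
    static Map<String, Integer> countByPackage(List<String> declaringClassDescriptors) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String descriptor : declaringClassDescriptors) {
            counts.merge(packageOf(descriptor), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> owners = Arrays.asList("Lcom/test/A;", "Lcom/test/B;", "LFoo;");
        System.out.println(countByPackage(owners)); // prints {=1, com.test=2}
    }
}
```

Map.merge replaces the IntHolder pattern used by the tool; both approaches avoid a separate containsKey check before incrementing.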


Finally, the output method of DexCount is called, which uses OutputStyle to print the method counts by package name.