您的位置:首页 > 其它

.NetCore 解析Zip - 源码解读 ZipFile

2020-07-07 16:27 555 查看

摘要

在之前写的内容中涉及到的内容比较高层次简单,本篇文章更深一步分析ZipFile如何解析zip压缩包中具体文件,然后进行读取解压。

1、.NET Core 文件路径解决方法,统一Linux Window
2、 ZipFIle解压原理

解析主流程

0 获取压缩包对象文档

在上篇文章中说到首先是读取压缩包文件,然后讲压缩包转为对象

ZipArchive

using (ZipArchive archive = ZipFile.Open(sourceArchiveFileName, ZipArchiveMode.Read, entryNameEncoding))
{
archive.ExtractToDirectoryExtension(destinationDirectoryName, overwriteFiles);
}

1 初始化压缩文档的处理模式

ZipFile.Open(string archiveFileName, ZipArchiveMode mode, Encoding? entryNameEncoding) 方法中

  1. 声明文件操作模式为打开(打开、创建、截取等)
  2. 声明对文件的访问方式为读操作(读、写、读并写)
  3. 共享权限
    详细查看读取文件流
    FileStream(String, FileMode, FileAccess, FileShare, Int32, Boolean)

    使用指定的路径、创建模式、读/写和共享权限、缓冲区大小和同步或异步状态初始化 FileStream 类的新实例。
    参考资料:https://docs.microsoft.com/zh-cn/dotnet/api/system.io.filestream?view=netcore-3.1

2 初始化压缩包对象的数据

接下来就是创建ZipArchive对象

public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding? entryNameEncoding)

  1. 打开压缩文件为文件流后转为二进制流,供解析Zip数据量用
  2. 初始化压缩文档集合变量,供获取压缩条目用于储存到本地
  3. 初始化压缩文档字典,根据文档名称获取压缩文档
  4. 其他初始化工作
  5. 读取压缩包核心目录结束标记,用于计算压缩头目录开始位置
public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding? entryNameEncoding)
{
if (stream == null)
throw new ArgumentNullException(nameof(stream));

EntryNameEncoding = entryNameEncoding;
Stream? extraTempStream = null;

try
{
//省略代码
//初始化 ZipArchive 数据
//省略代码
_entriesCollection = new ReadOnlyCollection<ZipArchiveEntry>(_entries);

switch (mode)
{
//省略代码
//。。。
//省略代码
case ZipArchiveMode.Read:
//读取中央目录结束标记
ReadEndOfCentralDirectory();
break;
//省略代码
//。。。
//省略代码
}
}
catch
{
if (extraTempStream != null)
extraTempStream.Dispose();

throw;
}
}

方法

ReadEndOfCentralDirectory

private void ReadEndOfCentralDirectory()
{
// This seeks backwards almost to the beginning of the EOCD, one byte after where the signature would be
// located if the EOCD had the minimum possible size (no file zip comment)
// 向后查找到EOCD的开头,即签名所在位置之后的一个字节
// 查找EOCD具有可能的最小值(没有文件zip注释)
_archiveStream.Seek(-ZipEndOfCentralDirectoryBlock.SizeOfBlockWithoutSignature, SeekOrigin.End);

// If the EOCD has the minimum possible size (no zip file comment), then exactly the previous 4 bytes will contain the signature
// But if the EOCD has max possible size, the signature should be found somewhere in the previous 64K + 4 bytes
//如果EOCD具有尽可能小的大小(没有zip文件注释),那么前面的4个字节将恰好包含签名
//但是如果EOCD有最大的可能大小,签名应该在前面的64K + 4字节中找到
if (!ZipHelper.SeekBackwardsToSignature(_archiveStream,
ZipEndOfCentralDirectoryBlock.SignatureConstant,
ZipEndOfCentralDirectoryBlock.ZipFileCommentMaxLength + ZipEndOfCentralDirectoryBlock.SignatureSize))
throw new InvalidDataException();

//省略代码
//。。。
//省略代码
}

ZipHelper.SeekBackwardsToSignature

internal static bool SeekBackwardsToSignature(Stream stream, uint signatureToFind, int maxBytesToRead)
{
int bufferPointer = 0;
bool signatureFound = false;

//省略代码
//计算核心目录头的偏移量
//省略代码

if (!signatureFound)
{
return false;
}
else
{
//将数据流设置到当前流中‘核心目录头’的位置。
stream.Seek(bufferPointer, SeekOrigin.Current);
return true;
}
}

3 读取压缩包中文档集合

循环读取压缩包中文档

foreach (ZipArchiveEntry entry in source.Entries)
{
entry.ExtractRelativeToDirectoryExtension(destinationDirectoryName, overwriteFiles);
}

读取ZipArchive属性

Entries

public ReadOnlyCollection<ZipArchiveEntry> Entries
{
get
{
if (_mode == ZipArchiveMode.Create)
throw new NotSupportedException();

ThrowIfDisposed();

//确保已经读取过核心目录头,核心目录头:Zip数据协议格式中项,后面补上
//
EnsureCentralDirectoryRead();
return _entriesCollection;
}
}

private void EnsureCentralDirectoryRead()
{
//是否未过读取压缩文档
if (!_readEntries)
{
//没有读取过则开始读取,并转换未ZipArchiveEntry集合
ReadCentralDirectory();
_readEntries = true;
}
}
private void ReadCentralDirectory()
{
try
{
// assume ReadEndOfCentralDirectory has been called and has populated _centralDirectoryStart
//设置当前文件流到核心目录开始的位置
_archiveStream.Seek(_centralDirectoryStart, SeekOrigin.Begin);

long numberOfEntries = 0;

Debug.Assert(_archiveReader != null);
//read the central directory
ZipCentralDirectoryFileHeader currentHeader;
bool saveExtraFieldsAndComments = Mode == ZipArchiveMode.Update;
//循环读取文件核心头
while (ZipCentralDirectoryFileHeader.TryReadBlock(_archiveReader,
saveExtraFieldsAndComments, out currentHeader))
{
//转为ZipArchiveEntry 并添加到集合中
AddEntry(new ZipArchiveEntry(this, currentHeader));
numberOfEntries++;
}

if (numberOfEntries != _expectedNumberOfEntries)
throw new InvalidDataException();
}
catch (EndOfStreamException ex)
{
throw new InvalidDataException();
}
}

3.1 读取压缩目录文件头

public static bool TryReadBlock(BinaryReader reader, bool saveExtraFieldsAndComments, out ZipCentralDirectoryFileHeader header)
{
header = default;

if (reader.ReadUInt32() != SignatureConstant)
return false;
header.VersionMadeBySpecification = reader.ReadByte();
header.VersionMadeByCompatibility = reader.ReadByte();
header.VersionNeededToExtract = reader.ReadUInt16();
header.GeneralPurposeBitFlag = reader.ReadUInt16();
header.CompressionMethod = reader.ReadUInt16();
header.LastModified = reader.ReadUInt32();
header.Crc32 = reader.ReadUInt32();
uint compressedSizeSmall = reader.ReadUInt32();
uint uncompressedSizeSmall = reader.ReadUInt32();
header.FilenameLength = reader.ReadUInt16();
header.ExtraFieldLength = reader.ReadUInt16();
header.FileCommentLength = reader.ReadUInt16();
ushort diskNumberStartSmall = reader.ReadUInt16();
header.InternalFileAttributes = reader.ReadUInt16();
header.ExternalFileAttributes = reader.ReadUInt32();
uint relativeOffsetOfLocalHeaderSmall = reader.ReadUInt32();

//代码省略,读取其他信息

return true;
}
}

文件核心目录头

ZipCentralDirectoryFileHeader
数据格式

字段 类型 含义
SignatureConstant const uint 核心目录头签名
VersionMadeByCompatibility byte 版本规约
VersionMadeBySpecification byte 版本兼容性
VersionNeededToExtract ushort 解压版本
GeneralPurposeBitFlag ushort 通用为编辑
CompressionMethod ushort 压缩方法
LastModified uint 最后修改时间
Crc32 uint crc-校验码
CompressedSize long 压缩后大小
UncompressedSize long 压缩前数据大小
FilenameLength ushort 文件名长度
ExtraFieldLength ushort 扩展域长
FileCommentLength ushort 文件注释长
DiskNumberStart int 文件磁盘开始位置
InternalFileAttributes ushort 内部文件属性
ExternalFileAttributes uint 外部文件属性
RelativeOffsetOfLocalHeader long
相对于本地头的偏移量,用于后面计算读取文件内容的起始位置
Filename byte[] 文件名称
FileComment byte[] 文件注释
ExtraFields List<ZipGenericExtraField> 文件扩展域

3.2 转为压缩文档

while (ZipCentralDirectoryFileHeader.TryReadBlock(_archiveReader,
saveExtraFieldsAndComments, out currentHeader))
{
//转为ZipArchiveEntry 并添加到集合中
AddEntry(new ZipArchiveEntry(this, currentHeader));
numberOfEntries++;
}
->
internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
{
_archive = archive;

_originallyInArchive = true;

_diskNumberStart = cd.DiskNumberStart;
_versionMadeByPlatform = (ZipVersionMadeByPlatform)cd.VersionMadeByCompatibility;
_versionMadeBySpecification = (ZipVersionNeededValues)cd.VersionMadeBySpecification;
_versionToExtract = (ZipVersionNeededValues)cd.VersionNeededToExtract;
//省略代码
//初始化 ZipArchiveEntry
}

4 返回文档集合

前面步骤处理完成后返回文档集合

5 遍历文档条目集合

public static void ExtractToFileExtension(this ZipArchiveEntry source, string destinationFileName, bool overwrite)
{
//省略代码
using (Stream fs = new FileStream(destinationFileName, fMode, FileAccess.Write, FileShare.None, bufferSize: 0x1000, useAsync: false))
{
//source.Open 根据文档条目目录文件头信息读取数据
using (Stream es = source.Open())
es.CopyTo(fs);  //保存
}

//设置最后修改时间
File.SetLastWriteTime(destinationFileName, source.LastWriteTime.DateTime);
}

6 读取数据

public Stream Open()
{
ThrowIfInvalidArchive();

switch (_archive.Mode)
{
case ZipArchiveMode.Read:
//我们只看这行
return OpenInReadMode(checkOpenable: true);
case ZipArchiveMode.Create:
return OpenInWriteMode();
case ZipArchiveMode.Update:
default:
Debug.Assert(_archive.Mode == ZipArchiveMode.Update);
return OpenInUpdateMode();
}
}
private Stream OpenInReadMode(bool checkOpenable)
{
//省略代码

//_archive.ArchiveStream 压缩包二进制流
//OffsetOfCompressedData 当前文件目录相对于头的偏移量,也就是文件内容读取的开始位置
//_compressedSize 文档内容长度,这个长度是压缩后的长度
Stream compressedStream = new SubReadStream(_archive.ArchiveStream, OffsetOfCompressedData, _compressedSize);
return GetDataDecompressor(compressedStream);
}

//开始读取
private Stream GetDataDecompressor(Stream compressedStreamToRead)
{
Stream? uncompressedStream = null;
//CompressionMethod 压缩方式,根据不同的压缩算法获取文件内容
// Stored = 0x0, Deflate = 0x8, Deflate64 = 0x9, BZip2 = 0xC, LZMA = 0xE
switch (CompressionMethod)
{
case CompressionMethodValues.Deflate:
uncompressedStream = new DeflateStream(compressedStreamToRead, CompressionMode.Decompress, _uncompressedSize);
break;
case CompressionMethodValues.Deflate64:
uncompressedStream = new DeflateManagedStream(compressedStreamToRead, CompressionMethodValues.Deflate64, _uncompressedSize);
break;
case CompressionMethodValues.Stored:
default:
// we can assume that only deflate/deflate64/stored are allowed because we assume that
// IsOpenable is checked before this function is called
Debug.Assert(CompressionMethod == CompressionMethodValues.Stored);

uncompressedStream = compressedStreamToRead;
break;
}

return uncompressedStream;
}

读取到文件内容后则可以进行保存工作

总结

整个解析Zip过程比较清晰,但是内容较多需要慢慢消化,而

Zip
数据协议格式接下来会补上。

补充
查看Zip数据协议格式

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: