Malware Detection and Machine Learning (EMBER)
2022-05-29 17:24
EMBER
Code: https://github.com/elastic/ember
Paper: https://arxiv.org/abs/1804.04637
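Features
There are 9 feature groups, which fall into two broad categories.
Features independent of file structure
Byte histogram
Byte-entropy histogram
Printable string statistics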
{'numstrings': 3967, 'avlength': 16.07159062263675, 'printabledist': [3729,65,……], 'printables': 63756, 'entropy': 5.877838134765625, 'paths': 4, 'urls': 26, 'registry': 0, 'MZ': 11}
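The two histogram features can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not EMBER's exact extractor: the function names are hypothetical, and the window size (2048 bytes), step (1024 bytes), and 16 x 16 binning are assumed values.

```python
import numpy as np

def byte_histogram(data: bytes) -> np.ndarray:
    """Normalized count of each byte value 0-255 (256 features)."""
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    total = counts.sum()
    return counts / total if total else counts.astype(float)

def byte_entropy_histogram(data: bytes, window: int = 2048, step: int = 1024) -> np.ndarray:
    """Joint histogram of (window entropy bin, coarse byte bin), flattened to 256 features."""
    arr = np.frombuffer(data, dtype=np.uint8)
    hist = np.zeros((16, 16), dtype=np.int64)
    if len(arr) == 0:
        return hist.flatten().astype(float)
    # Slide a window over the file; a short file yields one partial window.
    for start in range(0, max(len(arr) - window, 0) + 1, step):
        block = arr[start:start + window]
        coarse = np.bincount(block >> 4, minlength=16)  # 16 coarse byte bins
        p = coarse[coarse > 0] / len(block)
        # Entropy over 16 bins is at most 4 bits; scale it to the 0-8 bit range.
        entropy = float(-(p * np.log2(p)).sum()) * 2
        row = min(int(entropy * 2), 15)  # 16 entropy bins, 0.5 bits wide
        hist[row] += coarse
    return hist.flatten() / hist.sum()

# Synthetic input: every byte value repeated evenly (not a real PE file).
sample = bytes(range(256)) * 16
bh = byte_histogram(sample)
beh = byte_entropy_histogram(sample)
```

Both functions return a fixed-length vector regardless of file size, which is what lets them be used without parsing the PE format at all.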
Features dependent on file structure
- general
- file header
- sections
- imports
- exports
- datadirectories
Each group is described below:
general
# Numeric values are used directly as feature values
{'size': 1237896, 'vsize': 1241088, 'has_debug': 1, 'exports': 0, 'imports': 314, 'has_relocations': 1, 'has_resources': 1, 'has_signature': 1, 'has_tls': 1, 'symbols': 0}
file header
COFF header and optional header
# Numeric values are kept as-is; text fields are hashed
{'coff': {'timestamp': 1639042586, 'machine': 'I386', 'characteristics': ['CHARA_32BIT_MACHINE', 'EXECUTABLE_IMAGE']},
 'optional': {'subsystem': 'WINDOWS_GUI', 'dll_characteristics': ['DYNAMIC_BASE', 'NX_COMPAT', 'TERMINAL_SERVER_AWARE'], 'magic': 'PE32', 'major_image_version': 0, 'minor_image_version': 0, 'major_linker_version': 14, 'minor_linker_version': 29, 'major_operating_system_version': 6, 'minor_operating_system_version': 0, 'major_subsystem_version': 6, 'minor_subsystem_version': 0, 'sizeof_code': 368640, 'sizeof_headers': 1024, 'sizeof_heap_commit': 4096}}
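Hashing the text fields means mapping a variable set of strings into a fixed number of bins (the hashing trick). As far as I know, EMBER relies on scikit-learn's FeatureHasher for this; the sketch below is a dependency-free stand-in, not that implementation. The helper name `hash_tokens`, the MD5-based hash, and the bin count of 10 are assumptions for illustration.

```python
import hashlib

def hash_tokens(tokens, n_bins=10):
    """Map a list of strings to a fixed-length vector via signed feature hashing."""
    vec = [0.0] * n_bins
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")
        index = h % n_bins                      # which bin the token lands in
        sign = 1.0 if (h >> 63) == 0 else -1.0  # signed hashing reduces collision bias
        vec[index] += sign
    return vec

# Values taken from the COFF header example above.
coff = {
    "timestamp": 1639042586,
    "machine": "I386",
    "characteristics": ["CHARA_32BIT_MACHINE", "EXECUTABLE_IMAGE"],
}

# Numeric fields pass through unchanged; text fields are hashed.
header_vector = (
    [float(coff["timestamp"])]
    + hash_tokens([coff["machine"]])
    + hash_tokens(coff["characteristics"])
)
```

The vector length is fixed (here 1 + 10 + 10) no matter how many characteristic flags a file sets, at the cost of occasional collisions between unrelated strings.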
sections
# Numeric values plus hashed text fields
{'entry': '.text', 'sections': [
 {'name': '.text', 'size': 368640, 'entropy': 6.463957857941052, 'vsize': 368140, 'props': ['CNT_CODE', 'MEM_EXECUTE', 'MEM_READ']},
 {'name': '.rdata', 'size': 104960, 'entropy': 4.837026560868303, 'vsize': 104760, 'props': ['CNT_INITIALIZED_DATA', 'MEM_READ']},
 {'name': '.data', 'size': 28672, 'entropy': 0.6108592144000272, 'vsize': 32760, 'props': ['CNT_INITIALIZED_DATA', 'MEM_READ', 'MEM_WRITE']},
 {'name': '.rsrc', 'size': 703488, 'entropy': 5.868256562445707, 'vsize': 703408, 'props': ['CNT_INITIALIZED_DATA', 'MEM_READ']},
 {'name': '.reloc', 'size': 22016, 'entropy': 6.754089624508025, 'vsize': 21584, 'props': ['CNT_INITIALIZED_DATA', 'MEM_DISCARDABLE', 'MEM_READ']}]}
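For the numeric part of the section features, one plausible approach is to reduce the section list to a few summary counts. The sketch below is an assumption about what such counts could look like (as far as I recall, EMBER derives similar ones); the function name `section_summary` is hypothetical. The input reuses the section names, sizes, and props from the example above, omitting the other fields for brevity.

```python
def section_summary(sections):
    """Reduce a list of section dicts to a few numeric summary counts."""
    return [
        len(sections),                                   # total number of sections
        sum(1 for s in sections if s["size"] == 0),      # sections with zero raw size
        sum(1 for s in sections if s["name"] == ""),     # sections with an empty name
        sum(1 for s in sections
            if "MEM_READ" in s["props"] and "MEM_EXECUTE" in s["props"]),  # readable and executable
        sum(1 for s in sections if "MEM_WRITE" in s["props"]),             # writable
    ]

sections = [
    {"name": ".text", "size": 368640, "props": ["CNT_CODE", "MEM_EXECUTE", "MEM_READ"]},
    {"name": ".rdata", "size": 104960, "props": ["CNT_INITIALIZED_DATA", "MEM_READ"]},
    {"name": ".data", "size": 28672, "props": ["CNT_INITIALIZED_DATA", "MEM_READ", "MEM_WRITE"]},
    {"name": ".rsrc", "size": 703488, "props": ["CNT_INITIALIZED_DATA", "MEM_READ"]},
    {"name": ".reloc", "size": 22016, "props": ["CNT_INITIALIZED_DATA", "MEM_DISCARDABLE", "MEM_READ"]},
]

summary = section_summary(sections)  # [5, 0, 0, 1, 1]
```

Per-section values keyed by name (size, entropy, vsize) do not fit a fixed layout, which is why they go through hashing instead.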
imports
# DLL names plus imported function names: hashed
{'NETAPI32.dll': ['NetUserGetGroups', 'NetUserGetLocalGroups'], 'RPCRT4.dll': ['UuidFromStringW'], 'VERSION.dll': ['GetFileVersionInfoW', 'GetFileVersionInfoSizeW', 'VerQueryValueW'], 'KERNEL32.dll': ['FindFirstFileExW', 'FindClose', 'GetConsoleOutputCP', 'SetFilePointerEx', 'GetFileSizeEx', 'ReadConsoleW', 'ReadConsoleInputW', 'SetConsoleMode', ……}
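Before hashing, the import table has to be turned into string tokens. A minimal sketch, assuming one token per DLL and one `dll:function` token per imported function; the token format and the helper name `import_tokens` are my assumptions. Only the three complete entries from the example above are used, since the KERNEL32.dll list is truncated in the source.

```python
def import_tokens(imports):
    """Turn an imports dict into (library tokens, 'library:function' tokens)."""
    # Lowercase DLL names so that 'KERNEL32.dll' and 'kernel32.dll' share a token.
    libraries = sorted({lib.lower() for lib in imports})
    functions = [
        f"{lib.lower()}:{name}"
        for lib, names in imports.items()
        for name in names
    ]
    return libraries, functions

imports = {
    "NETAPI32.dll": ["NetUserGetGroups", "NetUserGetLocalGroups"],
    "RPCRT4.dll": ["UuidFromStringW"],
    "VERSION.dll": ["GetFileVersionInfoW", "GetFileVersionInfoSizeW", "VerQueryValueW"],
}

libraries, functions = import_tokens(imports)
```

Each token list would then be hashed into its own fixed-size vector, so a file importing 3 functions and one importing 300 yield feature vectors of the same length.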
exports
# Exported function names: hashed
datadirectories
# The size and virtual_address values are used directly as feature values
[{'name': 'EXPORT_TABLE', 'size': 0, 'virtual_address': 0},
 {'name': 'IMPORT_TABLE', 'size': 300, 'virtual_address': 470148},
 {'name': 'RESOURCE_TABLE', 'size': 703408, 'virtual_address': 512000},
 {'name': 'EXCEPTION_TABLE', 'size': 0, 'virtual_address': 0},
 {'name': 'CERTIFICATE_TABLE', 'size': 9096, 'virtual_address': 1228800},
 {'name': 'BASE_RELOCATION_TABLE', 'size': 21584, 'virtual_address': 1216512},
 {'name': 'DEBUG', 'size': 112, 'virtual_address': 452584},
 {'name': 'ARCHITECTURE', 'size': 0, 'virtual_address': 0},
 {'name': 'GLOBAL_PTR', 'size': 0, 'virtual_address': 0},
 {'name': 'TLS_TABLE', 'size': 24, 'virtual_address': 452928},
 {'name': 'LOAD_CONFIG_TABLE', 'size': 64, 'virtual_address': 452696},
 {'name': 'BOUND_IMPORT', 'size': 0, 'virtual_address': 0},
 {'name': 'IAT', 'size': 1368, 'virtual_address': 372736},
 {'name': 'DELAY_IMPORT_DESCRIPTOR', 'size': 0, 'virtual_address': 0},
 {'name': 'CLR_RUNTIME_HEADER', 'size': 0, 'virtual_address': 0}]
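Because the directory order is fixed, the list can be flattened into a vector with two slots per directory. A minimal sketch, assuming 15 directories and a (size, virtual_address) layout; the function name `flatten_data_directories` is hypothetical. The input uses the first three entries from the example above.

```python
def flatten_data_directories(directories, n_slots=15):
    """Flatten data directories into [size_0, va_0, size_1, va_1, ...]."""
    vec = [0.0] * (2 * n_slots)
    # Missing trailing directories simply stay at zero.
    for i, entry in enumerate(directories[:n_slots]):
        vec[2 * i] = float(entry["size"])
        vec[2 * i + 1] = float(entry["virtual_address"])
    return vec

directories = [
    {"name": "EXPORT_TABLE", "size": 0, "virtual_address": 0},
    {"name": "IMPORT_TABLE", "size": 300, "virtual_address": 470148},
    {"name": "RESOURCE_TABLE", "size": 703408, "virtual_address": 512000},
]

dd_vector = flatten_data_directories(directories)
```

The directory name is not encoded at all; its position in the vector carries that information.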
Models
LightGBM
params = {
    "boosting": "gbdt",
    "objective": "binary",
    "num_iterations": 1000,
    "learning_rate": 0.05,
    "num_leaves": 2048,
    "max_depth": 15,
    "min_data_in_leaf": 50,
    "feature_fraction": 0.5
}
MalConv
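from keras.layers import Conv1D, Dense, Embedding, GlobalMaxPooling1D, Input, Multiply
from keras.models import Model

maxlen = 2**20  # 1 MB
embedding_size = 8
input_dim = 257  # assumed: 256 byte values plus one padding symbol (not defined in the original snippet)

# define model structure
inp = Input(shape=(maxlen,))
emb = Embedding(input_dim, embedding_size)(inp)
# two parallel convolutions over non-overlapping 500-byte windows
filt = Conv1D(filters=128, kernel_size=500, strides=500, use_bias=True, activation='relu', padding='valid')(emb)
attn = Conv1D(filters=128, kernel_size=500, strides=500, use_bias=True, activation='sigmoid', padding='valid')(emb)
gated = Multiply()([filt, attn])  # the sigmoid branch gates the ReLU branch
feat = GlobalMaxPooling1D()(gated)
dense = Dense(128, activation='relu')(feat)
outp = Dense(1, activation='sigmoid')(dense)

basemodel = Model(inp, outp)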
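The model expects a fixed-length integer sequence, so raw file bytes have to be truncated or padded to `maxlen` first. The sketch below shows one way to do that, assuming index 256 is reserved as the padding symbol (an assumption; the snippet above does not define it). The helper names are hypothetical. It also computes how many windows the strided convolution produces.

```python
import numpy as np

MAXLEN = 2**20        # 1 MB, matching maxlen above
PADDING_CHAR = 256    # assumed padding index, outside the 0-255 byte range

def bytes_to_input(data: bytes, maxlen: int = MAXLEN) -> np.ndarray:
    """Truncate or right-pad raw bytes to a fixed-length array of token ids."""
    out = np.full((maxlen,), PADDING_CHAR, dtype=np.uint16)
    raw = np.frombuffer(data[:maxlen], dtype=np.uint8)
    out[:len(raw)] = raw
    return out

def conv_output_length(maxlen: int = MAXLEN, kernel_size: int = 500, stride: int = 500) -> int:
    """Number of positions produced by a 'valid' 1D convolution."""
    return (maxlen - kernel_size) // stride + 1

x = bytes_to_input(b"MZ\x90\x00")  # first bytes of a typical PE header
windows = conv_output_length()     # 2097 windows of 500 bytes each
```

With kernel size equal to stride, each of the 2097 windows is scored independently, and the global max pooling keeps, per filter, only the strongest window in the file.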