Paper title
Topology-Aware Hashing for Effective Control Flow Graph Similarity Analysis
Paper authors
Abstract
Control Flow Graph (CFG) similarity analysis is an essential technique for a variety of security analysis tasks, including malware detection and malware clustering. Although various algorithms have been developed, existing CFG similarity analysis methods still suffer from limited efficiency, accuracy, and usability. In this paper, we propose a novel fuzzy hashing scheme called topology-aware hashing (TAH) for effective and efficient CFG similarity analysis. Given the CFGs constructed from program binaries, we extract blended n-gram graphical features of the CFGs, encode the graphical features into numeric vectors (called graph signatures), and then measure graph similarity by comparing the graph signatures. We further employ a fuzzy hashing technique to convert the numeric graph signatures into smaller fixed-size fuzzy hash signatures for efficient similarity calculation. Our comprehensive evaluation demonstrates that TAH is more effective and efficient than existing CFG comparison techniques. To demonstrate the applicability of TAH to real-world security analysis tasks, we develop a binary similarity analysis tool based on TAH and show that it outperforms existing similarity analysis tools when performing malware clustering.
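The pipeline in the abstract (extract n-gram features, encode them as a numeric graph signature, compare signatures) can be illustrated with a minimal sketch. The abstract does not define the blended n-gram features or the fuzzy hashing step, so everything below is an assumption made for illustration: nodes are labeled by out-degree, an n-gram is a directed walk of n nodes, "blending" means pooling the counts for n = 1..max_n, features are bucketed with CRC32, and signatures are compared with cosine similarity. The function names are hypothetical, and the conversion to a smaller fuzzy hash signature is not shown.

```python
import math
import zlib


def out_degree_ngrams(cfg, n):
    """Yield out-degree tuples for every directed walk of n nodes.

    cfg maps each node to its successor list; every node must be a key.
    """
    def walk(path):
        if len(path) == n:
            yield tuple(len(cfg[v]) for v in path)
            return
        for succ in cfg[path[-1]]:
            yield from walk(path + [succ])

    for node in cfg:
        yield from walk([node])


def graph_signature(cfg, max_n=3, dim=64):
    """Pool 1..max_n-gram counts into one fixed-length numeric vector."""
    vec = [0] * dim
    for n in range(1, max_n + 1):
        for gram in out_degree_ngrams(cfg, n):
            # CRC32 is stable across runs, unlike the built-in hash() on str.
            vec[zlib.crc32(repr(gram).encode()) % dim] += 1
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical CFGs: a diamond, the same diamond with renamed nodes, a chain.
cfg_a = {0: [1, 2], 1: [3], 2: [3], 3: []}
cfg_b = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
cfg_c = {0: [1], 1: [2], 2: [3], 3: []}

print(cosine(graph_signature(cfg_a), graph_signature(cfg_b)))
print(cosine(graph_signature(cfg_a), graph_signature(cfg_c)))
```

Because the features depend only on topology, renaming nodes leaves the signature unchanged, while a structurally different graph yields a different vector. The walk length is bounded by n, so extraction terminates even when the CFG contains loops.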
