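This article summarizes the related work sections of several recent papers on APT attack detection, to give a clearer picture of prior work and state-of-the-art techniques in the field. Specifically, it covers the related work discussed in the following papers: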

Threat hunting methods (DeepHunter)

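Poirot [1] is closely related to DeepHunter. DeepHunter mentions it in its related work, describes it in detail in Section 7.2, and compares their performance there. That section identifies Poirot as one of the current state-of-the-art threat hunting methods. Poirot is a heuristic graph pattern matching algorithm that computes a graph alignment score between a query graph and a provenance graph. Guided by the information flows in the query graph, Poirot searches the provenance graph for aligned nodes and, during the search, ignores paths that an attacker is unlikely to have taken.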
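To make the idea of an alignment score concrete, here is a minimal Python sketch loosely modeled on Poirot's approach; it is not Poirot's actual algorithm. It assumes the node alignment (query node to provenance node) is already given, scores each query flow by the inverse hop count of the shortest directed path between the aligned provenance nodes, and prunes paths longer than a hypothetical `max_hops` bound, mirroring the idea of ignoring paths an attacker is unlikely to take. Poirot's own influence score is, roughly, based on how many distinct processes an attacker would need to compromise along a path, and Poirot also searches over candidate alignments; both are simplified away here. All node names are hypothetical.

```python
from collections import deque


def hop_distance(graph, src, dst, max_hops):
    """Length of the shortest directed path src -> dst in `graph`
    ({node: [successor, ...]}), or None if no path of at most
    `max_hops` edges exists."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # prune: longer paths are treated as implausible
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def alignment_score(query_flows, alignment, prov_graph, max_hops=3):
    """Average per-flow score in [0, 1]: a query flow (u, v) scores
    1/hops if alignment[u] reaches alignment[v] in the provenance
    graph within max_hops edges, and 0 otherwise."""
    total = 0.0
    for u, v in query_flows:
        hops = hop_distance(prov_graph, alignment[u], alignment[v], max_hops)
        if hops is not None:
            total += 1.0 / hops
    return total / len(query_flows)


# Hypothetical provenance graph (information flows between entities).
prov = {
    "firefox": ["dropper.exe"],
    "dropper.exe": ["cmd.exe"],
    "cmd.exe": ["/etc/passwd"],
}
query = [("browser", "payload"), ("payload", "secret")]
align = {"browser": "firefox", "payload": "dropper.exe", "secret": "/etc/passwd"}
print(alignment_score(query, align, prov))              # 0.75
print(alignment_score(query, align, prov, max_hops=1))  # 0.5
```

With the tighter bound, the two-hop flow from `dropper.exe` to `/etc/passwd` is pruned and contributes nothing, which lowers the score.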

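RapSheet [2] uses provenance graph analysis to improve the threat hunting capability of EDR systems. However, to correlate alerts, RapSheet requires the attacker's complete attack path to be preserved in the provenance graph. A disconnected attack provenance graph will therefore degrade RapSheet's performance.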

HOLMES [12] and NoDoze [3] both perform APT detection and investigation by correlating alerts over provenance graphs.

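To detect stealthy malware, ProvDetector [4] proposes a graph representation learning approach that models the normal behavior of processes on provenance graphs.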

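However, all of the above methods assume an accurate database of normal behavior in order to reduce false alarms. As benign behavior drifts over time (concept drift), the normal behavior model becomes exposed to poisoning attacks. Moreover, all of these methods are path-based, so their robustness may suffer when the attack provenance graph is disconnected.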

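Some methods use IOCs or threat alerts as clues for identifying attack behavior, for example for zero-day attacks [5] and C&C activity [6]. However, these methods ignore the relationships among indicators or alerts and may therefore produce high false positive rates.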

Provenance graph analysis (DeepHunter)

Provenance graph analysis is widely used in APT detection [7], forensic analysis [8], and attack scenario reconstruction [9, 10].

Recent works [11, 12] look for ways to connect the semantics of low-level system events with high-level behaviors.

Many recent works, such as Morse [8], BEEP [13], MPI [14], and OmegaLog [11], address the dependence explosion problem in provenance graphs.

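StreamSpot [15] treats provenance graphs as temporal graphs with typed nodes and edges, and proposes a graph-sketching-based algorithm for anomaly detection.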

Graph matching algorithms (DeepHunter)

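Graph pattern matching and graph similarity computation have been applied in many real-world settings, such as binary function similarity search [16] and hardware security [17]. Over the past decades, many graph matching measures have been defined, such as graph edit distance and graph isomorphism. GNN-based graph matching algorithms include MatchGNet [18], SimGNN [19], and GMN [16]; Chapter 7 of the DeepHunter paper evaluates DeepHunter against all three.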

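DeepHunter designs a graph embedding network to represent provenance graphs, using GCN [20] to encode them. It then uses a powerful relational learning network, NTN [21], to learn a metric for computing matching scores. With this design, one can build a GNN model for graph pattern matching that is robust to varying degrees of inconsistency between the query graph and the provenance graph in threat hunting.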
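The two building blocks named above can be sketched in pure Python under loudly stated assumptions. The sketch shows one GCN layer in the Kipf and Welling form ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where D is the degree matrix of A + I, mean pooling to obtain a graph embedding, and the Neural Tensor Network form u^T tanh(h1^T W[k] h2 + V [h1; h2] + b), squashed by a sigmoid. The weights are random and untrained, node features are hypothetical one-hot node types, adjacency matrices are treated as undirected (a simplification for directed provenance graphs), and the single layer, mean pooling, and sigmoid output are illustrative choices, not DeepHunter's actual architecture.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) for a symmetric 0/1 adjacency A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, x) for x in row] for row in matmul(matmul(norm, feats), weight)]


def graph_embedding(adj, feats, weight):
    """Mean-pool the node embeddings into a single graph vector."""
    h = gcn_layer(adj, feats, weight)
    return [sum(col) / len(h) for col in zip(*h)]


def ntn_score(h1, h2, tensor, v, b, u):
    """NTN relation score sigmoid(u . tanh(h1^T W[k] h2 + V [h1; h2] + b))."""
    concat = h1 + h2
    slices = []
    for k in range(len(tensor)):
        bilinear = sum(h1[i] * tensor[k][i][j] * h2[j]
                       for i in range(len(h1)) for j in range(len(h2)))
        linear = sum(w * x for w, x in zip(v[k], concat))
        slices.append(math.tanh(bilinear + linear + b[k]))
    return 1.0 / (1.0 + math.exp(-sum(w * s for w, s in zip(u, slices))))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


F, D, K = 3, 4, 2  # one-hot node types (process, file, socket), embedding dim, NTN slices
weight = rand_matrix(F, D)
tensor = [rand_matrix(D, D) for _ in range(K)]
v, b, u = rand_matrix(K, 2 * D), [0.0] * K, [1.0] * K

# Hypothetical query graph: process - file - socket.
q_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
q_feat = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical provenance subgraph with one extra process touching the file.
p_adj = [[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]]
p_feat = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

score = ntn_score(graph_embedding(q_adj, q_feat, weight),
                  graph_embedding(p_adj, p_feat, weight), tensor, v, b, u)
print(f"untrained matching score: {score:.3f}")
```

In a trained model, the weights would be learned from pairs of matching and non-matching graphs so that the score separates them; here the score only demonstrates the data flow.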

ATLAS

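Recent work builds causal dependency graphs from audit logs [13, 22] and uses query systems to locate key attack stages, such as compromised processes and malicious payloads [8, 12]. Other studies apply machine learning to extract features or sequences from logs for automated attack detection and failure diagnosis [10, 23], while still others discover relationships among different log events through event correlation [24].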

Causality analysis (ATLAS)

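There is a large body of prior work on causality analysis of audit logs, including optimizing provenance graphs and reporting concise attack stories [13, 22, 25]. These approaches require runtime system modifications through source code instrumentation, static binary instrumentation, or dynamic program instrumentation.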

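Source-code instrumentation is not applicable to proprietary software because of licensing, and static and dynamic instrumentation impose extra overhead on user systems. Recent work therefore proposes instrumentation-free approaches [2, 3, 11, 12, 26] that require no changes to the user system for provenance tracking. However, most of these approaches are heuristic or rule-based, and developing and maintaining the rules or heuristics takes considerable effort.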

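HOLMES and RapSheet rely on a knowledge base of adversarial tactics, techniques, and procedures (TTPs) [27]. In contrast, ATLAS only requires attack training data and learns the co-occurrence of attack steps from temporal sequences.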

Anomaly-based analysis (ATLAS)

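Anomaly-based methods [3, 4, 28, 29] learn normal system behavior in order to identify anomalous behavior. They can detect unknown attacks, but they are prone to false positives because user behavior changes over time and sufficient training data is often lacking. Examples include: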

UNICORN [30], a host-based intrusion detection framework, learns a model of normal behavior from benign provenance graphs to detect anomalies.

PrioTracker [29] ranks the importance of nodes using statistical information, so that it reports real attack events more accurately.

NoDoze [3] reduces false alarms by computing anomaly scores and propagating them through the dependency graph.

Winnower [28] generates threat alerts for cluster auditing by identifying differences among multiple cluster instances.

ProvDetector [4] identifies stealthy malware by learning sequences of an application's normal execution paths from provenance graphs.

DeepLog [23] models existing audit logs as natural-language sequences and detects anomalous events.

Finally, Log2vec [31] proposes a clustering framework for identifying previously undiscovered anomalous sequences in system logs.
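Of the examples above, DeepLog's detection rule is easy to illustrate. DeepLog trains an LSTM to predict the next log key from a window of the previous h keys, and flags an entry as anomalous when the observed key is not among the top g predicted keys. The sketch below keeps that top-g rule but, as a deliberate simplification, replaces the LSTM with a frequency table of (window, next key) counts. The log keys and the values of `h` and `g` are hypothetical.

```python
from collections import Counter, defaultdict


def train(sequences, h):
    """Count which log key follows each window of h keys in normal logs."""
    model = defaultdict(Counter)
    for seq in sequences:
        for i in range(h, len(seq)):
            model[tuple(seq[i - h:i])][seq[i]] += 1
    return model


def detect(seq, model, h, g):
    """Return the indices whose key is not among the top-g predicted keys."""
    anomalies = []
    for i in range(h, len(seq)):
        # .get avoids inserting unseen windows into the defaultdict.
        history = model.get(tuple(seq[i - h:i]), Counter())
        candidates = {key for key, _ in history.most_common(g)}
        if seq[i] not in candidates:
            anomalies.append(i)
    return anomalies


normal = [["open", "read", "close", "open", "read", "close"]]
model = train(normal, h=2)
test_seq = ["open", "read", "close", "open", "write", "close"]
print(detect(test_seq, model, h=2, g=1))  # [4, 5]
```

Index 4 is flagged because `write` was never seen after (`close`, `open`), and index 5 because the window (`open`, `write`) never occurred in training at all.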

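Like anomaly-based approaches, ATLAS learns user behavior. Unlike approaches that learn only user behavior, however, it learns both attack and non-attack (user) sequences and exploits their temporal and causal relationships to reduce both false positives and false negatives.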

Learning-based analysis (ATLAS)

Learning-based attack investigation approaches [10, 32, 33] use machine learning techniques to model attack events in logs.

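HERCULE [10] correlates attack events using a community detection algorithm. Similar to ATLAS, some recent works [32, 33] use word embeddings to convert textual information (i.e., sequences) into vectors to facilitate learning. However, these approaches are limited to identifying and reporting individual attack events in logs. Unlike them, ATLAS aims to locate attack entities and to construct attack stories by associating each entity with its events.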

UNICORN

Many recent works [12, 15, 34, 35, 36, 37] point out that provenance graphs may be a better data source for APT detection.

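Frameworks for building provenance graphs do exist [38], but because of concurrency, such post-hoc approaches have difficulty guaranteeing that the graph is correct [39]. System call capture mechanisms based on library wrappers are easy to bypass [41], while user-space mechanisms such as ptrace incur unacceptable runtime overhead [40] and are susceptible to race conditions [41]. The same race conditions also plague kernel mechanisms (e.g., systrace [42], Janus [43]), leading to time-of-check-to-time-of-use (TOCTTOU), time-of-audit-to-time-of-use (TOATTOU), and time-of-replacement-to-time-of-use (TORTTOU) bugs.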

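Whole-system provenance collection operates at the operating system level and captures all system activities and the interactions among them [39]. OS-level provenance systems such as Hi-Fi [39], LPM [44], and CamFlow [36] provide strong security and completeness guarantees for capturing information flows. This completeness is crucial in APT scenarios because it captures long-range causal relationships and thus enables contextual analysis: even if a malicious agent manipulates security-sensitive kernel objects to hide its activity, the activity can still be linked through its context.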

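The UNICORN paper uses CamFlow [36] as its reference implementation; as its Section 6 shows, UNICORN can also work with other capture mechanisms. CamFlow builds on the Linux Security Modules (LSM) framework [45] to ensure high-quality, reliable recording of information flows between data objects [46, 47]. LSM eliminates race conditions (e.g., TOCTTOU attacks) by placing mediation points inside the kernel rather than at the system call interface.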

Graph-based anomaly detection (UNICORN)

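Akoglu et al. [48] classify graphs by their properties (static vs. dynamic, plain vs. attributed) and identify graph or subgraph similarity as a basis for anomaly detection.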

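Ding et al. [49] use a cut-vertex-based method to identify malicious network sources in network traffic graphs, detecting cross-community communication behavior with graph measures such as betweenness.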

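Liu et al. [50] construct software behavior graphs to describe program execution and use a support vector machine (SVM) over closed and frequent subgraphs to classify non-crashing bugs (logic errors that do not cause the program to crash).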

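These systems, like many other graph mining algorithms [51], [52] and graph similarity measures (e.g., graph kernels [53]), are designed only for static graphs and are hard to adapt to streaming settings.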
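As an example of the graph kernels mentioned above, the following is a minimal sketch of Weisfeiler-Lehman (WL) subtree relabeling, which turns a labeled graph into a histogram of rooted-subtree labels that can be compared with cosine similarity. UNICORN builds its graph histograms on a WL-style relabeling of this kind, but it maintains them incrementally over a streaming graph and compresses them into sketches; this static version does not attempt to reproduce that. Instead of compressing each new label to a short integer ID as WL usually does, the sketch keeps the expanded label string, which is equivalent for counting but grows with depth. The graphs, labels, and iteration count are hypothetical.

```python
import math
from collections import Counter


def wl_histogram(labels, neighbors, iterations):
    """Histogram of WL labels over `iterations` relabeling rounds.
    labels: {node: label}; neighbors: {node: [node, ...]}, e.g. the
    in-neighbors of each node in a provenance graph."""
    current = dict(labels)
    hist = Counter(current.values())
    for _ in range(iterations):
        # New label = own label + sorted multiset of the neighbors' labels.
        current = {
            node: current[node] + "(" + ",".join(sorted(current[n] for n in neighbors.get(node, ()))) + ")"
            for node in current
        }
        hist.update(current.values())
    return hist


def cosine(h1, h2):
    """Cosine similarity of two sparse histograms (Counters)."""
    dot = sum(count * h2[key] for key, count in h1.items())
    norm = math.sqrt(sum(c * c for c in h1.values())) * math.sqrt(sum(c * c for c in h2.values()))
    return dot / norm if norm else 0.0


labels = {"p1": "proc", "f1": "file", "p2": "proc"}
benign = wl_histogram(labels, {"f1": ["p1"], "p2": ["f1"]}, iterations=2)  # p1 -> f1 -> p2
odd = wl_histogram(labels, {"f1": ["p1", "p2"]}, iterations=2)            # p1 -> f1 <- p2
print(round(cosine(benign, benign), 6), round(cosine(benign, odd), 3))
```

The two graphs have identical node labels, so a label-only histogram could not tell them apart; the relabeled subtree labels capture the different structure and lower the similarity.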

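Papadimitriou et al. [54] propose five similarity schemes for dynamic web graphs. NetSimile [55] aggregates egonet-based features (e.g., the number of neighbors) using distribution moments in order to cluster social networks. Aggarwal et al. [56] define outliers using a structural connectivity model and design a reservoir sampling method that robustly maintains structural summaries of homogeneous graph streams.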

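However, these and other stream-oriented approaches [57, 58] are either domain-specific (e.g., bibliographic networks are structured differently from provenance graphs) or applicable mainly to homogeneous graphs.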

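In malware classification and intrusion detection, Classy [59] clusters streams of call graphs for malware analysis based on the graph edit distance (GED) [60] between pairs of graphs, computed with a modified simulated annealing. Although its runtime complexity suits graph streams, its empirical evaluation is limited to graphs of at most 3,000 vertices; graphs produced by real system executions are orders of magnitude larger [47].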

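StreamSpot [15] analyzes streaming information flow graphs to detect anomalous activity. However, StreamSpot's graph features are locally constrained, whereas UNICORN's features incorporate execution context, and such contextualized graph analysis strongly affects detection performance. Moreover, StreamSpot models only a single snapshot of each training graph and dynamically maintains its clusters during testing. This produces a large number of false alarms and opens a window of opportunity for attackers. The UNICORN authors also consider this approach unsuitable for APT scenarios, in which a persistent attacker can manipulate the model by gradually and slowly changing system behavior to evade detection. UNICORN instead leverages its ability to continuously summarize the evolving graph to model the corresponding evolution of the system execution it monitors.

FRAPpuccino [61] is another attempt at graph-based intrusion detection. It uses a windowing approach for efficient graph analysis. Naturally, partitioning the provenance graph in this way yields a more limited view of system execution and is unsuitable for long-term detection across windows.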

Provenance-based security analysis

Many security applications leverage provenance graphs, mainly for forensic analysis and attack attribution [48].

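BackTracker [62] analyzes intrusions using provenance graphs to identify their entry points, while PrioTracker [29] optimizes this process and adds forward tracking to enable timely attack causality analysis.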
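The backward-tracking idea behind BackTracker can be sketched as a single pass over events processed from newest to oldest. Starting from the detection point, an event that flowed into an object already in the dependency graph is kept if it happened no later than the latest time that object could still have influenced the detection point; its source is then added with that event's time as its own bound. This is a minimal sketch under simplifying assumptions (each event is a single (time, source, destination) information flow, and no filtering heuristics are applied); the event log is hypothetical.

```python
def backtrack(events, detection_obj, detection_time):
    """Return the events that may have causally affected detection_obj.
    events: list of (time, src, dst), meaning information flowed src -> dst."""
    # horizon[obj] = latest time at which obj could still influence the detection point
    horizon = {detection_obj: detection_time}
    relevant = []
    for time, src, dst in sorted(events, reverse=True):  # newest first
        if dst in horizon and time <= horizon[dst]:
            relevant.append((time, src, dst))
            horizon[src] = max(horizon.get(src, time), time)
    return sorted(relevant)


log = [
    (1, "sshd", "bash"),
    (2, "bash", "wget"),
    (3, "wget", "/tmp/mal"),
    (4, "bash", "ls"),
    (5, "/tmp/mal", "mal_proc"),
    (6, "mal_proc", "/tmp/loot"),
    (7, "cron", "/var/log/syslog"),
]
for event in backtrack(log, "/tmp/loot", detection_time=6):
    print(event)
```

The resulting chain leads back to `sshd`, the entry point, while the unrelated `ls` and `cron` events are excluded. Because events are visited in decreasing time order, every bound for an object is already final when older events that flow into it are examined.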

HERCULE [10] analyzes intrusions by discovering attack communities embedded in provenance graphs.

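Winnower [28] speeds up the investigation of system intrusions through grammatical inference over provenance graphs, while reducing storage and network overhead without compromising the quality of the provenance data.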

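NoDoze [3] performs attack triage over provenance graphs to identify anomalous paths. Bates et al. [44] were the first to use provenance for data loss prevention, and Park et al. [63] formalized the concept of provenance-based access control (PBAC). Ma et al. [22] designed ProTracer, a lightweight provenance tracing system that mitigates the dependence explosion problem, reduces space and runtime overhead, and makes provenance-based attack investigation practical. Pasquier et al. [47] introduced CamQuery, a general framework that supports inline, real-time provenance analysis, demonstrating the considerable potential of future provenance-based security applications.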

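Recently, as APT attacks have become more prominent, many systems have used data provenance for APT analysis. HOLMES [12] and SLEUTH [9] focus on attack reconstruction using the information flows provided by data provenance. Their approach resembles an architecture proposed by Tariq et al. [64], which uses provenance to correlate anomalous activities in grid applications. The anomaly detection module itself uses a simple predefined model that relies on expert knowledge of existing APT kill chains, matching a priori specifications of likely exploits against local components of the graph.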

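Poirot [1] performs another form of attack reconstruction. It correlates a set of indicators of compromise (discovered by other systems) to identify APTs. Poirot constructs attack query graphs from expert knowledge in existing cyber threat intelligence reports and incident descriptions, and performs offline graph pattern matching over provenance graphs to find potential APTs. For example, it uses the red team's attack descriptions to manually draw query graphs for correlating anomalies in the DARPA datasets used in the Section 6 evaluation. This is a critical limitation, because writing a sufficiently detailed description of a new class of APT requires substantial forensic effort [34].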

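UNICORN differs from these rule-based systems in that it is anomaly-based and requires no prior expert knowledge of APT attack patterns and behaviors. Rule-based approaches are closely aligned with today's commercial practice (they are essentially provenance-based versions of the endpoint detection and response (EDR) tools offered by enterprise security vendors), yet prior research has shown that rule-based EDR systems are a major contributor to the "threat alert fatigue" problem. Recent work on provenance analysis (e.g., NoDoze [3]) shows that historical context is critical to mitigating this problem; UNICORN, in turn, studies how to make context a core part of a host-based intrusion detection system (HIDS) rather than merely a triage tool.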
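The role of historical context in NoDoze can be illustrated with a simplified anomaly score. Roughly, NoDoze estimates how frequently each kind of event occurs in historical data, combines these frequencies along a dependency path into a regularity score, and treats one minus that score as the path's anomaly score. The sketch below keeps only that core: the transition probability of an event (src, op, dst) is its historical count divided by the count of all events with the same (src, op), and a path's anomaly score is 1 minus the product of its transition probabilities. NoDoze's additional per-node in/out scores and its propagation of scores through the dependency graph are omitted, and the event history is hypothetical.

```python
from collections import Counter


def transition_probs(history):
    """P(dst | src, op) estimated from a list of benign (src, op, dst) events."""
    event_counts = Counter(history)
    src_op_counts = Counter((src, op) for src, op, _ in history)
    return {e: c / src_op_counts[e[:2]] for e, c in event_counts.items()}


def path_anomaly(path, probs):
    """1 - product of transition probabilities; unseen events have probability 0."""
    regularity = 1.0
    for event in path:
        regularity *= probs.get(event, 0.0)
    return 1.0 - regularity


history = (
    [("outlook.exe", "spawn", "excel.exe")] * 5
    + [("excel.exe", "write", "report.xlsx")] * 9
    + [("excel.exe", "write", "temp.tmp")] * 1
)
probs = transition_probs(history)
routine = [("outlook.exe", "spawn", "excel.exe"), ("excel.exe", "write", "report.xlsx")]
rare = [("outlook.exe", "spawn", "excel.exe"), ("excel.exe", "write", "temp.tmp")]
unseen = [("outlook.exe", "spawn", "excel.exe"), ("excel.exe", "spawn", "powershell.exe")]
for name, path in [("routine", routine), ("rare", rare), ("unseen", unseen)]:
    print(name, round(path_anomaly(path, probs), 3))
```

An alert on the routine path would be deprioritized, while the path through a never-seen `spawn` of `powershell.exe` receives the maximum score; the history is what makes this distinction possible.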

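Gao et al. [65] build on a complex event processing platform and design SAQL, a domain-specific query language for analyzing large-scale streaming provenance data. The system combines various anomaly models (e.g., rule-based anomalies) and aggregates data streams across multiple hosts, but it ultimately requires expert domain knowledge to identify the elements or patterns that match a query. The UNICORN authors also note that their provenance graph analysis implicitly (without domain knowledge) subsumes most of SAQL's anomaly models (e.g., invariant-based, time-series, and outlier-based).

Barre et al. [34] mine data provenance to detect anomalies. Their work mainly aims to identify important process features (e.g., a process's lifetime and path information) that may be associated with APT attacks. Using a random forest model and hand-picked process features, their system achieves a detection rate of only about 50%. Such low performance suggests that simple feature engineering on provenance graphs, without considering graph topology, is insufficient for detecting stealthy APT attacks.

Berrada et al. [35] propose score aggregation techniques that combine anomaly scores from different anomaly detectors to improve detection performance. Although their work targets provenance graphs for APT detection, it is complementary to UNICORN (or any other detector), since it acts only as an aggregator of existing anomaly detection systems.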

Alert correlation (Holmes)

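Historically, intrusion detection systems (IDSs) have tended to produce alerts that are too numerous and too low-level for operators. Techniques are needed to summarize these low-level alerts and greatly reduce their number.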

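Several approaches use alert correlation to perform detection by clustering similar alerts and identifying causal relationships among alerts [66], [67], [68], [69], [70]. For example, BotHunter [71] uses an anomaly-based approach to correlate dialogs between internal and external hosts in a network. HERCULE [10] uses community discovery techniques to correlate attack steps that may be scattered across multiple logs.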

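In addition, industry uses similar approaches to build SIEMs [72], [73], [74] that perform alert correlation and enforcement based on logs from different data sources. These approaches rely on logs generated by third-party applications running in user space. Moreover, alert correlation based on statistical features such as alert timestamps does not help accurately detect multi-stage APT attacks, because such attacks usually unfold over long periods.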

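Unlike these approaches, HOLMES correlates alerts based on the information flows that exist between different attack steps. Using kernel audit data in this setting was first proposed in [75]. However, unlike HOLMES, that work is purely misuse-based and focuses on using correlations among events to detect attack steps missed by an IDS.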

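HOLMES uses the same kernel audit data but takes a different approach: it builds a main-memory dependency graph with a small memory footprint, then derives high-level scenario graphs (HSGs) from high-level TTP specifications to raise alerts, and finally correlates alerts based on the information flows between them. Another line of work on alert correlation relies on the temporal proximity of alerts [76]. In contrast, HOLMES relies on information flow and causal connections to correlate alerts, so it can detect even attacks whose steps are executed very slowly.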
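The contrast between correlating alerts by information flow and by temporal proximity can be illustrated with a small sketch. It is not HOLMES's algorithm (HOLMES matches TTP specifications and builds HSGs); it only shows the underlying idea that two alerts are grouped when the entity of one can reach the entity of the other through information flows. Alerts days apart can then still be linked, while alerts that are close in time but causally unrelated stay separate. The flow graph is assumed to already respect time ordering, no timestamps are consulted, and all entities and alert IDs are hypothetical.

```python
from collections import deque


def reachable(flow_graph, src):
    """All entities reachable from src via information-flow edges."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in flow_graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def correlate_alerts(alerts, flow_graph):
    """Group alerts (alert_id, entity) whose entities are flow-connected (union-find)."""
    parent = {alert_id: alert_id for alert_id, _ in alerts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, entity_a in alerts:
        reach = reachable(flow_graph, entity_a)
        for b, entity_b in alerts:
            if a != b and entity_b in reach:
                parent[find(a)] = find(b)
    groups = {}
    for alert_id, _ in alerts:
        groups.setdefault(find(alert_id), set()).add(alert_id)
    return sorted(groups.values(), key=sorted)


flows = {
    "firefox": ["/tmp/dropper"],
    "/tmp/dropper": ["dropper_proc"],
    "dropper_proc": ["10.0.0.5:443"],
    "backup_proc": ["nas:22"],
}
alerts = [
    ("A1", "firefox"),       # day 1: suspicious download
    ("A2", "dropper_proc"),  # day 5: untrusted executable runs
    ("A3", "10.0.0.5:443"),  # day 9: data sent out
    ("B1", "backup_proc"),   # day 9: unrelated, but close in time to A3
]
print(correlate_alerts(alerts, flows))
```

A time-window correlator would tend to merge A3 with B1 and miss the link back to A1; the flow-based grouping yields one slow three-stage campaign and one unrelated alert.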

Scenario reconstruction (Holmes)

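A large body of research focuses on generating and using system-call-level logs for forensic analysis, investigation, and recovery [13], [14], [22], [29], [38], [39], [44], [55], [77], [78], [79], [80], [81].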

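Most forensic analysis approaches trace backward from a given attack event to determine its cause. Among them, BEEP [13], ProTracer [22], and MPI [14] use training runs, code instrumentation, and annotations to partition process execution into smaller units, which addresses the dependence explosion problem and enables better forensic analysis. PrioTracker [29] performs timely causality analysis by quantifying the rarity of events to prioritize the investigation of abnormal causal dependencies. In contrast, HOLMES uses system event traces to perform real-time detection, integrating forensic capability into the detection framework in the form of high-level attack steps, without requiring instrumentation.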
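PrioTracker's idea of prioritizing rare dependencies can be sketched with a priority queue: starting from a point of interest, the tracker always expands the unexplored dependency with the highest priority and stops after a fixed budget of nodes. In this sketch, which follows PrioTracker's notion of rarity only loosely, the priority of an edge is the hypothetical score 1 / (1 + historical count of that event); PrioTracker's actual priority metric combines rarity with other factors (for example, fan-out) that are not modeled here. The graph, history counts, and budget are hypothetical.

```python
import heapq
from itertools import count


def rarity(history, event):
    """Hypothetical priority: 1 for never-seen events, near 0 for routine ones."""
    return 1.0 / (1 + history.get(event, 0))


def prioritized_forward_track(graph, start, history, budget):
    """Visit up to `budget` nodes reachable from `start`, rarest dependency first.
    graph: {node: [(dst, op), ...]}; history: {(src, op, dst): count}."""
    tie = count()  # tie-breaker so the heap never has to compare node names
    heap = [(-1.0, next(tie), start)]
    seen = {start}
    visited = []
    while heap and len(visited) < budget:
        _, _, node = heapq.heappop(heap)
        visited.append(node)
        for dst, op in graph.get(node, ()):
            if dst not in seen:
                seen.add(dst)
                heapq.heappush(heap, (-rarity(history, (node, op, dst)), next(tie), dst))
    return visited


graph = {
    "sshd": [("bash", "fork")],
    "bash": [("ls", "exec"), ("curl", "exec"), ("cron", "exec")],
    "curl": [("/tmp/x", "write")],
}
history = {
    ("sshd", "fork", "bash"): 1000,
    ("bash", "exec", "ls"): 500,
    ("bash", "exec", "cron"): 40,
}
print(prioritized_forward_track(graph, "sshd", history, budget=4))
# ['sshd', 'bash', 'curl', '/tmp/x']
```

With a budget of four nodes, the never-before-seen `curl` branch and the file it writes are explored before the routine `ls` and `cron` executions.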

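Recent studies [9], [47], [82] use system-call-level logs for real-time analysis. SLEUTH [9] introduces tag-based techniques for attack detection and real-time forensics. HOLMES makes several significant advances over SLEUTH. First, it shows how to address the dependence explosion problem using the concept of minimal ancestral coverage, and develops an efficient algorithm for computing it incrementally. Second, SLEUTH's scenario graphs are at the same level of abstraction as provenance graphs, which may be too low-level for many analysts and lack the actionable information found in HSGs. Third, for long-running attacks SLEUTH's graphs can grow too large, whereas HOLMES produces compact HSGs using noise reduction and prioritization techniques.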

Attack granularity

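The coarse granularity of audit logs can sometimes limit reasoning about information flow. For example, if a process that has previously loaded a sensitive file is compromised, the attacker can search its memory for sensitive content without making any system calls. However, when this information is exfiltrated, HOLMES correlates the exfiltration with the process's other operations on the sensitive file (the read) and eventually raises an alarm. In addition, HOLMES can leverage complementary work that tracks information flow at a finer granularity through additional instrumentation [83], [84] or decoupled taint tracking [85], [86], [87], [88]. Such fine-grained information flow tracking provides more precise provenance information at the cost of performance overhead.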
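The precision gap between coarse and fine-grained tracking can be illustrated with a toy example. At audit-log (process-level) granularity, a tracker must conservatively assume that anything a process has read may flow into anything it later writes, whereas fine-grained taint tracking follows data through individual variables. The sketch below runs both models over the same hypothetical trace; the operation format ("read", "copy", "write") is an assumption made for illustration, not the interface of any tool cited above.

```python
def coarse_flows(trace):
    """Process-level: every file read so far may flow to each write target."""
    read_so_far, flows = set(), {}
    for op, target, _ in trace:
        if op == "read":
            read_so_far.add(target)
        elif op == "write":
            flows.setdefault(target, set()).update(read_so_far)
    return flows


def fine_flows(trace):
    """Variable-level taint tracking: each variable carries the set of files it came from."""
    taint, flows = {}, {}
    for op, target, operand in trace:
        if op == "read":      # target = file, operand = destination variable
            taint[operand] = {target}
        elif op == "copy":    # target = destination variable, operand = source variable
            taint[target] = set(taint.get(operand, set()))
        elif op == "write":   # target = sink, operand = variable being written
            flows.setdefault(target, set()).update(taint.get(operand, set()))
    return flows


trace = [
    ("read", "/etc/passwd", "buf1"),  # sensitive file read into buf1
    ("read", "index.html", "buf2"),   # public file read into buf2
    ("copy", "out", "buf2"),          # only buf2 is copied into the output buffer
    ("write", "socket:80", "out"),
]
print(coarse_flows(trace))  # socket:80 appears to depend on both files
print(fine_flows(trace))    # socket:80 depends only on index.html
```

The coarse model reports a false dependency from `/etc/passwd` to the socket, which an analyst would have to rule out manually; the fine-grained model avoids it, but in practice it requires the instrumentation or decoupled taint tracking discussed above.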

References

[1] Milajerdi, S.M., Eshete, B., Gjomemo, R., Venkatakrishnan, V.: Poirot: Aligning attack behavior with kernel audit records for cyber threat hunting. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. pp. 1795–1812 (2019)

[2] Hassan, W.U., Bates, A., Marino, D.: Tactical provenance analysis for endpoint detection and response systems. In: Proceedings of the IEEE Symposium on Security and Privacy (2020)

[3] Hassan, W.U., Guo, S., Li, D., Chen, Z., Jee, K., Li, Z., Bates, A.: Nodoze: Combatting threat alert fatigue with automated provenance triage. In: NDSS (2019)

[4] Wang, Q., Hassan, W.U., Li, D., Jee, K., Yu, X., Zou, K., Rhee, J., Chen, Z., Cheng, W., Gunter, C., et al.: You are what you do: Hunting stealthy malware via data provenance analysis. In: Proc. of the Symposium on Network and Distributed System Security (NDSS) (2020)

[5] Sun, X., Dai, J., Liu, P., Singhal, A., Yen, J.: Using bayesian networks for probabilistic identification of zero-day attack paths. IEEE Transactions on Information Forensics and Security 13(10), 2506–2521 (2018)

[6] Oprea, A., Li, Z., Yen, T.F., Chin, S.H., Alrwais, S.: Detection of early-stage enterprise infection by mining large-scale log data. In: 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. pp. 45–56. IEEE (2015)

[7] Xiong, C., Zhu, T., Dong, W., Ruan, L., Yang, R., Chen, Y., Cheng, Y., Cheng, S., Chen, X.: Conan: A practical real-time apt detection system with high accuracy and efficiency. IEEE Transactions on Dependable and Secure Computing (2020)

[8] Hossain, M.N., Sheikhi, S., Sekar, R.: Combating dependence explosion in forensic analysis using alternative tag propagation semantics. In: 2020 IEEE Symposium on Security and Privacy (SP). IEEE (2020)

[9] Hossain, M.N., Milajerdi, S.M., Wang, J., Eshete, B., Gjomemo, R., Sekar, R., Stoller, S., Venkatakrishnan, V.: SLEUTH: Real-time attack scenario reconstruction from COTS audit data. In: 26th USENIX Security Symposium (USENIX Security 17). pp. 487–504 (2017)

[10] Pei, K., Gu, Z., Saltaformaggio, B., Ma, S., Wang, F., Zhang, Z., Si, L., Zhang, X., Xu, D.: Hercule: Attack story reconstruction via community discovery on correlated log graph. In: Proceedings of the 32nd Annual Conference on Computer Security Applications. pp. 583–595 (2016)

[11] Hassan, W.U., Noureddine, M.A., Datta, P., Bates, A.: Omegalog: High-fidelity attack investigation via transparent multi-layer log analysis. In: Proc. NDSS (2020)

[12] Milajerdi, S.M., Gjomemo, R., Eshete, B., Sekar, R., Venkatakrishnan, V.: Holmes: real-time apt detection through correlation of suspicious information flows. In: 2019 IEEE Symposium on Security and Privacy (SP). pp. 1137–1152. IEEE (2019)

[13] Lee, K.H., Zhang, X., Xu, D.: High accuracy attack provenance via binary-based execution partition. In: NDSS (2013)

[14] Ma, S., Zhai, J., Wang, F., Lee, K.H., Zhang, X., Xu, D.: MPI: Multiple perspective attack investigation with semantic aware execution partitioning. In: 26th USENIX Security Symposium (USENIX Security 17). pp. 1111–1128 (2017)

[15] Manzoor, E., Milajerdi, S.M., Akoglu, L.: Fast memory-efficient anomaly detection in streaming heterogeneous graphs. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1035–1044 (2016)

[16] Li, Y., Gu, C., Dullien, T., Vinyals, O., Kohli, P.: Graph matching networks for learning the similarity of graph structured objects. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 3835–3845. PMLR, Long Beach, California, USA (09–15 Jun 2019), http://proceedings.mlr.press/v97/li19d.html

[17] Fyrbiak, M., Wallat, S., Reinhard, S., Bissantz, N., Paar, C.: Graph similarity and its applications to hardware security. IEEE Transactions on Computers 69(4), 505–519 (2019)

[18] Wang, S., Chen, Z., Yu, X., Li, D., Ni, J., Tang, L.A., Gui, J., Li, Z., Chen, H., Yu, P.S.: Heterogeneous graph matching networks for unknown malware detection. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. pp. 3762–3770. AAAI Press (2019)

[19] Bai, Y., Ding, H., Bian, S., Chen, T., Sun, Y., Wang, W.: Simgnn: A neural network approach to fast graph similarity computation. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. pp. 384–392 (2019)

[20] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

[21] Socher, R., Chen, D., Manning, C.D., Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: Advances in neural information processing systems. pp. 926–934 (2013)

[22] Shiqing Ma, Xiangyu Zhang, and Dongyan Xu. Protracer: Towards practical provenance tracing by alternating between logging and tainting. In Network and Distributed Systems Security Symposium, 2016.

[23] Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In ACM SIGSAC Conference on Computer and Communications Security, 2017.

[24] Riyanat Shittu, Alex Healing, Robert Ghanea-Hercock, Robin Bloomfield, and Muttukrishnan Rajarajan. Intrusion alert prioritisation and attack detection using post-correlation analysis. Computers & Security, 50:1–15, 2015.

[25] Yonghwi Kwon, Fei Wang, Weihang Wang, Kyu Hyung Lee, Wen-Chuan Lee, Shiqing Ma, Xiangyu Zhang, Dongyan Xu, Somesh Jha, Gabriela Ciocarlie, et al. Mci: Modeling-based causality inference in audit logging for attack investigation. In Network and Distributed Systems Security Symposium, 2018.

[26] Runqing Yang, Shiqing Ma, Haitao Xu, Xiangyu Zhang, and Yan Chen. Uiscope: Accurate, instrumentation-free, and visible attack investigation for gui applications. In Network and Distributed Systems Symposium, 2020.

[27] MITRE. MITRE ATT&CK. https://attack.mitre.org/, 2020. Accessed: 2020-06-06.

[28] Wajih Ul Hassan, Mark Lemay, Nuraini Aguse, Adam Bates, and Thomas Moyer. Towards scalable cluster auditing through grammatical inference over provenance graphs. In Network and Distributed Systems Security Symposium, 2018.

[29] Yushan Liu, Mu Zhang, Ding Li, Kangkook Jee, Zhichun Li, Zhenyu Wu, Junghwan Rhee, and Prateek Mittal. Towards a timely causality analysis for enterprise security. In Network and Distributed Systems Security Symposium, 2018.

[30] Xueyuan Han, Thomas Pasquier, Adam Bates, James Mickens, and Margo Seltzer. Unicorn: Runtime provenance-based detector for advanced persistent threats. arXiv preprint arXiv:2001.01525, 2020.

[31] Fucheng Liu, Yu Wen, Dongxue Zhang, Xihe Jiang, Xinyu Xing, and Dan Meng. Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019.

[32] Yun Shen, Enrico Mariconti, Pierre Antoine Vervier, and Gianluca Stringhini. Tiresias: Predicting security events through deep learning. In ACM SIGSAC Conference on Computer and Communications Security, 2018.

[33] Yun Shen and Gianluca Stringhini. Attack2vec: Leveraging temporal word embeddings to understand the evolution of cyberattacks. In USENIX Security Symposium, 2019.

[34] M. Barre, A. Gehani, and V. Yegneswaran, “Mining data provenance to detect advanced persistent threats,” in 11th International Workshop on Theory and Practice of Provenance (TaPP), 2019.

[35] G. Berrada and J. Cheney, “Aggregating unsupervised provenance anomaly detectors,” in 11th International Workshop on Theory and Practice of Provenance (TaPP), 2019.

[36] T. Pasquier, X. Han, M. Goldstein, T. Moyer, D. Eyers, M. Seltzer, and J. Bacon, “Practical whole-system provenance capture,” in Symposium on Cloud Computing. ACM, 2017, pp. 405–418.

[37] L. Carata, S. Akoush, N. Balakrishnan, T. Bytheway, R. Sohan, M. Seltzer, and A. Hopper, “A primer on provenance,” ACM Queue, vol. 12, no. 3, p. 10, 2014.

[38] A. Gehani and D. Tariq, “SPADE: Support for provenance auditing in distributed environments,” in Middleware Conference. ACM/IFIP/USENIX, 2012, pp. 101–120.

[39] D. J. Pohly, S. McLaughlin, P. McDaniel, and K. Butler, “Hi-fi: collecting high-fidelity whole-system provenance,” in Computer Security Applications Conference. ACM, 2012, pp. 259–268.

[40] T. Garfinkel et al., “Traps and pitfalls: Practical problems in system call interposition based security tools.” in NDSS, vol. 3, 2003, pp. 163–176.

[41] K. Jain and R. Sekar, “User-level infrastructure for system call interposition: A platform for intrusion detection and confinement.” in NDSS, 2000.

[42] N. Provos, “Systrace-interactive policy generation for system calls,” 2006.

[43] I. Goldberg, D. Wagner, R. Thomas, E. A. Brewer et al., “A secure environment for untrusted helper applications: Confining the wily hacker,” in USENIX Security Symposium, vol. 6, 1996, pp. 1–1.

[44] A. M. Bates, D. Tian, K. R. Butler, and T. Moyer, “Trustworthy whole-system provenance for the linux kernel.” in USENIX Security Symposium, 2015, pp. 319–334.

[45] J. Morris, S. Smalley, and G. Kroah-Hartman, “Linux security modules: General security support for the linux kernel,” in USENIX Security Symposium, 2002.

[46] L. Georget, M. Jaume, F. Tronel, G. Piolle, and V. V. T. Tong, “Verifying the reliability of operating system-level information flow control systems in linux,” in International Workshop on Formal Methods in Software Engineering. IEEE/ACM, 2017, pp. 10–16.

[47] T. Pasquier, X. Han, T. Moyer, A. Bates, O. Hermant, D. Eyers, J. Bacon, and M. Seltzer, “Runtime analysis of whole-system provenance,” in Conference on Computer and Communications Security (CCS’18). ACM, 2018.

[48] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data mining and knowledge discovery, vol. 29, no. 3, pp. 626–688, 2015.

[49] Q. Ding, N. Katenka, P. Barford, E. Kolaczyk, and M. Crovella, “Intrusion as (anti) social communication: characterization and detection,” in International Conference on Knowledge Discovery and Data Mining. ACM, 2012, pp. 886–894.

[50] C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, “Mining behavior graphs for backtrace of noncrashing bugs,” in International Conference on Data Mining. SIAM, 2005, pp. 286–297.

[51] J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han, “On community outliers and their efficient detection in information networks,” in International Conference on Knowledge Discovery and Data Mining. ACM, 2010, pp. 813–822.

[52] B. Perozzi, L. Akoglu, P. Iglesias Sánchez, and E. Müller, “Focused clustering and outlier detection in large attributed graphs,” in International Conference on Knowledge Discovery and Data Mining. ACM, 2014, pp. 1346–1355.

[53] S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, “Graph kernels,” Journal of Machine Learning Research, vol. 11, no. Apr, pp. 1201–1242, 2010.

[54] P. Papadimitriou, A. Dasdan, and H. Garcia-Molina, “Web graph similarity for anomaly detection,” Journal of Internet Services and Applications, vol. 1, no. 1, pp. 19–30, 2010.

[55] M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos, “NetSimile: A scalable approach to size-independent network similarity,” arXiv preprint arXiv:1209.2684, 2012.

[56] C. C. Aggarwal, Y. Zhao, and S. Y. Philip, “Outlier detection in graph streams,” in International Conference on Data Engineering (ICDE). IEEE, 2011.

[57] M. Gupta, C. C. Aggarwal, J. Han, and Y. Sun, “Evolutionary clustering and analysis of bibliographic networks,” in Conference on Advances in Social Networks Analysis and Mining. IEEE, 2011, pp. 63–70.

[58] M. Mongiovi, P. Bogdanov, R. Ranca, E. E. Papalexakis, C. Faloutsos, and A. K. Singh, “Netspot: Spotting significant anomalous regions on dynamic networks,” in International Conference on Data Mining. SIAM, 2013, pp. 28–36.

[59] O. Kostakis, “Classy: fast clustering streams of call-graphs,” Data mining and knowledge discovery, vol. 28, no. 5-6, pp. 1554–1585, 2014.

[60] X. Gao, B. Xiao, D. Tao, and X. Li, “A survey of graph edit distance,” Pattern Analysis and applications, vol. 13, no. 1, pp. 113–129, 2010.

[61] X. Han, T. Pasquier, T. Ranjan, M. Goldstein, and M. Seltzer, “Frappuccino: fault-detection through runtime analysis of provenance,” in Workshop on Hot Topics in Cloud Computing (HotCloud’17). USENIX Association, 2017.

[62] S. T. King and P. M. Chen, “Backtracking intrusions,” ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 223–236, 2003.

[63] J. Park, D. Nguyen, and R. Sandhu, “A provenance-based access control model,” in International Conference on Privacy, Security and Trust. IEEE, 2012, pp. 137–144.

[64] D. Tariq, B. Baig, A. Gehani, S. Mahmood, R. Tahir, A. Aqil, and F. Zaffar, “Identifying the provenance of correlated anomalies,” in Symposium on Applied Computing. ACM, 2011, pp. 224–229.

[65] P. Gao, X. Xiao, D. Li, Z. Li, K. Jee, Z. Wu, C. H. Kim, S. R. Kulkarni, and P. Mittal, “Saql: A stream-based query system for real-time abnormal system behavior detection,” in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 639–656.

[66] Hervé Debar and Andreas Wespi. Aggregation and correlation of intrusion-detection alerts. In RAID. Springer, 2001.

[67] Peng Ning and Dingbang Xu. Learning attack strategies from intrusion alerts. In CCS. ACM, 2003.

[68] Correlating intrusion events and building attack scenarios through attack graph distances.

[69] Xinzhou Qin and Wenke Lee. Statistical causality analysis of infosec alert data. In RAID. Springer, 2003.

[70] Wei Wang and Thomas E Daniels. A graph based approach toward network forensics analysis. Transactions on Information and System Security (TISSEC), 2008.

[71] Guofei Gu, Phillip Porras, Vinod Yegneswaran, and Martin Fong. Bothunter: Detecting malware infection through ids-driven dialog correlation. In 16th USENIX Security Symposium (USENIX Security 07). USENIX Association, 2007.

[72] IBM QRadar SIEM. https://www.ibm.com/us-en/marketplace/ibm-qradar-siem.

[73] LogRhythm, the security intelligence company. https://logrhythm.com/.

[74] SIEM, AIOps, Application Management, Log Management, Machine Learning, and Compliance. https://www.splunk.com/.

[75] Yan Zhai, Peng Ning, and Jun Xu. Integrating ids alert correlation and os-level dependency tracking. In International Conference on Intelligence and Security Informatics, pages 272–284. Springer, 2006.

[76] Christopher Kruegel, Fredrik Valeur, and Giovanni Vigna. Intrusion detection and correlation: challenges and solutions, volume 14. Springer Science & Business Media, 2004.

[77] Momeni Milajerdi, S., Gjomemo, R., Eshete, B., Sekar, R., & Venkatakrishnan, V. N. (2019). HOLMES: Real-time APT detection through correlation of suspicious information flows. Proceedings - IEEE Symposium on Security and Privacy, 2019-May, 1137–1152. https://doi.org/10.1109/SP.2019.00026

[78] Ashvin Goel, Kenneth Po, Kamran Farhadi, Zheng Li, and Eyal de Lara. The taser intrusion recovery system. SIGOPS Oper. Syst. Rev., 2005.

[79] Samuel T King, Zhuoqing Morley Mao, Dominic G Lucchetti, and Peter M Chen. Enriching intrusion alerts through multi-host causality. In NDSS, 2005.

[80] Sadegh M. Milajerdi, Birhanu Eshete, Rigel Gjomemo, and V.N. Venkatakrishnan. Propatrol: Attack investigation via extracted high-level tasks. In International Conference on Information Systems Security. Springer, 2018.

[81] Qi Wang, Wajih Ul Hassan, Adam Bates, and Carl Gunter. Fear and logging in the internet of things. In Network and Distributed Systems Symposium, 2018.

[82] Xiaokui Shu, Frederico Araujo, Douglas L. Schales, Marc Ph. Stoecklin, Jiyong Jang, Heqing Huang, and Josyula R. Rao. Threat intelligence computing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS ’18, pages 1883–1898, New York, NY, USA, 2018. ACM.

[83] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. SIGPLAN Not., 2014.

[84] Vasileios P. Kemerlis, Georgios Portokalidis, Kangkook Jee, and Angelos D. Keromytis. Libdft: Practical Dynamic Data Flow Tracking for Commodity Systems. SIGPLAN Not., 2012.

[85] Jim Chow, Tal Garfinkel, and Peter M Chen. Decoupling dynamic program analysis from execution in virtual environments. In USENIX 2008 Annual Technical Conference on Annual Technical Conference, pages 1–14, 2008.

[86] Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee. Rain: Refinable attack investigation with on-demand inter-process information flow tracking. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 377–390. ACM, 2017.

[87] Yonghwi Kwon, Fei Wang, Weihang Wang, Kyu Hyung Lee, Wen-Chuan Lee, Shiqing Ma, Xiangyu Zhang, Dongyan Xu, Somesh Jha, Gabriela Ciocarlie, et al. Mci: Modeling-based causality inference in audit logging for attack investigation. In Proc. of the 25th Network and Distributed System Security Symposium (NDSS18), 2018.

[88] Jiang Ming, Dinghao Wu, Jun Wang, Gaoyao Xiao, and Peng Liu. Straighttaint: Decoupled offline symbolic taint analysis. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pages 308–319. ACM, 2016.