
Abstract

Security Operations Centers (SOCs) have been built in many institutions for intrusion detection and incident response. A SOC employs various cyber defense technologies to continually monitor and control network traffic. Given the volume of monitoring data, cyber security analysts need to identify suspicious network activities to detect potential attacks. Because network monitoring data are generated rapidly and contain a lot of noise, analysts are so burdened with tedious and repetitive data triage tasks that they can hardly concentrate on the in-depth analysis needed for further decision making. It is therefore critical to employ data cleaning methods in cyber situational awareness. In this paper, we investigate the main characteristics and categories of cyber security data, with a special emphasis on their heterogeneous features. We also discuss how cyber analysts attempt to understand the incoming data through the data analytical process. Based on this understanding, we discuss five categories of data cleaning methods for heterogeneous data and address the main challenges of applying data cleaning in cyber situational awareness. The goal is to create a dataset that contains accurate information for cyber analysts to work with, and thereby to achieve higher levels of data-driven decision making in cyber defense.
