What is Threat Hunting?
Threat Hunting is the process of proactively searching for threats in your organization. In the traditional, alert-driven approach to threat detection, analysts investigate threats only after a SIEM or another security device raises an alert. Proactive hunting is needed in organizations today because attacks are growing by the day and becoming more complex and harder to detect.
To tackle this, we need to continuously search our network for indicators, whether behavioral or static, that point to a threat. This is where Threat Hunting comes in. It is a useful defense against threats that your existing security solutions fail to identify and against attacks that bypass them.
Traditionally, hunting was done manually and depended on the hunter’s skills and expertise. We can strengthen it by adding automation to our hunting approach.
Combining Machine Learning with manual threat hunting can help identify not only known threats but also zero-days and APTs (advanced persistent threats) whose behavior is not yet known.
Threat Hunting has to be carried out efficiently and effectively, which calls for a defined process.
Threat Hunting Process
- Create hypothesis – The hypothesis outlines what you want to look for, such as command prompt/PowerShell commands or processes making connections to the internet.
- Data gathering – Based on the hypothesis, identify and collect the data you need for the hunt.
- Simulating the hypothesis – Using the gathered data, simulate or test the hypothesis.
- Automating tasks – Threat hunting can never be fully automated, but you can semi-automate certain key tasks.
- Operationalizing threat hunting – Instead of ad-hoc hunting, operationalize your hunting program so that you can hunt continuously.
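As a minimal sketch of the first three steps, assume the hypothesis is "command interpreters should not connect to the internet". The event records, the field names (`host`, `process`, `dest_ip`), the internal address ranges, and the helper `hunt_interpreter_connections` are all hypothetical stand-ins for whatever your EDR or SIEM actually exports.

```python
import ipaddress

# Step 2, data gathering: hypothetical process/network events.
# Real field names depend on your log source.
EVENTS = [
    {"host": "ws-01", "process": "powershell.exe", "dest_ip": "203.0.113.7"},
    {"host": "ws-01", "process": "chrome.exe", "dest_ip": "198.51.100.20"},
    {"host": "ws-02", "process": "cmd.exe", "dest_ip": "10.0.0.5"},
    {"host": "ws-03", "process": "CMD.EXE", "dest_ip": "192.0.2.44"},
]

# Step 1, hypothesis: command interpreters should not connect to the internet.
INTERPRETERS = {"powershell.exe", "cmd.exe"}

# Assumed internal ranges; adjust to your own network.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_external(ip):
    """Return True when the address is outside the assumed internal ranges."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in INTERNAL_NETS)


def hunt_interpreter_connections(events):
    """Step 3: return events where an interpreter reached an external address."""
    return [
        event
        for event in events
        if event["process"].lower() in INTERPRETERS
        and is_external(event["dest_ip"])
    ]


for hit in hunt_interpreter_connections(EVENTS):
    print(hit["host"], hit["process"], hit["dest_ip"])
```

Every hit is a lead for the analyst to investigate, not a verdict. Once a query like this proves useful, it can be scheduled, which is the semi-automation and operationalization described in the last two steps.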
Threat Hunting Approaches
- Manual – An analyst hunts for threats within the network using hunting queries. This requires the threat hunter to stay up to date with the latest security research.
- Analytics – Machine Learning and UEBA (User and Entity Behavior Analytics) flag potential risks for analysts. This provides predictive and prescriptive analytics.
Threat Hunting Techniques
- Searching – using specialized queries to return results and artifacts.
- Grouping – grouping artifacts together to identify anomalies.
- Stacking – counting how many times each unique value of a column occurs and inspecting the extremes, such as the least commonly accessed file.
- Clustering – using a Machine Learning model to identify outliers in the data.
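Stacking is simple enough to sketch with the Python standard library. The file-access records and the helper name `stack` below are made up for illustration; the idea is only to count each unique value of a field and read the list from the rarest end.

```python
from collections import Counter

# Hypothetical file-access records; a real hunt would read these from logs.
FILE_ACCESS = [
    {"user": "alice", "file": "report.docx"},
    {"user": "bob", "file": "report.docx"},
    {"user": "alice", "file": "budget.xlsx"},
    {"user": "carol", "file": "report.docx"},
    {"user": "bob", "file": "budget.xlsx"},
    {"user": "svc_backup", "file": "ntds.dit"},
]


def stack(records, field):
    """Count each unique value of `field` and return (value, count), rarest first."""
    counts = Counter(record[field] for record in records)
    # Sort by count, then by value so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


for value, count in stack(FILE_ACCESS, "file"):
    print(f"{count:>3}  {value}")
```

The rarest values appear at the top of the output, which is usually where a hunter starts looking.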
MITRE ATT&CK for hunting
MITRE ATT&CK™ is a globally accessible knowledge base of adversary tactics and techniques based on real-world observations. It is organized around TTPs: Tactics, Techniques, and Procedures.
MITRE also maintains a project named CAR (Cyber Analytics Repository), a knowledge base of detection analytics built on the ATT&CK model. ATT&CK itself catalogs the behaviors and attack vectors that an adversary carries out on a victim machine, expressed as TTPs that can be mapped to the stages of the Cyber Kill Chain.
The knowledge base consists of tactics, the techniques that correspond to each tactic, and the procedures that implement them. The main tactics are:
- Initial access
- Execution
- Persistence
- Privilege escalation
- Credential access
- Defense Evasion
- Discovery
- Lateral movement
- Collection
- Command and control
- Exfiltration
- Impact
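One way to put the tactic list to work is to tag each hunt with the tactic and technique it targets, then check which tactics have no hunts yet. The sketch below assumes four hypothetical hunts. The technique IDs come from the public ATT&CK Enterprise matrix, while the hunt descriptions and helper names are illustrative only.

```python
# The tactics listed above.
TACTICS = [
    "Initial access", "Execution", "Persistence", "Privilege escalation",
    "Credential access", "Defense Evasion", "Discovery", "Lateral movement",
    "Collection", "Command and control", "Exfiltration", "Impact",
]

# Hypothetical hunts as (description, tactic, ATT&CK technique ID).
HUNTS = [
    ("PowerShell making outbound connections", "Execution", "T1059.001"),
    ("New registry Run key entries", "Persistence", "T1547.001"),
    ("Unexpected access to LSASS memory", "Credential access", "T1003.001"),
    ("Admin share use between workstations", "Lateral movement", "T1021.002"),
]


def coverage_by_tactic(hunts, tactics):
    """Group hunts under the tactic they target."""
    covered = {tactic: [] for tactic in tactics}
    for description, tactic, technique in hunts:
        covered[tactic].append((technique, description))
    return covered


def uncovered_tactics(hunts, tactics):
    """Return the tactics that no hunt addresses yet."""
    covered = coverage_by_tactic(hunts, tactics)
    return [tactic for tactic in tactics if not covered[tactic]]


print("No hunts yet for:", ", ".join(uncovered_tactics(HUNTS, TACTICS)))
```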
Cyber Security Analytics/Machine Learning
Cyber Security Analytics is the process of applying analytics to the millions of logs collected from IT infrastructure devices in order to identify unusual behavior or threats. In cyber analytics, we use a Machine Learning model that learns from our data what normal behavior and outliers look like, and is then applied to real-world data.
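To make "learning normal behavior" concrete, here is a deliberately simple statistical stand-in rather than a real Machine Learning model: it learns a mean and standard deviation from a baseline period and flags values that fall far outside it. The daily logon counts, the threshold of 3 standard deviations, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical daily logon counts for one account during a quiet week.
BASELINE_LOGONS = [20, 22, 19, 21, 23, 20, 22]


def fit_baseline(values):
    """Learn what 'normal' looks like: the mean and sample standard deviation."""
    return mean(values), stdev(values)


def is_outlier(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


baseline = fit_baseline(BASELINE_LOGONS)
for count in (22, 95):
    status = "anomalous" if is_outlier(count, baseline) else "normal"
    print(count, status)
```

A production model would consider many features at once, but the workflow is the same: fit on known data, then score new data.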
The data required for cyber analytics can be collected from sources such as:
1. Endpoint logs
2. Network logs
3. User account logs/Windows Event Viewer logs
4. Windows server/application logs
5. Threat Intelligence feeds
Machine Learning plays a vital role in cyber analytics. We choose an algorithm based on our requirements and train a model on the dataset; the trained model can later be used to make predictions on new data.
Classification and clustering algorithms are the most commonly used in cyber security. The labeled data is split into a training set, used to train the model, and a test set, used to validate it.
We apply this Machine Learning model to our security logs to observe whether such automation can help us find outliers.
Undoubtedly, manual Threat Hunting has its own merits, such as practicality, but what if Machine Learning gave us a head start on our investigation? That would never hurt us!
Threat Hunting automation
We will focus on Windows server logs as our data. Once a Machine Learning model is trained on this dataset, it can help us identify which traffic is normal and which is anomalous.
1. Creation of dataset –
Ready-made datasets of Windows server or web server logs are available from multiple websites and GitHub repositories:
https://github.com/shramos/Awesome-Cybersecurity-Datasets
https://www.kaggle.com/datasets
If you want a custom dataset, gather data for regular traffic and data for malicious traffic separately and label each record accordingly. This data can then be fed to a Machine Learning algorithm. Save the dataset as a CSV file.
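A custom dataset of this kind can be assembled with Python's `csv` module. The sample query strings, the column names `payload` and `label`, and the output file name `payload.csv` are assumptions for this sketch; 0 marks normal traffic and 1 marks malicious traffic.

```python
import csv

# Hypothetical query strings gathered separately for each class.
NORMAL = ["id=42&page=2", "q=threat+hunting"]
MALICIOUS = ["id=1' OR '1'='1", "q=<script>alert(1)</script>"]


def build_rows(normal, malicious):
    """Attach a label to every record: 0 = normal, 1 = malicious."""
    rows = [{"payload": payload, "label": 0} for payload in normal]
    rows += [{"payload": payload, "label": 1} for payload in malicious]
    return rows


def write_dataset(rows, handle):
    """Write the labeled rows as CSV to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=["payload", "label"])
    writer.writeheader()
    writer.writerows(rows)


# Usage (writes payload.csv to the current directory):
# with open("payload.csv", "w", newline="") as handle:
#     write_dataset(build_rows(NORMAL, MALICIOUS), handle)
```

Letting the `csv` module do the writing matters here, because attack payloads often contain commas and quotes that would break a hand-built CSV line.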
2. Identification of field and tokenization (NLP) –
Multiple fields are available in our dataset, such as UserAgent, URL, bytes, and Referer, but we will use only the query part of the URL field for analytics.
The Machine Learning model is trained on the query parameter and the label field, and is then applied to the test dataset for validation.
The next step is to tokenize each word in the query field and use an NLP (Natural Language Processing) model to convert the text to numbers, since Machine Learning models operate on numbers. We will use the Word2Vec model, which maps each token to a numeric vector, for this step.
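The sketch below shows the two halves of this step under stated assumptions. `tokenize_query` extracts and tokenizes the query part of a URL. `mean_vector` averages per-token vectors into one fixed-length vector per query, a common way to feed Word2Vec output to a classifier. The tiny `TOY_EMBEDDINGS` table is invented for illustration. In practice those vectors would come from a trained Word2Vec model (the gensim library provides one), and training is not shown here.

```python
import re
from urllib.parse import unquote_plus, urlsplit

# Words/numbers become one token each; every other symbol is its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]")


def tokenize_query(url):
    """Return lower-cased tokens from the decoded query part of a URL."""
    query = unquote_plus(urlsplit(url).query)
    return [token.lower() for token in TOKEN_RE.findall(query)]


# Invented 2-dimensional vectors standing in for trained Word2Vec output.
TOY_EMBEDDINGS = {
    "id": [0.1, 0.0],
    "=": [0.2, 0.1],
    "1": [0.0, 0.1],
    "or": [0.9, 0.8],
    "'": [0.8, 0.9],
}


def mean_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = tokenize_query("http://example.com/item?id=1%27+OR+1%3D1")
print(tokens)
print(mean_vector(tokens, TOY_EMBEDDINGS, dim=2))
```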
3. Applying Machine Learning model (Classification SVM) –
After converting the text to numeric vectors, we have a fully numeric dataset. Let’s use the SVM (Support Vector Machine) classification model to train on it.
The file produced by the second step is saved as payload_final.csv.
We will split this data into a training set and a test set.
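The split and the training can be sketched in plain Python as follows. This is a simplified linear SVM trained with stochastic sub-gradient descent on the hinge loss, written out to show the mechanics; it is not the exact model behind any particular product, and in practice a library implementation such as scikit-learn's `SVC` would normally be used. The in-memory `SAMPLES` stand in for rows loaded from payload_final.csv, and the learning rate, epoch count, and regularization strength are arbitrary assumptions.

```python
import random

# Hypothetical (feature_vector, label) rows; 0 = normal, 1 = anomalous.
SAMPLES = [
    ([0.10, 0.20], 0), ([0.20, 0.10], 0), ([0.00, 0.10], 0), ([0.15, 0.05], 0),
    ([0.90, 0.80], 1), ([0.80, 0.90], 1), ([1.00, 0.85], 1), ([0.85, 0.95], 1),
]


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows reproducibly and split them into train and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


def score(model, features):
    """Signed distance-like score: positive means the anomalous side."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_linear_svm(samples, epochs=100, lr=0.1, lam=0.001, seed=0):
    """Fit weights and bias by sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    data = list(samples)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            y = 1 if label == 1 else -1
            margin = y * score((weights, bias), features)
            # L2 regularization shrinks the weights a little on every step.
            weights = [w - lr * lam * w for w in weights]
            if margin < 1:
                # Sample is inside the margin or misclassified: push it out.
                weights = [w + lr * y * x for w, x in zip(weights, features)]
                bias += lr * y
    return weights, bias


def predict(model, features):
    """Return 1 (anomalous) or 0 (normal)."""
    return 1 if score(model, features) >= 0 else 0


train_rows, test_rows = train_test_split(SAMPLES)
model = train_linear_svm(train_rows)
for features, label in test_rows:
    print(features, "expected", label, "predicted", predict(model, features))
```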
4. Confusion matrix check –
Check the false positive ratio by computing the confusion matrix in Python.
5. Predictive analytics –
Start predicting whether traffic is normal or anomalous with our Machine Learning model. This will help us identify outliers.
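The confusion matrix check above can be sketched without any library. The label lists here are invented for illustration, with 1 meaning anomalous and 0 meaning normal. scikit-learn's `confusion_matrix` function offers the same counts if that library is already in use.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = anomalous)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, predicted in zip(y_true, y_pred):
        if truth == 1 and predicted == 1:
            counts["tp"] += 1
        elif truth == 0 and predicted == 1:
            counts["fp"] += 1
        elif truth == 0 and predicted == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def false_positive_rate(counts):
    """Share of normal records that were wrongly flagged as anomalous."""
    negatives = counts["fp"] + counts["tn"]
    return counts["fp"] / negatives if negatives else 0.0


# Hypothetical test-set labels and model predictions.
Y_TRUE = [0, 0, 0, 0, 1, 1, 1, 0]
Y_PRED = [0, 1, 0, 0, 1, 0, 1, 0]

matrix = confusion_matrix(Y_TRUE, Y_PRED)
print(matrix)
print("False positive rate:", false_positive_rate(matrix))
```

A high false positive rate means analysts will be chasing normal traffic, so it is worth checking this number before trusting the model's predictions on live logs.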
Conclusion
It is clear that many modern cyber security solutions use Machine Learning/AI in the back end in the way described above. In this paper, we have manually followed the same steps that those solutions perform.
We could say that attack and defense are two sides of the same coin, and there is always a race against time between the two teams. Threat hunters are the individuals capable of detecting advanced threats within the organization. When Machine Learning assists them in identifying APTs and zero-day threats, it becomes a valuable and time-saving endeavor.
Enjoy Threat Hunting!
References
- www.kaggle.com – Datasets
- www.analyticsvidhya.com – Datasets
- Machine Learning and Cybersecurity – A book by Micah Musser and Ashton Garriott.