Open Threat Research Security Datasets data provider and browser
The OTRF Security Datasets is a project to capture host and network log data that illustrates adversarial attack patterns. It is part of the Open Threat Research Forge created by Roberto Rodriguez and Jose Rodriguez. The project was originally named Mordor, and the MSTICPy naming still uses that name (of which we are rather fond). In this document we use Mordor and OTRF Security Datasets interchangeably; they both refer to the same thing.
The Mordor project provides one of the most comprehensive libraries of attack logs - the captured logs contain not just the events directly related to the attack but also the set of benign events happening at the time of the attack. Each data set is mapped to Mitre ATT&CK techniques and tactics and includes simulation scripts to allow you to produce the same data in your environment. This makes Mordor very useful for testing detection logic - whether simple rules or in more complex machine learning scenarios requiring labelled data.
This library allows you to browse through Mordor data sets and query individual data sets in a similar way to other MSTICPy data providers. Like the other providers, the Mordor provider returns results as a pandas DataFrame, allowing it to be used easily in Jupyter notebooks and other Python code. Unlike other providers, it does not support custom queries or a query language. The equivalent built-in queries for the Mordor provider return the entire set of data for that item.
For more information on the OTRF data sets see the OTRF Jupyter Book documentation and the GitHub repository.
For more information on Mitre ATT&CK Techniques and Tactics see Mitre ATT&CK.
You can view a notebook that shows the use of the Mordor provider: MordorData.
Using the Data Provider to download datasets
Using the data provider you can download and render event data as a pandas DataFrame.
Note
Mordor includes both host event data and network capture data. Although capture files can be downloaded and unpacked, MSTICPy currently cannot display them in a pandas DataFrame. Most network datasets use capture (.cap) files. You can view these using tools such as tcpdump and tshark, or GUI tools such as Wireshark, Brim and others.
Host event data stored in JSON files is retrieved and populated into DataFrames.
To use the Mordor provider, first create a Mordor query provider. Then call the connect() function: this will download metadata from Mordor and Mitre to populate the query set.
Download progress is shown with a progress bar (not shown below).
>>> from msticpy.data import QueryProvider
>>> mdr_data = QueryProvider("Mordor")
>>> mdr_data.connect()
Retrieving Mitre data...
Retrieving Mordor data...
List Queries
Once the metadata is downloaded, the provider is populated with query functions that you can use to retrieve the datasets.
Note
Many Mordor data entries have multiple data sets, so we see more queries than Mordor entries.
You can see a list of available queries with the list_queries function. (Only the first 15 are shown below.)
>>> mdr_data.list_queries()[:15]
['small.aws.collection.ec2_proxy_s3_exfiltration',
'small.windows.collection.host.msf_record_mic',
'small.windows.credential_access.host.covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges',
'small.windows.credential_access.host.empire_dcsync_dcerpc_drsuapi_DsGetNCChanges',
'small.windows.credential_access.host.empire_mimikatz_backupkeys_dcerpc_smb_lsarpc',
'small.windows.credential_access.host.empire_mimikatz_extract_keys',
'small.windows.credential_access.host.empire_mimikatz_logonpasswords',
'small.windows.credential_access.host.empire_mimikatz_lsadump_patch',
'small.windows.credential_access.host.empire_mimikatz_sam_access',
'small.windows.credential_access.host.empire_over_pth_patch_lsass',
'small.windows.credential_access.host.empire_powerdump_sam_access',
'small.windows.credential_access.host.empire_shell_reg_dump_sam',
'small.windows.credential_access.host.empire_shell_rubeus_asktgt_createnetonly',
'small.windows.credential_access.host.empire_shell_rubeus_asktgt_ptt',
'small.windows.credential_access.host.rdp_interactive_taskmanager_lsass_dump']
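Because the query names are dotted paths, ordinary string handling is enough to organize them. The following is a minimal sketch, not part of MSTICPy: the helper group_by_segment is hypothetical, and reading the second and third segments as platform and tactic is an assumption based on the sample names above.

```python
from collections import defaultdict

# A few names copied from the list_queries() output above.
QUERY_NAMES = [
    "small.aws.collection.ec2_proxy_s3_exfiltration",
    "small.windows.collection.host.msf_record_mic",
    "small.windows.credential_access.host.empire_mimikatz_logonpasswords",
    "small.windows.credential_access.host.empire_shell_reg_dump_sam",
]


def group_by_segment(names, index):
    """Group dotted query names by the segment at position `index`.

    Hypothetical helper for illustration; names with too few segments
    are skipped.
    """
    groups = defaultdict(list)
    for name in names:
        segments = name.split(".")
        if index < len(segments):
            groups[segments[index]].append(name)
    return dict(groups)


# Assumption: segment 1 is the platform and segment 2 is the tactic.
by_platform = group_by_segment(QUERY_NAMES, 1)
by_tactic = group_by_segment(QUERY_NAMES, 2)

for tactic, members in sorted(by_tactic.items()):
    print(f"{tactic}: {len(members)} queries")
```

In a live session you would pass mdr_data.list_queries() in place of the hard-coded list.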
Retrieving/querying a data set
To retrieve a data set, run the required query. The queries are all available as attributes of the Mordor provider.
Note
The queries support tab-completion, so as you type each segment you can use the tab key to see a list of available options.
>>> mdr_data.small.windows.credential_access.host.covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges().head(3)
https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/credential_access/host/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip
Extracting covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges_2020-08-05020926.json
| @version | Keywords | ThreadID | Version | DestAddress | host | LayerRTID | Message |
|---|---|---|---|---|---|---|---|
| 1 | -9214364837600034816 | 4888 | 1 | 239.255.255.250 | wec.internal.cloudapp.net | 44 | The Windows Filtering Platform has permitted a connection. |
| 1 | -9223372036854775808 | 4452 | 2 | nan | wec.internal.cloudapp.net | nan | File created: eventlog |
| 1 | -9223372036854775808 | 4452 | 2 | nan | wec.internal.cloudapp.net | nan | RawAccessRead detected: eventlog |
Note
The table shown above has been truncated for illustration.
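The download URL printed in the query output mirrors the dotted query name. The sketch below illustrates that relationship for the example shown here. It is an observation about this one URL, not a description of how MSTICPy builds its query names, and both DATASETS_PREFIX and query_name_from_url are hypothetical names introduced for illustration.

```python
from urllib.parse import urlparse

# Path prefix taken from the URL printed in the query output above.
DATASETS_PREFIX = "/OTRF/mordor/master/datasets/"


def query_name_from_url(url):
    """Derive a dotted query name from a dataset download URL (illustrative)."""
    path = urlparse(url).path
    if not path.startswith(DATASETS_PREFIX):
        raise ValueError(f"unexpected dataset URL: {url}")
    relative = path[len(DATASETS_PREFIX):]
    # Drop only the .zip archive extension, keeping the rest of the file name.
    if relative.endswith(".zip"):
        relative = relative[: -len(".zip")]
    return relative.replace("/", ".")


EXAMPLE_URL = (
    "https://raw.githubusercontent.com/OTRF/mordor/master/datasets/"
    "small/windows/credential_access/host/"
    "covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip"
)
print(query_name_from_url(EXAMPLE_URL))
```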
Optional parameters
The data provider and the query functions support some parameters to control aspects of the query operation.
use_cached : bool, optional Try to use locally saved file first, by default True. If you’ve previously downloaded a file, it will use this rather than downloading a new copy.
save_folder : str, optional Path to output folder, by default “.”. The path that downloaded and extracted files are saved to.
silent : bool If True, suppress feedback. By default, False.
If you specify these when you initialize the data provider, the settings will apply to all queries.
>>> mdr_data = QueryProvider("Mordor", save_folder="./mordor")
>>> mdr_data.connect()
Note
Since the first line creates a new instance of the Mordor provider, you need to call connect() again. The Mordor and Mitre metadata is cached, so you will not have to download it again in this session.
Using these parameters in the query will override the provider settings and defaults for that query.
>>> mdr_data.small.windows.credential_access.host.covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges(
...     save_folder="./investigation002"
... )
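The precedence described above (documented defaults, then provider settings, then per-query arguments) can be pictured as a layered dictionary merge. This is a sketch of the idea only, assuming nothing about how MSTICPy stores these values internally; effective_settings is a hypothetical helper.

```python
# Defaults as documented in the parameter list above.
DEFAULTS = {"use_cached": True, "save_folder": ".", "silent": False}


def effective_settings(provider_kwargs, query_kwargs):
    """Merge settings so that query arguments win over provider arguments.

    Hypothetical helper: later mappings override earlier ones.
    """
    return {**DEFAULTS, **provider_kwargs, **query_kwargs}


provider_kwargs = {"save_folder": "./mordor"}
query_kwargs = {"save_folder": "./investigation002"}

# Provider setting applies when the query passes nothing.
print(effective_settings(provider_kwargs, {}))
# Query argument overrides the provider setting for that call only.
print(effective_settings(provider_kwargs, query_kwargs))
```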
Getting summary data about a query
Call the query function with a single “?” parameter to display summary information.
>>> mdr_data.small.windows.credential_access.host.covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges("?")
Query: covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges
Data source: Mordor
Covenant DCSync
Notes
-----
Mordor ID: SDWIN-200805020926
This dataset represents adversaries abusing Active Directory Replication services to retrieve secret domain data (i.e. NTLM hashes) from domain accounts.
Mitre Techniques: T1003: OS Credential Dumping
Mitre Tactics: TA0006: Credential Access
Parameters
----------
Query:
https://raw.githubusercontent.com/OTRF/mordor/master/datasets/small/windows/credential_access/host/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip
Searching for Queries with QueryProvider.search_queries()
You can use the provider search_queries function to search for queries that match required attributes.
This function takes a single string parameter: search.
Unless you include delimiters (see next), the search parameter is treated as a literal text string to search for. It tries to match this string against any text in the metadata of the Mordor data sets. The search is case-sensitive.
Search also supports simple term logic using AND and OR expressions:
Substrings separated by commas will be treated as OR terms, e.g. “a, b” == “a” OR “b”.
Substrings separated by “+” will be treated as AND terms, e.g. “a + b” == “a” AND “b”.
Note
You cannot combine “+” and “,” in the same search. For this reason, grouping of expressions is not supported.
The search returns a Python list of the names and descriptions of any matching queries.
Examples:
Simple text string
>>> mdr_data.search_queries("AWS")
['small.aws.collection.ec2_proxy_s3_exfiltration (AWS Cloud Bank Breach S3)']
Search for items that have both “Empire” and “T1222”.
>>> mdr_data.search_queries("Empire + T1222")
['small.windows.defense_evasion.host.empire_powerview_ldap_ntsecuritydescriptor (Empire Powerview Add-DomainObjectAcl)',
'small.windows.defense_evasion.network.empire_powerview_ldap_ntsecuritydescriptor (Empire Powerview Add-DomainObjectAcl)']
Search for items that have both “Empire” and “Credential”.
>>> mdr_data.search_queries("Empire + Credential")
['small.windows.credential_access.host.empire_dcsync_dcerpc_drsuapi_DsGetNCChanges (Empire DCSync)',
'small.windows.credential_access.network.empire_dcsync_dcerpc_drsuapi_DsGetNCChanges (Empire DCSync)',
'small.windows.defense_evasion.host.empire_wdigest_downgrade.tar (Empire WDigest Downgrade)',
'small.windows.credential_access.host.empire_mimikatz_sam_access (Empire Mimikatz SAM Extract Hashes)',
'small.windows.credential_access.host.empire_mimikatz_lsadump_patch (Empire Mimikatz Lsadump LSA Patch)',
'small.windows.credential_access.host.empire_mimikatz_logonpasswords (Empire Mimikatz LogonPasswords)']
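The term logic described in this section can be sketched in a few lines of plain Python. This is an illustration of the rules as documented, not the MSTICPy implementation: parse_search, text_matches and search_sample are hypothetical names, matching is a plain substring test, raising ValueError for mixed delimiters is a choice made for this sketch, and SAMPLE_METADATA is made-up text assembled from titles and attributes shown in this document.

```python
def parse_search(search):
    """Split a search string into (mode, terms) using the documented rules."""
    if "," in search and "+" in search:
        # The docs say the delimiters cannot be combined; raising is our choice.
        raise ValueError('"+" and "," cannot be combined in one search')
    if "," in search:
        mode, parts = "or", search.split(",")
    elif "+" in search:
        mode, parts = "and", search.split("+")
    else:
        mode, parts = "literal", [search]
    terms = [part.strip() for part in parts if part.strip()]
    return mode, terms


def text_matches(search, text):
    """Return True if `text` satisfies the search expression."""
    mode, terms = parse_search(search)
    if not terms:
        return False
    hits = [term in text for term in terms]
    return all(hits) if mode == "and" else any(hits)


# Made-up metadata text, keyed by query names that appear in this document.
SAMPLE_METADATA = {
    "small.aws.collection.ec2_proxy_s3_exfiltration": "AWS Cloud Bank Breach S3",
    "small.windows.credential_access.host.empire_dcsync_dcerpc_drsuapi_DsGetNCChanges":
        "Empire DCSync Credential Access",
    "small.windows.credential_access.host.covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges":
        "Covenant DCSync T1003 Credential Access",
}


def search_sample(search):
    """Return the names of sample entries whose text matches the search."""
    return [name for name, text in SAMPLE_METADATA.items() if text_matches(search, text)]


print(search_sample("AWS"))
print(search_sample("Empire + Credential"))
print(search_sample("AWS, Covenant"))
```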
Mordor Browser
We’ve built a specialized browser for Mordor data. This uses the metadata in the repository to let you search for and view full details of the dataset.
You can also download and preview the dataset from the browser (if it is convertible to a DataFrame).
See MordorBrowser for API details.
For more explanation of the data items shown in the browser, please see the Mordor GitHub repo and the Threat Hunter Playbook.
>>> from msticpy.vis.mordor_browser import MordorBrowser
>>> mdr_browser = MordorBrowser()
The top scrollable list is a list of the Mordor datasets. Selecting one of these updates the data in the lower half of the browser.
Filter Drop-down
To narrow your search you can filter using a text search or filter by Mitre ATT&CK Techniques or Tactics. Click on the arrow to open the filter pane.
The Filter text box
This uses the same syntax as the provider search_queries() function.
A simple text string will find matches for datasets that contain this string.
Strings separated by “,” are treated as OR terms, i.e. it will match items that contain ANY of the substrings.
Strings separated by “+” are treated as AND terms, i.e. it will match items that contain ALL of the substrings.
Filtering by Mitre Categories
The Mitre ATT&CK Techniques and Tactics lists are multi-select lists. Only items that have techniques and tactics matching the selected items will be shown. By default, all are selected.
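One way to picture this filter is as a pair of set intersections. The sketch below is an interpretation, not the browser's actual code: it assumes an item is shown when it shares at least one technique and at least one tactic with the current selections. DatasetMeta and apply_mitre_filter are hypothetical names, and the two sample entries use technique and tactic IDs for datasets mentioned in this document.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMeta:
    """Hypothetical stand-in for one dataset's Mitre metadata."""

    title: str
    techniques: set = field(default_factory=set)
    tactics: set = field(default_factory=set)


def apply_mitre_filter(datasets, selected_techniques, selected_tactics):
    """Keep datasets that overlap both selections (assumed semantics)."""
    return [
        ds
        for ds in datasets
        if ds.techniques & selected_techniques and ds.tactics & selected_tactics
    ]


DATASETS = [
    DatasetMeta("Covenant DCSync", {"T1003"}, {"TA0006"}),
    DatasetMeta("Empire Powerview Add-DomainObjectAcl", {"T1222"}, {"TA0005"}),
]
ALL_TECHNIQUES = {"T1003", "T1222"}
ALL_TACTICS = {"TA0005", "TA0006"}

# Default state: everything selected, so nothing is filtered out.
print([ds.title for ds in apply_mitre_filter(DATASETS, ALL_TECHNIQUES, ALL_TACTICS)])
# Narrow the technique selection to T1003 only.
print([ds.title for ds in apply_mitre_filter(DATASETS, {"T1003"}, ALL_TACTICS)])
```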
Clearing the Filter
The Reset Filter button clears any filtering.
Main Details Window
title, ID, author, creation date, modification date and description are self-explanatory.
tags can be used for searching (although the search functions in the browser and data provider will search over all text).
file_paths (see File paths below)
attacks - lists related Mitre Technique and Tactics. The item title is a link to the Mitre page describing the technique or tactic.
notebooks - if there are one or more notebooks in the Threat Hunter Playbook site that relate to this dataset, descriptions and links to the notebooks are shown here.
simulation - raw data listing the steps in the attack (and useful for replaying the attack in a demo environment).
references - links to any external documents about the attack.
File paths
This section allows you to select, download and (in most cases) display the event data relating to the attack.
Select a file and click on the Download button.
The zipped file is downloaded and extracted. If it is event data, this is converted to a pandas DataFrame and displayed below the rest of the data.
The current dataset is available as an attribute of the browser: mdr_browser.current_dataset
Datasets that you’ve downloaded and displayed in this session are also cached in the browser and available in the mdr_browser.datasets attribute.
Downloaded files
By default, files are downloaded and extracted to the current folder. You can change this with the save_folder parameter when creating the MordorBrowser object.
You can also specify the use_cached parameter. By default, this is True, which causes downloaded files not to be deleted after extraction. These local copies are used if you try to view the same data set again. This also works across sessions.
If use_cached is set to False, files are deleted immediately after downloading, extracting and populating the DataFrame.
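The caching behavior described above can be sketched with the standard library alone. This is a simplified model under stated assumptions, not the MSTICPy implementation: load_dataset and fake_download are hypothetical, the file content is invented, and archive extraction is omitted.

```python
import tempfile
from pathlib import Path


def load_dataset(file_name, save_folder, download, use_cached=True):
    """Return the file's bytes, reusing or discarding the local copy.

    `download` is any callable that takes a file name and returns bytes.
    """
    local_path = Path(save_folder) / file_name
    if not (use_cached and local_path.is_file()):
        # No usable local copy: fetch the data and save it.
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(download(file_name))
    data = local_path.read_bytes()
    if not use_cached:
        # Without caching, the local file is removed once the data is loaded.
        local_path.unlink()
    return data


calls = []


def fake_download(file_name):
    """Stand-in for a real download; records each call."""
    calls.append(file_name)
    return b'{"EventID": 1}\n'  # invented sample content


with tempfile.TemporaryDirectory() as folder:
    load_dataset("sample.json", folder, fake_download)  # downloads
    load_dataset("sample.json", folder, fake_download)  # reuses the local copy
    kept = (Path(folder) / "sample.json").exists()
    load_dataset("sample.json", folder, fake_download, use_cached=False)
    removed = not (Path(folder) / "sample.json").exists()

print(len(calls), kept, removed)
```

Because the cached copy lives on disk in save_folder rather than in memory, a later session pointed at the same folder would find it again, which is consistent with the cross-session behavior noted above.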
Using the standard query browser
You can also use the standard QueryProvider query browser to view some details of the queries. This works for all query types (not just Mordor) but has fewer details.
See query_browser for more details.
>>> mdr_data.browse_queries()