msticpy.sectools package

msticpy.sectools.auditdextract module

Auditd extractor.

Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command line arguments into a single string). This is still a work-in-progress.
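The collapsing and decoding steps described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the names `collapse_events` and `decode_value`, the sample records, and the choice to hex-decode only the `a0`, `a1`, ... fields of EXECVE records are assumptions made for the example.

```python
import re
from collections import defaultdict

# Header of an auditd record: type=X msg=audit(TIMESTAMP:ID): key=value ...
HEADER = re.compile(
    r"type=(?P<type>\w+)\s+msg=audit\((?P<ts>[\d.]+):(?P<id>\d+)\):\s*(?P<body>.*)"
)
# key=value pairs, where the value may be double-quoted.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def decode_value(record_type, key, raw):
    """Strip quotes, or hex-decode unquoted EXECVE argument values."""
    if raw.startswith('"'):
        return raw.strip('"')
    is_arg = record_type == "EXECVE" and re.fullmatch(r"a\d+", key)
    if is_arg and re.fullmatch(r"(?:[0-9A-Fa-f]{2})+", raw):
        try:
            return bytes.fromhex(raw).decode("utf-8")
        except UnicodeDecodeError:
            return raw
    return raw


def collapse_events(lines):
    """Group records by message ID and rebuild the EXECVE command line."""
    events = defaultdict(dict)
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue  # skip lines that are not auditd records
        fields = events[match["id"]].setdefault(match["type"], {})
        for key, raw in FIELD.findall(match["body"]):
            fields[key] = decode_value(match["type"], key, raw)
    for event in events.values():
        execve = event.get("EXECVE")
        if execve:
            argc = int(execve.get("argc", 0))
            execve["cmdline"] = " ".join(
                execve.get(f"a{i}", "") for i in range(argc)
            )
    return dict(events)


# Hypothetical sample: two records sharing message ID 4041.
SAMPLE = [
    'type=SYSCALL msg=audit(1559663426.468:4041): syscall=59 pid=1234 comm="ls"',
    'type=EXECVE msg=audit(1559663426.468:4041): argc=3 a0="ls" a1="-l" '
    "a2=2F746D702F6D7920646972",
]
events = collapse_events(SAMPLE)
```

Here the two sample records end up under one event key, and the hex-encoded third argument is decoded before the arguments are joined into a single command line string.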

msticpy.sectools.auditdextract.cluster_auditd_processes(audit_data: pandas.core.frame.DataFrame, app: str = None) → pandas.core.frame.DataFrame

Clusters process data into specific processes.

Parameters:
  • audit_data (pd.DataFrame) – The Audit data containing process creation events
  • app (str, optional) – The name of a specific app you wish to cluster
Returns:

Details of the clustered process

Return type:

pd.DataFrame

msticpy.sectools.auditdextract.extract_events_to_df(data: pandas.core.frame.DataFrame, input_column: str = 'AuditdMessage', event_type: str = None, verbose: bool = False) → pandas.core.frame.DataFrame

Extract auditd raw messages into a dataframe.

Parameters:
  • data (pd.DataFrame) – The input dataframe with raw auditd data in a single string column
  • input_column (str, optional) – the input column name (the default is ‘AuditdMessage’)
  • event_type (str, optional) – the event type, if None, defaults to all (the default is None)
  • verbose (bool, optional) – Give feedback on stages of processing (the default is False)
Returns:

The resultant DataFrame

Return type:

pd.DataFrame

msticpy.sectools.auditdextract.generate_process_tree(audit_data: pandas.core.frame.DataFrame, branch_depth: int = 4, processes: pandas.core.frame.DataFrame = None) → pandas.core.frame.DataFrame

Generate process tree data from auditd logs.

Parameters:
  • audit_data (pd.DataFrame) – The Audit data containing process creation events
  • branch_depth (int, optional) – The maximum depth of parent or child processes to extract from the data (The default is 4)
  • processes (pd.DataFrame, optional) – Dataframe of processes to generate tree for
Returns:

The formatted process tree data

Return type:

pd.DataFrame

msticpy.sectools.auditdextract.get_event_subset(data: pandas.core.frame.DataFrame, event_type: str) → pandas.core.frame.DataFrame

Return a subset of the events matching type event_type.

Parameters:
  • data (pd.DataFrame) – The input data
  • event_type (str) – The event type to select
Returns:

The subset of the data where data[‘EventType’] == event_type

Return type:

pd.DataFrame

msticpy.sectools.auditdextract.read_from_file(filepath: str, event_type: str = None, verbose: bool = False, dummy_sep: str = '\t') → pandas.core.frame.DataFrame

Extract Audit events from a log file.

Parameters:
  • filepath (str) – path to the input file
  • event_type (str, optional) – The type of event to extract if only a subset required. (the default is None, which processes all types)
  • verbose (bool, optional) – If true more progress messages are output (the default is False)
  • dummy_sep (str, optional) – Separator to use for reading the ‘csv’ file (default is tab - ‘\t’)
Returns:

The output DataFrame

Return type:

pd.DataFrame

Notes

The dummy_sep parameter should be a character that does not occur in an input line. This function uses pandas read_csv to read the audit lines into a single column. Using a separator that does appear in the input (e.g. space or comma) will cause data to be parsed into multiple columns and anything after the first separator in a line will be lost.
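The effect of the separator choice can be illustrated with the standard library csv module, used here only as a stand-in for pandas read_csv. The sample line and the helper name `split_line` are assumptions for the example.

```python
import csv

# Hypothetical auditd line; it contains spaces but no tab characters.
LINE = 'type=EXECVE msg=audit(1559663426.468:4041): argc=2 a0="ls" a1="-l"'


def split_line(line, sep):
    """Split one line on sep, treating quote characters as ordinary data."""
    return next(csv.reader([line], delimiter=sep, quoting=csv.QUOTE_NONE))


tab_fields = split_line(LINE, "\t")    # separator absent: one intact field
space_fields = split_line(LINE, " ")   # separator present: line is broken up
```

With the tab separator the whole record survives as a single field. With a space separator the record is split into several fields, and a reader that keeps only the first column would retain just `type=EXECVE`.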

msticpy.sectools.auditdextract.unpack_auditd(audit_str: List[Dict[str, str]]) → Mapping[str, Mapping[str, Any]]

Unpack an Audit message and return a dictionary of fields.

Parameters:audit_str (str) – The auditd raw record
Returns:The extracted message fields and values
Return type:Mapping[str, Any]

msticpy.sectools.base64unpack module

base64_unpack.

The main function of this module is to decode and unpack strings that are obfuscated using base64 and/or certain compression algorithms such as gzip and zip.

It has the following functions:

  • unpack_items – the main entry point. It takes either a string or a pandas dataframe (with specified column) as input and returns a string with obfuscated parts replaced by decoded equivalents (unless the decoding results in an undecodable binary, in which case a placeholder is used).

Other helper functions may also be useful standalone:

  • get_items_from_gzip(binary): Return decompressed gzip content of byte string
  • get_items_from_zip(binary): Return dictionary of zip contents from byte string
  • get_items_from_tar(binary): Return dictionary of tar file contents
  • get_hashes(binary): Return md5, sha1 and sha256 hashes of input byte string

class msticpy.sectools.base64unpack.B64ExtractAccessor(pandas_obj)

Bases: object

Base64 Unpack pandas extension.

Initialize the extension.

extract(column, **kwargs) → pandas.core.frame.DataFrame

Base64 decode strings taken from a pandas dataframe.

Parameters:
  • data (pd.DataFrame) – dataframe containing column to decode
  • column (str) – Name of dataframe text column
  • trace (bool, optional) – Show additional status (the default is None)
  • utf16 (bool, optional) – Attempt to decode UTF16 byte strings
Returns:

Decoded string and additional metadata in dataframe

Return type:

pd.DataFrame

Notes

Items that decode to utf-8 or utf-16 strings will be returned as decoded strings replaced in the original string. If the encoded string is a known binary type it will identify the file type and return the hashes of the file. If any binary types are known archives (zip, tar, gzip) it will unpack the contents of the archive. For any binary it will return the decoded file as a byte array, and as a printable list of byte values.

The columns of the output DataFrame are:

  • decoded string: this is the input string with any decoded sections replaced by the results of the decoding
  • reference : this is an index that matches an index number in the decoded string (e.g. <<encoded binary type=pdf index=1.2’).
  • original_string : the string prior to decoding
  • file_type : the type of file if this could be determined
  • file_hashes : a dictionary of hashes (the md5, sha1 and sha256 hashes are broken out into separate columns)
  • input_bytes : the binary image as a byte array
  • decoded_string : printable form of the decoded string (either string or list of hex byte values)
  • encoding_type : utf-8, utf-16 or binary
  • md5, sha1, sha256 : the respective hashes of the binary. file_type, file_hashes, input_bytes, md5, sha1, sha256 will be null if this item is decoded to a string
  • src_index - the index of the source row in the input frame.
class msticpy.sectools.base64unpack.BinaryRecord(reference, original_string, file_name, file_type, input_bytes, decoded_string, encoding_type, file_hashes, md5, sha1, sha256, printable_bytes)

Bases: tuple

Create new instance of BinaryRecord(reference, original_string, file_name, file_type, input_bytes, decoded_string, encoding_type, file_hashes, md5, sha1, sha256, printable_bytes)

count()

Return number of occurrences of value.

decoded_string

Alias for field number 5

encoding_type

Alias for field number 6

file_hashes

Alias for field number 7

file_name

Alias for field number 2

file_type

Alias for field number 3

index()

Return first index of value.

Raises ValueError if the value is not present.

input_bytes

Alias for field number 4

md5

Alias for field number 8

original_string

Alias for field number 1

printable_bytes

Alias for field number 11

reference

Alias for field number 0

sha1

Alias for field number 9

sha256

Alias for field number 10

msticpy.sectools.base64unpack.get_hashes(binary: bytes) → Dict[str, str]

Return md5, sha1 and sha256 hashes of input byte string.

Parameters:binary (bytes) – byte string of item to be hashed
Returns:dictionary of hash algorithm + hash value
Return type:Dict[str, str]
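A dictionary of this shape can be produced with hashlib. The sketch below is an illustration only; the function name `hashes_of` is hypothetical, and the key names follow the md5, sha1 and sha256 labels used in this documentation.

```python
import hashlib
from typing import Dict


def hashes_of(binary: bytes) -> Dict[str, str]:
    """Return hex digests keyed by algorithm name."""
    return {
        name: hashlib.new(name, binary).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }


empty_hashes = hashes_of(b"")
```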
msticpy.sectools.base64unpack.get_items_from_gzip(binary: bytes) → Tuple[str, Dict[str, bytes]]

Return decompressed gzip contents.

Parameters:binary (bytes) – byte array of gz file
Returns:File type + decompressed file
Return type:Tuple[str, bytes]
msticpy.sectools.base64unpack.get_items_from_tar(binary: bytes) → Tuple[str, Dict[str, bytes]]

Return dictionary of tar file contents.

Parameters:binary (bytes) – byte array of tar file
Returns:Filetype + dictionary of file name + file content
Return type:Tuple[str, Dict[str, bytes]]
msticpy.sectools.base64unpack.get_items_from_zip(binary: bytes) → Tuple[str, Dict[str, bytes]]

Return dictionary of zip contents.

Parameters:binary (bytes) – byte array of zip file
Returns:Filetype + dictionary of file name + file content
Return type:Tuple[str, Dict[str, bytes]]
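Unpacking an in-memory archive into a dictionary of file name to file content might look like the following sketch, which uses zipfile and io from the standard library. The names `zip_contents` and `make_zip` and the literal "zip" file-type label are assumptions for the example, not the module's code.

```python
import io
import zipfile
from typing import Dict, Tuple


def zip_contents(binary: bytes) -> Tuple[str, Dict[str, bytes]]:
    """Return a file-type label and a dict of member name -> content."""
    with zipfile.ZipFile(io.BytesIO(binary)) as archive:
        return "zip", {name: archive.read(name) for name in archive.namelist()}


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Build a small zip archive in memory (test helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


file_type, members = zip_contents(make_zip({"a.txt": b"hello"}))
```

The same pattern applies to tar archives with tarfile.open(fileobj=...) and to gzip data with gzip.decompress.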
msticpy.sectools.base64unpack.unpack(input_string: str, trace: bool = False, utf16: bool = False) → Tuple[str, Optional[List[msticpy.sectools.base64unpack.BinaryRecord]]]

Base64 decode an input string.

Parameters:
  • input_string (str) – single string to decode
  • trace (bool, optional) – Show additional status (the default is False)
  • utf16 (bool, optional) – Attempt to decode UTF16 byte strings
Returns:

Decoded string and additional metadata

Return type:

Tuple[str, Optional[List[BinaryRecord]]]

Notes

Items that decode to utf-8 or utf-16 strings will be returned as decoded strings replaced in the original string. If the encoded string is a known binary type it will identify the file type and return the hashes of the file. If any binary types are known archives (zip, tar, gzip) it will unpack the contents of the archive. For any binary it will return the decoded file as a byte array, and as a printable list of byte values. If the input is a string the function returns:

  • decoded string: this is the input string with any decoded sections replaced by the results of the decoding
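The idea of replacing encoded sections in place can be sketched as follows. This is a simplified illustration under stated assumptions: the candidate pattern (runs of at least 20 base64 characters), the helper name `decode_b64_substrings`, and the choice to leave anything that is not valid UTF-8 untouched are all choices made for the example. It does not reproduce the binary handling, archive unpacking or metadata described above.

```python
import base64
import binascii
import re

# Assumed heuristic: a run of 20+ base64 characters with optional padding.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def decode_b64_substrings(text: str) -> str:
    """Replace base64 runs that decode to UTF-8 text with the decoded text."""

    def _replace(match):
        candidate = match.group(0)
        try:
            return base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return candidate  # not valid base64 text: leave unchanged

    return B64_CANDIDATE.sub(_replace, text)


encoded = base64.b64encode(b"Write-Host 'hello world'").decode("ascii")
decoded = decode_b64_substrings(f"powershell -enc {encoded}")
```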
msticpy.sectools.base64unpack.unpack_df(data: pandas.core.frame.DataFrame, column: str, trace: bool = False, utf16: bool = False) → pandas.core.frame.DataFrame

Base64 decode strings taken from a pandas dataframe.

Parameters:
  • data (pd.DataFrame) – dataframe containing column to decode
  • column (str) – Name of dataframe text column
  • trace (bool, optional) – Show additional status (the default is False)
  • utf16 (bool, optional) – Attempt to decode UTF16 byte strings
Returns:

Decoded string and additional metadata in dataframe

Return type:

pd.DataFrame

Notes

Items that decode to utf-8 or utf-16 strings will be returned as decoded strings replaced in the original string. If the encoded string is a known binary type it will identify the file type and return the hashes of the file. If any binary types are known archives (zip, tar, gzip) it will unpack the contents of the archive. For any binary it will return the decoded file as a byte array, and as a printable list of byte values.

The columns of the output DataFrame are:

  • decoded string: this is the input string with any decoded sections replaced by the results of the decoding
  • reference : this is an index that matches an index number in the decoded string (e.g. <<encoded binary type=pdf index=1.2’).
  • original_string : the string prior to decoding
  • file_type : the type of file if this could be determined
  • file_hashes : a dictionary of hashes (the md5, sha1 and sha256 hashes are broken out into separate columns)
  • input_bytes : the binary image as a byte array
  • decoded_string : printable form of the decoded string (either string or list of hex byte values)
  • encoding_type : utf-8, utf-16 or binary
  • md5, sha1, sha256 : the respective hashes of the binary. file_type, file_hashes, input_bytes, md5, sha1, sha256 will be null if this item is decoded to a string
  • src_index - the index of the source row in the input frame.
msticpy.sectools.base64unpack.unpack_items(input_string: str = None, data: pandas.core.frame.DataFrame = None, column: str = None, trace: bool = False, utf16: bool = False) → Any

Base64 decode an input string or strings taken from a pandas dataframe.

Parameters:
  • input_string (str, optional) – single string to decode (the default is None)
  • data (pd.DataFrame, optional) – dataframe containing column to decode (the default is None)
  • column (str, optional) – Name of dataframe text column (the default is None)
  • trace (bool, optional) – Show additional status (the default is False)
  • utf16 (bool, optional) – Attempt to decode UTF16 byte strings
Returns:

  • Tuple[str, pd.DataFrame] (if input_string) – Decoded string and additional metadata
  • pd.DataFrame – Decoded string and additional metadata in dataframe

Notes

If the input is a dataframe you must supply the name of the column to use.

Items that decode to utf-8 or utf-16 strings will be returned as decoded strings replaced in the original string. If the encoded string is a known binary type it will identify the file type and return the hashes of the file. If any binary types are known archives (zip, tar, gzip) it will unpack the contents of the archive. For any binary it will return the decoded file as a byte array, and as a printable list of byte values. If the input is a string the function returns:

  • decoded string: this is the input string with any decoded sections replaced by the results of the decoding

It also returns the data as a Pandas DataFrame with the following columns:

  • reference : this is an index that matches an index number in the returned string (e.g. <<encoded binary type=pdf index=1.2’).
  • original_string : the string prior to decoding
  • file_type : the type of file if this could be determined
  • file_hashes : a dictionary of hashes (the md5, sha1 and sha256 hashes are broken out into separate columns)
  • input_bytes : the binary image as a byte array
  • decoded_string : printable form of the decoded string (either string or list of hex byte values)
  • encoding_type : utf-8, utf-16 or binary
  • md5, sha1, sha256 : the respective hashes of the binary. file_type, file_hashes, input_bytes, md5, sha1, sha256 will be null if this item is decoded to a string

If the input is a dataframe the output dataframe will also include the following column:

  • src_index : the index of the source row in the input frame. This allows you to re-join the output data to the input data.

msticpy.sectools.cmd_line module

cmd_line - Syslog Command processing module.

Contains a series of functions required to correctly collect, parse and visualise Linux syslog data.

Designed to support standard Linux syslog for investigations where auditd is not available.

msticpy.sectools.cmd_line.cmd_speed(cmd_events: pandas.core.frame.DataFrame, cmd_field: str, time: int = 5, events: int = 10) → list

Detect patterns of cmd_line activity whose speed of execution may be suspicious.

Parameters:
  • cmd_events (pd.DataFrame) – A DataFrame of all sudo events to check.
  • cmd_field (str) – The column of the event data that contains command line activity
  • time (int, optional) – Time window in seconds in which to evaluate speed of execution against (Defaults to 5)
  • events (int, optional) – Number of syslog command execution events in which to evaluate speed of execution against (Defaults to 10)
Returns:

risky suspicious_actions – A list of commands that match a risky pattern

Return type:

list

Raises:

AttributeError – If cmd_field is not in the supplied data set or TimeGenerated is not in datetime format
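One way to read the time and events parameters is as a sliding window: flag commands when at least `events` of them fall within `time` seconds. The sketch below implements that reading with plain (timestamp, command) tuples. The function name `fast_commands` and the exact windowing rule are assumptions, and the library's own logic may differ.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def fast_commands(
    cmd_events: Sequence[Tuple[datetime, str]], time: int = 5, events: int = 10
) -> List[str]:
    """Return commands that belong to a run of `events` within `time` seconds."""
    ordered = sorted(cmd_events, key=lambda item: item[0])
    window = timedelta(seconds=time)
    flagged: List[str] = []
    for start in range(len(ordered) - events + 1):
        end = start + events - 1
        # The run qualifies if its first and last events are close enough.
        if ordered[end][0] - ordered[start][0] <= window:
            for _, command in ordered[start:end + 1]:
                if command not in flagged:
                    flagged.append(command)
    return flagged


base = datetime(2020, 1, 1, 12, 0, 0)
# Ten commands 100 ms apart, followed by three commands a minute apart.
burst = [(base + timedelta(milliseconds=100 * i), f"cmd{i}") for i in range(10)]
slow = [(base + timedelta(minutes=i + 1), f"slow{i}") for i in range(3)]
suspicious = fast_commands(burst + slow)
```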

msticpy.sectools.cmd_line.risky_cmd_line(events: pandas.core.frame.DataFrame, log_type: str, detection_rules: str = '<msticpy install dir>/resources/cmd_line_rules.json', cmd_field: str = 'Command') → dict

Detect patterns of risky commands in syslog messages.

Risky patterns are defined in a json format file.

Parameters:
  • events (pd.DataFrame) – A DataFrame of all syslog events potentially containing risky command line activity.
  • log_type (str) – The log type of the data included in events. Must correspond to a detection type in detection_rules file.
  • detection_rules (str, optional) – Path to json file containing patterns of risky activity to detect. (Defaults to msticpy/resources/cmd_line_rules.json)
  • cmd_field (str, optional) – The column in the events dataset that contains the command lines to be analysed. (Defaults to “Command”)
Returns:

risky actions – A dictionary of commands that match a risky pattern

Return type:

dict

Raises:

MsticpyException – The provided dataset does not contain the cmd_field field
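Matching command lines against JSON-defined patterns can be sketched as below. The rule layout (log type mapped to named regular expressions), the rule names and the helper `match_risky` are hypothetical. The actual format of cmd_line_rules.json is not shown in this documentation and may differ.

```python
import json
import re
from typing import Dict, Iterable

# Hypothetical rules document: log type -> rule name -> regular expression.
RULES_JSON = r"""
{
  "Syslog": {
    "download_and_run": "(curl|wget)\\s+.*\\|\\s*(ba)?sh",
    "clear_history": "history\\s+-c"
  }
}
"""


def match_risky(commands: Iterable[str], log_type: str, rules: dict) -> Dict[str, str]:
    """Return a dict of command -> name of the first rule it matches."""
    patterns = {name: re.compile(expr) for name, expr in rules[log_type].items()}
    hits: Dict[str, str] = {}
    for command in commands:
        for name, pattern in patterns.items():
            if pattern.search(command):
                hits[command] = name
                break
    return hits


rules = json.loads(RULES_JSON)
risky = match_risky(
    ["curl http://example.com/x.sh | sh", "ls -l", "history -c"], "Syslog", rules
)
```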

msticpy.sectools.eventcluster module

eventcluster module.

This module is intended to be used to summarize large numbers of events into clusters of different patterns. High volume repeating events can often make it difficult to see unique and interesting items.

The module contains functions to generate clusterable features from string data. For example, an administration command that does some maintenance on thousands of servers with a commandline such as: install-update -hostname {host.fqdn} -tmp:/tmp/{GUID}/rollback can be collapsed into a single cluster pattern by ignoring the character values in the string and using delimiters or tokens to group the values.

This is an unsupervised learning module implemented using scikit-learn DBSCAN.

Contains:

  • dbcluster_events: generic clustering method using DBSCAN designed to summarize process events and other similar data by grouping on common features.
  • add_process_features: derives numerical features from text features such as commandline and process path.

msticpy.sectools.eventcluster.add_process_features(input_frame: pandas.core.frame.DataFrame, path_separator: str = None, force: bool = False) → pandas.core.frame.DataFrame

Add numerical features based on patterns of command line and process name.

Parameters:
  • input_frame (pd.DataFrame) – The input dataframe
  • path_separator (str, optional) – Path separator. If not supplied, try to determine from ‘NewProcessName’ column of first 10 rows (the default is None)
  • force (bool, optional) – Forces re-calculation of feature columns even if they already exist (the default is False)
Returns:

Copy of the dataframe with the additional numeric features

Return type:

pd.DataFrame

Notes

Features added:

  • processNameLen: length of process file name (inc path)
  • processNameTokens: the number of elements in the path
  • processName: the process file name (minus path)
  • commandlineTokens: number of space-separated tokens in the command line
  • commandlineLen: length of the command line
  • commandlineLogLen: log10 length of commandline
  • isSystemSession: 1 if session Id is 0x3e7 for Windows or -1 for Linux
  • commandlineTokensFull: counts number of token separators in commandline [\s\-\\/.,"\'|&:;%$()]
  • pathScore: sum of ord() value of characters in path
  • pathLogScore: log10 of pathScore
  • commandlineScore: sum of ord() value of characters in commandline
  • commandlineLogScore: log10 of commandlineScore
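A few of the features listed above can be computed for a single record as in this sketch. The function `process_features`, the explicit `path_separator` default and the way path elements are counted (a plain split on the separator) are assumptions for illustration. The library function operates on a whole DataFrame.

```python
import math
from typing import Dict, Union


def process_features(
    process_name: str, command_line: str, path_separator: str = "\\"
) -> Dict[str, Union[int, float, str]]:
    """Derive simple numeric features from a process path and command line."""
    path_parts = process_name.split(path_separator)
    return {
        "processNameLen": len(process_name),
        "processNameTokens": len(path_parts),
        "processName": path_parts[-1],
        "commandlineTokens": len(command_line.split(" ")),
        "commandlineLen": len(command_line),
        # log10 is undefined for 0, so an empty command line scores 0.0 here.
        "commandlineLogLen": math.log10(len(command_line)) if command_line else 0.0,
    }


features = process_features(r"C:\Windows\System32\cmd.exe", "cmd.exe /c dir")
```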
msticpy.sectools.eventcluster.char_ord_score

Return sum of ord values of characters in string.

Parameters:
  • value (str) – Data to process
  • scale (int, optional) – reduce the scale of the feature, reducing the influence of variations in this feature on the clustering algorithm (the default is 1)
Returns:

The sum of the ordinal values of the characters in value.

Return type:

int

Notes

This function sums the ordinal value of each character in the input string. Two strings with minor differences will result in a similar score. However, for strings with highly variable content (e.g. command lines or http requests containing GUIDs) this may result in too much variance to be useful when you are trying to detect similar patterns. You can use the scale parameter to reduce the influence of features using this function on clustering and anomaly algorithms.
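The scoring described above amounts to a sum over ord() values. In this sketch the function name `ord_score` is hypothetical, and applying scale by integer division is an assumption, since the documentation does not say how the scale is applied.

```python
def ord_score(value: str, scale: int = 1) -> int:
    """Sum the ordinal values of the characters, reduced by scale."""
    return sum(ord(char) for char in value) // scale


similar = (ord_score("abc"), ord_score("abd"))  # one character apart
scaled = ord_score("abc", scale=10)
```

Strings that differ by one character produce scores that differ by only a small amount, which is why the score works as a similarity feature. A larger scale compresses those differences further.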

msticpy.sectools.eventcluster.char_ord_score_df(data: pandas.core.frame.DataFrame, column: str, scale: int = 1) → pandas.core.series.Series

Return sum of ord values of characters in string.

Parameters:
  • data (pd.DataFrame) – The DataFrame to process
  • column (str) – Column name to process
  • scale (int, optional) – reduce the scale of the feature, reducing the influence of variations in this feature on the clustering algorithm (the default is 1)
Returns:

The sum of the ordinal values of the characters in column.

Return type:

pd.Series

Notes

This function sums the ordinal value of each character in the input string. Two strings with minor differences will result in a similar score. However, for strings with highly variable content (e.g. command lines or http requests containing GUIDs) this may result in too much variance to be useful when you are trying to detect similar patterns. You can use the scale parameter to reduce the influence of features using this function on clustering and anomaly algorithms.

msticpy.sectools.eventcluster.crc32_hash

Return the CRC32 hash of the input column.

Parameters:value (str) – Data to process
Returns:CRC32 hash
Return type:int
msticpy.sectools.eventcluster.crc32_hash_df(data: pandas.core.frame.DataFrame, column: str) → pandas.core.series.Series

Return the CRC32 hash of the input column.

Parameters:
  • data (pd.DataFrame) – The DataFrame to process
  • column (str) – Column name to process
Returns:

CRC32 hash of input column

Return type:

pd.Series

msticpy.sectools.eventcluster.dbcluster_events(data: Any, cluster_columns: List[Any] = None, verbose: bool = False, normalize: bool = True, time_column: str = 'TimeCreatedUtc', max_cluster_distance: float = 0.01, min_cluster_samples: int = 2, **kwargs) → Tuple[pandas.core.frame.DataFrame, sklearn.cluster._dbscan.DBSCAN, numpy.ndarray]

Cluster data set according to cluster_columns features.

Parameters:
  • data (Any) – Input data as a pandas DataFrame or numpy array
  • cluster_columns (List[Any], optional) – List of columns to use for features - for DataFrame this is a list of column names - for numpy array this is a list of column indexes
  • verbose (bool, optional) – Print additional information about clustering results (the default is False)
  • normalize (bool, optional) – Normalize the input data (should probably always be True)
  • time_column (str, optional) – If there is a time column the output data will be ordered by this (the default is ‘TimeCreatedUtc’)
  • max_cluster_distance (float, optional) – DBSCAN eps (max cluster member distance) (the default is 0.01)
  • min_cluster_samples (int, optional) – DBSCAN min_samples (the minimum cluster size) (the default is 2)
Other Parameters:
  • kwargs – Other arguments are passed to the DBSCAN constructor

Returns:

  • Output dataframe with clustered rows
  • DBSCAN model
  • Normalized data set

Return type:

Tuple[pd.DataFrame, DBSCAN, np.ndarray]

msticpy.sectools.eventcluster.delim_count

Count the delimiters in input column.

Parameters:
  • value (str) – Data to process
  • delim_list (str, optional) – delimiters to use. (the default is r'[\s\-\\/.,"\'|&:;%$()]')
Returns:

Count of delimiters in the string.

Return type:

int

msticpy.sectools.eventcluster.delim_count_df(data: pandas.core.frame.DataFrame, column: str, delim_list: str = '[\\s\\-\\\\/\\., "\\\'|&:;%$()]') → pandas.core.series.Series

Count the delimiters in input column.

Parameters:
  • data (pd.DataFrame) – The DataFrame to process
  • column (str) – The name of the column to process
  • delim_list (str, optional) – delimiters to use. (the default is r'[\s\-\\/.,"\'|&:;%$()]')
Returns:

Count of delimiters in the string in column.

Return type:

pd.Series

msticpy.sectools.eventcluster.delim_hash

Return a hash (CRC32) of the delimiters from input column.

Parameters:
  • value (str) – Data to process
  • delim_list (str, optional) – delimiters to use. (the default is r'[\s\-\\/.,"\'|&:;%$()]')
Returns:

Hash of delimiter set in the string.

Return type:

int
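Counting and hashing delimiters is what lets command lines with the same shape but different values fall into one cluster. The sketch below uses re and zlib.crc32. The function names `count_delims` and `hash_delims` and the sample command lines are assumptions for the example.

```python
import re
import zlib

# Delimiter character class as given in the parameter description above.
DELIMS = r'[\s\-\\/.,"\'|&:;%$()]'


def count_delims(value: str, delim_list: str = DELIMS) -> int:
    """Count delimiter characters in the string."""
    return len(re.findall(delim_list, value))


def hash_delims(value: str, delim_list: str = DELIMS) -> int:
    """CRC32 of the sequence of delimiters, ignoring all other characters."""
    delimiters = "".join(re.findall(delim_list, value))
    return zlib.crc32(delimiters.encode("utf-8"))


# Hypothetical command lines: same structure, different host and folder.
cmd_a = "install-update -hostname host1.contoso.com -tmp:/tmp/1234/rollback"
cmd_b = "install-update -hostname web07.contoso.com -tmp:/tmp/abcd/rollback"
same_pattern = hash_delims(cmd_a) == hash_delims(cmd_b)
```

Because only the delimiters are hashed, the two command lines produce the same feature value even though the host names and temporary folder names differ.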

msticpy.sectools.eventcluster.plot_cluster(db_cluster: sklearn.cluster._dbscan.DBSCAN, data: pandas.core.frame.DataFrame, x_predict: numpy.ndarray, plot_label: str = None, plot_features: Tuple[int, int] = (0, 1), verbose: bool = False, cut_off: int = 3, xlabel: str = None, ylabel: str = None)

Plot clustered data as scatter chart.

Parameters:
  • db_cluster (DBSCAN) – DBScan Cluster (from SkLearn DBSCAN).
  • data (pd.DataFrame) – Dataframe containing original data.
  • x_predict (np.ndarray) – The DBSCAN predict numpy array
  • plot_label (str, optional) – If set the column to use to label data points (the default is None)
  • plot_features (Tuple[int, int], optional) – Which two features in x_predict to plot (the default is (0, 1))
  • verbose (bool, optional) – Verbose execution with some extra info (the default is False)
  • cut_off (int, optional) – The cluster size below which items are considered outliers (the default is 3)
  • xlabel (str, optional) – x-axis label (the default is None)
  • ylabel (str, optional) – y-axis label (the default is None)
msticpy.sectools.eventcluster.token_count

Return count of delimiter-separated tokens in the input string.

Parameters:
  • value (str) – Data to process
  • delimiter (str, optional) – Delimiter used to split the column string. (the default is ‘ ‘)
Returns:

count of tokens

Return type:

int

msticpy.sectools.eventcluster.token_count_df(data: pandas.core.frame.DataFrame, column: str, delimiter: str = ' ') → pandas.core.series.Series

Return count of delimiter-separated tokens in a pd.Series column.

Parameters:
  • data (pd.DataFrame) – The DataFrame to process
  • column (str) – Column name to process
  • delimiter (str, optional) – Delimiter used to split the column string. (the default is ‘ ‘)
Returns:

count of tokens in strings in column

Return type:

pd.Series

msticpy.sectools.geoip module

Geoip Lookup module using IPStack and Maxmind GeoLite2.

Geographic location lookup for IP addresses. This module has two classes for different services:

  • GeoLiteLookup – MaxMind GeoLite2
  • IPStackLookup – IPStack

Both services offer a free tier for non-commercial use. However, a paid tier will normally get you more accuracy, more detail and a higher throughput rate. Maxmind geolite uses a downloadable database, while IPStack is an online lookup (API key required).

exception msticpy.sectools.geoip.GeoIPDatabaseException

Bases: Exception

Exception when GeoIP database cannot be found.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class msticpy.sectools.geoip.GeoIpLookup

Bases: object

Abstract base class for GeoIP Lookup classes.

See also

IPStackLookup
IPStack GeoIP Implementation
GeoLiteLookup
MaxMind GeoIP Implementation

Initialize instance of GeoIpLookup class.

df_lookup_ip(data: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame

Lookup Geolocation data from a pandas Dataframe.

Parameters:
  • data (pd.DataFrame) – pandas dataframe containing IpAddress column
  • column (str) – the name of the dataframe column to use as a source
Returns:

Copy of original dataframe with IP Location information columns appended (where a location lookup was successful)

Return type:

pd.DataFrame

lookup_ip(ip_address: str = None, ip_addr_list: collections.abc.Iterable = None, ip_entity: msticpy.nbtools.entityschema.IpAddress = None) → Tuple[List[Any], List[msticpy.nbtools.entityschema.IpAddress]]

Lookup IP location abstract method.

Parameters:
  • ip_address (str, optional) – a single address to look up (the default is None)
  • ip_addr_list (Iterable, optional) – a collection of addresses to lookup (the default is None)
  • ip_entity (IpAddress, optional) – an IpAddress entity (the default is None) - any existing data in the Location property will be overwritten
Returns:

raw geolocation results and same results as IpAddress entities with populated Location property.

Return type:

Tuple[List[Any], List[IpAddress]]

class msticpy.sectools.geoip.GeoLiteLookup(api_key: Optional[str] = None, db_folder: Optional[str] = None, force_update: bool = False, auto_update: bool = True)

Bases: msticpy.sectools.geoip.GeoIpLookup

GeoIP Lookup using MaxMindDB database.

See also

GeoIpLookup
Abstract base class
IPStackLookup
IPStack GeoIP Implementation

Return new instance of GeoLiteLookup class.

Parameters:
  • api_key (str, optional) – API key from MaxMind. The default is None, which uses the configuration value from msticpyconfig.yaml. Read more about GeoLite2: https://dev.maxmind.com/geoip/geoip2/geolite2/ Sign up for a MaxMind account: https://www.maxmind.com/en/geolite2/signup Set your password and create a license key: https://www.maxmind.com/en/accounts/current/license-key
  • db_folder (str, optional) – Absolute path to the folder containing the MMDB file (e.g. ‘/usr/home’ or ‘C:/maxmind’). If no path is provided, the database is downloaded to .msticpy/GeoLite2 under the user’s home directory.
  • force_update (bool, optional) – If True, a new download request is initiated (the default is False).
  • auto_update (bool, optional) – If True, a new download request is initiated if the age criteria is matched (the default is True).
df_lookup_ip(data: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame

Lookup Geolocation data from a pandas Dataframe.

Parameters:
  • data (pd.DataFrame) – pandas dataframe containing IpAddress column
  • column (str) – the name of the dataframe column to use as a source
Returns:

Copy of original dataframe with IP Location information columns appended (where a location lookup was successful)

Return type:

pd.DataFrame

lookup_ip(ip_address: str = None, ip_addr_list: collections.abc.Iterable = None, ip_entity: msticpy.nbtools.entityschema.IpAddress = None) → Tuple[List[Any], List[msticpy.nbtools.entityschema.IpAddress]]

Lookup IP location from GeoLite2 data created by MaxMind.

Parameters:
  • ip_address (str, optional) – a single address to look up (the default is None)
  • ip_addr_list (Iterable, optional) – a collection of addresses to lookup (the default is None)
  • ip_entity (IpAddress, optional) – an IpAddress entity (the default is None) - any existing data in the Location property will be overwritten
Returns:

raw geolocation results and same results as IpAddress entities with populated Location property.

Return type:

Tuple[List[Any], List[IpAddress]]

class msticpy.sectools.geoip.IPStackLookup(api_key: Optional[str] = None, bulk_lookup: bool = False)

Bases: msticpy.sectools.geoip.GeoIpLookup

IPStack GeoIP Implementation.

See also

GeoIpLookup
Abstract base class
GeoLiteLookup
MaxMind GeoIP Implementation

Create a new instance of IPStackLookup.

Parameters:
  • api_key (str, optional) – API Key from IPStack, see https://ipstack.com (the default is None, which obtains the key from msticpyconfig.yaml)
  • bulk_lookup (bool, optional) – For Professional and above tiers allowing you to submit multiple IPs in a single request. (the default is False, which submits a single request per address)
df_lookup_ip(data: pandas.core.frame.DataFrame, column: str) → pandas.core.frame.DataFrame

Lookup Geolocation data from a pandas Dataframe.

Parameters:
  • data (pd.DataFrame) – pandas dataframe containing IpAddress column
  • column (str) – the name of the dataframe column to use as a source
Returns:

Copy of original dataframe with IP Location information columns appended (where a location lookup was successful)

Return type:

pd.DataFrame

lookup_ip(ip_address: str = None, ip_addr_list: collections.abc.Iterable = None, ip_entity: msticpy.nbtools.entityschema.IpAddress = None) → Tuple[List[Any], List[msticpy.nbtools.entityschema.IpAddress]]

Lookup IP location from IPStack web service.

Parameters:
  • ip_address (str, optional) – a single address to look up (the default is None)
  • ip_addr_list (Iterable, optional) – a collection of addresses to lookup (the default is None)
  • ip_entity (IpAddress, optional) – an IpAddress entity (the default is None) - any existing data in the Location property will be overwritten
Returns:

raw geolocation results and same results as IpAddress entities with populated Location property.

Return type:

Tuple[List[Any], List[IpAddress]]

Raises:
  • ConnectionError – Invalid status returned from http request
  • PermissionError – Service refused request (e.g. requesting batch of addresses on free tier API key)
msticpy.sectools.geoip.entity_distance(ip_src: msticpy.nbtools.entityschema.IpAddress, ip_dest: msticpy.nbtools.entityschema.IpAddress) → float

Return distance between two IP Entities.

Parameters:
  • ip_src (IpAddress) – Source/Origin IpAddress Entity
  • ip_dest (IpAddress) – Destination IpAddress Entity
Returns:

Distance in kilometers.

Return type:

float

Raises:

AttributeError – If either entity has no location information

msticpy.sectools.geoip.geo_distance(origin: Tuple[float, float], destination: Tuple[float, float]) → float

Calculate the Haversine distance.

Parameters:
  • origin (Tuple[float, float]) – Latitude, Longitude of origin of distance measurement.
  • destination (Tuple[float, float]) – Latitude, Longitude of destination of distance measurement.
Returns:

Distance in kilometers.

Return type:

float

Examples

>>> origin = (48.1372, 11.5756)  # Munich
>>> destination = (52.5186, 13.4083)  # Berlin
>>> round(geo_distance(origin, destination), 1)
504.2

Notes

Author: Martin Thoma - stackoverflow
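The Haversine calculation itself can be sketched with the standard library alone. This is a minimal illustration, not the module's source: the function name haversine_km and the mean Earth radius of 6371 km are assumptions made for this sketch.

```python
import math


def haversine_km(origin, destination, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lon) pairs.

    radius_km is an assumed mean Earth radius, not a value taken
    from the msticpy source.
    """
    lat1, lon1 = origin
    lat2, lon2 = destination
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    # Haversine term: squared half-chord length between the two points
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.sin(dlon / 2) ** 2
    )
    # Angular distance in radians, scaled by the sphere radius
    return 2 * radius_km * math.atan2(math.sqrt(a), math.sqrt(1 - a))


munich = (48.1372, 11.5756)
berlin = (52.5186, 13.4083)
distance = haversine_km(munich, berlin)
```

With the assumed radius, the Munich to Berlin distance should land close to the value in the Examples entry.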

msticpy.sectools.iocextract module

Module for IoCExtract class.

Uses a set of builtin regular expressions to look for Indicator of Compromise (IoC) patterns. Input can be a single string or a pandas dataframe with one or more columns specified as input.

The following types are built-in:

  • IPv4 and IPv6
  • URL
  • DNS domain
  • Hashes (MD5, SHA1, SHA256)
  • Windows file paths
  • Linux file paths (this is kind of noisy because a legal linux file path can have almost any character)

You can modify or add to the regular expressions used at runtime.
class msticpy.sectools.iocextract.IoCExtract

Bases: object

IoC Extractor - looks for common IoC patterns in input strings.

The extract() method takes either a string or a pandas DataFrame as input. When using the string option as an input extract will return a dictionary of results. When using a DataFrame the results will be returned as a new DataFrame with the following columns:

  • IoCType: the mnemonic used to distinguish different IoC Types
  • Observable: the actual value of the observable
  • SourceIndex: the index of the row in the input DataFrame from which the source for the IoC observable was extracted.

The class has a number of built-in IoC regex definitions. These can be retrieved using the ioc_types attribute.

Additional IoC definitions can be added using the add_ioc_type method.

Note: due to some ambiguity in the regular expression patterns for different types, an observable may be returned assigned to multiple observable types. E.g. 192.168.0.1 is also a legal file name in both Linux and Windows. Linux file names have a particularly large scope in terms of legal characters so it will be quite common to see other IoC observables (or parts of them) returned as a possible linux path.

Initialize new instance of IoCExtract.

DNS_REGEX = '((?=[a-z0-9-]{1,63}\\.)[a-z0-9]+(-[a-z0-9]+)*\\.){1,126}[a-z]{2,63}'
IPV4_REGEX = '(?P<ipaddress>(?:[0-9]{1,3}\\.){3}[0-9]{1,3})'
IPV6_REGEX = '(?<![:.\\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\\w])'
LXPATH_REGEX = '(?P<root>/+||[.]+)\n (?P<folder>/(?:[^\\\\/:*?<>|\\r\\n]+/)*)\n (?P<file>[^/\\0<>|\\r\\n ]+)'
MD5_REGEX = '(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])'
SHA1_REGEX = '(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{40})(?:$|[^A-Fa-f0-9])'
SHA256_REGEX = '(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{64})(?:$|[^A-Fa-f0-9])'
URL_REGEX = '\n (?P<protocol>(https?|ftp|telnet|ldap|file)://)\n (?P<userinfo>([a-z0-9-._~!$&\\\'()*+,;=:]|%[0-9A-F]{2})*@)?\n (?P<host>([a-z0-9-._~!$&\\\'()*+,;=]|%[0-9A-F]{2})*)\n (:(?P<port>\\d*))?\n (/(?P<path>([^?\\#"<>\\s]|%[0-9A-F]{2})*/?))?\n (\\?(?P<query>([a-z0-9-._~!$&\'()*+,;=:/?@]|%[0-9A-F]{2})*))?\n (\\#(?P<fragment>([a-z0-9-._~!$&\'()*+,;=:/?@]|%[0-9A-F]{2})*))?'
WINPATH_REGEX = '\n (?P<root>[a-z]:|\\\\\\\\[a-z0-9_.$-]+||[.]+)\n (?P<folder>\\\\(?:[^\\/:*?"\\\'<>|\\r\\n]+\\\\)*)\n (?P<file>[^\\\\/*?""<>|\\r\\n ]+)'
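As a rough illustration of how these patterns behave, the sketch below applies two of them with the standard re module and collects matches into a dictionary of sets, the shape documented for string input to extract(). The pattern strings are copied from the attributes listed here; the PATTERNS table, the find_observables helper and the sample text are hypothetical and are not part of msticpy.

```python
import re

# Pattern strings copied from the class attributes above (unescaped form).
IPV4_REGEX = r"(?P<ipaddress>(?:[0-9]{1,3}\.){3}[0-9]{1,3})"
MD5_REGEX = r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])"

# Hypothetical lookup table: type name -> (compiled pattern, group to report)
PATTERNS = {
    "ipv4": (re.compile(IPV4_REGEX), "ipaddress"),
    "md5_hash": (re.compile(MD5_REGEX), "hash"),
}


def find_observables(text):
    """Return {type name: set of matched values} for the patterns above."""
    found = {}
    for ioc_type, (pattern, group) in PATTERNS.items():
        # Report only the named group so delimiter characters are dropped
        matches = {m.group(group) for m in pattern.finditer(text)}
        if matches:
            found[ioc_type] = matches
    return found


sample = "GET from 10.0.0.5 payload md5=d41d8cd98f00b204e9800998ecf8427e end"
results = find_observables(sample)
```

Reporting the named group rather than the whole match matters for the hash patterns, which consume one non-hex character on each side of the hash.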
add_ioc_type(ioc_type: str, ioc_regex: str, priority: int = 0, group: str = None)

Add an IoC type and regular expression to use to the built-in set.

Parameters:
  • ioc_type (str) – A unique name for the IoC type
  • ioc_regex (str) – A regular expression used to search for the type
  • priority (int, optional) – Priority of the regex match vs. other ioc_patterns. 0 is the highest priority (the default is 0).
  • group (str, optional) – The regex group to match (the default is None, which will match on the whole expression)

Notes

Pattern priorities.
If two IocType patterns match on the same substring, the matched substring is assigned to the pattern/IocType with the highest priority. E.g. foo.bar.com will match types: dns, windows_path and linux_path but since dns has a higher priority, the expression is assigned to the dns matches.
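The priority rule can be sketched as follows. This is an assumption-laden illustration rather than the library's algorithm: the Pattern tuple, the two simplified regular expressions and the resolve helper are all hypothetical, and only the rule that the lowest priority number wins comes from the description above.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a pattern record; not the msticpy IoCPattern.
Pattern = namedtuple("Pattern", ["ioc_type", "comp_regex", "priority"])

# Deliberately simplified patterns that both match a dotted host name.
PATTERNS = [
    Pattern("filename", re.compile(r"[\w.-]+\.[a-z0-9]+"), 2),
    Pattern("dns", re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,63}"), 1),
]


def resolve(text):
    """Assign each matched substring to the pattern with the lowest priority number."""
    best = {}
    for pat in PATTERNS:
        for match in pat.comp_regex.finditer(text):
            value = match.group(0)
            current = best.get(value)
            # A lower number means a higher priority
            if current is None or pat.priority < current.priority:
                best[value] = pat
    return {value: pat.ioc_type for value, pat in best.items()}


assigned = resolve("beacon to foo.bar.com seen")
```

Here both patterns match foo.bar.com, and the substring is assigned to the dns type because its priority number is lower.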
extract(src: str = None, data: pandas.core.frame.DataFrame = None, columns: List[str] = None, **kwargs) → Union[Dict[str, Set[str]], pandas.core.frame.DataFrame]

Extract IoCs from either a string or pandas DataFrame.

Parameters:
  • src (str, optional) – source string in which to look for IoC patterns (the default is None)
  • data (pd.DataFrame, optional) – input DataFrame from which to read source strings (the default is None)
  • columns (list, optional) – The list of columns to use as source strings, if the data parameter is used. (the default is None)
Other Parameters:
 
  • ioc_types (list, optional) – Restrict matching to just specified types. (default is all types)
  • include_paths (bool, optional) – Whether to include path matches (which can be noisy) (the default is false - excludes ‘windows_path’ and ‘linux_path’). If ioc_types is specified this parameter is ignored.
Returns:

dict of found observables (if input is a string) or DataFrame of observables

Return type:

Any

Notes

Extract takes either a string or a pandas DataFrame as input. When using the string option as an input extract will return a dictionary of results. When using a DataFrame the results will be returned as a new DataFrame with the following columns:

  • IoCType: the mnemonic used to distinguish different IoC Types
  • Observable: the actual value of the observable
  • SourceIndex: the index of the row in the input DataFrame from which the source for the IoC observable was extracted.

IoCType Pattern selection

The default list is: [‘ipv4’, ‘ipv6’, ‘dns’, ‘url’, ‘md5_hash’, ‘sha1_hash’, ‘sha256_hash’] plus any user-defined types. ‘windows_path’ and ‘linux_path’ are excluded unless include_paths is True or they are explicitly included in ioc_types.

extract_df(data: pandas.core.frame.DataFrame, columns: List[str], **kwargs) → pandas.core.frame.DataFrame

Extract IoCs from a pandas DataFrame.

Parameters:
  • data (pd.DataFrame) – input DataFrame from which to read source strings
  • columns (list) – The list of columns to use as source strings
Other Parameters:
 
  • ioc_types (list, optional) – Restrict matching to just specified types. (default is all types)
  • include_paths (bool, optional) – Whether to include path matches (which can be noisy) (the default is false - excludes ‘windows_path’ and ‘linux_path’). If ioc_types is specified this parameter is ignored.
Returns:

DataFrame of observables

Return type:

pd.DataFrame

Notes

Extract takes a pandas DataFrame as input. The results will be returned as a new DataFrame with the following columns:

  • IoCType: the mnemonic used to distinguish different IoC Types
  • Observable: the actual value of the observable
  • SourceIndex: the index of the row in the input DataFrame from which the source for the IoC observable was extracted.

IoCType Pattern selection

The default list is: [‘ipv4’, ‘ipv6’, ‘dns’, ‘url’, ‘md5_hash’, ‘sha1_hash’, ‘sha256_hash’] plus any user-defined types. ‘windows_path’ and ‘linux_path’ are excluded unless include_paths is True or they are explicitly included in ioc_types.
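The row-oriented output can be pictured with a small standard-library sketch. A plain dictionary of records stands in for the input DataFrame and a list of dictionaries stands in for the result. The extract_rows helper and the sample records are hypothetical; only the three column names and the IPv4 pattern come from this page.

```python
import re

# IPv4 pattern as documented for IoCExtract.IPV4_REGEX
IPV4 = re.compile(r"(?P<ipaddress>(?:[0-9]{1,3}\.){3}[0-9]{1,3})")


def extract_rows(records, columns):
    """Scan the chosen columns of each record and emit one row per match.

    records maps a row index to a dict of column values, a simplified
    stand-in for a pandas DataFrame.
    """
    rows = []
    for index, record in records.items():
        for column in columns:
            for match in IPV4.finditer(str(record.get(column, ""))):
                rows.append(
                    {
                        "IoCType": "ipv4",
                        "Observable": match.group("ipaddress"),
                        "SourceIndex": index,  # ties the hit back to its input row
                    }
                )
    return rows


records = {
    0: {"msg": "conn from 10.1.2.3"},
    7: {"msg": "no address", "other": "192.168.0.1"},
}
rows = extract_rows(records, ["msg", "other"])
```

SourceIndex is what lets the results be joined back to the original rows.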

static file_hash_type(file_hash: str) → msticpy.sectools.iocextract.IoCType

Return specific IoCType based on hash length.

Parameters:file_hash (str) – File hash string
Returns:Specific hash type or unknown.
Return type:IoCType
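Classification by hash length can be sketched as below. The lengths 32, 40 and 64 follow from the MD5_REGEX, SHA1_REGEX and SHA256_REGEX patterns listed for the class; the HashType enum and the hash_type_by_length helper are hypothetical stand-ins, not the msticpy implementation.

```python
from enum import Enum


class HashType(Enum):
    """Hypothetical subset that mirrors the IoCType member names."""

    md5_hash = "md5_hash"
    sha1_hash = "sha1_hash"
    sha256_hash = "sha256_hash"
    unknown = "unknown"


# Hex digest lengths taken from the {32}, {40} and {64} quantifiers
_LENGTH_TO_TYPE = {
    32: HashType.md5_hash,
    40: HashType.sha1_hash,
    64: HashType.sha256_hash,
}


def hash_type_by_length(file_hash):
    """Return the hash type implied by the string length, or unknown."""
    return _LENGTH_TO_TYPE.get(len(file_hash.strip()), HashType.unknown)


md5_kind = hash_type_by_length("d41d8cd98f00b204e9800998ecf8427e")
```

This sketch checks length only. It does not verify that the string is valid hexadecimal.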
get_ioc_type(observable: str) → str

Return first matching type.

Parameters:observable (str) – The IoC Observable to check
Returns:The IoC type enumeration (unknown, if no match)
Return type:str
ioc_types

Return the current set of IoC types and regular expressions.

Returns:dict of IoC Type names and regular expressions
Return type:dict
validate(input_str: str, ioc_type: str) → bool

Check that input_str matches the regex for the specified ioc_type.

Parameters:
  • input_str (str) – the string to test
  • ioc_type (str) – the regex pattern to use
Returns:

True if match.

Return type:

bool

class msticpy.sectools.iocextract.IoCExtractAccessor(pandas_obj)

Bases: object

Pandas api extension for IoC Extractor.

Instantiate pandas extension class.

extract(columns, **kwargs)

Extract IoCs from a pandas DataFrame.

Parameters:

columns (list) – The list of columns to use as source strings

Other Parameters:
 
  • ioc_types (list, optional) – Restrict matching to just specified types. (default is all types)
  • include_paths (bool, optional) – Whether to include path matches (which can be noisy) (the default is false - excludes ‘windows_path’ and ‘linux_path’). If ioc_types is specified this parameter is ignored.
Returns:

DataFrame of observables

Return type:

pd.DataFrame

Notes

Extract takes a pandas DataFrame as input. The results will be returned as a new DataFrame with the following columns:

  • IoCType: the mnemonic used to distinguish different IoC Types
  • Observable: the actual value of the observable
  • SourceIndex: the index of the row in the input DataFrame from which the source for the IoC observable was extracted.

IoCType Pattern selection

The default list is: [‘ipv4’, ‘ipv6’, ‘dns’, ‘url’, ‘md5_hash’, ‘sha1_hash’, ‘sha256_hash’] plus any user-defined types. ‘windows_path’ and ‘linux_path’ are excluded unless include_paths is True or they are explicitly included in ioc_types.

class msticpy.sectools.iocextract.IoCPattern(ioc_type, comp_regex, priority, group)

Bases: tuple

Create new instance of IoCPattern(ioc_type, comp_regex, priority, group)

comp_regex

Alias for field number 1

count()

Return number of occurrences of value.

group

Alias for field number 3

index()

Return first index of value.

Raises ValueError if the value is not present.

ioc_type

Alias for field number 0

priority

Alias for field number 2

class msticpy.sectools.iocextract.IoCType

Bases: enum.Enum

Enumeration of IoC Types.

dns = 'dns'
email = 'email'
file_hash = 'file_hash'
hostname = 'hostname'
ipv4 = 'ipv4'
ipv6 = 'ipv6'
linux_path = 'linux_path'
md5_hash = 'md5_hash'
parse = <bound method IoCType.parse of <enum 'IoCType'>>
sha1_hash = 'sha1_hash'
sha256_hash = 'sha256_hash'
unknown = 'unknown'
url = 'url'
windows_path = 'windows_path'

msticpy.sectools.outliers module

Outlier detection class. TODO Preliminary.

Similar to the eventcluster module but a little bit more experimental (read ‘less tested’). It uses SkLearn Isolation Forest to identify outlier events in a single data set or using one data set as training data and another on which to predict outliers.

msticpy.sectools.outliers.identify_outliers(x: numpy.ndarray, x_predict: numpy.ndarray, contamination: float = 0.05) → Tuple[sklearn.ensemble._iforest.IsolationForest, numpy.ndarray, numpy.ndarray]

Identify outlier items using SkLearn IsolationForest.

Parameters:
  • x (np.ndarray) – Input data
  • x_predict (np.ndarray) – Model
  • contamination (float) – Expected proportion of outliers in the data (the default is 0.05)
Returns:

IsolationForest model, X_Outliers, y_pred_outliers

Return type:

Tuple[IsolationForest, np.ndarray, np.ndarray]

msticpy.sectools.outliers.plot_outlier_results(clf: sklearn.ensemble._iforest.IsolationForest, x: numpy.ndarray, x_predict: numpy.ndarray, x_outliers: numpy.ndarray, feature_columns: List[int], plt_title: str)

Plot Isolation Forest results.

Parameters:
  • clf (IsolationForest) – Isolation Forest model
  • x (np.ndarray) – Input data
  • x_predict (np.ndarray) – Prediction
  • x_outliers (np.ndarray) – Set of outliers
  • feature_columns (List[int]) – list of feature columns to display
  • plt_title (str) – Plot title
msticpy.sectools.outliers.remove_common_items(data: pandas.core.frame.DataFrame, columns: List[str]) → pandas.core.frame.DataFrame

Remove rows from input DataFrame.

Parameters:
  • data (pd.DataFrame) – Input dataframe
  • columns (List[str]) – Column list to filter
Returns:

Filtered DataFrame

Return type:

pd.DataFrame

msticpy.sectools.process_tree_utils module

Process Tree Visualization.

class msticpy.sectools.process_tree_utils.ProcSchema(process_name: str, process_id: str, parent_id: str, logon_id: str, cmd_line: str, user_name: str, path_separator: str, time_stamp: str = 'TimeGenerated', parent_name: Optional[str] = None, target_logon_id: Optional[str] = None, user_id: Optional[str] = None)

Bases: object

Property name lookup for Process event schema.

column_map

Return a dictionary that maps fields to schema names.

columns

Return an iterable of target column names.

event_filter

Return the event type/ID to process for the current schema.

Returns:The value of the event ID to process.
Return type:Any
Raises:ProcessTreeSchemaException – If the schema is not known.
event_type_col

Return the column name containing the event identifier.

Returns:The name of the event ID column.
Return type:str
Raises:ProcessTreeSchemaException – If the schema is not known.
exception msticpy.sectools.process_tree_utils.ProcessTreeSchemaException

Bases: msticpy.common.exceptions.MsticpyException

Custom exception for Process Tree schema.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

msticpy.sectools.process_tree_utils.build_process_key(source_proc: pandas.core.series.Series, schema: msticpy.sectools.process_tree_utils.ProcSchema = None) → str

Return a process key from a process event.

Parameters:
  • source_proc (pd.Series, optional) – Source process
  • schema (ProcSchema, optional) – The data schema to use, by default None - if None the schema will be inferred
Returns:

Process key of the process

Return type:

str

msticpy.sectools.process_tree_utils.build_process_tree(procs: pandas.core.frame.DataFrame, schema: msticpy.sectools.process_tree_utils.ProcSchema = None, show_progress: bool = False, debug: bool = False) → pandas.core.frame.DataFrame

Build process trees from the process events.

Parameters:
  • procs (pd.DataFrame) – Process events (Windows 4688 or Linux Auditd)
  • schema (ProcSchema, optional) – The column schema to use, by default None If None, then the schema is inferred
  • show_progress (bool) – Shows the progress of the process (helpful for very large data sets)
  • debug (bool) – If True produces extra debugging output, by default False
Returns:

Process tree dataframe.

Return type:

pd.DataFrame

msticpy.sectools.process_tree_utils.get_ancestors(procs: pandas.core.frame.DataFrame, source, include_source=True) → pandas.core.frame.DataFrame

Return the ancestor processes of the source process.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
  • include_source (bool, optional) – Include the source process in the results, by default True
Returns:

Ancestor processes

Return type:

pd.DataFrame
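Conceptually, finding ancestors means following parent links until a root is reached. The sketch below shows that walk on a plain dictionary. The PARENT_OF map, the process names and the ancestors helper are hypothetical; the real function works on a DataFrame carrying process tree metadata.

```python
# Hypothetical parent map: process key -> parent key (None marks a root)
PARENT_OF = {"init": None, "sshd": "init", "bash": "sshd", "curl": "bash"}


def ancestors(parent_of, source, include_source=True):
    """Return the chain from source up to the root, nearest ancestor first."""
    chain = [source] if include_source else []
    seen = {source}  # guards against accidental cycles in malformed data
    parent = parent_of.get(source)
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = parent_of.get(parent)
    return chain


lineage = ancestors(PARENT_OF, "curl")
```

The include_source flag in this sketch mirrors the documented parameter of the same name.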

msticpy.sectools.process_tree_utils.get_children(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series], include_source: bool = True) → pandas.core.frame.DataFrame

Return the child processes for the source process.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
  • include_source (bool, optional) – If True include the source process in the results, by default True
Returns:

Child processes

Return type:

pd.DataFrame

msticpy.sectools.process_tree_utils.get_descendents(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series], include_source: bool = True, max_levels: int = -1) → pandas.core.frame.DataFrame

Return the descendents of the source process.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
  • include_source (bool, optional) – Include the source process in the results, by default True
  • max_levels (int, optional) – Maximum number of levels to descend, by default -1 (all levels)
Returns:

Descendent processes

Return type:

pd.DataFrame
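The effect of max_levels can be illustrated with a level-by-level walk over a plain parent map. This is only a sketch under stated assumptions: the PARENT_OF data and the descendants helper are hypothetical, and the DataFrame-based implementation may traverse differently. The convention that -1 means all levels comes from the parameter description above.

```python
from collections import defaultdict

# Hypothetical parent map: process key -> parent key (None marks a root)
PARENT_OF = {
    "init": None,
    "sshd": "init",
    "cron": "init",
    "bash": "sshd",
    "curl": "bash",
}


def descendants(parent_of, source, include_source=True, max_levels=-1):
    """Collect descendants breadth-first, stopping after max_levels levels."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    result = [source] if include_source else []
    frontier, level = [source], 0
    # A negative max_levels means no depth limit
    while frontier and (max_levels < 0 or level < max_levels):
        frontier = [child for proc in frontier for child in children[proc]]
        result.extend(frontier)
        level += 1
    return result


all_below_init = descendants(PARENT_OF, "init")
```

Setting max_levels=1 reduces the result to the direct children, which is the set get_children is documented to return.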

msticpy.sectools.process_tree_utils.get_parent(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series]) → Optional[pandas.core.series.Series]

Return the parent of the source process.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
Returns:

Parent Process row or None if no parent was found.

Return type:

Optional[pd.Series]

msticpy.sectools.process_tree_utils.get_process(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series]) → pandas.core.series.Series

Return the process event as a Series.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
Returns:

Process row

Return type:

pd.Series

Raises:

ValueError – If unknown type is supplied as source

msticpy.sectools.process_tree_utils.get_process_key(procs: pandas.core.frame.DataFrame, source_index: int) → str

Return the process key of the process given its source_index.

Parameters:
  • procs (pd.DataFrame) – Process events
  • source_index (int, optional) – source_index of the process record
Returns:

The process key of the process.

Return type:

str

msticpy.sectools.process_tree_utils.get_root(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series]) → pandas.core.series.Series

Return the root process for the source process.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
Returns:

Root process

Return type:

pd.Series

msticpy.sectools.process_tree_utils.get_root_tree(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series]) → pandas.core.frame.DataFrame

Return the process tree to which the source process belongs.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
Returns:

Process Tree

Return type:

pd.DataFrame

msticpy.sectools.process_tree_utils.get_roots(procs: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Return the process tree roots for the current data set.

Parameters:procs (pd.DataFrame) – Process events (with process tree metadata)
Returns:Process Tree root processes
Return type:pd.DataFrame
msticpy.sectools.process_tree_utils.get_siblings(procs: pandas.core.frame.DataFrame, source: Union[str, pandas.core.series.Series], include_source: bool = True) → pandas.core.frame.DataFrame

Return the processes that share the parent of the source process.

Parameters:
  • procs (pd.DataFrame) – Process events (with process tree metadata)
  • source (Union[str, pd.Series]) – source_index of process or the process row
  • include_source (bool, optional) – Include the source process in the results, by default True
Returns:

Sibling processes.

Return type:

pd.DataFrame

msticpy.sectools.process_tree_utils.get_summary_info(procs: pandas.core.frame.DataFrame) → Dict[str, int]

Return summary information about the process trees.

Parameters:procs (pd.DataFrame) – Process events (with process tree metadata)
Returns:Summary statistic about the process tree
Return type:Dict[str, int]
msticpy.sectools.process_tree_utils.get_tree_depth(procs: pandas.core.frame.DataFrame) → int

Return the depth of the process tree.

Parameters:procs (pd.DataFrame) – Process events (with process tree metadata)
Returns:Tree depth
Return type:int
msticpy.sectools.process_tree_utils.infer_schema(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series]) → msticpy.sectools.process_tree_utils.ProcSchema

Infer the correct schema to use for this data set.

Parameters:data (Union[pd.DataFrame, pd.Series]) – Data set to test
Returns:The schema most closely matching the data set.
Return type:ProcSchema

msticpy.sectools.syslog_utils module

syslog_utils - Syslog parsing and utility module.

Functions required to correctly collect, parse and visualize syslog data.

Designed to support standard linux syslog for investigations where auditd is not available.

msticpy.sectools.syslog_utils.cluster_syslog_logons_df(logon_events: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Cluster logon sessions in syslog by start/end time based on PAM events.

Parameters:logon_events (pd.DataFrame) – A DataFrame of all syslog logon events (can be generated with LinuxSyslog.user_logon query)
Returns:logon_sessions – A dictionary of logon sessions including start and end times and logged on user
Return type:pd.DataFrame
Raises:MsticpyException – There are no logon sessions in the supplied data set
msticpy.sectools.syslog_utils.create_host_record(syslog_df: pandas.core.frame.DataFrame, heartbeat_df: pandas.core.frame.DataFrame, az_net_df: pandas.core.frame.DataFrame = None) → msticpy.nbtools.entityschema.Host

Generate host_entity record for selected computer.

Parameters:
  • syslog_df (pd.DataFrame) – A dataframe of all syslog events for the host in the time window required
  • heartbeat_df (pd.DataFrame) – A dataframe of heartbeat data for the host
  • az_net_df (pd.DataFrame) – Optional dataframe of Azure network data for the host
Returns:

Details of the host data collected

Return type:

Host

msticpy.sectools.syslog_utils.risky_sudo_sessions(sudo_sessions: pandas.core.frame.DataFrame, risky_actions: dict = None, suspicious_actions: list = None) → dict

Detect if a sudo session occurs at the point of a suspicious event.

Parameters:
  • sudo_sessions (dict) – Dictionary of sudo sessions (as generated by cluster_syslog_logons)
  • risky_actions (dict (Optional)) – Dictionary of risky sudo commands (as generated by cmd_line.risky_cmd_line)
  • suspicious_actions (list (Optional)) – List of risky sudo commands (as generated by cmd_line.cmd_speed)
Returns:

risky_sessions – A dictionary of sudo sessions with flags denoting risk

Return type:

dict

msticpy.sectools.tilookup module

Module for TILookup classes.

Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing may require an API key and processing performance may be limited to a specific number of requests per minute for the account type that you have.

class msticpy.sectools.tilookup.TILookup(primary_providers: Optional[List[msticpy.sectools.tiproviders.ti_provider_base.TIProvider]] = None, secondary_providers: Optional[List[msticpy.sectools.tiproviders.ti_provider_base.TIProvider]] = None, providers: Optional[List[str]] = None)

Bases: object

Threat Intel observable lookup from providers.

Initialize TILookup instance.

Parameters:
  • primary_providers (Optional[List[TIProvider]], optional) – Primary TI Providers, by default None
  • secondary_providers (Optional[List[TIProvider]], optional) – Secondary TI Providers, by default None
  • providers (Optional[List[str]], optional) – List of provider names to load, by default all available providers are loaded. To see the list of available providers call TILookup.list_available_providers(). Note: if primary_providers or secondary_providers is specified, this will override the providers list.
add_provider(provider: msticpy.sectools.tiproviders.ti_provider_base.TIProvider, name: str = None, primary: bool = True)

Add a TI provider to the current collection.

Parameters:
  • provider (TIProvider) – Provider instance
  • name (str, optional) – The name to use for the provider (overrides the class name of provider)
  • primary (bool, optional) – If True the provider is added as “primary”, if False as “secondary”, by default True
available_providers

Return a list of builtin providers.

Returns:List of TI Provider classes.
Return type:List[str]
configured_providers

Return a list of available providers that have configuration details present.

Returns:List of TI Provider classes.
Return type:List[str]
classmethod list_available_providers(show_query_types=False, as_list: bool = False) → Optional[List[str]]

Print a list of builtin providers with optional usage.

Parameters:
  • show_query_types (bool, optional) – Show query types supported by providers, by default False
  • as_list (bool, optional) – Return list of providers instead of printing to stdout. Note: if you specify show_query_types this will be printed irrespective of this parameter setting.
Returns:

A list of provider names (if as_list=True)

Return type:

Optional[List[str]]

loaded_providers

Return dictionary of loaded providers.

Returns:Dictionary of provider names and provider instances.
Return type:Dict[str, TIProvider]
lookup_ioc(observable: str = None, ioc_type: str = None, ioc_query_type: str = None, providers: List[str] = None, prov_scope: str = 'primary', **kwargs) → Tuple[bool, List[Tuple[str, msticpy.sectools.tiproviders.ti_provider_base.LookupResult]]]

Lookup single IoC in active providers.

Parameters:
  • observable (str) – IoC observable (ioc is also an alias for observable)
  • ioc_type (str, optional) – One of IoCExtract.IoCType, by default None If none, the IoC type will be inferred
  • ioc_query_type (str, optional) – The ioc query type (e.g. rep, info, malware)
  • providers (List[str]) – Explicit list of providers to use
  • prov_scope (str, optional) – Use “primary”, “secondary” or “all” providers, by default “primary”
  • kwargs – Additional arguments passed to the underlying provider(s)
Returns:

The result returned as a tuple(bool, list): the bool indicates whether a TI record was found in any provider; the list has an entry for each provider result.

Return type:

Tuple[bool, List[Tuple[str, LookupResult]]]
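The nested return type is easier to read with a small example of unpacking it. The sketch below uses a hypothetical FakeLookupResult dataclass carrying a few of the fields documented for LookupResult. The provider names, the sample values and the providers_with_hits helper are invented for illustration, and no lookup is performed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeLookupResult:
    """Hypothetical stand-in with a subset of the LookupResult fields."""

    ioc: str
    ioc_type: str
    provider: str
    result: bool = False
    severity: int = 0


LookupTuple = Tuple[bool, List[Tuple[str, FakeLookupResult]]]


def providers_with_hits(lookup: LookupTuple) -> List[str]:
    """Return the names of providers whose result flag is set."""
    found, per_provider = lookup
    if not found:
        return []
    return [name for name, res in per_provider if res.result]


# Hand-built sample in the documented (bool, list of (name, result)) shape
sample_lookup: LookupTuple = (
    True,
    [
        ("ProviderA", FakeLookupResult("10.0.0.5", "ipv4", "ProviderA", True, 2)),
        ("ProviderB", FakeLookupResult("10.0.0.5", "ipv4", "ProviderB", False, 0)),
    ],
)
hits = providers_with_hits(sample_lookup)
```

The leading bool allows an early exit before inspecting the per-provider entries.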

lookup_iocs(data: Union[pandas.core.frame.DataFrame, Mapping[str, str], Iterable[str]], obs_col: str = None, ioc_type_col: str = None, ioc_query_type: str = None, providers: List[str] = None, prov_scope: str = 'primary', **kwargs) → pandas.core.frame.DataFrame

Lookup a collection of IoCs.

Parameters:
  • data (Union[pd.DataFrame, Mapping[str, str], Iterable[str]]) – Data input in one of three formats: 1. Pandas dataframe (you must supply the column name in obs_col parameter) 2. Mapping (e.g. a dict) of [observable, IoCType] 3. Iterable of observables - IoCTypes will be inferred
  • obs_col (str, optional) – DataFrame column to use for observables, by default None
  • ioc_type_col (str, optional) – DataFrame column to use for IoCTypes, by default None
  • ioc_query_type (str, optional) – The ioc query type (e.g. rep, info, malware)
  • providers (List[str]) – Explicit list of providers to use
  • prov_scope (str, optional) – Use “primary”, “secondary” or “all” providers, by default “primary”
  • kwargs – Additional arguments passed to the underlying provider(s)
Returns:

DataFrame of results

Return type:

pd.DataFrame
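The mapping and iterable input formats can be normalized into (observable, type) pairs along these lines. The iter_pairs helper is hypothetical and leaves out the DataFrame case to stay free of dependencies. Using None to mean that the type should be inferred later is an assumption of this sketch.

```python
from collections.abc import Mapping


def iter_pairs(data):
    """Yield (observable, ioc_type) pairs from a mapping or a plain iterable.

    ioc_type is None when only observables were supplied, signalling
    that the type would need to be inferred.
    """
    if isinstance(data, Mapping):
        yield from data.items()
    else:
        for observable in data:
            yield observable, None


from_mapping = list(iter_pairs({"10.0.0.5": "ipv4", "example.com": "dns"}))
from_list = list(iter_pairs(["10.0.0.5", "example.com"]))
```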

provider_status

Return loaded provider status.

Returns:List of providers and descriptions.
Return type:Iterable[str]
provider_usage()

Print usage of loaded providers.

classmethod reload_provider_settings()

Reload provider settings from config.

reload_providers()

Reload providers based on current settings in config.

Parameters:clear_keyring (bool, optional) – Clears any secrets cached in keyring, by default False
static result_to_df(ioc_lookup: Tuple[bool, List[Tuple[str, msticpy.sectools.tiproviders.ti_provider_base.LookupResult]]]) → pandas.core.frame.DataFrame

Return DataFrame representation of IoC Lookup response.

Parameters:ioc_lookup (Tuple[bool, List[Tuple[str, LookupResult]]]) – Output from lookup_ioc
Returns:The response as a DataFrame with a row for each provider response.
Return type:pd.DataFrame

msticpy.sectools.tiproviders.ti_provider_base module

Module for TILookup classes.

Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing may require an API key and processing performance may be limited to a specific number of requests per minute for the account type that you have.

class msticpy.sectools.tiproviders.ti_provider_base.LookupResult(ioc: str, ioc_type: str, safe_ioc: str = '', query_subtype: Optional[str] = None, provider: Optional[str] = None, result: bool = False, severity: int = 0, details: Any = None, raw_result: Union[str, dict, None] = None, reference: Optional[str] = None, status: int = 0)

Bases: object

Lookup result for IoCs.

classmethod column_map()

Return a dictionary that maps fields to DF Names.

raw_result_fmtd

Print raw results of the Lookup Result.

set_severity(value: Any)

Set the severity from enum, int or string.

Parameters:value (Any) – The severity value to set
severity_name

Return text description of severity score.

Returns:Severity description.
Return type:str
summary

Print a summary of the Lookup Result.

class msticpy.sectools.tiproviders.ti_provider_base.SanitizedObservable(observable, status)

Bases: tuple

Create new instance of SanitizedObservable(observable, status)

count()

Return number of occurrences of value.

index()

Return first index of value.

Raises ValueError if the value is not present.

observable

Alias for field number 0

status

Alias for field number 1

class msticpy.sectools.tiproviders.ti_provider_base.TILookupStatus

Bases: enum.Enum

Threat intelligence lookup status.

bad_format = 2
not_supported = 1
ok = 0
other = 10
query_failed = 3
class msticpy.sectools.tiproviders.ti_provider_base.TIProvider(**kwargs)

Bases: abc.ABC

Abstract base class for Threat Intel providers.

Initialize the provider.

classmethod is_known_type(ioc_type: str) → bool

Return True if this is a known IoC Type.

Parameters:ioc_type (str) – IoCType string to test
Returns:True if known type.
Return type:bool
is_supported_type(ioc_type: Union[str, msticpy.sectools.iocextract.IoCType]) → bool

Return True if the passed type is supported.

Parameters:ioc_type (Union[str, IoCType]) – IoC type name or instance
Returns:True if supported.
Return type:bool
lookup_ioc(ioc: str, ioc_type: str = None, query_type: str = None, **kwargs) → msticpy.sectools.tiproviders.ti_provider_base.LookupResult

Lookup a single IoC observable.

Parameters:
  • ioc (str) – IoC Observable value
  • ioc_type (str, optional) – IoC Type, by default None (type will be inferred)
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

The returned results.

Return type:

LookupResult

lookup_iocs(data: Union[pandas.core.frame.DataFrame, Dict[str, str], Iterable[str]], obs_col: str = None, ioc_type_col: str = None, query_type: str = None, **kwargs) → pandas.core.frame.DataFrame

Lookup collection of IoC observables.

Parameters:
  • data (Union[pd.DataFrame, Dict[str, str], Iterable[str]]) – Data input in one of three formats: 1. Pandas dataframe (you must supply the column name in obs_col parameter) 2. Dict of observable, IoCType 3. Iterable of observables - IoCTypes will be inferred
  • obs_col (str, optional) – DataFrame column to use for observables, by default None
  • ioc_type_col (str, optional) – DataFrame column to use for IoCTypes, by default None
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

DataFrame of results.

Return type:

pd.DataFrame

parse_results(response: msticpy.sectools.tiproviders.ti_provider_base.LookupResult) → Tuple[bool, msticpy.sectools.tiproviders.ti_provider_base.TISeverity, Any]

Return the details of the response.

Parameters:response (LookupResult) – The returned data response
Returns:A tuple of three items: bool (positive or negative hit), TISeverity (enumeration of severity) and an object with match details
Return type:Tuple[bool, TISeverity, Any]
resolve_ioc_type

Return IoCType determined by IoCExtract.

Parameters:observable (str) – IoC observable string
Returns:IoC Type (or unknown if type could not be determined)
Return type:str
supported_types

Return list of supported IoC types for this provider.

Returns:List of supported type names
Return type:List[str]
classmethod usage()

Print usage of provider.

class msticpy.sectools.tiproviders.ti_provider_base.TISeverity

Bases: enum.Enum

Threat intelligence report severity.

high = 2
information = 0
parse = <bound method TISeverity.parse of <enum 'TISeverity'>>
unknown = -1
warning = 1
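
The listing shows that `TISeverity` has a `parse` method but not what it accepts. The sketch below mirrors the documented values and assumes, purely for illustration, that `parse` takes a member, a member name or an integer and falls back to `unknown`. That behaviour is an assumption, not something this page confirms.

```python
from enum import Enum
from typing import Any


class TISeverity(Enum):
    """Local mirror of the documented severity values."""

    unknown = -1
    information = 0
    warning = 1
    high = 2

    @classmethod
    def parse(cls, value: Any) -> "TISeverity":
        # Assumed behaviour: accept a member, a member name or an integer
        # value, and fall back to `unknown` for anything unrecognized.
        if isinstance(value, cls):
            return value
        if isinstance(value, str) and value.lower() in cls.__members__:
            return cls[value.lower()]
        if isinstance(value, int) and not isinstance(value, bool):
            try:
                return cls(value)
            except ValueError:
                pass
        return cls.unknown


print(TISeverity.parse("High"), TISeverity.parse(1), TISeverity.parse("n/a"))
```

Because the member values are ordered integers, results can be ranked with `key=lambda s: s.value`.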
msticpy.sectools.tiproviders.ti_provider_base.entropy(input_str: str) → float

Compute entropy of input string.
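
The page does not say which entropy measure is used. A common choice for strings is Shannon entropy over the character distribution, sketched below under that assumption. The function name `shannon_entropy` and the bits-per-character unit are illustrative choices.

```python
import math
from collections import Counter


def shannon_entropy(input_str: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not input_str:
        # Treat the empty string as zero entropy to avoid dividing by zero.
        return 0.0
    length = len(input_str)
    counts = Counter(input_str)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


print(shannon_entropy("aabb"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

A string of one repeated character scores zero, and the score rises as the characters become more evenly spread.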

msticpy.sectools.tiproviders.ti_provider_base.generate_items(data: Any, obs_col: Optional[str] = None, ioc_type_col: Optional[str] = None) → Iterable[Tuple[Optional[str], Optional[str]]]

Generate item pairs from different input types.

Parameters:
  • data (Any) – DataFrame, dictionary or iterable
  • obs_col (Optional[str]) – If data is a DataFrame, the column containing the observable value.
  • ioc_type_col (Optional[str]) – If data is a DataFrame, the column containing the observable type.
Returns:

Tuples of observable and IoC type.

Return type:

Iterable[Tuple[Optional[str], Optional[str]]]
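
The normalization described here can be sketched without pandas for the dictionary and plain-iterable cases. This illustrates the idea only and is not the library's code. The DataFrame branch is omitted, and treating a bare string as a single observable is an assumption of this sketch.

```python
from collections.abc import Iterable
from typing import Any, Iterator, Optional, Tuple


def generate_items_sketch(data: Any) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (observable, ioc_type) pairs; ioc_type is None when it must be inferred."""
    if isinstance(data, dict):
        # Dict input: keys are observables, values are IoC types.
        for observable, ioc_type in data.items():
            yield observable, ioc_type
    elif isinstance(data, str):
        # Assumption: a bare string is one observable, not a sequence of characters.
        yield data, None
    elif isinstance(data, Iterable):
        # Plain iterable: no type information is available.
        for observable in data:
            yield observable, None
    else:
        # Unsupported input yields a single empty pair.
        yield None, None


print(list(generate_items_sketch({"8.8.8.8": "ipv4"})))
print(list(generate_items_sketch(["a.com", "b.com"])))
```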

msticpy.sectools.tiproviders.ti_provider_base.get_schema_and_host(url: str, require_url_encoding: bool = False) → Tuple[Optional[str], Optional[str], Optional[str]]

Return the cleaned URL, scheme and host.

Parameters:
  • url (str) – Input URL
  • require_url_encoding (bool) – Set to True if url needs encoding. Default is False.
Returns:

Tuple of URL, scheme, host

Return type:

Tuple[Optional[str], Optional[str], Optional[str]]
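
A rough sketch of this kind of URL splitting, using only `urllib.parse` from the standard library. The function name, the set of characters left unencoded and the all-`None` result for unparseable input are assumptions of the sketch, not documented behaviour.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urlparse


def split_url_sketch(
    url: str, require_url_encoding: bool = False
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (clean_url, scheme, host), or three Nones if the URL cannot be split."""
    # Optionally percent-encode unsafe characters while leaving URL delimiters alone.
    clean_url = quote(url, safe="/:?&=#%") if require_url_encoding else url
    try:
        parsed = urlparse(clean_url)
    except ValueError:
        return None, None, None
    if not parsed.scheme or not parsed.hostname:
        return None, None, None
    return clean_url, parsed.scheme, parsed.hostname


print(split_url_sketch("https://example.com/path?q=1"))
print(split_url_sketch("http://example.com/a b", require_url_encoding=True))
```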

msticpy.sectools.tiproviders.ti_provider_base.preprocess_observable(observable, ioc_type, require_url_encoding: bool = False) → msticpy.sectools.tiproviders.ti_provider_base.SanitizedObservable

Preprocesses and checks validity of observable against declared IoC type.

Parameters:
  • observable – the value of the IoC
  • ioc_type – the IoC type
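
To illustrate the kind of check involved, the sketch below validates a single IoC type (IPv4) with the standard `ipaddress` module. It assumes `SanitizedObservable` is a two-field `(observable, status)` tuple. The status strings and the function name are hypothetical.

```python
import ipaddress
from collections import namedtuple

# Assumed shape of the result: the cleaned value plus a status message.
SanitizedObservable = namedtuple("SanitizedObservable", ["observable", "status"])


def preprocess_ipv4_sketch(observable: str) -> SanitizedObservable:
    """Check that an observable declared as IPv4 really is a dotted IPv4 address."""
    candidate = observable.strip()
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        # Assumed convention: no observable is returned and status explains why.
        return SanitizedObservable(None, "IP address is invalid format")
    return SanitizedObservable(candidate, "ok")


print(preprocess_ipv4_sketch(" 10.0.0.1 "))
print(preprocess_ipv4_sketch("999.1.1.1"))
```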

msticpy.sectools.tiproviders.http_base module

HTTP TI Provider base.

Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing may require an API key, and performance may be limited to a specific number of requests per minute for your account type.

class msticpy.sectools.tiproviders.http_base.HttpProvider(**kwargs)

Bases: msticpy.sectools.tiproviders.ti_provider_base.TIProvider

HTTP TI provider base class.

Initialize a new instance of the class.

classmethod is_known_type(ioc_type: str) → bool

Return True if this is a known IoC Type.

Parameters:ioc_type (str) – IoCType string to test
Returns:True if known type.
Return type:bool
is_supported_type(ioc_type: Union[str, msticpy.sectools.iocextract.IoCType]) → bool

Return True if the passed type is supported.

Parameters:ioc_type (Union[str, IoCType]) – IoC type name or instance
Returns:True if supported.
Return type:bool
lookup_ioc

Lookup a single IoC observable.

Parameters:
  • ioc (str) – IoC observable
  • ioc_type (str, optional) – IocType, by default None (type will be inferred)
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

The lookup result:
  • result – Positive/Negative
  • details – Lookup details (or status if failure)
  • raw_result – Raw response
  • reference – URL of IoC

Return type:

LookupResult

Raises:

NotImplementedError – If attempting to use an HTTP method or authentication protocol that is not supported.

Notes

This method uses memoization (lru_cache) to cache results for a particular observable, to try to avoid repeated network calls for the same item.
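
The caching behaviour can be seen in isolation with `functools.lru_cache`. This is a generic sketch: `cached_lookup`, the counter and the cache size are illustrative stand-ins, not the provider's code.

```python
from functools import lru_cache

network_calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=256)
def cached_lookup(ioc: str, ioc_type: str = None) -> str:
    """Stand-in for a network lookup."""
    global network_calls
    network_calls += 1
    return f"result for {ioc}"


cached_lookup("example.com", "dns")
cached_lookup("example.com", "dns")  # identical arguments: served from the cache
print(network_calls, cached_lookup.cache_info().hits)
```

Cached entries live for the life of the process, so an updated upstream verdict is not seen until the cache is cleared with `cached_lookup.cache_clear()`. Arguments must be hashable, and a call that passes the same values by keyword rather than by position may be cached as a separate entry.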

lookup_iocs(data: Union[pandas.core.frame.DataFrame, Dict[str, str], Iterable[str]], obs_col: str = None, ioc_type_col: str = None, query_type: str = None, **kwargs) → pandas.core.frame.DataFrame

Lookup collection of IoC observables.

Parameters:
  • data (Union[pd.DataFrame, Dict[str, str], Iterable[str]]) – Data input in one of three formats:
    1. Pandas DataFrame (you must supply the column name in the obs_col parameter)
    2. Dict of observable, IoCType
    3. Iterable of observables (IoCTypes will be inferred)
  • obs_col (str, optional) – DataFrame column to use for observables, by default None
  • ioc_type_col (str, optional) – DataFrame column to use for IoCTypes, by default None
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

DataFrame of results.

Return type:

pd.DataFrame

parse_results(response: msticpy.sectools.tiproviders.ti_provider_base.LookupResult) → Tuple[bool, msticpy.sectools.tiproviders.ti_provider_base.TISeverity, Any]

Return the details of the response.

Parameters:response (LookupResult) – The returned data response
Returns:A tuple of three items: bool (positive or negative hit), TISeverity (enumeration of severity) and an object with match details
Return type:Tuple[bool, TISeverity, Any]
resolve_ioc_type

Return IoCType determined by IoCExtract.

Parameters:observable (str) – IoC observable string
Returns:IoC Type (or unknown if type could not be determined)
Return type:str
supported_types

Return list of supported IoC types for this provider.

Returns:List of supported type names
Return type:List[str]
classmethod usage()

Print usage of provider.

class msticpy.sectools.tiproviders.http_base.IoCLookupParams(path: str = '', verb: str = 'GET', full_url: bool = False, headers: Dict[str, str] = NOTHING, params: Dict[str, str] = NOTHING, data: Dict[str, str] = NOTHING, auth_type: str = '', auth_str: List[str] = NOTHING, sub_type: str = '')

Bases: object

IoC HTTP Lookup Params definition.
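
The `NOTHING` defaults in the signature above appear to be how attrs-style factory defaults render in generated documentation: each instance receives its own fresh container. The standard-library equivalent is sketched below with `dataclasses`. The class name `LookupParamsSketch` and the example path and parameter values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LookupParamsSketch:
    """Dataclass mirror of the documented fields and defaults."""

    path: str = ""
    verb: str = "GET"
    full_url: bool = False
    headers: Dict[str, str] = field(default_factory=dict)
    params: Dict[str, str] = field(default_factory=dict)
    data: Dict[str, str] = field(default_factory=dict)
    auth_type: str = ""
    auth_str: List[str] = field(default_factory=list)
    sub_type: str = ""


# Hypothetical endpoint definition; the path and parameter are illustrative only.
ip_query = LookupParamsSketch(path="/api/ip/{observable}", params={"format": "json"})
other = LookupParamsSketch()

# Factory defaults are per instance, so mutating one does not leak into another.
ip_query.headers["Accept"] = "application/json"
print(other.headers)
```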

msticpy.sectools.tiproviders.alienvault_otx module

AlienVault OTX Provider.

Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing may require an API key, and performance may be limited to a specific number of requests per minute for your account type.

class msticpy.sectools.tiproviders.alienvault_otx.OTX(**kwargs)

Bases: msticpy.sectools.tiproviders.http_base.HttpProvider

AlienVault OTX Lookup.

Set OTX specific settings.

classmethod is_known_type(ioc_type: str) → bool

Return True if this is a known IoC Type.

Parameters:ioc_type (str) – IoCType string to test
Returns:True if known type.
Return type:bool
is_supported_type(ioc_type: Union[str, msticpy.sectools.iocextract.IoCType]) → bool

Return True if the passed type is supported.

Parameters:ioc_type (Union[str, IoCType]) – IoC type name or instance
Returns:True if supported.
Return type:bool
lookup_ioc

Lookup a single IoC observable.

Parameters:
  • ioc (str) – IoC observable
  • ioc_type (str, optional) – IocType, by default None (type will be inferred)
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

The lookup result:
  • result – Positive/Negative
  • details – Lookup details (or status if failure)
  • raw_result – Raw response
  • reference – URL of IoC

Return type:

LookupResult

Raises:

NotImplementedError – If attempting to use an HTTP method or authentication protocol that is not supported.

Notes

This method uses memoization (lru_cache) to cache results for a particular observable, to try to avoid repeated network calls for the same item.

lookup_iocs(data: Union[pandas.core.frame.DataFrame, Dict[str, str], Iterable[str]], obs_col: str = None, ioc_type_col: str = None, query_type: str = None, **kwargs) → pandas.core.frame.DataFrame

Lookup collection of IoC observables.

Parameters:
  • data (Union[pd.DataFrame, Dict[str, str], Iterable[str]]) – Data input in one of three formats:
    1. Pandas DataFrame (you must supply the column name in the obs_col parameter)
    2. Dict of observable, IoCType
    3. Iterable of observables (IoCTypes will be inferred)
  • obs_col (str, optional) – DataFrame column to use for observables, by default None
  • ioc_type_col (str, optional) – DataFrame column to use for IoCTypes, by default None
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

DataFrame of results.

Return type:

pd.DataFrame

parse_results(response: msticpy.sectools.tiproviders.ti_provider_base.LookupResult) → Tuple[bool, msticpy.sectools.tiproviders.ti_provider_base.TISeverity, Any]

Return the details of the response.

Parameters:response (LookupResult) – The returned data response
Returns:A tuple of three items: bool (positive or negative hit), TISeverity (enumeration of severity) and an object with match details
Return type:Tuple[bool, TISeverity, Any]
resolve_ioc_type

Return IoCType determined by IoCExtract.

Parameters:observable (str) – IoC observable string
Returns:IoC Type (or unknown if type could not be determined)
Return type:str
supported_types

Return list of supported IoC types for this provider.

Returns:List of supported type names
Return type:List[str]
classmethod usage()

Print usage of provider.

msticpy.sectools.tiproviders.ibm_xforce module

IBM XForce Provider.

Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing may require an API key, and performance may be limited to a specific number of requests per minute for your account type.

class msticpy.sectools.tiproviders.ibm_xforce.XForce(**kwargs)

Bases: msticpy.sectools.tiproviders.http_base.HttpProvider

IBM XForce Lookup.

Initialize a new instance of the class.

classmethod is_known_type(ioc_type: str) → bool

Return True if this is a known IoC Type.

Parameters:ioc_type (str) – IoCType string to test
Returns:True if known type.
Return type:bool
is_supported_type(ioc_type: Union[str, msticpy.sectools.iocextract.IoCType]) → bool

Return True if the passed type is supported.

Parameters:ioc_type (Union[str, IoCType]) – IoC type name or instance
Returns:True if supported.
Return type:bool
lookup_ioc

Lookup a single IoC observable.

Parameters:
  • ioc (str) – IoC observable
  • ioc_type (str, optional) – IocType, by default None (type will be inferred)
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

The lookup result:
  • result – Positive/Negative
  • details – Lookup details (or status if failure)
  • raw_result – Raw response
  • reference – URL of IoC

Return type:

LookupResult

Raises:

NotImplementedError – If attempting to use an HTTP method or authentication protocol that is not supported.

Notes

This method uses memoization (lru_cache) to cache results for a particular observable, to try to avoid repeated network calls for the same item.

lookup_iocs(data: Union[pandas.core.frame.DataFrame, Dict[str, str], Iterable[str]], obs_col: str = None, ioc_type_col: str = None, query_type: str = None, **kwargs) → pandas.core.frame.DataFrame

Lookup collection of IoC observables.

Parameters:
  • data (Union[pd.DataFrame, Dict[str, str], Iterable[str]]) – Data input in one of three formats:
    1. Pandas DataFrame (you must supply the column name in the obs_col parameter)
    2. Dict of observable, IoCType
    3. Iterable of observables (IoCTypes will be inferred)
  • obs_col (str, optional) – DataFrame column to use for observables, by default None
  • ioc_type_col (str, optional) – DataFrame column to use for IoCTypes, by default None
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

DataFrame of results.

Return type:

pd.DataFrame

parse_results(response: msticpy.sectools.tiproviders.ti_provider_base.LookupResult) → Tuple[bool, msticpy.sectools.tiproviders.ti_provider_base.TISeverity, Any]

Return the details of the response.

Parameters:response (LookupResult) – The returned data response
Returns:A tuple of three items: bool (positive or negative hit), TISeverity (enumeration of severity) and an object with match details
Return type:Tuple[bool, TISeverity, Any]
resolve_ioc_type

Return IoCType determined by IoCExtract.

Parameters:observable (str) – IoC observable string
Returns:IoC Type (or unknown if type could not be determined)
Return type:str
supported_types

Return list of supported IoC types for this provider.

Returns:List of supported type names
Return type:List[str]
classmethod usage()

Print usage of provider.

msticpy.sectools.tiproviders.virustotal module

VirusTotal Provider.

Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing may require an API key, and performance may be limited to a specific number of requests per minute for your account type.

class msticpy.sectools.tiproviders.virustotal.VirusTotal(**kwargs)

Bases: msticpy.sectools.tiproviders.http_base.HttpProvider

VirusTotal Lookup.

Initialize a new instance of the class.

classmethod is_known_type(ioc_type: str) → bool

Return True if this is a known IoC Type.

Parameters:ioc_type (str) – IoCType string to test
Returns:True if known type.
Return type:bool
is_supported_type(ioc_type: Union[str, msticpy.sectools.iocextract.IoCType]) → bool

Return True if the passed type is supported.

Parameters:ioc_type (Union[str, IoCType]) – IoC type name or instance
Returns:True if supported.
Return type:bool
lookup_ioc

Lookup a single IoC observable.

Parameters:
  • ioc (str) – IoC observable
  • ioc_type (str, optional) – IocType, by default None (type will be inferred)
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

The lookup result:
  • result – Positive/Negative
  • details – Lookup details (or status if failure)
  • raw_result – Raw response
  • reference – URL of IoC

Return type:

LookupResult

Raises:

NotImplementedError – If attempting to use an HTTP method or authentication protocol that is not supported.

Notes

This method uses memoization (lru_cache) to cache results for a particular observable, to try to avoid repeated network calls for the same item.

lookup_iocs(data: Union[pandas.core.frame.DataFrame, Dict[str, str], Iterable[str]], obs_col: str = None, ioc_type_col: str = None, query_type: str = None, **kwargs) → pandas.core.frame.DataFrame

Lookup collection of IoC observables.

Parameters:
  • data (Union[pd.DataFrame, Dict[str, str], Iterable[str]]) – Data input in one of three formats:
    1. Pandas DataFrame (you must supply the column name in the obs_col parameter)
    2. Dict of observable, IoCType
    3. Iterable of observables (IoCTypes will be inferred)
  • obs_col (str, optional) – DataFrame column to use for observables, by default None
  • ioc_type_col (str, optional) – DataFrame column to use for IoCTypes, by default None
  • query_type (str, optional) – Specify the data subtype to be queried, by default None. If not specified the default record type for the IoC type will be returned.
Returns:

DataFrame of results.

Return type:

pd.DataFrame

parse_results(response: msticpy.sectools.tiproviders.ti_provider_base.LookupResult) → Tuple[bool, msticpy.sectools.tiproviders.ti_provider_base.TISeverity, Any]

Return the details of the response.

Parameters:response (LookupResult) – The returned data response
Returns:A tuple of three items: bool (positive or negative hit), TISeverity (enumeration of severity) and an object with match details
Return type:Tuple[bool, TISeverity, Any]
resolve_ioc_type

Return IoCType determined by IoCExtract.

Parameters:observable (str) – IoC observable string
Returns:IoC Type (or unknown if type could not be determined)
Return type:str
supported_types

Return list of supported IoC types for this provider.

Returns:List of supported type names
Return type:List[str]
classmethod usage()

Print usage of provider.

msticpy.sectools.vtlookup module

Module for VTLookup class.

Wrapper class around the VirusTotal API. Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing requires a VirusTotal account and API key, and performance is limited to the number of requests per minute for your account type. Supported IoC types:

  • Filehash
  • URL
  • DNS Domain
  • IPv4 Address
class msticpy.sectools.vtlookup.DuplicateStatus(is_dup, status)

Bases: tuple

Create new instance of DuplicateStatus(is_dup, status)

count()

Return number of occurrences of value.

index()

Return first index of value.

Raises ValueError if the value is not present.

is_dup

Alias for field number 0

status

Alias for field number 1
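
The `count()`, `index()` and "Alias for field number" entries above are the standard members of a `collections.namedtuple`. A minimal sketch of an equivalent two-field type follows; the status text is made up for illustration.

```python
from collections import namedtuple

# Same field order as documented: is_dup is field 0, status is field 1.
DuplicateStatus = namedtuple("DuplicateStatus", ["is_dup", "status"])

dup = DuplicateStatus(is_dup=True, status="already seen")  # illustrative values

# Fields can be read by name, by position, or by unpacking.
is_dup, status = dup
print(dup.is_dup, dup[1], status)
```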

class msticpy.sectools.vtlookup.VTLookup(vtkey: str, verbosity: int = 1)

Bases: object

VTLookup: VirusTotal lookup of IoC reports.

Main methods and attributes are:
  • lookup_iocs() – accepts input of multiple IoCs in a pandas DataFrame
  • lookup_ioc() – looks up a single IoC observable
  • supported_ioc_types – a list of valid target types
  • ioc_vt_type_mapping – a dictionary of mappings to recognized VT types. Types mapped to None will not be submitted to VT.

For URLs, a full HTTP request can be submitted; the query string and fragments are dropped before submitting. For files, MD5, SHA1 and SHA256 hashes are supported. For IP addresses, only dotted IPv4 addresses are supported.

Create a new instance of VTLookup class.

Parameters:
  • vtkey (str) – VirusTotal API key
  • verbosity (int, optional) – The level of detail of reporting:
    0 = no reporting
    1 = minimal reporting (default)
    2 = verbose reporting
ioc_vt_type_mapping

Return mapping between internal and VirusTotal IoC type names.

Returns:Return mapping between internal and VirusTotal IoC type names.
Return type:Mapping[str, str]
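
A sketch of how a mapping like this can act as a filter, with types mapped to `None` skipped and unknown types raising `KeyError`. The mapping entries and the `submittable` helper are hypothetical; the real type names are not listed on this page.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping: keys are internal IoC type names, values are VT type names.
TYPE_MAPPING: Dict[str, Optional[str]] = {
    "ipv4": "ip-address",
    "dns": "domain",
    "url": "url",
    "ipv6": None,  # mapped to None, so never submitted
}


def submittable(items: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (observable, vt_type) pairs whose IoC type maps to a VT type."""
    result = []
    for observable, ioc_type in items:
        vt_type = TYPE_MAPPING[ioc_type]  # an unknown type raises KeyError
        if vt_type is not None:
            result.append((observable, vt_type))
    return result


print(submittable([("8.8.8.8", "ipv4"), ("::1", "ipv6")]))
```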
lookup_ioc(observable: str, ioc_type: str, output: str = 'dict') → Any

Look up a single IoC observable.

Parameters:
  • observable (str) – The observable value
  • ioc_type (str) – The IoC Type (see ‘supported_ioc_types’ attribute)
  • output (str, optional) – Output results as a dictionary (or list of dicts). If output is any other value, the result is returned in a pandas DataFrame (the default is ‘dict’)
Returns:

  • list{dict} (if output == ‘dict’)
  • pd.DataFrame (otherwise)

Raises:

KeyError – Unknown ioc_type

lookup_iocs(data: pandas.core.frame.DataFrame, src_col: str = 'Observable', type_col: str = 'IoCType', src_index_col: str = 'SourceIndex', **kwargs) → pandas.core.frame.DataFrame

Retrieve results for IoC observables in the source dataframe.

Parameters:
  • data (pd.DataFrame) – Dataframe containing the observables to search for
  • src_col (str, optional) – The column name that contains the observable data (one item per row) (the default is ‘Observable’)
  • type_col (str, optional) – The column name containing the observable type (the default is ‘IoCType’)
  • src_index_col (str, optional) – The name of the column to use as source index. If not specified this defaults to ‘SourceIndex’. If this (or the supplied value) is not in the source dataframe, the index of the source dataframe will be used. This is retained in the output so that you can join the results back to the original data. (the default is ‘SourceIndex’)
Other Parameters:
 
  • **kwargs – key/value pairs of additional mappings to supported IoC type names, e.g. ipv4=’ipaddress’, url=’httprequest’. This allows you to specify custom mappings when the source data is tagged with different names.
Returns:

Combined results of local pre-processing and VirusTotal Lookups

Return type:

pd.DataFrame

Raises:

KeyError – Unknown ioc_type

Notes

See supported_ioc_types attribute for a list of valid target types. Not all of these types are supported by VirusTotal. See ioc_vt_type_mapping for current mappings. Types mapped to None will not be submitted to VT.

For URLs, a full HTTP request can be submitted; the query string and fragments are dropped before submitting. Other supported protocols are ftp, telnet, ldap and file. For files, MD5, SHA1 and SHA256 hashes are supported. For IP addresses, only dotted IPv4 addresses are supported.
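
Dropping the query string and fragment from a URL can be sketched with the standard `urllib.parse` module. This illustrates the described behaviour and is not the library's own implementation; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Rebuild from scheme, host and path only; the last two slots stay empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_query_and_fragment("http://example.com/a/b.php?id=1#top"))
```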

supported_ioc_types

Return list of supported IoC type internal names.

Returns:List of supported IoC type internal names.
Return type:List[str]
supported_vt_types

Return list of VirusTotal supported IoC type names.

Returns:List of VirusTotal supported IoC type names.
Return type:List[str]
class msticpy.sectools.vtlookup.VTParams(api_type, batch_size, batch_delimiter, http_verb, api_var_name, headers)

Bases: tuple

Create new instance of VTParams(api_type, batch_size, batch_delimiter, http_verb, api_var_name, headers)

api_type

Alias for field number 0

api_var_name

Alias for field number 4

batch_delimiter

Alias for field number 2

batch_size

Alias for field number 1

count()

Return number of occurrences of value.

headers

Alias for field number 5

http_verb

Alias for field number 3

index()

Return first index of value.

Raises ValueError if the value is not present.