msticpy.init.mp_pandas_accessors module

MSTICPy core pandas accessor methods.

class msticpy.init.mp_pandas_accessors.MsticpyCoreAccessor(pandas_obj)

Bases: object

MSTICPy pandas accessor for core functions.

Initialize the extension.

b64extract(column: str, **kwargs) → DataFrame

Base64-decode strings taken from a pandas dataframe.

Parameters:

data (pd.DataFrame) – dataframe containing column to decode
column (str) – Name of dataframe text column
trace (bool, optional) – Show additional status (the default is None)
utf16 (bool, optional) – Attempt to decode UTF16 byte strings

Returns:

Decoded string and additional metadata in dataframe

Return type:

pd.DataFrame

Notes

Items that decode to utf-8 or utf-16 strings are returned as decoded strings that replace the encoded text in the original string. If the encoded string is a known binary type, the function identifies the file type and returns the hashes of the file. If a binary is a known archive type (zip, tar, gzip), the contents of the archive are unpacked. For any binary, the decoded file is returned as a byte array and as a printable list of byte values.
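The decision described in these notes (text versus binary, plus hashes for binaries) can be sketched with the standard library alone. This is a minimal illustration, not the MSTICPy implementation: the function name `describe_payload`, the small magic-number table, the printable-text heuristic, and the assumption that UTF-16 means little-endian are all hypothetical choices made for this example. Archive unpacking is left out.

```python
import base64
import hashlib

# Hypothetical, deliberately small magic-number table for this sketch.
MAGIC_NUMBERS = {b"PK\x03\x04": "zip", b"\x1f\x8b": "gzip", b"%PDF": "pdf"}
# (label reported, codec tried); UTF-16 is assumed to be little-endian here.
TEXT_CODECS = (("utf-8", "utf-8"), ("utf-16", "utf-16-le"))


def _is_text(text: str) -> bool:
    """Assumed heuristic: printable characters plus common whitespace."""
    return all(ch.isprintable() or ch in "\r\n\t" for ch in text)


def describe_payload(b64_text: str) -> dict:
    """Decode one base64 string and classify it as text or binary."""
    raw = base64.b64decode(b64_text, validate=True)
    # Known binary signatures take priority over text decoding.
    file_type = next(
        (name for magic, name in MAGIC_NUMBERS.items() if raw.startswith(magic)),
        None,
    )
    if file_type is None:
        for label, codec in TEXT_CODECS:
            try:
                text = raw.decode(codec)
            except UnicodeDecodeError:
                continue
            if _is_text(text):
                return {"encoding_type": label, "decoded_string": text}
    # Anything else is reported as binary, with hashes and a hex byte list.
    return {
        "encoding_type": "binary",
        "file_type": file_type,  # None when the signature is not recognised
        "input_bytes": raw,
        "decoded_string": [f"{byte:02x}" for byte in raw],
        "file_hashes": {
            alg: hashlib.new(alg, raw).hexdigest()
            for alg in ("md5", "sha1", "sha256")
        },
    }
```

Checking signatures before attempting text decoding matters because some binary headers (for example `%PDF`) are themselves valid UTF-8.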

The columns of the output DataFrame are:

decoded string: this is the input string with any decoded sections replaced by the results of the decoding
reference : this is an index that matches an index number in the decoded string (e.g. <<encoded binary type=pdf index=1.2>>).
original_string : the string prior to decoding
file_type : the type of file if this could be determined
file_hashes : a dictionary of hashes (the md5, sha1 and sha256 hashes are broken out into separate columns)
input_bytes : the binary image as a byte array
decoded_string : printable form of the decoded string (either string or list of hex byte values)
encoding_type : utf-8, utf-16 or binary
md5, sha1, sha256 : the respective hashes of the binary. file_type, file_hashes, input_bytes, md5, sha1 and sha256 will be null if the item decodes to a string.
src_index : the index of the source row in the input frame.
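To make the relationship between the original string, the string with decoded sections replaced, and src_index concrete, here is a small standard-library sketch. It is an illustration under stated assumptions rather than the library's code: the regular expression (runs of 16 or more base64 characters), the helper names, and the `replaced_string` key are hypothetical, and only UTF-8 payloads are handled.

```python
import base64
import re

# Assumed heuristic: runs of 16+ base64 characters with optional padding.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def _swap(match: re.Match) -> str:
    """Return the decoded text, or the original match if decoding fails."""
    candidate = match.group(0)
    try:
        return base64.b64decode(candidate, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return candidate


def decode_in_place(values):
    """Replace decodable base64 runs in each string; keep the source index."""
    rows = []
    for src_index, original in enumerate(values):
        replaced = B64_PATTERN.sub(_swap, original)
        if replaced != original:
            rows.append(
                {
                    "src_index": src_index,
                    "original_string": original,
                    "replaced_string": replaced,  # hypothetical key name
                }
            )
    return rows
```

Only rows in which something was decoded produce output, which is why the source row index is carried along: it lets the results be joined back to the input frame.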

build_process_tree(schema: ProcSchema | Dict[str, Any] | None = None, show_summary: bool = False, debug: bool = False) → DataFrame

Build process trees from the process events.

Parameters:

schema (Union[ProcSchema, Dict[str, Any]], optional) – The column schema to use, by default None. If supplied as a dict, it must include definitions for the required fields in the ProcSchema class. If None, the schema is inferred.
show_summary (bool) – If True, shows a summary of the built tree, by default False.
debug (bool) – If True, produces extra debugging output, by default False.

Returns:

Process tree dataframe.

Return type:

pd.DataFrame
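The general idea of building a process tree is to link each process event to its parent and then walk the result from the roots. The sketch below shows that idea with plain dictionaries. It is not the MSTICPy algorithm: the event keys (`pid`, `ppid`, `name`), the `depth` and `path` fields, and the function name `build_tree` are assumptions for this example and do not correspond to ProcSchema fields. It also assumes process IDs are unique within the data set, which real logs do not guarantee.

```python
from collections import defaultdict

# Hypothetical event records; the keys are assumptions for this sketch.
EVENTS = [
    {"pid": 1, "ppid": 0, "name": "wininit.exe"},
    {"pid": 2, "ppid": 1, "name": "services.exe"},
    {"pid": 3, "ppid": 2, "name": "svchost.exe"},
    {"pid": 4, "ppid": 1, "name": "lsass.exe"},
]


def build_tree(events):
    """Return events in depth-first order with depth and path annotations."""
    known = {event["pid"] for event in events}
    children = defaultdict(list)
    roots = []
    for event in events:
        # An event whose parent is not in the data set becomes a root.
        if event["ppid"] in known:
            children[event["ppid"]].append(event)
        else:
            roots.append(event)

    rows = []

    def walk(event, depth, parent_path):
        path = f"{parent_path}/{event['pid']}" if parent_path else str(event["pid"])
        rows.append({**event, "depth": depth, "path": path})
        for child in children[event["pid"]]:
            walk(child, depth + 1, path)

    for root in roots:
        walk(root, 0, "")
    return rows
```

With the sample events, `build_tree(EVENTS)` lists `wininit.exe` first, followed by `services.exe` and its child `svchost.exe`, then `lsass.exe`.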