msticpy.transform.auditdextract module
Auditd extractor.
Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command line arguments into a single string). This is still a work-in-progress.
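The collapsing and decoding steps described above can be sketched with the standard library alone. This is a minimal illustration, not msticpy's implementation: the regular expressions, the helper names (`decode_field`, `collapse_events`, `command_line`) and the sample log lines are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed record layout: type=<TYPE> msg=audit(<timestamp>:<serial>): key=value ...
HEADER_RE = re.compile(
    r"type=(?P<type>\w+)\s+msg=audit\((?P<msg_id>[\d.]+:\d+)\):\s*(?P<body>.*)"
)
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
HEX_RE = re.compile(r"(?:[0-9A-Fa-f]{2})+")

# Hypothetical sample: three records sharing one message ID.
SAMPLE_LINES = [
    'type=SYSCALL msg=audit(1521757801.424:994): syscall=59 pid=1234 ppid=1000 comm="echo"',
    'type=EXECVE msg=audit(1521757801.424:994): argc=2 a0="echo" a1=68656C6C6F20776F726C64',
    'type=CWD msg=audit(1521757801.424:994): cwd="/home/user"',
]


def decode_field(rec_type: str, key: str, raw: str) -> str:
    """Strip quotes; hex-decode unquoted EXECVE arguments (a0, a1, ...)."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if rec_type == "EXECVE" and re.fullmatch(r"a\d+", key) and HEX_RE.fullmatch(raw):
        try:
            return bytes.fromhex(raw).decode("utf-8")
        except UnicodeDecodeError:
            return raw  # leave undecodable values untouched
    return raw


def collapse_events(lines):
    """Group records by message ID: {msg_id: {record_type: {field: value}}}."""
    events = defaultdict(dict)
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like auditd records
        rec_type = match["type"]
        events[match["msg_id"]][rec_type] = {
            key: decode_field(rec_type, key, raw)
            for key, raw in FIELD_RE.findall(match["body"])
        }
    return dict(events)


def command_line(event) -> str:
    """Re-assemble the EXECVE arguments into a single command-line string."""
    execve = event.get("EXECVE", {})
    argc = int(execve.get("argc", 0))
    return " ".join(execve.get(f"a{i}", "") for i in range(argc))


events = collapse_events(SAMPLE_LINES)
print(command_line(events["1521757801.424:994"]))  # echo hello world
```

Hex decoding is restricted to EXECVE arguments here because other record types carry unquoted numeric values that would otherwise be misread as hex strings.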
- msticpy.transform.auditdextract.extract_events_to_df(data: DataFrame, input_column: str = 'AuditdMessage', event_type: str | None = None, verbose: bool = False) DataFrame
Extract auditd raw messages into a dataframe.
- Parameters:
data (pd.DataFrame) – The input dataframe with raw auditd data in a single string column
input_column (str, optional) – The name of the input column (the default is ‘AuditdMessage’)
event_type (str, optional) – The event type to extract; if None, all event types are extracted (the default is None)
verbose (bool, optional) – Give feedback on stages of processing (the default is False)
- Returns:
The resultant DataFrame
- Return type:
pd.DataFrame
- msticpy.transform.auditdextract.generate_process_tree(audit_data: DataFrame, branch_depth: int = 4, processes: DataFrame | None = None) DataFrame
Generate process tree data from auditd logs.
- Parameters:
audit_data (pd.DataFrame) – The Audit data containing process creation events
branch_depth (int, optional) – The maximum depth of parent or child processes to extract from the data (the default is 4)
processes (pd.DataFrame, optional) – DataFrame of processes to generate the tree for (the default is None)
- Returns:
The formatted process tree data
- Return type:
pd.DataFrame
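One way to picture what a depth limit such as branch_depth means is to walk a process's ancestors and descendants and stop after a fixed number of levels. The sketch below is only an illustration under assumed field names (`pid`, `ppid`) and a hypothetical helper `related_processes`; it does not reproduce how generate_process_tree builds its output.

```python
from collections import defaultdict

# Hypothetical process records; real auditd data carries many more fields.
PROCS = [
    {"pid": 1, "ppid": 0},
    {"pid": 100, "ppid": 1},
    {"pid": 200, "ppid": 100},
    {"pid": 201, "ppid": 100},
    {"pid": 300, "ppid": 200},
    {"pid": 400, "ppid": 300},
]


def related_processes(procs, target_pid, branch_depth=4):
    """Map each ancestor/descendant pid to its signed distance from target_pid.

    Negative distances are ancestors, positive distances are descendants.
    Processes further than branch_depth levels away are left out.
    """
    parent_of = {p["pid"]: p["ppid"] for p in procs}
    children_of = defaultdict(list)
    for p in procs:
        children_of[p["ppid"]].append(p["pid"])

    related = {target_pid: 0}

    # Walk up the parent chain, stopping when the parent is not in the data.
    pid = target_pid
    for depth in range(1, branch_depth + 1):
        pid = parent_of.get(pid)
        if pid is None or pid not in parent_of:
            break
        related.setdefault(pid, -depth)

    # Walk down level by level (breadth-first) through the children.
    frontier = [target_pid]
    for depth in range(1, branch_depth + 1):
        frontier = [c for parent in frontier for c in children_of.get(parent, [])]
        if not frontier:
            break
        for child in frontier:
            related.setdefault(child, depth)

    return related


print(related_processes(PROCS, 200, branch_depth=1))
# {200: 0, 100: -1, 300: 1}
```

Sibling processes (pid 201 in the sample) are not included, since they are neither ancestors nor descendants of the target.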
- msticpy.transform.auditdextract.get_event_subset(data: DataFrame, event_type: str) DataFrame
Return a subset of the events matching type event_type.
- Parameters:
data (pd.DataFrame) – The input data
event_type (str) – The event type to select
- Returns:
The subset of the data where data[‘EventType’] == event_type
- Return type:
pd.DataFrame
- msticpy.transform.auditdextract.read_from_file(filepath: str, event_type: str | None = None, verbose: bool = False, dummy_sep: str = '\t') DataFrame
Extract Audit events from a log file.
- Parameters:
filepath (str) – path to the input file
event_type (str, optional) – The type of event to extract if only a subset is required (the default is None, which processes all types)
verbose (bool, optional) – If True, more progress messages are output (the default is False)
dummy_sep (str, optional) – Separator to use for reading the ‘csv’ file (the default is tab, ‘\t’)
- Returns:
The output DataFrame
- Return type:
pd.DataFrame
Notes
The dummy_sep parameter should be a character that does not occur in an input line. This function uses pandas read_csv to read the audit lines into a single column. Using a separator that does appear in the input (e.g. space or comma) will cause data to be parsed into multiple columns and anything after the first separator in a line will be lost.
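The effect of the separator choice can be seen in a small stand-alone sketch. It uses the standard library `csv` reader as a stand-in for pandas read_csv, so the exact handling of the extra columns is an assumption; the point is only that a separator present in the line breaks it into several fields. The sample lines and the helper `first_column` are hypothetical.

```python
import csv
import io

# Hypothetical audit lines: they contain spaces but no tab characters.
SAMPLE = (
    'type=CWD msg=audit(1521757801.424:994): cwd="/home/user"\n'
    'type=PATH msg=audit(1521757801.424:994): item=0 name="/bin/echo"\n'
)


def first_column(text: str, sep: str):
    """Read text as delimited data and keep only the first field of each row."""
    reader = csv.reader(io.StringIO(text), delimiter=sep, quoting=csv.QUOTE_NONE)
    return [row[0] for row in reader if row]


# Tab never occurs in the input, so each line stays in one field.
whole_lines = first_column(SAMPLE, "\t")
# Space occurs in every line, so only the text before the first space survives.
truncated = first_column(SAMPLE, " ")

print(whole_lines[0])  # type=CWD msg=audit(1521757801.424:994): cwd="/home/user"
print(truncated)       # ['type=CWD', 'type=PATH']
```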
- msticpy.transform.auditdextract.unpack_auditd(audit_str: List[Dict[str, str]]) Mapping[str, Mapping[str, Any]]
Unpack an Audit message and return a dictionary of fields.
- Parameters:
audit_str (List[Dict[str, str]]) – The auditd raw record
- Returns:
The extracted message fields and values
- Return type:
Mapping[str, Mapping[str, Any]]
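The nested mapping in the signature can be read as one entry per record type, each holding that record's fields. The sketch below is inferred from the type annotations only: the helper `unpack_records`, the assumed input layout (a list of single-entry dicts of record type to raw content) and the sample data are illustrative, not the library's code.

```python
import re
from typing import Any, Dict, List, Mapping

# key=value pairs, where the value is either quoted or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def unpack_records(records: List[Dict[str, str]]) -> Mapping[str, Mapping[str, Any]]:
    """Turn [{record_type: raw_content}, ...] into {record_type: {field: value}}."""
    event: Dict[str, Dict[str, Any]] = {}
    for record in records:
        for rec_type, content in record.items():
            event[rec_type] = {
                key: value.strip('"') for key, value in PAIR_RE.findall(content)
            }
    return event


# Hypothetical input in the assumed shape.
unpacked = unpack_records(
    [
        {"SYSCALL": 'pid=1234 ppid=1000 comm="echo"'},
        {"CWD": 'cwd="/home/user"'},
    ]
)
print(unpacked["SYSCALL"]["comm"])  # echo
```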