msticpy.analysis.anomalous_sequence.sessionize module
Module for creating sessions out of raw data.
- msticpy.analysis.anomalous_sequence.sessionize.create_session_col(data: DataFrame, user_identifier_cols: List[str], time_col: str, max_session_time_mins: int, max_event_separation_mins: int) → DataFrame
Create a “session_ind” column in the dataframe.
In particular, the session_ind column will be incremented each time a new session starts.
- Parameters:
data (pd.DataFrame) – This dataframe should contain at least a timestamp column and one or more columns that identify the user, such as user name, computer name, or IP address.
user_identifier_cols (List[str]) – Names of the columns which contain the user name, computer name, IP address, etc. Each time the value of one of these columns changes, a new session will be started.
time_col (str) – Name of the column which contains a timestamp. If this column is not already in datetime64[ns, UTC] format, it will be cast to that format.
max_session_time_mins (int) – The maximum length of a session in minutes. If a sequence of events for the same user_identifier_cols values exceeds this length, then a new session will be started.
max_event_separation_mins (int) – The maximum length in minutes between two events in a session. If we have 2 events for the same user_identifier_cols values, and if those two events are more than max_event_separation_mins apart, then a new session will be started.
- Return type:
pd.DataFrame with an additional “session_ind” column
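The three conditions that start a new session (a change in the user identifier values, a gap longer than max_event_separation_mins, or a session longer than max_session_time_mins) can be sketched in plain Python. This is only an illustrative reading of the parameter descriptions above, not msticpy's implementation: the helper name assign_session_ind, the sample events, numbering sessions from 0, and measuring session length from the first event of the session are all assumptions.

```python
from datetime import datetime, timedelta


def assign_session_ind(events, user_keys, time_key,
                       max_session_time_mins, max_event_separation_mins):
    """Hypothetical helper: label each event dict with a 'session_ind'."""
    max_len = timedelta(minutes=max_session_time_mins)
    max_gap = timedelta(minutes=max_event_separation_mins)

    # Order events by user identifier values, then by timestamp.
    ordered = sorted(
        events, key=lambda e: (tuple(e[k] for k in user_keys), e[time_key])
    )

    labelled = []
    session_ind = 0  # assumption: session numbering starts at 0
    prev_user = prev_time = session_start = None
    for event in ordered:
        user = tuple(event[k] for k in user_keys)
        now = event[time_key]
        if prev_user is None:
            # The very first event opens the first session.
            session_start = now
        elif (
            user != prev_user                  # identifier values changed
            or now - prev_time > max_gap       # events too far apart
            or now - session_start > max_len   # session has run too long
        ):
            session_ind += 1
            session_start = now
        labelled.append({**event, "session_ind": session_ind})
        prev_user, prev_time = user, now
    return labelled


t0 = datetime(2024, 1, 1, 9, 0)
events = [
    {"user": "alice", "time": t0, "cmd": "Set-Mailbox"},
    {"user": "alice", "time": t0 + timedelta(minutes=5), "cmd": "Set-User"},
    # 35 minutes after the previous event, which exceeds the 20-minute gap limit
    {"user": "alice", "time": t0 + timedelta(minutes=40), "cmd": "Set-Mailbox"},
    # a different user always starts a new session
    {"user": "bob", "time": t0 + timedelta(minutes=41), "cmd": "Set-User"},
]

labelled = assign_session_ind(events, ["user"], "time", 60, 20)
session_inds = [e["session_ind"] for e in labelled]
```

With a 60-minute session limit and a 20-minute gap limit, the first two events share a session, the third starts a new one because of the gap, and the fourth starts another because the user changes.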
- msticpy.analysis.anomalous_sequence.sessionize.sessionize_data(data: DataFrame, user_identifier_cols: List[str], time_col: str, max_session_time_mins: int, max_event_separation_mins: int, event_col: str) → DataFrame
Sessionize the input data.
In particular, the resulting dataframe will have one row per session. It will contain the following columns: the user_identifier_cols, <time_col>_min, <time_col>_max, <event_col>_list, duration (<time_col>_max minus <time_col>_min), and number_events (the length of the <event_col>_list value).
- Parameters:
data (pd.DataFrame) – This dataframe should contain at least a timestamp column, one or more columns that identify the user (such as user name, computer name, or IP address), and a column containing an event.
user_identifier_cols (List[str]) – Names of the columns which contain the user name, computer name, IP address, etc. Each time the value of one of these columns changes, a new session will be started.
time_col (str) – Name of the column which contains a timestamp. If this column is not already in datetime64[ns, UTC] format, it will be cast to that format.
max_session_time_mins (int) – The maximum length of a session in minutes. If a sequence of events for the same user_identifier_cols values exceeds this length, then a new session will be started.
max_event_separation_mins (int) – The maximum length in minutes between two events in a session. If we have 2 events for the same user_identifier_cols values, and if those two events are more than max_event_separation_mins apart, then a new session will be started.
event_col (str) – Name of the column which contains the event of interest. For example, if we are interested in sessionizing Exchange admin commands, the “event_col” could contain values such as “Set-Mailbox” or “Set-User”.
- Return type:
pd.DataFrame containing the sessionized data. 1 row per session.
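The shape of the one-row-per-session output can be sketched in plain Python as follows. This is an illustration of the column layout described above, under stated assumptions, and not the library's code: summarize_sessions is a hypothetical helper, the input events are assumed to already carry a session_ind value, and the column names are derived from a time column called "time" and an event column called "cmd".

```python
from datetime import datetime, timedelta


def summarize_sessions(labelled_events, user_keys, time_key, event_key):
    """Hypothetical helper: collapse labelled events into one dict per session."""
    # Group events by their session index.
    sessions = {}
    for event in labelled_events:
        sessions.setdefault(event["session_ind"], []).append(event)

    rows = []
    for session_ind in sorted(sessions):
        members = sorted(sessions[session_ind], key=lambda e: e[time_key])
        first = members[0][time_key]
        last = members[-1][time_key]
        event_list = [e[event_key] for e in members]

        # Start with the user identifier values, then add the summary columns.
        row = {k: members[0][k] for k in user_keys}
        row.update({
            f"{time_key}_min": first,
            f"{time_key}_max": last,
            f"{event_key}_list": event_list,
            "duration": last - first,
            "number_events": len(event_list),
        })
        rows.append(row)
    return rows


t0 = datetime(2024, 1, 1, 9, 0)
labelled_events = [
    {"user": "alice", "time": t0, "cmd": "Set-Mailbox", "session_ind": 0},
    {"user": "alice", "time": t0 + timedelta(minutes=5), "cmd": "Set-User",
     "session_ind": 0},
    {"user": "bob", "time": t0 + timedelta(minutes=41), "cmd": "Set-User",
     "session_ind": 1},
]

rows = summarize_sessions(labelled_events, ["user"], "time", "cmd")
```

Each entry in rows corresponds to one session. For the first session, the cmd_list value holds both of alice's commands in time order, and duration is the difference between time_max and time_min.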