msticpy.analysis.anomalous_sequence.sessionize module

Module for creating sessions out of raw data.

msticpy.analysis.anomalous_sequence.sessionize.create_session_col(data: DataFrame, user_identifier_cols: List[str], time_col: str, max_session_time_mins: int, max_event_separation_mins: int) → DataFrame

Create a “session_ind” column in the dataframe.

The session_ind value is incremented each time a new session starts, so rows that share a session_ind value belong to the same session.

Parameters:
  • data (pd.DataFrame) – This dataframe should contain at least the following columns:
      - a time stamp column
      - columns related to user name and/or computer name and/or IP address etc.

  • user_identifier_cols (List[str]) – Names of the columns that contain the user name and/or computer name and/or IP address etc. Each time the value of one of these columns changes, a new session will be started.

  • time_col (str) – Name of the column that contains a time stamp. If this column is not already in datetime64[ns, UTC] format, it will be cast to that format.

  • max_session_time_mins (int) – The maximum length of a session in minutes. If a sequence of events for the same user_identifier_cols values exceeds this length, then a new session will be started.

  • max_event_separation_mins (int) – The maximum time in minutes between two events in a session. If two events for the same user_identifier_cols values are more than max_event_separation_mins apart, then a new session will be started.

Return type:

pd.DataFrame with an additional “session_ind” column
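
The session rules described by the parameters above can be illustrated with a small standard-library sketch. This is not msticpy's implementation: the helper name assign_session_ind, the list-of-dicts input, the "user" and "time" keys, the numbering from 0, and the strict "greater than" comparisons are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def assign_session_ind(events, user_keys, time_key,
                       max_session_time_mins, max_event_separation_mins):
    """Return sorted copies of `events`, each with a 'session_ind' entry.

    Hypothetical helper: it mirrors the documented rules, not msticpy's code.
    """
    max_length = timedelta(minutes=max_session_time_mins)
    max_gap = timedelta(minutes=max_event_separation_mins)

    def user_of(event):
        return tuple(event[key] for key in user_keys)

    # Group each user's events together, ordered by time.
    ordered = sorted(events, key=lambda e: (user_of(e), e[time_key]))

    labelled = []
    session_ind = -1  # assumption: the first session is numbered 0
    prev_user = prev_time = session_start = None
    for event in ordered:
        user, stamp = user_of(event), event[time_key]
        starts_new_session = (
            not labelled                            # very first event
            or user != prev_user                    # identifier values changed
            or stamp - prev_time > max_gap          # events too far apart
            or stamp - session_start > max_length   # session would be too long
        )
        if starts_new_session:
            session_ind += 1
            session_start = stamp
        labelled.append({**event, "session_ind": session_ind})
        prev_user, prev_time = user, stamp
    return labelled


def at(hour, minute):
    """Build a time stamp on an arbitrary example day."""
    return datetime(2024, 1, 1, hour, minute)


events = [
    {"user": "bob", "time": at(9, 1)},
    {"user": "alice", "time": at(9, 0)},
    {"user": "bob", "time": at(9, 10)},
    {"user": "alice", "time": at(9, 5)},
    {"user": "bob", "time": at(9, 19)},
    {"user": "alice", "time": at(9, 40)},   # 35 minutes after her previous event
    {"user": "bob", "time": at(9, 28)},     # 27 minutes after his session began
]

labelled = assign_session_ind(
    events, ["user"], "time",
    max_session_time_mins=25, max_event_separation_mins=20,
)
session_inds = [row["session_ind"] for row in labelled]
print(session_inds)  # [0, 0, 1, 2, 2, 2, 3]
```

In this sketch alice's third event starts a new session because of the 35-minute gap, while bob's fourth event starts one because his session would otherwise exceed 25 minutes, even though no single gap is longer than 20 minutes.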

msticpy.analysis.anomalous_sequence.sessionize.sessionize_data(data: DataFrame, user_identifier_cols: List[str], time_col: str, max_session_time_mins: int, max_event_separation_mins: int, event_col: str) → DataFrame

Sessionize the input data.

In particular, the resulting dataframe will have 1 row per session. It will contain the following columns:

  • the user_identifier_cols
  • <time_col>_min
  • <time_col>_max
  • <event_col>_list
  • duration (<time_col>_max - <time_col>_min)
  • number_events (length of the <event_col>_list value)

Parameters:
  • data (pd.DataFrame) – This dataframe should contain at least the following columns:
      - a time stamp column
      - columns related to user name and/or computer name and/or IP address etc.
      - a column containing an event

  • user_identifier_cols (List[str]) – Names of the columns that contain the user name and/or computer name and/or IP address etc. Each time the value of one of these columns changes, a new session will be started.

  • time_col (str) – Name of the column that contains a time stamp. If this column is not already in datetime64[ns, UTC] format, it will be cast to that format.

  • max_session_time_mins (int) – The maximum length of a session in minutes. If a sequence of events for the same user_identifier_cols values exceeds this length, then a new session will be started.

  • max_event_separation_mins (int) – The maximum time in minutes between two events in a session. If two events for the same user_identifier_cols values are more than max_event_separation_mins apart, then a new session will be started.

  • event_col (str) – Name of the column that contains the event of interest. For example, if we are interested in sessionizing Exchange admin commands, the “event_col” could contain values like “Set-Mailbox” or “Set-User”.

Return type:

pd.DataFrame containing the sessionized data, with one row per session.
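
The shape of the one-row-per-session output can be sketched with the standard library alone. This is an illustration of the column layout listed above, not msticpy's implementation: the helper name summarize_sessions, the list-of-dicts input, and the "user", "time" and "cmd" keys are assumptions, and the events are assumed to carry a session_ind value already.

```python
from datetime import datetime


def summarize_sessions(events, user_keys, time_key, event_key):
    """Collapse events that share a 'session_ind' into one dict per session.

    Hypothetical helper: it mirrors the documented output columns only.
    """
    min_col, max_col = f"{time_key}_min", f"{time_key}_max"
    list_col = f"{event_key}_list"

    sessions = {}
    # Visit events session by session, in time order within each session.
    for event in sorted(events, key=lambda e: (e["session_ind"], e[time_key])):
        row = sessions.get(event["session_ind"])
        if row is None:
            # First event of the session: copy identifiers, open the window.
            row = {key: event[key] for key in user_keys}
            row[min_col] = event[time_key]
            row[list_col] = []
            sessions[event["session_ind"]] = row
        row[max_col] = event[time_key]          # latest event seen so far
        row[list_col].append(event[event_key])

    for row in sessions.values():
        row["duration"] = row[max_col] - row[min_col]
        row["number_events"] = len(row[list_col])
    return list(sessions.values())


events = [
    {"user": "alice", "time": datetime(2024, 1, 1, 9, 40),
     "cmd": "Set-Mailbox", "session_ind": 1},
    {"user": "alice", "time": datetime(2024, 1, 1, 9, 5),
     "cmd": "Set-User", "session_ind": 0},
    {"user": "alice", "time": datetime(2024, 1, 1, 9, 0),
     "cmd": "Set-Mailbox", "session_ind": 0},
]

rows = summarize_sessions(events, ["user"], "time", "cmd")
for row in rows:
    print(row["user"], row["cmd_list"], row["duration"], row["number_events"])
```

With time_key "time" and event_key "cmd", each resulting dict has the keys user, time_min, time_max, cmd_list, duration and number_events, matching the <time_col>_min, <time_col>_max and <event_col>_list naming pattern.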