msticpy.analysis.anomalous_sequence.utils.cmds_params_values module

Helper module for computations when modelling sessions.

In particular, this module applies when each session is a list of the Cmd datatype whose params attribute is set to a dictionary of the accompanying params and values.

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_counts(sessions: List[List[Cmd]], start_token: str, end_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]], DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]], DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Compute the training counts for the sessions.

In particular, computes counts of individual commands and of sequences of two commands. It also computes the counts of individual params as well as counts of params conditional on the command. It also computes the counts of individual values as well as counts of values conditional on the param.

Parameters:
  • sessions (List[List[Cmd]]) –

    each session is a list of the Cmd datatype, where the Cmd datatype has a name attribute (command name) and a params attribute (dict with the params and values associated with the command). An example session:

    [
        Cmd(
            name='Set-User',
            params={'Identity': 'blahblah', 'Force': 'true'}
        ),
        Cmd(
            name='Set-Mailbox',
            params={'Identity': 'blahblah', 'AuditEnabled': 'false'}
        )
    ]
    

  • start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)

Returns:

individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts, individual value counts, value conditional on param counts

Return type:

tuple of counts
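
The six tallies described above can be sketched as follows. This is a minimal illustration, not msticpy's implementation: the Cmd dataclass is a stand-in with only the name and params attributes, sketch_counts is a hypothetical name, and treating the start and end tokens as ordinary commands in the tallies is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cmd:
    """Stand-in for the Cmd datatype (assumption: only name and params matter)."""

    name: str
    params: Dict[str, str] = field(default_factory=dict)


def sketch_counts(sessions: List[List[Cmd]], start_token: str, end_token: str):
    """Tally the six count tables for the sessions (illustrative only)."""
    seq1 = defaultdict(int)                             # individual commands
    seq2 = defaultdict(lambda: defaultdict(int))        # command pairs
    param_counts = defaultdict(int)                     # individual params
    cmd_param = defaultdict(lambda: defaultdict(int))   # param given command
    value_counts = defaultdict(int)                     # individual values
    param_value = defaultdict(lambda: defaultdict(int)) # value given param

    for session in sessions:
        # Assumption: the dummy tokens are counted like ordinary commands.
        prev = start_token
        seq1[start_token] += 1
        for cmd in session:
            seq1[cmd.name] += 1
            seq2[prev][cmd.name] += 1
            prev = cmd.name
            for param, value in cmd.params.items():
                param_counts[param] += 1
                cmd_param[cmd.name][param] += 1
                value_counts[value] += 1
                param_value[param][value] += 1
        seq2[prev][end_token] += 1
        seq1[end_token] += 1
    return seq1, seq2, param_counts, cmd_param, value_counts, param_value


sessions = [
    [
        Cmd(name="Set-User", params={"Identity": "blahblah", "Force": "true"}),
        Cmd(name="Set-Mailbox", params={"Identity": "blahblah", "AuditEnabled": "false"}),
    ]
]
seq1, seq2, param_counts, cmd_param, value_counts, param_value = sketch_counts(
    sessions, "##START##", "##END##"
)
print(dict(param_counts))
```

In this sketch, Identity is counted twice because it appears on both commands, while Force is counted once and only under Set-User.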

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_likelihood_window(window: List[Cmd], prior_probs: StateMatrix | dict, trans_probs: StateMatrix | dict, param_cond_cmd_probs: StateMatrix | dict, value_cond_param_probs: StateMatrix | dict, modellable_params: set, use_start_token: bool, use_end_token: bool, start_token: str | None = None, end_token: str | None = None) → float

Compute the likelihood of the input window.

Parameters:
  • window (List[Cmd]) –

    part or all of a session, where a session is a list of the Cmd datatype. An example session:

    [
        Cmd(name='Set-User', params={'Identity': 'blahblah', 'Force': 'true'}),
        Cmd(name='Set-Mailbox',
            params={'Identity': 'blahblah', 'AuditEnabled': 'false'})
    ]
    

  • prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands

  • trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)

  • param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands

  • value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of the values conditional on the params

  • modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood

  • use_start_token (bool) – if set to True, the start_token will be prepended to the window before the likelihood calculation is done

  • use_end_token (bool) – if set to True, the end_token will be appended to the window before the likelihood calculation is done

  • start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

Return type:

likelihood of the window
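
One way to read the parameters above is as a chain-rule product: the first command contributes either its prior or a transition from the start token, each later command contributes a transition from its predecessor, and every command contributes the probability of its params. The sketch below assumes that reading. It uses plain dicts, represents each command as a hypothetical (name, params) tuple, and takes a hypothetical params_prob callable in place of the param and value tables; it is not the library's code.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Stand-in for Cmd: (command name, params dict). Hypothetical, for illustration.
Step = Tuple[str, Dict[str, str]]


def sketch_window_likelihood(
    window: List[Step],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    params_prob: Callable[[str, Dict[str, str]], float],
    use_start_token: bool,
    use_end_token: bool,
    start_token: Optional[str] = None,
    end_token: Optional[str] = None,
) -> float:
    """Chain-rule likelihood of a non-empty window (illustrative only)."""
    first_name, first_params = window[0]
    # First command: transition from the start token, or else the prior.
    if use_start_token:
        likelihood = trans_probs[start_token][first_name]
    else:
        likelihood = prior_probs[first_name]
    likelihood *= params_prob(first_name, first_params)

    prev = first_name
    for name, params in window[1:]:
        # Each later command: transition from its predecessor, times its params.
        likelihood *= trans_probs[prev][name] * params_prob(name, params)
        prev = name

    if use_end_token:
        likelihood *= trans_probs[prev][end_token]
    return likelihood


# Toy probabilities, made up for this example.
prior = {"Set-User": 0.5, "Set-Mailbox": 0.5}
trans = {
    "##START##": {"Set-User": 0.4},
    "Set-User": {"Set-Mailbox": 0.25},
    "Set-Mailbox": {"##END##": 0.5},
}
window = [
    ("Set-User", {"Identity": "blahblah", "Force": "true"}),
    ("Set-Mailbox", {"Identity": "blahblah", "AuditEnabled": "false"}),
]


def half(cmd: str, params: Dict[str, str]) -> float:
    """Toy params probability: every command's params score 0.5."""
    return 0.5


no_tokens = sketch_window_likelihood(window, prior, trans, half, False, False)
with_tokens = sketch_window_likelihood(
    window, prior, trans, half, True, True, "##START##", "##END##"
)
print(no_tokens, with_tokens)
```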

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_likelihood_windows_in_session(session: List[Cmd], prior_probs: StateMatrix | dict, trans_probs: StateMatrix | dict, param_cond_cmd_probs: StateMatrix | dict, value_cond_param_probs: StateMatrix | dict, modellable_params: set, window_len: int, use_start_end_tokens: bool, start_token: str | None = None, end_token: str | None = None, use_geo_mean: bool = False) → List[float]

Compute the likelihoods of a sliding window of window_len in the session.

Parameters:
  • session (List[Cmd]) –

    list of the Cmd datatype. An example session:

    [
        Cmd(
            name='Set-User',
            params={'Identity': 'blahblah', 'Force': 'true'}
        ),
        Cmd(
            name='Set-Mailbox',
            params={'Identity': 'blahblah', 'AuditEnabled': 'false'}
        )
    ]
    

  • prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands

  • trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)

  • param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands

  • value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of the values conditional on the params

  • modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood

  • window_len (int) – length of sliding window for likelihood calculations

  • use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done

  • start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

  • use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Return type:

list of likelihoods
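
The sliding-window idea and the use_geo_mean adjustment can be sketched in a few lines. The helper name and the window_likelihood callable are hypothetical, start and end tokens are left out, and returning an empty list for a session shorter than window_len is an assumption of this sketch rather than documented behaviour.

```python
import math
from typing import Callable, List, Sequence


def sketch_sliding_likelihoods(
    session: Sequence[str],
    window_len: int,
    window_likelihood: Callable[[Sequence[str]], float],
    use_geo_mean: bool = False,
) -> List[float]:
    """Score every contiguous window of window_len items (illustrative only)."""
    likelihoods = []
    for i in range(len(session) - window_len + 1):
        lik = window_likelihood(session[i : i + window_len])
        if use_geo_mean:
            # Raise to the power 1/window_len, as described for use_geo_mean.
            lik = lik ** (1 / window_len)
        likelihoods.append(lik)
    return likelihoods


# Toy per-command probabilities, made up for this example.
probs = {"a": 0.5, "b": 0.5, "c": 0.1, "d": 0.5}


def toy_likelihood(window: Sequence[str]) -> float:
    """Toy window score: product of the per-command probabilities."""
    return math.prod(probs[c] for c in window)


plain = sketch_sliding_likelihoods(["a", "b", "c", "d"], 2, toy_likelihood)
geo = sketch_sliding_likelihoods(["a", "b", "c", "d"], 2, toy_likelihood, use_geo_mean=True)
print(plain, geo)
```

Taking the 1/window_len power puts likelihoods computed with different window lengths on a comparable per-command scale, since longer windows otherwise multiply more factors below 1.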

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_prob_setofparams_given_cmd(cmd: str, params_with_vals: dict | set, param_cond_cmd_probs: StateMatrix | dict, value_cond_param_probs: StateMatrix | dict, modellable_params: set | list, use_geo_mean: bool = True) → float

Compute probability of a set of params + values given the cmd.

Parameters:
  • cmd (str) – name of command (e.g. for Exchange PowerShell commands: “Set-Mailbox”)

  • params_with_vals (Union[dict, set]) –

    dict of accompanying params and values for the cmd, e.g. for Exchange PowerShell commands:

    {'Identity': 'an_identity' , 'ForwardingEmailAddress': 'email@email.com'}
    

    If params_with_vals is a set, then an artificial dictionary will be created with the elements of the set as the keys and None for each of the values.

  • param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of params conditional on the command

  • value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of values conditional on the param

  • modellable_params (Union[set, list]) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood

  • use_geo_mean (bool) – if True, then the likelihood will be raised to the power of (1/K) where K is the number of distinct params which appeared for the given cmd across our training set + the number of values which we included in the modelling for this cmd. Note: some commands may have more params set in general compared with other commands. It can be useful to use the geo mean so that you can compare this probability across different commands with differing number of params.

Return type:

computed probability
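
The geometric-mean exponent 1/K described for use_geo_mean can be illustrated with plain dicts. The sketch below is one plausible reading of the description and not the library's code: it assumes that a param known for the cmd but absent from the call contributes (1 - p), and sketch_prob_params_given_cmd is a hypothetical name. With plain dicts, every param and value that is looked up must be present as a key.

```python
import math
from typing import Dict, Set, Union


def sketch_prob_params_given_cmd(
    cmd: str,
    params_with_vals: Union[Dict[str, str], Set[str]],
    param_cond_cmd_probs: Dict[str, Dict[str, float]],
    value_cond_param_probs: Dict[str, Dict[str, float]],
    modellable_params: Set[str],
    use_geo_mean: bool = True,
) -> float:
    """Probability of the params (and modelled values) given cmd; illustrative."""
    if isinstance(params_with_vals, set):
        # A set carries no values, so pair each param with None.
        params_with_vals = {param: None for param in params_with_vals}

    known_params = param_cond_cmd_probs[cmd]
    likelihood = 1.0
    n_values_modelled = 0
    for param, prob in known_params.items():
        if param in params_with_vals:
            likelihood *= prob
            if param in modellable_params:
                n_values_modelled += 1
                value = params_with_vals[param]
                likelihood *= value_cond_param_probs[param][value]
        else:
            # Assumption: an absent param contributes the complement.
            likelihood *= 1 - prob

    if use_geo_mean:
        # K = params seen for this cmd + values included in the modelling.
        k = len(known_params) + n_values_modelled
        likelihood = likelihood ** (1 / k)
    return likelihood


# Toy probabilities, made up for this example.
param_probs = {"Set-Mailbox": {"Identity": 0.9, "AuditEnabled": 0.2, "Force": 0.1}}
value_probs = {"AuditEnabled": {"false": 0.25, "true": 0.75}}
call = {"Identity": "an_identity", "AuditEnabled": "false"}

raw = sketch_prob_params_given_cmd(
    "Set-Mailbox", call, param_probs, value_probs, {"AuditEnabled"}, use_geo_mean=False
)
geo = sketch_prob_params_given_cmd(
    "Set-Mailbox", call, param_probs, value_probs, {"AuditEnabled"}
)
from_set = sketch_prob_params_given_cmd(
    "Set-Mailbox", {"Identity"}, param_probs, value_probs, {"AuditEnabled"}, use_geo_mean=False
)
print(raw, geo, from_set)
```

Here K is 4 (three params known for Set-Mailbox plus one modelled value), so the geometric mean is the fourth root of the raw product.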

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.get_params_to_model_values(param_counts: StateMatrix | dict, param_value_counts: StateMatrix | dict) → set

Use heuristics to determine which params take categorical values rather than arbitrary strings.

This function helps us decide which params we should model the values of later on.

Parameters:
  • param_counts (Union[StateMatrix, dict]) – counts of each of the individual params

  • param_value_counts (Union[StateMatrix, dict]) – counts of each value conditional on the params

Return type:

set of params which have been determined to be categorical
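
The description does not state which heuristics are used, so the following sketch only illustrates the general idea with made-up rules: a param is treated as categorical when it has few distinct values, both in absolute terms and relative to how often it appears. The function name and both thresholds (max_distinct, max_ratio) are hypothetical and are not taken from msticpy.

```python
from typing import Dict, Set


def sketch_params_to_model(
    param_counts: Dict[str, int],
    param_value_counts: Dict[str, Dict[str, int]],
    max_distinct: int = 20,   # hypothetical threshold
    max_ratio: float = 0.2,   # hypothetical threshold
) -> Set[str]:
    """Pick params whose values look categorical (illustrative heuristic)."""
    categorical = set()
    for param, total in param_counts.items():
        n_distinct = len(param_value_counts.get(param, {}))
        if n_distinct == 0 or total == 0:
            continue  # nothing to judge for this param
        # Few distinct values, reused often, suggests a categorical param.
        if n_distinct <= max_distinct and n_distinct / total <= max_ratio:
            categorical.add(param)
    return categorical


# Toy counts: Identity is different on every call, AuditEnabled is a flag.
param_counts = {"Identity": 100, "AuditEnabled": 100}
param_value_counts = {
    "Identity": {f"user{i}": 1 for i in range(100)},
    "AuditEnabled": {"true": 60, "false": 40},
}
modellable = sketch_params_to_model(param_counts, param_value_counts)
print(modellable)
```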

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.laplace_smooth_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], param_counts: DefaultDict[str, int], cmd_param_counts: DefaultDict[str, DefaultDict[str, int]], value_counts: DefaultDict[str, int], param_value_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str) → Tuple[StateMatrix, StateMatrix, StateMatrix, StateMatrix, StateMatrix, StateMatrix]

Laplace smoothing is applied to the counts.

We do this by adding 1 to each of the counts. This is so we shift some of the probability mass from the very probable commands/params/values to the unseen and very unlikely commands/params/values. The unk_token means we can handle unseen commands, params, values, sequences of commands.

Parameters:
  • seq1_counts (DefaultDict[str, int]) – individual command counts

  • seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts

  • param_counts (DefaultDict[str, int]) – individual param counts

  • cmd_param_counts (DefaultDict[str, DefaultDict[str, int]]) – param conditional on command counts

  • value_counts (DefaultDict[str, int]) – individual value counts

  • param_value_counts (DefaultDict[str, DefaultDict[str, int]]) – value conditional on param counts

  • start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)

  • unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns:

Laplace-smoothed individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts, individual value counts, value conditional on param counts

Return type:

tuple of StateMatrix counts
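
The effect of add-one smoothing with an unknown token can be seen on a single flat count table. This is a simplified sketch with a hypothetical helper name: it handles one table of plain ints, whereas the function documented above takes six tables and returns StateMatrix objects.

```python
import math
from typing import Dict


def sketch_laplace_smooth(counts: Dict[str, int], unk_token: str) -> Dict[str, int]:
    """Add 1 to every count, including a new entry for unk_token (illustrative)."""
    smoothed = dict(counts)
    smoothed.setdefault(unk_token, 0)  # unseen items start from zero
    return {key: value + 1 for key, value in smoothed.items()}


def to_probs(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalise counts into probabilities."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


raw_counts = {"Set-User": 8, "Set-Mailbox": 2}
smoothed_counts = sketch_laplace_smooth(raw_counts, "##UNK##")
raw_probs = to_probs(raw_counts)
smoothed_probs = to_probs(smoothed_counts)
print(raw_probs, smoothed_probs)
```

In this toy table the probability of Set-User drops from 8/10 to 9/13, and the unseen token receives 1/13, which is the shift of probability mass described above.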

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.rarest_window_session(session: List[Cmd], prior_probs: StateMatrix | dict, trans_probs: StateMatrix | dict, param_cond_cmd_probs: StateMatrix | dict, value_cond_param_probs: StateMatrix | dict, modellable_params: set, window_len: int, use_start_end_tokens: bool, start_token: str, end_token: str, use_geo_mean: bool = False) → Tuple[List[Cmd], float]

Find and compute likelihood of the rarest window of window_len in the session.

Parameters:
  • session (List[Cmd]) –

    list of the Cmd datatype. An example session:

    [
        Cmd(
            name='Set-User',
            params={'Identity': 'blahblah', 'Force': 'true'}
        ),
        Cmd(
            name='Set-Mailbox',
            params={'Identity': 'blahblah', 'AuditEnabled': 'false'}
        )
    ]
    

  • prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands

  • trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)

  • param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands

  • value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of the values conditional on the params

  • modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood

  • window_len (int) – length of sliding window for likelihood calculations

  • use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done

  • start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

  • use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns:

rarest window part of the session, likelihood of the rarest window

Return type:

Tuple
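
Finding the rarest window amounts to scoring every sliding window and keeping the one with the lowest likelihood. The sketch below assumes that reading. The helper name and the window_likelihood callable are hypothetical, tokens and the geometric mean are left out, and returning an empty window with NaN for a session shorter than window_len is an assumption of this sketch.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sketch_rarest_window(
    session: Sequence[str],
    window_len: int,
    window_likelihood: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Return the lowest-likelihood window and its likelihood (illustrative)."""
    windows = [
        list(session[i : i + window_len])
        for i in range(len(session) - window_len + 1)
    ]
    if not windows:
        # Assumption: a session shorter than window_len has no rarest window.
        return [], float("nan")
    likelihoods = [window_likelihood(window) for window in windows]
    rarest_idx = min(range(len(windows)), key=likelihoods.__getitem__)
    return windows[rarest_idx], likelihoods[rarest_idx]


# Toy per-command probabilities, made up for this example.
probs = {"a": 0.5, "b": 0.5, "c": 0.1, "d": 0.4}


def toy_likelihood(window: Sequence[str]) -> float:
    """Toy window score: product of the per-command probabilities."""
    return math.prod(probs[c] for c in window)


rarest, rarest_lik = sketch_rarest_window(["a", "b", "c", "d"], 2, toy_likelihood)
short, short_lik = sketch_rarest_window(["a"], 2, toy_likelihood)
print(rarest, rarest_lik)
```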