msticpy.analysis.anomalous_sequence.utils.probabilities module

Helper module for computing training probabilities when modelling sessions.

msticpy.analysis.anomalous_sequence.utils.probabilities.compute_cmds_probs(seq1_counts: StateMatrix | dict, seq2_counts: StateMatrix | dict, unk_token: str) → Tuple[StateMatrix, StateMatrix]

Compute command related probabilities.

In particular, computes the probability of each individual command, and the probability of each transition from one command to the next.
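As a rough illustration of the two quantities described above, the following minimal sketch normalises plain dictionaries of counts. It is not the library's implementation: the helper name `sketch_cmd_probs`, the nested-dict layout of `seq2_counts` (previous command, then next command), and the sample counts are all assumptions made for this example.

```python
def sketch_cmd_probs(seq1_counts, seq2_counts):
    """Turn command counts and length-2 sequence counts into probabilities.

    Hypothetical helper for illustration only; plain dicts stand in for
    StateMatrix objects.
    """
    # Probability of each individual command: count / total count.
    total = sum(seq1_counts.values())
    cmd_probs = {cmd: count / total for cmd, count in seq1_counts.items()}

    # Probability of each transition: count(prev -> cur) / count(prev -> anything).
    trans_probs = {}
    for prev, nexts in seq2_counts.items():
        row_total = sum(nexts.values())
        trans_probs[prev] = {cur: count / row_total for cur, count in nexts.items()}
    return cmd_probs, trans_probs


# Made-up counts; "##UNK##" plays the role of the unseen-command token.
seq1_counts = {"ls": 6, "cd": 3, "##UNK##": 1}
seq2_counts = {
    "ls": {"cd": 3, "ls": 2, "##UNK##": 1},
    "cd": {"ls": 3, "##UNK##": 1},
    "##UNK##": {"ls": 1, "cd": 1, "##UNK##": 1},
}

cmd_probs, trans_probs = sketch_cmd_probs(seq1_counts, seq2_counts)
print(cmd_probs)
print(trans_probs["ls"])
```

In this sketch the individual command probabilities sum to 1, and so does each row of transition probabilities.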

Parameters:
  • seq1_counts (Union[StateMatrix, dict]) – individual command counts

  • seq2_counts (Union[StateMatrix, dict]) – sequence command (length 2) counts

  • unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns:

individual command probabilities, sequence command (length 2) probabilities

Return type:

Tuple

msticpy.analysis.anomalous_sequence.utils.probabilities.compute_params_probs(param_counts: StateMatrix | dict, cmd_param_counts: StateMatrix | dict, seq1_counts: StateMatrix | dict, unk_token: str) → Tuple[StateMatrix, StateMatrix]

Compute param related probabilities.

In particular, computes the probabilities of the individual params, and also the probabilities of the params conditional on the command.

Note that we model whether each parameter is present or not for each command. So we make the modelling assumption that the parameters are independent Bernoulli random variables conditional on the command.

Note also that because multiple parameters can appear at a time for a command, and because we are computing the probability that each parameter is present or not, we do NOT expect the probabilities to sum to 1.

Note also that we use Laplace smoothing in the counting stage of the calculations. Therefore, if a parameter p appeared for every occurrence of command c, the resulting probability of param p appearing conditional on command c would NOT equal 1. It would be slightly less than 1 due to the Laplace smoothing.
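The effect of the smoothing and of the Bernoulli assumption can be seen in a small worked sketch. The exact smoothing scheme is not spelled out here, so this example assumes textbook add-one smoothing for a present/absent variable, and it applies the smoothing when computing the probability rather than in a separate counting stage. The helper name `sketch_param_given_cmd` and the counts are hypothetical.

```python
def sketch_param_given_cmd(cmd_count, param_counts_for_cmd, alpha=1):
    """Estimate P(param present | command) with add-alpha smoothing.

    Hypothetical helper for illustration only. Each parameter is treated as
    its own present/absent (Bernoulli) variable, so the denominator gains
    2 * alpha: one pseudo-count for "present" and one for "absent".
    """
    return {
        param: (count + alpha) / (cmd_count + 2 * alpha)
        for param, count in param_counts_for_cmd.items()
    }


# Made-up data: command "c" was seen 10 times, and param "p" appeared every time.
probs = sketch_param_given_cmd(10, {"p": 10, "q": 7, "r": 4})

print(probs["p"])           # (10 + 1) / (10 + 2), strictly below 1
print(sum(probs.values()))  # not constrained to equal 1
```

Under these assumptions, p gets probability 11/12 rather than 1, and the three probabilities add up to 2 because each one answers a separate present-or-absent question.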

Parameters:
  • param_counts (Union[StateMatrix, dict]) – individual param counts

  • cmd_param_counts (Union[StateMatrix, dict]) – param conditional on command counts

  • seq1_counts (Union[StateMatrix, dict]) – individual command counts

  • unk_token (str) – dummy token to signify an unseen param (e.g. “##UNK##”)

Returns:

individual param probabilities, param conditional on command probabilities

Return type:

Tuple

msticpy.analysis.anomalous_sequence.utils.probabilities.compute_values_probs(value_counts: StateMatrix | dict, param_value_counts: StateMatrix | dict, unk_token: str) → Tuple[StateMatrix, StateMatrix]

Compute value related probabilities.

In particular, computes the probabilities of the individual values, and also the probabilities of the values conditional on the param.

Note that we will be modelling the values as categorical conditional on the parameter. Therefore, we DO expect these probabilities to sum to 1.

Note also that each parameter can only take one value at a time (unlike how a command can take multiple parameters at a time).
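To make the categorical assumption concrete, here is a minimal sketch that normalises value counts per parameter. It is an illustration under stated assumptions, not the library's code: the helper name `sketch_value_given_param`, the nested-dict layout (param, then value), and the parameter and value names are all invented for the example.

```python
def sketch_value_given_param(param_value_counts):
    """Estimate P(value | param) by normalising counts within each param.

    Hypothetical helper for illustration only; plain dicts stand in for
    StateMatrix objects.
    """
    probs = {}
    for param, value_counts in param_value_counts.items():
        row_total = sum(value_counts.values())
        # A param takes exactly one value at a time, so its values share
        # one probability mass.
        probs[param] = {value: count / row_total for value, count in value_counts.items()}
    return probs


# Made-up counts; "##UNK##" plays the role of the unseen-value token.
param_value_counts = {
    "-Identity": {"alice": 5, "bob": 3, "##UNK##": 2},
    "-ResultSize": {"unlimited": 4, "##UNK##": 1},
}

value_probs = sketch_value_given_param(param_value_counts)
for param, row in value_probs.items():
    print(param, row, sum(row.values()))
```

Each row sums to 1 in this sketch, in contrast to the per-parameter presence probabilities, which are not constrained to do so.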

Parameters:
  • value_counts (Union[StateMatrix, dict]) – individual value counts

  • param_value_counts (Union[StateMatrix, dict]) – value conditional on param counts

  • unk_token (str) – dummy token to signify an unseen value (e.g. “##UNK##”)

Returns:

individual value probabilities, value conditional on param probabilities

Return type:

Tuple