msticpy.analysis.anomalous_sequence.utils.laplace_smooth module

Helper module for applying Laplace smoothing to counts.

msticpy.analysis.anomalous_sequence.utils.laplace_smooth.laplace_smooth_cmd_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Apply Laplace smoothing to the input counts for the commands.

In particular, add 1 to each of the counts, including the unk_token. By including the unk_token, we can handle unseen commands.

Parameters:

seq1_counts (DefaultDict[str, int]) – individual command counts
seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns:

tuple of Laplace-smoothed counts: individual command counts, sequence command (length 2) counts

Return type:

Tuple
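The add-one idea described above can be sketched roughly as follows. This is an illustrative sketch, not the msticpy implementation: the function name `add_one_smooth_cmds` is hypothetical, and the choices to skip transitions that leave the end token or enter the start token, and to add exactly 1 to every unigram count, are assumptions made for the example.

```python
import copy
from collections import defaultdict


def add_one_smooth_cmds(seq1_counts, seq2_counts, start_token, end_token, unk_token):
    """Hypothetical add-one smoothing over unigram and bigram command counts."""
    # Work on copies so the caller's counts are left untouched.
    seq1 = copy.deepcopy(seq1_counts)
    seq2 = copy.deepcopy(seq2_counts)

    # Vocabulary = every observed command plus the unknown token.
    vocab = list(seq1.keys())
    if unk_token not in vocab:
        vocab.append(unk_token)

    # Add 1 to every unigram count (the unknown token goes from 0 to 1).
    for cmd in vocab:
        seq1[cmd] += 1

    # Add 1 to every bigram count, skipping transitions assumed impossible:
    # nothing follows the end token and nothing precedes the start token.
    for first in vocab:
        for second in vocab:
            if first == end_token or second == start_token:
                continue
            seq2[first][second] += 1

    return seq1, seq2


START, END, UNK = "##START##", "##END##", "##UNK##"

seq1 = defaultdict(int, {START: 1, "ls": 2, "cd": 1, END: 1})
seq2 = defaultdict(lambda: defaultdict(int))
seq2[START]["ls"] = 1
seq2["ls"]["cd"] = 1
seq2["cd"]["ls"] = 1
seq2["ls"][END] = 1

smoothed1, smoothed2 = add_one_smooth_cmds(seq1, seq2, START, END, UNK)
```

After smoothing, every allowed transition has a non-zero count, including those that involve the unknown token, so a command never seen in training can still be scored.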

msticpy.analysis.anomalous_sequence.utils.laplace_smooth.laplace_smooth_param_counts(cmds: List[str], param_counts: DefaultDict[str, int], cmd_param_counts: DefaultDict[str, DefaultDict[str, int]], unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Apply Laplace smoothing to the input counts for the params.

In particular, add 1 to each of the counts, including the unk_token. By including the unk_token, we can handle unseen params.

Parameters:

cmds (List[str]) – list of all the possible commands (including the unk_token)
param_counts (DefaultDict[str, int]) – individual param counts
cmd_param_counts (DefaultDict[str, DefaultDict[str, int]]) – param conditional on command counts
unk_token (str) – dummy token to signify an unseen command or param (e.g. “##UNK##”)

Returns:

tuple of Laplace-smoothed counts: individual param counts, param conditional on command counts

Return type:

Tuple
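A rough sketch of the same idea for params is shown below. The name `add_one_smooth_params` is hypothetical, and the rule used here (increment only the command/param pairs already observed, plus the unknown param for every command) is an assumption chosen for illustration. It is not confirmed to match the library's behaviour.

```python
import copy
from collections import defaultdict


def add_one_smooth_params(cmds, param_counts, cmd_param_counts, unk_token):
    """Hypothetical add-one smoothing of param counts conditional on command."""
    # Copy the inputs so the originals are not modified.
    param_ls = copy.deepcopy(param_counts)
    cmd_param_ls = copy.deepcopy(cmd_param_counts)

    params = list(param_counts.keys())
    if unk_token not in params:
        params.append(unk_token)

    for cmd in cmds:
        # Params observed with this command in the original counts.
        observed = cmd_param_counts.get(cmd, {})
        for param in params:
            # Assumption: smooth observed pairs and the unknown param only.
            if param == unk_token or param in observed:
                param_ls[param] += 1
                cmd_param_ls[cmd][param] += 1

    return param_ls, cmd_param_ls


UNK = "##UNK##"
cmds = ["Set-User", "Get-User", UNK]

param_counts = defaultdict(int, {"Identity": 2, "City": 1})
cmd_param_counts = defaultdict(lambda: defaultdict(int))
cmd_param_counts["Set-User"]["Identity"] = 1
cmd_param_counts["Set-User"]["City"] = 1
cmd_param_counts["Get-User"]["Identity"] = 1

param_ls, cmd_param_ls = add_one_smooth_params(cmds, param_counts, cmd_param_counts, UNK)
```

Under this rule every command, including the unknown command, ends up with a non-zero count for the unknown param, while pairs that were never observed together (such as "City" with "Get-User") are not created.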

msticpy.analysis.anomalous_sequence.utils.laplace_smooth.laplace_smooth_value_counts(params: List[str], value_counts: DefaultDict[str, int], param_value_counts: DefaultDict[str, DefaultDict[str, int]], unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Apply Laplace smoothing to the input counts for the values.

In particular, add 1 to each of the counts, including the unk_token. By including the unk_token, we can handle unseen values.

Parameters:

params (List[str]) – list of all possible params, including the unk_token
value_counts (DefaultDict[str, int]) – individual value counts
param_value_counts (DefaultDict[str, DefaultDict[str, int]]) – value conditional on param counts
unk_token (str) – dummy token to signify an unseen param or value (e.g. “##UNK##”)

Returns:

tuple of Laplace-smoothed counts: individual value counts, value conditional on param counts

Return type:

Tuple
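To illustrate why the unk_token entry matters, the sketch below looks up a value that never appeared in the training data. The smoothed counts, the helper name `value_prob`, and the simple row normalisation are all assumptions made for this example. They are not taken from the library.

```python
UNK = "##UNK##"

# Hypothetical smoothed value-given-param counts (already add-one smoothed).
smoothed_param_value_counts = {
    "Identity": {"alice": 3, "bob": 2, UNK: 1},
}


def value_prob(param, value, counts=smoothed_param_value_counts, unk_token=UNK):
    """Estimate P(value | param), mapping unseen values to the unknown token."""
    row = counts[param]
    # An unseen value falls back to the count stored under the unknown token.
    count = row.get(value, row[unk_token])
    return count / sum(row.values())


seen = value_prob("Identity", "alice")      # 3 / 6
unseen = value_prob("Identity", "mallory")  # falls back to UNK: 1 / 6
```

Without the smoothed unk_token entry, the unseen value would have a count of zero, and any likelihood computed as a product of such terms would collapse to zero.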