msticpy.analysis.anomalous_sequence.utils.cmds_params_only module
Helper module for computations when modelling sessions.
Specifically, this module handles the case where each session is a list of the Cmd datatype and the params attribute of each Cmd is a set of accompanying params.
- msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_counts(sessions: List[List[Cmd]], start_token: str, end_token: str) Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]], DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]
Compute the training counts for the sessions.
Specifically, it computes counts of individual commands and of sequences of two commands. It also computes counts of individual params and counts of params conditional on the command.
- Parameters
sessions (List[List[Cmd]]) –
each session is a list of the Cmd datatype, where the Cmd datatype has a name attribute (command name) and a params attribute (set containing params associated with the command). An example session:
[Cmd(name='Set-User', params={'Identity', 'Force'}), Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
- Returns
individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts
- Return type
tuple of counts
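The four counts described above can be illustrated with a minimal sketch. It does not import msticpy: `Cmd` here is a hypothetical namedtuple stand-in for the library's datatype, `sketch_compute_counts` is an illustrative name, and the way the dummy tokens are counted is an assumption rather than confirmed library behaviour.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the Cmd datatype (a name plus a set of params).
Cmd = namedtuple("Cmd", ["name", "params"])


def sketch_compute_counts(sessions, start_token, end_token):
    """Count commands, command pairs, params, and params per command."""
    seq1 = defaultdict(int)                         # individual commands
    seq2 = defaultdict(lambda: defaultdict(int))    # pairs of commands
    param = defaultdict(int)                        # individual params
    cmd_param = defaultdict(lambda: defaultdict(int))  # params given command
    for session in sessions:
        prev = start_token
        seq1[prev] += 1  # assumption: dummy tokens are counted too
        for cmd in session:
            seq1[cmd.name] += 1
            seq2[prev][cmd.name] += 1
            prev = cmd.name
            for par in cmd.params:
                param[par] += 1
                cmd_param[cmd.name][par] += 1
        seq2[prev][end_token] += 1
        seq1[end_token] += 1
    return seq1, seq2, param, cmd_param


sessions = [
    [
        Cmd(name="Set-User", params={"Identity", "Force"}),
        Cmd(name="Set-Mailbox", params={"Identity", "AuditEnabled"}),
    ]
]
seq1, seq2, param, cmd_param = sketch_compute_counts(
    sessions, "##START##", "##END##"
)
```

In this example, "Identity" is counted twice overall but once under each command, which is the distinction between the individual param counts and the conditional counts.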
- msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_likelihood_window(window: List[Cmd], prior_probs: Union[StateMatrix, dict], trans_probs: Union[StateMatrix, dict], param_cond_cmd_probs: Union[StateMatrix, dict], use_start_token: bool, use_end_token: bool, start_token: Optional[str] = None, end_token: Optional[str] = None) float
Compute the likelihood of the input window.
- Parameters
window (List[Cmd]) –
part or all of a session, where a session is a list of the Cmd datatype. An example session:
[Cmd(name='Set-User', params={'Identity', 'Force'}), Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands
use_start_token (bool) – if set to True, the start_token will be prepended to the window before the likelihood calculation is done
use_end_token (bool) – if set to True, the end_token will be appended to the window before the likelihood calculation is done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
- Returns
likelihood of the window
- Return type
float
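One way to picture how the three probability tables combine is the sketch below. It is not the library's code: `Cmd` is a hypothetical stand-in, plain dicts replace StateMatrix, passing `None` for a token stands in for the `use_start_token` / `use_end_token` flags, and treating each param as an independent present/absent event is an assumption made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the Cmd datatype.
Cmd = namedtuple("Cmd", ["name", "params"])


def param_prob(cmd, cond_probs):
    """Assumed model: each known param is independently present or absent."""
    lik = 1.0
    for par, prob in cond_probs[cmd.name].items():
        lik *= prob if par in cmd.params else 1 - prob
    return lik


def sketch_window_likelihood(window, prior, trans, cond_probs,
                             start_token=None, end_token=None):
    """Likelihood of a non-empty window under a first-order Markov chain."""
    if start_token is not None:
        # With a start token, the first command is scored as a transition.
        lik, prev, rest = 1.0, start_token, window
    else:
        # Otherwise the first command is scored by its prior probability.
        first = window[0]
        lik = prior[first.name] * param_prob(first, cond_probs)
        prev, rest = first.name, window[1:]
    for cmd in rest:
        lik *= trans[prev][cmd.name] * param_prob(cmd, cond_probs)
        prev = cmd.name
    if end_token is not None:
        lik *= trans[prev][end_token]
    return lik


# Made-up probabilities, chosen only to keep the arithmetic easy to follow.
prior = {"A": 0.5, "B": 0.5}
trans = {
    "##START##": {"A": 1.0},
    "A": {"B": 0.5, "##END##": 0.5},
    "B": {"##END##": 1.0},
}
cond_probs = {"A": {"x": 0.5}, "B": {"y": 0.25}}
window = [Cmd("A", {"x"}), Cmd("B", set())]

no_tokens = sketch_window_likelihood(window, prior, trans, cond_probs)
with_tokens = sketch_window_likelihood(
    window, prior, trans, cond_probs, "##START##", "##END##"
)
```

Without tokens the result is 0.5 * 0.5 * 0.5 * 0.75 = 0.09375; with both tokens the prior is replaced by the transition out of the start token and a final transition into the end token is included, giving 0.1875.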
- msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_likelihood_windows_in_session(session: List[Cmd], prior_probs: Union[StateMatrix, dict], trans_probs: Union[StateMatrix, dict], param_cond_cmd_probs: Union[StateMatrix, dict], window_len: int, use_start_end_tokens: bool, start_token: Optional[str] = None, end_token: Optional[str] = None, use_geo_mean: bool = False) List[float]
Compute the likelihoods of a sliding window in the session.
- Parameters
session (List[Cmd]) –
list of the Cmd datatype. An example session:
[Cmd(name='Set-User', params={'Identity', 'Force'}), Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the command
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)
- Returns
list of likelihoods
- Return type
List[float]
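The sliding-window step and the effect of `use_geo_mean` can be sketched independently of the probability model. In the sketch below, `sketch_sliding_likelihoods` and `toy_likelihood` are hypothetical names, the window scorer is passed in as a plain function, start and end tokens are left out, and returning an empty list for a too-short session is a choice of this sketch rather than documented behaviour.

```python
import math


def sketch_sliding_likelihoods(session, window_len, window_likelihood,
                               use_geo_mean=False):
    """Score every contiguous window of length window_len in the session."""
    likelihoods = []
    for i in range(len(session) - window_len + 1):
        lik = window_likelihood(session[i:i + window_len])
        if use_geo_mean:
            # Raise to 1/window_len, as described for use_geo_mean.
            lik = lik ** (1 / window_len)
        likelihoods.append(lik)
    return likelihoods


# Toy scorer: the product of made-up per-item probabilities.
item_probs = {"a": 0.5, "b": 0.25, "c": 1.0}


def toy_likelihood(window):
    return math.prod(item_probs[item] for item in window)


session = ["a", "b", "c"]
raw = sketch_sliding_likelihoods(session, 2, toy_likelihood)
geo = sketch_sliding_likelihoods(session, 2, toy_likelihood, use_geo_mean=True)
```

A session of length 3 with `window_len=2` yields two windows. The geometric-mean variant rescales each value so that likelihoods computed with different window lengths are easier to compare.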
- msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_prob_setofparams_given_cmd(cmd: str, params: Union[set, dict], param_cond_cmd_probs: Union[StateMatrix, dict], use_geo_mean: bool = True) float
Compute probability of a set of params given the cmd.
- Parameters
cmd (str) – name of command (e.g. for Exchange powershell commands: “Set-Mailbox”)
params (Union[set, dict]) – set of accompanying params for the cmd (e.g. for Exchange powershell commands: {‘Identity’, ‘ForwardingEmailAddress’}). If params is a dictionary of accompanying params and values, only the keys of the dict are used.
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of params conditional on the command
use_geo_mean (bool) – if True, then the likelihood will be raised to the power of (1/K) where K is the number of distinct params which appeared for the given cmd across our training set. See Notes.
- Returns
computed likelihood
- Return type
float
Notes
use_geo_mean - Some commands generally have more params set than other commands. Using the geometric mean makes this probability comparable across commands with differing numbers of params.
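The effect of the 1/K exponent can be seen in a small sketch. The function name `sketch_prob_params_given_cmd` and the probabilities are made up, a plain dict replaces StateMatrix, and the model (each param seen for the command is an independent present/absent event) is an assumption for illustration, since the text above does not spell out how the per-param probabilities are combined.

```python
def sketch_prob_params_given_cmd(cmd, params, param_cond_cmd_probs,
                                 use_geo_mean=True):
    """Probability of a set of params for cmd under an assumed Bernoulli model."""
    if isinstance(params, dict):
        params = set(params.keys())  # only the keys of a dict are used
    ref = param_cond_cmd_probs[cmd]
    lik = 1.0
    for par, prob in ref.items():
        # Present params contribute prob, absent ones contribute 1 - prob.
        lik *= prob if par in params else 1 - prob
    if use_geo_mean and ref:
        k = len(ref)  # K: number of distinct params seen for this cmd
        lik = lik ** (1 / k)
    return lik


# Made-up conditional probabilities for one command.
probs = {"Set-Mailbox": {"Identity": 0.5, "AuditEnabled": 0.25}}

raw = sketch_prob_params_given_cmd(
    "Set-Mailbox", {"Identity"}, probs, use_geo_mean=False
)
geo = sketch_prob_params_given_cmd("Set-Mailbox", {"Identity": "user1"}, probs)
```

Here the raw value is 0.5 * 0.75 = 0.375, and with K = 2 the geometric mean is its square root. A command with many possible params would otherwise get a systematically smaller product than a command with few.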
- msticpy.analysis.anomalous_sequence.utils.cmds_params_only.laplace_smooth_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], param_counts: DefaultDict[str, int], cmd_param_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str)
Laplace smoothing is applied to the counts.
We do this by adding 1 to each of the counts. This shifts some of the probability mass from the very probable commands/params to the unseen and very unlikely commands/params. The unk_token allows us to handle unseen commands, sequences of commands and params.
- Parameters
seq1_counts (DefaultDict[str, int]) – individual command counts
seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts
param_counts (DefaultDict[str, int]) – individual param counts
cmd_param_counts (DefaultDict[str, DefaultDict[str, int]]) – param conditional on command counts
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)
- Returns
individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts
- Return type
tuple of StateMatrix counts
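The add-one idea can be sketched for the command counts alone; params would be handled analogously. This is an illustration under stated assumptions, not the library's code: `sketch_laplace_smooth` is a hypothetical name, plain dicts are returned instead of StateMatrix objects, and skipping transitions out of the end token and into the start token is a choice made for this sketch.

```python
def sketch_laplace_smooth(seq1_counts, seq2_counts,
                          start_token, end_token, unk_token):
    """Add 1 to every command count and every allowed command pair."""
    seq1 = dict(seq1_counts)
    seq2 = {first: dict(row) for first, row in seq2_counts.items()}
    # The unknown token joins the vocabulary so unseen commands get a count.
    cmds = sorted(set(seq1) | {unk_token})
    for cmd in cmds:
        seq1[cmd] = seq1.get(cmd, 0) + 1
    for first in cmds:
        if first == end_token:
            continue  # assumption: nothing follows the end token
        row = seq2.setdefault(first, {})
        for second in cmds:
            if second == start_token:
                continue  # assumption: nothing precedes the start token
            row[second] = row.get(second, 0) + 1
    return seq1, seq2


raw_seq1 = {"##START##": 1, "A": 1, "##END##": 1}
raw_seq2 = {"##START##": {"A": 1}, "A": {"##END##": 1}}
smooth_seq1, smooth_seq2 = sketch_laplace_smooth(
    raw_seq1, raw_seq2, "##START##", "##END##", "##UNK##"
)
```

After smoothing, a pair that never occurred in training, such as "A" followed by the unknown token, has a count of 1 instead of 0, so its estimated probability is small but non-zero.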
- msticpy.analysis.anomalous_sequence.utils.cmds_params_only.rarest_window_session(session: List[Cmd], prior_probs: StateMatrix, trans_probs: StateMatrix, param_cond_cmd_probs: StateMatrix, window_len: int, use_start_end_tokens: bool, start_token: str, end_token: str, use_geo_mean=False) Tuple[List[Cmd], float]
Find and compute the likelihood of the rarest window of window_len in the session.
- Parameters
session (List[Cmd]) –
list of the Cmd datatype. An example session:
[Cmd(name='Set-User', params={'Identity', 'Force'}), Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the command
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)
- Returns
rarest window part of the session, likelihood of the rarest window
- Return type
Tuple[List[Cmd], float]
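Finding the rarest window amounts to scoring every window and keeping the one with the lowest likelihood. The sketch below shows that selection step only; `sketch_rarest_window` and `toy_likelihood` are hypothetical names, the scorer is a toy product of made-up probabilities, and the return value for a session shorter than `window_len` is a choice of this sketch.

```python
import math


def sketch_rarest_window(session, window_len, window_likelihood):
    """Return the lowest-likelihood window and its likelihood."""
    windows = [
        session[i:i + window_len]
        for i in range(len(session) - window_len + 1)
    ]
    if not windows:
        # Sketch-only convention for sessions shorter than window_len.
        return [], float("nan")
    likelihoods = [window_likelihood(w) for w in windows]
    idx = min(range(len(likelihoods)), key=likelihoods.__getitem__)
    return windows[idx], likelihoods[idx]


# Toy scorer: the product of made-up per-item probabilities.
item_probs = {"a": 0.5, "b": 0.25, "c": 1.0}


def toy_likelihood(window):
    return math.prod(item_probs[item] for item in window)


rarest, rarest_lik = sketch_rarest_window(["a", "b", "c"], 2, toy_likelihood)
```

The two windows score 0.125 and 0.25, so the first one is returned together with its likelihood.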