msticpy.analysis.anomalous_sequence.utils.cmds_only module

Helper module for computations when each session is a list of strings.

msticpy.analysis.anomalous_sequence.utils.cmds_only.compute_counts(sessions: List[List[str]], start_token: str, end_token: str, unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Compute counts of individual commands and of sequences of two commands.

Parameters:
  • sessions (List[List[str]]) –

    Each session is a list of commands (strings). An example session:

    ['Set-User', 'Set-Mailbox']
    

  • start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)

  • unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns:

individual command counts, sequence command (length 2) counts

Return type:

tuple of counts
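
The counting step described above can be illustrated with a minimal, standalone sketch. It is not the msticpy implementation: the function name sketch_compute_counts is hypothetical, and the sketch assumes that every session is wrapped in the start and end tokens before counting and that unk_token plays no role until smoothing.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def sketch_compute_counts(
    sessions: List[List[str]], start_token: str, end_token: str
) -> Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]:
    """Count single commands and adjacent command pairs (illustrative only)."""
    seq1_counts: DefaultDict[str, int] = defaultdict(int)
    seq2_counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for session in sessions:
        # Assumption: each session is padded with the start and end tokens.
        padded = [start_token] + session + [end_token]
        for cmd in padded:
            seq1_counts[cmd] += 1
        # Count every adjacent pair (previous command, current command).
        for prev, cur in zip(padded, padded[1:]):
            seq2_counts[prev][cur] += 1
    return seq1_counts, seq2_counts


sessions = [["Set-User", "Set-Mailbox"], ["Set-User", "Set-User"]]
seq1, seq2 = sketch_compute_counts(sessions, "##START##", "##END##")
print(seq1["Set-User"])                 # 3
print(seq2["Set-User"]["Set-Mailbox"])  # 1
```

Nested defaultdicts mirror the documented return type, so looking up an unseen command or pair yields 0 rather than raising KeyError.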

msticpy.analysis.anomalous_sequence.utils.cmds_only.compute_likelihood_window(window: List[str], prior_probs: StateMatrix | dict, trans_probs: StateMatrix | dict, use_start_token: bool, use_end_token: bool, start_token: str | None = None, end_token: str | None = None) → float

Compute the likelihood of the input window.

Parameters:
  • window (List[str]) –

    Part or all of a session, where a session is a list of commands (strings). An example session:

    ['Set-User', 'Set-Mailbox']
    

  • prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands

  • trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)

  • use_start_token (bool) – if set to True, the start_token will be prepended to the window before the likelihood calculation is done

  • use_end_token (bool) – if set to True, the end_token will be appended to the window before the likelihood calculation is done

  • start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

Returns:

likelihood of the window

Return type:

float
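
As a rough illustration of how a window likelihood can be built from the two probability tables, the sketch below multiplies the probability of the first command by the transition probability of each adjacent pair. It uses plain dicts in place of StateMatrix, the name sketch_likelihood_window is hypothetical, and the handling of the start and end tokens is an assumption about a typical first-order Markov calculation rather than a statement of what the library does.

```python
from typing import Dict, List, Optional


def sketch_likelihood_window(
    window: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    use_start_token: bool,
    use_end_token: bool,
    start_token: Optional[str] = None,
    end_token: Optional[str] = None,
) -> float:
    """Likelihood of a window under a first-order Markov model (sketch)."""
    if not window:
        raise ValueError("window must contain at least one command")
    if use_start_token and start_token is None:
        raise ValueError("start_token is required when use_start_token is True")
    if use_end_token and end_token is None:
        raise ValueError("end_token is required when use_end_token is True")

    first = window[0]
    # Assumption: with a start token, the first factor is the transition
    # start_token -> first; otherwise it is the prior of the first command.
    prob = trans_probs[start_token][first] if use_start_token else prior_probs[first]
    for prev, cur in zip(window, window[1:]):
        prob *= trans_probs[prev][cur]
    if use_end_token:
        prob *= trans_probs[window[-1]][end_token]
    return prob


prior_probs = {"Set-User": 0.5, "Set-Mailbox": 0.5}
trans_probs = {
    "##START##": {"Set-User": 0.5, "Set-Mailbox": 0.5},
    "Set-User": {"Set-User": 0.25, "Set-Mailbox": 0.5, "##END##": 0.25},
    "Set-Mailbox": {"Set-User": 0.25, "Set-Mailbox": 0.25, "##END##": 0.5},
}
window = ["Set-User", "Set-Mailbox"]
plain = sketch_likelihood_window(window, prior_probs, trans_probs, False, False)
padded = sketch_likelihood_window(
    window, prior_probs, trans_probs, True, True, "##START##", "##END##"
)
print(plain)   # 0.5 * 0.5 = 0.25
print(padded)  # 0.5 * 0.5 * 0.5 = 0.125
```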

msticpy.analysis.anomalous_sequence.utils.cmds_only.compute_likelihood_windows_in_session(session: List[str], prior_probs: StateMatrix | dict, trans_probs: StateMatrix | dict, window_len: int, use_start_end_tokens: bool, start_token: str | None = None, end_token: str | None = None, use_geo_mean: bool = False) → List[float]

Compute the likelihoods of a sliding window of length window_len in the session.

Parameters:
  • session (List[str]) –

    List of commands (strings). An example session:

    ['Set-User', 'Set-Mailbox']
    

  • prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands

  • trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)

  • window_len (int) – length of sliding window for likelihood calculations

  • use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done

  • start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

  • use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns:

list of likelihoods, one per sliding window

Return type:

List[float]
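
The sliding-window idea and the effect of use_geo_mean can be sketched as follows. This is a simplified, standalone illustration: sketch_window_likelihoods is a hypothetical name, plain dicts stand in for StateMatrix, and start and end tokens are left out entirely to keep the example short.

```python
from typing import Dict, List


def sketch_window_likelihoods(
    session: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    window_len: int,
    use_geo_mean: bool = False,
) -> List[float]:
    """Likelihood of every window of length window_len (sketch, no tokens)."""
    if window_len < 1:
        raise ValueError("window_len must be at least 1")
    likelihoods = []
    # A session shorter than window_len yields no windows (empty list).
    for i in range(len(session) - window_len + 1):
        window = session[i : i + window_len]
        prob = prior_probs[window[0]]
        for prev, cur in zip(window, window[1:]):
            prob *= trans_probs[prev][cur]
        if use_geo_mean:
            # Raising to 1/window_len makes different window lengths comparable.
            prob = prob ** (1 / window_len)
        likelihoods.append(prob)
    return likelihoods


prior_probs = {"Set-User": 0.5, "Set-Mailbox": 0.5}
trans_probs = {
    "Set-User": {"Set-Mailbox": 0.5},
    "Set-Mailbox": {"Set-User": 0.125},
}
session = ["Set-User", "Set-Mailbox", "Set-User"]
raw = sketch_window_likelihoods(session, prior_probs, trans_probs, 2)
geo = sketch_window_likelihoods(session, prior_probs, trans_probs, 2, use_geo_mean=True)
print(raw)  # [0.25, 0.0625]
print(geo)  # approximately [0.5, 0.25]
```

Without the geometric mean, longer windows almost always score lower simply because more probabilities are multiplied together; the 1/window_len exponent compensates for that.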

msticpy.analysis.anomalous_sequence.utils.cmds_only.laplace_smooth_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str) → Tuple[StateMatrix, StateMatrix]

Laplace smoothing is applied to the counts.

We do this by adding 1 to each of the counts. As a result, when the probabilities are computed from the counts, some of the probability mass shifts from the very probable commands and command sequences to the unseen and very unlikely ones. The unk_token lets us handle commands and sequences of commands that were never seen in the input sessions.

Parameters:
  • seq1_counts (DefaultDict[str, int]) – individual command counts

  • seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts

  • start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)

  • unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns:

Laplace-smoothed individual command counts, Laplace-smoothed sequence command (length 2) counts

Return type:

tuple of StateMatrix Laplace-smoothed counts
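
To make the effect of add-one smoothing concrete, the sketch below applies textbook Laplace smoothing to a small table of pair counts and then converts the smoothed counts into transition probabilities. It is an illustration only: sketch_add_one_probs and its vocab parameter are hypothetical, the start and end tokens are ignored, and the documented function returns smoothed counts rather than probabilities.

```python
from typing import Dict, List


def sketch_add_one_probs(
    seq2_counts: Dict[str, Dict[str, int]], vocab: List[str], unk_token: str
) -> Dict[str, Dict[str, float]]:
    """Add-one smoothed transition probabilities from pair counts (sketch)."""
    # The unknown token is treated as one extra command in the vocabulary.
    states = vocab + [unk_token]
    probs: Dict[str, Dict[str, float]] = {}
    for prev in states:
        row = seq2_counts.get(prev, {})
        # Every pair gets its observed count plus 1, so none has count 0.
        total = sum(row.get(cur, 0) + 1 for cur in states)
        probs[prev] = {cur: (row.get(cur, 0) + 1) / total for cur in states}
    return probs


seq2_counts = {"Set-User": {"Set-Mailbox": 5}}
probs = sketch_add_one_probs(seq2_counts, ["Set-User", "Set-Mailbox"], "##UNK##")
print(probs["Set-User"]["Set-Mailbox"])  # 0.75 (1.0 without smoothing)
print(probs["Set-User"]["##UNK##"])      # 0.125 (0.0 without smoothing)
```

In this example the only observed pair drops from probability 1.0 to 0.75, and the freed mass is split between the pairs that were never observed, including those involving the unknown token.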

msticpy.analysis.anomalous_sequence.utils.cmds_only.rarest_window_session(session: List[str], prior_probs: StateMatrix | dict, trans_probs: StateMatrix | dict, window_len: int, use_start_end_tokens: bool, start_token: str, end_token: str, use_geo_mean: bool = False) → Tuple[List[str], float]

Find and compute likelihood of the rarest window in the session.

Parameters:
  • session (List[str]) –

    List of commands (strings). An example session:

    ['Set-User', 'Set-Mailbox']
    

  • prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands

  • trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)

  • window_len (int) – length of sliding window for likelihood calculations

  • use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done

  • start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)

  • end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

  • use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns:

(rarest window part of the session, likelihood of the rarest window)

Return type:

Tuple[List[str], float]
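
Finding the rarest window amounts to scoring every sliding window and keeping the one with the lowest likelihood. The sketch below shows that selection step on its own. The name sketch_rarest_window is hypothetical, plain dicts replace StateMatrix, start and end tokens and the geometric mean are omitted, and returning an empty list with NaN when the session is shorter than the window is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def sketch_rarest_window(
    session: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    window_len: int,
) -> Tuple[List[str], float]:
    """Return the lowest-likelihood window and its likelihood (sketch)."""
    best_window: List[str] = []
    best_lik = float("nan")  # Assumption: NaN signals "no window available".
    for i in range(len(session) - window_len + 1):
        window = session[i : i + window_len]
        lik = prior_probs[window[0]]
        for prev, cur in zip(window, window[1:]):
            lik *= trans_probs[prev][cur]
        # Keep the first window seen, then any strictly rarer one.
        if not best_window or lik < best_lik:
            best_window, best_lik = window, lik
    return best_window, best_lik


prior_probs = {"Set-User": 0.5, "Set-Mailbox": 0.5}
trans_probs = {
    "Set-User": {"Set-Mailbox": 0.5},
    "Set-Mailbox": {"Set-User": 0.125},
}
session = ["Set-User", "Set-Mailbox", "Set-User"]
rare_window, rare_lik = sketch_rarest_window(session, prior_probs, trans_probs, 2)
print(rare_window)  # ['Set-Mailbox', 'Set-User']
print(rare_lik)     # 0.0625
```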