msticpy.analysis.anomalous_sequence.utils package

Submodules

msticpy.analysis.anomalous_sequence.utils.cmds_only module

Helper module for computations when each session is a list of strings.

msticpy.analysis.anomalous_sequence.utils.cmds_only.compute_counts(sessions: List[List[str]], start_token: str, end_token: str, unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Compute counts of individual commands and of sequences of two commands.

Parameters

sessions (List[List[str]]) –
each session is a list of commands (strings). An example session:
```
['Set-User', 'Set-Mailbox']
```
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual command counts, sequence command (length 2) counts

Return type

tuple of counts
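The counting step can be sketched as follows, assuming each session is padded with start_token and end_token before counting so that transitions into and out of a session are recorded. `sketch_compute_counts` is a hypothetical helper, not msticpy's implementation. It leaves out unk_token, which only matters once the counts are smoothed.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def sketch_compute_counts(
    sessions: List[List[str]], start_token: str, end_token: str
) -> Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]:
    """Hypothetical helper: count single commands and consecutive command pairs."""
    seq1_counts: DefaultDict[str, int] = defaultdict(int)
    seq2_counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for session in sessions:
        # Assumption: pad every session with the dummy start/end commands.
        padded = [start_token] + session + [end_token]
        for cmd in padded:
            seq1_counts[cmd] += 1
        # Each adjacent pair (prev, cur) is one length-2 sequence.
        for prev, cur in zip(padded, padded[1:]):
            seq2_counts[prev][cur] += 1
    return seq1_counts, seq2_counts


sessions = [["Set-User", "Set-Mailbox"], ["Set-User"]]
cmd_counts, pair_counts = sketch_compute_counts(sessions, "##START##", "##END##")
print(dict(cmd_counts))
# {'##START##': 2, 'Set-User': 2, 'Set-Mailbox': 1, '##END##': 2}
print(dict(pair_counts["Set-User"]))
# {'Set-Mailbox': 1, '##END##': 1}
```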

msticpy.analysis.anomalous_sequence.utils.cmds_only.compute_likelihood_window(window: List[str], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], use_start_token: bool, use_end_token: bool, start_token: Optional[str] = None, end_token: Optional[str] = None) → float

Compute the likelihood of the input window.

Parameters

window (List[str]) –
part or all of a session, where a session is a list of commands (strings). An example session:
```
['Set-User', 'Set-Mailbox']
```
prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
use_start_token (bool) – if set to True, the start_token will be prepended to the window before the likelihood calculation is done
use_end_token (bool) – if set to True, the end_token will be appended to the window before the likelihood calculation is done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

Returns

likelihood of the window

Return type

float
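Based on the parameter descriptions, the window likelihood can be read as a first-order Markov chain: the first command is scored by prior_probs (or by the transition out of start_token), and each later command by the transition from its predecessor. The sketch below is a hypothetical helper written under that assumption. Plain dicts stand in for StateMatrix, so an unseen command raises KeyError here instead of falling back to the unk_token entry.

```python
import math
from typing import Dict, List, Optional


def sketch_likelihood_window(
    window: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    use_start_token: bool,
    use_end_token: bool,
    start_token: Optional[str] = None,
    end_token: Optional[str] = None,
) -> float:
    """Hypothetical first-order Markov chain likelihood of a window."""
    if (use_start_token and start_token is None) or (use_end_token and end_token is None):
        raise ValueError("start_token/end_token must be given when they are used")
    if not window:
        return math.nan
    if use_start_token:
        # The first command is scored as a transition out of the start token.
        prob = trans_probs[start_token][window[0]]
    else:
        # Otherwise the first command is scored by its prior probability.
        prob = prior_probs[window[0]]
    # Every later command is scored by the transition from its predecessor.
    for prev, cur in zip(window, window[1:]):
        prob *= trans_probs[prev][cur]
    if use_end_token:
        prob *= trans_probs[window[-1]][end_token]
    return prob


prior = {"Set-User": 0.6, "Set-Mailbox": 0.4}
trans = {
    "##START##": {"Set-User": 0.9, "Set-Mailbox": 0.1},
    "Set-User": {"Set-Mailbox": 0.5, "##END##": 0.5},
    "Set-Mailbox": {"Set-User": 0.2, "##END##": 0.8},
}
window = ["Set-User", "Set-Mailbox"]
print(sketch_likelihood_window(window, prior, trans, False, False))  # about 0.3
print(sketch_likelihood_window(window, prior, trans, True, True, "##START##", "##END##"))  # about 0.36
```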

msticpy.analysis.anomalous_sequence.utils.cmds_only.compute_likelihood_windows_in_session(session: List[str], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], window_len: int, use_start_end_tokens: bool, start_token: Optional[str] = None, end_token: Optional[str] = None, use_geo_mean: bool = False) → List[float]

Compute the likelihoods of a sliding window of length window_len in the session.

Parameters

session (List[str]) –
list of commands (strings). An example session:
```
['Set-User', 'Set-Mailbox']
```
prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns

list of likelihoods

Return type

List[float]
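A sliding window of length window_len over a session of n commands yields n - window_len + 1 windows. The sketch below scores each one with a simple prior-times-transitions product and optionally applies the (1/window_len) power. To stay short it ignores use_start_end_tokens; `window_likelihood` and `sketch_sliding_likelihoods` are hypothetical names, not msticpy functions.

```python
from typing import Dict, List


def window_likelihood(
    window: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
) -> float:
    """Hypothetical: prior of the first command times the following transitions."""
    prob = prior_probs[window[0]]
    for prev, cur in zip(window, window[1:]):
        prob *= trans_probs[prev][cur]
    return prob


def sketch_sliding_likelihoods(
    session: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    window_len: int,
    use_geo_mean: bool = False,
) -> List[float]:
    """Hypothetical helper: score every contiguous window of length window_len."""
    likelihoods = []
    for i in range(len(session) - window_len + 1):
        lik = window_likelihood(session[i : i + window_len], prior_probs, trans_probs)
        if use_geo_mean:
            # Raise to 1/window_len to put the score on a per-command scale.
            lik = lik ** (1 / window_len)
        likelihoods.append(lik)
    return likelihoods


prior = {"Set-User": 0.6, "Set-Mailbox": 0.4}
trans = {
    "Set-User": {"Set-User": 0.5, "Set-Mailbox": 0.5},
    "Set-Mailbox": {"Set-User": 0.2, "Set-Mailbox": 0.8},
}
session = ["Set-User", "Set-Mailbox", "Set-User"]
print(sketch_sliding_likelihoods(session, prior, trans, window_len=2))
# two windows, scored 0.6 * 0.5 and 0.4 * 0.2
```

A session shorter than window_len produces no windows, so the sketch returns an empty list in that case.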

msticpy.analysis.anomalous_sequence.utils.cmds_only.laplace_smooth_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str) → Tuple[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix]

Laplace smoothing is applied to the counts.

We do this by adding 1 to each of the counts. This way, when we compute the probabilities from the counts, some of the probability mass shifts from the very probable commands and command sequences to the unseen and very unlikely ones. Including the unk_token lets us handle unseen commands and sequences of commands.

Parameters

seq1_counts (DefaultDict[str, int]) – individual command counts
seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual command counts, sequence command (length 2) counts

Return type

tuple of StateMatrix laplace smoothed counts
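Add-one smoothing can be sketched as below. This is one plausible reading of the description, not necessarily how msticpy increments the counts internally: every known command and the unk_token get +1, and so does every command pair except transitions out of end_token or into start_token, which can never occur. `sketch_laplace_smooth` is a hypothetical name, and the sketch returns plain defaultdicts rather than StateMatrix objects.

```python
import copy
from collections import defaultdict
from typing import DefaultDict, Tuple

Counts1 = DefaultDict[str, int]
Counts2 = DefaultDict[str, DefaultDict[str, int]]


def sketch_laplace_smooth(
    seq1_counts: Counts1,
    seq2_counts: Counts2,
    start_token: str,
    end_token: str,
    unk_token: str,
) -> Tuple[Counts1, Counts2]:
    """Hypothetical add-one smoothing of command and command-pair counts."""
    # Work on copies so the raw training counts are left untouched.
    seq1 = copy.deepcopy(seq1_counts)
    seq2 = copy.deepcopy(seq2_counts)
    # Every known command plus unk_token (dict.fromkeys removes duplicates).
    cmds = list(dict.fromkeys(list(seq1) + [unk_token]))
    for cmd in cmds:
        seq1[cmd] += 1
    for prev in cmds:
        if prev == end_token:
            continue  # nothing follows the end of a session
        for cur in cmds:
            if cur == start_token:
                continue  # nothing transitions back into the start of a session
            seq2[prev][cur] += 1
    return seq1, seq2


raw1 = defaultdict(int, {"##START##": 1, "Set-User": 1, "##END##": 1})
raw2 = defaultdict(lambda: defaultdict(int))
raw2["##START##"]["Set-User"] = 1
raw2["Set-User"]["##END##"] = 1

sm1, sm2 = sketch_laplace_smooth(raw1, raw2, "##START##", "##END##", "##UNK##")
row = sm2["##START##"]
print(row["##UNK##"] / sum(row.values()))  # 0.25: an unseen command now has mass
```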

msticpy.analysis.anomalous_sequence.utils.cmds_only.rarest_window_session(session: List[str], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], window_len: int, use_start_end_tokens: bool, start_token: str, end_token: str, use_geo_mean: bool = False) → Tuple[List[str], float]

Find and compute likelihood of the rarest window in the session.

Parameters

session (List[str]) –
list of commands (strings). An example session:
```
['Set-User', 'Set-Mailbox']
```
prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns

(rarest window part of the session, likelihood of the rarest window)

Return type

Tuple[List[str], float]
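Once every window has a likelihood, finding the rarest window reduces to an argmin over the windows. The sketch below assumes a simplified scoring (prior of the first command times transitions, with no start/end tokens and no geometric mean); both `window_likelihood` and `sketch_rarest_window` are hypothetical names.

```python
from typing import Dict, List, Tuple


def window_likelihood(
    window: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
) -> float:
    """Hypothetical: prior of the first command times the following transitions."""
    prob = prior_probs[window[0]]
    for prev, cur in zip(window, window[1:]):
        prob *= trans_probs[prev][cur]
    return prob


def sketch_rarest_window(
    session: List[str],
    prior_probs: Dict[str, float],
    trans_probs: Dict[str, Dict[str, float]],
    window_len: int,
) -> Tuple[List[str], float]:
    """Hypothetical helper: return the lowest-likelihood window and its likelihood."""
    likelihoods = [
        window_likelihood(session[i : i + window_len], prior_probs, trans_probs)
        for i in range(len(session) - window_len + 1)
    ]
    if not likelihoods:
        return [], float("nan")  # session shorter than window_len
    # Index of the smallest likelihood (the first one wins on ties).
    idx = min(range(len(likelihoods)), key=likelihoods.__getitem__)
    return session[idx : idx + window_len], likelihoods[idx]


prior = {"Set-User": 0.6, "Set-Mailbox": 0.4}
trans = {
    "Set-User": {"Set-User": 0.5, "Set-Mailbox": 0.5},
    "Set-Mailbox": {"Set-User": 0.2, "Set-Mailbox": 0.8},
}
session = ["Set-User", "Set-Mailbox", "Set-User"]
print(sketch_rarest_window(session, prior, trans, window_len=2))
# (['Set-Mailbox', 'Set-User'], about 0.08)
```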

msticpy.analysis.anomalous_sequence.utils.cmds_params_only module

Helper module for computations when modelling sessions.

In particular, this module is for when each session is a list of the Cmd datatype with the params attribute set to a set of accompanying params.

msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_counts(sessions: List[List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd]], start_token: str, end_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]], DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Compute the training counts for the sessions.

In particular, computes counts of individual commands and of sequences of two commands. It also computes the counts of individual params as well as counts of params conditional on the command.

Parameters

sessions (List[List[Cmd]]) –
each session is a list of the Cmd datatype, where the Cmd datatype has a name attribute (command name) and a params attribute (set containing params associated with the command). An example session:
```
[Cmd(name='Set-User', params={'Identity', 'Force'}),
 Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
```
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)

Returns

individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts

Return type

tuple of counts
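The param half of these counts can be sketched as below; command and command-pair counts would be gathered as in the commands-only case. A hypothetical NamedTuple stands in for msticpy's Cmd so the example runs on its own, and `sketch_param_counts` is likewise a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Tuple, Union


class Cmd(NamedTuple):
    """Hypothetical stand-in for msticpy's Cmd, for illustration only."""

    name: str
    params: Union[set, dict]


def sketch_param_counts(
    sessions: List[List[Cmd]],
) -> Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]:
    """Hypothetical helper: count each param overall and each param per command."""
    param_counts: DefaultDict[str, int] = defaultdict(int)
    cmd_param_counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for session in sessions:
        for cmd in session:
            # Iterating over a set or a dict both yield the param names.
            for param in cmd.params:
                param_counts[param] += 1
                cmd_param_counts[cmd.name][param] += 1
    return param_counts, cmd_param_counts


sessions = [
    [
        Cmd(name="Set-User", params={"Identity", "Force"}),
        Cmd(name="Set-Mailbox", params={"Identity", "AuditEnabled"}),
    ],
    [Cmd(name="Set-Mailbox", params={"Identity"})],
]
param_counts, cmd_param_counts = sketch_param_counts(sessions)
print(param_counts["Identity"])  # 3
print(cmd_param_counts["Set-Mailbox"]["Identity"])  # 2
```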

msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_likelihood_window(window: List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], use_start_token: bool, use_end_token: bool, start_token: Optional[str] = None, end_token: Optional[str] = None) → float

Compute the likelihood of the input window.

Parameters

window (List[Cmd]) –

part or all of a session, where a session is a list of the Cmd datatype. An example session:
```
[Cmd(name='Set-User', params={'Identity', 'Force'}),
 Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
```

prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands
use_start_token (bool) – if set to True, the start_token will be prepended to the window before the likelihood calculation is done
use_end_token (bool) – if set to True, the end_token will be appended to the window before the likelihood calculation is done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

Returns

likelihood of the window

Return type

float

msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_likelihood_windows_in_session(session: List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], window_len: int, use_start_end_tokens: bool, start_token: Optional[str] = None, end_token: Optional[str] = None, use_geo_mean: bool = False) → List[float]

Compute the likelihoods of a sliding window in the session.

Parameters

session (List[Cmd]) –

list of the Cmd datatype. An example session:
```
[Cmd(name='Set-User', params={'Identity', 'Force'}),
 Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
```

prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the command
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns

list of likelihoods

Return type

List[float]

msticpy.analysis.anomalous_sequence.utils.cmds_params_only.compute_prob_setofparams_given_cmd(cmd: str, params: Union[set, dict], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], use_geo_mean: bool = True) → float

Compute probability of a set of params given the cmd.

Parameters

cmd (str) – name of command (e.g. for Exchange powershell commands: “Set-Mailbox”)
params (Union[set, dict]) – set of accompanying params for the cmd (e.g. for Exchange powershell commands: {‘Identity’, ‘ForwardingEmailAddress’}). If params is a dictionary of accompanying params and values, only the keys of the dict are used.
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of params conditional on the command
use_geo_mean (bool) – if True, then the likelihood will be raised to the power of (1/K) where K is the number of distinct params which appeared for the given cmd across our training set. See Notes.

Returns

computed likelihood

Return type

float

Notes

use_geo_mean - Some commands may have more params set in general than other commands. Using the geometric mean makes this probability comparable across commands with differing numbers of params.
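Reading the description of K, one natural model treats every param that appeared for the command in training as an independent present-or-absent event: a present param contributes its probability p, an absent one contributes 1 - p. The sketch below assumes that model; `sketch_prob_params_given_cmd` is a hypothetical helper, not msticpy's code, and uses plain dicts instead of StateMatrix.

```python
from typing import Dict, Union


def sketch_prob_params_given_cmd(
    cmd: str,
    params: Union[set, dict],
    param_cond_cmd_probs: Dict[str, Dict[str, float]],
    use_geo_mean: bool = True,
) -> float:
    """Hypothetical helper: P(set of params | cmd) under independent presence/absence."""
    present = set(params)  # for a dict, only the keys (param names) are kept
    ref = param_cond_cmd_probs[cmd]
    prob = 1.0
    for param, p in ref.items():
        # A param seen for cmd in training contributes p if present, 1 - p if absent.
        prob *= p if param in present else 1 - p
    if use_geo_mean and ref:
        # K = number of distinct params that appeared for cmd in training.
        prob **= 1 / len(ref)
    return prob


probs = {"Set-Mailbox": {"Identity": 0.9, "ForwardingEmailAddress": 0.1, "AuditEnabled": 0.5}}
params = {"Identity", "ForwardingEmailAddress"}
print(sketch_prob_params_given_cmd("Set-Mailbox", params, probs, use_geo_mean=False))
# 0.9 * 0.1 * (1 - 0.5), about 0.045
print(sketch_prob_params_given_cmd("Set-Mailbox", params, probs))
# (0.9 * 0.1 * 0.5) ** (1 / 3), about 0.356
```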

msticpy.analysis.anomalous_sequence.utils.cmds_params_only.laplace_smooth_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], param_counts: DefaultDict[str, int], cmd_param_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str)

Laplace smoothing is applied to the counts.

We do this by adding 1 to each of the counts, so that some of the probability mass shifts from the very probable commands/params to the unseen and very unlikely commands/params. Including the unk_token lets us handle unseen commands, sequences of commands and params.

Parameters

seq1_counts (DefaultDict[str, int]) – individual command counts
seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts
param_counts (DefaultDict[str, int]) – individual param counts
cmd_param_counts (DefaultDict[str, DefaultDict[str, int]]) – param conditional on command counts
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts

Return type

tuple of StateMatrix counts

msticpy.analysis.anomalous_sequence.utils.cmds_params_only.rarest_window_session(session: List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], prior_probs: msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, trans_probs: msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, param_cond_cmd_probs: msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, window_len: int, use_start_end_tokens: bool, start_token: str, end_token: str, use_geo_mean=False) → Tuple[List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], float]

Find and compute the likelihood of the rarest window of length window_len in the session.

Parameters

session (List[Cmd]) –

list of the Cmd datatype. An example session:
```
[Cmd(name='Set-User', params={'Identity', 'Force'}),
 Cmd(name='Set-Mailbox', params={'Identity', 'AuditEnabled'})]
```

prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the command
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns

rarest window part of the session, likelihood of the rarest window

Return type

Tuple

msticpy.analysis.anomalous_sequence.utils.cmds_params_values module

Helper module for computations when modelling sessions.

In particular, this module is for when each session is a list of the Cmd datatype with the params attribute set to a dictionary of accompanying params and values.

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_counts(sessions: List[List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd]], start_token: str, end_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]], DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]], DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Compute the training counts for the sessions.

In particular, computes counts of individual commands and of sequences of two commands. It also computes the counts of individual params as well as counts of params conditional on the command. It also computes the counts of individual values as well as counts of values conditional on the param.

Parameters

sessions (List[List[Cmd]]) –

each session is a list of the Cmd datatype, where the Cmd datatype has a name attribute (command name) and a params attribute (dict with the params and values associated with the command). An example session:
```
[
    Cmd(
        name='Set-User',
        params={'Identity': 'blahblah', 'Force': 'true'}
    ),
    Cmd(
        name='Set-Mailbox',
        params={'Identity': 'blahblah', 'AuditEnabled': 'false'}
    )
]
```

start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)

Returns

individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts, individual value counts, value conditional on param counts

Return type

tuple of counts
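The value counts that this function adds on top of the param counts can be sketched as below, using the example session above. A hypothetical NamedTuple replaces msticpy's Cmd so the example runs on its own; only the value half is shown, and `sketch_value_counts` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Tuple


class Cmd(NamedTuple):
    """Hypothetical stand-in for msticpy's Cmd, for illustration only."""

    name: str
    params: dict


def sketch_value_counts(
    sessions: List[List[Cmd]],
) -> Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]:
    """Hypothetical helper: count each value overall and each value per param."""
    value_counts: DefaultDict[str, int] = defaultdict(int)
    param_value_counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for session in sessions:
        for cmd in session:
            for param, value in cmd.params.items():
                value_counts[value] += 1
                param_value_counts[param][value] += 1
    return value_counts, param_value_counts


session = [
    Cmd(name="Set-User", params={"Identity": "blahblah", "Force": "true"}),
    Cmd(name="Set-Mailbox", params={"Identity": "blahblah", "AuditEnabled": "false"}),
]
value_counts, param_value_counts = sketch_value_counts([session])
print(dict(value_counts))  # {'blahblah': 2, 'true': 1, 'false': 1}
print(dict(param_value_counts["Identity"]))  # {'blahblah': 2}
```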

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_likelihood_window(window: List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], value_cond_param_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], modellable_params: set, use_start_token: bool, use_end_token: bool, start_token: Optional[str] = None, end_token: Optional[str] = None) → float

Compute the likelihood of the input window.

Parameters

window (List[Cmd]) –

part or all of a session, where a session is a list of the Cmd datatype. An example session:
```
[
    Cmd(name='Set-User', params={'Identity': 'blahblah', 'Force': 'true'}),
    Cmd(name='Set-Mailbox',
        params={'Identity': 'blahblah', 'AuditEnabled': 'false'})
]
```

prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands
value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of the values conditional on the params
modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood
use_start_token (bool) – if set to True, the start_token will be prepended to the window before the likelihood calculation is done
use_end_token (bool) – if set to True, the end_token will be appended to the window before the likelihood calculation is done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)

Returns

likelihood of the window

Return type

float

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_likelihood_windows_in_session(session: List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], value_cond_param_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], modellable_params: set, window_len: int, use_start_end_tokens: bool, start_token: Optional[str] = None, end_token: Optional[str] = None, use_geo_mean: bool = False) → List[float]

Compute the likelihoods of a sliding window of length window_len in the session.

Parameters

session (List[Cmd]) –

list of the Cmd datatype. An example session:
```
[
    Cmd(
        name='Set-User',
        params={'Identity': 'blahblah', 'Force': 'true'}
    ),
    Cmd(
        name='Set-Mailbox',
        params={'Identity': 'blahblah', 'AuditEnabled': 'false'}
    )
]
```

prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands
value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of the values conditional on the params
modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns

list of likelihoods

Return type

List[float]

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.compute_prob_setofparams_given_cmd(cmd: str, params_with_vals: Union[dict, set], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], value_cond_param_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], modellable_params: Union[set, list], use_geo_mean: bool = True) → float

Compute probability of a set of params + values given the cmd.

Parameters

cmd (str) – name of command (e.g. for Exchange powershell commands: “Set-Mailbox”)
params_with_vals (Union[dict, set]) –
dict of accompanying params and values for the cmd, e.g. for Exchange powershell commands:
```
{'Identity': 'an_identity', 'ForwardingEmailAddress': 'email@email.com'}
```
If params_with_vals is a set, an artificial dictionary is created with the set elements as keys and None as each value.
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of params conditional on the command
value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of values conditional on the param
modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood
use_geo_mean (bool) – if True, then the likelihood will be raised to the power of (1/K) where K is the number of distinct params which appeared for the given cmd across our training set + the number of values which we included in the modelling for this cmd. Note: some commands may have more params set in general compared with other commands. It can be useful to use the geo mean so that you can compare this probability across different commands with differing number of params.

Returns

computed probability

Return type

float
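Following the description of use_geo_mean above, the params-only model can be extended so that, for each present param in modellable_params, the probability of its observed value is multiplied in and counted towards K. This is a hedged sketch of that reading: `sketch_prob_params_vals_given_cmd` is a hypothetical helper, and plain dicts replace StateMatrix, so there is no unk_token fallback for unseen values.

```python
from typing import Dict, Iterable, Union


def sketch_prob_params_vals_given_cmd(
    cmd: str,
    params_with_vals: Union[dict, set],
    param_cond_cmd_probs: Dict[str, Dict[str, float]],
    value_cond_param_probs: Dict[str, Dict[str, float]],
    modellable_params: Iterable[str],
    use_geo_mean: bool = True,
) -> float:
    """Hypothetical helper: P(params + values | cmd)."""
    if isinstance(params_with_vals, set):
        # A set becomes a dict whose values are all None.
        params_with_vals = dict.fromkeys(params_with_vals)
    modellable = set(modellable_params)
    ref = param_cond_cmd_probs[cmd]
    prob = 1.0
    n_values = 0
    for param, p in ref.items():
        if param in params_with_vals:
            prob *= p
            if param in modellable:
                # Also score the value this categorical param took.
                prob *= value_cond_param_probs[param][params_with_vals[param]]
                n_values += 1
        else:
            prob *= 1 - p
    # K = distinct params seen for cmd in training + values included above.
    k = len(ref) + n_values
    if use_geo_mean and k > 0:
        prob **= 1 / k
    return prob


probs = {"Set-Mailbox": {"Identity": 0.9, "AuditEnabled": 0.5}}
vals = {"AuditEnabled": {"true": 0.3, "false": 0.7}}
observed = {"Identity": "an_identity", "AuditEnabled": "false"}
print(sketch_prob_params_vals_given_cmd("Set-Mailbox", observed, probs, vals, {"AuditEnabled"}, use_geo_mean=False))
# 0.9 * 0.5 * 0.7, about 0.315
```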

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.get_params_to_model_values(param_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_value_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict]) → set

Use heuristics to determine which params take categorical values and which take arbitrary strings.

This function helps us decide which params we should model the values of later on.

Parameters

param_counts (Union[StateMatrix, dict]) – counts of each of the individual params
param_value_counts (Union[StateMatrix, dict]) – counts of each value conditional on the params

Returns

set of params which have been determined to be categorical

Return type

set
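The exact heuristics are not spelled out here. As an illustration only, the sketch below treats a param as categorical when it has few distinct values relative to how often it appears. The helper name `sketch_params_to_model_values` and all three thresholds are assumptions chosen for the example, not msticpy's values.

```python
from typing import Dict, Set


def sketch_params_to_model_values(
    param_counts: Dict[str, int],
    param_value_counts: Dict[str, Dict[str, int]],
    max_distinct: int = 20,
    min_count: int = 20,
    max_distinct_ratio: float = 0.1,
) -> Set[str]:
    """Hypothetical heuristic: keep params that look categorical.

    The three thresholds are illustrative assumptions, not msticpy's values.
    """
    modellable = set()
    for param, values in param_value_counts.items():
        n_distinct = len(values)
        n_seen = param_counts.get(param, 0)
        if n_seen == 0:
            continue
        # Few distinct values, seen often, and values repeat a lot: categorical.
        if (
            n_distinct <= max_distinct
            and n_seen >= min_count
            and n_distinct / n_seen <= max_distinct_ratio
        ):
            modellable.add(param)
    return modellable


param_counts = {"AuditEnabled": 100, "Identity": 100}
param_value_counts = {
    "AuditEnabled": {"true": 40, "false": 60},
    "Identity": {f"user{i}": 1 for i in range(100)},
}
print(sketch_params_to_model_values(param_counts, param_value_counts))  # {'AuditEnabled'}
```

Here AuditEnabled qualifies (2 distinct values over 100 occurrences), while Identity has a different value almost every time and is left out.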

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.laplace_smooth_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], param_counts: DefaultDict[str, int], cmd_param_counts: DefaultDict[str, DefaultDict[str, int]], value_counts: DefaultDict[str, int], param_value_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str) → Tuple[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix]

Laplace smoothing is applied to the counts.

We do this by adding 1 to each of the counts, so that some of the probability mass shifts from the very probable commands/params/values to the unseen and very unlikely commands/params/values. Including the unk_token lets us handle unseen commands, params, values and sequences of commands.

Parameters

seq1_counts (DefaultDict[str, int]) – individual command counts
seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts
param_counts (DefaultDict[str, int]) – individual param counts
cmd_param_counts (DefaultDict[str, DefaultDict[str, int]]) – param conditional on command counts
value_counts (DefaultDict[str, int]) – individual value counts
param_value_counts (DefaultDict[str, DefaultDict[str, int]]) – value conditional on param counts
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual command counts, sequence command (length 2) counts, individual param counts, param conditional on command counts, individual value counts, value conditional on param counts

Return type

tuple of StateMatrix counts

msticpy.analysis.anomalous_sequence.utils.cmds_params_values.rarest_window_session(session: List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], prior_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], trans_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_cond_cmd_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], value_cond_param_probs: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], modellable_params: set, window_len: int, use_start_end_tokens: bool, start_token: str, end_token: str, use_geo_mean: bool = False) → Tuple[List[msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd], float]

Find and compute the likelihood of the rarest window of length window_len in the session.

Parameters

session (List[Cmd]) –

list of the Cmd datatype. An example session:
```
[
    Cmd(
        name='Set-User',
        params={'Identity': 'blahblah', 'Force': 'true'}
    ),
    Cmd(
        name='Set-Mailbox',
        params={'Identity': 'blahblah', 'AuditEnabled': 'false'}
    )
]
```

prior_probs (Union[StateMatrix, dict]) – computed probabilities of individual commands
trans_probs (Union[StateMatrix, dict]) – computed probabilities of sequences of commands (length 2)
param_cond_cmd_probs (Union[StateMatrix, dict]) – computed probabilities of the params conditional on the commands
value_cond_param_probs (Union[StateMatrix, dict]) – computed probabilities of the values conditional on the params
modellable_params (set) – set of params for which we will also include the probabilities of their values in the calculation of the likelihood
window_len (int) – length of sliding window for likelihood calculations
use_start_end_tokens (bool) – if True, then start_token and end_token will be prepended and appended to the session respectively before the calculations are done
start_token (str) – dummy command to signify the start of the session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of the session (e.g. “##END##”)
use_geo_mean (bool) – if True, then each of the likelihoods of the sliding windows will be raised to the power of (1/window_len)

Returns

rarest window part of the session, likelihood of the rarest window

Return type

Tuple

msticpy.analysis.anomalous_sequence.utils.data_structures module

Useful helper data structure classes for modelling sessions.

class msticpy.analysis.anomalous_sequence.utils.data_structures.Cmd(name: str, params: Union[set, dict])

Bases: object

Class to store commands with accompanying params (and optionally values).

Instantiate the Cmd class.

Parameters

name (str) – name of the command, e.g. for Exchange Online: “Set-Mailbox”

params (Union[set, dict]) –
set of accompanying params or dict of accompanying params and values, e.g.:
```
{'Identity', 'ForwardingEmailAddress'}
```
or:
```
{'Identity': 'some identity', 'ForwardingEmailAddress': 'an_email@email.com'}
```

class msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix(states: Union[dict, collections.defaultdict], unk_token: str)

Bases: dict

Class for storing trained counts/probabilities.

Container for a dict of counts/probs or a dict of dicts of conditional counts/probs.

If you try to retrieve the count/probability for an unseen command/param/value from the resulting object, it returns the value associated with the unk_token key.

Parameters

states (Union[dict, defaultdict]) –
Either a dict representing counts or probabilities, or a dict of dicts representing conditional counts or conditional probabilities, e.g.:
```
{'Set-Mailbox': 20,'##UNK##': 1}
```
or:
```
{'Set-Mailbox': {'Set-Mailbox': 5, '##UNK##': 1},
'##UNK##': {'Set-Mailbox': 1, '##UNK##': 1}}
```
unk_token (str) – dummy token to signify an unseen command (e.g. “##UNK##”). This token should be present in the keys of states; if states is a dict of dicts, the unk_token should be present in the keys of the outer dict and of every inner dict.
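The unk_token fallback described above can be illustrated with a small dict subclass. `UnkFallbackDict` is a hypothetical name and this is not StateMatrix's source; it only shows the lookup rule for both the flat and the nested case, including the requirement that every level contains the unk_token key.

```python
from typing import Any, Dict


class UnkFallbackDict(dict):
    """Hypothetical sketch of the unk_token fallback described for StateMatrix."""

    def __init__(self, states: Dict[str, Any], unk_token: str):
        if unk_token not in states:
            raise ValueError(f"{unk_token!r} must be a key of states")
        # Nested dicts are wrapped too, so they also need the unk_token key.
        super().__init__(
            {
                key: UnkFallbackDict(val, unk_token) if isinstance(val, dict) else val
                for key, val in states.items()
            }
        )
        self.unk_token = unk_token

    def __getitem__(self, key: str) -> Any:
        # Square-bracket lookups of unseen keys fall back to the unk_token entry.
        # (dict.get is not overridden, so .get() still returns its default.)
        if key in self:
            return super().__getitem__(key)
        return super().__getitem__(self.unk_token)


trans = UnkFallbackDict(
    {
        "Set-Mailbox": {"Set-Mailbox": 5, "##UNK##": 1},
        "##UNK##": {"Set-Mailbox": 1, "##UNK##": 1},
    },
    unk_token="##UNK##",
)
print(trans["Set-Mailbox"]["Set-Mailbox"])  # 5
print(trans["Set-User"]["Set-Mailbox"])  # unseen outer key, falls back: 1
print(trans["Set-Mailbox"]["Get-User"])  # unseen inner key, falls back: 1
```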

clear() → None. Remove all items from D.

copy() → a shallow copy of D

fromkeys(iterable, value=None, /): Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v: remove specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.

popitem() → (k, v): remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

msticpy.analysis.anomalous_sequence.utils.laplace_smooth module

Helper module for Laplace smoothing of counts.

msticpy.analysis.anomalous_sequence.utils.laplace_smooth.laplace_smooth_cmd_counts(seq1_counts: DefaultDict[str, int], seq2_counts: DefaultDict[str, DefaultDict[str, int]], start_token: str, end_token: str, unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Apply Laplace smoothing to the input counts for the commands.

In particular, add 1 to each of the counts, including those involving the unk_token. Including the unk_token is what lets the model handle unseen commands.

Parameters

seq1_counts (DefaultDict[str, int]) – individual command counts
seq2_counts (DefaultDict[str, DefaultDict[str, int]]) – sequence command (length 2) counts
start_token (str) – dummy command to signify the start of a session (e.g. “##START##”)
end_token (str) – dummy command to signify the end of a session (e.g. “##END##”)
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual command counts, sequence command (length 2) counts

Return type

tuple of laplace smoothed counts
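The add-one scheme described above can be sketched as follows. This is a minimal illustration, not msticpy's code: the function name `add_one_smooth_cmd_counts` is hypothetical, and skipping pseudo-counts for transitions out of end_token and into start_token is an assumption (nothing can follow the end of a session or precede its start).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple


def add_one_smooth_cmd_counts(
    seq1_counts: Dict[str, int],
    seq2_counts: Dict[str, Dict[str, int]],
    start_token: str,
    end_token: str,
    unk_token: str,
) -> Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]:
    """Hypothetical sketch of add-one (Laplace) smoothing for command counts."""
    # Copy into defaultdicts so the inputs are left untouched.
    seq1: DefaultDict[str, int] = defaultdict(int, seq1_counts)
    seq2: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for prev, currents in seq2_counts.items():
        seq2[prev] = defaultdict(int, currents)

    # Vocabulary: every observed command plus the unk_token.
    cmds = list(seq1.keys())
    if unk_token not in cmds:
        cmds.append(unk_token)

    # Add 1 to every individual command count.
    for cmd in cmds:
        seq1[cmd] += 1
    # Add 1 to every allowed (prev, current) transition count.
    for prev in cmds:
        for current in cmds:
            # Assumption: nothing follows end_token and nothing precedes start_token.
            if prev == end_token or current == start_token:
                continue
            seq2[prev][current] += 1
    return seq1, seq2


seq1 = {"##START##": 1, "Set-User": 1, "Set-Mailbox": 1, "##END##": 1}
seq2 = {
    "##START##": {"Set-User": 1},
    "Set-User": {"Set-Mailbox": 1},
    "Set-Mailbox": {"##END##": 1},
}
s1, s2 = add_one_smooth_cmd_counts(seq1, seq2, "##START##", "##END##", "##UNK##")
print(s1["Set-User"], s1["##UNK##"])  # 2 1
print(s2["Set-User"]["Set-Mailbox"], s2["##UNK##"]["##UNK##"])  # 2 1
```

Because every allowed transition now has a count of at least 1, a sequence containing an unseen command (mapped to the unk_token) no longer receives a zero probability downstream.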

msticpy.analysis.anomalous_sequence.utils.laplace_smooth.laplace_smooth_param_counts(cmds: List[str], param_counts: DefaultDict[str, int], cmd_param_counts: DefaultDict[str, DefaultDict[str, int]], unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Apply Laplace smoothing to the input counts for the params.

In particular, add 1 to each of the counts, including those involving the unk_token. Including the unk_token is what lets the model handle unseen params.

Parameters

cmds (List[str]) – list of all the possible commands (including the unk_token)
param_counts (DefaultDict[str, int]) – individual param counts
cmd_param_counts (DefaultDict[str, DefaultDict[str, int]]) – param conditional on command counts
unk_token (str) – dummy token to signify an unseen command or param (e.g. “##UNK##”)

Returns

individual param counts, param conditional on command counts

Return type

Tuple
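The param smoothing described above can be sketched as a double loop over the command vocabulary and the param vocabulary. This is a hypothetical illustration (the name `add_one_smooth_param_counts` is not from msticpy); in particular, adding the same pseudo-count to the individual param count for every (command, param) pair it touches is a bookkeeping assumption, chosen so the individual counts stay in step with the conditional ones.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def add_one_smooth_param_counts(
    cmds: List[str],
    param_counts: Dict[str, int],
    cmd_param_counts: Dict[str, Dict[str, int]],
    unk_token: str,
) -> Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]:
    """Hypothetical sketch of add-one smoothing for param counts."""
    # Copy into defaultdicts so the inputs are left untouched.
    params_ls: DefaultDict[str, int] = defaultdict(int, param_counts)
    cmd_params_ls: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cmd, counts in cmd_param_counts.items():
        cmd_params_ls[cmd] = defaultdict(int, counts)

    # Param vocabulary: every observed param plus the unk_token.
    params = list(params_ls.keys())
    if unk_token not in params:
        params.append(unk_token)

    for cmd in cmds:  # cmds is expected to already include the unk_token
        for param in params:
            cmd_params_ls[cmd][param] += 1
            # Assumption: add the same pseudo-count to the individual param count.
            params_ls[param] += 1
    return params_ls, cmd_params_ls


params, cmd_params = add_one_smooth_param_counts(
    cmds=["Set-User", "##UNK##"],
    param_counts={"Identity": 3},
    cmd_param_counts={"Set-User": {"Identity": 3}},
    unk_token="##UNK##",
)
print(dict(cmd_params["Set-User"]))  # {'Identity': 4, '##UNK##': 1}
print(dict(params))  # {'Identity': 5, '##UNK##': 2}
```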

msticpy.analysis.anomalous_sequence.utils.laplace_smooth.laplace_smooth_value_counts(params: List[str], value_counts: DefaultDict[str, int], param_value_counts: DefaultDict[str, DefaultDict[str, int]], unk_token: str) → Tuple[DefaultDict[str, int], DefaultDict[str, DefaultDict[str, int]]]

Apply Laplace smoothing to the input counts for the values.

In particular, add 1 to each of the counts, including those involving the unk_token. Including the unk_token is what lets the model handle unseen values.

Parameters

params (List[str]) – list of all possible params, including the unk_token
value_counts (DefaultDict[str, int]) – individual value counts
param_value_counts (DefaultDict[str, DefaultDict[str, int]]) – value conditional on param counts
unk_token (str) – dummy token to signify an unseen param or value (e.g. “##UNK##”)

Returns

individual value counts, value conditional on param counts

Return type

Tuple

msticpy.analysis.anomalous_sequence.utils.probabilities module

Helper module for computing training probabilities when modelling sessions.

msticpy.analysis.anomalous_sequence.utils.probabilities.compute_cmds_probs(seq1_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], seq2_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], unk_token: str) → Tuple[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix]

Compute command related probabilities.

In particular, computes the probabilities for the individual commands, and also the probabilities for the transitions of commands.

Parameters

seq1_counts (Union[StateMatrix, dict]) – individual command counts
seq2_counts (Union[StateMatrix, dict]) – sequence command (length 2) counts
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual command probabilities, sequence command (length 2) probabilities

Return type

Tuple
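Turning counts into the two probability tables can be sketched as below. This is a hypothetical illustration (`cmds_probs_sketch` is not a msticpy function) that assumes the prior of a command is its share of all counted commands and that each row of transition counts is normalised to sum to 1. It returns plain dicts rather than StateMatrix objects and omits the unk_token argument, which the sketch does not need.

```python
from typing import Dict, Tuple


def cmds_probs_sketch(
    seq1_counts: Dict[str, int],
    seq2_counts: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Hypothetical sketch: prior and transition probabilities from (smoothed) counts."""
    total = sum(seq1_counts.values())
    # Prior probability of each command: its share of all command occurrences.
    prior_probs = {cmd: count / total for cmd, count in seq1_counts.items()}
    # Transition probability P(current | prev): normalise each row of counts.
    trans_probs: Dict[str, Dict[str, float]] = {}
    for prev, currents in seq2_counts.items():
        row_total = sum(currents.values())
        trans_probs[prev] = {cur: count / row_total for cur, count in currents.items()}
    return prior_probs, trans_probs


prior, trans = cmds_probs_sketch(
    {"Set-User": 3, "Set-Mailbox": 1},
    {"Set-User": {"Set-User": 3, "Set-Mailbox": 1}},
)
print(prior)  # {'Set-User': 0.75, 'Set-Mailbox': 0.25}
print(trans)  # {'Set-User': {'Set-User': 0.75, 'Set-Mailbox': 0.25}}
```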

msticpy.analysis.anomalous_sequence.utils.probabilities.compute_params_probs(param_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], cmd_param_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], seq1_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], unk_token: str) → Tuple[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix]

Compute param related probabilities.

In particular, computes the probabilities of the individual params, and also the probabilities of the params conditional on the command.

Note that we will be modelling whether a parameter is present or not for each command. So we make the modelling assumption that the parameters are independent Bernoulli random variables conditional on the command.

Note also that because multiple parameters can appear at a time for a command, and because we are computing the probability that each parameter is present or not, we do NOT expect the probabilities to sum to 1.

Note also that we use Laplace smoothing in the counting stage of the calculations. Therefore, if parameter p appeared for every occurrence of command c, the resulting probability of p appearing conditional on c would NOT equal 1; it would be slightly less because of the Laplace smoothing.

Parameters

param_counts (Union[StateMatrix, dict]) – individual param counts
cmd_param_counts (Union[StateMatrix, dict]) – param conditional on command counts
seq1_counts (Union[StateMatrix, dict]) – individual command counts
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual param probabilities, param conditional on command probabilities

Return type

Tuple
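The Bernoulli modelling described above can be sketched as follows. This is a hypothetical illustration (`params_probs_sketch` is not a msticpy function) that assumes P(param | cmd) is the conditional param count divided by the number of times the command occurred, and that the individual param probability is divided by the total number of command occurrences. The counts in the example are made-up smoothed counts.

```python
from typing import Dict, Tuple


def params_probs_sketch(
    param_counts: Dict[str, int],
    cmd_param_counts: Dict[str, Dict[str, int]],
    seq1_counts: Dict[str, int],
) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Hypothetical sketch: Bernoulli probabilities that each param is present."""
    total_cmds = sum(seq1_counts.values())
    # P(param present), estimated over all command occurrences.
    param_probs = {p: count / total_cmds for p, count in param_counts.items()}
    # P(param present | cmd): divide by how often the command occurred,
    # not by the row total, so a row need not sum to 1.
    param_cond_cmd_probs = {
        cmd: {p: count / seq1_counts[cmd] for p, count in params.items()}
        for cmd, params in cmd_param_counts.items()
    }
    return param_probs, param_cond_cmd_probs


probs, cond = params_probs_sketch(
    param_counts={"Identity": 5, "City": 3, "##UNK##": 2},
    cmd_param_counts={
        "Set-User": {"Identity": 4, "City": 2, "##UNK##": 1},
        "##UNK##": {"Identity": 1, "City": 1, "##UNK##": 1},
    },
    seq1_counts={"Set-User": 5, "##UNK##": 2},
)
print(cond["Set-User"])  # {'Identity': 0.8, 'City': 0.4, '##UNK##': 0.2}
```

The Set-User row sums to about 1.4, which matches the note above that these probabilities are not expected to sum to 1.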

msticpy.analysis.anomalous_sequence.utils.probabilities.compute_values_probs(value_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], param_value_counts: Union[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, dict], unk_token: str) → Tuple[msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix, msticpy.analysis.anomalous_sequence.utils.data_structures.StateMatrix]

Compute value related probabilities.

In particular, compute the probabilities of the individual values, and also the probabilities of the values conditional on the param.

Note that we will be modelling the values as categorical conditional on the parameter. Therefore, we DO expect the probabilities of the values conditional on a given param to sum to 1.

Note also that each parameter can only take one value at a time (unlike how a command can take multiple parameters at a time).

Parameters

value_counts (Union[StateMatrix, dict]) – individual value counts
param_value_counts (Union[StateMatrix, dict]) – value conditional on param counts
unk_token (str) – dummy command to signify an unseen command (e.g. “##UNK##”)

Returns

individual value probabilities, value conditional on param probabilities

Return type

Tuple
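The categorical modelling described above can be sketched by normalising each param's value counts. This is a hypothetical illustration (`values_probs_sketch` is not a msticpy function) that assumes the individual value probability is each value's share of all value counts; the example counts are made up.

```python
from typing import Dict, Tuple


def values_probs_sketch(
    value_counts: Dict[str, int],
    param_value_counts: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Hypothetical sketch: categorical value probabilities."""
    total_values = sum(value_counts.values())
    # Individual value probability: share of all value occurrences.
    value_probs = {v: count / total_values for v, count in value_counts.items()}
    # Categorical: normalise each param's value counts so they sum to 1.
    value_cond_param_probs: Dict[str, Dict[str, float]] = {}
    for param, values in param_value_counts.items():
        row_total = sum(values.values())
        value_cond_param_probs[param] = {v: count / row_total for v, count in values.items()}
    return value_probs, value_cond_param_probs


probs, cond = values_probs_sketch(
    value_counts={"alice": 3, "bob": 1, "##UNK##": 1},
    param_value_counts={"Identity": {"alice": 3, "bob": 1}, "##UNK##": {"##UNK##": 1}},
)
print(cond["Identity"])  # {'alice': 0.75, 'bob': 0.25}
```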

Module contents

MSTIC Anomalous Sequence Modelling Utilities.