msticpy.analysis.outliers module
Outlier detection functions (preliminary).
Similar to the eventcluster module but more experimental (that is, less tested). It uses the scikit-learn IsolationForest to identify outlier events, either within a single data set or by training on one data set and predicting outliers in another.
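The two modes described above can be sketched directly with scikit-learn. This is a minimal illustration, not msticpy code: the synthetic data, the `random_state` values and the variable names are assumptions made for the example. In scikit-learn, `fit_predict` and `predict` label inliers as 1 and outliers as -1.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Mostly "normal" two-feature events clustered around the origin...
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# ...plus a few events far away from that cluster (assumed anomalies).
anomalies = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
events = np.vstack([normal, anomalies])

# Mode 1: a single data set. Fit the model and label the same rows.
single = IsolationForest(contamination=0.05, random_state=0)
labels_single = single.fit_predict(events)  # 1 = inlier, -1 = outlier

# Mode 2: train on one data set, then predict outliers in another.
train, new_events = normal[:150], np.vstack([normal[150:], anomalies])
model = IsolationForest(contamination=0.05, random_state=0).fit(train)
labels_new = model.predict(new_events)
outlier_rows = new_events[labels_new == -1]  # rows flagged as outliers
```

With `contamination=0.05`, roughly 5% of the fitted rows end up on the outlier side of the learned threshold, so the three distant events are expected to be among them in mode 1.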
- msticpy.analysis.outliers.identify_outliers(x: ndarray, x_predict: ndarray, contamination: float = 0.05, max_features: int | float | None = None) -> Tuple[sklearn.ensemble.IsolationForest, ndarray, ndarray]
Identify outlier items using SkLearn IsolationForest.
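- Parameters:
x (np.ndarray) – Input data used to train the model
x_predict (np.ndarray) – Data on which to predict outliers (can be the same array as x)
contamination (float) – Expected proportion of outliers in the data set (default: 0.05, that is, 5%)
max_features (int or float, optional) – Maximum number (int) or fraction (float) of features to select at random when building each tree. If None (the default), math.floor(math.sqrt(cols)) is used, where cols is the number of columns in x.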
- Returns:
A tuple of the fitted IsolationForest model, the predicted outliers (x_outliers) and the predicted labels for x_predict (y_pred_outliers).
- Return type:
Tuple[IsolationForest, np.ndarray, np.ndarray]
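The documented contract of `identify_outliers` (train on `x`, predict on `x_predict`, default `max_features` of floor(sqrt(cols)), three-item return value) can be mirrored roughly as below. This is a hypothetical sketch written against scikit-learn, not msticpy's implementation: the name `identify_outliers_sketch`, the fixed `random_state=42` and the assumption that the outliers are the rows of `x_predict` labelled -1 are illustrative choices.

```python
import math
from typing import Optional, Tuple, Union

import numpy as np
from sklearn.ensemble import IsolationForest


def identify_outliers_sketch(
    x: np.ndarray,
    x_predict: np.ndarray,
    contamination: float = 0.05,
    max_features: Optional[Union[int, float]] = None,
) -> Tuple[IsolationForest, np.ndarray, np.ndarray]:
    """Hypothetical re-creation of the documented behaviour (not msticpy code)."""
    cols = x.shape[1]
    if max_features is None:
        # Documented default: floor(sqrt(number of columns in x)).
        max_features = math.floor(math.sqrt(cols))
    clf = IsolationForest(
        contamination=contamination,
        max_features=max_features,
        random_state=42,  # assumption: fixed seed for reproducible output
    )
    clf.fit(x)  # train on x
    y_pred_outliers = clf.predict(x_predict)  # 1 = inlier, -1 = outlier
    x_outliers = x_predict[y_pred_outliers == -1]  # keep only flagged rows
    return clf, x_outliers, y_pred_outliers


# Usage: single data set mode, passing the same array for x and x_predict.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 4))
clf, x_outliers, y_pred = identify_outliers_sketch(x, x)
```

Passing the same array twice reproduces the "single data set" mode; passing a different `x_predict` gives the train-then-predict mode. With four columns, the default `max_features` works out to 2.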
- msticpy.analysis.outliers.plot_outlier_results(clf: sklearn.ensemble.IsolationForest, x: ndarray, x_predict: ndarray, x_outliers: ndarray, feature_columns: List[int], plt_title: str)
Plot Isolation Forest results.
- Parameters:
clf (IsolationForest) – Isolation Forest model
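x (np.ndarray) – Input data used to train the model
x_predict (np.ndarray) – Data on which outliers were predicted
x_outliers (np.ndarray) – Outliers identified by the model (as returned by identify_outliers)
feature_columns (List[int]) – Indices of the feature columns to display
plt_title (str) – Plot title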
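A plot of this kind might be sketched as follows. This is a hypothetical illustration, not msticpy's plotting code. It assumes matplotlib. It also assumes that `clf` was trained on exactly the two columns listed in `feature_columns`, in that order, so that its decision function can be evaluated on a 2-D grid. The function name `plot_outlier_results_sketch` is made up for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import IsolationForest


def plot_outlier_results_sketch(clf, x, x_predict, x_outliers, feature_columns, plt_title):
    """Hypothetical 2-D plot; assumes clf was fit on the two feature_columns only."""
    c0, c1 = feature_columns
    # Build a grid that covers both the training and the prediction data.
    pts = np.vstack([x, x_predict])
    xx, yy = np.meshgrid(
        np.linspace(pts[:, c0].min() - 1, pts[:, c0].max() + 1, 100),
        np.linspace(pts[:, c1].min() - 1, pts[:, c1].max() + 1, 100),
    )
    # Negative decision-function values fall on the outlier side of the threshold.
    zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, zz, cmap="Blues_r")
    ax.scatter(x[:, c0], x[:, c1], c="white", edgecolors="k", s=20, label="training")
    ax.scatter(x_predict[:, c0], x_predict[:, c1], c="green", edgecolors="k", s=20, label="predict")
    ax.scatter(x_outliers[:, c0], x_outliers[:, c1], c="red", edgecolors="k", s=40, label="outliers")
    ax.set_title(plt_title)
    ax.legend()
    return fig


# Usage with synthetic two-column data (assumed for illustration).
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 2))
x_predict = np.vstack([rng.normal(size=(20, 2)), [[6.0, 6.0]]])
clf = IsolationForest(contamination=0.05, random_state=0).fit(x)
x_outliers = x_predict[clf.predict(x_predict) == -1]
fig = plot_outlier_results_sketch(clf, x, x_predict, x_outliers, [0, 1], "Isolation Forest outliers")
```

Shading the decision function shows where the model's boundary lies relative to the plotted points. Data with more than two features would need a different approach, for example one plot per pair of columns.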