msticpy.analysis.outliers module
Outlier detection module (preliminary).
Similar to the eventcluster module, but more experimental and less tested. It uses the scikit-learn Isolation Forest to identify outlier events, either within a single data set or by training on one data set and predicting outliers in another.
- msticpy.analysis.outliers.identify_outliers(x: ndarray, x_predict: ndarray, contamination: float = 0.05) → Tuple[sklearn.ensemble.IsolationForest, ndarray, ndarray]
Identify outlier items using the scikit-learn IsolationForest.
- Parameters:
x (np.ndarray) – Input data used to train the model
x_predict (np.ndarray) – Data on which to predict outliers
contamination (float) – Expected proportion of outliers in the data (default: 0.05)
- Returns:
The IsolationForest model, the set of outliers (X_Outliers), and the outlier predictions (y_pred_outliers)
- Return type:
Tuple[IsolationForest, np.ndarray, np.ndarray]
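The train-then-predict workflow described above can be sketched directly with scikit-learn. This is a minimal illustration under stated assumptions, not msticpy's implementation: the synthetic arrays `x_train` and `x_new` are hypothetical stand-ins for `x` and `x_predict`, and the model settings other than `contamination` are scikit-learn defaults.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Hypothetical training data: 200 "normal" rows with 2 numeric features
x_train = 0.3 * rng.randn(200, 2)

# Hypothetical data to score: 20 normal-looking rows plus 5 obvious anomalies
x_new = np.vstack(
    [0.3 * rng.randn(20, 2), rng.uniform(low=6, high=8, size=(5, 2))]
)

# Fit on the training set; contamination is the expected fraction of outliers
clf = IsolationForest(contamination=0.05, random_state=rng)
clf.fit(x_train)

# predict() returns +1 for inliers and -1 for outliers
y_pred = clf.predict(x_new)

# Keep only the rows that were flagged as outliers
x_outliers = x_new[y_pred == -1]

print(f"{len(x_outliers)} of {len(x_new)} rows flagged as outliers")
```

Going by the signature above, the corresponding msticpy call would presumably be `identify_outliers(x_train, x_new, contamination=0.05)`, with the three returned values playing the roles of `clf`, `x_outliers`, and `y_pred` here.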
- msticpy.analysis.outliers.plot_outlier_results(clf: sklearn.ensemble.IsolationForest, x: ndarray, x_predict: ndarray, x_outliers: ndarray, feature_columns: List[int], plt_title: str)
Plot Isolation Forest results.
- Parameters:
clf (IsolationForest) – Isolation Forest model
x (np.ndarray) – Input data
x_predict (np.ndarray) – Data on which outliers were predicted
x_outliers (np.ndarray) – Set of outliers
feature_columns (List[int]) – list of feature columns to display
plt_title (str) – Plot title
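A plot of this kind can be approximated with matplotlib and scikit-learn alone. The sketch below is an assumption about what such a plot might look like, not the code behind `plot_outlier_results`: it uses hypothetical two-feature synthetic data (so the two columns stand in for a pair chosen via `feature_columns`), shades the model's decision function as a contour, and overlays the training data, the predicted data, and the flagged outliers.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Hypothetical two-feature data sets (training data and data to score)
x_train = 0.3 * rng.randn(200, 2)
x_new = np.vstack(
    [0.3 * rng.randn(20, 2), rng.uniform(low=6, high=8, size=(5, 2))]
)

clf = IsolationForest(contamination=0.05, random_state=rng).fit(x_train)
x_outliers = x_new[clf.predict(x_new) == -1]

# Evaluate the decision function on a grid that covers both data sets
all_pts = np.vstack([x_train, x_new])
x_min, y_min = all_pts.min(axis=0) - 1
x_max, y_max = all_pts.max(axis=0) + 1
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100)
)
zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Darker regions have lower scores, i.e. are more anomalous
fig, ax = plt.subplots(figsize=(10, 6))
ax.contourf(xx, yy, zz, cmap="Blues_r")
ax.scatter(x_train[:, 0], x_train[:, 1], c="white", edgecolor="k", s=20,
           label="training data")
ax.scatter(x_new[:, 0], x_new[:, 1], c="green", edgecolor="k", s=20,
           label="predicted data")
ax.scatter(x_outliers[:, 0], x_outliers[:, 1], c="red", marker="x", s=60,
           label="outliers")
ax.set_title("Isolation Forest outliers (synthetic data)")
ax.legend(loc="upper left")
```

Calling `fig.savefig(...)` afterwards writes the figure to disk. Because the contour is evaluated on a two-dimensional grid, this approach only works as written when the model was trained on exactly two features.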