msticpy.analysis.outliers module

Outlier detection functions (preliminary).

Similar to the eventcluster module, but more experimental (read ‘less tested’). It uses the scikit-learn IsolationForest to identify outlier events. It can work within a single data set, or it can train on one data set and predict outliers in another.
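
The following sketch shows both modes using scikit-learn's IsolationForest directly. It assumes synthetic two-column numeric data and does not call msticpy. The data and parameter values are invented for the illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Mostly "normal" events clustered around the origin, plus five far-off points.
normal = rng.normal(0, 1, size=(200, 2))
unusual = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0], [10.0, 0.0]])
events = np.vstack([normal, unusual])

# Mode 1: a single data set. Fit the forest and label the same rows.
single = IsolationForest(contamination=0.05, random_state=0)
labels_single = single.fit_predict(events)  # 1 = inlier, -1 = outlier

# Mode 2: train on one data set, then predict outliers in another.
baseline = rng.normal(0, 1, size=(300, 2))
new_events = np.array([[0.1, -0.2], [7.0, 7.5], [-0.5, 0.3]])
trained = IsolationForest(contamination=0.05, random_state=0).fit(baseline)
labels_new = trained.predict(new_events)

print("outliers in single data set:", int((labels_single == -1).sum()))
print("labels for new events:", labels_new)
```

In the second mode, only the baseline shapes the model. The new events are scored against it without being added to the training data.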

msticpy.analysis.outliers.identify_outliers(x: ndarray, x_predict: ndarray, contamination: float = 0.05) → Tuple[sklearn.ensemble.IsolationForest, ndarray, ndarray]

Identify outlier items using SkLearn IsolationForest.

Parameters:
  • x (np.ndarray) – Input data used to train the model

  • x_predict (np.ndarray) – Data in which to predict outliers

  • contamination (float) – Expected proportion of outliers in the data, in the range (0, 0.5] (default: 0.05)

Returns:

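The fitted IsolationForest model, X_Outliers (the rows of x_predict predicted to be outliers) and y_pred_outliers (the prediction for each row of x_predict: -1 for an outlier, 1 for an inlier)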

Return type:

Tuple[IsolationForest, np.ndarray, np.ndarray]
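
The hypothetical stand-in below, identify_outliers_sketch, shows how the three return values relate to one another. It is not msticpy's implementation. It assumes the model is fitted on all of x with a fixed random_state, whereas the real function may sample or split the training data differently.

```python
from typing import Tuple

import numpy as np
from sklearn.ensemble import IsolationForest


def identify_outliers_sketch(
    x: np.ndarray, x_predict: np.ndarray, contamination: float = 0.05
) -> Tuple[IsolationForest, np.ndarray, np.ndarray]:
    """Hypothetical stand-in mirroring the documented inputs and outputs."""
    # contamination is the expected proportion of outliers; scikit-learn uses it
    # to set the score threshold from the training data.
    clf = IsolationForest(contamination=contamination, random_state=42)
    clf.fit(x)
    # Label each row of x_predict: 1 = inlier, -1 = outlier.
    y_pred_outliers = clf.predict(x_predict)
    # Keep only the rows of x_predict that were labelled as outliers.
    x_outliers = x_predict[y_pred_outliers == -1]
    return clf, x_outliers, y_pred_outliers


rng = np.random.default_rng(1)
train = rng.normal(0, 1, size=(500, 3))
to_check = np.vstack([rng.normal(0, 1, size=(20, 3)), [[9.0, 9.0, 9.0]]])

model, outliers, preds = identify_outliers_sketch(train, to_check)
print(preds.shape, outliers.shape)
```

Because the threshold is derived from the training scores, about a contamination-sized share of the training rows is labelled -1. The share of flagged rows in x_predict can be larger or smaller.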

msticpy.analysis.outliers.plot_outlier_results(clf: sklearn.ensemble.IsolationForest, x: ndarray, x_predict: ndarray, x_outliers: ndarray, feature_columns: List[int], plt_title: str)

Plot Isolation Forest results.

Parameters:
  • clf (IsolationForest) – Isolation Forest model

  • x (np.ndarray) – Input data

  • x_predict (np.ndarray) – Data on which outliers were predicted

  • x_outliers (np.ndarray) – Outlier rows identified in x_predict

  • feature_columns (List[int]) – Indices of the feature columns to display

  • plt_title (str) – Plot title
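
The sketch below gives a rough idea of this kind of plot, assuming matplotlib and numeric feature columns. plot_outliers_sketch is a hypothetical helper that takes the same arguments as plot_outlier_results. msticpy's actual layout, colors and handling of more than two feature columns may differ.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import IsolationForest


def plot_outliers_sketch(clf, x, x_predict, x_outliers, feature_columns, plt_title):
    """Hypothetical 2-D view of Isolation Forest results (not msticpy's code)."""
    i, j = feature_columns[:2]  # only the first two requested columns are drawn
    fig, ax = plt.subplots()

    # Shade the decision function only when the model was trained on exactly
    # these two features; otherwise a 2-D grid cannot be scored by clf.
    if x.shape[1] == 2 and {i, j} == {0, 1}:
        both = np.vstack([x, x_predict])
        low, high = both.min(axis=0) - 1, both.max(axis=0) + 1
        xx, yy = np.meshgrid(
            np.linspace(low[i], high[i], 100), np.linspace(low[j], high[j], 100)
        )
        grid = np.empty((xx.size, 2))
        grid[:, i], grid[:, j] = xx.ravel(), yy.ravel()
        zz = clf.decision_function(grid).reshape(xx.shape)
        ax.contourf(xx, yy, zz, levels=20, cmap="Blues_r")

    ax.scatter(x[:, i], x[:, j], c="white", edgecolors="k", s=20, label="training")
    ax.scatter(x_predict[:, i], x_predict[:, j], c="green", s=20, label="predicted")
    ax.scatter(x_outliers[:, i], x_outliers[:, j], c="red", s=40, label="outliers")
    ax.set_title(plt_title)
    ax.set_xlabel(f"feature {i}")
    ax.set_ylabel(f"feature {j}")
    ax.legend()
    return fig


rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=(300, 2))
x_predict = np.array([[0.2, 0.1], [6.0, -6.0]])
clf = IsolationForest(contamination=0.05, random_state=0).fit(x)
x_outliers = x_predict[clf.predict(x_predict) == -1]
fig = plot_outliers_sketch(clf, x, x_predict, x_outliers, [0, 1], "Isolation Forest outliers")
```

Darker shading marks regions with lower decision-function values, where points are more likely to be labelled as outliers.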

msticpy.analysis.outliers.remove_common_items(data: DataFrame, columns: List[str]) → DataFrame

Remove rows from the input DataFrame.

Parameters:
  • data (pd.DataFrame) – Input DataFrame

  • columns (List[str]) – List of columns to filter on

Returns:

Filtered DataFrame

Return type:

pd.DataFrame
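
The description above does not say how a row is judged to be "common". The sketch below assumes one possible rule purely for illustration. A row is dropped when its value in any listed column occurs in more than a max_share fraction of all rows. remove_common_items_sketch and max_share are hypothetical and not part of msticpy, and the library's actual criterion may differ.

```python
from typing import List

import pandas as pd


def remove_common_items_sketch(
    data: pd.DataFrame, columns: List[str], max_share: float = 0.5
) -> pd.DataFrame:
    """Hypothetical filter: drop rows whose value in any listed column is 'common'.

    'Common' is assumed to mean the value appears in more than max_share of all
    rows; this is an illustrative rule, not msticpy's documented behavior.
    """
    keep = pd.Series(True, index=data.index)
    for col in columns:
        # For each row, the fraction of all rows sharing its value in this column.
        share = data[col].map(data[col].value_counts(normalize=True))
        keep &= share <= max_share
    return data[keep]


events = pd.DataFrame(
    {
        "process": ["svchost.exe"] * 6 + ["evil.exe", "notepad.exe"],
        "user": ["SYSTEM"] * 6 + ["alice", "bob"],
    }
)
filtered = remove_common_items_sketch(events, ["process"])
print(filtered)
```

Here "svchost.exe" appears in 6 of 8 rows (75%), so under this assumed rule those rows are removed and the two rare processes remain.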