msticpy.analysis.timeseries module

Module for timeseries analysis functions.

class msticpy.analysis.timeseries.MsticpyTimeSeriesAccessor(pandas_obj)

Bases: object

Msticpy pandas accessor for time series functions.

Initialize the extension.

analyze(**kwargs) DataFrame

Return anomalies in Timeseries using STL.

Parameters:
  • time_column (str, optional) – If the input data is not indexed on the time column, use this column as the time index

  • data_column (str, optional) – Use named column if the input data has more than one column.

  • seasonal (int, optional) – Seasonality period of the input data required for STL. Must be an odd integer, and should normally be >= 7, by default 7.

  • period (int, optional) – Periodicity of the input data, by default 24 (hourly).

  • score_threshold (float, optional) – Z-score threshold, in standard deviations, used to flag anomalies, by default 3

Returns:

A DataFrame with additional columns produced by decomposing the time series data: residual, trend, seasonal, weights, baseline, score and anomalies. The anomalies column contains 0, 1 or -1, depending on the score_threshold set.

Return type:

pd.DataFrame

Notes

The decomposition method is STL (Seasonal-Trend Decomposition using LOESS).
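The thresholding step described for score_threshold can be illustrated without the library. The following is a minimal sketch, not msticpy's implementation: it assumes the score is a Z-score of the STL residuals computed with the population standard deviation, and the function name flag_anomalies and the sample residuals are hypothetical.

```python
from statistics import mean, pstdev


def flag_anomalies(residuals, score_threshold=3.0):
    """Return (scores, flags); each flag is 1, -1 or 0.

    Assumption: the score is the Z-score of the residual, using the
    population standard deviation. This is an illustration only.
    """
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        # A constant residual series has no outliers to flag.
        return [0.0] * len(residuals), [0] * len(residuals)
    scores = [(r - mu) / sigma for r in residuals]
    flags = [
        1 if s > score_threshold else -1 if s < -score_threshold else 0
        for s in scores
    ]
    return scores, flags


# Hypothetical residuals with one large positive spike at index 5.
residuals = [0.1, -0.2, 0.0, 0.15, -0.1, 9.0, 0.05, -0.05, 0.1, -0.1, 0.0, 0.2]
scores, flags = flag_anomalies(residuals, score_threshold=3.0)
```

Only the spike exceeds three standard deviations here, so it is the only point flagged. Lowering score_threshold would flag more points.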

anomaly_periods(**kwargs)

Return list of anomaly periods as TimeSpans.

Parameters:
  • time_column (str, optional) – The name of the time column

  • period (str, optional) – pandas-compatible time period designator, by default “1H”

  • pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True

  • anomalies_column (str, optional) – The column containing the anomalies flag.

Returns:

TimeSpan(start, end)

Return type:

List[TimeSpan]

apply_threshold(**kwargs)

Return DataFrame with anomalies calculated based on new threshold.

Parameters:
  • threshold (float) – Threshold above (beyond) which values will be marked as anomalies. Used as positive and negative threshold unless threshold_low is specified.

  • threshold_low (Optional[float], optional) – The threshold below which values will be reported as anomalies, by default None.

  • anomalies_column (str, optional) – The column containing the anomalies flag.

Returns:

Output DataFrame with recalculated anomalies.

Return type:

pd.DataFrame

kql_periods(**kwargs)

Return KQL filter expression for anomaly time periods.

Parameters:
  • time_column (str, optional) – The name of the time column

  • period (str, optional) – pandas-compatible time period designator, by default “1H”

  • pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True

Returns:

KQL filter clause

Return type:

str

plot(**kwargs)

Display time series anomalies visualization.

Parameters:
  • value_column (str, optional) – Name of column holding numeric values to plot against time series to determine anomalies (the default is ‘Total’)

  • y (str, optional) – alias for “value_column”

  • time_column (str, optional) – Name of the timestamp column (the default is ‘TimeGenerated’)

  • anomalies_column (str, optional) – Name of the column holding binary status (1/0) for anomaly/benign (the default is ‘anomalies’)

  • period (int, optional) – Period of the dataset: for hourly data, the number of days; for daily data, the number of weeks. This is used to correctly calculate the plot height. (the default is 30)

Parameters:
  • ref_time (datetime, optional) – Input reference line to display (the default is None)

  • title (str, optional) – Title to display (the default is None)

  • legend (str, optional) – Where to position the legend: None, left, right or inline (default is None)

  • yaxis (bool, optional) – Whether to show the yaxis and labels

  • range_tool (bool, optional) – Show the range slider tool (default is True)

  • height (int, optional) – The height of the plot figure (the default is auto-calculated height)

  • width (int, optional) – The width of the plot figure (the default is 900)

  • xgrid (bool, optional) – Whether to show the xaxis grid (default is True)

  • ygrid (bool, optional) – Whether to show the yaxis grid (default is False)

  • color (list, optional) – List of colors to use for the three plots, in order: line (observed), circle (baseline), circle_x/user specified (anomalies). (the default is [“navy”, “green”, “firebrick”])

Returns:

The bokeh plot figure.

Return type:

figure

msticpy.analysis.timeseries.create_time_period_kqlfilter(periods: Dict[datetime, datetime]) str

Return KQL time filter expression from anomaly periods.

Parameters:

periods (Dict[datetime, datetime]) – Dict mapping each period start to its period end

Returns:

KQL filter clause

Return type:

str
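To make the shape of such a filter concrete, here is a minimal sketch that turns a dict of start/end datetimes into a KQL `where` clause. It is an illustration under assumptions, not the library's code: the helper name build_kql_time_filter, the default column name TimeGenerated, and the exact clause layout are hypothetical. It relies only on the standard KQL `between` operator and `datetime()` literal.

```python
from datetime import datetime
from typing import Dict


def build_kql_time_filter(
    periods: Dict[datetime, datetime], time_column: str = "TimeGenerated"
) -> str:
    """Build a KQL 'where' clause matching any of the given periods.

    Hypothetical helper for illustration; the clause format produced
    by msticpy may differ.
    """
    if not periods:
        # No periods means there is nothing to filter on.
        return ""
    clauses = [
        f"{time_column} between "
        f"(datetime({start.isoformat()}) .. datetime({end.isoformat()}))"
        for start, end in periods.items()
    ]
    # Multiple periods are combined with 'or' in a single where clause.
    return "| where " + " or ".join(clauses)


periods = {
    datetime(2024, 1, 1, 0, 0): datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 2, 9, 0): datetime(2024, 1, 2, 11, 0),
}
kql_filter = build_kql_time_filter(periods)
```

The resulting string can be appended to a query that exposes the named time column.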

msticpy.analysis.timeseries.extract_anomaly_periods(data: DataFrame, time_column: str = 'TimeGenerated', period: str = '1h', pos_only: bool = True, anomalies_column: str = 'anomalies') Dict[datetime, datetime]

Return dictionary of anomaly periods, merging adjacent ones.

Parameters:
  • data (pd.DataFrame) – The data to process

  • time_column (str, optional) – The name of the time column

  • period (str, optional) – pandas-compatible time period designator, by default “1h”

  • pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True

  • anomalies_column (str, optional) – The column containing the anomalies flag.

Returns:

start_period, end_period

Return type:

Dict[datetime, datetime]
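The merging of adjacent anomaly periods can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: it assumes each anomaly timestamp is widened by one period on either side and that windows which touch or overlap are merged. The function name merge_anomaly_periods and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple


def merge_anomaly_periods(
    rows: Iterable[Tuple[datetime, int]],
    period: timedelta = timedelta(hours=1),
    pos_only: bool = True,
) -> Dict[datetime, datetime]:
    """Return {start: end} windows around flagged rows, merging overlaps.

    Assumption: each anomaly becomes [ts - period, ts + period].
    """
    wanted = {1} if pos_only else {1, -1}
    times = sorted(ts for ts, flag in rows if flag in wanted)

    periods: Dict[datetime, datetime] = {}
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    for ts in times:
        if end is not None and ts - period <= end:
            # This window touches the current one, so extend it.
            end = ts + period
        else:
            if start is not None and end is not None:
                periods[start] = end
            start, end = ts - period, ts + period
    if start is not None and end is not None:
        periods[start] = end
    return periods


day = datetime(2024, 1, 1)
rows = [
    (day.replace(hour=10), 1),
    (day.replace(hour=11), 1),
    (day.replace(hour=13), 0),
    (day.replace(hour=15), -1),
    (day.replace(hour=20), 1),
]
positive_periods = merge_anomaly_periods(rows)
all_periods = merge_anomaly_periods(rows, pos_only=False)
```

With pos_only left at True, the negative anomaly at 15:00 is ignored and the two neighbouring positive anomalies collapse into a single period.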

msticpy.analysis.timeseries.find_anomaly_periods(data: DataFrame, time_column: str = 'TimeGenerated', period: str = '1h', pos_only: bool = True, anomalies_column: str = 'anomalies') List[TimeSpan]

Return list of anomaly periods as TimeSpans.

Parameters:
  • data (pd.DataFrame) – The data to process

  • time_column (str, optional) – The name of the time column

  • period (str, optional) – pandas-compatible time period designator, by default “1h”

  • pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True

  • anomalies_column (str, optional) – The column containing the anomalies flag.

Returns:

TimeSpan(start, end)

Return type:

List[TimeSpan]

msticpy.analysis.timeseries.set_new_anomaly_threshold(data: DataFrame, threshold: float, threshold_low: float | None = None, anomalies_column: str = 'anomalies') DataFrame

Return DataFrame with anomalies calculated based on new threshold.

Parameters:
  • data (pd.DataFrame) – Input DataFrame

  • threshold (float) – Threshold above (beyond) which values will be marked as anomalies. Used as positive and negative threshold unless threshold_low is specified.

  • threshold_low (Optional[float], optional) – The threshold below which values will be reported as anomalies, by default None.

  • anomalies_column (str, optional) – The column containing the anomalies flag.

Returns:

Output DataFrame with recalculated anomalies.

Return type:

pd.DataFrame
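The interaction between threshold and threshold_low can be illustrated on a plain list of scores. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes threshold_low is given as a positive magnitude and applied as a negative bound, and that comparisons are strict. The library may treat either point differently. The function name reflag_scores is hypothetical.

```python
from typing import List, Optional, Sequence


def reflag_scores(
    scores: Sequence[float],
    threshold: float,
    threshold_low: Optional[float] = None,
) -> List[int]:
    """Recompute anomaly flags (1, -1, 0) from existing scores.

    Assumptions: threshold_low is a positive magnitude applied as
    -threshold_low, and it falls back to threshold when omitted.
    """
    low = threshold if threshold_low is None else threshold_low
    return [1 if s > threshold else -1 if s < -low else 0 for s in scores]


scores = [0.5, 2.5, -1.5, -3.2]
symmetric = reflag_scores(scores, threshold=2.0)
asymmetric = reflag_scores(scores, threshold=2.0, threshold_low=1.0)
```

In the symmetric case only scores beyond 2 in either direction are flagged; with a lower threshold_low the score of -1.5 is also reported as a negative anomaly.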

msticpy.analysis.timeseries.timeseries_anomalies_stl(data: DataFrame, **kwargs) DataFrame

Return anomalies in Timeseries using STL.

Parameters:
  • data (pd.DataFrame) – DataFrame containing a time series data set retrieved from a data connector or an external data source. The DataFrame must have two columns: the time column, set as the index, and a numeric value column.

  • time_column (str, optional) – If the input data is not indexed on the time column, use this column as the time index

  • data_column (str, optional) – Use named column if the input data has more than one column.

  • seasonal (int, optional) – Seasonality period of the input data required for STL. Must be an odd integer, and should normally be >= 7, by default 7.

  • period (int, optional) – Periodicity of the input data, by default 24 (hourly).

  • score_threshold (float, optional) – Z-score threshold, in standard deviations, used to flag anomalies, by default 3

Returns:

A DataFrame with additional columns produced by decomposing the time series data: residual, trend, seasonal, weights, baseline, score and anomalies. The anomalies column contains 0, 1 or -1, depending on the score_threshold set.

Return type:

pd.DataFrame

Notes

The decomposition method is STL (Seasonal-Trend Decomposition using LOESS).

msticpy.analysis.timeseries.ts_anomalies_stl(data: DataFrame, **kwargs) DataFrame

Return anomalies in Timeseries using STL.

Parameters:
  • data (pd.DataFrame) – DataFrame containing a time series data set retrieved from a data connector or an external data source. The DataFrame must have two columns: the time column, set as the index, and a numeric value column.

  • time_column (str, optional) – If the input data is not indexed on the time column, use this column as the time index

  • data_column (str, optional) – Use named column if the input data has more than one column.

  • seasonal (int, optional) – Seasonality period of the input data required for STL. Must be an odd integer, and should normally be >= 7, by default 7.

  • period (int, optional) – Periodicity of the input data, by default 24 (hourly).

  • score_threshold (float, optional) – Z-score threshold, in standard deviations, used to flag anomalies, by default 3

Returns:

A DataFrame with additional columns produced by decomposing the time series data: residual, trend, seasonal, weights, baseline, score and anomalies. The anomalies column contains 0, 1 or -1, depending on the score_threshold set.

Return type:

pd.DataFrame

Notes

The decomposition method is STL (Seasonal-Trend Decomposition using LOESS).