msticpy.analysis.timeseries module
Module for timeseries analysis functions.
- class msticpy.analysis.timeseries.MsticpyTimeSeriesAccessor(pandas_obj)
Bases: object
Msticpy pandas accessor for time series functions.
Initialize the extension.
- analyze(**kwargs) DataFrame
Return anomalies in Timeseries using STL.
- Parameters:
time_column (str, optional) – If the input data is not indexed on the time column, use this column as the time index
data_column (str, optional) – Use named column if the input data has more than one column.
seasonal (int, optional) – Seasonality period of the input data required for STL. Must be an odd integer, and should normally be >= 7 (default).
period (int, optional) – Periodicity of the input data, by default 24 (hourly).
score_threshold (float, optional) – Z-score threshold, in standard deviations, used to flag anomalies, by default 3
- Returns:
The input DataFrame with additional columns produced by decomposing the time series: residual, trend, seasonal, weights, baseline, score and anomalies. The anomalies column holds 0, 1 or -1, depending on the score_threshold set.
- Return type:
pd.DataFrame
Notes
The decomposition method is STL - Seasonal-Trend Decomposition using LOESS
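As a rough illustration of the scoring step described above, the sketch below flags values whose Z-score exceeds score_threshold in either direction. It is not the library's implementation: it skips the STL decomposition entirely, runs on a made-up list of residuals, and the helper name flag_anomalies is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List


def flag_anomalies(residuals: List[float], score_threshold: float = 3.0) -> List[int]:
    """Return 1 (spike), -1 (dip) or 0 (normal) for each residual.

    Hypothetical helper for illustration only; the exact flagging rule
    used by the library is an assumption here.
    """
    mean = fmean(residuals)
    std = pstdev(residuals)  # population standard deviation
    if std == 0:
        # A perfectly flat series has no outliers.
        return [0] * len(residuals)

    flags = []
    for value in residuals:
        score = (value - mean) / std  # Z-score of this residual
        if score > score_threshold:
            flags.append(1)
        elif score < -score_threshold:
            flags.append(-1)
        else:
            flags.append(0)
    return flags


# Made-up residuals: flat apart from one spike and one dip.
residuals = [0.0] * 10 + [10.0] + [0.0] * 10 + [-10.0]
flags = flag_anomalies(residuals)
```

In this toy data only the spike and the dip fall more than three standard deviations from the mean, so every other point stays at 0.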
- anomaly_periods(**kwargs)
Return list of anomaly periods as TimeSpans.
- Parameters:
time_column (str, optional) – The name of the time column
period (str, optional) – pandas-compatible time period designator, by default “1h”
pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True
anomalies_column (str, optional) – The column containing the anomalies flag.
- Returns:
TimeSpan(start, end)
- Return type:
List[TimeSpan]
- apply_threshold(**kwargs)
Return DataFrame with anomalies calculated based on new threshold.
- Parameters:
threshold (float) – Threshold above (beyond) which values will be marked as anomalies. Used as positive and negative threshold unless threshold_low is specified.
threshold_low (Optional[float], optional) – The threshold below which values will be reported as anomalies, by default None.
anomalies_column (str, optional) – The column containing the anomalies flag.
- Returns:
Output DataFrame with recalculated anomalies.
- Return type:
pd.DataFrame
- kql_periods(**kwargs)
Return KQL filter expression for anomaly time periods.
- Parameters:
time_column (str, optional) – The name of the time column
period (str, optional) – pandas-compatible time period designator, by default “1h”
pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True
- Returns:
KQL filter clause
- Return type:
str
- plot(**kwargs)
Display time series anomalies visualization.
- Parameters:
value_column (str, optional) – Name of the column holding the numeric values to plot against the time series to determine anomalies (the default is ‘Total’)
y (str, optional) – Alias for “value_column”
time_column (str, optional) – Name of the timestamp column (the default is ‘TimeGenerated’)
anomalies_column (str, optional) – Name of the column holding the binary status (1/0) for anomaly/benign (the default is ‘anomalies’)
period (int, optional) – Period of the dataset: the number of days for hourly data, or the number of weeks for daily data. This is used to correctly calculate the plot height. (the default is 30)
- Parameters:
ref_time (datetime, optional) – Input reference line to display (the default is None)
title (str, optional) – Title to display (the default is None)
legend (str, optional) – Where to position the legend: None, left, right or inline (default is None)
yaxis (bool, optional) – Whether to show the yaxis and labels
range_tool (bool, optional) – Show the range slider tool (default is True)
height (int, optional) – The height of the plot figure (the default is auto-calculated height)
width (int, optional) – The width of the plot figure (the default is 900)
xgrid (bool, optional) – Whether to show the xaxis grid (default is True)
ygrid (bool, optional) – Whether to show the yaxis grid (default is False)
color (list, optional) – List of colors to use for the 3 plots, in this order: line (observed), circle (baseline), circle_x/user specified (anomalies). (the default is [“navy”, “green”, “firebrick”])
- Returns:
The bokeh plot figure.
- Return type:
figure
- msticpy.analysis.timeseries.create_time_period_kqlfilter(periods: Dict[datetime, datetime]) str
Return KQL time filter expression from anomaly periods.
- Parameters:
periods (Dict[datetime, datetime]) – Dict of start, end periods
- Returns:
KQL filter clause
- Return type:
str
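To make the shape of such a filter concrete, here is a minimal sketch that turns a dictionary of start/end times into a KQL where clause. The helper name build_kql_time_filter, the TimeGenerated column name and the exact clause layout are assumptions for illustration; the string the library actually emits may differ.

```python
from datetime import datetime
from typing import Dict


def build_kql_time_filter(
    periods: Dict[datetime, datetime], time_column: str = "TimeGenerated"
) -> str:
    """Join each (start, end) pair into one KQL 'between' clause.

    Hypothetical helper: the clause format is an assumption, not the
    library's documented output.
    """
    clauses = [
        f"{time_column} between "
        f"(datetime({start.isoformat()}) .. datetime({end.isoformat()}))"
        for start, end in periods.items()
    ]
    # Any one of the periods may match, so the clauses are OR-ed together.
    return "| where " + " or ".join(clauses)


# Made-up anomaly periods keyed by start time.
periods = {
    datetime(2024, 1, 1, 2): datetime(2024, 1, 1, 5),
    datetime(2024, 1, 1, 7): datetime(2024, 1, 1, 9),
}
kql_filter = build_kql_time_filter(periods)
```

The resulting clause can be appended to a query so that only events inside the anomalous windows are returned.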
- msticpy.analysis.timeseries.extract_anomaly_periods(data: DataFrame, time_column: str = 'TimeGenerated', period: str = '1h', pos_only: bool = True, anomalies_column: str = 'anomalies') Dict[datetime, datetime]
Return dictionary of anomaly periods, merging adjacent ones.
- Parameters:
data (pd.DataFrame) – The data to process
time_column (str, optional) – The name of the time column
period (str, optional) – pandas-compatible time period designator, by default “1h”
pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True
anomalies_column (str, optional) – The column containing the anomalies flag.
- Returns:
start_period, end_period
- Return type:
Dict[datetime, datetime]
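The merging of adjacent anomalies can be pictured with the following simplified sketch. It assumes time-ordered, evenly spaced samples given as (timestamp, flag) pairs and pads each run of anomalies by one period on either side; the helper name merge_anomaly_periods and the padding rule are assumptions, not a description of the library's internals.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple


def merge_anomaly_periods(
    samples: Iterable[Tuple[datetime, int]],
    period: timedelta = timedelta(hours=1),
    pos_only: bool = True,
) -> Dict[datetime, datetime]:
    """Collapse consecutive anomalous samples into {start: end} periods.

    Hypothetical helper for illustration; samples must be sorted by time.
    """
    periods: Dict[datetime, datetime] = {}
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    for timestamp, flag in samples:
        is_anomaly = flag == 1 if pos_only else flag != 0
        if is_anomaly:
            if start is None:
                start = timestamp - period  # open a new period
            end = timestamp + period  # extend the open period
        elif start is not None:
            periods[start] = end  # a normal sample closes the period
            start = None
    if start is not None:
        periods[start] = end  # series ended while a period was open
    return periods


# Made-up hourly flags: a two-hour spike and a single dip.
base = datetime(2024, 1, 1)
flag_values = [0, 0, 0, 1, 1, 0, 0, 0, -1, 0]
samples = [(base + timedelta(hours=i), f) for i, f in enumerate(flag_values)]

positive_periods = merge_anomaly_periods(samples)
all_periods = merge_anomaly_periods(samples, pos_only=False)
```

With pos_only left at True the dip is ignored, so only the spike produces a period; setting it to False adds a second period around the dip.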
- msticpy.analysis.timeseries.find_anomaly_periods(data: DataFrame, time_column: str = 'TimeGenerated', period: str = '1h', pos_only: bool = True, anomalies_column: str = 'anomalies') List[TimeSpan]
Return list of anomaly periods as TimeSpans.
- Parameters:
data (pd.DataFrame) – The data to process
time_column (str, optional) – The name of the time column
period (str, optional) – pandas-compatible time period designator, by default “1h”
pos_only (bool, optional) – If True only extract positive anomaly periods, else extract both positive and negative. By default, True
anomalies_column (str, optional) – The column containing the anomalies flag.
- Returns:
TimeSpan(start, end)
- Return type:
List[TimeSpan]
- msticpy.analysis.timeseries.set_new_anomaly_threshold(data: DataFrame, threshold: float, threshold_low: float | None = None, anomalies_column: str = 'anomalies') DataFrame
Return DataFrame with anomalies calculated based on new threshold.
- Parameters:
data (pd.DataFrame) – Input DataFrame
threshold (float) – Threshold above (beyond) which values will be marked as anomalies. Used as positive and negative threshold unless threshold_low is specified.
threshold_low (Optional[float], optional) – The threshold below which values will be reported as anomalies, by default None.
anomalies_column (str, optional) – The column containing the anomalies flag.
- Returns:
Output DataFrame with recalculated anomalies.
- Return type:
pd.DataFrame
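The effect of the two thresholds can be sketched on a plain list of scores. This is an illustration under stated assumptions rather than the library's code: the helper name reflag is hypothetical, and threshold_low is assumed to be a positive magnitude that is applied to the negative side of the scores.

```python
from typing import List, Optional


def reflag(
    scores: List[float], threshold: float, threshold_low: Optional[float] = None
) -> List[int]:
    """Recompute 1 / -1 / 0 flags from existing scores.

    Hypothetical helper; treating threshold_low as a positive magnitude
    is an assumption made for this sketch.
    """
    # Fall back to a symmetric threshold when no lower bound is given.
    low = threshold if threshold_low is None else threshold_low
    flags = []
    for score in scores:
        if score >= threshold:
            flags.append(1)
        elif score <= -low:
            flags.append(-1)
        else:
            flags.append(0)
    return flags


# Made-up scores.
scores = [0.5, 2.5, -1.2, 4.0, -3.5]
symmetric = reflag(scores, threshold=2.0)
asymmetric = reflag(scores, threshold=3.0, threshold_low=1.0)
```

Raising threshold while lowering threshold_low makes the check stricter for spikes and more sensitive to dips, which is the kind of tuning the separate lower bound allows.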
- msticpy.analysis.timeseries.timeseries_anomalies_stl(data: DataFrame, **kwargs) DataFrame
Return anomalies in Timeseries using STL.
- Parameters:
data (pd.DataFrame) – Time series data set as a DataFrame, retrieved from a data connector or an external data source. The DataFrame must have two columns: the time column, set as the index, and a numeric value column.
time_column (str, optional) – If the input data is not indexed on the time column, use this column as the time index
data_column (str, optional) – Use named column if the input data has more than one column.
seasonal (int, optional) – Seasonality period of the input data required for STL. Must be an odd integer, and should normally be >= 7 (default).
period (int, optional) – Periodicity of the input data, by default 24 (hourly).
score_threshold (float, optional) – Z-score threshold, in standard deviations, used to flag anomalies, by default 3
- Returns:
The input DataFrame with additional columns produced by decomposing the time series: residual, trend, seasonal, weights, baseline, score and anomalies. The anomalies column holds 0, 1 or -1, depending on the score_threshold set.
- Return type:
pd.DataFrame
Notes
The decomposition method is STL - Seasonal-Trend Decomposition using LOESS
- msticpy.analysis.timeseries.ts_anomalies_stl(data: DataFrame, **kwargs) DataFrame
Return anomalies in Timeseries using STL.
- Parameters:
data (pd.DataFrame) – Time series data set as a DataFrame, retrieved from a data connector or an external data source. The DataFrame must have two columns: the time column, set as the index, and a numeric value column.
time_column (str, optional) – If the input data is not indexed on the time column, use this column as the time index
data_column (str, optional) – Use named column if the input data has more than one column.
seasonal (int, optional) – Seasonality period of the input data required for STL. Must be an odd integer, and should normally be >= 7 (default).
period (int, optional) – Periodicity of the input data, by default 24 (hourly).
score_threshold (float, optional) – Z-score threshold, in standard deviations, used to flag anomalies, by default 3
- Returns:
The input DataFrame with additional columns produced by decomposing the time series: residual, trend, seasonal, weights, baseline, score and anomalies. The anomalies column holds 0, 1 or -1, depending on the score_threshold set.
- Return type:
pd.DataFrame
Notes
The decomposition method is STL - Seasonal-Trend Decomposition using LOESS