msticpy.common.data_utils module
Data utility functions.
- msticpy.common.data_utils.df_has_data(data)
Return True if data is a pd.DataFrame and is not empty.
- Return type:
bool
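The check described above can be approximated in plain pandas. This is a minimal sketch of the documented behaviour, not the library's source; `has_data_sketch` is a hypothetical name used only for illustration.

```python
import pandas as pd


def has_data_sketch(data) -> bool:
    """Hypothetical stand-in: True only for a non-empty pd.DataFrame."""
    # isinstance guards against None, lists, or other non-DataFrame inputs;
    # DataFrame.empty is True when any axis has length 0.
    return isinstance(data, pd.DataFrame) and not data.empty


populated = has_data_sketch(pd.DataFrame({"a": [1]}))
empty = has_data_sketch(pd.DataFrame())
not_a_frame = has_data_sketch(None)
```

A guard of this kind is useful before plotting or pivoting query results, where a query may return None or an empty frame.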
- msticpy.common.data_utils.ensure_df_datetimes(data, columns=None, add_utc_tz=True)
Return dataframe with converted TZ-aware timestamps.
- Parameters:
data (pd.DataFrame) – Input dataframe
columns (str | list[str] | None, optional) – column (str) or list of columns to convert, by default None. If this parameter is not supplied then any column containing the substring “time” is used as a candidate for conversion.
add_utc_tz (bool, optional) – If True, any timezone-naive datetime columns named in the columns parameter (or the default ‘.*time.*’ columns) are converted to timezone-aware timestamps marked as UTC.
- Returns:
Converted DataFrame.
- Return type:
pd.DataFrame
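To illustrate the column selection and UTC marking described above, here is a rough pandas-only sketch. It is not the msticpy implementation: `to_utc_datetimes_sketch` is a hypothetical name, and matching "time" case-insensitively is an assumption of this sketch rather than something the documentation states.

```python
import pandas as pd


def to_utc_datetimes_sketch(data, columns=None, add_utc_tz=True):
    """Hypothetical illustration of the documented behaviour."""
    if columns is None:
        # Assumption: treat any column whose name contains "time"
        # (case-insensitive) as a candidate for conversion.
        columns = [col for col in data.columns if "time" in str(col).lower()]
    elif isinstance(columns, str):
        columns = [columns]

    result = data.copy()  # leave the caller's frame untouched
    for col in columns:
        # utc=True localizes timezone-naive values to UTC and converts
        # timezone-aware values to UTC.
        result[col] = pd.to_datetime(result[col], utc=add_utc_tz)
    return result


events = pd.DataFrame(
    {
        "TimeGenerated": ["2024-01-01 12:00:00", "2024-01-02 08:30:00"],
        "Account": ["alice", "bob"],
    }
)
converted = to_utc_datetimes_sketch(events)
```

With the real function, the equivalent call would be `ensure_df_datetimes(events)`, using the signature documented above.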
- msticpy.common.data_utils.ensure_df_timedeltas(data, columns)
Return dataframe with KQL timespan columns converted to timedelta64[ns].
This function converts string columns containing KQL timespan values to pandas timedelta64[ns] dtype. It handles both small timespans (< 1 day) and large timespans (>= 1 day) which use the “d.hh:mm:ss.fffffff” format.
- Parameters:
data (pd.DataFrame) – Input dataframe
columns (str | list[str]) – Column name (str) or list of column names to convert.
- Returns:
Converted DataFrame with timespan columns as timedelta64[ns].
- Return type:
pd.DataFrame
- Raises:
ValueError – If any timespan string in the specified columns cannot be parsed.
Examples
>>> df = pd.DataFrame({"duration": ["1.00:00:00", "00:00:00.001"]})
>>> df_converted = ensure_df_timedeltas(df, columns="duration")
>>> df_converted["duration"].dtype
dtype('timedelta64[ns]')
>>> # Specify multiple columns
>>> df_converted = ensure_df_timedeltas(df, columns=["duration", "elapsed"])
Notes
Uses azure.kusto.data.helpers.parse_timedelta for parsing.
See also
parse_timespan: Parse individual timespan strings
ensure_df_datetimes: Similar function for datetime conversion
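For readers who want to see how the “d.hh:mm:ss.fffffff” format maps onto a timedelta, the following is a simplified sketch. It does not reproduce the library's parsing, which the Notes above attribute to azure.kusto.data.helpers.parse_timedelta. The regular expression and the name `parse_timespan_sketch` are hypothetical, and negative timespans are deliberately not handled.

```python
import re

import pandas as pd

# Optional "d." day prefix, then hh:mm:ss, then up to 7 fractional digits.
_TIMESPAN = re.compile(
    r"^(?:(?P<days>\d+)\.)?"
    r"(?P<hours>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?:\.(?P<frac>\d{1,7}))?$"
)


def parse_timespan_sketch(value: str) -> pd.Timedelta:
    """Hypothetical parser for non-negative KQL-style timespan strings."""
    match = _TIMESPAN.match(value.strip())
    if match is None:
        raise ValueError(f"Cannot parse timespan: {value!r}")
    # Right-pad the fraction to 9 digits so it can be read as nanoseconds.
    nanos = int((match["frac"] or "").ljust(9, "0"))
    return pd.Timedelta(
        days=int(match["days"] or 0),
        hours=int(match["hours"]),
        minutes=int(match["minutes"]),
        seconds=int(match["seconds"]),
        nanoseconds=nanos,
    )


df = pd.DataFrame({"duration": ["1.00:00:00", "00:00:00.001"]})
# to_timedelta ensures the column ends up with a timedelta64 dtype.
durations = pd.to_timedelta(df["duration"].map(parse_timespan_sketch))
```

As in the documented function, an unparseable string raises ValueError instead of being silently coerced to NaT.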