msticpy.init.pivot_core.pivot_pd_accessor module

Pandas DataFrame accessor for Pivot functions.

class msticpy.init.pivot_core.pivot_pd_accessor.PivotAccessor(pandas_obj)

Bases: object

Pandas api extension for Pivot functions.

Instantiate pivot extension class.

display(title: str | None = None, cols: Iterable[str] | None = None, query: str | None = None, head: int | None = None) DataFrame

Display the DataFrame in the middle of a pipeline.

Parameters:
  • title (str, optional) – Title to display for the DataFrame, by default None

  • cols (Iterable[str], optional) – List of columns to display, by default None

  • query (str, optional) – Query to filter the displayed data, by default None. This should be a string that the DataFrame.query function can execute.

  • head (int, optional) – Limit the displayed output to head rows, by default None

Returns:

Passed through input DataFrame.

Return type:

pd.DataFrame
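The pass-through behaviour can be illustrated with plain pandas and DataFrame.pipe. This is a minimal sketch, not the msticpy implementation: show is a hypothetical helper, it prints instead of rendering notebook output, and the order in which query, cols and head are applied is an assumption.

```python
import pandas as pd


def show(df, title=None, cols=None, query=None, head=None):
    """Hypothetical stand-in: print a view of df, then return df unchanged."""
    view = df
    if query:
        view = view.query(query)  # same expression syntax as DataFrame.query
    if cols:
        view = view[list(cols)]
    if head is not None:
        view = view.head(head)
    if title:
        print(title)
    print(view)
    return df  # the pipeline continues with the full, unfiltered input


data = pd.DataFrame({"host": ["a", "b", "c"], "logons": [5, 12, 7]})
result = data.pipe(show, title="Busy hosts", query="logons > 6", head=1)
```

Only the printed view is filtered and truncated; result is the same three-row DataFrame that entered the step.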

filter(expr: str | Number, match_case: bool = False, numeric_col: bool = False) DataFrame

Filter all columns of DataFrame, return rows with any matches.

Parameters:
  • expr (Union[str, Number]) – String or regular expression to match, or a (partial) number. If expr is a string, it is matched against any string or object columns using pandas str.contains(..., regex=True). If expr is a number, or if numeric_col is True, expr is converted to a string and matched as a substring of any numeric columns.

  • match_case (bool, optional) – The match is not case-sensitive by default. Set to True to force case-sensitive matches.

  • numeric_col (bool, optional) – If expr is a numeric string or number this will force a match against only numeric columns, by default False

Returns:

The filtered dataframe

Return type:

pd.DataFrame

Raises:

TypeError – If expr is neither a string nor a number.
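The matching rules described above can be approximated with plain pandas. The following is a sketch under stated assumptions, not the library's code: filter_rows is a hypothetical helper, and the way columns are classified as text or numeric is a guess at the documented behaviour.

```python
from numbers import Number

import pandas as pd
from pandas.api import types as ptypes


def filter_rows(df, expr, match_case=False, numeric_col=False):
    """Hypothetical pandas-only approximation of the documented matching rules."""
    if not isinstance(expr, (str, Number)):
        raise TypeError("expr must be a string or a number")
    if isinstance(expr, Number) or numeric_col:
        # Numeric path: substring match on the text form of numeric columns.
        names = [c for c in df.columns if ptypes.is_numeric_dtype(df[c])]
        pattern, regex = str(expr), False
    else:
        # String path: regular-expression match on string/object columns.
        names = [
            c
            for c in df.columns
            if ptypes.is_object_dtype(df[c]) or ptypes.is_string_dtype(df[c])
        ]
        pattern, regex = expr, True
    mask = pd.Series(False, index=df.index)
    for name in names:
        text = df[name].astype(str)
        mask |= text.str.contains(pattern, case=match_case, regex=regex)
    return df[mask]


procs = pd.DataFrame(
    {
        "user": ["Alice", "bob", "carol"],
        "port": [443, 8080, 22],
        "proc": ["cmd.exe", "bash", "CMD.EXE"],
    }
)
any_case = filter_rows(procs, "cmd")
exact_case = filter_rows(procs, "cmd", match_case=True)
by_number = filter_rows(procs, 80)
```

In this sketch a row is kept when any of the selected columns matches, which mirrors the "rows with any matches" wording of the method summary.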

filter_cols(cols: str | Iterable[str], match_case: bool = False, sort_cols: bool = False) DataFrame

Filter output columns matching names in cols expression(s).

Parameters:
  • cols (Union[str, Iterable[str]]) – Either a string or a list of strings with filter expressions. These can be exact matches for column names, wildcard patterns (“*” matches multiple chars and “?” matches a single char), or regular expressions.

  • match_case (bool, optional) – Use case-sensitive matching, by default False

  • sort_cols (bool, optional) – Alphabetically sort column names, by default False

Returns:

The input DataFrame with only columns that match the filtering expressions.

Return type:

pd.DataFrame
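As an illustration of wildcard column selection, here is a small sketch using only the standard library and pandas. select_columns is a hypothetical helper; it handles exact names and "*"/"?" wildcards only, and makes no claim about how filter_cols distinguishes wildcards from regular expressions.

```python
import fnmatch
import re

import pandas as pd


def select_columns(df, patterns, match_case=False, sort_cols=False):
    """Hypothetical helper: keep columns matching shell-style wildcards."""
    if isinstance(patterns, str):
        patterns = [patterns]
    flags = 0 if match_case else re.IGNORECASE
    # fnmatch.translate turns "*" and "?" wildcards into an anchored regex.
    regexes = [re.compile(fnmatch.translate(p), flags) for p in patterns]
    keep = [c for c in df.columns if any(r.match(str(c)) for r in regexes)]
    if sort_cols:
        keep = sorted(keep)
    return df[keep]


events = pd.DataFrame(
    columns=["TimeGenerated", "SrcIP", "DstIP", "srcport", "Account"]
)
ip_cols = select_columns(events, ["*IP", "account"])
src_cols = select_columns(events, "src*", sort_cols=True)
```

With case-insensitive matching, "account" selects the Account column and "src*" selects both SrcIP and srcport.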

list_to_rows(cols: str | Iterable[str]) DataFrame

Expand a list column to individual rows.

Parameters:

cols (Union[str, Iterable[str]]) – The columns to be expanded.

Returns:

The expanded DataFrame

Return type:

pd.DataFrame
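The effect of expanding a list column can be seen with the standard pandas DataFrame.explode method. Whether list_to_rows uses explode internally is not stated here, so treat this as an illustration of the resulting shape only; the sample data is invented.

```python
import pandas as pd

alerts = pd.DataFrame(
    {
        "alert": ["A1", "A2"],
        "ips": [["10.0.0.1", "10.0.0.2"], ["10.0.0.3"]],
    }
)

# explode emits one row per list element and repeats the other columns.
expanded = alerts.explode("ips").reset_index(drop=True)
```

The two input rows become three output rows, with alert A1 repeated once for each of its IP addresses.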

parse_json(cols: str | Iterable[str]) DataFrame

Convert JSON string columns to Python types.

Parameters:

cols (Union[str, Iterable[str]]) – Column or iterable of columns to process

Returns:

Processed dataframe

Return type:

pd.DataFrame
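A rough equivalent of the conversion, using json.loads from the standard library, looks like the sketch below. It assumes every cell holds valid JSON text; how parse_json treats missing or malformed values is not covered here.

```python
import json

import pandas as pd

logs = pd.DataFrame(
    {
        "id": [1, 2],
        "props": [
            '{"user": "alice", "ports": [22, 443]}',
            '{"user": "bob", "ports": []}',
        ],
    }
)

parsed = logs.copy()
# json.loads turns each JSON string into the matching Python dict, list or scalar.
parsed["props"] = parsed["props"].apply(json.loads)
first_user = parsed.loc[0, "props"]["user"]
```

After the conversion each props cell is a Python dict, so nested values can be read with normal indexing.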

run(func: Callable[[...], DataFrame], **kwargs) DataFrame

Run a pivot function on the current DataFrame.

Parameters:
  • func (Callable[..., pd.DataFrame]) – Pivot function to run

  • kwargs – Keyword arguments to pass to func. A column specification (e.g. column="src_col_name") is usually the minimum needed. For data queries, the column keyword must be the name of the query parameter (e.g. host_name="src_col_name").

Returns:

The output DataFrame from the function.

Return type:

pd.DataFrame

Notes

You can pass the join keyword argument to most pivot functions. Values for join are “inner”, “left”, “right” or “outer”.

sort(cols: str | Iterable[str] | Dict[str, str], ascending: bool | None = None) DataFrame

Sort output by column expression.

Parameters:
  • cols (Union[str, Iterable[str], Dict[str, str]]) – If this is a string, it should be a column name expression. A column name expression is either a column name, a case-insensitive column name, a partial column name, or a regular expression that matches one or more column names. Each column name expression can use the format col_name_expr:desc to sort descending (col_name_expr:asc is the default). If this is a list, each element should be a column name expression with an optional ‘:asc’ or ‘:desc’ suffix. If this is a dict, the keys should be column name expressions and the values should be bools indicating an ‘ascending’ (True) or ‘descending’ (False) sort.

  • ascending (bool, optional) – Overrides any ordering specified for individual columns and sorts ‘ascending’ if True or ‘descending’ if False. If this is not supplied and no column-specific ordering is given, the sort is ascending.

Returns:

The sorted DataFrame

Return type:

pd.DataFrame

Raises:

ValueError – One or more column expressions matched no column name in the input.
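The ‘:asc’ and ‘:desc’ suffix convention can be sketched with DataFrame.sort_values. sort_by_specs is a hypothetical helper that supports only exact, case-insensitive column names in string or list form (no regular expressions, partial names or dicts), so it covers a subset of what is described above.

```python
import pandas as pd


def sort_by_specs(df, specs, ascending=None):
    """Hypothetical helper: sort by 'name', 'name:asc' or 'name:desc' specs."""
    if isinstance(specs, str):
        specs = [specs]
    # Case-insensitive lookup from lower-cased name to the real column name.
    lookup = {str(col).lower(): col for col in df.columns}
    by, orders = [], []
    for spec in specs:
        name, _, direction = spec.partition(":")
        if name.lower() not in lookup:
            raise ValueError(f"No column matches '{name}'")
        by.append(lookup[name.lower()])
        orders.append(direction.lower() != "desc")
    if ascending is not None:
        orders = [ascending] * len(by)  # global flag overrides the suffixes
    return df.sort_values(by=by, ascending=orders)


counts = pd.DataFrame({"Host": ["b", "a", "a"], "Count": [1, 5, 3]})
mixed = sort_by_specs(counts, ["host", "count:desc"])
forced = sort_by_specs(counts, "count:desc", ascending=True)
```

In the second call the ascending argument wins over the ‘:desc’ suffix, matching the override rule described for the ascending parameter.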

tee(var_name: str, clobber: bool = False) DataFrame

Save current dataframe to var_name in the IPython user namespace.

Parameters:
  • var_name (str) – The name of the DF variable to create.

  • clobber (bool, optional) – Whether to overwrite an existing variable of the same name, by default False

Returns:

Passed through input DataFrame.

Return type:

pd.DataFrame

Notes

This function only works in an IPython/Jupyter notebook environment. It will attempt to create a variable in the user local namespace that references the current state of the DataFrame in the pipeline.

By default, it will not overwrite an existing variable of the same name (specify clobber=True to overwrite).

tee_exec(df_func: str, *args, **kwargs) DataFrame

Run a dataframe method on the dataframe without changing it.

Parameters:
  • df_func (str) – The name of the function to execute. Accessor methods must be of the form “accessor.method”.

  • args (tuple) – Positional arguments to be passed to the function

  • kwargs (dict) – Keyword arguments to be passed to the function.

Returns:

Passed through input DataFrame.

Return type:

pd.DataFrame

Notes

This function runs the DataFrame method or accessor function. It does not alter the DataFrame (unless the function does any kind of in-place modification). The function is run and the original input DataFrame is returned.
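The run-and-return-the-input pattern can be sketched in a few lines of plain pandas. tee_call is a hypothetical helper, and resolving dotted names with getattr is an assumption about how an "accessor.method" string could be handled, not a description of the msticpy code.

```python
import io

import pandas as pd


def tee_call(df, df_func, *args, **kwargs):
    """Hypothetical helper: call a DataFrame method for its side effect."""
    target = df
    # Walk dotted names such as "accessor.method" one attribute at a time.
    for part in df_func.split("."):
        target = getattr(target, part)
    target(*args, **kwargs)  # the return value is discarded
    return df  # the caller gets the original object back


data = pd.DataFrame({"a": [3, 1, 2]})
buffer = io.StringIO()
same = data.pipe(tee_call, "to_csv", buffer, index=False)
csv_text = buffer.getvalue()
```

Here to_csv writes to the buffer as a side effect while the pipeline continues with the untouched input DataFrame.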