ProcessTree =========== This describes the use of the process tree data and visualization modules. These modules can be used with Windows process creation events (ID 4688), Linux auditd logs or Microsoft Defender for Endpoint (MDE)/Microsoft 365 Defender logs. The ProcessTree visualization is built using the `Bokeh library `__. See the sample `ProcessTree Notebook `__ for full code for the examples shown here. The process tree functionality has two main components: - Process Tree creation - this takes a standard log from a single host and builds the parent-child relationships between processes in the data set. There are a set of utility functions to extract individual and partial trees from the processed data set. - Process Tree visualization - this takes the processed output from the previous component and displays the process tree using Bokeh plots. .. note:: The expected schema for the Linux audit data is as produced by the ``auditdextract.py`` module in ``msticpy``. This module combines related process exec messages into a single combined message that emulates the Windows 4688 event. This retains the audit schema apart from the following additions: - ``cmdline``: this is a concatenation of the ``a0``, ``a1``, etc argument fields - ``EventType``: this is the audit message type (``SYSCALL``, ``EXECVE``, ``CWD``, etc.) - the combined ``SYSCALL_EXECVE`` created by ``auditextract`` is the only type currently supported. Support for other formats such as Microsoft Defender for Endpoint and Sysmon is also included. Plotting process trees ---------------------- Plotting process trees from process event data involves two stages: - Converting the linear event data into an hierarchical tree data structure - Plotting the visualization In most cases you don't need to worry about these two processes - the standard :py:func:`plot_process_tree` function and the pandas accessor function :py:meth:`mp_process_tree.plot` will try to detect if the input data is in the correct format. If it is not, the process tree builder is automatically applied to the data. This should work for Windows events, Linux auditd events and MDE process events. The easiest way to plot process data as a process tree is to use the pandas ``mp_process_tree`` accessor. .. code:: IPython from msticpy.vis import process_tree my_proc_df.mp_plot.process_tree() .. figure:: _static/process_tree1.png :alt: Process tree plot :width: 5in :height: 5in Here is the same thing using the ``plot_process_tree`` function. .. code:: IPython from msticpy.vis import process_tree process_tree.plot_process_tree(procs_df) For full usage, see the later section `Process tree plotting parameters`_ Extracting process trees from logs ---------------------------------- You can build a process tree without plotting it. You might want to do this if you want the intermediate data for analysis or if you want to extract a sub-tree for display. The later section `Process Tree utility functions`_ describes some process tree analysis and manipulation functions that you can use on the built process trees. build_process_tree syntax ^^^^^^^^^^^^^^^^^^^^^^^^^ See :py:func:`build_process_tree` .. code:: python from msticpy.transform import process_tree process_tree.build_process_tree(procs) Parameters ^^^^^^^^^^ procs (pd.DataFrame) Process events (Windows 4688 or Linux Auditd) schema (ProcSchema, optional) The column schema to use, by default None If None, then the schema is inferred show_summary (bool, optional) Shows summary of the built tree, default is False. debug (bool, optional) If True produces extra debugging output, by default False The following example shows importing the require modules and reading in test data. We then call ``build_process_tree`` to extract the parent-child relationships between processes. .. container:: cell code .. code:: python from IPython.display import display import pandas as pd from msticpy.vis import process_tree win_procs = pd.read_pickle("../demos/data/win_proc_test.pkl") p_tree_win = process_tree.build_process_tree(win_procs, show_summary=True) The tree builder process, tries to infer the schema (you can override this with the *schema* parameter) and assembles process parent-child relationships. It creates unique keys (the ``proc_key`` column) for each process, based on the imagepath + process id + timecreated. It then tries to find the parent process in the same dataset or infer the parent from the data in the created process event. How it does this differs slightly between input data formats. It then adds a ``parent_key`` field to each child record for the parent record (found or inferred). This modified dataframe is returned from ``build_process_tree``. If you supply ``show_summary=True`` parameter it will also output some statistics about the created tree. .. container:: output stream stdout :: {'Processes': 1010, 'RootProcesses': 10, 'LeafProcesses': 815, 'BranchProcesses': 185, 'IsolatedProcesses': 0, 'LargestTreeDepth': 7} The example below shows using two of the process tree utility functions to extract the descendants (children, grandchildren, etc) of one of the root process rows and then display the subtree. .. note:: "root" process, in this context means any process whose parent could not be determined. This is not necessarily the actual root process for this tree. A typical data set will have more than one "root" process - this might be better thought of as "earliest discovered ancestor process" but that's a bit of a mouthful. "Root" processes are flagged in the data by an ``IsRoot`` column with the value True. .. code:: ipython proc_tree = process_tree.get_descendents(p_tree_win, process_tree.get_roots(p_tree_win).iloc[2]) process_tree.plot_process_tree(data=proc_tree, legend_col="SubjectUserName", show_table=True) .. figure:: _static/process_tree1.png :alt: Process tree plot :width: 5in :height: 5in Process Tree Plotting Syntax ---------------------------- See :py:func:`plot_process_tree` and :py:func:`build_and_show_process_tree` .. code:: python process_tree.plot_process_tree( data, schema=None, output_var=None, legend_colNone, show_table=False, ) Process tree plotting parameters ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ data (pd.DataFrame) DataFrame containing one or more Process Trees. This should be the output of ``build_process_tree`` described above. schema (Dict | ProcSchema, optional) The data schema to use for the data set, by default None. If None the schema is inferred. A schema object maps generic field names (e.g. ``process_name``) on to a data-specific name (e.g. ``exe`` in the case of Linux audit data). This is usually not required since the function will try to infer the schema from fields in the input DataFrame. This can be supplied as a ProcSchema instance or a dictionary with required schema mappings. output_var (str, optional) Output variable for selected items in the tree, by default None. Setting this lets you return the keys of any items selected in the bokeh plot. For example, if you supply the string "my_results" and then select one or more processes in the tree, the Python variable ``my_results`` will be populated with a list of keys (index items) of the corresponding rows in the input DataFrame. .. note:: Due to restrictions on javascript execution in many notebook environments, this only works in Jupyter classic. legend_col (str, optional) The column used to color the tree items, by default None. If this column is a string, the values will be treated as categorical data and map unique values to different colors and display a legend of the mapping. If this column is a numeric or datetime value, the values will be treated as continuous and a color gradient bar will be displayed indicating the mapping of values on to the color gradient. show_table (bool) Set to True to show the data table, by default False. Shows the source values as a data table beneath the process tree. height (int, optional) The height of the plot figure (the default is 700) width (int, optional) The width of the plot figure (the default is 900) title (str, optional) Title to display (the default is None) hide_legend (bool, optional) Hide the legend box, even if legend_col is specified. pid_fmt (str, optional) Display Process ID as 'dec' (decimal) or 'hex' (hexadecimal), default is 'hex'. .. warning:: **Large data sets** (more than a few hundred processes) These will normally be handled well by the Bokeh plot (up to multiple tens of thousands or more) but it will make navigation of the tree more difficult. In particular, the range tool (on the right of the main plot) will be difficult to manipulate. Split the input data into smaller chunks before plotting. .. note:: **Range Tool and Font Size** Avoid using Range tool to change the size of the displayed plot. The font size does not scale based on how much data is shown. If you use the range tool to select too large a subset of the data in the main plot, the font will become unreadable. If this happens, use the ``reset`` tool to set the plot back to its defaults. Dragging the range box along the tree, rather than dragging individual edges (resulting in resizing the range) will give more readable results. Linux Process Tree ------------------ The process for visualizing Linux process trees is almost identical to visualizing Windows processes. .. note:: This assumes that the Linux audit log has been read from a file using :py:func:`read_from_file` or read from Azure Sentinel/Log Analytics using the LinuxAudit.auditd_all query and processed using :py:func:`extract_events_to_df` function. Using either of these, the audit messages events related to a single process start are merged into a single row. See :doc: `../data_acquisition/CollectingLinuxAuditLogs.rst` for more details. Also, see the section `Adapting the input schema of your data`_ for details about using different input schemas. .. container:: cell code .. code:: python # Process Linux audit events. Show verbose output. p_tree_lx = process_tree.build_process_tree(linux_proc, show_progress=True, debug=True) .. container:: output stream stdout :: Original # procs 34345 Merged # procs 34345 Merged # procs - dropna 11868 Unique merged_procs index in merge 34345 These two should add up to top line Rows with dups 0 Rows with no dups 34345 0 + 34345 = 34345 original: 34345 inferred_parents 849 combined 35194 has parent time 20177 effectivelogonId in subjectlogonId 35190 parent_proc_lc in procs 34345 ProcessId in ParentProcessId 21431 Parent_key in proc_key 34345 Parent_key not in proc_key 845 Parent_key is NA 845 {'Processes': 35190, 'RootProcesses': 845, 'LeafProcesses': 17664, 'BranchProcesses': 16681, 'IsolatedProcesses': 0, 'LargestTreeDepth': 10} .. container:: cell code .. code:: python # Take one of the roots from the process set and get the full tree beneath it t_root = process_tree.get_roots(p_tree_lx).iloc[7] full_tree = process_tree.get_descendents(p_tree_lx, t_root) print("Full tree size:", len(full_tree)) .. container:: output stream stdout :: Full tree size: 3032 .. container:: cell code .. code:: python process_tree.plot_process_tree(data=full_tree[:1000], legend_col="cwd") .. figure:: _static/process_tree2.png :alt: Process tree plot :width: 5in :height: 3in Plotting Using a color gradient ------------------------------- .. container:: cell code .. code:: python # Read in and process some data - this contains a Rarity column indicating # how common the process is in analyzed data set. proc_rarity = pd.read_pickle("../demos/data/procs_with_cluster.pkl") proc_rarity_tree = process_tree.build_process_tree(proc_rarity, show_progress=True) .. container:: output stream stdout :: {'Processes': 22992, 'RootProcesses': 31, 'LeafProcesses': 15587, 'BranchProcesses': 7374, 'IsolatedProcesses': 0, 'LargestTreeDepth': 839} .. container:: cell code .. code:: python # Get the root processes from the process tree data prar_roots = process_tree.get_roots(proc_rarity_tree) # Find the tree with the highest Rarity Score and # calculate the AverageRarity for proceses in that tree. # NOTE: this code is only needed to help us choose likely trees to view # it is not needed for the plotting. tree_rarity = [] for row_num, (ix, row) in enumerate(prar_roots.iterrows()): rarity_tree = process_tree.get_descendents(proc_rarity_tree, row) tree_rarity.append({ "Row": row_num, "RootProcess": prar_roots.loc[ix].NewProcessName, "TreeSize:": len(rarity_tree), "AverageRarity": rarity_tree["Rarity"].mean() }) pd.DataFrame(tree_rarity).sort_values("AverageRarity", ascending=False) .. container:: output execute_result :: Row RootProcess TreeSize: 27 27 C:\Windows\System32\svchost.exe 4 23 23 C:\Windows\System32\svchost.exe 2 22 22 C:\Windows\System32\smss.exe 30 20 20 C:\Windows\SoftwareDistribution\Download\Insta... 2 9 9 C:\Windows\System32\smss.exe 7 7 7 C:\ProgramData\Microsoft\Windows Defender\plat... 46 .... .. container:: cell code .. code:: python # Plot the tree using the Rarity column as the legend_col parameter. svcs_tree = process_tree.get_descendents(proc_rarity_tree, prar_roots.iloc[22]) process_tree.plot_process_tree(svcs_tree, legend_col="Rarity", show_table=True) .. figure:: _static/process_tree3.png :alt: Process tree plot :width: 5in :height: 4in Process Tree utility Functions ------------------------------ The :py:mod:`process_tree_utils` module has a number of functions that may be useful in extracting or manipulating process trees or tree relationships. These typically take a ``procs`` parameter - the DataFrame containing the process trees. Processes that perform navigation relative to another process (get_parent, get_children, etc.) also take a ``source`` parameter - the process that is the origin of the navigation. Some functions also have an ``include_source`` parameter, e.g. get_children. This controls whether the function will include the source process in the results. Functions: - :py:func:`build_process_key` - :py:func:`build_process_tree` - :py:func:`get_ancestors` - :py:func:`get_children` - :py:func:`get_descendents` - :py:func:`get_parent` - :py:func:`get_process` - :py:func:`get_process_key` - :py:func:`get_root` - :py:func:`get_root_tree` - :py:func:`get_roots` - :py:func:`get_siblings` - :py:func:`get_summary_info` - :py:func:`get_tree_depth` - :py:func:`infer_schema` :py:func:`~msticpy.transform.process_tree_utils.get_summary_info` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get summary information. .. container:: cell code .. code:: python process_tree.get_summary_info(p_tree_win) .. container:: output execute_result :: {'Processes': 1010, 'RootProcesses': 10, 'LeafProcesses': 815, 'BranchProcesses': 185, 'IsolatedProcesses': 0, 'LargestTreeDepth': 7} :py:func:`~msticpy.transform.process_tree_utils.get_roots` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get roots of all trees in the data set. .. container:: cell code .. code:: python # Get roots of all trees in the set process_tree.get_roots(p_tree_win).head() :py:func:`~msticpy.transform.process_tree_utils.get_descendents` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get the full tree beneath a process. get_descendents takes an ``include_source`` parameter. Setting this to True returns the source process with the result set. .. container:: cell code .. code:: python # Take one of those roots and get the full tree beneath it t_root = process_tree.get_roots(p_tree_win).loc["c:\windowsazure\guestagent_2.7.41491.901_2019-01-14_202614\waappagent.exe0x19941970-01-01 00:00:00.000000"] full_tree = process_tree.get_descendents(p_tree_win, t_root) full_tree.head() :py:func:`~msticpy.transform.process_tree_utils.get_children` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get the immediate children of a process get_children takes an ``include_source`` parameter. Setting this to True returns the source process with the result set. .. container:: cell code .. code:: python # Just get the immediate children of the root process children = process_tree.get_children(p_tree_win, t_root) children.head() :py:func:`~msticpy.transform.process_tree_utils.get_tree_depth` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get the depth of a tree. .. container:: cell code .. code:: python # Get the depth of the full tree depth = process_tree.get_tree_depth(full_tree) print(f"depth of tree is {depth}") .. container:: output stream stdout :: depth of tree is 4 :py:func:`~msticpy.transform.process_tree_utils.get_parent` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :py:func:`~msticpy.transform.process_tree_utils.get_ancestors` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get the parent process or all ancestors. get_ancestors takes an ``include_source`` parameter. Setting this to True returns the source process with the result set. .. container:: cell code .. code:: python # Get Ancestors # Get a child process that's at the bottom of the tree btm_descnt = full_tree[full_tree["path"].str.count("/") == depth - 1].iloc[0] print("parent") display(process_tree.get_parent(p_tree_win, btm_descnt)[:20]) print("ancestors") process_tree.get_ancestors(p_tree_win, btm_descnt).head() .. container:: output stream stdout :: parent TenantId 52b1ab41-869e-4138-9e40-2a4457f09bf0 Account WORKGROUP\MSTICAlertsWin1$ EventID 4688 TimeGenerated 2019-02-09 23:20:15.547000 Computer MSTICAlertsWin1 SubjectUserSid S-1-5-18 SubjectUserName MSTICAlertsWin1$ SubjectDomainName WORKGROUP SubjectLogonId 0x3e7 NewProcessId 0xccc NewProcessName C:\Windows\System32\cmd.exe TokenElevationType %%1936 ProcessId 0x123c CommandLine "cmd" ParentProcessName C:\WindowsAzure\GuestAgent_2.7.41491.901_2019-... TargetLogonId 0x0 SourceComputerId 263a788b-6526-4cdc-8ed9-d79402fe4aa0 TimeCreatedUtc 2019-02-09 23:20:15.547000 EffectiveLogonId 0x3e7 new_process_lc c:\windows\system32\cmd.exe Name: c:\windows\system32\cmd.exe0xccc2019-02-09 23:20:15.547000, dtype: object .. container:: output stream stdout :: ancestors TenantId \ proc_key c:\windowsazure\guestagent_2.7.41491.901_2019-0... 52b1ab41-869e-4138-9e40-2a4457f09bf0 c:\windowsazure\guestagent_2.7.41491.901_2019-0... 52b1ab41-869e-4138-9e40-2a4457f09bf0 c:\windows\system32\cmd.exe0xccc2019-02-09 23:2... 52b1ab41-869e-4138-9e40-2a4457f09bf0 c:\windows\system32\conhost.exe0x14ec2019-02-09... 52b1ab41-869e-4138-9e40-2a4457f09bf0 .... [4 rows x 35 columns] :py:func:`~msticpy.transform.process_tree_utils.get_process` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ get_process retrieves a process record by its key. The process returned is a single row - a pandas Series. .. container:: cell code .. code:: python proc_key = btm_descnt.name print(proc_key) process_tree.get_process(p_tree_win, proc_key) .. container:: output stream stdout :: c:\windows\system32\conhost.exe0x14ec2019-02-09 23:20:15.560000 .. code:: python process2 = full_tree[full_tree["path"].str.count("/") == depth - 1].iloc[-1] process_tree.build_process_key(process2) .. container:: output execute_result :: 'c:\\windows\\system32\\conhost.exe0x15842019-02-10 15:24:56.050000' :py:func:`~msticpy.transform.process_tree_utils.get_siblings` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get the siblings of a process. get_siblings takes an ``include_source`` parameter. Setting this to True returns the source process with the result set. .. container:: cell code .. code:: python src_proc = process_tree.get_children(p_tree_win, t_root, include_source=False).iloc[0] process_tree.get_siblings(p_tree_win, src_proc, include_source=True).head() .. container:: output execute_result :: TenantId \ proc_key c:\windowsazure\guestagent_2.7.41491.901_2019-0... 52b1ab41-869e-4138-9e40-2a4457f09bf0 c:\windowsazure\guestagent_2.7.41491.901_2019-0... 52b1ab41-869e-4138-9e40-2a4457f09bf0 c:\windowsazure\secagent\wasecagentprov.exe0xda... 52b1ab41-869e-4138-9e40-2a4457f09bf0 ... [5 rows x 35 columns] :py:func:`~msticpy.transform.process_tree_utils.tree_to_text` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Return text rendering of the process tree. This function returns a text rendering of the process tree. .. code:: ipython3 from msticpy.transform.proc_tree_schema import WIN_EVENT_SCHEMA print(process_tree.tree_to_text(p_tree_win, schema=WIN_EVENT_SCHEMA)) .. parsed-literal:: +-- Process: C:\Program Files\Microsoft Monitoring Agent\Agent\MonitoringHost.exe PID: 0x888 Time: 1970-01-01 00:00:00+00:00 Cmdline: nan Account: nan LoginID: 0x3e7 +-- Process: C:\Windows\System32\cscript.exe PID: 0x364 Time: 2019-01-15 04:15:26+00:00 Cmdline: "C:\Windows\system32\cscript.exe" /nologo "MonitorKnowledgeDiscovery.vbs" Account: WORKGROUP\MSTICAlertsWin1$ LoginID: 0x3e7 +-- Process: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\CT_602681692\NativeDSC\DesiredStateConfiguration\ASMHost.exe PID: 0x1c4 Time: 2019-01-15 04:16:24.007000+00:00 Cmdline: "C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\CT_602681692\NativeDSC\DesiredStateConfiguration\ASMHost.exe" GetInventory "C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\CT_602681692\work\ServiceState\ServiceState.mof" "C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\CT_602681692\work\ServiceState" Account: WORKGROUP\MSTICAlertsWin1$ LoginID: 0x3e7 +-- Process: C:\Windows\System32\conhost.exe PID: 0x99c Time: 2019-01-15 04:16:24.027000+00:00 Cmdline: \??\C:\Windows\system32\conhost.exe 0xffffffff -ForceV1 Account: WORKGROUP\MSTICAlertsWin1$ LoginID: 0x3e7 Create a network from a Tree using Networkx ------------------------------------------- .. container:: cell code .. code:: python import networkx as nx import matplotlib.pyplot as plt p_graph = nx.DiGraph() p_graph = nx.from_pandas_edgelist( df=full_tree.reset_index(), source="parent_key", target="proc_key", edge_attr=["TimeGenerated", "NewProcessName", "NewProcessId"], create_using=nx.DiGraph, ) plt.gcf().set_size_inches((20,20)) pos = nx.circular_layout(p_graph) nx.draw_networkx(p_graph, pos=pos, with_labels=False, node_size=50, fig_size=(10,10)) # Get the root binary name to plot labels (change the split param for Linux) labels = full_tree.apply(lambda x: x.NewProcessName.split("\\")[-1], axis=1).to_dict() nx.draw_networkx_labels(p_graph, pos, labels=labels, font_size=10, font_color='k', font_family='sans-serif', font_weight='normal', alpha=1.0) plt.show() .. figure:: _static/process_tree4.png :alt: Networkx plot of process tree :width: 4in :height: 4in Adapting the input schema of your data -------------------------------------- The process tree builder uses generic names to map common event properties such as process name and process ID between different input schemas. The built-in schemas for Windows 4688, Linux Auditd and Microsoft Defender are shown below. =================== ===================== ===================== =========================== Generic name Win 4688 schema Linux auditd schema MDE schema =================== ===================== ===================== =========================== \*time_stamp TimeGenerated TimeGenerated CreatedProcessCreationTime \*process_name NewProcessName exe CreatedProcessName \*process_id NewProcessId pid CreatedProcessId parent_name ParentProcessName *(not used)* ParentProcessName \*parent_id ProcessId ppid CreatedProcessParentId logon_id SubjectLogonId ses InitiatingProcessLogonId target_logon_id TargetLogonId *(not used)* LogonId cmd_line CommandLine cmdline CreatedProcessCommandLine user_name SubjectUserName acct CreatedProcessAccountName user_id SubjectUserSid uid CreatedProcessAccountSid host_name_column Computer Computer ComputerDnsName event_id_column EventID EventType *(not used)* =================== ===================== ===================== =========================== \* indicates a mandatory field. You must supply mappings from your source data for these items. Others are optional but will provide more information to the user in the plotted tree. If your schema differs from, but is similar to one of the built-in schema mappings you can adapt one of these or supply a custom schema when you build and display the process tree. There are also two schema properties that you might need to add to the schema. =================== ===================== ===================== ===================== Mapping property Win 4688 schema Linux auditd schema MDE schema =================== ===================== ===================== ===================== path_separator ``\\`` ``/`` ``\\`` event_id_identifier 4688* SYSCALL_EXECVE *(not used)* =================== ===================== ===================== ===================== \*The event_id_identifier for Windows 4688 schema must be an integer. The path_separator value is used to extract the process file name (minus the path) in the process tree view. The ``event_id_column`` and ``event_id_identifier`` work together and are useful if your input data contains mixed event types. Using these together will tell the process tree builder to filter on events where event_id_column == event_id_identifier. E.g. ``data[data["EventID"] == 4688]`` The example below shows how to adapt an existing Linux schema for different column names in the source schema. .. code:: ipython from msticpy.transform.proc_tree_builder import LX_EVENT_SCH # also WIN_EVENT_SCH and MDE_EVENT_SCH are available from copy import copy cust_lx_schema = copy(LX_EVENT_SCH) cust_lx_schema.time_stamp = "TimeStamp" cust_lx_schema.host_name_column = "host" # Note these are used to filter events if you have a data # set that contains mixed event types. cust_lx_schema.event_id_column = None cust_lx_schema.event_id_identifier = None # now supply the schema as the schema parameter process_tree.build_process_tree(auditd_df, schema=cust_lx_schema) You can also supply a schema as a Python ``dict``, with the keys being the generic internal name and the values, the names of the columns in the input data. Both keys and values are strings except where otherwise indicated above. Use :py:meth:`~msticpy.transform.process_tree_schema.ProcSchema.blank_schema_dict` to get a blank schema dictionary. .. code:: python3 from msticpy.transform.proc_tree_schema import ProcSchema ProcSchema.blank_schema_dict() .. parsed-literal:: {'process_name': 'required', 'process_id': 'required', 'parent_id': 'required', 'time_stamp': 'required', 'cmd_line': None, 'path_separator': None, 'user_name': None, 'logon_id': None, 'host_name_column': None, 'parent_name': None, 'target_logon_id': None, 'user_id': None, 'event_id_column': None, 'event_id_identifier': None} The ``time_stamp`` column **should** be a pandas ``Timestamp`` (Python ``datetime``) type. If your data is in another format (e.g. Unix timestamp or date string), the process tree module will try to convert it before building the process tree plot. This uses ``pandas`` to convert to native ``Timestamp`` If the auto-conversion does not work, convert the timestamp field before trying to use the process tree tools. The example below shows extracting the timestamp from the auditd ``mssg_id`` field. .. code:: ipython linux_proc["ts"] = pd.to_numeric(linux_proc["mssg_id"].apply(lambda x: x.split(":")[0])) # the "ts" column is now a fixed-point number # Convert to a pandas timestamp. linux_proc["time_stamp"] = pd.to_datetime(linux_proc.ts, utc=True) # set the converted column as your time_stamp column. cust_lx_schema.time_stamp = "time_stamp"