Plot a Network Graph from DataFrame
===================================

MSTICPy has functions that let you convert a pandas DataFrame into a
`networkx `__ graph or plot it directly as a graph using
`Bokeh `__ interactive plotting. These functions build on underlying
functionality from NetworkX and Bokeh.

You pass the functions the column names for the **source** and **target**
nodes to build a basic graph. You can also name other columns to be
node or edge attributes. When the graph is displayed, these attributes
are visible as popup details courtesy of Bokeh's Hover tool. You can also
opt to use a NetworkX layout other than the default ``spring`` layout.

Note: We use the term "network graph" in this document. This is shortened
to "graph" when used in the context of NetworkX, e.g. "a NetworkX graph".

.. code:: ipython3

    # Import pandas and MSTICPy
    import pandas as pd
    import msticpy as mp

    mp.init_notebook();

    # Read in a DataFrame of process creation events
    proc_df = pd.read_csv("./data/processes_on_host.csv", index_col=0)
    # Extract the file name from the full process path
    proc_df["Process"] = proc_df.NewProcessName.str.extract(r".*\\([^\\]+)")
    proc_df.head(3)

Overview
--------

You can create and display a DataFrame as a network graph using the
MSTICPy pandas accessor :py:meth:`mp_plot.network `.

Below is an example featuring process creation events using the
*SubjectUserName* and *Process* name as nodes. Node and edge attributes
are taken from other DataFrame columns.

.. tip:: Use the **WheelZoom** tool to zoom in and out of the plot
   with the mouse scroll wheel. The button to enable/disable the WheelZoom
   tool is highlighted in the illustration below.

.. code:: ipython3

    proc_df.head(100).mp_plot.network(
        source_col="SubjectUserName",
        target_col="Process",
        source_attrs=["SubjectDomainName", "SubjectLogonId"],
        target_attrs=["NewProcessName", "ParentProcessName", "CommandLine"],
        edge_attrs=["TimeGenerated"],
    )

.. figure:: _static/network-graph-wheelzoom.png
   :alt: Graph plot of accounts and processes showing which account created which processes.

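
To make the mapping from DataFrame columns to nodes, edges and attributes
more concrete, here is a minimal sketch that uses only pandas and NetworkX
(``from_pandas_edgelist`` and ``set_node_attributes``). It is not the MSTICPy
implementation, and the sample rows and values such as ``alice`` and
``CONTOSO`` are hypothetical.

.. code:: python

    import networkx as nx
    import pandas as pd

    # Hypothetical sample data, loosely modelled on process creation events
    events = pd.DataFrame(
        {
            "SubjectUserName": ["alice", "alice", "bob"],
            "Process": ["cmd.exe", "net.exe", "cmd.exe"],
            "SubjectDomainName": ["CONTOSO", "CONTOSO", "CONTOSO"],
            "TimeGenerated": [
                "2019-01-15 05:15",
                "2019-01-15 05:16",
                "2019-01-15 05:20",
            ],
        }
    )

    # Each row becomes an edge from the source column to the target column;
    # the columns listed in edge_attr are stored on that edge
    graph = nx.from_pandas_edgelist(
        events,
        source="SubjectUserName",
        target="Process",
        edge_attr=["TimeGenerated"],
    )

    # Attach a source-node attribute taken from another column
    domains = (
        events.drop_duplicates("SubjectUserName")
        .set_index("SubjectUserName")["SubjectDomainName"]
    )
    nx.set_node_attributes(graph, domains.to_dict(), name="SubjectDomainName")

    print(graph.number_of_nodes(), graph.number_of_edges())
    print(graph.nodes["alice"])
    print(graph.edges["alice", "cmd.exe"])

The two accounts and two process names become four nodes joined by three
edges. The domain name is stored on the account nodes and the timestamp on
each edge, which is the kind of data that hover details can display.
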
Creating a NetworkX Graph from a DataFrame
------------------------------------------

The :py:meth:`mp.to_graph ` accessor generates a NetworkX graph from
the input data. MSTICPy adds this method to DataFrames automatically.

You can supply the following parameters:

- source_col - Column for source nodes.
- target_col - Column for target nodes.
- source_attrs - Optional list of columns to use as source node attributes, by default None
- target_attrs - Optional list of columns to use as target node attributes, by default None
- edge_attrs - Optional list of columns to use as edge attributes, by default None
- graph_type - "graph" or "digraph" (for nx.DiGraph)

It returns a NetworkX graph on which you can perform graph analyses
such as extracting subgraphs, analyzing connectedness, etc.
See the `NetworkX documentation `__ for more details.

Create the NetworkX graph.

.. code:: ipython3

    nx_graph = proc_df.mp.to_graph(
        source_col="SubjectUserName",
        target_col="Process"
    )

Show the number of nodes and edges.

.. code:: ipython3

    print("# nodes:", len(nx_graph.nodes()))
    print("# edges:", len(nx_graph.edges()))

.. parsed-literal::

    # nodes: 65
    # edges: 67

Display a node showing the default attributes.

.. code:: ipython3

    nx_graph.nodes["MSTICAdmin"]

.. parsed-literal::

    {'node_role': 'source', 'node_type': 'SubjectUserName'}

Use the ``nx.neighbors`` function to show the nodes directly connected
to this node.

.. code:: ipython3

    import networkx as nx

    # Show neighbors of a node - which processes were executed by this account
    list(nx.neighbors(nx_graph, "MSTICAdmin"))[:15]

.. parsed-literal::

    ['reg.exe',
     'cmd.exe',
     'rundll32.exe',
     '42424.exe',
     '1234.exe',
     'tsetup.1.exe',
     'netsh.exe',
     'perfc.dat',
     'sdopfjiowtbkjfnbeioruj.exe',
     'doubleextension.pdf.exe',
     'vssadmin.exe',
     'conhost.exe',
     'net.exe',
     'net1.exe',
     'regsvr32.exe']

Adding node and edge attributes.

.. code:: ipython3

    nx_graph = proc_df.mp.to_graph(
        source_col="SubjectUserName",
        target_col="Process",
        source_attrs=["SubjectDomainName", "SubjectLogonId"],
        target_attrs=["NewProcessName", "ParentProcessName", "CommandLine"],
        edge_attrs=["TimeGenerated"],
    )

Display the node with added attributes.

.. code:: ipython3

    nx_graph.nodes["MSTICAdmin"]

.. parsed-literal::

    {'SubjectDomainName': 'MSTICAlertsWin1',
     'SubjectLogonId': '0xfaac27',
     'node_role': 'source',
     'node_type': 'SubjectUserName'}

Instead of using the pandas accessor, you can import and use the underlying
function :py:func:`df_to_networkx `. This has the same functionality as
the pandas accessor method.

.. code:: ipython3

    from msticpy.transform.network import df_to_networkx

    nx_graph = df_to_networkx(
        data=proc_df,
        source_col="SubjectUserName",
        target_col="Process"
    )

Built-in NetworkX Plotting
~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the built-in NetworkX plotting functions, which draw the graph
with matplotlib. You can also use NetworkX functions to export the graph to
a variety of more flexible visualization tools such as GraphViz.

.. code:: ipython3

    nx.draw(nx_graph)

.. figure:: _static/network-graph-nx-plot.png
   :alt: Basic Matplotlib plot of accounts and processes network graph.

Plotting a Network Graph
------------------------

Using Bokeh plotting gives you interactivity as well as a richer, more
informative display.

You can build and plot a graph in a single operation using the
:py:meth:`mp_plot.network ` accessor method.

Use the standard Bokeh tools on the created plot to select nodes and edges,
to zoom and pan around the network graph and to hover over elements to
reveal attribute values.

.. note:: Bokeh graph plotting does not support interactive dragging
   of nodes and recalculation of the layout.

.. code:: ipython3

    proc_df.head(70).mp_plot.network(
        source_col="ParentProcessName",
        target_col="Process"
    )

.. figure:: _static/network-graph1.png
   :alt: Graph plot of accounts and processes showing which account created which processes.

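
Because node positions are fixed once the plot is drawn, the layout is
determined before plotting. The sketch below shows what a NetworkX layout
function produces, using ``spring_layout`` and ``circular_layout`` directly.
The process pairs are hypothetical, and how MSTICPy invokes the layout
function internally is not shown here.

.. code:: python

    import networkx as nx

    # Hypothetical parent/child process pairs
    graph = nx.Graph()
    graph.add_edges_from(
        [
            ("explorer.exe", "cmd.exe"),
            ("cmd.exe", "net.exe"),
            ("cmd.exe", "reg.exe"),
        ]
    )

    # Force-directed layout; a fixed seed makes the positions reproducible
    spring_pos = nx.spring_layout(graph, seed=42)

    # Alternative layout that places all nodes on a circle
    circular_pos = nx.circular_layout(graph)

    # Each layout is a dict mapping a node to its (x, y) coordinates
    for node, (x, y) in spring_pos.items():
        print(f"{node}: ({x:.2f}, {y:.2f})")

Each layout function returns a dictionary of node positions, so changing
the layout only changes where the nodes are drawn, not the graph itself.
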
The ``mp_plot.network`` method has the same options as the ``mp.to_graph``
accessor method:

- source_col - Column for source nodes.
- target_col - Column for target nodes.
- source_attrs - Optional list of columns to use as source node attributes, by default None
- target_attrs - Optional list of columns to use as target node attributes, by default None
- edge_attrs - Optional list of columns to use as edge attributes, by default None
- graph_type - "graph" or "digraph" (for nx.DiGraph)

In this case, they also cause the node and edge attributes to be displayed
on the plot using the Bokeh HoverTool. Moving the mouse cursor over a node
or edge displays its attributes.

Note that the HoverTool lists the attribute names for both source and
target nodes, but only shows values for the attributes that are populated
for the type of node (source or target) under the cursor.

.. code:: ipython3

    proc_df.head(70).mp_plot.network(
        source_col="ParentProcessName",
        target_col="Process",
        source_attrs=["SubjectDomainName", "SubjectLogonId"],
        target_attrs=["NewProcessName", "ParentProcessName", "CommandLine"],
        edge_attrs=["TimeGenerated"],
    )

.. figure:: _static/network-graph2.png
   :alt: Graph plot of accounts and processes showing which account created which processes.

The figure shows the result of hovering over a process node: the popup
lists attributes such as the parent process name and the process command
line.

There are a number of other parameters to control the display of the graph.

- title - Title for the plot, by default 'Data Graph'
- node_size - Size of the nodes in pixels, by default 25
- font_size - Font size for node labels, by default 10.
  Can be an integer (point size) or a string (e.g. "10pt")
- width - Plot width in pixels, by default 800
- height - Plot height (the default is 800)
- plot scale - Position scale (the default is 2)
- hide - Don't show the plot, by default False. If True, just return the figure.
- source_color - The color of the source nodes, by default 'light-blue'
- target_color - The color of the target nodes, by default 'light-green'
- edge_color - The color of the edges, by default 'black'
- \**kwargs - Other keyword arguments are passed to the NetworkX layout function.

References
----------

- `Networkx from_pandas_edgelist `__
- `Bokeh graph visualization `__