The LocalData provider
LocalData driver documentation
LocalData data provider is intended primarily for testing or demonstrations
where you may not be able to connect to an online data source reliably.
The data backing this driver can be in the form of a pickled pandas DataFrame or a CSV file. In either case the data is converted to a DataFrame to be returned from the query. Usage of this driver is a little different to most other drivers:
You will need to provide a path to your data files when initializing the query provider (by default it will search in the current folder).
You will also need to provide a query definition file (see following example) that maps the data file names that you are using to query names. The path to search for this is specified in the
query_pathsparameter (see code examples below).
Parameters to queries are ignored.
LocalData Configuration in MSTICPy
You can store your connection details in msticpyconfig.yaml.
The settings in the file should look like the following:
DataProviders: ... LocalData: data_paths: - /home/user1/sample_data - /home/shared/sample_data
Creating a LocalData Query File
To define the queries you need to create a query definition file. This is an example of a LocalData yaml query file. It is similar to the query definition files for other providers but simpler. It only requires a description, the data family that the query should be grouped under and the name of the file containing the data.
metadata: version: 1 description: Local Data Alert Queries data_environments: [LocalData] data_families: [SecurityAlert, WindowsSecurity, Network] tags: ['alert', 'securityalert', 'process', 'account', 'network'] defaults: sources: list_alerts: description: Retrieves list of alerts metadata: data_families: [SecurityAlert] args: query: alerts_list.pkl parameters: list_host_logons: description: List logons on host metadata: data_families: [WindowsSecurity] args: query: host_logons.csv parameters: list_network_alerts: description: List network alerts. args: query: net_alerts.csv parameters:
In this example the value for the “query” is just the file name.
If the queries in your file are a mix of data from different data families,
you can group them by specifying one or more values for
If this isn’t specified for an individual query, it will inherit the setting
data_families in the global
metadata section at the top of the file.
Specifying more than one value for
will add links to the query under each data family grouping. This is to allow
for cases where a query may be relevant to multiple categories.
data_families control only how the queries appear in query provider and
don’t affect any other aspects of the query operation.
In the example shown, the
list_alerts query has been added to the
attribute of the query provider, while
list_host_logons is member of
WindowsSecurity. The entry for
list_network_alerts had no
attribute so inherits the values from the file’s
metadata. Since this has multiple
values, the query is added to all three families.
# Structure of the query attributes added to the query provider qry_prov.list_queries()
Network.list_host_logons Network.list_network_alerts ...<other queries> SecurityAlert.list_alerts SecurityAlert.list_network_alerts ...<other queries> WindowsSecurity.list_host_logons WindowsSecurity.list_network_alerts
Preparing to use the LocalData provider
Collect your data files into one or more directories or directory trees (the default location to search for data file is the current directory). Subdirectories are searched for “.pkl” and “.csv” files but only file names matching your query definitions will loaded.
Create one or more query definition yaml files (following the pattern above) and place these in a directory (this can be the same as the data files). The query provider will load and merge definitions from multiple YAML files.
QueryProvider defaults to searching for data files in the current directory and subdirectories. The default paths for query definition files are a) the built-in package queries path (msticpy/data/queries) and b) any custom paths that you have added to msticpyconfig.yaml (see msticpy Package Configuration).
The query definition files must have a
Loading a QueryProvider for LocalData
This loads a LocalData query provider using configuration defaults.
qry_prov = QueryProvider("LocalData")
Unless you have configured mstipyconfig to look in specific locations for your localdata query and data files, you will need to specify these as parameters to QueryProvider.
data_path = "./my_data" query_path = "./myqueries" qry_prov = QueryProvider("LocalData", data_paths=[data_path], query_paths=[query_path]) # list the queries loaded print(qry_prov.list_queries())
Connecting to LocalData
There is no connection step for the LocalData driver.
Example usage of LocalData driver
# list the queries loaded print(qry_prov.list_queries()) # run a query my_alerts = qry_prov.SecurityAlert.list_alerts() # Specify path to look for data files data_path = "./my_data" qry_prov = QueryProvider("LocalData", data_paths=[data_path]) # Show the schema of the data files read in print(qry_prov.schema) # Specify both data and query locations data_path = "./my_data" query_path = "./myqueries" qry_prov = QueryProvider("LocalData", data_paths=[data_path], query_paths=[query_path]) host_logons_df = qry_prov.WindowsSecurity.list_host_logons() # parameters are accepted but ignored host_logons_df = qry_prov.WindowsSecurity.list_host_logons( start=st_date, end=end_date, host_name="myhost.com", )
Other LocalData Documentation
Built-in Queries for Local Data.
LocalData driver API documentation