Azure Data Explorer/Kusto Provider

The Azure Data Explorer/Kusto QueryProvider uses the azure-kusto-data SDK to connect to Azure Data Explorer clusters and provide query capability.

Warning

This provider replaces an earlier implementation, which used KqlMagic as the underlying data connector. The previous driver is still available, but to use it you must specify Kusto_Legacy as the provider name when creating the QueryProvider instance.

For more information about the previous driver see Azure Data Explorer/Kusto Provider - Legacy Version.

Changes from the previous implementation

  • The driver supports asynchronous execution of queries. This is used when you create a QueryProvider with multiple connections (e.g. to different clusters) and when you split queries into time chunks. See Running a query across multiple connections and Splitting Query Execution into Chunks for more details.

  • The settings format has changed. The existing format is still supported, but with limited functionality.

  • Supports user-specified timeout for queries.

  • Supports proxies (via MSTICPy config or the proxies parameter to the connect method).

  • You could previously specify a new cluster to connect to when executing a query. This is no longer supported. Once the provider is connected to a cluster, it will only execute queries against that cluster. (You can, however, call the connect() function to connect the provider to a new cluster before running the query.)

  • Some of the previous parameters have been deprecated:

    • The mp_az_auth parameter is replaced by auth_types (the former still works but will be removed in a future release).

    • mp_az_auth_tenant_id is replaced by tenant_id (the former is no longer supported).

Kusto Configuration

Kusto Configuration in MSTICPy

You can store your connection details in msticpyconfig.yaml.

For more information on using and configuring msticpyconfig.yaml see msticpy Package Configuration and MSTICPy Settings Editor.

Note

The settings for the new Kusto provider are stored in the KustoClusters section of the configuration file. This section cannot currently be edited from the MSTICPy Settings Editor; edit msticpyconfig.yaml in a text editor to change these settings.

To accommodate the use of multiple clusters, the new provider supports a different configuration format.

The basic settings in the file should look like the following:

KustoClusters:
  ...
  Cluster1:
    Args:
      Cluster: https://uscluster.kusto.windows.net
  Cluster2:
    Args:
      Cluster: https://eucluster.kusto.windows.net
      IntegratedAuth: True  # This is default and is optional

You can have any number of cluster entries in this section.

Specifying additional parameters for a cluster

You can add authentication and other parameters to the Args sub-key of a cluster definition. In the following example, the TenantId is specified along with the client app ID and client secret for client secret authentication.

KustoClusters:
  DataClusterX:
    Args:
      Cluster: https://xxx.kusto.windows.net
      ClientId: 69d28fd7-42a5-48bc-a619-af56397b1111
      TenantId: 69d28fd7-42a5-48bc-a619-af56397b9f28
      ClientSecret:
        KeyVault:

The ClusterDefaults section

If you have parameters that you want to apply to all clusters, you can add these to a ClusterDefaults section.

KustoClusters:
  ClusterDefaults:
    Args:
      TenantId: 69d28fd7-42a5-48bc-a619-af56397b9f28
  Cluster1:
    Args:
      Cluster: https://uscluster.kusto.windows.net
  Cluster2:
    Args:
      Cluster: https://eucluster.kusto.windows.net

Creating ClusterGroups

You can create a group of clusters that you can reference by cluster group name. This is useful if you have clusters in different regions that share the same schema and you want to run the same queries against all of them.

ClusterGroups are used primarily to support query templates, to match queries to the correct cluster. See Writing query templates for Kusto clusters later in this document.
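As an illustration only, a cluster group definition might look like the following fragment. The ClusterGroups key name and its placement alongside Args are assumptions based on the heading above and the other configuration examples in this document; check the MSTICPy configuration reference for the exact format. Group1 is the group name used in the query template example later in this document.

```yaml
KustoClusters:
  Cluster1:
    Args:
      Cluster: https://uscluster.kusto.windows.net
    # Assumed key name and position
    ClusterGroups:
      - Group1
  Cluster2:
    Args:
      Cluster: https://eucluster.kusto.windows.net
    ClusterGroups:
      - Group1
```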

Loading a QueryProvider for Kusto

import msticpy as mp
kql_prov = mp.QueryProvider("Kusto")

Optional parameters

timeout: Query timeout in seconds. The default is 240 seconds (4 minutes) and the maximum is 3600 seconds (1 hour). This can also be set in the connect call (see below) and overridden in query methods.

proxies: Proxy settings for Kusto queries. The dictionary format is {protocol: proxy_url}, where protocol is https, http, etc. and proxy_url can contain optional authentication information in the format “https://username:password@proxy_host:port”. If you have a proxy configuration in msticpyconfig.yaml and you do not want to use it, set this to an empty dictionary. This can be overridden in the connect call (see below).

Note

Proxy settings can also be configured globally in msticpyconfig.yaml in the Proxies key of the msticpy section. This will be used automatically if set unless you override it in the proxies parameter in the connect call. Set to an empty dictionary to disable global proxy settings.
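The proxies dictionary described above can be assembled with the standard library. This is a minimal sketch; build_proxies is a hypothetical helper (not part of MSTICPy), and the host, port and credentials are placeholders. Percent-encoding the credentials is a general URL precaution rather than something the provider documentation requires.

```python
from typing import Dict, Optional
from urllib.parse import quote


def build_proxies(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build a {protocol: proxy_url} dictionary for the proxies parameter."""
    credentials = ""
    if username and password:
        # Percent-encode credentials so characters such as "@" or ":"
        # cannot be mistaken for URL delimiters.
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return {"https": f"https://{credentials}{host}:{port}"}


proxies = build_proxies("proxy_host", 8080, "username", "p@ss:word")
no_proxies: Dict[str, str] = {}  # an empty dictionary disables configured proxies
```

Either dictionary can then be passed as the proxies parameter, for example mp.QueryProvider("Kusto", proxies=proxies).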

Connecting to a Kusto cluster

Before running queries you need to connect to a cluster using the connect() method.

See connect()

The parameters required for connection to a Kusto cluster can be passed to connect() in several ways. You can provide a full connection string or parameters for cluster (and optionally, database). In the latter case, you must have configured settings for the cluster defined in your msticpyconfig.yaml.

If you have the cluster details configured in msticpy, the cluster parameter can be one of the following:

  • The section name (“Cluster1” or “Cluster2” in the configuration example above)

  • The full URL of the cluster (e.g. “https://eucluster.kusto.windows.net” in the example)

  • The host name of the cluster (e.g. “uscluster”, “eucluster” in the example)

In all cases these are case-insensitive.

These are all equivalent:

kql_prov.connect(cluster="Cluster2")
kql_prov.connect(cluster="eucluster")
kql_prov.connect(cluster="https://eucluster.kusto.windows.net")
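The equivalence of these three forms can be sketched as a case-insensitive lookup. The code below is an illustration of the matching rules described above, not the provider's actual implementation; resolve_cluster and the CONFIGURED mapping are hypothetical names.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical stand-in for the configured clusters: section name -> URL.
CONFIGURED: Dict[str, str] = {
    "Cluster1": "https://uscluster.kusto.windows.net",
    "Cluster2": "https://eucluster.kusto.windows.net",
}


def resolve_cluster(cluster: str, configured: Dict[str, str] = CONFIGURED) -> Optional[str]:
    """Return the configured URL matching a section name, host name or full URL."""
    wanted = cluster.strip().casefold().rstrip("/")
    for section, url in configured.items():
        # First label of the host name, e.g. "eucluster".
        host = (urlparse(url).hostname or "").split(".")[0]
        candidates = {section.casefold(), host.casefold(), url.casefold().rstrip("/")}
        if wanted in candidates:
            return url
    # Not configured: the caller would need to supply the full URL.
    return None
```

In this sketch, "Cluster2", "eucluster" and "https://eucluster.kusto.windows.net" all resolve to the same URL, regardless of case.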

If the cluster is not in your configuration you must use the full URL of the cluster.

You can optionally specify a default database to connect to. The database can be changed with each query, either by specifying a database parameter or by using the database metadata property in a query definition file (see Writing query templates for Kusto clusters below).

You can also pass authentication parameters in the connect call:

  • auth_types - to override the configured Azure credential types

  • tenant_id - to override your default tenant_id

kql_prov.connect(
    cluster="Cluster2",
    auth_types=["device_code"],
    tenant_id="69d28fd7-42a5-48bc-a619-af56397b9f28"
)

For more details on Azure Authentication in MSTICPy see Azure Authentication.

Other parameters

timeout: Query timeout in seconds. The default is 240 seconds (4 minutes) and the maximum is 3600 seconds (1 hour). A timeout set in the connect call can be overridden in query methods.

connection_str: A full connection string, including authentication credentials. This can be used instead of the cluster parameter.

Kusto QueryProvider methods and properties

The Kusto QueryProvider has the following methods and properties in addition to those inherited from the base QueryProvider class.

Running Ad Hoc queries

You can run ad hoc queries using the exec_query() method of the QueryProvider.

Note

You usually need to specify a database parameter when running ad hoc queries.
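A minimal sketch of an ad hoc query with an explicit database follows. The run_adhoc wrapper is hypothetical and only exists to make the database argument explicit; the table name DeviceEvents and the database name hostdata are placeholders taken from the examples in this document, not names that exist on any particular cluster.

```python
# Placeholder KQL query; substitute a table that exists in your database.
ADHOC_QUERY = "DeviceEvents | take 10"


def run_adhoc(provider, query: str, database: str):
    """Run an ad hoc KQL query on a connected provider against a named database."""
    # exec_query is the QueryProvider method described above; the database
    # keyword selects the database for this query only.
    return provider.exec_query(query, database=database)
```

With a connected provider, this would be called as run_adhoc(kql_prov, ADHOC_QUERY, database="hostdata").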

Writing query templates for Kusto clusters

The details for configuring and connecting to Kusto clusters are enough to allow you to run ad hoc queries. However, if you want to create and use parameterized queries there are some additional steps that you need to take.

Please read the general section on Creating new queries if you are not familiar with the general process of creating query templates for MSTICPy.

The queries for Kusto work in the same way as for many other data providers except that you can (and should) specify the cluster(s) and database for the query to use.

Controlling which queries are displayed and runnable for a provider

Since Kusto clusters have widely varying schemas, it only makes sense to run a query on a cluster for which it was designed. MSTICPy enforces this by allowing you to specify parameters in both the query template definitions and the cluster configuration in msticpyconfig.yaml that correctly match queries to providers connected to appropriate clusters.

When you first instantiate a Kusto QueryProvider, it will read all query files available for the Kusto DataEnvironment. However, when you connect to a cluster, these queries are filtered so that only ones compatible with this cluster are available.

If you have query definition files (query templates) you can try this by creating a Kusto QueryProvider and running the list_queries() method. Then connect to a cluster and run list_queries() again. In the first case, you should see all queries that you have defined but in the second case, you should only see queries that have been built to run on that cluster.

from msticpy.data import QueryProvider
kql_prov = QueryProvider("Kusto")
kql_prov.list_queries()
# new cell
kql_prov.connect(cluster="Cluster2")
kql_prov.list_queries()

This is explained in more detail in the later section Kusto cluster specifier.

Basic Kusto query format

The query template format for Kusto queries should look like the following. The data_environments item must include “Kusto” in the list of applicable environments.

This example shows the metadata section for a query file, highlighting the items that are specific to Kusto queries. (data_families is common to other query types but has some Kusto-specific usage that is different, as explained later.)

metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [DeviceEvents.hostdata]
    cluster: https://uscluster.kusto.windows.net
    clusters:
    - https://uscluster.kusto.windows.net
    - https://eucluster.kusto.windows.net
    cluster_groups:
    - Group1
    database: hostdata
    tags: ["user"]
defaults:
  parameters:
    table:
      # ....
sources:
    list_host_processes:
      description: Lists all process creations for a host
        # ....

Most of the query file is identical to queries for other drivers. However, the metadata section has some additional items. These are explained in the following sections.

Kusto database specifier

You can use the database item to specify the cluster database to use. For backward compatibility you can also specify this in the data_families entry using a dotted notation. data_families is also used to group queries in the query provider, so using this to specify the database name is not recommended.

The following examples show the different ways of configuring this.

For the following two configurations, the database used is DeviceEvents and the queries are grouped under the hostdata family (the queries are attached as methods to the QueryProvider).

metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [hostdata]
    database: DeviceEvents
    cluster: https://uscluster.kusto.windows.net

# Deprecated format
metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [hostdata.DeviceEvents]
    cluster: https://uscluster.kusto.windows.net

For this configuration the database used is DeviceEvents and the queries will also be grouped under the DeviceEvents container.

# Deprecated format
metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [DeviceEvents]
    cluster: https://uscluster.kusto.windows.net

Note

When using the data_families entry to specify the database name, only the first entry in the list is used for this. Subsequent items still work for creating data query groupings.
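The three configurations above can be summarized in a short sketch. This illustrates the rules as described in this section rather than the driver's actual code; resolve_database is a hypothetical helper, and giving an explicit database item precedence over a dotted data_families entry is an assumption.

```python
from typing import List, Optional, Tuple


def resolve_database(
    data_families: List[str], database: Optional[str] = None
) -> Tuple[str, str]:
    """Return (query group, database) for a query file's metadata."""
    # Only the first data_families entry is considered for the database name.
    first = data_families[0]
    # Deprecated dotted form: "<family>.<database>".
    family, _, dotted_db = first.partition(".")
    if database:
        # Recommended form: an explicit database item (assumed to take precedence).
        return family, database
    # No database item: use the dotted part, or the family name itself.
    return family, dotted_db or family
```

For example, resolve_database(["hostdata"], "DeviceEvents") and resolve_database(["hostdata.DeviceEvents"]) both give the hostdata group and the DeviceEvents database, while resolve_database(["DeviceEvents"]) uses DeviceEvents for both.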

Kusto cluster specifier

Adding a cluster specifier matches queries to the right cluster and prevents a query from being used with a cluster and database for which it was not intended.

You can specify the cluster to use in three ways:

  • Including a cluster_groups item in the metadata section. This is a list of cluster group names that are defined in the msticpyconfig.yaml file. Queries with one or more cluster_groups entries can be used against any of the cluster definitions in msticpyconfig.yaml that have matching cluster group names.

  • Including a clusters item in the metadata section. This is a list of cluster identifiers (URIs, names or configuration section names that are defined in the msticpyconfig.yaml file). These queries can be used with any cluster configuration entry that matches one of the IDs in the clusters item.

  • Including a cluster item in the metadata section. This is a single cluster identifier (URI, name or configuration section name that is defined in the msticpyconfig.yaml file). These queries can only be used with the cluster configuration entry that matches the ID in the cluster item.

The cluster specifiers are used in the order above until a match is found. You can include more than one cluster specifier in a query definition file. If no match is found, the query will not be added to the query provider.

Note

Queries that have no cluster specifier are still added to the query provider but may not work.

Tip

If you want to avoid these queries being added, use the parameter strict_query_match=True when creating the Kusto QueryProvider, as shown in the following example:

import msticpy as mp
kql_prov = mp.QueryProvider("Kusto", strict_query_match=True)

The following examples show the different ways of configuring clusters to match queries:

metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [hostdata]
    cluster_groups:
    - Group1
    database: DeviceEvents

metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [hostdata]
    clusters:
    - https://uscluster.kusto.windows.net
    - https://eucluster.kusto.windows.net
    database: DeviceEvents

metadata:
    version: 1
    description: Kusto Queries
    data_environments: [Kusto]
    data_families: [hostdata]
    cluster: https://uscluster.kusto.windows.net
    database: DeviceEvents

Note

You can also use cluster specifiers (using the same syntax as shown above) in the metadata of individual queries. Each query has its own optional metadata sub-key. Setting cluster specifiers at the query level, with different queries assigned to different clusters in the same file, may make organizing your queries more difficult, so we recommend using cluster specifiers only at the file level. However, it is possible to do this if you need to.

Logical flow used to determine if a query is shown

This flowchart shows the logic applied using the query definition and configuration parameters to determine whether a query is shown or not (i.e. whether it appears in list_queries() and is attached to the QueryProvider as a query function).

Flow chart showing how queries are filtered based on query metadata and configuration settings.
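The same decision can be approximated in code. The sketch below follows the rules given in the Kusto cluster specifier section (cluster_groups, then clusters, then cluster, with strict_query_match controlling queries that have no specifier). It is not the driver's implementation: query_is_shown is a hypothetical function, and the case-insensitive comparison of identifiers is an assumption.

```python
from typing import Any, Dict, Iterable


def query_is_shown(
    metadata: Dict[str, Any],
    cluster_ids: Iterable[str],
    cluster_groups: Iterable[str],
    strict_query_match: bool = False,
) -> bool:
    """Decide whether a query is available on the connected cluster.

    cluster_ids: identifiers of the connected cluster (URL, host name,
        configuration section name).
    cluster_groups: group names assigned to that cluster in the configuration.
    """
    q_groups = metadata.get("cluster_groups") or []
    q_clusters = metadata.get("clusters") or []
    q_cluster = metadata.get("cluster")

    if not (q_groups or q_clusters or q_cluster):
        # No specifier: shown unless strict matching was requested.
        return not strict_query_match

    ids = {c.casefold() for c in cluster_ids}
    groups = {g.casefold() for g in cluster_groups}

    # Specifiers are checked in order until one matches.
    if any(g.casefold() in groups for g in q_groups):
        return True
    if any(c.casefold() in ids for c in q_clusters):
        return True
    return bool(q_cluster) and q_cluster.casefold() in ids
```

A query whose only specifier is cluster: https://uscluster.kusto.windows.net would therefore be hidden when the provider is connected to the eucluster cluster.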

Other Kusto Documentation

For examples of using the Kusto provider, see the sample notebooks Kusto Analysis Notebook and Kusto Ingest Notebook.

Kusto driver API documentation