Setting up Process Auditing for Linux in Azure Sentinel

This is a provisional set of instructions for the preview release of Azure Sentinel.

Add your Linux VMs to the Log Analytics Workspace

Browse to the Log Analytics blade for your workspace and select the option to configure your Azure virtual machines.

[Screenshot: Connect a data source]

This brings up a list of virtual machines that you can connect to or disconnect from Log Analytics. Click the Connect icon for a VM to install the Log Analytics data collection agent on it.

[Screenshot: Connect a VM to Log Analytics]

Configure Auditing on your Linux VMs

Follow the instructions in Configuring and auditing Linux systems with Audit daemon to install and configure auditd.

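Add audit filter rules to capture process executions (calls to execve and execveat) for both 32-bit (b32) and 64-bit (b64) architectures. As written, these rules record failed as well as successful calls; append -F success=1 to each rule if you only want successful executions. Rules added with auditctl last only until the next reboot, so to keep them also add them (without the auditctl prefix) to a file under /etc/audit/rules.d/.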

$ sudo auditctl -a always,exit -F arch=b32 -S execve,execveat
$ sudo auditctl -a always,exit -F arch=b64 -S execve,execveat

After adding them, auditctl -l should list something like the following (the -w file-watch rules are not required for process auditing):

$ sudo auditctl -l
-w /bin/kmod -p x -k kernelmodules
-w /var/log/audit -p wxa -k audittampering
-w /etc/audit -p wxa -k audittampering
-w /etc/passwd -p wxa -k usergroup
-w /etc/group -p wxa -k usergroup
-w /etc/pam.d -p wxa -k pam
-a always,exit -F arch=b32 -S execve,execveat
-a always,exit -F arch=b64 -S execve,execveat
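If you want to check this from a script rather than by eye, the Python sketch below parses text in the format printed by auditctl -l and confirms that execve is audited for both architectures. The function name has_execve_rules is hypothetical, and the parser assumes the one-rule-per-line layout shown above.

```python
def has_execve_rules(auditctl_output):
    """Return True if execve is audited for both the b32 and b64 architectures.

    Hypothetical helper: assumes the one-rule-per-line format of `auditctl -l`.
    """
    covered = set()
    for line in auditctl_output.splitlines():
        tokens = line.split()
        # Only "always" syscall rules ("-a always,exit ...") matter here;
        # file-watch rules ("-w ...") and "never" rules are skipped
        if len(tokens) < 2 or tokens[0] != "-a" or "always" not in tokens[1].split(","):
            continue
        arch = None
        syscalls = set()
        # Walk consecutive (flag, value) pairs such as ("-F", "arch=b64")
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "-F" and value.startswith("arch="):
                arch = value.split("=", 1)[1]
            elif flag == "-S":
                syscalls.update(value.split(","))
        if arch is not None and "execve" in syscalls:
            covered.add(arch)
    return {"b32", "b64"} <= covered


sample_rules = """\
-w /etc/passwd -p wxa -k usergroup
-a always,exit -F arch=b32 -S execve,execveat
-a always,exit -F arch=b64 -S execve,execveat
"""
print(has_execve_rules(sample_rules))
```

On the VM itself you could pass it subprocess.run(["auditctl", "-l"], capture_output=True, text=True).stdout, run as root.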

For more background, see Scott Pack’s blog post auditd By Example - Monitoring Process Execution.

After a few minutes (or hours, depending on how busy your hosts are), save a sample from your audit log. A few hundred lines should be enough.

$ sudo tail -500 /var/log/audit/audit.log > ~/auditsample.txt

You will need to copy this file to wherever you run the next step from (the machine you use to browse the Azure portal).
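If you would rather take the sample from a Python script, a deque with maxlen gives the same result as tail -500 without holding the whole log in memory. This is a minimal sketch; the function names last_lines and save_sample are hypothetical, and reading /var/log/audit/audit.log still requires root.

```python
from collections import deque
from pathlib import Path


def last_lines(lines, count=500):
    """Return the last `count` items of an iterable of lines (like `tail -n`)."""
    # A deque with maxlen discards older lines as newer ones arrive
    return list(deque(lines, maxlen=count))


def save_sample(src, dest, count=500):
    """Write the last `count` lines of the file `src` to the file `dest`."""
    with open(src, "r", encoding="utf-8", errors="replace") as log_file:
        sample = last_lines(log_file, count)
    Path(dest).write_text("".join(sample), encoding="utf-8")
```

For example, save_sample("/var/log/audit/audit.log", "auditsample.txt") run as root writes the same sample as the tail command.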

Add auditd as a Custom Log in Log Analytics

Go back to your Log Analytics configuration blade and choose the “Windows, Linux and other sources” option.

[Screenshot: Connect a custom source]

Now add a custom log type.

[Screenshot: Create a custom log definition]

Click the Add+ button and follow the steps:

  1. Upload your audit log sample

  2. Select New Line as the record delimiter

  3. Select Linux as the type and add the path to the audit log: /var/log/audit/audit.log

  4. Add a name (e.g. Auditlog) and a description. Log Analytics appends the _CL suffix to custom log names, so the table will be called Auditlog_CL.
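Because auditd writes each record on a single line, New Line is the natural record delimiter: every line of the sample becomes one record in the custom log. Before uploading, you may want to check that the sample actually contains SYSCALL and EXECVE records. The sketch below counts record types; the function name and the sample lines are made up for illustration, and it assumes the usual type=... msg=audit(...) line layout.

```python
import re
from collections import Counter

# Each auditd record is one line starting with "type=<RECORD_TYPE>"
RECORD_TYPE = re.compile(r"^type=(\S+)")


def count_record_types(lines):
    """Count auditd record types, treating each line as one record."""
    counts = Counter()
    for line in lines:
        match = RECORD_TYPE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample_lines = [
    'type=SYSCALL msg=audit(1560000000.123:456): arch=c000003e syscall=59 success=yes exit=0',
    'type=EXECVE msg=audit(1560000000.123:456): argc=2 a0="ls" a1="-l"',
    'type=CWD msg=audit(1560000000.123:456): cwd="/root"',
]
print(count_record_types(sample_lines).most_common())
```

To check your real sample, pass the open file instead: with open("auditsample.txt") as f: print(count_record_types(f).most_common()).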

After a while (logs are harvested every hour), the new log should appear under Custom Logs in Log Analytics.

[Screenshot: Audit data in Azure Sentinel table]

At this stage the logs are both verbose (a single process creation event, for example, results in five or more audit records) and hard to read for a casual browser. Several fields are hex encoded (to avoid problems with embedded spaces and special characters), and the timestamp of the actual event is a Unix timestamp (seconds since 1 January 1970 UTC), as opposed to the TimeGenerated field, which records the log ingestion time. You can use audit tools such as aureport to decode and make sense of the logs.
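The event time lives in each record's msg=audit(seconds.milliseconds:serial) identifier, and all records belonging to one event share the same identifier. The Python sketch below shows how that identifier maps to a UTC datetime and a serial number. The function name parse_audit_id and the sample record are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# msg=audit(<seconds since epoch>.<milliseconds>:<serial number>)
AUDIT_ID = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")


def parse_audit_id(record):
    """Return (event time as a UTC datetime, serial number) for one record."""
    match = AUDIT_ID.search(record)
    if match is None:
        raise ValueError("no msg=audit(...) identifier found")
    seconds, millis, serial = (int(part) for part in match.groups())
    event_time = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return event_time + timedelta(milliseconds=millis), serial


record = 'type=EXECVE msg=audit(1560000000.123:456): argc=2 a0="ls" a1="-l"'
event_time, serial = parse_audit_id(record)
print(event_time.isoformat(), serial)  # 2019-06-08T13:20:00.123000+00:00 456
```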

The msticpy library contains a module to decode and reorganize auditd logs from Log Analytics.

Reading Audit Data from Log Analytics

We can do part of the work using Kusto Query Language (KQL). This example uses a KQL query executed by KqlMagic in a Python (Jupyter) notebook, and assumes that KqlMagic is already loaded and connected to your workspace.

 1  linux_events = r'''
 2  Auditlog_CL  // the custom log table created above
 3  | where Computer has '{hostname}'
 4  | where TimeGenerated >= datetime({start})
 5  | where TimeGenerated <= datetime({end})
 6  | extend mssg_parts = extract_all(@"type=(?P<type>[^\s]+)\s+msg=audit\((?P<mssg_id>[^)]+)\):\s+(?P<mssg>[^\r]+)\r?",
 7      dynamic(['type', 'mssg_id', 'mssg']), RawData)
 8  | extend mssg_type = tostring(mssg_parts[0][0]), mssg_id = tostring(mssg_parts[0][1])
 9  | project TenantId, TimeGenerated, Computer, mssg_type, mssg_id, mssg_parts
10  | extend mssg_content = split(mssg_parts[0][2],' ')
11  | extend typed_mssg = pack(mssg_type, mssg_content)
12  | summarize AuditdMessage = makelist(typed_mssg) by TenantId,
13      TimeGenerated, Computer, mssg_id
14  '''.format(start=host1_q_times.start, end=host1_q_times.end,
15          hostname=security_alert.hostname)  # host1_q_times and security_alert are defined elsewhere (not shown)
16  print('getting data...')
17  %kql -query linux_events
18  linux_events_df = _kql_raw_result_.to_dataframe()
19  print(f'{len(linux_events_df)} raw auditd mssgs downloaded')

Some of the more involved lines of the query work as follows:

  • lines 6-8: split the RawData field into the message type, the message ID (which holds the event timestamp and serial number) and the message body

  • line 9: drop the columns that are no longer needed (including RawData)

  • line 10: split the message body into an array of key=value strings

  • line 11: pack the message type and list of contents into a dictionary {‘Type’: [k1=v1, k2=v2…]}

  • lines 12-13: group by message ID (together with TenantId, TimeGenerated and Computer) and collect the individual typed_mssg dictionaries into a list
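To make the reshaping concrete, here is a rough Python equivalent of what the query does to each RawData line, reusing the same regular expression: extract the type, message ID and body, split the body on spaces, pack it as {type: [key=value, ...]}, and group by message ID. It is only an illustration for experimenting with a local text sample such as auditsample.txt, not a replacement for the query. The function name group_audit_messages is hypothetical, and unlike the query it groups by message ID alone.

```python
import re
from collections import defaultdict

# Same pattern as the KQL extract_all() call; Python also supports (?P<name>...)
AUDIT_LINE = re.compile(
    r"type=(?P<type>[^\s]+)\s+msg=audit\((?P<mssg_id>[^)]+)\):\s+(?P<mssg>[^\r]+)\r?"
)


def group_audit_messages(raw_lines):
    """Group auditd lines into {mssg_id: [{type: [key=value, ...]}, ...]}."""
    grouped = defaultdict(list)
    for raw in raw_lines:
        # Strip line endings so they do not end up inside the message body
        match = AUDIT_LINE.search(raw.rstrip("\r\n"))
        if match is None:
            continue  # skip lines that are not auditd records
        # Like split(mssg, ' '): one "key=value" string per element
        content = match.group("mssg").split(" ")
        # Like pack(mssg_type, mssg_content): a one-entry dictionary
        grouped[match.group("mssg_id")].append({match.group("type"): content})
    return dict(grouped)


sample = [
    'type=SYSCALL msg=audit(1560000000.123:456): arch=c000003e syscall=59 success=yes',
    'type=EXECVE msg=audit(1560000000.123:456): argc=2 a0="ls" a1="-l"',
    'type=LOGIN msg=audit(1560000001.000:457): pid=100 uid=0 old-ses=4294967295 ses=3',
]
for mssg_id, messages in group_audit_messages(sample).items():
    print(mssg_id, messages)
```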

The msticpy auditdextract module then does the rest of the processing, as follows. Note that with large data sets this can take some time.

from msticpy.transform.auditdextract import extract_events_to_df, get_event_subset
linux_events_all = extract_events_to_df(linux_events_df, verbose=True)

The call to extract_events_to_df() does the following:

  • splits the key=value strings into separate fields

  • hex-decodes any encoded strings

  • converts integer fields to int

  • for SYSCALL/EXECVE rows, does some extra processing to identify the executable that ran and reassemble the command-line arguments

  • extracts the real event timestamp and uses it to replace the original TimeGenerated column (which holds only the log import time, not the event time that we are after)
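For a feel of what some of these steps involve, the sketch below handles a single EXECVE message body: it splits the key=value pairs, converts a few integer fields, and rebuilds the command line from the a0, a1, ... arguments, hex-decoding unquoted values. This is not msticpy's implementation. The function names and the INT_FIELDS set are hypothetical, and it ignores the chunked aN_len/aN[i] form auditd uses for very long arguments.

```python
# Hypothetical subset of integer-valued fields, for illustration only
INT_FIELDS = {"argc", "pid", "ppid", "uid", "gid", "auid", "ses"}


def parse_fields(message):
    """Split an auditd message body ('k1=v1 k2=v2 ...') into a dictionary."""
    fields = {}
    # Splitting on spaces is safe for EXECVE records because auditd
    # hex-encodes any argument that contains a space
    for item in message.split(" "):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # skip tokens that are not key=value pairs
        fields[key] = int(value) if key in INT_FIELDS else value
    return fields


def decode_arg(value):
    """Return plain text: strip double quotes, or hex-decode unquoted values."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return bytes.fromhex(value).decode("utf-8", errors="replace")


def rebuild_cmdline(execve_fields):
    """Reassemble the command line from the a0 .. a(argc-1) arguments."""
    args = [decode_arg(execve_fields[f"a{i}"]) for i in range(execve_fields["argc"])]
    return " ".join(args)


fields = parse_fields('argc=3 a0="ls" a1="-l" a2=2F746D702F6D7920646972')
print(rebuild_cmdline(fields))  # ls -l /tmp/my dir
```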

This example splits out process creation and login events into two separate DataFrames:

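# Process creation events
lx_proc_create = get_event_subset(linux_events_all, 'SYSCALL_EXECVE')
print(f'{len(lx_proc_create)} Process Create Events')

# Login events: inner-join LOGIN events with CRED_ACQ events whose ses, pid and uid
# match the LOGIN event's old-ses, pid and uid, then drop the duplicated columns
# and any columns that contain no data
lx_login = (get_event_subset(linux_events_all, 'LOGIN')
        .merge(get_event_subset(linux_events_all, 'CRED_ACQ'),
            left_on=['old-ses', 'pid', 'uid'],
            right_on=['ses', 'pid', 'uid'],
            suffixes=('', '_cred')).drop(['old-ses', 'TenantId_cred',
                                            'Computer_cred'], axis=1)
        .dropna(axis=1, how='all'))
print(f'{len(lx_login)} Login Events')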

You can also use the auditdextract module to parse raw auditd text logs. See the module help for more information.