IoC Extraction

The IoCExtract class lets you extract IoC (Indicator of Compromise) patterns from a string or a DataFrame. Several patterns are built into the class, and you can override these or supply new ones.

# Imports
import sys
MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)

from IPython.display import display, HTML
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)
# Load test data
process_tree = pd.read_csv('data/process_tree.csv')
process_tree[['CommandLine']].head()
CommandLine
0 .\ftp -s:C:\RECYCLER\xxppyy.exe
1 .\reg not /domain:everything that /sid:shines is /krbtgt:golden !
2 cmd /c "systeminfo && systeminfo"
3 .\rundll32 /C 12345.exe
4 .\rundll32 /C c:\users\MSTICAdmin\12345.exe

Looking for IoCs in a String

Just pass the string as a parameter to the extract() method.

Get a commandline from our data set.

# get a commandline from our data set
cmdline = process_tree['CommandLine'].loc[78]
cmdline
'netsh  start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\\Users\\user\\AppData\\Local\\Temp\\bzzzzzz.txt'

Instantiate an IoCExtract instance and pass the string to the extract() method.

# Instantiate an IoCExtract object
from msticpy.sectools import IoCExtract
ioc_extractor = IoCExtract()

# any IoCs in the string?
iocs_found = ioc_extractor.extract(cmdline)

if iocs_found:
    print('\nPotential IoCs found in alert process:')
    display(iocs_found)
Potential IoCs found in alert process:
defaultdict(set,
            {'ipv4': {'1.2.3.4'},
             'windows_path': {'C:\\Users\\user\\AppData\\Local\\Temp\\bzzzzzz.txt'}})

The following IoC patterns are searched for:

  • ipv4
  • ipv6
  • dns
  • url
  • windows_path
  • linux_path
  • md5_hash
  • sha1_hash
  • sha256_hash
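To make the mechanism concrete, here is a minimal, stand-alone sketch of regex-based IoC extraction from a string. It is not msticpy's implementation: the function extract_iocs and the SIMPLE_PATTERNS dictionary are hypothetical. The ipv4 pattern is the built-in one without its named group, the md5_hash pattern is copied from the predefined pattern list, and the re.I | re.X | re.M flags are the ones this section says IoCExtract uses. Like the extract() output shown earlier, it returns a defaultdict of sets keyed by IoC type.

```python
import re
from collections import defaultdict

# re.I | re.X | re.M: the flag combination this section says IoCExtract uses
FLAGS = re.I | re.X | re.M

# Hypothetical, reduced pattern set (ipv4 drops the built-in named group;
# md5_hash is copied from the predefined pattern list)
SIMPLE_PATTERNS = {
    "ipv4": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
    "md5_hash": r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])",
}


def extract_iocs(text, patterns=SIMPLE_PATTERNS):
    """Return a defaultdict mapping IoC type -> set of observables found in text."""
    results = defaultdict(set)
    for ioc_type, pattern in patterns.items():
        for match in re.finditer(pattern, text, FLAGS):
            # Use the first non-empty named group if there is one, else the whole match
            named = [value for value in match.groupdict().values() if value]
            results[ioc_type].add(named[0] if named else match.group(0))
    return results


found = extract_iocs(
    "netsh capture IPv4.Address=1.2.3.4 hash 81ed03caf6901e444c72ac67d192fb9c"
)
print(dict(found))
```

The named-group preference matters for patterns such as md5_hash, whose outer groups consume a neighbouring non-hex character that should not be part of the observable.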

Using a DataFrame as Input

You can pass a DataFrame to IoCExtract.extract() using the data= parameter. Use the columns parameter to specify which column or columns you want to search.

Note

When searching a DataFrame, the windows_path and linux_path types are not included in the search by default, because they are likely to produce a high volume of results and many false-positive matches. You can include them by specifying include_paths=True as a parameter to extract().

You can also use the ioc_types parameter to explicitly list the IoC types that you want to search for. This should be a list of strings, each naming a valid type (see the ioc_types attribute of IoCExtract).

ioc_extractor = IoCExtract()
ioc_df = ioc_extractor.extract(data=process_tree, columns=['CommandLine'])
if len(ioc_df):
    display(HTML("<h3>IoC patterns found in process tree.</h3>"))
    display(ioc_df)

IoC patterns found in process tree.

IoCType Observable SourceIndex
48 windows_path .\powershell 36
49 url http://somedomain/best-kitten-names-1.jpg' 37
53 windows_path .\pOWErS^H^ElL^.eX^e^ 37
58 md5_hash 81ed03caf6901e444c72ac67d192fb9c 44
59 url http://badguyserver/pwnme" 46
68 windows_path .\reg query add mscfile\\\\open 59
72 windows_path \system\CurrentControlSet\Control\Terminal 63
92 ipv4 1.2.3.4 78
108 ipv4 127.0.0.1 102
109 url http://127.0.0.1/ 102
110 windows_path \SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost\MyNastySvcHostConfig 103
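As a rough model of the DataFrame behaviour described above (one output row per match, path types skipped unless include_paths=True, optional ioc_types filter), the following sketch searches a mapping of row index to text and returns (IoCType, Observable, SourceIndex) tuples. It uses only the standard library, so a dict stands in for the searched column. The names extract_rows, PATTERNS and PATH_TYPES are hypothetical, the patterns are deliberately simplified, and letting an explicit ioc_types list take precedence over include_paths is an assumption of this sketch, not documented msticpy behaviour.

```python
import re

FLAGS = re.I | re.X | re.M

# Hypothetical, simplified patterns for illustration only
PATTERNS = {
    "ipv4": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
    "url": r"https?://[^\s'\"]+",
    "windows_path": r"[a-z]:\\[^\s]+",
}
# Types skipped by default when searching tabular data
PATH_TYPES = {"windows_path", "linux_path"}


def extract_rows(rows, ioc_types=None, include_paths=False):
    """Search {source_index: text}; return (IoCType, Observable, SourceIndex) tuples."""
    if ioc_types:
        # Assumption: an explicit list of types is used as-is
        wanted = [t for t in ioc_types if t in PATTERNS]
    else:
        wanted = [t for t in PATTERNS if include_paths or t not in PATH_TYPES]
    results = []
    for source_index, text in rows.items():
        for ioc_type in wanted:
            for match in re.finditer(PATTERNS[ioc_type], text, FLAGS):
                results.append((ioc_type, match.group(0), source_index))
    return results


rows = {
    78: r"netsh start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\bz.txt",
    102: "curl http://127.0.0.1/",
}
print(extract_rows(rows))
print(extract_rows(rows, include_paths=True))
print(extract_rows(rows, ioc_types=["url"]))
```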

IoCExtract API

See the IoCExtract class documentation.

Predefined Regex Patterns

from html import escape
extractor = IoCExtract()

for ioc_type, pattern in extractor.ioc_types.items():
    esc_pattern = escape(pattern.comp_regex.pattern)
    display(HTML(f'<b>{ioc_type}</b>'))
    display(HTML(f'<div style="margin-left:20px"><pre>{esc_pattern}</pre></div>'))
IoCType Regex
ipv4
(?P<ipaddress>(?:[0-9]{1,3}\\.){3}[0-9]{1,3})
ipv6
(?<![:.\\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\\w])
dns
((?=[a-z0-9-]{1,63}\\.)[a-z0-9]+(-[a-z0-9]+)*\\.){2,}[a-z]{2,63}
url
(?P<protocol>(https?|ftp|telnet|ldap|file)://)
(?P<userinfo>([a-z0-9-._~!$&\\'()*+,;=:]|%[0-9A-F]{2})*@)?
(?P<host>([a-z0-9-._~!$&\\'()*+,;=]|%[0-9A-F]{2})*)
windows_path
(?P<root>[a-z]:|\\\\\\\\[a-z0-9_.$-]+||[.]+)
(?P<folder>\\\\(?:[^\\/:*?"\\\'<>|\\r\\n]+\\\\)*)
(?P<file>[^\\\\/*?""<>|\\r\\n ]+)
linux_path
(?P<root>/+||[.]+)
(?P<folder>/(?:[^\\\\/:*?<>|\\r\\n]+/)*)
(?P<file>[^/\\0<>|\\r\\n ]+)
md5_hash
(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])
sha1_hash
(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{40})(?:$|[^A-Fa-f0-9])
sha256_hash
(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{64})(?:$|[^A-Fa-f0-9])
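The (?:^|[^A-Fa-f0-9]) and (?:$|[^A-Fa-f0-9]) wrappers on the hash patterns are what stop md5_hash from matching a 32-character slice of a longer SHA-256 value. The sketch below checks this using the md5_hash and sha256_hash patterns copied from the list above (they contain no backslashes, so nothing needs un-escaping). The input line is made up, and the hash values are generated with hashlib purely as realistic test data.

```python
import hashlib
import re

FLAGS = re.I | re.X | re.M

# md5_hash and sha256_hash patterns as listed above
MD5_PATTERN = r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])"
SHA256_PATTERN = r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{64})(?:$|[^A-Fa-f0-9])"

md5_value = hashlib.md5(b"example").hexdigest()        # 32 hex characters
sha256_value = hashlib.sha256(b"example").hexdigest()  # 64 hex characters
line = f"sha256={sha256_value} md5={md5_value}"

md5_hits = [m.group("hash") for m in re.finditer(MD5_PATTERN, line, FLAGS)]
sha256_hits = [m.group("hash") for m in re.finditer(SHA256_PATTERN, line, FLAGS)]

# The SHA-256 value is not reported as an MD5 hash: the character after any
# 32-digit slice of it is another hex digit, so the trailing wrapper fails
print(md5_hits)
print(sha256_hits)
```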

Adding your own pattern(s)

See add_ioc_type

Use add_ioc_type to add an IoC type, and the regular expression used to find it, to the built-in set.

Warning

Adding an ioc_type that already exists in the internal set overwrites that item.

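Regular expressions are compiled with re.I | re.X | re.M (ignore case, verbose and multiline). Because of re.X, unescaped whitespace in your pattern is ignored, so use \s or an escaped space where you need to match whitespace.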
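The following sketch shows the effect of the verbose flag on two characters that often catch people out when writing a custom pattern: a literal space and a #. The sample text is made up; only the flag combination comes from this section.

```python
import re

FLAGS = re.I | re.X | re.M
text = "Named pipe issue#42"

# Under re.X the space in the pattern is ignored, so this looks for "Namedpipe"
unescaped_space = re.search(r"Named pipe", text, FLAGS)
# Escaping the space (or using \s) keeps it significant
escaped_space = re.search(r"Named\ pipe", text, FLAGS)

# Under re.X an unescaped "#" starts a comment, so only "issue" is matched
unescaped_hash = re.search(r"issue#\d+", text, FLAGS)
escaped_hash = re.search(r"issue\#\d+", text, FLAGS)

print(unescaped_space)
print(escaped_space.group(0), "|", unescaped_hash.group(0), "|", escaped_hash.group(0))
```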

add_ioc_type parameters:

  • ioc_type (str): a unique name for the IoC type
  • ioc_regex (str): a regular expression used to search for the type
import re
# Define the pattern once and compile it locally to check that it is a valid regex
pipe_regex = r'(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)'
re.compile(pipe_regex, re.I | re.X | re.M)
extractor.add_ioc_type(ioc_type='win_named_pipe', ioc_regex=pipe_regex)

# Check that it added ok
print(extractor.ioc_types['win_named_pipe'])

# Use it in our data set (call extract() on the same instance that has the new type)
extractor.extract(data=process_tree, columns=['CommandLine']).query("IoCType == 'win_named_pipe'")
IoCPattern(ioc_type='win_named_pipe', comp_regex=re.compile('(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)', re.IGNORECASE|re.MULTILINE|re.VERBOSE), priority=0)
IoCType Observable SourceIndex
116 win_named_pipe \\.\pipe\blahtest" 107
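The add_ioc_type behaviour described above (patterns compiled with re.I | re.X | re.M, and an existing ioc_type silently replaced) can be modelled with a small class. MiniIoCRegistry is hypothetical and is not msticpy's IoCExtract; it only mirrors the documented add_ioc_type(ioc_type, ioc_regex) parameters and reuses the win_named_pipe pattern from the example above.

```python
import re
from typing import Dict, List

FLAGS = re.I | re.X | re.M


class MiniIoCRegistry:
    """Hypothetical stand-in that mimics the add_ioc_type behaviour described above."""

    def __init__(self) -> None:
        # One built-in type to start with (a simplified ipv4 pattern)
        self.ioc_types: Dict[str, re.Pattern] = {
            "ipv4": re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}", FLAGS),
        }

    def add_ioc_type(self, ioc_type: str, ioc_regex: str) -> None:
        # Compiling first means an invalid regex raises re.error before anything
        # is stored; re-using an existing name simply replaces the old pattern
        self.ioc_types[ioc_type] = re.compile(ioc_regex, FLAGS)

    def find(self, ioc_type: str, text: str) -> List[str]:
        return [m.group(0) for m in self.ioc_types[ioc_type].finditer(text)]


registry = MiniIoCRegistry()
registry.add_ioc_type("win_named_pipe", r"(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)")
print(registry.find("win_named_pipe", r"echo x > \\.\pipe\blahtest"))

# Overwriting the existing "ipv4" entry: only 10.x.x.x addresses now match
registry.add_ioc_type("ipv4", r"10\.(?:[0-9]{1,3}\.){2}[0-9]{1,3}")
print(registry.find("ipv4", "1.2.3.4 and 10.0.0.1"))
```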

extract_df()

extract_df() behaves identically to extract() called with the data parameter. It may be more convenient to use when you know that your input is a DataFrame.

ioc_extractor.extract_df(process_tree, columns=['NewProcessName', 'CommandLine']).head(10)

Merging output with source data

The SourceIndex column allows you to merge the results with the input DataFrame. Where an input row has multiple IoC matches, the merge produces duplicate rows from the input (one per IoC match). The previous index is preserved in the second column (and in the SourceIndex column).

Note: you will need to set the type of the SourceIndex column. In the example below, we are matching with the default numeric index, so we force the type to be numeric. If your index has a different dtype, you will need to convert the SourceIndex column (dtype=object) to match the type of your index column.

input_df = process_tree.head(20)
output_df = ioc_extractor.extract(data=input_df, columns=['NewProcessName', 'CommandLine'])
# set the type of the SourceIndex column. In this case we are matching with the default numeric index.
output_df['SourceIndex'] = pd.to_numeric(output_df['SourceIndex'])
merged_df = pd.merge(left=input_df, right=output_df, how='outer', left_index=True, right_on='SourceIndex')
merged_df.head()
TenantId Account EventID TimeGenerated Computer SubjectUserSid SubjectUserName SubjectDomainName SubjectLogonId NewProcessId NewProcessName TokenElevationType ProcessId CommandLine ParentProcessName TargetLogonId SourceComputerId TimeCreatedUtc NodeRole Level ProcessId1 NewProcessId1 IoCType Observable SourceIndex
0 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:15.677 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x1580 C:\Diagnostics\UserTmp\ftp.exe %%1936 0xbc8 .\ftp -s:C:\RECYCLER\xxppyy.exe C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:15.677 source 0 nan nan nan nan 0
1 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.167 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x16fc C:\Diagnostics\UserTmp\reg.exe %%1936 0xbc8 .\reg not /domain:everything that /sid:shines is /krbtgt:golden ! C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.167 sibling 1 nan nan nan nan 1
2 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.277 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x1700 C:\Diagnostics\UserTmp\cmd.exe %%1936 0xbc8 cmd /c "systeminfo && systeminfo" C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.277 sibling 1 nan nan nan nan 2
3 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.340 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x1728 C:\Diagnostics\UserTmp\rundll32.exe %%1936 0xbc8 .\rundll32 /C 12345.exe C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.340 sibling 1 nan nan nan nan 3
4 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.400 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x175c C:\Diagnostics\UserTmp\rundll32.exe %%1936 0xbc8 .\rundll32 /C c:\users\MSTICAdmin\12345.exe C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.400 sibling 1 nan nan nan nan 4
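To see the merge logic without pandas, the sketch below performs a left-style join by hand: every source row is kept, a row with several IoC matches is repeated once per match, and SourceIndex values that arrive as strings (standing in for the object dtype) are converted to int before joining. Without that conversion the string "78" would never equal the integer index 78, which is the mismatch the note above warns about. The data and the left_join_iocs function are hypothetical.

```python
# Hypothetical data: input rows keyed by an integer index, and IoC matches whose
# SourceIndex values are strings (standing in for an object-dtype column)
source_rows = {
    0: r".\ftp -s:C:\RECYCLER\xxppyy.exe",
    78: "netsh start capture=yes IPv4.Address=1.2.3.4",
    102: "curl http://127.0.0.1/",
}
ioc_matches = [
    {"IoCType": "ipv4", "Observable": "1.2.3.4", "SourceIndex": "78"},
    {"IoCType": "ipv4", "Observable": "127.0.0.1", "SourceIndex": "102"},
    {"IoCType": "url", "Observable": "http://127.0.0.1/", "SourceIndex": "102"},
]


def left_join_iocs(rows, matches):
    """Keep every source row; repeat a row once per IoC match that points at it."""
    by_index = {}
    for match in matches:
        # Convert SourceIndex to the index's type (int here) before joining
        by_index.setdefault(int(match["SourceIndex"]), []).append(match)
    merged = []
    for index, command_line in rows.items():
        # Rows with no match still appear once, with empty IoC fields
        for match in by_index.get(index, [None]):
            merged.append({
                "index": index,
                "CommandLine": command_line,
                "IoCType": match["IoCType"] if match else None,
                "Observable": match["Observable"] if match else None,
            })
    return merged


for row in left_join_iocs(source_rows, ioc_matches):
    print(row["index"], row["IoCType"], row["Observable"])
```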

IPython magic

You can use the line magic %ioc or the cell magic %%ioc to extract IoCs from text pasted directly into a cell.

The ioc magic supports the following options:

--out OUT, -o OUT
    The name of the variable in which to return the results.
    Note: the output variable is a dictionary of IoCs grouped by IoC type.
--ioc_types IOC_TYPES, -i IOC_TYPES
    The types of IoC to search for (comma-separated string).
%%ioc --out ioc_capture
netsh  start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\AppData\Local\Temp\bzzzzzz.txt
hostname    customers-service.ddns.net              Feb 5, 2020, 2:20:35 PM         7
URL https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password         Feb 5, 2020, 2:20:35 PM         1
hostname    mobile.phonechallenges-submit.site              Feb 5, 2020, 2:20:35 PM         8
hostname    youtube.service-activity-checkup.site           Feb 5, 2020, 2:20:35 PM         8
hostname    www.drive-accounts.com          Feb 5, 2020, 2:20:35 PM         7
hostname    google.drive-accounts.com               Feb 5, 2020, 2:20:35 PM         7
domain      niaconucil.org          Feb 5, 2020, 2:20:35 PM         11
domain      isis-online.net         Feb 5, 2020, 2:20:35 PM         11
domain      bahaius.info            Feb 5, 2020, 2:20:35 PM         11
domain      w3-schools.org          Feb 5, 2020, 2:20:35 PM         12
domain      system-services.site            Feb 5, 2020, 2:20:35 PM         11
domain      accounts-drive.com              Feb 5, 2020, 2:20:35 PM         8
domain      drive-accounts.com              Feb 5, 2020, 2:20:35 PM         10
domain      service-issues.site             Feb 5, 2020, 2:20:35 PM         8
domain      two-step-checkup.site           Feb 5, 2020, 2:20:35 PM         8
domain      customers-activities.site               Feb 5, 2020, 2:20:35 PM         11
domain      seisolarpros.org                Feb 5, 2020, 2:20:35 PM         11
domain      yah00.site              Feb 5, 2020, 2:20:35 PM         4
domain      skynevvs.com            Feb 5, 2020, 2:20:35 PM         11
domain      recovery-options.site           Feb 5, 2020, 2:20:35 PM         4
domain      malcolmrifkind.site             Feb 5, 2020, 2:20:35 PM         8
domain      instagram-com.site              Feb 5, 2020, 2:20:35 PM         8
domain      leslettrespersanes.net          Feb 5, 2020, 2:20:35 PM         11
domain      software-updating-managers.site         Feb 5, 2020, 2:20:35 PM         8
domain      cpanel-services.site            Feb 5, 2020, 2:20:35 PM         8
domain      service-activity-checkup.site           Feb 5, 2020, 2:20:35 PM         7
domain      inztaqram.ga            Feb 5, 2020, 2:20:35 PM         8
domain      unirsd.com              Feb 5, 2020, 2:20:35 PM         8
domain      phonechallenges-submit.site             Feb 5, 2020, 2:20:35 PM         7
domain      acconut-verify.com              Feb 5, 2020, 2:20:35 PM         11
domain      finance-usbnc.info              Feb 5, 2020, 2:20:35 PM         8
FileHash-MD5        542128ab98bda5ea139b169200a50bce                Feb 5, 2020, 2:20:35 PM         3
FileHash-MD5        3d67ce57aab4f7f917cf87c724ed7dab                Feb 5, 2020, 2:20:35 PM         3
hostname    x09live-ix3b.account-profile-users.info         Feb 6, 2020, 2:56:07 PM         0
hostname    www.phonechallenges-submit.site         Feb 6, 2020, 2:56:07 PM
[('ipv4', ['1.2.3.4']),
 ('dns',
  ['malcolmrifkind.site',
   'w3-schools.org',
   'niaconucil.org',
   'software-updating-managers.site',
   'isis-online.net',
   'accounts-drive.com',
   'cpanel-services.site',
   'service-activity-checkup.site',
   'service-issues.site',
   'recovery-options.site',
   'instagram-com.site',
   'mobile.phonechallenges-submit.site',
   'youtube.service-activity-checkup.site',
   'google.drive-accounts.com',
   'phonechallenges-submit.site',
   'drive-accounts.com',
   'www.phonechallenges-submit.site',
   'yah00.site',
   'seisolarpros.org',
   'customers-activities.site',
   'bahaius.info',
   'system-services.site',
   'two-step-checkup.site',
   'x09live-ix3b.account-profile-users.info',
   'customers-service.ddns.net',
   'leslettrespersanes.net',
   'www.drive-accounts.com',
   'acconut-verify.com',
   'finance-usbnc.info',
   'unirsd.com',
   'skynevvs.com',
   'inztaqram.ga']),
 ('url',
  ['https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password']),
 ('windows_path', ['C:\Users\user\AppData\Local\Temp\bzzzzzz.txt']),
 ('linux_path',
  ['//two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=passwordttFeb']),
 ('md5_hash',
  ['3d67ce57aab4f7f917cf87c724ed7dab', '542128ab98bda5ea139b169200a50bce'])]
%%ioc --ioc_types "ipv4, ipv6, linux_path, md5_hash"
netsh  start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\AppData\Local\Temp\bzzzzzz.txt
tracefile2=/usr/localbzzzzzz.sh
hostname    customers-service.ddns.net              Feb 5, 2020, 2:20:35 PM         7
URL https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password         Feb 5, 2020, 2:20:35 PM         1
hostname    mobile.phonechallenges-submit.site              Feb 5, 2020, 2:20:35 PM         8
hostname    youtube.service-activity-checkup.site           Feb 5, 2020, 2:20:35 PM         8
hostname    www.drive-accounts.com          Feb 5, 2020, 2:20:35 PM         7
hostname    google.drive-accounts.com               Feb 5, 2020, 2:20:35 PM         7
domain      niaconucil.org          Feb 5, 2020, 2:20:35 PM         11
domain      isis-online.net         Feb 5, 2020, 2:20:35 PM         11
domain      bahaius.info            Feb 5, 2020, 2:20:35 PM         11
domain      w3-schools.org          Feb 5, 2020, 2:20:35 PM         12
domain      system-services.site            Feb 5, 2020, 2:20:35 PM         11
domain      accounts-drive.com              Feb 5, 2020, 2:20:35 PM         8
domain      drive-accounts.com              Feb 5, 2020, 2:20:35 PM         10
domain      service-issues.site             Feb 5, 2020, 2:20:35 PM         8
domain      two-step-checkup.site           Feb 5, 2020, 2:20:35 PM         8
domain      customers-activities.site               Feb 5, 2020, 2:20:35 PM         11
domain      seisolarpros.org                Feb 5, 2020, 2:20:35 PM         11
domain      yah00.site              Feb 5, 2020, 2:20:35 PM         4
domain      skynevvs.com            Feb 5, 2020, 2:20:35 PM         11
domain      recovery-options.site           Feb 5, 2020, 2:20:35 PM         4
domain      malcolmrifkind.site             Feb 5, 2020, 2:20:35 PM         8
domain      instagram-com.site              Feb 5, 2020, 2:20:35 PM         8
domain      leslettrespersanes.net          Feb 5, 2020, 2:20:35 PM         11
domain      software-updating-managers.site         Feb 5, 2020, 2:20:35 PM         8
domain      cpanel-services.site            Feb 5, 2020, 2:20:35 PM         8
domain      service-activity-checkup.site           Feb 5, 2020, 2:20:35 PM         7
domain      inztaqram.ga            Feb 5, 2020, 2:20:35 PM         8
domain      unirsd.com              Feb 5, 2020, 2:20:35 PM         8
domain      phonechallenges-submit.site             Feb 5, 2020, 2:20:35 PM         7
domain      acconut-verify.com              Feb 5, 2020, 2:20:35 PM         11
domain      finance-usbnc.info              Feb 5, 2020, 2:20:35 PM         8
FileHash-MD5        542128ab98bda5ea139b169200a50bce                Feb 5, 2020, 2:20:35 PM         3
FileHash-MD5        3d67ce57aab4f7f917cf87c724ed7dab                Feb 5, 2020, 2:20:35 PM         3
hostname    x09live-ix3b.account-profile-users.info         Feb 6, 2020, 2:56:07 PM         0
hostname    www.phonechallenges-submit.site         Feb 6, 2020, 2:56:07 PM
[('ipv4', ['1.2.3.4']),
 ('linux_path',
  ['//two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=passwordttFeb',
   '/usr/localbzzzzzz.sh']),
 ('md5_hash',
  ['3d67ce57aab4f7f917cf87c724ed7dab', '542128ab98bda5ea139b169200a50bce'])]
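The two option forms used in these cells can be parsed with the standard argparse and shlex modules. This is a hypothetical sketch of the documented --out/-o and --ioc_types/-i options, not msticpy's magic implementation; in particular, splitting on commas and stripping surrounding spaces is an assumption based on the "ipv4, ipv6, linux_path, md5_hash" example.

```python
import argparse
import shlex

# Hypothetical parser mirroring the documented %ioc / %%ioc options
parser = argparse.ArgumentParser(prog="ioc", add_help=False)
parser.add_argument("--out", "-o", help="name of the variable that receives the results")
parser.add_argument("--ioc_types", "-i", help="comma-separated IoC types to search for")


def parse_magic_line(line):
    """Return (output variable name, list of IoC types or None) for a magic line."""
    args = parser.parse_args(shlex.split(line))
    ioc_types = None
    if args.ioc_types:
        # "ipv4, ipv6" -> ["ipv4", "ipv6"]: split on commas, strip spaces, drop empties
        ioc_types = [t.strip() for t in args.ioc_types.split(",") if t.strip()]
    return args.out, ioc_types


print(parse_magic_line("--out ioc_capture"))
print(parse_magic_line('--ioc_types "ipv4, ipv6, linux_path, md5_hash"'))
```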

Pandas Extension

The IoC extraction functionality is also available as a pandas extension, mp_ioc, which supports a single method: extract().

It supports the same syntax as IoCExtract.extract() (described earlier), except that it searches the DataFrame you call it on, as in the example below.

process_tree.mp_ioc.extract(columns=['CommandLine'])
IoCType Observable SourceIndex
0 dns microsoft.com 24
1 url http://server/file.sct 31
2 dns server 31
3 dns evil.ps 35
4 url http://somedomain/best-kitten-names-1.jpg' 37
5 dns somedomain 37
6 dns blah.ps 40
7 md5_hash aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 40
8 dns blah.ps 41
9 md5_hash aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 41
10 md5_hash 81ed03caf6901e444c72ac67d192fb9c 44
11 url http://badguyserver/pwnme 46
12 dns badguyserver 46
13 url http://badguyserver/pwnme 47
14 dns badguyserver 47
15 dns Invoke-Shellcode.ps 48
16 dns Invoke-ReverseDnsLookup.ps 49
17 dns Wscript.Shell 67
18 url http://system.management.automation.amsiutils').getfield('amsiinitfailed','nonpublic,static').se... 77
19 dns system.management.automation.amsiutils').getfield('amsiinitfailed','nonpublic,static').setvalue(... 77
20 ipv4 1.2.3.4 78
21 dns wscript.shell 81
22 dns abc.com 90
23 ipv4 127.0.0.1 102
24 url http://127.0.0.1/ 102
25 win_named_pipe \\.\pipe\blahtest" 107