[Security Assistant] Updates ESQL example queries used in ESQL Query Generation (#188492)

spong merged 6 commits into elastic:main
Conversation
…assistantKnowledgeBaseByDefault feature flag
cc @spong
**patrykkopycinski** left a comment:
Thank you @spong 🙇
…Generation (elastic#188492)

## Summary

This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool, as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.

> [!NOTE]
> Since we're enabling the new `kbDataClient` with elastic#188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.

Token length changes as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer):

<details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4,151</summary>
<p>

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5
```

[[esql-example-queries]]
The following is NOT an example of an ES|QL query:

```
Pagination is not supported
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 0) | STATS success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num)
```

[[esql-example-queries]]
The following is an example ES|QL query:

```
FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10
```

</p>
</details>

<details><summary>8.15 Example Queries / Tokens: 4,847 / Characters: 16,671</summary>
<p>

```
// 1. regex to extract from dns.question.registered_domain
// Helpful when asking how to use GROK to extract values via REGEX
from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10

// 2. hunting scheduled task with suspicious actions via registry.data.bytes
// Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity
from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1

// 3. suspicious powershell cmds from base64 encoded cmdline
// Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands)
from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline

// 4. Detect masquerading attempts as native Windows binaries
// MITRE Tactics: "Defense Evasion"
from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1

// 5. Detect DLL Hijack via Masquerading as Microsoft Native Libraries
// Helpful when asking how to use ENRICH query results with enrich policies
from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc

// 6. Potential Exfiltration by process total egress bytes
// Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping
// MITRE Tactics: "Command and Control", "Exfiltration"
from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1", "FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824

// 7. Windows logon activity by source IP
// Helpful when answering questions about the CASE command (as well as conditional outputs/if statements)
// MITRE Tactics: "Credential Access"
from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20

// 8. High count of network connection over extended period by process
// Helpful when answering questions about IP searches/filters, field conversions (to_double, to_int), and running multiple aggregations
// MITRE Tactics: "Command and Control"
from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1", "FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration, total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute, you can adjust it to your env/FP rate */ | where duration_hours >= 1 and number_of_con_per_hour >= 120

// 9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence
// Helpful when answering questions on concatenating fields, dealing with time based searches
// MITRE Tactics: "Persistence"
from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1

// 10. Suspicious Network Connections by unsigned macOS
// Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions
// MITRE Tactics: "Command and Control"
from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1", "FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration, total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per minute, you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120

// 11. Unusual file creations by web server user
// Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations
FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name
// Alter this threshold to make sense for your environment
| WHERE file_count <= 5 | SORT file_count asc | LIMIT 100

// 12. Segmentation Fault & Potential Buffer Overflow Hunting
// Helpful when answering questions on extractions with GROK
FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset

// 13. Persistence via Systemd (timers)
// Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations
FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and (
// System-wide/user-specific services/timers (root permissions required)
file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or
// user-specific services/timers (user permissions required)
file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or
// System-wide generators (root permissions required)
file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*"
) and not (
process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp")
)
| EVAL persistence = CASE(
// System-wide/user-specific services/timers (root permissions required)
file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or
// user-specific services/timers (user permissions required)
file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or
// System-wide generators (root permissions required)
file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*",
process.name, null )
| STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100

// 14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs
// Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats
from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc
```

</p>
</details>

(cherry picked from commit 6137f81)
💔 All backports failed
**Manual backport**
To create the backport manually run:

Questions? Please refer to the Backport tool documentation.
…Generation (elastic#188492) ## Summary This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples have stayed the same, as have the file names -- so I've only updated the raw content here. > [!NOTE] > Since we're enabling the new `kbDataClient` with elastic#188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`. Token length changes as calculated using the [GPT-4 Tokenizer](https://platform.openai.com/tokenizer): <details><summary>Existing Example Queries / Tokens: 1,108 / Characters: 4151</summary> <p> ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | grok dns.question.name "%{DATA}\\.%{GREEDYDATA:dns.question.registered_domain:string}" | stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name | where unique_queries > 5 | sort unique_queries desc ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where event.code is not null | stats event_code_count = count(event.code) by event.code,host.name | enrich win_events on event.code with EVENT_DESCRIPTION | where EVENT_DESCRIPTION is not null and host.name is not null | rename EVENT_DESCRIPTION as event.description | sort event_code_count desc | keep event_code_count,event.code,host.name,event.description ``` [[esql-example-queries]] The following is 
an example ES|QL query: ``` from logs-* | where event.category == "file" and event.action == "creation" | stats filecount = count(file.name) by process.name,host.name | dissect process.name "%{process}.%{extension}" | eval proclength = length(process.name) | where proclength > 10 | sort filecount,proclength desc | limit 10 | keep host.name,process.name,filecount,process,extension,fullproc,proclength ``` [[esql-example-queries]] The following is an example ES|QL query: ``` from logs-* | where process.name == "curl.exe" | stats bytes = sum(destination.bytes) by destination.address | eval kb = bytes/1024 | sort kb desc | limit 10 | keep kb,destination.address ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metrics-apm* | WHERE metricset.name == "transaction" AND metricset.interval == "1m" | EVAL bucket = AUTO_BUCKET(transaction.duration.histogram, 50, <start-date>, <end-date>) | STATS avg_duration = AVG(transaction.duration.histogram) BY bucket ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM packetbeat-* | STATS doc_count = COUNT(destination.domain) BY destination.domain | SORT doc_count DESC | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM employees | EVAL hire_date_formatted = DATE_FORMAT(hire_date, "MMMM yyyy") | SORT hire_date | KEEP emp_no, hire_date_formatted | LIMIT 5 ``` [[esql-example-queries]] The following is NOT an example of an ES|QL query: ``` Pagination is not supported ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM logs-* | WHERE @timestamp >= NOW() - 15 minutes | EVAL bucket = DATE_TRUNC(1 minute, @timestamp) | STATS avg_cpu = AVG(system.cpu.total.norm.pct) BY bucket, host.name | LIMIT 10 ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM traces-apm* | WHERE @timestamp >= NOW() - 24 hours | EVAL successful = CASE(event.outcome == "success", 1, 0), failed = CASE(event.outcome == "failure", 1, 
0) | STATS success_rate = AVG(successful), avg_duration = AVG(transaction.duration), total_requests = COUNT(transaction.id) BY service.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM metricbeat* | EVAL cpu_pct_normalized = (system.cpu.user.pct + system.cpu.system.pct) / system.cpu.cores | STATS AVG(cpu_pct_normalized) BY host.name ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM postgres-logs | DISSECT message "%{} duration: %{query_duration} ms" | EVAL query_duration_num = TO_DOUBLE(query_duration) | STATS avg_duration = AVG(query_duration_num) ``` [[esql-example-queries]] The following is an example ES|QL query: ``` FROM nyc_taxis | WHERE DATE_EXTRACT(drop_off_time, "hour") >= 6 AND DATE_EXTRACT(drop_off_time, "hour") < 10 | LIMIT 10 ``` ``` </p> </details> <details><summary>8.15 Example Queries / Tokens: 4,847 / Characters:16671</summary> <p> ``` // 1. regex to extract from dns.question.registered_domain // Helpful when asking how to use GROK to extract values via REGEX from logs-* | where dns.question.name like "?*" | grok dns.question.name """(?<dns_registered_domain>[a-zA-Z0-9]+\.[a-z-A-Z]{2,3}$)""" | keep dns_registered_domain | limit 10 // 2. 
hunting scheduled task with suspicious actions via registry.data.bytes // Helpful when answering questions on regex based searches and replacements (RLIKE and REPLACE), base64 conversions, and dealing with case sensitivity from logs-* | where host.os.type == "windows" and event.category == "registry" and event.action == "modification" and registry.path like """HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\Tasks\\*Actions*""" | eval scheduled_task_action = replace(TO_LOWER(FROM_BASE64(registry.data.bytes)), """\u0000""", "") | eval scheduled_task_action = replace(scheduled_task_action, """(\u0003\fauthorfff|\u0003\fauthorff\u000e)""", "") | where scheduled_task_action rlike """.*(users\\public\\|\\appdata\\roaming|programdata|powershell|rundll32|regsvr32|mshta.exe|cscript.exe|wscript.exe|cmd.exe|forfiles|msiexec).*""" and not scheduled_task_action like "localsystem*" | keep scheduled_task_action, registry.path, agent.id | stats count_agents = count_distinct(agent.id) by scheduled_task_action | where count_agents == 1 // 3. suspicious powershell cmds from base64 encoded cmdline // Helpful when answering questions on regex based searches and replacements, base64 conversions, and dealing with case sensitivity (TO_LOWER and TO_UPPER commands) from logs-* | where host.os.type == "windows" and event.category == "process" and event.action == "start" and TO_LOWER(process.name) == "powershell.exe" and process.command_line rlike ".+ -(e|E).*" | keep agent.id, process.command_line | grok process.command_line """(?<base64_data>([A-Za-z0-9+/]+={1,2}$|[A-Za-z0-9+/]{100,}))""" | where base64_data is not null | eval decoded_base64_cmdline = replace(TO_LOWER(FROM_BASE64(base64_data)), """\u0000""", "") | where decoded_base64_cmdline rlike """.*(http|webclient|download|mppreference|sockets|bxor|.replace|reflection|assembly|load|bits|start-proc|iwr|frombase64).*""" | keep agent.id, process.command_line, decoded_base64_cmdline //4. 
Detect masquerading attempts as native Windows binaries //MITRE Tactics: "Defense Evasion" from logs-* | where event.type == "start" and event.action == "start" and host.os.name == "Windows" and not starts_with(process.executable, "C:\\Program Files\\WindowsApps\\") and not starts_with(process.executable, "C:\\Windows\\System32\\DriverStore\\") and process.name != "setup.exe" | keep process.name.caseless, process.executable.caseless, process.code_signature.subject_name, process.code_signature.trusted, process.code_signature.exists, host.id | eval system_bin = case(starts_with(process.executable.caseless, "c:\\windows\\system32") and starts_with(process.code_signature.subject_name, "Microsoft") and process.code_signature.trusted == true, process.name.caseless, null), non_system_bin = case(process.code_signature.exists == false or process.code_signature.trusted != true or not starts_with(process.code_signature.subject_name, "Microsoft"), process.name.caseless, null) | stats count_system_bin = count(system_bin), count_non_system_bin = count(non_system_bin) by process.name.caseless, host.id | where count_system_bin >= 1 and count_non_system_bin >= 1 //5. 
Detect DLL Hijack via Masquerading as Microsoft Native Libraries // Helpful when asking how to use ENRICH query results with enrich policies from logs-* | where host.os.family == "windows" and event.action == "load" and process.code_signature.status == "trusted" and dll.code_signature.status != "trusted" and not dll.path rlike """[c-fC-F]:\\(Windows|windows|WINDOWS)\\(System32|SysWOW64|system32|syswow64)\\[a-zA-Z0-9_]+.dll""" | keep dll.name, dll.path, dll.hash.sha256, process.executable, host.id | ENRICH libs-policy-defend | where native == "yes" and not starts_with(dll.path, "C:\\Windows\\assembly\\NativeImages") | eval process_path = replace(process.executable, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", ""), dll_path = replace(dll.path, """([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}|ns[a-z][A-Z0-9]{3,4}\.tmp|DX[A-Z0-9]{3,4}\.tmp|7z[A-Z0-9]{3,5}\.tmp|[0-9\.\-\_]{3,})""", "") | stats host_count = count_distinct(host.id) by dll.name, dll_path, process_path, dll.hash.sha256 | sort host_count asc //6. 
Potential Exfiltration by process total egress bytes // Helpful when asking how to filter/search on IP address (CIDR_MATCH) fields and aggregating/grouping //MITRE Tactics: "Command and Control", "Exfiltration" from logs-* | where host.os.family == "windows" and event.category == "network" and event.action == "disconnect_received" and not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.executable, process.entity_id | stats total_bytes_out = sum(source.bytes) by process.entity_id, destination.address, process.executable /* more than 1GB out by same process.pid in 8 hours */ | where total_bytes_out >= 1073741824 //7. 
Windows logon activity by source IP // Helpful when answering questions about the CASE command (as well as conditional outputs/if statements) //MITRE Tactics: "Credential Access" from logs-* | where host.os.family == "windows" and event.category == "authentication" and event.action in ("logon-failed", "logged-in") and winlog.logon.type == "Network" and source.ip is not null and /* noisy failure status codes often associated to authentication misconfiguration */ not (event.action == "logon-failed" and winlog.event_data.Status in ("0xC000015B", "0XC000005E", "0XC0000133", "0XC0000192")) | eval failed = case(event.action == "logon-failed", source.ip, null), success = case(event.action == "logged-in", source.ip, null) | stats count_failed = count(failed), count_success = count(success), count_user = count_distinct(winlog.event_data.TargetUserName) by source.ip /* below threshold should be adjusted to your env logon patterns */ | where count_failed >= 100 and count_success <= 10 and count_user >= 20 //8. 
High count of network connections over an extended period by process //Helpful when answering questions about IP searches/filters, field conversions (to_double, to_int), and running multiple aggregations //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "windows" and event.category == "network" and network.direction == "egress" and (process.executable like "C:\\\\Windows\\\\System32*" or process.executable like "C:\\\\Windows\\\\SysWOW64\\\\*") and not user.id in ("S-1-5-19", "S-1-5-20") and /* multiple Windows svchost services perform long term connection to MS ASN, can be covered in a dedicated hunt */ not (process.name == "svchost.exe" and user.id == "S-1-5-18") and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration, total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per hour, you can adjust it to your env/FP rate */ | where duration_hours >= 1 and 
number_of_con_per_hour >= 120 //9. Persistence via Suspicious Launch Agent or Launch Daemon with low occurrence //Helpful when answering questions on concatenating fields, dealing with time-based searches //MITRE Tactics: "Persistence" from logs-* | where @timestamp > now() - 7 day | where host.os.family == "macos" and event.category == "file" and event.action == "launch_daemon" and (Persistence.runatload == true or Persistence.keepalive == true) and process.executable is not null | eval args = MV_CONCAT(Persistence.args, ",") /* normalizing users home profile */ | eval args = replace(args, """/Users/[a-zA-Z0-9ñ\.\-\_\$~ ]+/""", "/Users/user/") | stats agents = count_distinct(host.id), total = count(*) by process.name, Persistence.name, args | where starts_with(args, "/") and agents == 1 and total == 1 //10. Suspicious Network Connections by unsigned macOS //Helpful when answering questions on IP filtering, calculating the time difference between timestamps, aggregations, and field conversions //MITRE Tactics: "Command and Control" from logs-* | where host.os.family == "macos" and event.category == "network" and (process.code_signature.exists == false or process.code_signature.trusted != true) and /* excluding private IP ranges */ not CIDR_MATCH(destination.ip, "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.0.0/29", "192.0.0.8/32", "192.0.0.9/32", "192.0.0.10/32", "192.0.0.170/32", "192.0.0.171/32", "192.0.2.0/24", "192.31.196.0/24", "192.52.193.0/24", "192.168.0.0/16", "192.88.99.0/24", "224.0.0.0/4", "100.64.0.0/10", "192.175.48.0/24","198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24", "240.0.0.0/4", "::1","FE80::/10", "FF00::/8") | keep source.bytes, destination.address, process.name, process.entity_id, @timestamp /* calc total duration, total MB out and the number of connections per hour */ | stats total_bytes_out = sum(source.bytes), count_connections = count(*), start_time = min(@timestamp), end_time = max(@timestamp) 
by process.entity_id, destination.address, process.name | eval dur = TO_DOUBLE(end_time)-TO_DOUBLE(start_time), duration_hours=TO_INT(dur/3600000), MB_out=TO_DOUBLE(total_bytes_out) / (1024*1024), number_of_con_per_hour = (count_connections / duration_hours) | keep process.entity_id, process.name, duration_hours, destination.address, MB_out, count_connections, number_of_con_per_hour /* threshold is set to 120 connections per hour, you can adjust it to your env/FP rate */ | where duration_hours >= 8 and number_of_con_per_hour >= 120 //11. Unusual file creations by web server user //Helpful when answering questions on using the LIKE command (wildcard searches) and aggregations FROM logs-* | WHERE @timestamp > NOW() - 50 day | WHERE host.os.type == "linux" and event.type == "creation" and user.name in ("www-data", "apache", "nginx", "httpd", "tomcat", "lighttpd", "glassfish", "weblogic") and ( file.path like "/var/www/*" or file.path like "/var/tmp/*" or file.path like "/tmp/*" or file.path like "/dev/shm/*" ) | STATS file_count = COUNT(file.path), host_count = COUNT(host.name) by file.path, host.name, process.name, user.name // Alter this threshold to make sense for your environment | WHERE file_count <= 5 | SORT file_count asc | LIMIT 100 //12. Segmentation Fault & Potential Buffer Overflow Hunting //Helpful when answering questions on extractions with GROK FROM logs-* | WHERE host.os.type == "linux" and process.name == "kernel" and message like "*segfault*" | GROK message "\\[%{NUMBER:timestamp}\\] %{WORD:process}\\[%{NUMBER:pid}\\]: segfault at %{BASE16NUM:segfault_address} ip %{BASE16NUM:instruction_pointer} sp %{BASE16NUM:stack_pointer} error %{NUMBER:error_code} in %{DATA:so_file}\\[%{BASE16NUM:so_base_address}\\+%{BASE16NUM:so_offset}\\]" | KEEP timestamp, process, pid, so_file, segfault_address, instruction_pointer, stack_pointer, error_code, so_base_address, so_offset //13. 
Persistence via Systemd (timers) //Helpful when answering questions on using the CASE command (conditional statements), searching lists using the IN command, wildcard searches with the LIKE command and aggregations FROM logs-* | WHERE host.os.type == "linux" and event.type in ("creation", "change") and ( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like "/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*" ) and not ( process.name in ( "dpkg", "dockerd", "yum", "dnf", "snapd", "pacman", "pamac-daemon", "netplan", "systemd", "generate" ) or process.executable == "/proc/self/exe" or process.executable like "/dev/fd/*" or file.extension in ("dpkg-remove", "swx", "swp") ) | EVAL persistence = CASE( // System-wide/user-specific services/timers (root permissions required) file.path like "/run/systemd/system/*" or file.path like "/etc/systemd/system/*" or file.path like "/etc/systemd/user/*" or file.path like "/usr/local/lib/systemd/system/*" or file.path like "/lib/systemd/system/*" or file.path like "/usr/lib/systemd/system/*" or file.path like "/usr/lib/systemd/user/*" or // user-specific services/timers (user permissions required) file.path like 
"/home/*/.config/systemd/user/*" or file.path like "/home/*/.local/share/systemd/user/*" or // System-wide generators (root permissions required) file.path like "/etc/systemd/system-generators/*" or file.path like "/usr/local/lib/systemd/system-generators/*" or file.path like "/lib/systemd/system-generators/*" or file.path like "/etc/systemd/user-generators/*" or file.path like "/usr/local/lib/systemd/user-generators/*" or file.path like "/usr/lib/systemd/user-generators/*", process.name, null ) | STATS cc = COUNT(*), pers_count = COUNT(persistence), agent_count = COUNT(agent.id) by process.executable, file.path, host.name, user.name | WHERE pers_count > 0 and pers_count <= 20 and agent_count <= 3 | SORT cc asc | LIMIT 100 //14. Low Frequency AWS EC2 Admin Password Retrieval Attempts from Unusual ARNs //Helpful when answering questions on extracting fields with the dissect command and aggregations. Also an example for hunting for cloud threats from logs-* | where event.provider == "ec2.amazonaws.com" and event.action == "GetPasswordData" and aws.cloudtrail.error_code == "Client.UnauthorizedOperation" and aws.cloudtrail.user_identity.type == "AssumedRole" | dissect aws.cloudtrail.request_parameters "{%{key}=%{instance_id}}" | dissect aws.cloudtrail.user_identity.session_context.session_issuer.arn "%{?keyword1}:%{?keyword2}:%{?keyword3}::%{account_id}:%{keyword4}/%{arn_name}" | dissect user.id "%{principal_id}:%{session_name}" | keep aws.cloudtrail.user_identity.session_context.session_issuer.principal_id, instance_id, account_id, arn_name, source.ip, principal_id, session_name, user.name | stats instance_counts = count_distinct(arn_name) by instance_id, user.name, source.ip, session_name | where instance_counts < 5 | sort instance_counts desc ``` </p> </details> (cherry picked from commit 6137f81)
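The egress-hunting queries above derive a per-process rate from epoch-millisecond timestamps (`duration_hours = TO_INT(dur/3600000)`), total bytes (`MB_out = bytes / (1024*1024)`), and a connection count divided by hours. As a sanity check of that arithmetic outside ES|QL, here is a minimal Python sketch; the sample numbers are hypothetical, and the integer division is an assumption made to mirror the query's int result, not part of the PR itself.

```python
def egress_stats(total_bytes_out: int, count_connections: int,
                 start_ms: int, end_ms: int) -> dict:
    """Mirror the EVAL step of the hunting query: epoch millis in,
    hours / MiB / connections-per-hour out."""
    # TO_INT(dur / 3600000): milliseconds to whole hours (truncating).
    duration_hours = int((end_ms - start_ms) / 3_600_000)
    # total bytes -> MiB, matching MB_out = bytes / (1024 * 1024).
    mb_out = total_bytes_out / (1024 * 1024)
    # Integer division assumed here to mirror the query's int/int result.
    return {
        "duration_hours": duration_hours,
        "MB_out": mb_out,
        "number_of_con_per_hour": count_connections // duration_hours,
    }

# Hypothetical sample: eight hours of activity, 2 GiB out, 1000 connections.
stats = egress_stats(
    total_bytes_out=2 * 1024**3,
    count_connections=1000,
    start_ms=0,
    end_ms=8 * 3_600_000,
)
print(stats)
# → {'duration_hours': 8, 'MB_out': 2048.0, 'number_of_con_per_hour': 125}
```

With these sample values the row would pass the query's `duration_hours >= 8 and number_of_con_per_hour >= 120` filter (125 connections/hour), which is a quick way to confirm a chosen threshold behaves as intended before deploying the hunt.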
Manual backport to
Friendly reminder: Looks like this PR hasn’t been backported yet.
Manually backported and merged into
Summary
This PR updates the pre-packaged ESQL examples used by the ESQL Query Generation tool as provided by @jamesspi. The number of examples has stayed the same, as have the file names -- so I've only updated the raw content here.
Note
Since we're enabling the new `kbDataClient` with #188168 for `8.15`, there is no need for a delete/re-install for pre-existing deployments to use these new example queries, as the Knowledge Base will be rebuilt on an upgrade to `8.15`.