Nginx has become one of the most popular web servers today, powering over 30% of all websites on the internet. As a high-performance server, Nginx produces detailed access logs containing valuable information about all client requests received by your websites and applications. Parsing and analyzing these access logs on a regular basis is key to monitoring the health of your web servers, identifying issues proactively and improving performance.

In this comprehensive guide, we will look at what Nginx access logs contain, why parsing them is important, and effective tools and techniques to parse, analyze and generate insights from access logs.

Understanding Nginx Access Logs

Nginx access logs record all requests handled by your Nginx web server, saved to a log file in real-time. By default, the access logs are found at /var/log/nginx/access.log. Below is an example log entry in the default combined format:

127.0.0.1 - - [28/Feb/2023:11:15:38 +0530] "GET /index.html HTTP/1.1" 200 247 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"

Let's break down what each field here means:

  • 127.0.0.1 – The client IP address
  • - – The RFC 1413 (identd) identity of the client; rarely available, so usually "-"
  • - – The user ID from HTTP authentication; "-" when no authentication is used
  • [28/Feb/2023:11:15:38 +0530] – Date, time & timezone of the request
  • "GET /index.html HTTP/1.1" – The request line from the client
  • 200 – The HTTP status code returned to the client
  • 247 – Size of the response body in bytes sent to the client
  • "-" – The Referer request header
  • "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" – The User-Agent request header
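
Since the combined format is whitespace-delimited, these fields map directly onto awk's positional variables, which underpins the shell one-liners later in this guide. A quick sketch (the sample line is illustrative):

```shell
# awk splits on whitespace: $1=IP, $4-$5=timestamp, $9=status, $10=bytes.
# Quoted fields (request line, referrer, user agent) span multiple awk fields.
line='127.0.0.1 - - [28/Feb/2023:11:15:38 +0530] "GET /index.html HTTP/1.1" 200 247 "-" "Mozilla/5.0"'
fields=$(printf '%s\n' "$line" | awk '{print "ip=" $1, "status=" $9, "bytes=" $10}')
echo "$fields"   # → ip=127.0.0.1 status=200 bytes=247
```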

Other Common Log Formats

Beyond the default combined format, Nginx's log_format directive lets you define custom formats such as:

  • JSON – Structured output for ingestion into log management tools
  • Upstream – Adds upstream timing variables when proxying to backends
  • Elasticsearch – Custom variables shaped for the ELK stack
  • gRPC – Metadata from gRPC services

Choose log formats wisely based on your downstream analysis needs.
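
As an example of the JSON option, a structured format can be declared with nginx's escape=json parameter; the field selection below is illustrative, not exhaustive:

```nginx
log_format json_combined escape=json
  '{"remote_addr":"$remote_addr",'
  '"time_local":"$time_local",'
  '"request":"$request",'
  '"status":"$status",'
  '"body_bytes_sent":"$body_bytes_sent",'
  '"http_user_agent":"$http_user_agent"}';

access_log /var/log/nginx/access.json json_combined;
```

Each log line is then a single JSON object that log shippers can ingest without a grok-style parsing step.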

Log File Management

Due to the verbose nature of access logs, the log files can grow very quickly depending on traffic. Best practices include:

  • Log file rotation to prevent huge files. Daily/hourly log chunks.
  • Log compression using gzip or zip. Saves storage space.
  • Only retain last 30-90 days of access logs based on use cases. Archive older logs.

Why Parse and Analyze Access Logs?

Instead of simply logging this valuable data, parsing and analyzing access logs helps unlock many benefits like:

  • Monitor overall traffic – Volume trends, top pages, browsers etc.
  • Application performance – Errors, response times, latency issues etc.
  • User behavior analysis – Most visited pages, usage trends etc.
  • Security auditing – Anomalies, malicious requests etc.
  • SEO optimization – Crawl stats, referrers data for better rankings
  • Compliance – Records mandated by regulatory standards

Sample Nginx Configuration for Access Logs

Key parts of Nginx configuration related to enabling access logs:

http {

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    server {
        # Web server config
    }
}

This demonstrates the log_format syntax and the access_log directive that activates logging. Customizations, such as adding $request_time to capture latency, are possible.
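
For instance, a hypothetical timed format capturing request latency might look like this (the format name and variable selection are illustrative):

```nginx
log_format timed '$remote_addr [$time_local] "$request" $status '
                 '$body_bytes_sent $request_time $upstream_response_time';

access_log /var/log/nginx/access_timed.log timed;
```

Here $request_time measures the full request in seconds with millisecond resolution, while $upstream_response_time isolates time spent waiting on a proxied backend.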

Without centralized log analysis, it becomes extremely difficult to track these metrics across multiple servers. Technologies like the ELK stack have emerged just to collect, aggregate and analyze logs at scale.

ELK stack for log analysis

Popular open source stack for logging – Elasticsearch, Logstash, Kibana

Streaming Log Analysis vs Batch Processing

Two popular models have emerged for ingesting and analyzing access logs:

Streaming

  • Logs are consumed in real-time as they are written
  • Enables live traffic monitoring, alerts for issues
  • Requires log forwarders like Logstash or Beats

Batch

  • Logs are parsed/ingested as batches on schedule
  • Works well for historical analysis at lower frequency
  • Tools run cron jobs to process accumulated logs

Based on use cases, a blend of streaming and batch pipelines may be required.

Parsing Access Logs with Shell Commands

Now that we understand why access log analysis matters, let's look at a few common techniques to parse access logs using Linux shell commands for simple analysis tasks:

1. Extract all client IP addresses

awk '{print $1}' access.log | sort | uniq -c | sort -n

2. Count requests per minute

awk '{print $4}' access.log | cut -c 2-18 | sort | uniq -c

3. Check 404 errors

awk '$9 == 404' access.log

4. Top 10 Referrers

awk '{print $11}' access.log | sort | uniq -c | sort -rn | head
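
5. HTTP status code distribution

Along the same lines, a breakdown by status code (field $9 in the combined format) is a quick health check; sample.log here is a throwaway file created purely for illustration:

```shell
# Build a tiny sample log, then count responses per status code
printf '%s\n' \
  '1.2.3.4 - - [28/Feb/2023:11:15:38 +0530] "GET / HTTP/1.1" 200 247 "-" "curl"' \
  '1.2.3.4 - - [28/Feb/2023:11:15:39 +0530] "GET /a HTTP/1.1" 404 12 "-" "curl"' \
  '5.6.7.8 - - [28/Feb/2023:11:15:40 +0530] "GET / HTTP/1.1" 200 247 "-" "curl"' > sample.log
awk '{print $9}' sample.log | sort | uniq -c | sort -rn
```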

However, while these basic log parsing snippets help illustrate Nginx log analysis, they only scratch the surface of getting maximum value from your access logs. For advanced analysis, we need a specialized tool.

Introducing GoAccess – Open-source Log Analyzer

GoAccess Log Analyzer

GoAccess is arguably the most popular open-source, terminal-based log analyzer and interactive viewer for Nginx, Apache and other web servers. It can parse either live traffic or access logs in formats like Nginx, Apache, Amazon S3, Elastic Load Balancing etc.

Let's go through the key capabilities of GoAccess:

  • Real-time analysis – Great for detecting immediate threats or issues
  • Static or dynamic sites – Supports static and dynamic websites
  • Visual reports – Terminal dashboard, JSON, HTML reports
  • Nginx, Apache logs – Parses logs from most popular web servers
  • Geography mapping – Identify visitor hot spots across globe
  • Media types – Breakdown by HTML, CSS, JS, images etc.
  • Crawler statistics – Bot traffic and SEO data
  • Hundreds of metrics! – Time served, traffic sources, response codes and more!

In addition, GoAccess has good support for GeoIP location data, custom log formats, report filtering, session analysis and more. The HTML reports allow easy sharing of data with non-technical stakeholders too.

Let's compare some features of popular open source log analyzers:

Tool        Nginx Support   Real-time Analysis   Custom Dashboards   Reports
GoAccess    Yes             Yes                  No                  Multiple formats
AWStats     Yes             No                   Yes                 HTML, PDF, CSV
Webalizer   Yes             No                   No                  HTML

Installing GoAccess on Ubuntu / Debian

As a prerequisite, GoAccess needs an Nginx server set up with access logs enabled. To install GoAccess on Ubuntu 22.04 / Debian:

sudo apt update
sudo apt install goaccess

To install on other Linux distros like CentOS / RHEL:

sudo yum install goaccess  

For features like GeoIP lookup, GoAccess must be built from source with additional configure flags:

./configure --enable-geoip=mmdb ...

Generating GoAccess Reports for Nginx Logs

With GoAccess installed, let's dive into parsing a sample Nginx access log:

Note: We are using a publicly available sample log containing one month of requests to a website, perfect for demonstrating GoAccess' capabilities.

Step 1: Launch Interactive Terminal Dashboard

Launch GoAccess on the log file:

goaccess /var/log/nginx/access.log

This brings up the interactive terminal dashboard with live parsing in progress:

GoAccess Interactive Terminal

The default view shows overall metrics like:

  • Requests: Total requests
  • Valid Requests: Log lines parsed successfully
  • Failed Requests: Log lines that did not match the log format
  • Unique Visitors: Total unique IPs
  • Unique Files: Total unique URLs / pages accessed
  • Static Files: JS, CSS, Images requests
  • Log Size: Size of log processed

Use arrow keys to scroll vertically and horizontally to all metrics. Press Enter on any section for more details, q to quit.

Step 2: Generate HTML Report

For a more permanent report that's easier to share or publish, we can generate a standalone HTML report:

goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/html/report.html

The color-coded report has been written to the path we specified above. Access it at http://your-server-ip/report.html. Here are some key sections:

GoAccess HTML Report

The main dashboard with traffic summary, top visitors, requests etc. Drill-down further for details:

GoAccess HTML Report URLs

View of top URLs, HTTP status codes returned, time taken to serve them etc. Helps identify slow pages.

GoAccess HTML Report Hosts

Analyze visitors by hostnames / IPs, user agents like browsers, operating systems.

GoAccess HTML Report GeoIP

Geographic distribution of visitors across countries. Requires GeoIP module.

There are many more helpful views around traffic sources, 404 errors, crawlers, static vs dynamic content etc. that technical and non-technical teams can benefit from.

Custom Reports in GoAccess

The data views in GoAccess can also be customized significantly through configuration tweaks. Some examples:

  • Add or remove entire modules like GeoIP, hosts etc.
  • Set custom names for metrics e.g. Visitors as Total Users
  • Exclude specific IP addresses or status codes
  • Filter by date ranges or requests thresholds
  • Additional styling like colors, padding etc.

This enables more focused reports around security, performance etc. Dashboards tailored to business needs.
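
Many of these tweaks can be made persistent in the GoAccess configuration file (commonly /etc/goaccess/goaccess.conf or a per-user config); a sketch using real options with illustrative values:

```conf
# Parse the standard Nginx combined format without prompting
log-format COMBINED
# Drop internal traffic from reports
exclude-ip 127.0.0.1
# Ignore known crawlers/bots
ignore-crawlers true
# Title shown on the HTML report
html-report-title My Site Traffic
```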

Analyzing Nginx Logs at Scale with ELK Stack

While GoAccess works very well for single-server log analysis, large-scale log processing across thousands of servers typically relies on a pipeline like the ELK stack (Elasticsearch + Logstash + Kibana).

Some helpful diagrams explaining the flow:

ELK Stack Pipeline

Nginx logs ingested via Beats / Logstash event processing pipeline into ElasticSearch datastore, with Kibana analytics and visualizations on top
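
A minimal Logstash pipeline implementing this flow might be sketched as follows; the file path, Elasticsearch host and index name are assumptions:

```conf
input {
  file { path => "/var/log/nginx/access.log" }
}
filter {
  # Nginx's combined format matches the classic Apache combined grok pattern
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nginx-access-%{+YYYY.MM.dd}"
  }
}
```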

Benefits include:

  • Centralised logging across 100s of hosts
  • Scalable datastore not bounded by single node
  • Custom indexing and enrichments during ingestion
  • Enterprise-grade access controls, security etc.
  • Correlate across multiple data sources

Of course, operating at scale brings its own complexities.

Additional Tips for Effective Analysis

To further enhance the reports generated from access logs, keep these tips in mind:

  • When installing tools like GoAccess, enable GeoIP for visitor mapping
  • Generate baselines of metrics during initial monitoring weeks
  • Pay attention to suspicious 404 errors and error spikes
  • Set up alerts around sudden traffic changes or latency thresholds
  • Compare stats across modules – Geo vs hosts vs browsers etc.
  • Export JSON / CSV data to feed into external tools
  • Customize main config for filtering reports, adding metrics etc.
  • Obscure IP addresses before sharing samples publicly
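
For the last tip, a sed one-liner can mask the final octet of the client IP before a sample leaves your servers (the masking scheme here is one simple option):

```shell
# Rewrite the leading IPv4 address, e.g. 203.0.113.42 -> 203.0.113.0
echo '203.0.113.42 - - [28/Feb/2023:11:15:38 +0530] "GET / HTTP/1.1" 200 247 "-" "curl"' |
  sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}/\1.0/'
```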

Conclusion

They say data is power! For Nginx servers, comprehensive access logs when effectively parsed and visualized unlock many hidden trends and insights that may otherwise go unnoticed in today's complex web properties. By adopting tools like GoAccess and methodologies outlined here, you can really get the most out of Nginx access logs!
