Rebuilding the Nginx Logging Pipeline

Today I finally replaced the original nginx logging pipeline that had been running on this site since the beginning.
When the site first went live, I needed immediate visibility into nginx access logs. I already had MariaDB running for another project, so I wrote a small Perl script to tail the access logs, parse each line, and insert the results into a relational table. It was simple, explicit, and under my control. For the traffic levels at the time, it worked fine.
That approach was always a compromise.
From the start, the intent was to treat access logs as immutable events and store them in a system designed for time-based querying. MariaDB was convenient, not ideal: it required careful indexing to stay fast, schema changes were invasive, and time-based queries had to be fought for rather than expressed naturally.
As the site grew and the questions I wanted to ask became more temporal and exploratory, the friction became harder to ignore.
The pipeline is now rebuilt around Vector and OpenSearch.
Vector handles ingestion directly from nginx access logs on disk as well as syslog input. It applies a deterministic parsing and normalization step that mirrors the nginx log format exactly. If a line does not match the expected structure, it is rejected instead of being partially parsed. Numeric fields are converted to real numeric types. The request line is decomposed into method, path, and protocol. Minimal metadata is added to identify environment and pipeline origin.
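The actual pipeline does this inside Vector, but the parsing contract is easy to sketch in Python. This is an illustrative approximation of the combined log format handling described above, not the real Vector configuration; the regex, field names, and the `environment` and `pipeline` metadata values are assumptions for the example.

```python
import re
from typing import Optional

# Regex mirroring nginx's default "combined" log format. A line that does
# not match the full pattern is rejected, never partially parsed.
COMBINED = re.compile(
    r'^(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

def parse_line(line: str) -> Optional[dict]:
    m = COMBINED.match(line)
    if m is None:
        return None  # reject the whole line rather than emit a partial event
    event = m.groupdict()
    # Numeric fields become real numeric types.
    event["status"] = int(event["status"])
    event["body_bytes_sent"] = int(event["body_bytes_sent"])
    # The request line is decomposed into method, path, and protocol.
    parts = event.pop("request").split(" ")
    if len(parts) != 3:
        return None
    event["method"], event["path"], event["protocol"] = parts
    # Minimal metadata identifying environment and pipeline origin
    # (hypothetical values for illustration).
    event["environment"] = "prod"
    event["pipeline"] = "vector-nginx"
    return event
```

The important property is the all-or-nothing match: a malformed line yields `None`, so nothing half-parsed ever reaches the store.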
OpenSearch is now the system of record for these events.
An explicit index template defines the schema up front. Dynamic mapping is disabled. Field types are locked. Timestamps are stored as true date fields using the original nginx timestamp format. This prevents schema drift and ensures that time-based queries, sorting, and aggregations behave correctly.
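The shape of such a template looks roughly like this. The index pattern and field names here are assumptions, not the site's actual schema; `"dynamic": "strict"` is the setting that rejects documents containing unmapped fields, and the `format` string matches the nginx `$time_local` layout so timestamps index as true dates.

```
PUT _index_template/nginx-access
{
  "index_patterns": ["nginx-access-*"],
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "timestamp":       { "type": "date", "format": "dd/MMM/yyyy:HH:mm:ss Z" },
        "remote_addr":     { "type": "ip" },
        "method":          { "type": "keyword" },
        "path":            { "type": "keyword" },
        "protocol":        { "type": "keyword" },
        "status":          { "type": "short" },
        "body_bytes_sent": { "type": "long" },
        "user_agent":      { "type": "keyword" }
      }
    }
  }
}
```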
Vector does not need to understand OpenSearch mappings. It emits structured events with well-defined field names. OpenSearch applies the index template automatically at index creation time and enforces the schema. Parsing and storage are intentionally decoupled.
With that foundation in place, all backend routes that previously queried MariaDB were rewritten to query OpenSearch instead. Frontend components were adjusted where necessary to reflect the new query patterns, particularly around time windows and aggregation behavior. Functionally, the views look similar. Architecturally, they are far more aligned with the nature of the data.
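The kind of query those rewritten routes issue is what MariaDB made painful and OpenSearch makes natural. A hypothetical example, assuming the index pattern and field names from a schema like the one described above: a 24-hour window bucketed into hourly request counts.

```
GET nginx-access-*/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp": { "gte": "now-24h", "lte": "now" } }
  },
  "aggs": {
    "requests_per_hour": {
      "date_histogram": { "field": "timestamp", "fixed_interval": "1h" }
    }
  }
}
```

Because the timestamp is a real date field, the range filter and histogram come from the engine directly, with no hand-built indexes or date arithmetic in SQL.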
The original Perl parser is now retired. It did exactly what it was supposed to do when it was written. It just no longer fits where the system has grown.
This is the architecture I had in mind from day one. The initial MariaDB-based solution bought speed and simplicity at the beginning. This change replaces that expedient choice with something that scales cleanly, enforces correctness, and matches how access logs are actually used.
I have started documenting the nginx logging pipeline in detail, from raw log format through parsing, normalization, storage, and querying. That documentation is still a work in progress, which is why it is not linked from the menu yet. If you are curious and do not mind rough edges, the draft lives here:
https://dangerousmetrics.com/anvil/nginx-logs
Consider that link a warning label. It is actively evolving, but it captures the reasoning and mechanics behind this rebuild while they are still fresh.
More refinements will follow, but this was the major inflection point. The logging system now reflects how the data behaves instead of forcing it into a shape that was never meant for it.