What is the position file for?

The pos_file records how far Fluentd has read in each log file. If the agent restarts, it resumes from the saved offset instead of re-reading the whole file or skipping new lines, preventing duplicate or lost records.

How does tag-based routing work?

Each source assigns a tag to its records, and match blocks select records by tag, supporting wildcards like app.**. The first matching block handles the record, so order and tag specificity decide where logs go.
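
As a rough sketch of how ordering plays out, assume two hypothetical tags, app.errors and app.access, routed with the built-in stdout and file outputs. The tags and the output path are illustrative only. Because the first matching block wins, the specific rule has to come before the wildcard:

# Specific rule first: only records tagged app.errors (hypothetical tag)
<match app.errors>
  @type stdout
</match>

# Catch-all second: every other tag under app, e.g. app.access
<match app.**>
  @type file
  # Illustrative path, not taken from the builder's output
  path /var/log/td-agent/app-catchall
</match>

With the blocks in the opposite order, app.** would also capture the app.errors records and the specific block would never receive them.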

Why does Fluentd buffer output?

The buffer collects records and flushes them in batches on the flush_interval, which smooths bursts and lets Fluentd retry on failure. A file buffer also survives restarts, so in-flight logs are not lost if the agent crashes.

Which parser should I choose?

Use json when your app already logs structured JSON, regexp with a named-capture expression for custom plain-text lines, and the built-in apache2 or nginx parsers for those web servers. The parser turns raw text into queryable fields.
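
For the regexp case, a minimal sketch might look like the following. It assumes a hypothetical line format such as "2024-05-01T12:00:00+0000 INFO user logged in"; the field names time, level, and message are illustrative, and the expression would need adjusting to your real log layout:

<parse>
  @type regexp
  # Named captures become record fields: time, level, message
  expression /^(?<time>\S+) (?<level>[A-Z]+) (?<message>.*)$/
  # Use the captured time field as the event timestamp
  time_key time
  time_format %Y-%m-%dT%H:%M:%S%z
</parse>

This block sits inside a <source>, in place of the json parser. Lines that do not match the expression are not parsed into fields, so it is worth testing the expression against real samples first.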

What is td-agent?

td-agent is the packaged, stable distribution of Fluentd maintained by Treasure Data, with a bundled Ruby and plugins. The configuration syntax is identical, so the same fluent.conf works in either; only the install path and service name differ.

Fluentd Configuration Builder

A fluent.conf that reads files and ships them out

Fluentd routes logs from a source, through optional parsing, to a destination, all driven by tags. This tool builds that flow: a tail source with position tracking, a parser, and a match block that writes to Elasticsearch or S3 with reliable buffering.

How it works

The <source> with @type tail watches a file or glob and emits each new line as a record stamped with a tag. The pos_file stores the read offset so a restart resumes exactly where it left off rather than re-reading or skipping. Inside the source, a <parse> directive turns raw text into structured fields — json for already-structured logs, regexp with named captures for custom formats, or the built-in apache2 and nginx parsers.

Records then flow to a <match> block selected by tag; wildcards like app.** let one rule capture many sources, and the first match wins. The output plugin determines the destination: elasticsearch with logstash_format rolls records into daily indices, while s3 writes batched objects under a time-partitioned key path. Every output wraps a <buffer> that batches records and flushes them on flush_interval. A file buffer persists to disk so in-flight logs survive a crash, and retry_max_times bounds how often Fluentd retries a failing destination.
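
The S3 variant could be sketched as follows. This assumes the fluent-plugin-s3 output is installed and that AWS credentials come from the environment or an instance role; the bucket name, region, key path, and buffer values are placeholders rather than recommendations:

<match app.logs>
  @type s3
  # Placeholder bucket and region
  s3_bucket my-log-bucket
  s3_region us-east-1
  # Time placeholders give a time-partitioned key path
  path logs/%Y/%m/%d/
  <buffer time>
    @type file
    path /var/log/td-agent/buffer/s3
    # One chunk per hour, flushed 10 minutes after the hour closes
    timekey 3600
    timekey_wait 10m
    # Stop retrying a failing upload after 5 attempts
    retry_max_times 5
  </buffer>
</match>

The time chunk key on <buffer time> is what lets the %Y/%m/%d placeholders in path resolve, so each object lands under the day its records belong to.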

Tips and example

Give each source a distinct tag prefix so match rules stay unambiguous, and keep the buffer on disk in production for durability. A file-to-Elasticsearch config:

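<source>
  @type tail
  # Glob: follow every .log file in the directory
  path /var/log/app/*.log
  # Read offsets are saved here so a restart resumes in place
  pos_file /var/log/td-agent/app.log.pos
  tag app.logs
  <parse>
    # Each line is expected to be a single JSON object
    @type json
  </parse>
</source>

<match app.logs>
  @type elasticsearch
  host elasticsearch
  port 9200
  # Write to daily indices named app-logs-YYYY.MM.DD
  logstash_format true
  logstash_prefix app-logs
  <buffer>
    # File buffer: queued records survive an agent restart
    @type file
    path /var/log/td-agent/buffer/out
    flush_interval 5s
  </buffer>
</match>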

Watch the buffer directory in production — if a destination is down, the buffer grows, so size your disk and retry settings accordingly.
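
One way to bound that growth is sketched below, using standard buffer parameters. The numbers are illustrative assumptions to tune against your own disk size and outage tolerance, not recommended values:

<buffer>
  @type file
  path /var/log/td-agent/buffer/out
  flush_interval 5s
  # Cap the total on-disk buffer size
  total_limit_size 2GB
  # When the cap is hit, pause input instead of raising an error
  overflow_action block
  # Back off between retries: start at 1s, never wait more than 60s
  retry_wait 1s
  retry_max_interval 60s
  # Give up on a chunk after a day of failed retries
  retry_timeout 24h
</buffer>

With overflow_action block, a tail source simply stops advancing while the buffer is full and catches up from the pos_file offset once the destination recovers, provided the log files have not been rotated away in the meantime.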
