A fluent.conf that reads files and ships them out
Fluentd routes logs from a source, through optional parsing, to a destination, all driven by tags. This tool builds that flow: a tail source with position tracking, a parser, and a match block that writes to Elasticsearch or S3 with reliable buffering.
How it works
The <source> with @type tail watches a file or glob and emits each new line as a record stamped with a tag. The pos_file stores the last read offset for each watched file, so a restart resumes where it left off rather than re-reading or skipping lines. Inside the source, a <parse> directive turns raw text into structured fields: json for already-structured logs, regexp with named captures for custom formats, or the built-in apache2 and nginx parsers.
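As a rough sketch of the regexp option, the <parse> section below pulls three named captures out of a hypothetical line format such as `2024-05-01T12:00:00+0000 INFO user logged in`. The line layout and the field names level and message are assumptions made for illustration; expression, time_key and time_format are standard parser parameters.

```
<parse>
  @type regexp
  # Named captures become record fields; the layout here is an assumed example
  expression /^(?<time>\S+) (?<level>[A-Z]+) (?<message>.*)$/
  # Use the captured "time" field as the event timestamp
  time_key time
  time_format %Y-%m-%dT%H:%M:%S%z
</parse>
```

Lines that do not match the expression are not parsed into these fields, so it is worth testing the pattern against real samples before deploying it.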
Records then flow to a <match> block selected by tag. Wildcards like app.** let one rule capture many sources, and blocks are tried in file order with the first match winning, so specific patterns belong above broad ones. The output plugin determines the destination: elasticsearch with logstash_format rolls records into daily indices, while s3 writes batched objects under a time-partitioned key path. Buffered outputs such as these take a <buffer> section that groups records into chunks and flushes them every flush_interval. A file buffer persists chunks to disk so in-flight logs survive a crash, and retry_max_times caps how many times Fluentd retries a failing destination before giving up on a chunk.
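For the S3 path, a match block might look like the following sketch. It assumes the fluent-plugin-s3 output is installed and that credentials come from the environment or an instance role; the bucket name, region, buffer path and retry count are placeholders, not recommendations.

```
# Catches app, app.logs, app.web.access, ... so place narrower patterns above it
<match app.**>
  @type s3
  # Placeholder bucket and region
  s3_bucket my-log-bucket
  s3_region us-east-1
  # Time-partitioned key path; the placeholders need the tag,time chunk keys below
  path logs/${tag}/%Y/%m/%d/
  <buffer tag,time>
    @type file
    path /var/log/td-agent/buffer/s3
    # One chunk per tag per hour, flushed 10 minutes after the hour closes
    timekey 3600
    timekey_wait 10m
    # Stop retrying a chunk after 10 failed attempts
    retry_max_times 10
  </buffer>
</match>
```

With a time chunk key, chunks are flushed once their time window plus timekey_wait has passed, rather than on flush_interval.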
Tips and example
Give each source a distinct tag prefix so match rules stay unambiguous, and keep the buffer on disk in production for durability. A file-to-Elasticsearch config:
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>

<match app.logs>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix app-logs
  <buffer>
    @type file
    path /var/log/td-agent/buffer/out
    flush_interval 5s
  </buffer>
</match>
Watch the buffer directory in production: if a destination is down, the buffer keeps growing until the destination recovers or a limit is hit, so size your disk and retry settings accordingly.
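One way to bound that growth is to set explicit limits in the <buffer> section. The fragment below is a sketch only: the parameter names are standard buffer options, but every value is an assumed starting point that should be tuned to your disk size and log volume.

```
<buffer>
  @type file
  path /var/log/td-agent/buffer/out
  flush_interval 5s
  # Cap the size of each chunk and of the whole buffer on disk (assumed values)
  chunk_limit_size 8MB
  total_limit_size 2GB
  # When the buffer is full, pause the input instead of raising errors
  overflow_action block
  # Back off between retries, then give up on a chunk after 10 attempts
  retry_type exponential_backoff
  retry_wait 1s
  retry_max_interval 60s
  retry_max_times 10
</buffer>
```

With a tail source, blocking on overflow is usually safe because unread lines stay in the log file and the pos_file keeps the offset, provided the file is not rotated away before Fluentd catches up.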