Big (Data) Logs

When logs grow too big, it is the perfect time to think about something bigger. An ideal place for large amounts of data is a Big Data platform. In this case it will be Hadoop, which works well with syslog-ng.

If you don't have the resources to build a Big Data environment yourself, a good option is Azure Data Lake Storage Gen2. You can find more information about what ADLS Gen2 has in common with Hadoop in the documentation. Michał Smereczyński suggested to me the idea of sending logs to ADLS with syslog-ng and its HDFS destination.

Preparation of the environment

If your environment is built on something else, for example Graylog or the ELK Stack, you need to change it a little. syslog-ng can collect logs and send them to a given destination. The provider suggests in its post that you can send logs to HDFS. I don't have HDFS, but with ABFS (Azure Blob File System) that isn't a problem, because Hadoop supports it.

The configuration is very similar to the one available in the provider's post:

destination d_hdfs {
    hdfs(
        # directory with the Hadoop client classes (JAR files)
        client-lib-dir("/opt/hadoop/")
        hdfs-uri("abfs://logs@biglogslinux.dfs.core.windows.net")
        hdfs-file("/user/log/log-$DAY-$HOUR.txt")
        # authorization file with properties, like core-site.xml
        hdfs-resources("/opt/hdfs.xml")
        hdfs-append-enabled(true)
    );
};

The authorization file and access to the logs will be described next time.
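Until then, here is a minimal sketch of what such a resources file could contain for shared-key authentication. The property names come from the hadoop-azure (ABFS) documentation, and the storage account access key is a placeholder:

<!-- /opt/hdfs.xml: shared-key authentication for the biglogslinux storage account -->
<configuration>
    <property>
        <name>fs.azure.account.auth.type.biglogslinux.dfs.core.windows.net</name>
        <value>SharedKey</value>
    </property>
    <property>
        <name>fs.azure.account.key.biglogslinux.dfs.core.windows.net</name>
        <value>YOUR_STORAGE_ACCOUNT_ACCESS_KEY</value>
    </property>
</configuration>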

In the next steps you should configure a source and a log path (an optional filter is shown after the example), for example:

source log_net {
    # listen for incoming syslog messages over TCP and UDP
    tcp();
    udp();
};

log {
    # connect the network source to the HDFS/ABFS destination
    source(log_net);
    destination(d_hdfs);
};
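The filter is optional. If you want to narrow down what lands in the data lake, you could, for example, replace the log path above with one that keeps only messages from the auth and authpriv facilities; the filter name and the chosen facilities below are just an illustration:

filter f_auth {
    facility(auth, authpriv);
};

log {
    source(log_net);
    filter(f_auth);
    destination(d_hdfs);
};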

After that, you can send logs from other servers straight to your Big Data storage.
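On the sending side, a minimal sketch of a syslog-ng client configuration could look like this; the collector address 10.0.0.5 is a placeholder for the server running the configuration above:

source s_local {
    # local system logs plus syslog-ng's own internal messages
    system();
    internal();
};

destination d_collector {
    # forward over TCP to the collector that writes to ADLS
    network("10.0.0.5" transport("tcp") port(514));
};

log {
    source(s_local);
    destination(d_collector);
};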
