River IQ

Databricks Log4j Configuration

Ashish Kumar, Databricks, January 15, 2020

System.out.println("Caught Error")

This writes the output to the console window. That is fine while you are developing or running things manually, but what if you are scheduling your application as a job or automating it? In that case the output or log should go to a persistent location. You can persist it anywhere: a database table, an email server, or a log file.

 

So here we will discuss how to write our logs into a log file, and one solution for that is Log4j.

 

I won't explain much about Log4j itself. I'm sure you already know it, or you can easily find several blogs and tutorials about it on the internet.

 

Here I will focus on how to configure log4j logging on Databricks. If you are using a different Spark distribution (HDP, Cloudera, HDInsight, etc.), or running spark-submit instead of a Databricks notebook, please follow the link below.

 

First, let's get a basic understanding of log4j. If you want to skip this part, you can jump directly to the Databricks log4j configuration section below.

 

Log4j is a Java library that specializes in logging. It is highly configurable through external configuration files at runtime.

 

Log4j has three main components, which are the following:

 

1.      Logger

2.      Appender

3.      Layout

 

1.      Logger:

It is responsible for capturing logging information.

 

Logger logger = Logger.getLogger(CurrentClass.class.getName());

 

Level:

A logger can log at multiple levels, and each level has a different priority, in this order:

ALL < TRACE < DEBUG < INFO < WARN < ERROR < FATAL < OFF.

 

If you have set the level to Level.INFO, then you will get only INFO, WARN, ERROR and FATAL logs. You won't get TRACE or DEBUG logs.

 

Ex .

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class LogLevelExample {

   private static final Logger log = Logger.getLogger(LogLevelExample.class);

   public static void main(String[] args) {
      // Only messages at WARN level or above are logged.
      log.setLevel(Level.WARN);

      log.trace("Trace Message!");
      log.debug("Debug Message!");
      log.info("Info Message!");
      log.warn("Warn Message!");
      log.error("Error Message!");
      log.fatal("Fatal Message!");
   }
}

 

Output :

 

Warn Message!

Error Message!

Fatal Message!

 

2.      Appender

An appender is responsible for delivering log messages to a destination, such as the console, a file, or a database.

 

There are several types of appenders.

Here are some of the most useful ones:

ConsoleAppender – logs messages to the System console

FileAppender – writes log messages to a file

RollingFileAppender – writes the messages to a rolling log file

JDBCAppender – uses a relational database for logs

 

Ex-

val appender=new RollingFileAppender();

appender.setName("AppLogAppender");

appender.setFile("logs/mylog4j-active.log");

appender.setAppend(true);

appender.activateOptions();

 

3.      Layout

The layout defines the formatting/pattern of the logging information.

Ex. 

val layout=new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")

appender.setLayout(layout);
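
Putting the three components together, here is a minimal end-to-end sketch of configuring Log4j programmatically from a Scala notebook; the logger name, file path, and pattern are only illustrative:

import org.apache.log4j.{Level, Logger, PatternLayout, RollingFileAppender}

// Illustrative layout, appender, and logger names; adjust for your environment.
val layout = new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")

val appender = new RollingFileAppender()
appender.setName("AppLogAppender")
appender.setFile("logs/mylog4j-active.log")
appender.setAppend(true)
appender.setLayout(layout)
appender.activateOptions()

val logger = Logger.getLogger("myAppLogger")
logger.setLevel(Level.INFO)
logger.addAppender(appender)

logger.info("Programmatic Log4j configuration test")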

 

Types of Loggers

 

There are two types of loggers.

1.      Root logger - It writes all system/cluster-generated logs. If we consider the system to be a Spark cluster, it writes all the information about the processes running in the cluster (driver/executors).

2.      Application logger - This is a logger created by the developer, and it writes the log messages added manually in the application code.

Note - we can write application logs using the root logger too, but then our messages end up in the same file, mixed in with the system-generated logs, which makes them very difficult to find and understand when multiple processes are running on the same cluster (see the sketch below).
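
To illustrate the difference, here is a small sketch (logger and file names are hypothetical) of an application logger with its own appender and additivity disabled, so application messages stay out of the root logger's output:

import org.apache.log4j.{FileAppender, Logger, PatternLayout}

// Hypothetical appender and file names, for illustration only.
val appAppender = new FileAppender(
  new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"),
  "logs/myapp.log",
  true) // append

val appLogger = Logger.getLogger("myAppLogger")
appLogger.addAppender(appAppender)
appLogger.setAdditivity(false) // do not forward these messages to the root logger

appLogger.info("Application-only log message")        // written to logs/myapp.log
Logger.getRootLogger.info("System-level log message")  // handled by the root logger's appenders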

 

Now let's see how we can configure it on Databricks.

 

There are two common configuration options in practice: using a log4j.xml configuration file or using a log4j.properties configuration file.

 

1.      Using Log4j XML.

 

MyLog4j.xml

 

<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

<log4j:configuration debug="true" xmlns:log4j='http://jakarta.apache.org/log4j/'>

 

  <appender name="console" class="org.apache.log4j.ConsoleAppender">

    <param name="Target" value="System.out"/>

    <layout class="org.apache.log4j.PatternLayout">

    <param name="ConversionPattern" value="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" />

    </layout>

  </appender>

 

  <appender name="myfileAppender" class="org.apache.log4j.RollingFileAppender">

    <param name="File" value="logs/mylog4j-active.log"/>

    <layout class="org.apache.log4j.PatternLayout">

    <param name="ConversionPattern" value="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" />

    </layout>

  </appender>

 

  <root>

    <priority value ="debug"></priority>

    <appender-ref ref="console"></appender>

    <appender-ref ref="myfileAppender"></appender>

  </root>

 

</log4j:configuration>

 

import org.apache.log4j.Logger;

import org.apache.log4j.xml.DOMConfigurator;

 

DOMConfigurator.configure("MyLog4j.xml");

val logger = Logger.getLogger(getClass.getName)

// Logging goes to both the console and the file.
logger.debug("Log4j Logging Test");
logger.info("Log4j Logging Test");

 

 

2.      Using Log4j.properties file.

MyLog4j.properties

 

# Root logger option

log4j.rootLogger=INFO, myfileAppender, console

 

# configuration to print on console

log4j.appender.console=org.apache.log4j.ConsoleAppender

log4j.appender.console.Target=System.out

log4j.appender.console.layout=org.apache.log4j.PatternLayout

log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

 

# configuration to print into file

log4j.appender.myfileAppender=org.apache.log4j.RollingFileAppender

log4j.appender.myfileAppender.File=logs/mylog4j-active.log

log4j.appender.myfileAppender.layout=org.apache.log4j.PatternLayout

log4j.appender.myfileAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

 

 

import org.apache.log4j.Logger;

import org.apache.log4j.PropertyConfigurator;

 

// PropertyConfigurator is used to configure the logger from a properties file.
PropertyConfigurator.configure("MyLog4j.properties");

val logger = Logger.getLogger(getClass.getName)

// Logging goes to both the console and the log file.
logger.debug("Log4j Logging Test");
logger.info("Log4j Logging Test");
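
On Databricks, PropertyConfigurator needs the properties file on the driver's local filesystem. One way to provide it (a sketch; the DBFS path and contents are only examples) is to write the file to DBFS from the notebook and read it back through the local /dbfs mount:

// Write an example properties file to DBFS (path and contents are illustrative).
dbutils.fs.put("dbfs:/databricks/scripts/MyLog4j.properties",
  """log4j.rootLogger=INFO, console
    |log4j.appender.console=org.apache.log4j.ConsoleAppender
    |log4j.appender.console.layout=org.apache.log4j.PatternLayout
    |log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    |""".stripMargin, true)

import org.apache.log4j.{Logger, PropertyConfigurator}

// DBFS is mounted on the driver at /dbfs, so the file can be read like a local path.
PropertyConfigurator.configure("/dbfs/databricks/scripts/MyLog4j.properties")

val logger = Logger.getLogger(getClass.getName)
logger.info("Configured from a properties file stored on DBFS")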

 

Databricks log4j Configuration

 

Let's see what the default log4j configuration of a Databricks cluster looks like.

 

Cluster Driver Logs:

 

Go to Azure Databricks Workspace > Select the cluster > Click on Driver Logs

 

 

 

Log4j Driver Properties:

 

Inside a notebook, run the commands below.

 

On the driver:

 

%sh

cat /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

 

On the worker:

 

%sh

cat /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties

 

 

driver/log4j.properties

 

# The driver logs will be divided into three different logs: stdout, stderr, and log4j. The stdout

# and stderr are rolled using StdoutStderrRoller. The log4j logs are again split into two: public

# and private. Stdout, stderr, and only the public log4j logs are shown to the customers.

log4j.rootCategory=INFO, publicFile

 

# Use the private logger method from the ConsoleLogging trait to log to the private file.

# All other logs will go to the public file.

log4j.logger.privateLog=INFO, privateFile

log4j.additivity.privateLog=false

 

# privateFile

log4j.appender.privateFile=com.databricks.logging.RedactionRollingFileAppender

log4j.appender.privateFile.layout=org.apache.log4j.PatternLayout

log4j.appender.privateFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.privateFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy

log4j.appender.privateFile.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.log.gz

log4j.appender.privateFile.rollingPolicy.ActiveFileName=logs/active.log

 

# publicFile

log4j.appender.publicFile=com.databricks.logging.RedactionRollingFileAppender

log4j.appender.publicFile.layout=org.apache.log4j.PatternLayout

log4j.appender.publicFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.publicFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy

log4j.appender.publicFile.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.log.gz

log4j.appender.publicFile.rollingPolicy.ActiveFileName=logs/log4j-active.log

 

How to add a custom appender and logger

 

dbutils.fs.mkdirs("dbfs:/databricks/scripts/")

dbutils.fs.put("dbfs:/databricks/scripts/<clustername>/myLog4j-config.sh", """

#!/bin/bash

echo "" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "#myfileAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.logger.myAppLogger=INFO, myfileAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.additivity.myAppLogger=false" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.appender.myfileAppender=com.databricks.logging.RedactionRollingFileAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.appender.myfileAppender.layout=org.apache.log4j.PatternLayout" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.appender.myfileAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.appender.myfileAppender.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.appender.myfileAppender.rollingPolicy.FileNamePattern=logs/log4j-mylog4j-%d{yyyy-MM-dd-HH}.log.gz" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

echo "log4j.appender.myfileAppender.rollingPolicy.ActiveFileName=logs/log4j-mylog4j-active.log" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

""",true)

 

* Custom logger - myAppLogger, custom appender - myfileAppender

Here I'm not registering my appender with the root logger, but if you want, you can register it by editing the root category line in the configuration to read as below.

log4j.rootCategory=INFO, publicFile, myfileAppender

 

Or you can enable/disable it at runtime in code.

 

val appender= Logger.getLogger("myAppLogger").getAppender("myfileAppender")

val root = Logger.getRootLogger();

root.addAppender(appender);

 

** Your log4j log file name should start with log4j- (for example, log4j-mylog4j-active.log); otherwise it will not show up in the UI under the Driver Logs tab,

and if you have enabled log shipping from the cluster to DBFS or a storage account, it won't ship your custom log file either.

 

Once you have created myLog4j-config.sh, you need to add its path to the cluster's Init Scripts.

Go to Databricks workspace > cluster > Advanced Options > Init Scripts

Add your file's DBFS path: dbfs:/databricks/scripts/<clustername>/myLog4j-config.sh

After adding it, you will be asked to restart the cluster.

 

Note - do not remove the init script once the configuration has been added to the log4j.properties file.

This configuration is local to the driver and executors and is lost on every cluster restart, so the init script must stay configured for it to be reapplied.

 

What is an Init Script?

https://docs.databricks.com/clusters/init-scripts.html

 

Once the cluster has restarted, check the log4j.properties file; our configuration should have been appended there.

 

%sh

cat /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties

 

Now you are ready to use your logger. Let's see how we can use it in a notebook.

 

import org.apache.log4j.Level;

import org.apache.log4j.Logger;

//for myAppLogger Logging

val logger = Logger.getLogger("myAppLogger");

 

logger.error("Log4j Logging Test");

logger.info("Log4j Logging Test");

logger.debug("Log4j Logging Test");

 

Now let's see the generated log file.

 

%sh

cat logs/log4j-mylog4j-active.log

 

Or you can view it in the cluster UI too.

 

Now let's see how we can change the log level to debug an issue, or register our appender with the root logger.

 

logger.setLevel(Level.DEBUG)

logger.debug("Log4j Debug Logging Test");

 

// After finding the issue, set the log level back to INFO; it will not automatically return to the default level.

logger.setLevel(Level.INFO)

 

// To attach the current appender to the root logger. After this it will also start writing all the other activity happening on the cluster.

val appender= Logger.getLogger("myAppLogger").getAppender("myfileAppender")

val root = Logger.getRootLogger();

root.addAppender(appender);

 

// Once the issue is resolved and you want to go back to logging only application messages, remove the appender from the root logger.

root.removeAppender(appender)

 

How to ship logs to DBFS or storage

By default the log is created on the local (driver/worker machine) path logs/log4j-active.log, but we can enable shipping it to DBFS or a storage account.

https://docs.databricks.com/clusters/configure.html#cluster-log-delivery
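
For example, if you set the cluster's log delivery destination to dbfs:/cluster-logs (an illustrative path), the delivered driver and executor logs can be browsed from a notebook once delivery runs:

// List delivered cluster logs; dbfs:/cluster-logs is only an example destination.
display(dbutils.fs.ls("dbfs:/cluster-logs/"))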

 

  

References:

https://kb.databricks.com/clusters/overwrite-log4j-logs.html

https://www.programcreek.com/java-api-examples/?api=org.apache.log4j.rolling.TimeBasedRollingPolicy

 

