Databricks Log4j Configuration

System.out.println("Caught
Error")
This
will write output on console window. This is fine when you are developing or
running manually but what if you are scheduling as job or automating your
application. In that case these output or log should go to some persistent
location. You can persist it anywhere database table, email server or in a log
file.
So here we will discuss how to write our logs to a log file, and one of the solutions is Log4j.
I won't be explaining much about Log4j itself; I'm sure you already know it, or you can easily find several blogs and tutorials about it on the internet.
Here I will be focusing on how to configure Log4j logging on Databricks. If you are using a different Spark distribution (HDP, Cloudera, HDInsight, etc.), or if you are running spark-submit instead of a Databricks notebook, please follow the link below.
So let's get a little basic understanding of Log4j first. If you want to skip this part, you can jump directly to the Databricks log4j configuration section.
Log4j is a Java library that specializes in logging. Log4j is highly configurable through external configuration files at runtime.
Log4j has three main components, which are the following:
1. Logger
2. Appender
3. Layout
1. Logger
The logger is responsible for capturing logging information.
Logger logger = Logger.getLogger(CurrentClass.class.getName());
Level:
A logger can log at multiple levels, and each level has a different priority order, like this:
ALL < DEBUG < INFO < WARN < ERROR < FATAL < OFF
If you have set Level.INFO then you will get only INFO, WARN, ERROR and FATAL logs. You won't get DEBUG logs.
Ex.
static final Logger log = Logger.getLogger(CurrentClass.class.getName());

public static void main(String[] args) {
    log.setLevel(Level.WARN);
    log.trace("Trace Message!");
    log.debug("Debug Message!");
    log.info("Info Message!");
    log.warn("Warn Message!");
    log.error("Error Message!");
    log.fatal("Fatal Message!");
}

Output:
Warn Message!
Error Message!
Fatal Message!
2. Appender
An appender is used to write messages to a specified destination.
There are several types of appender. Here are some of the most useful ones:
ConsoleAppender – logs messages to the system console
FileAppender – writes log messages to a file
RollingFileAppender – writes the messages to a rolling log file
JDBCAppender – uses a relational database for logs
Ex-
val appender = new RollingFileAppender();
appender.setName("AppLogAppender");
appender.setFile("logs/mylog4j-active.log");
appender.setAppend(true);
appender.activateOptions();
3. Layout
The layout defines the formatting/pattern of the logging information.
Ex.
val layout = new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")
appender.setLayout(layout);
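Putting the three components together, here is a minimal sketch of how a logger, a rolling file appender and a pattern layout can be wired up programmatically in a notebook. The logger name "MyAppLogger" and the file path are only illustrative.

import org.apache.log4j.{Level, Logger, PatternLayout, RollingFileAppender}

// Layout: how each message is formatted.
val layout = new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")

// Appender: where the messages go (illustrative file path).
val appender = new RollingFileAppender()
appender.setName("AppLogAppender")
appender.setFile("logs/mylog4j-active.log")
appender.setAppend(true)
appender.setLayout(layout)
appender.activateOptions()

// Logger: captures the messages and hands them to its appenders.
val logger = Logger.getLogger("MyAppLogger")
logger.setLevel(Level.INFO)
logger.addAppender(appender)
logger.info("Logger, appender and layout wired together")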
Types of Log
There are two types of log/logger.
1. Root logger - It writes all system/cluster generated logs. If we consider the system to be a Spark cluster, it writes all the information about the processes running in the cluster (driver/executor).
2. Application logger - This is a logger created by the developer, and it writes the manual log statements added within the application code.
Note - we can write application logs using the root logger too, but the messages will land in the same file, in between the system generated logs, and because of that they are very difficult to find and understand when multiple processes are running on the same cluster. A separate application logger avoids this, as shown in the sketch below.
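To illustrate the difference, here is a small sketch that keeps application messages in their own file while the root logger keeps writing the system logs to its own appenders. The logger name, appender and file path are illustrative.

import org.apache.log4j.{Logger, PatternLayout, RollingFileAppender}

// Dedicated appender for application messages only (illustrative path).
val appAppender = new RollingFileAppender()
appAppender.setFile("logs/myapp-active.log")
appAppender.setLayout(new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"))
appAppender.activateOptions()

// Application logger: additivity off keeps its messages out of the root logger's files.
val appLogger = Logger.getLogger("myAppLogger")
appLogger.setAdditivity(false)
appLogger.addAppender(appAppender)

appLogger.info("goes only to logs/myapp-active.log")
Logger.getRootLogger.info("goes to the system/cluster log appenders")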
Now let's see how we can configure this on Databricks.
The two most common configuration options in practice are using a log4j.xml configuration or using a log4j.properties configuration.
1. Using Log4j XML.
MyLog4j.xml
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration debug="true" xmlns:log4j='http://jakarta.apache.org/log4j/'>
  <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <param name="Target" value="System.out"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"/>
    </layout>
  </appender>
  <appender name="myfileAppender" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="logs/mylog4j-active.log"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"/>
    </layout>
  </appender>
  <root>
    <priority value="debug"/>
    <appender-ref ref="console"/>
    <appender-ref ref="myfileAppender"/>
  </root>
</log4j:configuration>
import org.apache.log4j.Logger;
import org.apache.log4j.xml.DOMConfigurator;

DOMConfigurator.configure("MyLog4j.xml");
val logger = Logger.getLogger(getClass.getName)

// Logging on console and in the file too.
logger.debug("Log4j Logging Test");
logger.info("Log4j Logging Test");
2. Using Log4j.properties file.
MyLog4j.properties
# Root logger option
log4j.rootLogger=INFO, myfileAppender, console
# configuration to print on console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# configuration to print into file
log4j.appender.myfileAppender=org.apache.log4j.RollingFileAppender
log4j.appender.myfileAppender.File=logs/mylog4j-active.log
log4j.appender.myfileAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myfileAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

// PropertyConfigurator is used to configure the logger from a properties file.
PropertyConfigurator.configure("MyLog4j.properties");
val logger = Logger.getLogger(getClass.getName)

// Log on the console and in the log file.
logger.debug("Log4j Logging Test");
logger.info("Log4j Logging Test");
Databricks log4j Configuration
Let's see what the default log4j configuration of a Databricks cluster looks like.
Cluster Driver Logs:
Go to Azure Databricks Workspace > Select the cluster > Click on Driver Logs
Log4j Driver Properties:
Inside a notebook, run the commands below.
On the driver:
%sh
cat /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
On the worker:
%sh
cat /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
driver/log4j.properties
# The driver logs will be divided into three different logs: stdout, stderr, and log4j. The stdout
# and stderr are rolled using StdoutStderrRoller. The log4j logs are again split into two: public
# and private. Stdout, stderr, and only the public log4j logs are shown to the customers.
log4j.rootCategory=INFO, publicFile

# Use the private logger method from the ConsoleLogging trait to log to the private file.
# All other logs will go to the public file.
log4j.logger.privateLog=INFO, privateFile
log4j.additivity.privateLog=false

# privateFile
log4j.appender.privateFile=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.privateFile.layout=org.apache.log4j.PatternLayout
log4j.appender.privateFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.privateFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.privateFile.rollingPolicy.FileNamePattern=logs/%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.privateFile.rollingPolicy.ActiveFileName=logs/active.log

# publicFile
log4j.appender.publicFile=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.publicFile.layout=org.apache.log4j.PatternLayout
log4j.appender.publicFile.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.publicFile.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.publicFile.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.publicFile.rollingPolicy.ActiveFileName=logs/log4j-active.log
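Based on this default configuration, anything logged through the root logger goes to the public file (logs/log4j-active.log), while anything logged through the "privateLog" logger goes to the private file only, because its additivity is off. A quick sketch to observe this from a notebook, assuming the default configuration above has not been changed:

import org.apache.log4j.Logger

// Root-level loggers inherit the publicFile appender, so this lands in the public driver log.
Logger.getLogger(getClass.getName).info("visible in the public driver log")

// The privateLog logger writes only to the private file (logs/active.log).
Logger.getLogger("privateLog").info("visible only in the private log")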
How to add a custom appender and logger
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")

dbutils.fs.put("dbfs:/databricks/scripts/<clustername>/myLog4j-config.sh",
"""
#!/bin/bash
echo "" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "#myfileAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.logger.myAppLogger=INFO, myfileAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.additivity.myAppLogger=false" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.myfileAppender=com.databricks.logging.RedactionRollingFileAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.myfileAppender.layout=org.apache.log4j.PatternLayout" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.myfileAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.myfileAppender.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.myfileAppender.rollingPolicy.FileNamePattern=logs/log4j-mylog4j-%d{yyyy-MM-dd-HH}.log.gz" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.myfileAppender.rollingPolicy.ActiveFileName=logs/log4j-mylog4j-active.log" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
""", true)
*Custom logger - myAppLogger, custom appender - myfileAppender
Here I'm not registering my appender with the root logger, but if you want to, you can add your appender by editing the existing root line as below.
log4j.rootCategory=INFO, publicFile, myfileAppender
Or you can enable/disable it at run time in code.
val appender = Logger.getLogger("myAppLogger").getAppender("myfileAppender")
val root = Logger.getRootLogger();
root.addAppender(appender);
** Your log4j log file name should start with log4j-, like log4j-mylog4j-active.log; otherwise it will not show on the UI under the Driver Logs tab, and if you have enabled log shipping from the cluster to DBFS or a storage account, it won't ship your custom log file.
Once you have created myLog4j-config.sh, you need to add this file path to the cluster's Init Scripts.
Go to Databricks workspace > cluster > Advanced Options > Init Scripts
Add your file's DBFS path:
dbfs:/databricks/scripts/<clustername>/myLog4j-config.sh
After adding it, you will be asked to restart the cluster.
Note - you should not remove the init script once the configuration has been added to the log4j.properties file. This is a local configuration on the driver and executors and is lost at every restart of the cluster, so to keep your configuration in place the init script must remain configured.
What is an Init Script?
https://docs.databricks.com/clusters/init-scripts.html
Once the cluster is restarted, check the log4j.properties file; our configuration should have been added there.
%sh
cat /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
Now you are ready to use your logger. Let's see how we can use it in a notebook.
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// for myAppLogger logging
val logger = Logger.getLogger("myAppLogger");
logger.error("Log4j Logging Test");
logger.info("Log4j Logging Test");
logger.debug("Log4j Logging Test");
Now see the log that was generated.
%sh
cat logs/log4j-mylog4j-active.log
Or you can see it on the cluster UI too.
Now, how can we change the log level to debug an issue, or register our appender with the root logger?
logger.setLevel(Level.DEBUG)
logger.debug("Log4j Debug Logging Test");
// After finding the issue, you have to set the log level back to INFO; it will not move back to the default level automatically.
logger.setLevel(Level.INFO)
// To enable root logging for the current appender. After this it will also start writing all the other activity happening on the cluster.
val appender = Logger.getLogger("myAppLogger").getAppender("myfileAppender")
val root = Logger.getRootLogger();
root.addAppender(appender);
// Once the issue is resolved and you want to go back to the normal way of logging (application log only), remove the appender from the root logger.
root.removeAppender(appender)
How to ship logs to DBFS or storage
By default the log gets created at a local path (on the driver/worker machine), logs/log4j-active.log, but we can enable shipping it to DBFS or a storage account.
https://docs.databricks.com/clusters/configure.html#cluster-log-delivery
https://kb.databricks.com/clusters/overwrite-log4j-logs.html
https://www.programcreek.com/java-api-examples/?api=org.apache.log4j.rolling.TimeBasedRollingPolicy