River IQ

spark udf with withColumn

  Ashish Kumar      Spark February 14, 2020

import org.apache.spark.sql.functions._

val events = Seq(
  (1,1,2,3,4),
  (2,1,2,3,4),
  (3,1,2,3,4),
  (4,1,2,3,4),
  (5,1,2,3,4)
).toDF("ID","amt1","amt2","amt3","amt4")

var prev_amt5 = 0
var i = 1

def getamt5value(ID: Int, amt1: Int, amt2: Int, amt3: Int, amt4: Int): Int = {
  if (i == 1) {
    i = i + 1
    prev_amt5 = 0
  } else {
    i = i + 1
  }
  if (ID == 0) {
    if (amt1 == 0) {
      val cur_amt5 = 1
      prev_amt5 = cur_amt5
      cur_amt5
    } else {
      val cur_amt5 = 1 * (amt2 + amt3)
      prev_amt5 = cur_amt5
      cur_amt5
    }
  }el...

Databricks Log4j Configuration

  Ashish Kumar      Databricks January 15, 2020

System.out.println("Caught Error") writes its output to the console window. That is fine while you are developing or running code manually, but not when the application is scheduled as a job or automated. In that case the output or log should go to a persistent location, such as a database table, an email server, or a log file. Here we discuss how to write logs to a log file, and one solution is Log4j. I won't explain much about Log4j itself here. I'm sure you already know it, or you can ...
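
As a rough illustration of sending log output to a file instead of the console, a Log4j 1.x properties fragment might look like the sketch below. The appender name (appFile), the file path, and the size limits are assumptions chosen for illustration; they are not taken from the post, and the right location depends on your cluster setup.

```properties
# Hypothetical appender name and path; adjust both to your environment.
log4j.rootLogger=INFO, appFile

# Roll the log file once it reaches MaxFileSize, keeping up to 5 old files.
log4j.appender.appFile=org.apache.log4j.RollingFileAppender
log4j.appender.appFile.File=/tmp/app-logs/application.log
log4j.appender.appFile.MaxFileSize=10MB
log4j.appender.appFile.MaxBackupIndex=5

# Timestamp, level, logger name, line number, then the message.
log4j.appender.appFile.layout=org.apache.log4j.PatternLayout
log4j.appender.appFile.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```

With a configuration along these lines, a message logged at INFO level or above is appended to the file rather than only printed to the console.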

Spark Performance Tuning

  Ashish Kumar      Spark September 26, 2018

Apache Spark overview

Analytics is increasingly an integral part of day-to-day operations at today's leading businesses, and transformation is also occurring through huge growth in mobile and digital channels. Previously acceptable response times and delays for analytic insight are no longer viable, with more push toward real-time and in-transaction analytics. In addition, data science skills are increasingly in demand. As a result, enterprise organizations are attempting to leverage analytics in new ways and transition existing analy...

ADF Spark Activity

  Ashish Kumar      Azure September 21, 2018

Introduction

Spark Activity is one of the data transformation activities supported by Azure Data Factory. This activity runs the specified Spark program on your Apache Spark cluster in Azure HDInsight.

Prerequisites for ADF Spark Activity

1. Create an Azure Storage Account and select account type as Blob storage.
2. Create an Apache Spark cluster in Azure HDInsight and associate the Azure storage account (Blob storage). While creating HDInsight Spar...
