River IQ

Spark UDF with withColumn

  Ashish Kumar      Spark February 14, 2020

import org.apache.spark.sql.functions._
val events = Seq((1,1,2,3,4), (2,1,2,3,4), (3,1,2,3,4), (4,1,2,3,4), (5,1,2,3,4)).toDF("ID","amt1","amt2","amt3","amt4")
var prev_amt5=0
var i=1
def getamt5value(ID:Int,amt1:Int,amt2:Int,amt3:Int,amt4:Int) : Int = {
  if(i==1){
    i=i+1
    prev_amt5=0
  }else{
    i=i+1
  }
  if (ID == 0) {
    if(amt1==0) {
      val cur_amt5= 1
      prev_amt5=cur_amt5
      cur_amt5
    }else{
      val cur_amt5=1*(amt2+amt3)
      prev_amt5=cur_amt5
      cur_amt5
    }
  }el...

Transfer Files from Windows to Linux

  Ashish Kumar      other February 14, 2020

riveriq_copytoedge.bat
@echo off
set timestamp=%DATE:/=-%_%TIME::=-%
C:\"Program Files (x86)"\WinSCP\WinSCP /script="E: iveriqrtifactscopytoedgeconfig iveriq_copytoedge.txt" /log="E: iveriqloggingcopytoedgeconfig%timestamp%.log"

riveriq_copytoedge.txt
# Being intended for interactive session, we are not enabling batch mode
# Connect
open sftp://testuser:password123@testserver.riveriq.com/
# Synchronize paths provided via environment variables
synchronize remote E: iveriq estsourcedatacleansedDataconfig /data/home/riveriq/testsource/data...

Read JCEKS Containing Secret Keys using java

  Ashish Kumar      java February 14, 2020

package com.riveriq.db2con.driver;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.riveriq.exception.CustomException;
import com.riveriq.util.ReadJceks;
public class DB2Connection {
  private Connection conn;
  private static DB2Connection db2connection;
  private static Logger LOGGER = LoggerFactory.getLogger(DB2Connection.class);
  private DB2Connection() { }
  public Connection getConnection(Properties prop) throws SQLExcept...
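The DB2Connection excerpt depends on a ReadJceks helper whose source is not shown. As a hedged sketch only, assuming the secret (for example the DB2 password) is stored in the JCEKS file as a secret-key entry under an alias, it can be read with the JDK's standard java.security.KeyStore API. The class name JceksSecretReader, the method readSecret, and the UTF-8 encoding of the stored bytes are all assumptions for illustration, not the author's ReadJceks implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical helper: reads a secret stored as a secret-key entry in a JCEKS keystore.
// Not the author's ReadJceks class.
final class JceksSecretReader {

    static String readSecret(InputStream jceks, char[] storePassword,
                             String alias, char[] entryPassword)
            throws IOException, GeneralSecurityException {
        // JCEKS is the SunJCE keystore type that can hold SecretKey entries.
        KeyStore keyStore = KeyStore.getInstance("JCEKS");
        keyStore.load(jceks, storePassword);

        // getEntry returns null when the alias does not exist.
        KeyStore.Entry entry = keyStore.getEntry(alias,
                new KeyStore.PasswordProtection(entryPassword));
        if (!(entry instanceof KeyStore.SecretKeyEntry)) {
            throw new GeneralSecurityException("No secret key entry for alias: " + alias);
        }
        // Assumption: the secret was stored as the UTF-8 bytes of the key material.
        byte[] raw = ((KeyStore.SecretKeyEntry) entry).getSecretKey().getEncoded();
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

In a DB2Connection-style class, the returned string would then be passed as the password argument of DriverManager.getConnection(url, user, password); the alias and both passwords are placeholders you would supply from your own configuration.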

How to remove new lines within double quotes

  Ashish Kumar      other February 14, 2020

#!/usr/bin/perl
use warnings;
use strict;
use Path::Tiny;
use Text::CSV;
use Time::Piece;
use File::Path qw( make_path );
use diagnostics;
use Try::Tiny;
#use File::NCopy;
use File::Copy::Recursive qw(fcopy rcopy dircopy fmove rmove dirmove);
use Time::HiRes qw( time );
my $start = time();
my $date = localtime->strftime('%Y%m%d');
my $feed_date = $date;
if(exists($ARGV[3])){
  $feed_date = $ARGV[3];
}
# build source directory path ==>
my $source_feed_dir = $ARGV[0];
my $source_feed_dir_path = path($source_feed_dir);
# process i.e. current date ...
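The Perl script relies on Text::CSV and its full logic is truncated in this excerpt. As a hedged illustration of the core idea only, the hypothetical Java method below walks the text once, tracks whether it is inside a double-quoted field, and replaces any CR or LF found there with a caller-supplied string, leaving record-separating newlines alone. It assumes RFC 4180-style quoting, where a literal quote inside a field is written as two quotes; it is a swapped-in technique, not a port of the author's script.

```java
// Hypothetical helper, not the author's Perl code: replaces line breaks that occur
// inside double-quoted fields while keeping newlines between records.
final class QuotedNewlineRemover {

    static String removeNewlinesInQuotes(String csv, String replacement) {
        StringBuilder out = new StringBuilder(csv.length());
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles the state twice, so it stays correct.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                // Treat a CRLF pair inside quotes as a single line break.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, removeNewlinesInQuotes("1,\"line one\nline two\"\n", " ") yields "1,\"line one line two\"\n". Streaming the file line by line would need the same in-quotes flag carried across lines.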

Databricks Log4j Configuration

  Ashish Kumar      Databricks January 15, 2020

System.out.println("Caught Error") writes output to the console window. That is fine while you are developing or running the application manually, but what if you are scheduling it as a job or automating it? In that case the output or logs should go to a persistent location, such as a database table, an email server, or a log file. Here we discuss how to write our logs to a log file, and one solution for that is Log4j. I won't explain Log4j in detail here; I'm sure you already know it, or you can ...

Log4j Configuration with spark-submit

  Ashish Kumar      java January 15, 2020

This is the second part of the Log4j configuration series for Spark applications. For more background on Log4j, follow the link below:
https://www.linkedin.com/pulse/databricks-log4j-configuration-ashish-kumar/
Spark-submit
/usr/hdp/3.0.1.0-187/spark2/bin/spark-submit --master yarn --queue dev --deploy-mode client --class com.riveriq.log4jExample --driver-java-options "-Dlog4j.configuration=file:/home/riveriq/log4j/conf/log4j.xml" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/home/riveriq/log4j/conf/log4j.xml" --num-exe...

Azure Databricks Notebook - How to get current workspace name

  Ashish Kumar      Databricks January 15, 2020

Sometimes you may have been in a situation where something seems like it should be very easy, but once you start looking into it, you find it's not. Something like that happened to me too, and I'm sharing what I learned with you all. I was looking for a way to get the current workspace name from a notebook, because I wanted to derive the environment type (dev/test/stg/prod) from the workspace name and use it in the notebook configuration. I did some research but couldn't succeed; I would say it isn't possible to get workspace details from a notebook, and the reason be...

Sqoop import to Text, Avro, Parquet, Sequence

  Ashish Kumar      sqoop January 27, 2019

In my previous article I explained how we can sqoop data into Avro files, what kinds of errors it can throw, and how we can resolve them. Now I am going to show you how we can Sqoop import into multiple file formats and build a table on top of each. We can sqoop data into many file formats, but Sqoop supports direct import into four file formats:
File Format | Argument | Description
Avro Data Files | --as-avrodatafile | Imports data to Avro Data Files ...

Hive Integration with Spark

  Ashish Kumar      Spark January 22, 2019

Are you struggling to access Hive using Spark? Is your Hive table not showing in Spark? No worries: here I am going to show you the key changes made in HDP 3.0 for Hive and how we can access Hive using Spark. In HDP 3.0, Spark and Hive each have their own metastore catalog: Hive uses the "hive" catalog, and Spark uses the "spark" catalog. With HDP 3.0, you can find the configuration below for Spark in Ambari. Previously we could access Hive tables in Spark using HiveContext/SparkSession, but now in HDP 3.0 we can access Hive using Hive ...

Sqoop Import in Avro Files

  Ashish Kumar      sqoop January 22, 2019

Today I will show you how we can sqoop data into the Avro file format. Yes, we know it's very simple: just add --as-avrodatafile to your sqoop import command, as all the Apache documentation says. But in real life, does every documented command work as simply as written? Definitely not… The same happened to me as to others… so no worries: here I'm going to show you all the probable issues you can face, how to debug them, and the resolution for each. If you have a different issue, please comment and we will try to solve it together. But before talking to th...

RECOMMENDATION ENGINE CONTENT-BASED FILTERING & COLLABORATIVE FILTERING

  Ashish Kumar      Machine Learning October 03, 2018

Recommendation engines are probably among the types of machine learning model best known to the general public. Even if people do not know exactly what a recommendation engine is, they have most likely experienced one through popular websites such as Amazon, Netflix, YouTube, Twitter, LinkedIn, and Facebook. Recommendations are a core part of all these businesses, and in some cases they drive a significant percentage of their revenue.
The idea be...
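The excerpt introduces content-based and collaborative filtering but is cut off before any method is shown. As a minimal sketch of one common building block, and not the article's own approach, the hypothetical class below computes cosine similarity between item rating vectors and predicts a user's rating for an item as a similarity-weighted average of that user's other ratings (item-based collaborative filtering). It assumes a small dense user-by-item matrix in which 0 means "not rated".

```java
// Hypothetical item-based collaborative filtering sketch.
// ratings[user][item]; 0 means the user has not rated the item (assumption).
final class ItemSimilarity {

    // Cosine similarity between two equal-length vectors; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Predicts ratings[user][item] from the user's ratings of other items,
    // weighting each by how similar that item's rating column is to the target's.
    static double predict(double[][] ratings, int user, int item) {
        int nItems = ratings[0].length;
        double[] target = column(ratings, item);
        double num = 0, den = 0;
        for (int other = 0; other < nItems; other++) {
            if (other == item || ratings[user][other] == 0) continue;
            double sim = cosine(target, column(ratings, other));
            num += sim * ratings[user][other];
            den += Math.abs(sim);
        }
        return den == 0 ? 0.0 : num / den;
    }

    // Extracts column j (one item's ratings across all users).
    static double[] column(double[][] m, int j) {
        double[] col = new double[m.length];
        for (int i = 0; i < m.length; i++) col[i] = m[i][j];
        return col;
    }
}
```

A content-based variant would use the same cosine function, but on item feature vectors (genre, keywords, and so on) instead of rating columns.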

CRUD Operation Using Xrm.WebApi

  Dipali Vaish      Microsoft CRM September 30, 2018

One of the important enhancements in Dynamics 365 v9.0 is Xrm.WebApi, which helps make developers' lives simpler. We can perform the following operations with Xrm.WebApi.
Note: Supports Dynamics 365 (online), version 9.x
Create
Create a record in an entity.
Syntax:
Xrm.WebApi.createRecord(entityLogicalName, data).then(successCallback, errorCallback);
Example:
// define the data to create new account
var _accountData =
    {
        "name": "Sam test", ...

Apache Hive 3.0

  Ashish Kumar      Hive September 26, 2018

Workload management
Managing resources has always been a concern for Hive performance, and LLAP improved performance. With Hive 3.0 workload management, you can create resource pools and allocate resources to match your needs and prevent contention for those resources. Workload management improves parallel query execution and cluster sharing for queries running on Hive LLAP, and improves performance of non-LLAP queries.
Transaction processing improvements
Mature versions of ACID (Atomicity, Consistency, I...

Spark Performance Tuning

  Ashish Kumar      Spark September 26, 2018

Apache Spark overview
Analytics is increasingly an integral part of day-to-day operations at today's leading businesses, and transformation is also occurring through huge growth in mobile and digital channels. Previously acceptable response times and delays for analytic insight are no longer viable, with more push toward real-time and in-transaction analytics. In addition, data science skills are increasingly in demand. As a result, enterprise organizations are attempting to leverage analytics in new ways and transition existing analy...

Facebook Page Analytics

  Ashish Kumar      other September 23, 2018

Facebook Insights has helped us determine the tastes and interests of our customer demographics.
Overview
This section provides a snapshot of the last seven days of your Page's performance. It focuses on 3 core areas:
· Page Likes: Total and new likes for your Page
· Post Reach: Total number of unique people who were shown your Page and posts
· ...

Page Insights and Post Insights Metrics Mapping

  Ashish Kumar      other September 22, 2018

/{object-id}/insights
Facebook Insights is a product available to all Pages and Apps on Facebook through the Insights dashboard. This object represents a single Insights metric that is tied to another particular Graph API object (Page, Post, etc.). This object is returned by the following edges:
/{page-id}/insights/{metric}
/{post-id}/insights/{metric}
Key Metrics
Column Name (Page Level) | Metric Name
Lifetime Total Likes | page_fans
Daily New Likes | page_fan_adds_unique
Daily Unlikes | ...
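To show how the column-to-metric mapping lines up with the /{page-id}/insights/{metric} edge, here is a hypothetical Java sketch that holds the two Page-level mappings fully visible in the excerpt and builds one request path per metric. The graph.facebook.com base URL is an assumption for illustration; a real request would also need an access token, and usually an API version and a period, all of which are left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: report column name -> Page-level Insights metric name,
// and one /{page-id}/insights/{metric} URL per metric.
final class PageInsightsUrls {

    // Assumed base URL; version, period and access token are intentionally omitted.
    static final String GRAPH_BASE = "https://graph.facebook.com/";

    // Only the mappings fully visible in the excerpt; insertion order is preserved.
    static Map<String, String> pageMetrics() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Lifetime Total Likes", "page_fans");
        m.put("Daily New Likes", "page_fan_adds_unique");
        return m;
    }

    // Builds the edge URL for every mapped metric of the given Page.
    static Map<String, String> insightUrls(String pageId) {
        Map<String, String> urls = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pageMetrics().entrySet()) {
            urls.put(e.getKey(), GRAPH_BASE + pageId + "/insights/" + e.getValue());
        }
        return urls;
    }
}
```

Keeping the mapping in one place means the report's column headers and the metric names requested from the API cannot drift apart.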

ADF Spark Activity

  Ashish Kumar      Azure September 21, 2018

Introduction
Spark Activity is one of the data transformation activities supported by Azure Data Factory. This activity runs the specified Spark program on your Apache Spark cluster in Azure HDInsight.
Prerequisites for ADF Spark Activity
1. Create an Azure Storage Account and select Blob storage as the account type.
2. Create an Apache Spark cluster in Azure HDInsight and associate the Azure storage account (Blob storage). While creating HDInsight Spar...

Blockchain

  Ashish Kumar      Blockchain September 20, 2018

What is Blockchain?
Blockchain is a decentralized/distributed database system that acts as an "open ledger" to store and manage transactions. Each record in the database is called a block and contains details such as the transaction timestamp as well as a link to the previous block. This makes it practically impossible for anyone to alter the records retrospectively without the change being detected.
It's a public ledger that records everything in a secure and transparent manner. Unlike banks that facilitate transactions with traditional currencies, the blockchain al...
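The "link to the previous block" is what makes retrospective edits detectable. Here is a minimal Java sketch, assuming the link is the SHA-256 hash of the previous block's contents (a common design, but not something the excerpt specifies): each block stores its predecessor's hash, so changing any earlier block breaks the chain at the next block. It has no network, consensus, or proof-of-work, so it illustrates only the tamper-evidence property; the class name MiniChain is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Minimal hash-linked chain for illustration only.
final class MiniChain {

    static final class Block {
        final long timestamp;
        final String data;
        final String previousHash;
        final String hash;

        Block(long timestamp, String data, String previousHash) {
            this.timestamp = timestamp;
            this.data = data;
            this.previousHash = previousHash;
            this.hash = computeHash(timestamp, data, previousHash);
        }
    }

    final List<Block> blocks = new ArrayList<>();

    // Appends a block linked to the hash of the current last block ("0" for the first).
    void add(long timestamp, String data) {
        String prev = blocks.isEmpty() ? "0" : blocks.get(blocks.size() - 1).hash;
        blocks.add(new Block(timestamp, data, prev));
    }

    // Valid only if every hash matches its block's contents and links to its predecessor.
    boolean isValid() {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (!b.hash.equals(computeHash(b.timestamp, b.data, b.previousHash))) return false;
            String expectedPrev = i == 0 ? "0" : blocks.get(i - 1).hash;
            if (!b.previousHash.equals(expectedPrev)) return false;
        }
        return true;
    }

    // SHA-256 over the block fields, returned as lowercase hex.
    static String computeHash(long timestamp, String data, String previousHash) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest((timestamp + "|" + data + "|" + previousHash)
                    .getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte x : digest) hex.append(String.format("%02x", x));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Replacing a block and recomputing its own hash is not enough to hide an edit: the following block still holds the old hash, so isValid() returns false.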

Dynamic Allocation in Spark

  Ashish Kumar      Spark August 26, 2018

Why is Spark faster than MapReduce?
Today I will give you a deep dive into Spark resource allocation (static and dynamic allocation of resources). Whenever this question arises, we come up with the explanation that Spark does in-memory processing of data, or that it makes better, more effective use of YARN resources than MapReduce.
How and when does dynamic allocation of resources give faster and more effective utilization of resources?
Effective utilization of cluster or YARN memory.
What are executors?
Before we start talking about stati...
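To make "effective utilization" concrete, here is a simplified Java model of how dynamic allocation sizes its executor target: enough executors to give every running or pending task a slot, where slots per executor are spark.executor.cores divided by spark.task.cpus, clamped to spark.dynamicAllocation.minExecutors and maxExecutors. This is a hedged approximation, not Spark's code: the real allocation manager also ramps requests up gradually (driven by the scheduler backlog timeouts), scales by spark.dynamicAllocation.executorAllocationRatio, and releases idle executors after a timeout.

```java
// Simplified model of Spark's dynamic allocation target (not Spark's actual code).
final class DynamicAllocationEstimate {

    static int targetExecutors(int runningOrPendingTasks, int executorCores, int taskCpus,
                               int minExecutors, int maxExecutors) {
        // Task slots per executor = spark.executor.cores / spark.task.cpus (at least 1).
        int slotsPerExecutor = Math.max(1, executorCores / taskCpus);
        // Ceiling division: executors needed so every task has a slot.
        int needed = (runningOrPendingTasks + slotsPerExecutor - 1) / slotsPerExecutor;
        // Clamp to the configured minimum and maximum.
        return Math.max(minExecutors, Math.min(maxExecutors, needed));
    }
}
```

With 4 cores per executor and 1 CPU per task, 10 pending tasks need 3 executors, while 100 tasks would need 25 but are capped at a maximum of 20. When no tasks are pending, the target falls back to the minimum, which is how dynamic allocation frees cluster resources that static allocation would keep reserved.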
