River IQ

Azure Databricks Notebook - How to get current workspace name

  Ashish Kumar      Databricks January 15, 2020

Sometimes you may have been in a situation where you feel something should be very easy, but once you start looking into it, you find it's not.

Something like that happened to me too, so I'm sharing what I learned with you all.

 

I was looking for a way to get the current workspace name from a notebook. I wanted to derive the environment type (dev/test/stg/prod) from the workspace name and use it in the notebook configuration. I did some research but couldn't find a way. I would say it isn't possible to get workspace details directly from a notebook, because a Databricks workspace is an Azure resource and you cannot get Azure resource details from inside a notebook.

I tried to get the workspace tags too, but those aren't available either. You can get cluster user tags, but not workspace tags.

 

Here are the options I tried, and to some extent I found a way to do this. Below are all the steps I tried.

 

Step 1:

print(spark.sparkContext.getConf().getAll())    # Python cell; in a Scala cell use print(spark.conf.getAll)

Or

spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")

 

The commands above give you a whole bunch of variables and tags. They return the Databricks workspace ID but not its name, and believe me, I also tried what you are thinking (getting the name from the ID), but unfortunately no luck. Here is an excerpt of the cluster tags in that output:

 

"key":"ClusterName","value":"dev-eastus2-dbs-test"},{"key":"ClusterId","value":"45555"},{"key":"DatabricksEnvironment","value":"workerenv-554556556
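Since cluster tags are available, one workaround is to derive the environment from the cluster name, assuming your clusters follow a naming convention like dev-eastus2-... . The minimal sketch below assumes the tags are exposed as a JSON list under the Spark conf key spark.databricks.clusterUsageTags.clusterAllTags (verify this on your cluster); the helper names tags_to_dict and env_from_name are hypothetical, and a sample string stands in for the real value so it runs anywhere.

```python
import json

# Assumption: in a Databricks notebook the raw string would come from
#   raw = spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")
# A sample string is used here so the sketch runs outside Databricks.

KNOWN_ENVS = ("dev", "test", "stg", "prod")


def tags_to_dict(raw_tags):
    """Convert a JSON list of {"key": ..., "value": ...} objects into a plain dict."""
    return {item["key"]: item["value"] for item in json.loads(raw_tags)}


def env_from_name(name):
    """Return the env prefix of a name like 'dev-eastus2-...', or None if unknown."""
    prefix = name.split("-", 1)[0].lower()
    return prefix if prefix in KNOWN_ENVS else None


sample = '[{"key":"ClusterName","value":"dev-eastus2-dbs-test"},{"key":"ClusterId","value":"45555"}]'
tags = tags_to_dict(sample)
print(tags["ClusterName"], env_from_name(tags["ClusterName"]))
```

Note this only tells you about the cluster, not the workspace, so it works only if cluster names reliably carry the environment.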

 

Step 2:

After this I thought of using the Databricks REST API to see if I could get it there, but you guessed it, no luck there either :)

But I did learn something along the way.

 

%sh

curl --location --request GET 'https://eastus2.azuredatabricks.net/api/2.0/clusters/get?cluster_id=0323-3435-test' \
  --header 'Authorization: Bearer <workspace token>'

%sh

curl --location --request GET 'https://eastus2.azuredatabricks.net/api/2.0/workspace/get-status?path=/' \
  --header 'Authorization: Bearer <workspace token>'

 

Replace <workspace token> with a personal access token, which you can generate under User Settings.

The link below has more details:

https://docs.databricks.com/dev-tools/api/latest/workspace.html
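If you prefer Python over %sh, the same Databricks REST calls can be made with the standard library. This is a sketch, not an official client: databricks_get and send are hypothetical helper names, the host and token are the placeholders from the curl commands above, and the request is only built (not sent) unless you call send.

```python
import json
import urllib.parse
import urllib.request


def databricks_get(host, endpoint, token, params=None):
    """Build (but do not send) a GET request for a Databricks REST API 2.0 endpoint."""
    url = f"{host.rstrip('/')}/api/2.0/{endpoint.lstrip('/')}"
    if params:
        # URL-encode query parameters such as cluster_id or path
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def send(req):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


req = databricks_get(
    "https://eastus2.azuredatabricks.net",
    "workspace/get-status",
    "<workspace token>",  # replace with your personal access token
    params={"path": "/"},
)
print(req.full_url)
# result = send(req)  # uncomment inside the workspace to make the actual call
```

As the step above found, neither endpoint returns the workspace name; the sketch only mirrors the curl calls.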

 

Step 3:

Next, I tried the Azure resource REST API to get the workspace since, as I mentioned, it is an Azure resource. And this time again :) :)

Noooo……… this time I got it…. hahahaha

Let's see how.

Since you are calling the Azure resource REST API instead of the Databricks REST API, you have to authenticate through the Azure authentication mechanism.

Here too you have to provide a Bearer token, but the Databricks token used above will not work.

Here I am using a service principal to authenticate and get the resource details. Yes, it sounds like a lot to create a service principal just to get a workspace name or tag, but in my case I already had one that I was using for Data Lake mounting, so I reused it.

 

Let's see how to get the workspace name using a service principal.

  • First, authenticate with the service principal to get a Bearer token

%sh

curl --request POST "https://login.microsoftonline.com/{Directory (tenant) ID}/oauth2/token" --data-urlencode "resource=https://management.core.windows.net" --data-urlencode "client_id={Application (client) ID}" --data-urlencode "grant_type=client_credentials" --data-urlencode "client_secret={Client secrets}"

 

Replace the following values in the command above:

Directory (tenant) ID, Application (client) ID, Client secrets

Where do I get these?

Oh… I knew you were going to ask this :) :). Just kidding, but let me explain for those who don't know.

Go to the path below and look for Directory (tenant) ID and Application (client) ID:

Home > App registrations > find your service principal

 

For the client secret, look in the Certificates & secrets tab in the left panel. If there is no client secret, create one; if one already exists, contact your admin, because you cannot view an existing secret's value yourself.

Once you get the token from the command above, pass it into the command below.
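The token request can also be built in Python with the standard library. This is a minimal sketch of the same client-credentials POST as the curl command; build_token_request and access_token_from are hypothetical helper names, the IDs are placeholders, and the secret scope/key in the comment are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret,
                        resource="https://management.core.windows.net"):
    """Build (but do not send) the client-credentials POST from the curl command."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "resource": resource,
        "client_id": client_id,
        "grant_type": "client_credentials",
        "client_secret": client_secret,
    }).encode("utf-8")
    # urllib sends a body without an explicit Content-Type as application/x-www-form-urlencoded
    return urllib.request.Request(url, data=body, method="POST")


def access_token_from(response_body):
    """Pull the bearer token out of the JSON returned by the token endpoint."""
    return json.loads(response_body)["access_token"]


# Placeholder values (hypothetical). In Databricks you might read the secret with
# dbutils.secrets.get(scope="my-scope", key="sp-secret") instead of hard-coding it.
req = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
# with urllib.request.urlopen(req) as resp:
#     token = access_token_from(resp.read())
```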


  • Get the resource name (workspace name) using the Azure resource REST API

%sh

curl --location --request GET 'https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Databricks/workspaces?api-version=2018-04-01' \
  --header 'Authorization: Bearer {Put token generated by above command}'

 

Replace the placeholders above with your Azure subscription ID and resourceGroupName (the resource group in which you created your Databricks workspace).
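For reference, here is a sketch of how the same list-by-resource-group URL can be assembled in Python. list_workspaces_request is a hypothetical helper; the subscription, resource group, and token values are placeholders, and the api-version is the one used in the curl command.

```python
import urllib.parse
import urllib.request

ARM_HOST = "https://management.azure.com"


def list_workspaces_request(subscription_id, resource_group, access_token,
                            api_version="2018-04-01"):
    """Build (but do not send) the 'list workspaces by resource group' ARM request."""
    path = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            "/providers/Microsoft.Databricks/workspaces")
    query = urllib.parse.urlencode({"api-version": api_version})
    return urllib.request.Request(
        f"{ARM_HOST}{path}?{query}",
        # Azure AD token from the service principal, not a Databricks token
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


req = list_workspaces_request("<subscriptionId>", "<resourceGroupName>", "<access-token>")
print(req.full_url)
```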

 

Once you run this command, you will get JSON describing the Databricks workspaces in that resource group, including their names and tags, like below:

"id":"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Databricks/workspaces/{Workspace Name}","name":"dev-eastus2-test","type":"Microsoft.Databricks/workspaces","sku":{"name":"standard"},"location":"eastus2","tags":{"resource-status":"active","resource-owner":"Ashish","test":"test","envType":"dev","costCenter":"4587887","departmentName":"Technology","productName":"myproj"

 

Note: the curl output is JSON, so to get a specific value you need to parse it, for example with the jq utility. The workspaces are returned in a "value" array:

< curl command > | jq --raw-output '.value[].name'

< curl command > | jq --raw-output '.value[].tags.envType'
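The same parsing can be done in Python, which also covers the original goal of getting the environment type. This sketch assumes the response has a "value" list of workspaces, reads the envType tag shown in the sample output, and falls back to the name prefix (dev/test/stg/prod) when the tag is missing. workspace_envs is a hypothetical helper, and the second workspace in the sample is made up for illustration.

```python
import json

KNOWN_ENVS = ("dev", "test", "stg", "prod")


def workspace_envs(response_body):
    """Map each workspace name to its environment.

    Uses the 'envType' tag when present, otherwise the first dash-separated
    part of the name (e.g. 'dev' from 'dev-eastus2-test'), else None.
    """
    result = {}
    for ws in json.loads(response_body).get("value", []):
        name = ws["name"]
        env = (ws.get("tags") or {}).get("envType")
        if env is None:
            prefix = name.split("-", 1)[0].lower()
            env = prefix if prefix in KNOWN_ENVS else None
        result[name] = env
    return result


sample = json.dumps({"value": [
    {"name": "dev-eastus2-test", "type": "Microsoft.Databricks/workspaces",
     "tags": {"envType": "dev", "costCenter": "4587887"}},
    {"name": "prod-eastus2-main", "type": "Microsoft.Databricks/workspaces"},
]})
print(workspace_envs(sample))
```

If the resource group holds several workspaces, you still need some way (such as a naming convention) to pick the one your notebook runs in.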

 

Some helpful URLs:

https://docs.microsoft.com/en-us/rest/api/databricks/workspaces/listbyresourcegroup

https://abcdazure.azurewebsites.net/how-to-authenticate-in-azure-rest-api/

https://medium.com/@mauridb/calling-azure-rest-api-via-curl-eb10a06127
