Unity Catalog (UC) is the foundation for all governance and management of data objects in the Databricks Data Intelligence Platform. Since its launch several years ago, Unity Catalog has become the best way to experience Azure Databricks.
For most of the time Databricks has existed, the primary method of managing data objects such as tables was the built-in Hive Metastore (HMS) attached to each Azure Databricks workspace, and UC was not enabled by default on new workspaces. This introduced complexity when onboarding new workspaces and required retrospective configuration in the Databricks Account Console. To ease the path to getting started with Unity Catalog, Azure Databricks now automatically assigns a workspace catalog to each new workspace at provisioning time.
This article presents an overview of exactly what happens when a workspace catalog is automatically provisioned for a new Azure Databricks workspace. It assumes the reader understands what Unity Catalog is and how to set it up on Azure, along with how its securables (data objects) and permissions are managed.
Part two of this article will show the differences between the Azure implementation of UC automatic assignment and the AWS implementation.
When a new Azure Databricks workspace is automatically enabled for Unity Catalog, an initial catalog, called a workspace catalog, is created. This catalog lets workspace users get started with UC easily by granting some initial permissions to both the Workspace Administrators and the Workspace Users.
The workspace catalog has the following properties:
The workspace catalog is made up of three Unity Catalog securables: a credential, an external location and the catalog itself.
All three of these UC securables are bound to the workspace and are not, by default, available to any other workspace sharing the metastore.
All new Azure Databricks Accounts created after 9th November 2023 are enabled for Automatic Workspace Assignment, which means that when a new workspace is created it will have a workspace catalog provisioned for it. The process of enabling older Databricks Accounts is ongoing at the time of writing. Organisations that have not had Automatic Workspace Assignment enabled on their Account can request to opt in by contacting their Databricks Account team.
When you create a workspace in a region where Automatic Workspace Assignment is enabled on the Account but no metastore exists, a metastore will be created for you. The properties of this metastore are:
If required, a Metastore Owner can be allocated by an Account Administrator.
To automatically enable all new workspaces in a region for Unity Catalog on an existing metastore in that region, tick the checkbox under Workspace assignment in the metastore settings, found in the Catalog section of the Account Console.
When a metastore is assigned to a workspace, a default catalog name is set for all users of that workspace. If the workspace is created via the UI and automatically enabled for UC, the default catalog will be the workspace catalog. If the workspace is created via an API (including via Terraform or an SDK), the default catalog will be hive_metastore.
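The rule above can be restated as a small lookup. This is an illustrative sketch only, not a Databricks API: the `CreationMethod` enum, the `expected_default_catalog` function and the catalog name `my_workspace` are hypothetical names introduced here. The one case the article does not describe (a UI-created workspace that was not automatically enabled) is deliberately left undefined.

```python
from enum import Enum
from typing import Optional


class CreationMethod(Enum):
    """How the workspace was created (hypothetical labels for this sketch)."""
    UI = "ui"
    API = "api"  # includes Terraform and the SDKs


def expected_default_catalog(
    method: CreationMethod,
    auto_enabled_for_uc: bool,
    workspace_catalog_name: str,
) -> Optional[str]:
    """Return the default catalog described in the article, or None if it is not stated."""
    if method is CreationMethod.API:
        # Workspaces created via an API (Terraform, SDK) default to the Hive Metastore.
        return "hive_metastore"
    if method is CreationMethod.UI and auto_enabled_for_uc:
        # UI-created, automatically enabled workspaces default to their workspace catalog.
        return workspace_catalog_name
    # Any other combination is not covered by the article.
    return None


# "my_workspace" is a made-up catalog name used only for illustration.
print(expected_default_catalog(CreationMethod.UI, True, "my_workspace"))
print(expected_default_catalog(CreationMethod.API, True, "my_workspace"))
```

In practice this matters mostly for Terraform-provisioned workspaces, where unqualified table names will resolve against hive_metastore until the default catalog is changed.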
There are several items in Azure that need to have been created in order to allow objects to be physically stored in the Workspace managed storage account:
In the Azure Portal you can go to the managed resource group attached to the Azure Databricks Workspace and see the Access Connector called unity-catalog-access-connector.
System-owned groups are provisioned with the workspace and granted enough permissions to manage the workspace catalog and the other securable objects created with the workspace. These groups do not appear in most surfaces in the Workspace UI, Account Console or APIs, and cannot be used to grant Unity Catalog privileges on other securables. The membership of these groups is kept in sync with the users who have been pushed to the workspace with either the ADMIN or USER role using Identity Federation.
Group | Name | Unity Catalog Grants |
Workspace Admin | _workspace_admins_${workspace_name}_${workspace_id} | OWNER on credential, external location and workspace catalog in addition to the metastore level rights listed in the next section |
Workspace Users | _workspace_users_${workspace_name}_${workspace_id} | Usage (USE_CATALOG) rights on workspace catalog and usage rights on default schema (see below) |
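To make the naming pattern in the table concrete, here is a minimal sketch that fills in the `${workspace_name}` and `${workspace_id}` placeholders. The function name `system_group_names` and the example workspace name and ID are hypothetical. The sketch assumes the workspace name is inserted verbatim, which the article does not confirm.

```python
from typing import Dict


def system_group_names(workspace_name: str, workspace_id: int) -> Dict[str, str]:
    """Build the two system-owned group names from the pattern shown in the table."""
    suffix = f"{workspace_name}_{workspace_id}"
    return {
        "workspace_admins": f"_workspace_admins_{suffix}",
        "workspace_users": f"_workspace_users_{suffix}",
    }


# Made-up workspace name and ID, for illustration only.
names = system_group_names("analytics-dev", 1234567890123456)
print(names["workspace_admins"])  # _workspace_admins_analytics-dev_1234567890123456
print(names["workspace_users"])   # _workspace_users_analytics-dev_1234567890123456
```

Knowing the pattern is useful mainly for recognising these principals when they show up in grant listings, since they are hidden from most other surfaces.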
The following shows the grants on the default schema.
To create all these Unity Catalog securables, the system-owned Workspace Admins group needs some grants on the Unity Catalog metastore (the screenshot below also shows that this workspace was provisioned via Terraform, so it has hive_metastore as its default catalog).
These grants do not include ownership of the metastore, meaning a workspace admin cannot delete metastore-level UC securables that were created or are owned by other identities, including the workspace catalogs and related securables of other workspaces that were enabled for UC by default.
These grants also allow the Workspace Administrators to create other catalogs and the related underlying securables, such as credentials and external locations. By default, any securable created is owned by the individual identity that created it, and the owner can then transfer ownership to a group.
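Transferring ownership from an individual to a group can be done in SQL. The sketch below only builds the statement text; it does not run anything. It assumes the `ALTER ... OWNER TO` form of Databricks SQL for catalogs, external locations and storage credentials, which should be checked against the current SQL reference. The helper names and the example securable and group names are hypothetical.

```python
# Securable types this sketch supports; other types are intentionally rejected.
_SUPPORTED_TYPES = {"CATALOG", "EXTERNAL LOCATION", "STORAGE CREDENTIAL"}


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def transfer_ownership_sql(securable_type: str, securable_name: str, new_owner: str) -> str:
    """Return an ALTER ... OWNER TO statement for the given securable."""
    kind = " ".join(securable_type.upper().split())  # normalise case and spacing
    if kind not in _SUPPORTED_TYPES:
        raise ValueError(f"unsupported securable type: {securable_type!r}")
    return (
        f"ALTER {kind} {quote_identifier(securable_name)} "
        f"OWNER TO {quote_identifier(new_owner)}"
    )


# Made-up catalog and group names, for illustration only.
print(transfer_ownership_sql("catalog", "finance_dev", "data-platform-admins"))
```

Owning securables through a group rather than an individual avoids orphaned objects when the person who created them leaves the organisation.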
While the provisioning of a workspace catalog greatly simplifies the initial setup of Unity Catalog for new workspaces, it also ties the catalog directly to the lifecycle of the workspace: if the workspace is decommissioned, the managed storage account that contains any Unity Catalog objects in the workspace catalog will also be lost.
The recommendation is to adhere to existing best practices for creating catalogs, aligning them with SDLC (Software Development Lifecycle), business units, and/or projects. This allows more flexibility to segregate storage away from the workspace and to bind these catalogs to multiple workspaces where required. It also means that the addition or removal of a workspace does not impact the lifecycle of any data stored in Unity Catalog.
The metastore permissions granted to the system-owned Workspace Admins group are sufficient to create the securables (credentials, external locations, catalogs, etc.) needed to achieve the required catalog design for your organisation.
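As a rough illustration of the recommended design, the sketch below generates one `CREATE CATALOG` statement per SDLC environment for a business unit, each with its own managed location outside the workspace-managed storage account. The function names, the `finance` business unit, the environment list and the `abfss://` storage root are all hypothetical. The sketch also assumes an external location covering the storage root already exists; the statements are only generated here, not executed.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_catalog_sql(catalog_name: str, managed_location: str) -> str:
    """Return a CREATE CATALOG statement that stores managed data at managed_location."""
    if "'" in managed_location:
        # Keep the sketch simple: refuse paths that would need escaping.
        raise ValueError("managed_location must not contain single quotes")
    return (
        f"CREATE CATALOG IF NOT EXISTS {quote_identifier(catalog_name)} "
        f"MANAGED LOCATION '{managed_location}'"
    )


def plan_catalogs(business_unit: str, environments: List[str], storage_root: str) -> List[str]:
    """Build one statement per environment, e.g. finance_dev, finance_test, finance_prod."""
    root = storage_root.rstrip("/")
    statements = []
    for env in environments:
        catalog_name = f"{business_unit}_{env}"
        statements.append(create_catalog_sql(catalog_name, f"{root}/{catalog_name}"))
    return statements


# Hypothetical storage account and container, for illustration only.
for statement in plan_catalogs(
    "finance",
    ["dev", "test", "prod"],
    "abfss://uc-data@examplestorage.dfs.core.windows.net/catalogs/",
):
    print(statement)
```

Because each catalog points at storage that is independent of any workspace, removing a workspace leaves the data in place, and the same catalog can be bound to further workspaces where required.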
To recap, when Azure Databricks workspaces are auto-enabled for Unity Catalog, a default workspace catalog is created along with the necessary cloud resources and permissions, all without manual effort. This makes it much easier for the users of new workspaces to start using Unity Catalog immediately; however, it is still important to follow best practices in catalog design.
Unity Catalog will continue to be the foundation that the Databricks Data Intelligence Platform is built on. The ability to automatically enable Unity Catalog for all new workspaces greatly reduces the friction to start getting all the benefits of the platform.
For details on what happens when using UC automatic assignment on AWS please see part two of this article.