Vantage Updates Databricks Integration to Use System Tables

by Vantage Team


Today, Vantage is launching an update to its Databricks integration that uses the newly released System Tables to ingest detailed, post-discount billing data. This upgrade gives Vantage customers a more accurate, granular, and native way to visualize Databricks costs alongside the rest of their infrastructure.

Databricks cost and usage visualized in a Cost Report

Previously, the Databricks integration with Vantage used Billable Usage Logs to ingest cost data. These logs provided SKU-level usage but only reflected list pricing, leaving enterprise customers to manually input discounts in Vantage to approximate their actual costs. This approach often introduced discrepancies and required continual adjustment to maintain accuracy.

Now, with support for System Tables, Vantage retrieves authoritative billing data directly from Databricks. Once you deploy a dedicated Serverless SQL Warehouse and grant access, Vantage queries multiple System Tables to ingest precise usage quantities, list prices before discounts, and the post-discount prices provided by Databricks. This cost and usage data can be combined with data from the infrastructure providers where Databricks is deployed, such as AWS, Azure, and GCP, so you can see the combined costs for your entire Databricks deployment in a single view.

The updated Databricks integration is available now to all Vantage customers, and it is recommended that customers with existing Databricks integrations upgrade to the newer integration type. To connect your Databricks account, visit the Integrations section within Settings. For more details, see the Databricks documentation. For an option to set up the integration using a Terraform module, see the module’s GitHub repository.

Frequently Asked Questions

1. What is being launched today?

Vantage is updating its Databricks integration to ingest billing data from Databricks System Tables, including the system.billing.usage, system.billing.list_prices, and system.billing.account_prices tables.

2. Who is the customer?

This integration is available to all Vantage customers using Databricks with a Unity Catalog-enabled workspace.

3. How much does this cost?

There is no additional cost for using the Databricks integration. However, Databricks costs will be included in Vantage quota tier enforcement. In the event that your Databricks costs push you over your current tier limit, you may be prompted to upgrade. To see more details on pricing, please refer to the Pricing page.

4. How does the integration work?

The integration relies on a Serverless SQL Warehouse, dedicated to Vantage, that queries System Tables within a Unity Catalog-enabled workspace. To set it up, deploy the warehouse, create a Service Principal for Vantage to use, grant the Service Principal CAN USE permissions on the SQL Warehouse, and grant it DATA READER permissions on the System Tables.

5. What role do I need in Databricks to perform this integration?

You require account admin privileges in Databricks in order to perform this integration.

6. What role do I need in Vantage to perform this integration?

You must have the Owner or Integration owner role in Vantage in order to perform this integration.

7. Does Vantage have write access to my Databricks account?

No. Vantage uses DATA READER permissions to read System Tables and does not have access to perform any other actions in your Databricks account.

8. What are System Tables in Databricks?

System Tables are a set of Unity Catalog tables that expose operational and billing metadata. For cost monitoring, Vantage uses:

  • system.billing.usage: contains SKU-level usage data by workspace.
  • system.billing.list_prices: provides SKU-level list pricing.
  • system.billing.account_prices: shows discounted prices for customers on enterprise agreements (requires private preview opt-in).
  • system.compute.clusters: contains metadata like human-readable names for clusters and custom tagging.
  • system.compute.warehouses: contains metadata such as warehouse configuration, human-readable warehouse names, and custom tags.
  • system.access.workspace_latest: contains human-readable names for workspaces.

9. Do I need Unity Catalog enabled to use this integration?

Yes, access to System Tables requires a Unity Catalog-enabled workspace.

10. Does Vantage support enterprise discounted rates?

Yes. If the system.billing.account_prices System Table is enabled for your account, Vantage will use its discounted prices for cost calculation. Databricks currently offers this table in private preview, so you may need to work with your Databricks account team to enable it.
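As a rough illustration of the cost calculation mentioned in this answer, the sketch below prices a single usage record, preferring a discounted account price when one exists and falling back to the list price otherwise. The function and parameter names are hypothetical, and this is not Vantage's actual implementation; it only assumes that cost equals usage quantity multiplied by a per-unit price.

```python
from decimal import Decimal
from typing import Optional


def line_item_cost(
    usage_quantity: Decimal,
    list_price: Decimal,
    account_price: Optional[Decimal] = None,
) -> Decimal:
    """Return the cost of one usage record (hypothetical helper).

    Assumption: cost = usage quantity * unit price, where the unit price
    is the discounted account price if one is available, else the list price.
    """
    unit_price = account_price if account_price is not None else list_price
    return usage_quantity * unit_price


# Illustrative numbers only, not real Databricks prices.
list_cost = line_item_cost(Decimal("100"), Decimal("0.55"))
discounted_cost = line_item_cost(Decimal("100"), Decimal("0.55"), Decimal("0.44"))
print(list_cost, discounted_cost)
```

Using Decimal rather than float avoids rounding drift when many small line items are summed.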

11. How far back will my data be available for this integration?

The amount of history available differs for each Databricks customer, depending on account creation date, when System Tables were enabled, and retention. According to Databricks, one year of retention is free, and configurable retention that lets customers extend it is rolling out soon; however, customers cannot backfill System Tables. As of now, most customers will have data going back to Sep 1, 2023. If your Databricks account has more history than your Vantage data retention period covers, only data up to your retention limit will be ingested.
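The interaction between the two limits can be sketched as follows: the earliest date that ends up in Vantage is the later of the oldest date present in System Tables and the start of your Vantage retention window. This is a minimal sketch under stated assumptions; the function name is hypothetical, and expressing the retention period in days is an assumption made only for illustration.

```python
from datetime import date, timedelta


def earliest_ingested_date(
    oldest_system_table_date: date,
    today: date,
    vantage_retention_days: int,
) -> date:
    """Return the earliest date expected to be ingested (hypothetical helper).

    Assumption: the Vantage retention window is a fixed number of days
    counted back from today.
    """
    retention_start = today - timedelta(days=vantage_retention_days)
    # Data older than either limit is unavailable, so take the later date.
    return max(oldest_system_table_date, retention_start)


# Example: System Tables reach back to Sep 1, 2023, retention is 365 days.
print(earliest_ingested_date(date(2023, 9, 1), date(2025, 1, 1), 365))
```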

12. How do I tell how far back my System Table integration goes?

You can speak with your Databricks account team, or run this query in your Databricks account:

SELECT MIN(usage_date) as oldest_full_month
FROM system.billing.usage
WHERE DAY(usage_date) = 1;

13. I’m using the previous Databricks integration. What is the recommendation in light of this new launch?

Vantage recommends that you set up the new integration in order to receive the most up-to-date billing data from Databricks, as new products will not be added to the former Billable Usage Logs. If your Vantage retention period extends further back than your System Tables data, you can keep your previous integration to ensure continuity. Otherwise, you can remove your previous integration once the new integration is complete. Previous integrations will be indicated with a V1 - Read Only badge on the integrations page.

14. If I keep my previous integration, how do I ensure my costs are not double counted?

Vantage will backfill your new Databricks integration as far back as the System Tables contain data and will then remove overlapping data from your old Databricks integration.

You should disable the Vantage Log Delivery configuration within your Databricks account to prevent duplicate data from being ingested. See the documentation for instructions on how to do this.
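To make the overlap removal in this answer concrete, the sketch below keeps records from the old integration only for dates before the first date covered by the new integration, then appends the new records. The record shape (a dict with a "date" key) and the function name are hypothetical, and this is an illustration of the idea rather than Vantage's actual deduplication logic.

```python
from datetime import date
from typing import Dict, List


def merge_without_overlap(old_rows: List[Dict], new_rows: List[Dict]) -> List[Dict]:
    """Combine old and new integration records without double counting.

    Assumption: each record is a dict with a "date" key, and the new
    integration is authoritative from its earliest date onward.
    """
    if not new_rows:
        return list(old_rows)
    cutoff = min(row["date"] for row in new_rows)
    # Keep old records only for dates the new integration does not cover.
    kept_old = [row for row in old_rows if row["date"] < cutoff]
    return kept_old + list(new_rows)


old = [{"date": date(2023, 8, 31), "cost": 10}, {"date": date(2023, 9, 1), "cost": 12}]
new = [{"date": date(2023, 9, 1), "cost": 11}]
print(merge_without_overlap(old, new))
```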

15. What dimensions can I filter or group Databricks costs by?

The following filter and group by dimensions are available on Databricks Cost Reports:

  • Billing Account (e.g., Account)
  • Linked Account (e.g., Workspace)
  • Service (e.g., Jobs Compute)
  • Charge Type (e.g., Usage)
  • Category (e.g., Photon)
  • Subcategory (e.g., Serverless)
  • Resource ID (specific ID for a given Databricks resource)
  • Tags (Tags from Databricks and Virtual Tags created in Vantage)

16. Are there Active Resources available for Databricks?

At this time, Active Resources are not available for Databricks resources.

17. Will Databricks costs be represented in the Overview page?

Yes, Databricks costs are represented in the Overview page and present in the Provider Summary widget, just as before.

18. How long will it take for my Databricks cost data to be present in the Vantage console?

Costs will be ingested and processed as soon as you add the integration. It usually takes less than 15 minutes to ingest Databricks costs. As soon as they are processed, they will be available on your All Resources Cost Report.

19. How often does Databricks data refresh in the Vantage console?

Databricks data is refreshed daily in the Vantage console.

20. What happens if I remove a Databricks integration?

If you decide to remove your Databricks integration from Vantage, all costs associated with your Databricks account will be removed from the Vantage console.

21. Can I have multiple Databricks integrations?

Yes, you can set up a separate Databricks integration for each Databricks account you have. Each integration will collect data for all Workspaces under that Databricks account.

22. Can I view usage data for Databricks?

Yes, usage data is available for services that measure consumption, such as usage in DBUs (Databricks Units) or GBs.

23. What IP addresses will Vantage’s requests be coming from?

Vantage will use the following IP addresses when connecting to your Databricks account.

24. Does this integration incur costs in my Databricks account?

Vantage uses a Serverless SQL Warehouse to process the System Table queries. It is estimated that this will incur roughly $84/month in your Databricks account. Vantage uses the smallest warehouse possible to minimize costs.