Today, Vantage announces support for Databricks costs in the Vantage console. Vantage customers can now see their overall Databricks costs as well as costs per workspace, cluster, and Databricks tag. Customers can add any number of Databricks accounts from their integrations page and Vantage will automatically ingest and visualize Databricks costs accordingly.
Customers are increasingly using one or more cloud service providers alongside their primary cloud of AWS, GCP or Azure. For some customers, Databricks is central to their internal operations and can drive significant costs. Because of how Databricks is deployed, customers pay for the underlying infrastructure as well as consumption-based costs driven by user queries. This makes it difficult for organizations to see the entire picture of their Databricks costs, as they have to stitch together their Databricks and cloud provider bills.
Now, customers can grant Vantage secure, read-only access to their Databricks billable usage logs. Upon granting access, Vantage will ingest granular usage records, which are used to calculate overall costs. These costs will also be automatically updated on a daily basis moving forward. Filter Sets in Vantage have been expanded so that Cost Reports can report on Databricks costs by a specific Databricks service, cluster, or tag. By combining filters for Databricks and the infrastructure provider where Databricks is deployed (AWS, Azure, GCP), users are able to see the combined costs for their entire Databricks deployment in a single view.
Databricks support is now generally available to all customers. To begin viewing Databricks costs in Vantage, head to the integrations page to provide Vantage with Databricks usage report access. Additionally, you can read more about this integration in the Databricks section of the Vantage documentation.
Frequently Asked Questions
1. What is being launched today?
Today, Vantage is announcing the general availability of Databricks support. Vantage users can now provide Vantage with access to Databricks billable usage logs, and the corresponding costs will automatically be ingested and visualized within Vantage for that account. Vantage will refresh cost data from Databricks on a daily basis to keep it up to date.
2. Who is the customer?
This feature is available to all Vantage users, including users in the free tier. You must have a Databricks workspace deployed on Azure, GCP or AWS.
3. What is Databricks?
Databricks is an enterprise software company that provides data engineering tools for processing and transforming large volumes of data.
4. How much does this cost?
There is no additional cost for Databricks support. However, Databricks costs will be included in quota tier enforcement. In the event that your Databricks costs push you over your current tier limit, you may be prompted to upgrade. To see more details on pricing, please refer to the pricing page here: https://www.vantage.sh/pricing
5. How does Vantage technically integrate with Databricks?
Vantage will configure an S3 Bucket for each Databricks account to receive the billable usage logs. Once this S3 Bucket is created (which is done via a wizard in the integration flow), you’ll be given copy-and-paste instructions for having the billable usage logs delivered automatically to Vantage by Databricks. By default, Vantage requests 6 months of backfilled data when the integration is configured.
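As a rough illustration of the 6 month backfill, the sketch below computes the month that falls six months before a given date and places it in a log delivery request body. The field names (`log_type`, `output_format`, `delivery_start_time`, `config_name`) reflect our understanding of the Databricks log delivery configuration and are assumptions here, as is the config name; the copy-and-paste instructions in the integration wizard are the source of truth.

```python
from datetime import date


def backfill_start_month(today, months_back=6):
    """Return the YYYY-MM month that lies `months_back` months before `today`."""
    # Count months from year 0 so that subtracting across a year boundary is simple.
    month_index = today.year * 12 + (today.month - 1) - months_back
    year, month_zero_based = divmod(month_index, 12)
    return f"{year:04d}-{month_zero_based + 1:02d}"


def build_delivery_config(today):
    """Build a hypothetical request body for billable usage log delivery."""
    return {
        "log_delivery_configuration": {
            "config_name": "vantage-billable-usage",  # hypothetical name
            "log_type": "BILLABLE_USAGE",             # assumed field and value
            "output_format": "CSV",                   # assumed field and value
            "delivery_start_time": backfill_start_month(today),
        }
    }


config = build_delivery_config(date(2023, 2, 15))
print(config["log_delivery_configuration"]["delivery_start_time"])
```

The sketch only builds the body and does not send it anywhere; storage and credential identifiers are intentionally left out because they come from the wizard.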
6. Does Vantage have access to data stored in Databricks?
No. Vantage will only have access to the usage logs exported by Databricks. These usage logs only contain billing, workspace, and cluster related metadata.
7. What dimensions can Databricks costs be filtered by?
Databricks costs can be filtered by service, account, category and tag.
8. Can I see Databricks queries in my Active Resources?
No, currently there are no active resources for Databricks. However, when clicking on Costs by Resource for a Databricks service you will see costs broken down by cluster.
9. I am a Vantage customer who has already connected my AWS, Azure and/or GCP account. Can I see my Databricks costs alongside existing AWS/Azure/GCP costs?
Yes. After adding a Databricks integration, Databricks will be available as a provider under Cost Report filters. Databricks costs will also automatically be included under the “All Resources” cost report. You can combine cost data between providers with Databricks cluster and pool tags.
10. What Cost Report groupings are available for Databricks costs?
Databricks costs can be grouped by service, cluster, and tag.
11. How often does Databricks data refresh in the Vantage console?
Vantage will receive periodic updates as your Billable Usage Logs are updated by Databricks, generally once per day.
12. What happens if I remove a Databricks integration?
If you decide to remove your Databricks integration from Vantage, all costs associated with that Databricks account will be removed from the Vantage console.
13. Can I have multiple Databricks account integrations?
Yes. There is no limit to the number of Databricks integrations you can add to Vantage.
14. Will Databricks costs be represented in the Overview page?
Yes. Databricks costs are available in the Costs by Provider widget.
15. I have custom Databricks pricing. Do you support this?
Please contact firstname.lastname@example.org if you have negotiated pricing you’d like supported.
16. Is there any additional documentation on this integration?
Yes. Documentation on the Databricks integration can be found here: https://docs.vantage.sh/connecting_databricks
17. I’ve just added my Databricks account. How long will it take for my cost data to be present in the Vantage console?
Costs will be ingested and processed as soon as they’re delivered by Databricks. You can see the imported billable usage logs within your Integrations Settings page. Databricks delivers logs once a day, so the initial ingestion happens when these files are delivered for the first time, which may take up to 24 hours.
18. How does Vantage calculate Databricks costs?
Vantage analyzes the usage records and, for each unique SKU, calculates a cost from the amount of usage and the price for that SKU.
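A minimal sketch of that calculation, assuming each usage record carries a SKU name and a usage amount in DBUs. The record fields, SKU names, and prices below are hypothetical and only illustrate the arithmetic; they are not Vantage's implementation or real Databricks list prices.

```python
from collections import defaultdict


def cost_per_sku(usage_records, price_per_dbu):
    """Sum usage for each SKU, then multiply by that SKU's unit price."""
    usage_by_sku = defaultdict(float)
    for record in usage_records:
        usage_by_sku[record["sku"]] += record["dbus"]
    # A SKU with no entry in price_per_dbu raises KeyError rather than
    # silently being costed at zero.
    return {sku: dbus * price_per_dbu[sku] for sku, dbus in usage_by_sku.items()}


# Hypothetical usage records and prices.
records = [
    {"sku": "JOBS_COMPUTE", "dbus": 10.0},
    {"sku": "ALL_PURPOSE_COMPUTE", "dbus": 4.0},
    {"sku": "JOBS_COMPUTE", "dbus": 6.0},
]
prices = {"JOBS_COMPUTE": 0.25, "ALL_PURPOSE_COMPUTE": 0.5}

costs = cost_per_sku(records, prices)
total = sum(costs.values())
print(costs, total)
```

With these made-up numbers, 16 DBUs of the first SKU at 0.25 come to 4.0, and 4 DBUs of the second at 0.5 come to 2.0, for a total of 6.0.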
19. Databricks can be deployed on Azure, GCP or AWS. Does Vantage support all clouds?
Yes, Vantage integrates with the billable usage logs feature from Databricks which provides usage logs for all of the supported Databricks cloud providers.
20. Will historical cost data for Databricks be available?
Yes. When following the integration setup, you will configure delivery of 6 months of historical usage data from Databricks.
21. I already have usage log exports enabled. Can I integrate Vantage with my existing S3 Bucket?
Not currently. Please contact email@example.com if you’d like to see this supported.
22. What comprises Databricks costs?
Databricks costs are the usage costs derived from DBUs. To see the overall cost of a Databricks deployment, you can use tags to combine these DBU costs with the costs of the underlying cloud provider infrastructure on a single Cost Report.
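To make the tag-based combination concrete, here is a small sketch that groups cost line items from Databricks and from a cloud provider by a shared tag value and totals them. The line item shape, the `team` tag key, and the amounts are all hypothetical; in Vantage this grouping is done through Cost Report filters rather than code.

```python
from collections import defaultdict


def combined_cost_by_tag(line_items, tag_key):
    """Group costs by the value of `tag_key`, split per provider, with a total."""
    per_tag = defaultdict(lambda: defaultdict(float))
    for item in line_items:
        tag_value = item["tags"].get(tag_key)
        if tag_value is None:
            continue  # untagged items cannot be attributed to a deployment
        per_tag[tag_value][item["provider"]] += item["cost"]
    return {
        tag: {**providers, "total": sum(providers.values())}
        for tag, providers in per_tag.items()
    }


# Hypothetical line items: DBU costs from Databricks, infrastructure costs from AWS.
items = [
    {"provider": "Databricks", "cost": 4.0, "tags": {"team": "etl"}},
    {"provider": "AWS", "cost": 6.5, "tags": {"team": "etl"}},
    {"provider": "AWS", "cost": 1.0, "tags": {}},
]

report = combined_cost_by_tag(items, "team")
print(report)
```

The untagged AWS item is left out of the result, which is why consistent cluster and pool tagging matters for getting a complete combined view.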
23. What other cloud service providers is Vantage adding?
24. I have a request for a cloud service provider that is not on the current roadmap. Can I get it supported?
Maybe. Please write in to firstname.lastname@example.org, as we are prioritizing the next set of cloud service providers.