by Vantage Team
Today, Vantage is launching Persistent Metrics Recovery for Kubernetes: a reliability enhancement that ensures resource utilization metrics collected by the Vantage Kubernetes agent are retained while the Vantage upload endpoint is unreachable. The endpoint can become unreachable because of environment configuration changes that block outside communication, or because of outages affecting Internet Service Providers, Cloud Providers, or Vantage. With this update, the agent persistently stores metric reports in a specified fallback location and retries uploads later, ensuring reliable reporting even during prolonged connectivity issues.
The Vantage Kubernetes agent collects hourly resource utilization metrics and sends them to the Vantage API for ingestion and analysis in the Vantage console. Historically, when the Vantage API was unreachable for more than 1–2 hours, the agent’s in-memory buffer would lose raw metrics before hourly reports could be generated. These metrics were unrecoverable, resulting in gaps in customers’ Kubernetes cost and usage data.
Now, with Persistent Metrics Recovery, the Vantage Kubernetes agent detects failed uploads of hourly generated metric reports and stores them in a fallback location for up to 96 hours until they can be successfully uploaded. There is no additional configuration required for this feature. The hourly reports are stored in the customer’s existing data persistence location, which can be a Persistent Volume (default storage location) or a specified S3 location. The system periodically retries uploading these hourly utilization reports and clears them once they are successfully delivered. This approach ensures hourly data is preserved for up to 96 hours, even during extended API or S3 outages.
This improvement is available to all Kubernetes customers starting today. To get started, upgrade your Kubernetes agent to version v1.0.29. For more details, see the Kubernetes Agent documentation, or check the logs for recovery and retry events.
1. What is being launched today?
Vantage is introducing Persistent Metrics Recovery for the Vantage Kubernetes agent. This feature stores hourly resource utilization reports for up to 96 hours when uploads to the Vantage API fail. The agent periodically retries these uploads until the reports are successfully delivered.
2. Who is the customer?
Any Vantage customer running the Kubernetes agent in their cluster who relies on consistent utilization metrics.
3. How much does this cost?
There is no additional cost. Persistent Metrics Recovery is included, by default, for all Vantage Kubernetes agent users.
4. How does it work?
When the agent fails to upload a generated hourly report (due to a Vantage API or S3 outage), it now writes that report to local disk (or a fallback S3 location). The agent retries uploading the stored reports until they succeed or age out after 96 hours. Old or unsendable reports are eventually purged to prevent storage bloat.
By default, reports are stored in the attached Persistent Volume, but you can configure data persistence to use S3 instead.
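The store, retry, and purge cycle described above can be sketched roughly as follows. This is a minimal illustration and not the agent's actual implementation: the function name `retry_pending_reports`, the `.report` file extension, the `upload` callback, and the use of file modification time as the report's age are all assumptions made for the example. Only the 96-hour retention window comes from this post.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Optional

# Retention window stated in the post: reports age out after 96 hours.
MAX_AGE_SECONDS = 96 * 60 * 60


def retry_pending_reports(
    fallback_dir: Path,
    upload: Callable[[bytes], bool],
    now: Optional[float] = None,
) -> Dict[str, int]:
    """Retry every stored report once; purge those older than the window.

    `upload` is a hypothetical callback that returns True on success.
    A report's age is taken from its file modification time (an assumption).
    """
    now = time.time() if now is None else now
    result = {"uploaded": 0, "purged": 0, "kept": 0}
    for path in sorted(fallback_dir.glob("*.report")):
        age = now - path.stat().st_mtime
        if age > MAX_AGE_SECONDS:
            # Too old to send: delete so storage use stays bounded.
            path.unlink()
            result["purged"] += 1
        elif upload(path.read_bytes()):
            # Delivered: clear the stored copy.
            path.unlink()
            result["uploaded"] += 1
        else:
            # Still failing: keep the file for the next retry pass.
            result["kept"] += 1
    return result
```

Calling a function like this on a timer gives the behavior described: reports survive an outage on disk, are cleared once delivered, and are dropped once they pass the retention window.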
5. What are the reasons that my data might not successfully sync with Vantage?
This could be caused by outages of either S3 or the Vantage API, configuration changes in customer environments that prevent external communication, or broader internet outages that prevent communication with the Vantage API.
6. What happens if an outage lasts more than 96 hours?
Data older than 96 hours will be purged to avoid unbounded disk usage or storage consumption. Any hourly data not successfully uploaded within this window may still be lost.
7. Does this increase the agent’s resource usage?
The mechanism is designed to have minimal impact during normal operations. Disk usage is bounded, and logs are provided to track recovery and retry behavior.
8. Will this consume additional memory for my clusters?
No. Because these reports are written to an external backup location, this will not impact the memory utilization of your running workloads.
9. How can I tell if my agent is in this state or has been in this state?
The agent publishes a metric called vantage_last_report_timestamp_seconds that can tell you the delta between now and the last report. In normal circumstances, it would be under an hour. Additionally, logs can be viewed by running the kubectl logs <pod-name> command.
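As a rough illustration of the delta check, the sketch below compares a `vantage_last_report_timestamp_seconds` value against the current time. It assumes the metric value is a Unix timestamp in seconds and that you have already read it from wherever you scrape agent metrics. The function names and the one-hour default threshold are hypothetical, chosen to match the "under an hour" guidance above.

```python
import time
from typing import Optional


def report_staleness_seconds(
    last_report_timestamp: float, now: Optional[float] = None
) -> float:
    """Seconds elapsed since the last report (never negative)."""
    now = time.time() if now is None else now
    return max(0.0, now - last_report_timestamp)


def is_report_overdue(
    last_report_timestamp: float,
    now: Optional[float] = None,
    threshold_seconds: float = 3600.0,  # assumed threshold: one hour
) -> bool:
    """True when the last report is older than the threshold."""
    return report_staleness_seconds(last_report_timestamp, now) > threshold_seconds
```

A result of `True` suggests the agent may be in, or recovering from, a failed-upload state, and is a cue to inspect the agent logs.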
10. Is any additional configuration required to enable this feature?
This behavior is enabled by default in the latest version of the Kubernetes agent. If you choose to use S3 for your persistent storage, you will need to configure your storage preference.