Web Archive User's $14k BigQuery Bill Shock After Running Queries On 'free' Dataset

A user left with a surprise bill for thousands of dollars after running queries on Google's BigQuery data warehouse has sparked a debate about how vendors should place limits on the use of their tools.

One user of HTTP Archive – a project that aims to track how the web is built – was recently horrified to get a $14,000 bill from Google.

The HTTP Archive project – which crawls websites and records detailed information about fetched resources, the web platform APIs and features used, and execution traces of each page – hosts a publicly available dataset on the Chocolate Factory's BigQuery cloud-based data warehouse system.

"This website makes it seem like this 'public' dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars," said user Tim on the HTTP Archive forum.

"This official website should be updated to warn people Google is apparently now hosting this dataset to make money. I don't think that was the original mission, but that's what it is today, there's basically zero customer support, and you can lose $14k in the blink of an eye," he added in the discussion post.

An archive maintainer responded that 99 percent of the archive users only view its free monthly reports and annual Web Almanac reports. BigQuery is designed for the 1 percent of "power users" who "need lower level access to the raw data."

The maintainer pointed out that $14,000 would have come from processing about 2.5 petabytes, given Google's rate of $6.25 per TiB. He said Google warns users how much data a query will process when it is run, yet nonetheless apologized for the user's experience and said he'll add a more explicit warning about BigQuery charging to the website's FAQ page.
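The maintainer's estimate can be checked with a few lines of arithmetic. This is a rough sketch that assumes the whole bill came from on-demand query charges at the quoted rate; the variable names are illustrative only.

```python
# Back-of-envelope check of the maintainer's figure.
BILL_USD = 14_000
USD_PER_TIB = 6.25  # on-demand rate quoted in the article

tib_scanned = BILL_USD / USD_PER_TIB   # tebibytes billed
bytes_scanned = tib_scanned * 2**40    # 1 TiB = 2**40 bytes
pib_scanned = tib_scanned / 1024       # binary petabytes (PiB)
pb_scanned = bytes_scanned / 1e15      # decimal petabytes (PB)

print(f"{tib_scanned:,.0f} TiB = {pib_scanned:.2f} PiB = {pb_scanned:.2f} PB")
```

That works out to 2,240 TiB, which is a little under 2.5 decimal petabytes and in line with the maintainer's "about 2.5 petabytes."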

However, Tim came back into the conversation. He said he was running queries from a Python script using the official GCP libraries, which, he said, unlike the web UI, have no mechanism to show the cost of a query.

"I think one thing that would help is to highlight people should enable the cost controls prior to running queries, as they are not on by default," he said.
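For readers scripting against BigQuery, here is a minimal sketch of what opting in to such controls might look like with the `google-cloud-bigquery` Python client. It uses the client's documented `dry_run` and `maximum_bytes_billed` job options. The helper names, the hard-coded $6.25 rate, and the assumption that default credentials and a project are already configured are ours, not anything described in the forum thread.

```python
USD_PER_TIB = 6.25  # rate quoted in the article; check current pricing before relying on it


def estimate_cost_usd(bytes_processed: int, usd_per_tib: float = USD_PER_TIB) -> float:
    """Convert a byte count into an approximate on-demand charge."""
    return bytes_processed / 2**40 * usd_per_tib


def run_with_cap(sql: str, max_bytes_billed: int):
    """Dry-run the query to print an estimate, then run it with a hard byte cap."""
    # Imported lazily so estimate_cost_usd works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project are set up

    # A dry run validates the query and reports the bytes it would process.
    dry_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    dry_job = client.query(sql, job_config=dry_config)
    estimate = estimate_cost_usd(dry_job.total_bytes_processed)
    print(f"Estimated scan: {dry_job.total_bytes_processed:,} bytes (~${estimate:,.2f})")

    # With maximum_bytes_billed set, a query that would bill more than the cap fails.
    capped_config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes_billed)
    return client.query(sql, job_config=capped_config).result()
```

A call such as `run_with_cap(sql, max_bytes_billed=10 * 2**40)` would cap a single query at 10 TiB, or about $62.50 at the quoted rate. Neither option is applied unless the script sets it, which is the gap Tim describes.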

Tim argued for a circuit-breaker at $5k or less to stop users from running queries unless they manually confirm they want to continue.
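The kind of circuit-breaker Tim describes could be approximated on the client side. The sketch below is purely hypothetical: `check_circuit_breaker`, `QueryBlocked`, and the `confirm` callback are names invented for illustration, and it assumes an estimated byte count is already available (for example, from a dry run).

```python
from typing import Callable

USD_PER_TIB = 6.25     # rate quoted in the article
THRESHOLD_USD = 5_000  # the ceiling Tim suggested


class QueryBlocked(Exception):
    """Raised when an estimate exceeds the threshold and is not confirmed."""


def check_circuit_breaker(estimated_bytes: int,
                          confirm: Callable[[float], bool],
                          threshold_usd: float = THRESHOLD_USD) -> float:
    """Return the estimated cost, or raise unless an over-threshold query is confirmed."""
    cost = estimated_bytes / 2**40 * USD_PER_TIB
    # Only ask for confirmation when the estimate crosses the threshold.
    if cost > threshold_usd and not confirm(cost):
        raise QueryBlocked(f"estimated ${cost:,.2f} exceeds ${threshold_usd:,.2f}")
    return cost
```

In an interactive script, `confirm` might wrap a yes/no prompt. In an unattended job it could simply return `False`, so that an expensive query stops rather than runs.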

One respondent logged on to say that the complainant was an idiot — in a post now hidden by moderators — for running a query without understanding the volume of data it might address. Others may see this as unhelpful.

While Google makes BigQuery's pricing clear on its website, users — particularly students or academics — might arrive at the data from another direction. Maybe a default should be to prevent processing data above a certain threshold unless the user explicitly agrees or they have signed up to a data plan.

The Register has contacted Google for a statement. ®
