In our User Friendly BigQuery SQL Functions post, we cover 12 user-friendly BigQuery SQL features recently announced by Google Cloud. One of these is the ability to easily export data (using SQL) from BigQuery to Google Cloud Storage.
In our post, we point out that this EXPORT DATA operation is one of a number of FREE operations available in BigQuery. This got us thinking; how many BigQuery professionals out there are actually aware these free operations exist? Surely it would help cost efficiency if we shared them? So we did.
If you are new to BigQuery
Google BigQuery is a modern hyper-scale Cloud data warehouse platform. Thanks to Google’s industry leading network, storage and compute infrastructure, BigQuery is able to execute SQL queries over petabytes of data.
We think BigQuery perfectly balances scalability, flexibility, cost of ownership and ease of use, whilst guaranteeing security.
So, in no particular order, here are the freebies on offer with BigQuery.
1. No standing charge
The first thing to note about BigQuery is it has a true consumption based pricing model. Unlike some of the other alternatives out there (notably Amazon Redshift and Snowflake), with BigQuery you only pay for the data you store and data you scan (consume). There is no standing charge.
This is quite remarkable when you think you have access to compute power that rivals research-tier supercomputers.
2. BigQuery free tier
Google Cloud offers a generous free tier across most of its core services, including BigQuery. Under this Google Cloud Free Tier, BigQuery offers the first 10Gb of storage and the first 1Tb of data scan for free.
We are often asked if these rollover? The answer unfortunately is no (however it does mean that you can try out services like BigQuery for free).
3. Loading data into BigQuery
It is very easy to ingest JSON, CSV and Parquet files from Google Cloud Storage into BigQuery. You can even create a federated table in BigQuery which will query the files in GCS at query-time (we do this a lot). As well as being easy, this is completely free. You only pay for the data storage costs in GCS and BigQuery.
Furthermore, Data Transfer Service (DTS) offers FREE daily ingestion from a number of common data sources. These include:
Google Cloud Storage
Google Campaign Manager
Google Analytics 360
Google Ad Manager
Google Merchant Center
Search Ads 360
YouTube Content Owner
4. Copying tables in BigQuery
There are no charges for copying data. This includes within a BigQuery project and also between BigQuery projects.
5. Exporting data from BigQuery to Google Cloud Storage
This is a new feature for BigQuery (we cover it in our last post) and allows data in BigQuery to be easily exported to GCS using SQL. The great news is that there is no charge for the data transfer element - again just the storage costs in GCS and BigQuery.
6. DDL operations
DDL operations, such as deleting a table, dataset or view incur no charge. This includes the recently introduced TRUNCATE operation (OK, admittedly Google Cloud were playing catch-up with this one). We recommend TRUNCATE over a delete when clearing the contents of a table because a delete incurs a scan of rows to be deleted. This also includes deleting partitions.
7. Metadata operations
BigQuery supports an extensive array of metadata, the most common being a description held against each table, something which we encourage all our clients to do as it greatly speeds up the process of data exploration. All operations on this metadata, including view, update and delete are completely free. Great for reducing the cost of ownership of a data catalogue.
We are a big advocate of dbt (data build tool) for building SQL data pipelines in BigQuery and this has full support to automatically update table metadata. A great way to auto-document data assets.
8. Querying pseudo columns
Certain BigQuery tables support pseudo columns. For example, we frequently see clients exporting data to BigQuery with Firestore’s built-in function. Data is exported into date sharded tables, and these tables are automatically assigned the date as the _TABLE_SUFFIX, which makes crafting cost-efficient SELECT statements much easier.
All pseudo columns are free to scan.
9. Querying meta tables
There is no charge for querying the following meta tables:
- __PARTITIONS_SUMMARY: Used when getting metadata about partitions in a partitioned table or an ingestion-time partitioned table.
- __TABLE_SUMMARY: Used when getting metadata about the tables and views in a dataset.
10. User Defined Functions (UDFs)
In summary, knowing these are available in BigQuery should be really helpful in ensuring data analytics solutions are built in a cost-effective way.
1. Read the Official BigQuery Documentation
2. Learn more about Ancoris Data, Analytics & AI