Published in Data content, October 2020. 15 minute read

User-friendly BigQuery SQL functions

We were excited to see the power of BigQuery receive a further boost this week with the release of 12 new BigQuery SQL features. Google Cloud describes these as “user-friendly SQL capabilities”. So, let’s take a look at what’s now possible.

1. Add table columns via DDL

New to BigQuery is the ability to add new columns via the ALTER TABLE DDL statement. This is something data professionals with a background in traditional on-prem database platforms would expect as standard, so it's nice to see Google Cloud acknowledge it.

alter table mydataset.mytable
add column a string,
add column if not exists b geography,
add column c array<numeric>,
add column d date options(description="my description") 

Tip: We like the syntax for only adding a column if one doesn't already exist, handy for idempotent deployments. There's full support for records too. Learn more from the BigQuery documentation.

2. TRUNCATE table

TRUNCATE TABLE is now supported, which will please those from an on-prem background. Unlike a DELETE DML statement in BigQuery which incurs a scan cost, TRUNCATE falls under the growing list of BigQuery free operations. We will certainly be using this one a lot.

TRUNCATE TABLE [[project_name.]dataset_name.]table_name
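For example, to empty a table (the table name here is our own invention):

truncate table mydataset.mytable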

Cost tip: To reduce query scan costs, we recommend using TRUNCATE over DELETE whenever you need to remove the entire contents of a table.

3. Unicode table naming

BigQuery now supports Unicode table naming. Acknowledging multi-language support, table names are no longer restricted to letters, numbers and underscores.
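As a quick sketch (the dataset and table names here are our own invention), a Unicode table name just needs to be quoted with backticks in DDL:

create table mydataset.`ventes_par_région` (
  region string,
  total numeric
)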

Warning: Be careful with table naming: try to ensure names are consistent, readable and follow a naming convention within your organisation.

4. BigQuery federated tables just got (even) better

At Ancoris we love federated (EXTERNAL) tables in BigQuery; they act as a powerful, zero-data-engineering approach to easily ingest files from Google Cloud Storage, including commonly seen formats such as JSON and CSV.

Tip: Great to see that you can now create external tables using a DDL statement.

Here's an example that creates a table reading CSVs from two different storage URIs:

create external table dataset.CsvTable options(
format = 'CSV',
uris = ['gs://bucket/path1.csv', 'gs://bucket/path2.csv']);

To learn more, see these examples of creating a federated table. We plan to publish a tech how-to over the coming weeks to show just what is possible.

 

5. EXPORT DATA from BigQuery to Google Cloud Storage using SQL

This is a little buried in the release notes but is, in our opinion, a really powerful feature. You can now export the results of a BigQuery query to Google Cloud Storage using the new EXPORT DATA SQL command; all of the BigQuery-supported export data formats and compression types are available.

Here’s an example taken from the BigQuery EXPORT DATA documentation.

export data options(
uri='gs://bucket/folder/*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') as
select field1, field2
from table1
order by field1 limit 10

Tip: Great to see there are no egress costs; you only pay for any BigQuery data scanned and data stored in GCS.

This appears to reinforce Google Cloud’s aspirations of reducing the need for complex data engineering, something we (and many of our clients) share. If you couple this with, say, a Cloud Function triggered from the target GCS bucket, this could provide a simple mechanism for delivering data into data pipelines outside of BigQuery.

6. EXECUTE IMMEDIATE (dynamic SQL)

BigQuery now supports the execution of dynamic SQL. The syntax is very familiar, especially to those from an MS SQL background.

-- create a temporary table called books
execute immediate
'create temp table books (title string, publish_date int64)';

-- add a row for Hamlet
execute immediate
"insert into books (title, publish_date) values('Hamlet', 1599)";

See BigQuery dynamic SQL support for further examples.

Tip: As a general rule, we are very cautious about using dynamic SQL and try to avoid it where possible, mainly because of the risk of SQL injection and because it tends to make code harder to read and debug.
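Where dynamic SQL is unavoidable, one way to reduce the injection risk is to bind values with the USING clause instead of concatenating them into the SQL string. A sketch, reusing the books table from the example above:

-- values are bound to the ? placeholders in order,
-- so they are never spliced into the SQL text itself
execute immediate
"insert into books (title, publish_date) values(?, ?)"
using 'Macbeth', 1606;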

7. Authorized User-Defined Functions (UDFs)

For those of you not familiar with BigQuery UDFs, these are scalar functions that allow you to operate over data using either SQL or JavaScript (and associated JavaScript libraries).

BigQuery now supports authorized UDFs, which allow authorized consumers (via IAM) to query tables in a dataset even if the user who calls the UDF does not have access to those tables. For those of you from an MS SQL background, this is similar to granting permissions on, say, a table-valued function in SQL Server; these are often used as a security layer over (locked-down) tables to enforce row- or column-level security.
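As a sketch (the dataset and table names here are hypothetical), a SQL UDF can wrap a query over a locked-down table; once the function is authorized on that dataset, consumers can call it without having read access to the underlying table:

-- private_ds.orders is assumed to be restricted;
-- callers invoke the UDF rather than querying the table directly
create or replace function private_ds.order_count(customer string)
returns int64
as (
  (select count(*)
   from private_ds.orders
   where customer_id = customer)
);

Note that the authorization itself is granted on the dataset (via its access settings), not in the CREATE FUNCTION statement.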

8. Duplicate column names in query results

BigQuery will now allow you to select the same (unaliased) column more than once in a query; duplicates in the result are disambiguated by appending a suffix of _n, where n counts the repeated occurrences.
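For example (the table name is our own invention, and the suffixing shown reflects our understanding of the behaviour):

select customer_id, customer_id
from mydataset.orders
-- the result columns come back as customer_id and customer_id_1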

 

9. New BigQuery LAST_DAY DATE function

BigQuery already has excellent support for DATE manipulation in SQL. In this release, Google Cloud has recognised a common business-led use case: finding the last date in a given period, e.g. the last day of the month or the last day of the week.

The new LAST_DAY function has intuitive syntax, and is easy to read:

select last_day('2020-10-23', week) as last_day_of_week
2020-10-24

 
Warning: Note that, by default, this returns the next Saturday, because weeks start on Sunday in BigQuery.

To force the week to start on Monday:


-- the last day of the week
-- (week starting monday)

select last_day('2020-10-23',
       week(monday)) as last_day_of_week

2020-10-25
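The last-day-of-month case mentioned above is just as readable; month is also the default date part if none is supplied:

select last_day('2020-10-23', month) as last_day_of_month
2020-10-31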
 

10. Date arithmetic in BigQuery

We are used to using the date_add and date_sub functions for date arithmetic, but you can now do this with the + and - operators.

-- you can now do this
select date '2020-10-23' + 2
2020-10-25

-- instead of this
select date_add(date '2020-10-23', interval 2 day)
2020-10-25

11. New BigQuery string functions

A number (14 in all) of new BigQuery SQL functions for manipulating strings. Here are the ones we particularly like:

BigQuery concatenation operator “||”

As a team we think this method for concatenating strings is easier to read than the concat function.

-- the new way
select 'The ' || 'quick ' || 'brown ' || 'fox' as quip

-- the old way
select concat('The ', 'quick ', 'brown ', 'fox') as quip

BigQuery INSTR

This returns the index of a search value in a string. Note that the first character in the string is indexed as 1, and 0 is returned if the search value is not found.

The third argument is the position and denotes where to start the search (the default is 1). If a negative number is supplied, the search starts that many characters from the end of the string and works backwards.

The last argument is the occurrence; if supplied, it specifies which occurrence of the search value to find, e.g. 2 means return the index of the second occurrence.

select instr('banana', 'an')
2
select instr('banana', 'an', 3)
4
select instr('banana', 'an', 1, 2)
4
select instr('banana', 'an', 1, 3)
0

BigQuery Soundex

This made our list because we all thought it's pretty neat. The function returns the phonetic representation of a word as a soundex code (a letter followed by three digits). The algorithm was first patented in 1918 and is useful in fuzzy matching, for example to detect typos in people's names:

select soundex('Brian')
B650
select soundex('Bryan')
B650
select soundex('Briann')
B650
select soundex('Brrian')
B650
select soundex('Brenda')
B653

12. Expanded INFORMATION_SCHEMA

INFORMATION_SCHEMA, specified in the ANSI SQL standard, allows users to query the metadata about various objects or entities that contain or reference data, such as tables, views, stored procedures and user-defined functions. Here is the BigQuery INFORMATION_SCHEMA documentation.

BigQuery has made this easier with the introduction of a number of new objects that cover tables, views, routines (stored procedures, UDFs) and datasets.

select table_name
from mydataset.INFORMATION_SCHEMA.TABLES
where table_name like 'scenario%'
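The new views for routines work the same way; for example (assuming a dataset called mydataset), to list its stored procedures and UDFs:

select routine_name, routine_type
from mydataset.INFORMATION_SCHEMA.ROUTINES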

Warning: Note that the INFORMATION_SCHEMA view names are uppercase (BigQuery is case-sensitive).

Our opinion

In summary, these are really good additions to the BigQuery SQL arsenal, and we will certainly be using them on client projects. We will cover some of these features in our upcoming Data how-to series.

Read the official release notes
