Persistent User Defined Functions Now Supported in BigQuery!

Finally, BigQuery users have option to save their custom functions so they can be re-used across the project. Up until now, you could only create a UDF function within a query but you could not save the query as a view for example. Luckily, the times are changing now. But why should you care?

Creating Persistent User Defined Functions in BigQuery

So what’s the magic? Assume you have this very basic table:
UDF - 1 - Table

You want to create a new column called “fruit” which will tell you whether the “sentence” column contains apple, banana or orange. Normally, you’d probably write a case statement like this:
UDF - 2 - Case

The problem with this approach is that when you need to do the same classification across many queries and views, you’d have to:

  1. Copy this CASE statement everywhere you want use this classification
  2. Edit all CASE statements in multiple places in case your classification is changing

Nowadays though, you can:

  1. Create a custom classification function.
  2. Save it just once.
  3. Reuse it across as many queries as you want.
  4. Propagate changes everywhere in case you edit your custom function.

So instead of writing standard SELECT … FROM …, you need to write a “CREATE FUNCTION” statement first. It looks like this:
UDF - 3 - Custom UDF

FOLLOW US JOIN OUR FACEBOOK GROUP

Now, let’s decompose the elements of the statement:

  1. CREATE FUNCTION – you tell BQ that you want to create a new function. Alternatively, you can use CREATE FUNCTION IF NOT EXISTS.
  2. `test-project-excelinppccom.udfs.fruitFinder` – that’s where you want to save the function. It reads as `project.dataset.functionName`
  3. (some_string STRING) – that’s the argument of the function. In my case, it’s just 1 string. You can have no arguments, 1 argument, or multiple arguments. This depends on your use case. Basically, the argument is the “value” you want to somehow evaluate or act on with your custom function.
  4. AS ( …. ) part – that’s essentially the similar CASE which does the actual magic. Unlike in the standard SELECT … FROM …, I’m calling the argument instead of a regular column.

When you are done with writing your custom function, you need to press the Run button which essentially means “save” in this scenario:
UDF - 4 - Save Custom UDF

You will then see confirmation “This statement created a new function….” and the function will also appear in the dataset which you assigned it to.

You can then click the function name to see more information about the function, or you can even edit the function:
UDF - 5 - Custom UDF Info

A now circling back to our classification problem – how to actually use the function in a standard SELECT … FROM … statement?. Like this:
UDF - 6 - Using the UDF

You can now use the function in as many queries as you want 🙂

Conclusion

Persistent functions are a much needed addition to the BigQuery environment which can help you simply your work and stay more organized. You can also read more in the BQ official documentation – if you read carefully, you’ll discover you can use even JavaScript to create custom UDFs (this might be a topic for a future article).

FOLLOW US JOIN OUR FACEBOOK GROUP

One thought on “Persistent User Defined Functions Now Supported in BigQuery!

Leave a Reply

Your email address will not be published. Required fields are marked *