Finally, BigQuery users have option to save their custom functions so they can be re-used across the project. Up until now, you could only create a UDF function within a query but you could not save the query as a view for example. Luckily, the times are changing now. But why should you care?
Creating Persistent User Defined Functions in BigQuery
So what’s the magic? Assume you have this very basic table:
You want to create a new column called “fruit” which will tell you whether the “sentence” column contains apple, banana or orange. Normally, you’d probably write a case statement like this:
The problem with this approach is that when you need to do the same classification across many queries and views, you’d have to:
- Copy this CASE statement everywhere you want use this classification
- Edit all CASE statements in multiple places in case your classification is changing
Nowadays though, you can:
- Create a custom classification function.
- Save it just once.
- Reuse it across as many queries as you want.
- Propagate changes everywhere in case you edit your custom function.
So instead of writing standard SELECT … FROM …, you need to write a “CREATE FUNCTION” statement first. It looks like this:
FOLLOW US JOIN OUR FACEBOOK GROUP
Now, let’s decompose the elements of the statement:
- CREATE FUNCTION – you tell BQ that you want to create a new function. Alternatively, you can use CREATE FUNCTION IF NOT EXISTS.
- `test-project-excelinppccom.udfs.fruitFinder` – that’s where you want to save the function. It reads as `project.dataset.functionName`
- (some_string STRING) – that’s the argument of the function. In my case, it’s just 1 string. You can have no arguments, 1 argument, or multiple arguments. This depends on your use case. Basically, the argument is the “value” you want to somehow evaluate or act on with your custom function.
- AS ( …. ) part – that’s essentially the similar CASE which does the actual magic. Unlike in the standard SELECT … FROM …, I’m calling the argument instead of a regular column.
When you are done with writing your custom function, you need to press the Run button which essentially means “save” in this scenario:
You will then see confirmation “This statement created a new function….” and the function will also appear in the dataset which you assigned it to.
You can then click the function name to see more information about the function, or you can even edit the function:
A now circling back to our classification problem – how to actually use the function in a standard SELECT … FROM … statement?. Like this:
You can now use the function in as many queries as you want 🙂
Conclusion
Persistent functions are a much needed addition to the BigQuery environment which can help you simply your work and stay more organized. You can also read more in the BQ official documentation – if you read carefully, you’ll discover you can use even JavaScript to create custom UDFs (this might be a topic for a future article).
Thank for your information