BigQuery Scheduled Query Management
Sometimes you might want to run queries on recurring basis. The queries could be written in SQL.
We decided to run some queries for cleaning purposes the datasets, then developed the below scripts. If you have a lot of queries and you want to store a copy of them in a repository, the following script would be fit for you.
shoin-posts/scheduled_queries.py at master · xsetra/shoin-posts
Contains devops scripts to re-use. Contribute to xsetra/shoin-posts development by creating an account on GitHub.
Workflow
- The script simply reads the query list from
query-catalog-file
. There is a file in repo namedquery-catalog.json
that shows the structure.name
,schedule
andpubsub_topic
is optional fields. - It creates or deletes the queries according to
operation
parameter. - The query must be provided within the catalog file.
- While deleting the queries, the
generate_name
function takes a big role. Because the query name is composed with a parent name that includes project, location, dataset_id. The script finds the correct name, then deletes the query.
usage: scheduled_queries.py
project_id
query-catalog-file
operation
[-h]
[--service-account-name SERVICE_ACCOUNT_NAME]
[--pubsub-topic-id PUBSUB_TOPIC_ID]
[--default-schedule DEFAULT_SCHEDULE]
[--location LOCATION]
Which one wins the race? File or Parameter?
Answer is file. For instance if you don't specify the name in file, the script will generate a name for it. If you specify, the file name is chosen. You can find the name rules in generate_name
function.
Optional Parameters
- Service Account Name
It's associated to bigquery scheduled query. You have to grant correct permissions according to query requirements. For instance, bigquery user. - PubSub Topic Id
When scheduled query executed, the result of query will be send to this topic.
It's required to monitor the queries. Take a look monitoring solution. - Default Schedule
If you don't specify the schedule in catalog-file, this value will be used. - Location
What is the location of dataset or query execution location.
Example Usage
./scheduled_queries.py gcp-project query-catalog.json create