BigQuery Scheduled Query Management

Sometimes you might want to run queries on recurring basis. The queries could be written in SQL.
We decided to run some queries for cleaning purposes the datasets, then developed the below scripts. If you have a lot of queries and you want to store a copy of them in a repository, the following script would be fit for you.

shoin-posts/scheduled_queries.py at master · xsetra/shoin-posts
Contains devops scripts to re-use. Contribute to xsetra/shoin-posts development by creating an account on GitHub.

Workflow

  • The script simply reads the query list from query-catalog-file. There is a file  in repo named query-catalog.json that shows the structure.
    name , schedule and pubsub_topic is optional fields.
  • It creates or deletes the queries according to operation parameter.
  • The query must be provided within the catalog file.
  • While deleting the queries, the generate_name function takes a big role. Because the query name is composed with a parent name that includes project, location, dataset_id. The script finds the correct name, then deletes the query.
usage: scheduled_queries.py 
    project_id 
    query-catalog-file 
    operation
    [-h] 
    [--service-account-name SERVICE_ACCOUNT_NAME] 
    [--pubsub-topic-id PUBSUB_TOPIC_ID] 
    [--default-schedule DEFAULT_SCHEDULE] 
    [--location LOCATION] 

Which one wins the race? File or Parameter?

Answer is file. For instance if you don't specify the name in file, the script will generate a name for it. If you specify, the file name is chosen. You can find the name rules in generate_name function.

Optional Parameters

  • Service Account Name
    It's associated to bigquery scheduled query. You have to grant correct permissions according to query requirements. For instance, bigquery user.
  • PubSub Topic Id
    When scheduled query executed, the result of query will be send to this topic.
    It's required to monitor the queries. Take a look monitoring solution.
  • Default Schedule
    If you don't specify the schedule in catalog-file, this value will be used.
  • Location
    What is the location of dataset or query execution location.

Example Usage

./scheduled_queries.py gcp-project query-catalog.json create