Enabling BigQuery Time Partitioning

BigQuery Time Partitioning is a good way to reduce the number of bytes processed which lowers your bill and increases Artie Transfer's performance!

What is a partitioned table?

A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query. You partition tables by specifying a partition column which is used to segment the table.

Why should we use table partitions?

  • Improve query performance by scanning a partition.

  • When you exceed the standard table quota.

  • Gain partition-level management features such as writing to or deleting partition(s) within a table.

  • Reduce the number of bytes processed + reduce your BigQuery bill

Note: A BigQuery table can have up to 4000 partitions. So, if you picked the daily granularity, you have enough partitions for up to 10.9 years worth of unique partitions!

What are the different kinds of partitioning strategies?

Partitioning typeDescriptionExample

Partitioning a particular column that is a TIMESTAMP. BigQuery allows hourly, daily, monthly, yearly and integer range partitioning intervals.

Column: timestamp Partitioning granularity: daily

Partitioning off of a range of values for a given column.

Say you have a column called customer_id and there are 100 values. You can specify to have values 0-9 go to one partition, 10-19 the next, etc.

This is when the row was inserted. This is not recommended, because it requires storing additional metadata to know when this row was inserted. If we don't specify this upon a merge, we will end up creating duplicate copies.


Turning on BigQuery Partitions

The steps are as follows:

  1. Pause Artie Transfer deployment (this will only pause Artie Transfer, Debezium will still be running and capturing the changes).

  2. Create the table with the right partitioning strategy (e.g. leads_copy)

  3. Copy the main table into the new partitioned table

  4. Drop the old table (leads) and rename the new partitioned table. (leads_copy)

  5. Within the Artie deployment page, update the table settings

  6. Resume Artie Transfer

Pausing and resuming Artie Transfer

Editing Artie Transfer table settings

Last updated