Options

This page describes the available configuration settings for Artie Transfer to use.

Below are the various options that can be specified within a configuration file. Once it has been created, you can run Artie Transfer like this:

/transfer -c /path/to/config.yaml

Note: Keys here are formatted in dot notation for readability purposes; please ensure that the proper nesting is used when writing them into your configuration file. To see sample configuration files, visit the Examples page.
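
For instance, a key written here as reporting.sentry.dsn corresponds to the following nesting in YAML (a minimal sketch; the DSN value is a placeholder):

outputSource: snowflake
reporting:
  sentry:
    dsn: "<your Sentry DSN>"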

Key | Optional | Description

outputSource

N

This is the destination. Supported values are currently:

  • snowflake

  • bigquery

  • s3

  • test (stdout)

queue

Y

Defaults to kafka. Valid options are kafka and pubsub.

Please check the respective sections below for what else is required.

reporting.sentry.dsn

Y

DSN for Sentry alerts. If blank, errors will just go to stdout.

flushIntervalSeconds

Y

Defaults to 10. The valid range is between 5 seconds and 6 hours.

bufferRows

Y

Defaults to 15,000. When using BigQuery and Snowflake, there is no limit.

flushSizeKb

Y

Defaults to 25 MB. When the size of the in-memory database exceeds this value, it will trigger a flush cycle.
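
Putting the top-level options together, a configuration might begin like the following sketch (the values simply restate the documented defaults; expressing 25 MB as 25000 is an assumption based on the key's Kb suffix):

outputSource: snowflake
queue: kafka
flushIntervalSeconds: 10   # flush at least every 10 seconds
bufferRows: 15000          # flush once 15,000 rows are buffered in memory
flushSizeKb: 25000         # flush once the in-memory database reaches roughly 25 MB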

Kafka

kafka:
  bootstrapServer: localhost:9092,localhost:9093
  groupID: transfer
  username: artie
  password: transfer
  enableAWSMSKIAM: false

bootstrapServer

Pass in the Kafka bootstrap server. As a best practice, pass in a comma-separated list of bootstrap servers to maintain high availability. This follows the same spec as Kafka. Type: String Optional: No

groupID

This is the name of the Kafka consumer group. You can set it to whatever you'd like; just remember that offsets are associated with a particular consumer group. Type: String Optional: No

username + password

If you'd like to use SASL/SCRAM auth, you can pass the username and password. Type: String Optional: Yes

enableAWSMSKIAM

Turn this on if you would like to use IAM authentication to communicate with Amazon MSK. If you enable this, make sure to pass in AWS_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Type: Boolean Optional: Yes

Topic Configs

TopicConfigs are used at the table level and store configurations like:

  • Destination's database, schema and table name.

  • What does the data format look like? Is there an idempotent key?

  • Whether it should do row based soft deletion or not.

  • Whether it should drop deleted columns or not.

These are stored in this particular fashion. See Examples for more details.

kafka:
  topicConfigs:
  - { }
  - { }
# OR as
pubsub:
  topicConfigs:
  - { }
  - { }
Key | Optional | Description

*.topicConfigs[0].db

N

Name of the database in the destination.

*.topicConfigs[0].tableName

Y

Name of the table in the destination.

  • If not provided, the table name from the event will be used

  • If provided, tableName acts as an override

*.topicConfigs[0].schema

N

Name of the schema in Snowflake. Not needed for BigQuery.

*.topicConfigs[0].topic

N

Name of the Kafka topic.

*.topicConfigs[0].idempotentKey

N

Name of the column that is used for idempotency. This field is highly recommended. For example: updated_at or another timestamp column.

*.topicConfigs[0].cdcFormat

N

Name of the CDC connector (and thus the format) that we should expect to parse. Currently, the supported values are:

  1. debezium.postgres

  2. debezium.mongodb

  3. debezium.mysql

*.topicConfigs[0].cdcKeyFormat

N

Format that Kafka Connect will serialize the key as. This is called key.converter in the Kafka Connect properties file. The supported values are: org.apache.kafka.connect.storage.StringConverter and org.apache.kafka.connect.json.JsonConverter. If not provided, the default value will be org.apache.kafka.connect.storage.StringConverter.

*.topicConfigs[0].dropDeletedColumns

Y

Defaults to false. When set to true, Transfer will drop columns in the destination when Transfer detects that the source has dropped these columns. This setting should be turned on if your organization follows standard practice around database migrations. This is available starting transfer:1.4.4.

*.topicConfigs[0].softDelete

Y

Defaults to false. When set to true, Transfer will add an additional column called __artie_delete and will set the column to true instead of issuing a hard deletion. This is available starting transfer:1.4.4.

*.topicConfigs[0].skippedOperations

Y

Comma-separated string of operations that Transfer should skip. Valid values are:

  • c (create)

  • r (replication or backfill)

  • u (update)

  • d (delete)

Can be specified like c,d to skip creates and deletes. This is available starting transfer:2.2.3.

*.topicConfigs[0].skipDelete

This is getting deprecated in the next Transfer version. Use skippedOperations instead.

Y

Defaults to false. When set to true, Transfer will skip the delete events. This is available starting transfer:2.0.48

*.topicConfigs[0].includeArtieUpdatedAt

Y

Defaults to false. When set to true, Transfer will emit an additional timestamp column named __artie_updated_at which signifies when this row was processed. This is available starting transfer:2.0.17

*.topicConfigs[0].includeDatabaseUpdatedAt

Y

Defaults to false. When set to true, Transfer will emit an additional timestamp column called __artie_db_updated_at which signifies the database time of when the row was processed. This is available starting transfer:2.2.2+

*.topicConfigs[0].bigQueryPartitionSettings

Y

Enable this to turn on BigQuery table partitioning. This is available starting transfer:2.0.24
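
Putting the keys above together, a filled-in topic config might look like this sketch (the database, schema, topic, and column names are hypothetical):

kafka:
  topicConfigs:
  - db: shop
    schema: public
    topic: dbserver1.public.customers
    cdcFormat: debezium.postgres
    cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
    idempotentKey: updated_at      # timestamp column used for idempotency
    dropDeletedColumns: false
    softDelete: true
    skippedOperations: r           # skip replication / backfill events
    # tableName is omitted, so the table name from the event is used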

BigQuery Partition Settings

This is the object stored under Topic Config.

bigQueryPartitionSettings:
  partitionType: time
  partitionField: ts
  partitionBy: daily

partitionType

Type of partitioning. We currently support only time-based partitioning; the only valid value right now is time. Type: String Optional: Yes

partitionField

Which field or column is being partitioned on. Type: String Optional: Yes

partitionBy

This is used for time partitioning to specify the time granularity. The only valid value right now is daily. Type: String Optional: Yes
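
Within a configuration file, this object is nested under a topic config like so (a sketch; other topic config keys are omitted for brevity and the names are hypothetical):

kafka:
  topicConfigs:
  - db: shop
    tableName: orders
    bigQueryPartitionSettings:
      partitionType: time       # only time-based partitioning is supported
      partitionField: ts        # column to partition on
      partitionBy: daily        # time granularity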

Google Pub/Sub

pubsub:
  projectID: 123
  pathToCredentials: /path/to/pubsub.json
  topicConfigs:
  - { }

projectID

The Google Cloud project ID. Type: String Optional: No

pathToCredentials

This is the path to the credentials for the service account to use. You can re-use the same credentials as BigQuery, or you can use a different service account to support use cases of cross-account transfers. Type: String Optional: No

topicConfigs

Follow the same convention as kafka.topicConfigs above.

BigQuery

Key | Optional | Description

bigquery.pathToCredentials

Y

Path to the Google credentials file. You can also set the GOOGLE_APPLICATION_CREDENTIALS environment variable directly; otherwise, Transfer will set it for you based on the value provided here.

bigquery.projectID

N

Google Cloud Project ID

bigquery.location

Y

Location of the BigQuery dataset. Defaults to us.

bigquery.defaultDataset

N

The default dataset used.

This just allows us to connect to BigQuery using a data source name (DSN).

bigquery.batchSize

Y

Batch size is used to chunk the request to BigQuery's Storage API to avoid the 10 MB limit. If this is not passed in, we will default to 1,000.
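
In the configuration file, these keys sit under a bigquery block, for example (the path, project ID, and dataset below are placeholders):

bigquery:
  pathToCredentials: /path/to/bigquery.json
  projectID: my-gcp-project
  location: us              # defaults to us
  defaultDataset: transfer_dataset
  batchSize: 1000           # defaults to 1,000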

Shared Transfer config

sharedTransferConfig:
  additionalDateFormats:
    - 02/01/06 # DD/MM/YY
    - 02/01/2006 # DD/MM/YYYY
  createAllColumnsIfAvailable: true

additionalDateFormats

By default, Artie Transfer supports a wide array of date formats. If your layout is not supported, you can specify additional ones here. If you're unsure, please refer to this guide. Type: List of layouts Optional: Yes

createAllColumnsIfAvailable

By default, Artie Transfer will only create a column in the destination if the column contains a non-null value. You can override this behavior by setting this value to true. Type: Boolean Optional: Yes

Snowflake

Please see: Snowflake on how to gather these values.

Key | Optional | Description

snowflake.account

N

Snowflake account identifier.

snowflake.username

N

Snowflake username

snowflake.password

N

Snowflake password

snowflake.warehouse

N

Virtual warehouse name

snowflake.region

N

Snowflake region.
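
In the configuration file, these keys sit under a snowflake block, for example (every value below is a placeholder):

snowflake:
  account: ab12345
  username: artie
  password: transfer
  warehouse: compute_wh
  region: us-east-1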

Redshift

Key | Optional | Description

redshift.host

N

Host URL e.g. test-cluster.us-east-1.redshift.amazonaws.com

redshift.port

N

-

redshift.database

N

Namespace / Database in Redshift.

redshift.username

N

Redshift username.

redshift.password

N

Redshift password.

redshift.bucket

N

Bucket where staging files will be stored.

Click here to see how to set up an S3 bucket and have it automatically purged based on expiration.

redshift.optionalS3Prefix

Y

The prefix for S3. For example, if the bucket is foo and the prefix is bar, files are stored as: s3://foo/bar/file.txt

redshift.credentialsClause

N

Redshift credentials clause to store staging files into S3. Source

redshift.skipLgCols

Y

Defaults to false. If this is passed in, Artie Transfer will mask the column value with:

  1. If the value is a string: __artie_exceeded_value

  2. If the value is a struct / super: {"key":"__artie_exceeded_value"}
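
In the configuration file, these keys sit under a redshift block, for example (the host, credentials, bucket, and IAM role are placeholders; 5439 is the standard Redshift port):

redshift:
  host: test-cluster.us-east-1.redshift.amazonaws.com
  port: 5439
  database: dev
  username: artie
  password: transfer
  bucket: artie-staging
  optionalS3Prefix: transfer    # files land under s3://artie-staging/transfer/...
  credentialsClause: "IAM_ROLE 'arn:aws:iam::123456789012:role/transfer-role'"
  skipLgCols: false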

S3

s3:
  optionalPrefix: foo # Files will be saved under s3://artie-transfer/foo/...
  bucket: artie-transfer
  awsAccessKeyID: AWS_ACCESS_KEY_ID
  awsSecretAccessKey: AWS_SECRET_ACCESS_KEY

optionalPrefix

Prefix after the bucket name. If this is specified, Artie Transfer will save the files under s3://artie-transfer/optionalPrefix/... Type: String Optional: Yes

bucket

S3 bucket name. Example: foo. Type: String Optional: No

awsAccessKeyID

The AWS_ACCESS_KEY_ID for the service account. Type: String Optional: No

awsSecretAccessKey

The AWS_SECRET_ACCESS_KEY for the service account. Type: String Optional: No

Telemetry

Overview of Telemetry can be found here: Overview.

Key | Type | Optional | Description

telemetry.metrics

Object

Y

Parent object. See below.

telemetry.metrics.provider

String

Y

Provider to export metrics to. Transfer currently only supports: datadog.

telemetry.metrics.settings

Object

Y

Additional settings block, see below

telemetry.metrics.settings.tags

Array

Y

Tags that will appear for every metric, e.g. env:production, company:foo.

telemetry.metrics.settings.namespace

String

Y

Optional namespace prefix for metrics. Defaults to transfer. if none is provided.

telemetry.metrics.settings.addr

String

Y

Address where the StatsD agent is running. Defaults to 127.0.0.1:8125 if none is provided.

telemetry.metrics.settings.sampling

Number

Y

Percentage of data to send. Provide a number between 0 and 1. Defaults to 1 if none is provided. Refer to this for additional information.
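
In the configuration file, these keys are nested like so (a sketch; the tags and sampling rate are illustrative):

telemetry:
  metrics:
    provider: datadog
    settings:
      tags:
        - env:production
        - company:foo
      namespace: transfer       # metric prefix
      addr: 127.0.0.1:8125      # StatsD agent address
      sampling: 1               # send 100% of metrics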
