Google Cloud Storage (GCS)

You can create a GCS (Google Cloud Storage) Destination Connector to write CSV files (by default) or JSONL files to a Cloud Storage bucket.
This Connector is set up using a GCP Service Account Key associated with a GCP Service Account that has access to the project(s) where the resources reside. To learn more about creating and managing service accounts within GCP, visit: https://cloud.google.com/iam/docs/creating-managing-service-accounts.
Supported file formats: CSV and JSONL

Prerequisites

Required information:
  • Service Account Key with the proper privileges
  • Existing GCS Bucket Name

Creating a GCS Destination Connector

Step 1: Navigate to the Connectors list page, then click + New Connector
Step 2: Under the System prompt, click GCS
Step 3: Enter a Connector Name
Step 4: Select Destination Connector

Authentication

Authentication is accomplished using Service Account Keys. Provide the Service Account JSON key for the account you wish to connect to.
  • Service accounts associated with this Connector need the proper Cloud Storage privileges in order to successfully establish a connection.
Creating a Service Account Key in the Google Cloud Console
  1. To create a Service Account JSON key, first navigate to the Service Accounts page in the Google Cloud Console.
  2. Click the project dropdown in the top navigation bar to view all of your projects, choose the project you want to create a service account key for, and then click Open.
  3. Find the row of the service account that you want to create a key for. In that row, click the More button, and then click Create key.
  4. Select the JSON key type and click Create.
Note: To set up this Connector using your service account, the service account you select needs to have access to the project you want to connect to.
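
Before pasting a key into the Connector, it can help to confirm that the downloaded file looks like a service account key. The following is a minimal sketch, not part of Osmos: the function name check_service_account_key is hypothetical, and the field list is an assumption based on the usual layout of a Google service account JSON key.

```python
import json

# Assumption: fields normally present in a Google service account JSON key.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
)


def check_service_account_key(key_text: str) -> list:
    """Return a list of problems found in the key text (empty if none)."""
    try:
        key = json.loads(key_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]

    # A service account key declares its type as "service_account".
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems
```

An empty result only means the file has the expected shape. It does not confirm that the service account has access to the project or the bucket.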

Bucket Name

Step 1: To find the Bucket Name, first select the Google Cloud Navigation menu, then scroll to Cloud Storage and select Buckets.
Step 2: On the Buckets page, select the name of the bucket you would like to connect to.
Step 3: The Bucket Name can then be copied from the top of the resulting page.
See the Bucket Name highlighted in blue, here called example_bucket_name

Building the Schema for the Destination Connector

Use the schema designer to build the output schema for this Destination Connector.
  • Field Name: Provide a field name for the output fields. These names will be used as the column headers or field names in the output file you are writing to.
  • Type: Define the type of each field. The field types will be used to enforce rules when you send data to this Connector.
  • Nullable: Check this box if the field is nullable. If the field is not nullable, you will be required to provide values for this field when sending data to this Connector.
  • Delete: Deletes the field.
  • Add Field: Adds another field to the schema.
Step 1: Click Add Field for each additional field required in the schema
Step 2: Select Create Schema once you have built the schema.
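
To illustrate how Type and Nullable might be enforced when data is sent to the Connector, here is a rough sketch. The Field class, the validate_record function, and the use of Python types as stand-ins for the Connector's field types are all assumptions made for illustration. The actual validation rules are internal to Osmos.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    kind: type              # Python type standing in for the Connector's field type
    nullable: bool = False  # mirrors the Nullable checkbox


def validate_record(schema: list, record: dict) -> list:
    """Return a list of error messages for one record (empty if it passes)."""
    errors = []
    for field in schema:
        value = record.get(field.name)
        if value is None:
            # Missing or null values are only acceptable for nullable fields.
            if not field.nullable:
                errors.append(f"{field.name}: value required")
        elif not isinstance(value, field.kind):
            errors.append(f"{field.name}: expected {field.kind.__name__}")
    return errors


schema = [Field("id", int), Field("email", str, nullable=True)]
print(validate_record(schema, {"id": 1, "email": None}))
print(validate_record(schema, {"email": "a@b.c"}))
```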

Advanced Options

Output File Format
By default, this Destination Connector writes CSV files, and each Osmos Pipeline run produces a new file. If preferred, you can change the output format to JSONL.
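
For readers unfamiliar with the two formats, the sketch below writes the same records as CSV and as JSONL using only the Python standard library. The sample records and function names are made up for illustration. The exact quoting and type rendering used by Osmos may differ.

```python
import csv
import io
import json

# Hypothetical sample records.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]


def to_csv(rows: list) -> str:
    """Render rows as CSV: one header line, then one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows: list) -> str:
    """Render rows as JSONL: one JSON object per line, no header."""
    return "".join(json.dumps(row) + "\n" for row in rows)


print(to_csv(records))
print(to_jsonl(records))
```

CSV carries the field names once in the header, while JSONL repeats them in every line and preserves value types such as numbers and nulls.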

File Prefix Format String

You can specify a file prefix to make the output of this Connector easier to manage. The contents of this field are written into the name of each file this Connector writes. If a prefix is specified, a UUID is appended to it to prevent filename conflicts. To include the UUID of the job, add {jobId} to your prefix format string. Strftime syntax is allowed here.
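
As a rough illustration of how a prefix format string could expand, the sketch below applies strftime directives and then substitutes {jobId}. The function name expand_prefix, the order of the two steps, and the use of UTC are assumptions. The UUID that Osmos appends is not modeled here.

```python
from datetime import datetime, timezone


def expand_prefix(prefix_format: str, job_id: str, now: datetime) -> str:
    """Expand strftime directives, then replace the {jobId} placeholder."""
    # Assumption: date directives are resolved before {jobId} is substituted.
    expanded = now.strftime(prefix_format)
    return expanded.replace("{jobId}", job_id)


run_time = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(expand_prefix("exports/%Y-%m-%d/{jobId}/", "abc", run_time))
# prints "exports/2024-01-15/abc/"
```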

Limit Records Per File

By default, we do not set a limit on the number of records written to a single destination file by a single job (i.e. a single run of a Pipeline or Uploader). If this box is checked, the data written to the destination will be "chunked" into separate files, each containing at most the number of records designated here. Each "chunked" file will be suffixed with its position in the sequence, e.g. filename_part_1.csv, filename_part_2.csv, etc.
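
The chunking behavior can be pictured with a short sketch. The function name chunk_records is hypothetical, and the file names simply follow the filename_part_N.csv pattern from the example above.

```python
from typing import Iterator


def chunk_records(records: list, limit: int) -> Iterator[tuple]:
    """Yield (file name, records) pairs with at most `limit` records each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(records), limit):
        part = start // limit + 1  # 1-based position in the sequence
        yield f"filename_part_{part}.csv", records[start:start + limit]


rows = [{"n": i} for i in range(5)]
for name, part_rows in chunk_records(rows, 2):
    print(name, len(part_rows))
```

With five records and a limit of two, this yields three files, the last of which holds the single remaining record.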

Validation Webhook

We support the use of Validation Webhooks to prevent bad data from being written to your systems, adding another layer of protection to the built-in validations that Osmos provides. Enter the Webhook URL here.
For more information on Validation Webhook configuration, see Server Side Validation Webhooks.

Overwrite Output Column with Raw Input Data

Enter the name of the destination column where you'd like to store the entire raw source record data. The raw source record data will be stored as a JSON string in the provided destination column.
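
Conceptually, this option serializes the whole source record to a JSON string and places it in the named column. The sketch below shows the idea. The function name add_raw_column and the sample column names are hypothetical, and the exact JSON formatting Osmos produces is not specified here.

```python
import json


def add_raw_column(source_record: dict, output_record: dict, raw_column: str) -> dict:
    """Return a copy of output_record with the raw source record stored as JSON."""
    result = dict(output_record)
    # Any existing value in raw_column is overwritten with the raw record.
    result[raw_column] = json.dumps(source_record)
    return result


source = {"First": "Ada", "Age": "36"}
mapped = {"first_name": "Ada", "raw": ""}
print(add_raw_column(source, mapped, "raw"))
```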

Additional Options

Organizing File Structure
You can chunk files and write them to different folders based on the job ID. Osmos supports the magic string {jobId} in the file prefix and the output file names for file-based Destination Connectors. To set a file prefix, open the Destination Connector, select Show Advanced Options, and enter the prefix in the File Prefix Format String field.
Output Scenarios:
  1. No file prefix
     Output: <user base path>/chunk-<chunk num>-<GUID>.<file extension>
  2. File includes description in the prefix
     Sample prefix: my_osmos_output_
     Output: <user base path>/my_osmos_output_chunk-<chunk num>-<GUID>.<file extension>
  3. File includes description and job_id in the prefix
     Sample prefix: my_osmos_output/{jobId}/
     Output: <user base path>/my_osmos_output/<ACTUAL JOB_ID HERE>/chunk-<chunk num>-<GUID>.<file extension>
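
The three scenarios follow one pattern, which can be sketched as a small function. The name output_path and its parameters are hypothetical. The path layout is taken from the scenarios above, and the GUID is generated with Python's uuid module as a stand-in for whatever Osmos uses.

```python
import uuid
from typing import Optional


def output_path(
    base_path: str,
    prefix: str,
    chunk_num: int,
    extension: str,
    job_id: str,
    guid: Optional[str] = None,
) -> str:
    """Build <base path>/<prefix>chunk-<chunk num>-<GUID>.<file extension>."""
    guid = guid or str(uuid.uuid4())
    # A prefix ending in "/" places the file in a folder; otherwise the
    # prefix becomes the leading part of the file name.
    resolved_prefix = prefix.replace("{jobId}", job_id)
    return f"{base_path}/{resolved_prefix}chunk-{chunk_num}-{guid}.{extension}"


for prefix in ("", "my_osmos_output_", "my_osmos_output/{jobId}/"):
    print(output_path("base", prefix, 0, "csv", job_id="job-123"))
```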