Server Side Validation Webhooks

In this section, you will learn how to provide us with a hosted validation webhook for validating incoming data.
Validation Webhooks let you run arbitrary per-field validation logic on data before it is written to destination connectors. This prevents bad data from being written to your systems and adds another layer of protection on top of the built-in validations that Osmos provides. They are compatible with both Pipelines and the Uploader, and they work with data from any source connector.
With validation webhooks, any desired validation logic can be added to any destination connector. This includes querying internal databases, making requests to private or internal APIs, performing conditional checks based on values from multiple fields, and much more.
Validation webhooks are called both during the data cleanup process and during transformation, just before data is written to the destination connector. While the user is mapping columns, applying QuickFixes, and performing other data cleanup actions, validation webhooks are called to make sure that the output of the configured transforms is valid for all rows.
Data can also be transformed through a webhook configuration; see Transformation Writeback for more details.

Configuring

Validation webhooks are set up during the Connector configuration process while creating a new Destination Connector. Click the "Show Advanced Options" button at the bottom of the Connector configurator UI.
Destination Connector > Show Advanced Options
To initiate the validation webhook, provide the HTTP endpoint at which your webhook is available. Your validation webhook must be publicly available on the internet. If your endpoint is behind a firewall or other restrictive system, please contact Osmos support for help with whitelisting IPs.
Two additional configuration options are available when adding the webhook URL: the batch size and the max parallel requests. Note: These fields are only available once you enter the URL.
Validation Webhook Batch Size sets the maximum number of rows sent to the validation endpoint in each request. The field defaults to 100,000 rows.
Validation Webhook Max Parallel Requests sets the maximum number of concurrent requests sent to the validation endpoint, so that multiple chunks of rows can be sent to your API for validation simultaneously. The field defaults to 1 request at a time, which means no parallelism.
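To make the interaction between these two settings concrete, here is a minimal sketch of how rows could be split into batches and sent with a bounded number of requests in flight. It is only an illustration of the described behavior, not Osmos's implementation; `chunkRows`, `sendBatches`, and the `send` callback are hypothetical names.

```typescript
// Hypothetical helper: split rows into batches of at most `batchSize` rows.
function chunkRows<T>(rows: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical helper: call `send` for every batch with at most
// `maxParallel` calls in flight, returning results in batch order.
async function sendBatches<T, R>(
  batches: T[][],
  maxParallel: number,
  send: (batch: T[]) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(batches.length);
  let next = 0;

  // Each worker repeatedly claims the next unsent batch until none remain.
  async function worker(): Promise<void> {
    while (next < batches.length) {
      const index = next++;
      results[index] = await send(batches[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(maxParallel, batches.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

With the default of 1 parallel request, batches are sent strictly one after another. Raising the value means your endpoint must be able to handle that many simultaneous requests.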

API Specification

Validation Webhooks use a simple JSON-based schema for providing data for validation and receiving validation outcomes. Data is provided in batches of up to 100,000 rows and sent to the endpoint as an HTTP POST request.

Request Schema

The request body is a two-dimensional JSON-encoded array of data to validate: a top level array of rows, each of which contains an array of objects containing field name and field value.
The following TypeScript types represent the schema of requests that Osmos systems will make to customer validation webhooks:
interface FieldToValidate {
  fieldName: string;
  value: string;
}

type RowToValidate = FieldToValidate[];

type ValidationRequestBody = RowToValidate[];
An example request consisting of two rows with two fields may look like this:
[
  [
    { "fieldName": "color", "value": "green" },
    { "fieldName": "shape", "value": "square" }
  ],
  [
    { "fieldName": "color", "value": "blue" },
    { "fieldName": "shape", "value": "circle" }
  ]
]
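On the receiving side, it can help to check that an incoming body really has this shape before running any validation logic. The following is a minimal sketch based only on the request types above; `isFieldToValidate` and `parseValidationRequest` are hypothetical helper names, and how the raw body is obtained depends on your HTTP framework.

```typescript
interface FieldToValidate {
  fieldName: string;
  value: string;
}
type RowToValidate = FieldToValidate[];
type ValidationRequestBody = RowToValidate[];

// Type guard: true only for objects with string `fieldName` and `value`.
function isFieldToValidate(x: unknown): x is FieldToValidate {
  if (typeof x !== "object" || x === null) {
    return false;
  }
  const candidate = x as Record<string, unknown>;
  return (
    typeof candidate["fieldName"] === "string" &&
    typeof candidate["value"] === "string"
  );
}

// Parse a raw JSON body and confirm it is an array of rows,
// where each row is an array of { fieldName, value } objects.
function parseValidationRequest(rawBody: string): ValidationRequestBody {
  const parsed: unknown = JSON.parse(rawBody);
  if (!Array.isArray(parsed)) {
    throw new Error("request body must be an array of rows");
  }
  for (const row of parsed) {
    if (!Array.isArray(row) || !row.every(isFieldToValidate)) {
      throw new Error("each row must be an array of { fieldName, value } objects");
    }
  }
  return parsed as ValidationRequestBody;
}
```

Rejecting malformed bodies early keeps the validation logic itself free of defensive checks.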

Response Schema

Your validation webhook is expected to return a two-dimensional JSON array of validation outcomes, matching the shape of the request body. It should consist of a top level array of rows, each of which contains an array of validation outcomes for each field in that row. The ordering of rows and fields should match that of the request.
There are four possible outcomes for each field:
  • Success: the field is valid and can be written to the destination system
  • Warning: the field isn't invalid and won't be blocked from being written to the destination system. However, a message will be displayed to the user during the data cleanup process to indicate the situation.
  • Error: the field is invalid and will be rejected from being written to the destination system. An error message will be shown during the training process and the user will be blocked from saving the transformation until the error is resolved. During transformation, the record will be marked as an error and the underlying pipeline/uploader will need to be retrained.
  • Writeback: the specified replacement value will be written back into the cell, overwriting what was there previously. See the Transformation Writeback page for additional details about this functionality.
For the error and warning cases, a message can optionally be provided to help the user performing the cleanup by explaining why the field is invalid or by providing extra context. Additionally, the error case may include an array of strings that are valid for that field. When included, these values will be shown to the user in a dropdown menu in the transformation builder.
The following TypeScript types correspond to what is expected as a response from validation webhooks:
type FieldValidationOutput =
  | boolean
  | {
      isValid: boolean;
      errorMessage?: string;
      warningMessage?: string;
      validOptions?: string[];
    }
  | {
      replacement: string;
      infoMessage?: string;
    };

type RowValidationOutput = FieldValidationOutput[];

type ValidationResponse = RowValidationOutput[];
Boolean values can be provided as validation output for values within a field, with true corresponding to valid and false corresponding to invalid. However, it is recommended that you provide error or warning messages in order to help users know how to resolve the validation failure.
Here is a possible response for the example request shown above:
[
  [
    {
      "isValid": true,
      "warningMessage": "The color green will not be supported in the future"
    },
    true
  ],
  [
    { "isValid": true },
    { "isValid": false, "errorMessage": "All circles must be red" }
  ]
]
In this example, all four fields from the request have been validated. The number and ordering of rows match those of the request, as do the number and ordering of fields within each row.
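As an illustration, the validation logic behind a response like this one could be sketched as follows. The two color and shape rules are taken from the example messages above; the `KNOWN_SHAPES` list, the "Unknown shape" rule, and the function names are hypothetical additions that show how `validOptions` might be used.

```typescript
interface FieldToValidate {
  fieldName: string;
  value: string;
}

type FieldValidationOutput =
  | boolean
  | {
      isValid: boolean;
      errorMessage?: string;
      warningMessage?: string;
      validOptions?: string[];
    }
  | { replacement: string; infoMessage?: string };

// Hypothetical list of accepted shapes, used only for this sketch.
const KNOWN_SHAPES = ["square", "circle", "triangle"];

// Returns exactly one outcome per field, in the same order as the input row.
function validateRow(row: FieldToValidate[]): FieldValidationOutput[] {
  // Index the row by field name so a check can depend on sibling fields.
  const values: Record<string, string> = {};
  for (const field of row) {
    values[field.fieldName] = field.value;
  }

  return row.map((field): FieldValidationOutput => {
    if (field.fieldName === "color" && field.value === "green") {
      return {
        isValid: true,
        warningMessage: "The color green will not be supported in the future",
      };
    }
    if (field.fieldName === "shape") {
      if (KNOWN_SHAPES.indexOf(field.value) === -1) {
        return {
          isValid: false,
          errorMessage: "Unknown shape",
          validOptions: KNOWN_SHAPES,
        };
      }
      if (field.value === "circle" && values["color"] !== "red") {
        return { isValid: false, errorMessage: "All circles must be red" };
      }
    }
    return { isValid: true };
  });
}

// Returns one array of outcomes per row, in the same order as the request.
function validateRows(rows: FieldToValidate[][]): FieldValidationOutput[][] {
  return rows.map(validateRow);
}
```

Because both functions are built on `map`, the output always has the same number and ordering of rows and fields as the input. This sketch returns `{ "isValid": true }` where the example response uses the bare boolean `true`; the response types accept either form.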

Invalid Response Handling

In the case of a validation endpoint returning a non-200 response code, being unreachable, or failing for some other reason, the validation request to the endpoint will be retried multiple times after a delay. All error types will be retried.
If the issue persists after several attempts, the default behavior is to reject all records being validated as invalid. Details about the error that was encountered will be included in error records which can be viewed on the connector details page of the destination connector or on the retrain page for the Pipeline or Uploader.
For cases where the lengths of the returned validation array are not equal to the number of elements provided in the request array, the behavior is undefined. Please make sure that you return exactly one validation outcome for every field in the request.
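One way to guard against this on your side is to compare the shape of the response with the shape of the request before sending it. The helper below is a small sketch of such a check; `findShapeMismatch` is a hypothetical name, not part of any Osmos API.

```typescript
// Returns a description of the first shape mismatch between request and
// response, or null when every row has exactly one outcome per field.
function findShapeMismatch(
  request: unknown[][],
  response: unknown[][],
): string | null {
  if (request.length !== response.length) {
    return `expected ${request.length} rows but got ${response.length}`;
  }
  for (let i = 0; i < request.length; i++) {
    if (request[i].length !== response[i].length) {
      return `row ${i}: expected ${request[i].length} outcomes but got ${response[i].length}`;
    }
  }
  return null;
}
```

Running this check just before the response is serialized, and logging any mismatch, makes shape bugs visible in your own service instead of relying on undefined behavior downstream.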

Limitations

Validation Webhooks work with all destination connectors except the HTTP API connectors. These connectors are implemented in a way which prevents validations from being run efficiently while still functioning according to their design.