# Email

## Overview

Osmos can automate the ingestion of partner data shared through email into an Osmos Pipeline using the **Email Source Connector**. When the connector is created, the source schema is automatically defined by the email attachment. If the initial email contains multiple attachments, the first document, sorted alphanumerically by filename, defines the connector's schema.

The following file types are supported in the email connector: CSV (as `.csv` and `.txt`), XLSX, XLS, JSONL, and ZIP files containing these file types.

The Email Connector is initialized in a two-part process: sending the email and configuring the connector.

<figure><img src="https://353417064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYrsDW6vGBTygB1qqSE%2Fuploads%2F4Vpn7T4u6N4S8OnVyRIw%2FEmail%20Connecotr-Docs-V1.png?alt=media&#x26;token=e7556436-92cf-49e3-803b-834279937eee" alt=""><figcaption></figcaption></figure>

## **Part 1 - Sending the Email (Required)**

**Step 1:** Send an email containing your data to an Osmos Address. This email address must be unique to the connector and is composed of three parts, structured as:\
\<Recipient Email ID>\_\<Org Name>@data.osmos.io

1. **Recipient Email ID** - this unique identifier is the only part of the address you define. It is often a customer name or number. It cannot include spaces or underscores, though dashes are acceptable. This value will be used again in **Part 2 - Building the Connector**.
2. **Org Name** - this is the identifier used to refer to your organization. Contact <support@osmos.io> to get your Org Name.
3. **Email Connector Domain** - this will always be "@data.osmos.io"
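For illustration, the address structure above can be sketched in Python. The helper name and validation below are assumptions for this example, not part of Osmos:

```python
import re

def osmos_address(recipient_id: str, org_name: str) -> str:
    """Build the connector address: <Recipient Email ID>_<Org Name>@data.osmos.io."""
    # The Recipient Email ID cannot contain spaces or underscores; dashes are fine.
    if re.search(r"[ _]", recipient_id):
        raise ValueError("Recipient Email ID cannot contain spaces or underscores")
    return f"{recipient_id}_{org_name}@data.osmos.io"

# e.g. a customer called "acme-42" in a hypothetical org "myorg":
address = osmos_address("acme-42", "myorg")  # acme-42_myorg@data.osmos.io
```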

{% hint style="danger" %}
**Your new connector will not save unless you forward or send an email before selecting Test and Save.**
{% endhint %}

{% hint style="info" %}

1. An Osmos Email Connector Alias can only be associated with a single email connector.
2. The Osmos address must be the only address on the To line; all other recipients must be on the Cc or Bcc lines. The connector reads both forwarded and new emails.
   {% endhint %}

## **Part 2 - Building the Connector**

**Step 1:** After selecting **+ New Connector**, under the System prompt, click **Email**.

![](https://353417064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYrsDW6vGBTygB1qqSE%2Fuploads%2FtmrHzp2V2X0Ln3q16nS9%2FEmail.png?alt=media\&token=0317d58d-de60-4cda-a076-3d90c5a44af3)

**Step 2:** Enter a **Connector Name.**

**Step 3:** Select **Source Connector.**

**Step 4**: Input the **Recipient Email ID** from **Part 1 - Sending the Email** into the Osmos Connector creation page.

**Step 5:** Select the **Ingestion Method**: Email attachment. If your data is in the body of the email, please reach out to <support@osmos.io>.

**Step 6:** Select **Test and Save**.

{% hint style="danger" %}
If you have not already forwarded or sent data to this address, you will see the error below
{% endhint %}

<div align="center"><figure><img src="https://353417064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYrsDW6vGBTygB1qqSE%2Fuploads%2F868HQs2qW7nGidiL0FaG%2FScreenshot%202024-07-19%20at%2010.04.57%E2%80%AFAM.png?alt=media&#x26;token=c54f38c8-7a9b-41b4-a0b6-d706e250c589" alt=""><figcaption></figcaption></figure></div>

## Advanced Options

**File Filtering**

You can choose to process all source files, or filter the files based on the file name.

1. **Include all files:** If this option is chosen, all of the files in the folder will be processed in chronological order.
2. **Only include files that:** If you choose this option, you can filter which files to process from the source folder based on three options:

   * File names starting with,
   * File names containing, or
   * File names ending with.

   Any files that do not meet the filter criteria will be ignored.
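The three filter modes above behave like simple string matching on the filename. A minimal sketch, with mode names assumed for this example:

```python
def matches_filter(filename: str, mode: str, value: str) -> bool:
    """Mirror the three dropdown choices: starting with, containing, ending with."""
    if mode == "starts_with":
        return filename.startswith(value)
    if mode == "contains":
        return value in filename
    if mode == "ends_with":
        return filename.endswith(value)
    raise ValueError(f"unknown filter mode: {mode}")

files = ["orders_2024.csv", "legacy.txt", "orders_old.zip"]
# Files that do not match the criteria are ignored:
selected = [f for f in files if matches_filter(f, "starts_with", "orders")]
```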

{% hint style="info" %}
If you provide a ZIP file with a name that contains the filter criteria, all files within the ZIP file will be processed (if the files match with the Connector’s schema). The file filter does not filter any files within a ZIP file.
{% endhint %}

**File Headers**

Within the source folder, either all files contain column header names or none of them do. **Select one of the options:**

1. **All source files contain headers:** If this option is selected, we will use the first row as column header names to label the schema within Osmos. Rows two and up will be read as data records.
2. **No source files contain headers:** If this option is selected, we autogenerate column names for the schema within Osmos. All rows, including the first row, will be read as data records.
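The two options above can be sketched as follows. The placeholder column names are an assumption for this example; the names Osmos actually autogenerates may differ:

```python
import csv
import io

def read_rows(text: str, has_headers: bool):
    """Return (header, data_rows) for a CSV payload."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_headers:
        # First row labels the schema; rows two and up are data records.
        return rows[0], rows[1:]
    # No headers: autogenerate placeholder names; every row is a data record.
    header = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return header, rows
```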

**Delimiter for TXT Files**

Select the delimiter to use when reading files from a dropdown list:

* Comma `,`
* Tab
* Pipe `|`
* Semicolon `;`

There are then two available options for how these delimiters should be applied:

* **Selected delimiter applies to .TXT files only**: By default, the delimiter selected from the dropdown list applies only to `.txt` files; `.csv` (comma-separated values) and `.tsv` (tab-separated values) files will continue to be processed according to their file extensions.
* **Selected delimiter applies to all files in the folder**: Select this when file extensions should be ignored and the delimiter chosen from the dropdown should be the exclusive delimiter for all files processed by the connector.
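The resulting delimiter choice can be sketched as a small lookup. The function name is an assumption for this example:

```python
def delimiter_for(filename: str, selected: str, apply_to_all: bool) -> str:
    """Pick the delimiter for a file, given the dropdown selection."""
    if apply_to_all:
        # Extension designations are ignored; the selection applies everywhere.
        return selected
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "csv":
        return ","
    if ext == "tsv":
        return "\t"
    # .txt (and other) files use the dropdown selection.
    return selected
```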

**Handle Invalid Characters**

The source file may contain invalid characters. You can choose to keep all characters from the source or to strip null characters. **Select one of the options:**

1. **Keep all characters from source:** If this option is selected, we will retain all characters from the source file, replacing characters that cannot be decoded with the Unicode replacement character.
2. **Strip null characters:** If this option is selected, we filter out all characters equal to 0. This is useful when dealing with null-terminated strings.
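In Python terms, the two behaviors look roughly like this (a sketch, not the Osmos implementation):

```python
def keep_all(raw: bytes) -> str:
    # Undecodable bytes become U+FFFD, the Unicode replacement character.
    return raw.decode("utf-8", errors="replace")

def strip_nulls(text: str) -> str:
    # Drop every character equal to 0 (NUL).
    return text.replace("\x00", "")
```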

**Deduplication Method**

We support three deduplication methods, at the file level or the record level, plus the option to disable deduplication entirely. **Select one of the following options:**

1. **File level Deduplication -** If this option is selected, deduplication will be performed at the file level only. If the metadata or the contents of a file change, the entire file will be processed in subsequent runs. Note that for some file types, changing the filename alone is not sufficient to update the metadata. Likewise, even if a file is created with the same data and filename as another file, their metadata will differ.

2. **Record level Deduplication across all historical data -** When this is selected, in addition to file-level deduplication, deduplication will be performed at a record level across all the files processed by this Pipeline. An identical record that was already processed in a previous Pipeline run will not be processed in the current file, nor will duplicated records within the same file.

   > **Example:**&#x20;
   >
   > ```
   > file_a.csv:
   > item, quantity
   > apple, 3
   > orange, 9
   > banana, 2
   > ```
   >
   > ```
   > file_b.csv:
   > item, quantity
   > pear, 9
   > apple, 3
   > banana, 2
   > ```
   >
   > After processing `file_a.csv`, if we add `file_b.csv` to the same directory and run a job, only the row containing `pear, 9` will be processed, as `apple, 3` and `banana, 2` were already seen when `file_a.csv` was processed. The same applies within the same file - if we'd added `pear, 9` to `file_a.csv` instead of creating `file_b.csv`, the net result would be the same: `pear, 9` would be the only new row.

3. **Record level Deduplication within individual files -** When this is selected, in addition to file-level deduplication, deduplication will be performed at a record level, but only within the same file. If the file being processed has the same record appearing multiple times, the record will be processed only once.

   > **Example:**&#x20;
   >
   > ```
   > file_a.csv:
   > item, quantity
   > apple, 3
   > orange, 9
   > banana, 2
   > ```
   >
   > ```
   > file_b.csv:
   > item, quantity
   > pear, 9
   > apple, 3
   > banana, 2
   > ```
   >
   > After processing `file_a.csv`, if we add `file_b.csv` to the same directory and run a job, all three records in `file_b.csv` will be processed. If instead we'd added those records to `file_a.csv`, the duplicated records (`apple, 3`, `banana, 2`) would be skipped, and the new record `pear, 9` would be the only new record processed.

4. **No deduplication** - No deduplication is performed, for files or records. All rows of all files will be processed.
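The two record-level modes can be sketched with a seen-set whose scope differs. This reproduces the `file_a.csv`/`file_b.csv` examples above; the function name is an assumption:

```python
def dedupe(files, scope):
    """files: list of row lists; scope: 'across_all' or 'per_file'."""
    seen_across = set()
    kept = []
    for rows in files:
        # Across all historical data, the seen-set persists between files;
        # within individual files, it resets for each file.
        seen = seen_across if scope == "across_all" else set()
        for row in rows:
            if row not in seen:
                seen.add(row)
                kept.append(row)
    return kept

file_a = [("apple", 3), ("orange", 9), ("banana", 2)]
file_b = [("pear", 9), ("apple", 3), ("banana", 2)]
```

With `"across_all"`, only `("pear", 9)` survives from `file_b`; with `"per_file"`, all of `file_b` is kept since it has no internal duplicates.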

**Starting Cell**

We support a Starting Cell offset for spreadsheet-type data (`.csv`, `.xls`, `.xlsx`, etc.) to crop unnecessary information out of a dataset and to ensure headers are correctly mapped.

The coordinates provided serve as the starting location from which the data will be read. By default, the read begins at coordinates (1,1), which reads all the data in the document. The example below shows in blue where the data has been read, and in white where data has been omitted, based on a configuration of Row 2, Column 2.

<figure><img src="https://353417064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYrsDW6vGBTygB1qqSE%2Fuploads%2FGNxEqy8HdpEJQ2v2LaNS%2Ftable.png?alt=media&#x26;token=1ec211f5-e2a5-49b5-8631-db6904c63b19" alt=""><figcaption></figcaption></figure>

Note that even with no Starting Cell offset in place (i.e., a Row 1, Column 1 configuration), the read begins at the first row containing data, omitting any leading rows that contain no data.

{% hint style="info" %}
Leading rows that are completely void of data will be omitted
{% endhint %}
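Treating the sheet as a grid, the offset amounts to slicing away rows and columns before the starting cell. A sketch with assumed names, using the Row 2, Column 2 configuration from the figure:

```python
def apply_starting_cell(grid, row, col):
    """Crop a sheet so reading starts at 1-based (row, col)."""
    return [r[col - 1:] for r in grid[row - 1:]]

grid = [
    ["report", "", ""],
    ["", "item", "qty"],
    ["", "apple", "3"],
]
cropped = apply_starting_cell(grid, 2, 2)  # Row 2, Column 2
```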

#### Sheet Names

If no sheet names are designated, the connector reads the schema from the first sheet of a document, then searches subsequent sheets for data that matches this schema.

If sheet name(s) are designated, only those sheets will be read, allowing the connector to skip non-relevant sheets and to read multiple sheets from a single workbook.
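The sheet-selection logic described above can be sketched as follows, modeling a workbook as an ordered mapping of sheet name to rows (a simplification; the helper and data shape are assumptions):

```python
def sheets_to_read(workbook, sheet_names=None):
    """workbook: ordered dict of sheet name -> rows (first row = headers)."""
    if sheet_names:
        # Designated sheets are read exclusively.
        return [name for name in sheet_names if name in workbook]
    names = list(workbook)
    # The first sheet's header row defines the schema...
    schema = workbook[names[0]][0]
    # ...and subsequent sheets are searched for matching data.
    return [name for name in names if workbook[name][0] == schema]

wb = {
    "Jan": [["item", "qty"], ["apple", "3"]],
    "Notes": [["free text"]],
    "Feb": [["item", "qty"], ["pear", "9"]],
}
```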

#### Parser Webhook

We support the use of a parser webhook for pre-processing data. This field designates the webhook URL; the webhook protocol must also be designated here. **Currently, gRPC webhooks are supported.**

{% hint style="info" %}
A webhook must first be built and configured before it can be used by a connector. Please contact Support for more information.
{% endhint %}

## Connector Options

The connector can be deleted, edited, and duplicated.

#### Duplication

To save time, the connector can be duplicated. The new connector must be named and can be edited as needed.

<figure><img src="https://353417064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MYrsDW6vGBTygB1qqSE%2Fuploads%2FFscPvfNuDhtHHJh9ZTmS%2FCleanShot%202024-01-04%20at%2020.53.21%402x.png?alt=media&#x26;token=291e2366-86fb-46c8-844b-7a0f9c4182b2" alt="" width="563"><figcaption></figcaption></figure>
