FTP

Overview

You can create a Source FTP Connector to read from multiple files in an FTP folder, authenticating with an SSH private key or your SFTP password.
The schema for this source Connector is defined by the newest file in the folder. All files must have the same schema (number and order of columns). Any files not matching the original schema will be ignored, and the schema cannot be changed after saving the Connector.
Supported file formats: CSV, XLSX, XLS, TXT (Comma separated), and ZIP files containing these files.
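A quick way to pre-check whether files in your folder use an accepted format is to compare extensions against the list above. This helper is illustrative only (it is not part of the product):

```python
from pathlib import Path

# File formats the Connector accepts (ZIP archives may contain the others).
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".txt", ".zip"}

def is_supported(filename: str) -> bool:
    """Return True if the file's extension is one the Connector can read."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```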

Prerequisites

Information needed:
  • FTP Host
  • Port Number
  • Username
  • Password
  • Directory

Creating an FTP Source Connector

Step 1: Navigate to the Connectors list page, then click + New Connector.
Step 2: Under the System prompt, click FTP.
Step 3: Enter a Connector Name.
Step 4: Select Source Connector.
Step 5: Select the FTP protocol used by the server.
Step 6: Provide the FTP host.
Step 7: Provide the port number.
Step 8: Provide your FTP username.
Step 9: Provide the directory path for the FTP folder you want to connect to. (Note: the path is case-sensitive.)

Authentication

If you selected SFTP as the server protocol, you have the option to authenticate with an SSH Private key or with a Username and Password.
If you selected FTP as the server protocol, you can only authenticate with a Username and Password.
SSH Private Key
If you choose to authenticate with an SSH Private Key, provide the full SSH private key, including its BEGIN and END lines. Otherwise, provide your FTP Password.
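For reference, an OpenSSH-format private key is delimited by header and footer lines like the following (the key body is elided; older RSA keys may instead read "BEGIN RSA PRIVATE KEY"):

```
-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----
```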

Advanced Options

File Filtering
You can choose to process all source files, or filter the files based on the file name. Any files that do not meet the filter criteria will be ignored. Select one of the options:
  1. Include all files: If this option is chosen, all of the files in the folder will be processed in chronological order.
  2. Only include files that: If you choose this option, you can filter which files to process from the source folder based on three options:
    • File names starting with,
    • File names containing, or
    • File names ending with.
If you provide a zip file whose name matches the filter criteria, all files within the zip file will be processed (provided they match the Connector’s schema). The file filter does not apply to files inside a zip file.
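The three filter modes behave like standard prefix, substring, and suffix matching on the file name. A minimal sketch of that behavior (the function name, signature, and mode strings are illustrative, not the product's API):

```python
def matches_filter(filename: str, mode: str, pattern: str) -> bool:
    """Illustrative file-name filter mirroring the three
    'Only include files that' options. Files returning False are ignored."""
    if mode == "starting_with":
        return filename.startswith(pattern)
    if mode == "containing":
        return pattern in filename
    if mode == "ending_with":
        return filename.endswith(pattern)
    raise ValueError(f"unknown filter mode: {mode}")
```

Note that when a zip file's name matches, the whole archive is expanded and its contents are processed; the filter is not re-applied to the files inside the zip.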
Null Characters
The source file may contain characters that are not valid. You can choose to keep all characters from the source, or to strip the null characters. Select one of the options:
  1. Keep all characters from source: If this option is selected, we will retain all characters from the source file, replacing characters we cannot decode with the Unicode replacement character.
  2. Strip null characters: If this option is selected, we filter out all characters equal to 0. This is useful when dealing with null-terminated strings.
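The two behaviors above can be sketched as follows. This is a simplified illustration, not the Connector's implementation; the Unicode replacement character U+FFFD stands in for the undecodable-character replacement described above:

```python
def keep_all_characters(raw: bytes) -> str:
    # Retain everything; bytes that cannot be decoded become U+FFFD.
    return raw.decode("utf-8", errors="replace")

def strip_null_characters(raw: bytes) -> str:
    # Drop bytes equal to 0 before decoding, e.g. null-terminated strings.
    return raw.replace(b"\x00", b"").decode("utf-8", errors="replace")
```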
File Headers
Either all files in the source folder contain column header names, or none of them do. Select one of the options:
  1. All source files contain headers: If this option is selected, we will use the first row as column header names to label the schema within Osmos. Rows two and up will be read as data records.
  2. No source files contain headers: If this option is selected, we autogenerate column names for the schema within Osmos. All rows, including the first row, will be read as data records.
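The difference between the two options can be sketched like this (illustrative only; the autogenerated names here follow a hypothetical column_1, column_2, … pattern):

```python
import csv
import io

def read_with_headers(text: str):
    """First row labels the schema; remaining rows are data records."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

def read_without_headers(text: str):
    """Column names are autogenerated; every row is a data record."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return header, rows
```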
Deduplication Method
We support three deduplication methods. You can choose to deduplicate at the file level or at the record level. Select one of the following options:
  1. File level Deduplication - If this option is selected, deduplication will be performed at the file level only. If a file name is changed, or the file itself is changed, the entire file will be processed in subsequent runs.
  2. Record level Deduplication across all historical data - When this is selected, in addition to file-level deduplication, deduplication will be performed at the record level across all files processed by this Pipeline. An identical record that was already processed in a previous Pipeline run will not be processed again, nor will duplicated records within the same file.
    Example:

    file_a.csv:
    item, quantity
    apple, 3
    orange, 9
    banana, 2

    file_b.csv:
    item, quantity
    pear, 9
    apple, 3
    banana, 2

    After processing file_a.csv, if we add file_b.csv to the same directory and run a job, only the row containing pear, 9 will be processed, as apple, 3 and banana, 2 were already seen when file_a.csv was processed. The same applies within the same file - if we'd added pear, 9 to file_a.csv instead of creating file_b.csv, the net result would be the same: pear, 9 would be the only new row.
  3. Record level Deduplication within individual files - When this is selected, in addition to file-level deduplication, deduplication will be performed at a record level, but only within the same file. If the file being processed has the same record appearing multiple times, the record will be processed only once.
    Example:

    file_a.csv:
    item, quantity
    apple, 3
    orange, 9
    banana, 2

    file_b.csv:
    item, quantity
    pear, 9
    apple, 3
    banana, 2
    After processing file_a.csv, if we add file_b.csv to the same directory and run a job, all three records in file_b.csv will be processed. If instead we'd added those records to file_a.csv, the duplicated records (apple, 3, banana, 2) would be skipped, and the new record pear, 9 would be the only new record processed.
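The two record-level options differ only in the scope of the "already seen" set. A simplified sketch of the behavior described above (not the actual implementation; file contents are modeled as lists of records):

```python
def dedup_records(files, scope="all_history"):
    """Yield (filename, record) pairs, skipping duplicate records.

    scope="all_history": duplicates are skipped across every file
    processed so far (option 2).
    scope="per_file": the seen-set resets per file, so only repeats
    within the same file are skipped (option 3).
    """
    seen = set()
    for filename, records in files:
        if scope == "per_file":
            seen = set()  # forget records from earlier files
        for record in records:
            if record in seen:
                continue
            seen.add(record)
            yield filename, record
```

With the file_a.csv / file_b.csv example above, scope="all_history" emits only the pear, 9 row from file_b.csv, while scope="per_file" emits all three of its rows.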