Datasets Query Builder
Last updated
Last updated
Performing a Query typically involves pulling data objects from a table or a series of tables. In Osmos, once you have created your Dataset(s) and have populated at least one Table, you are ready to build a query.
Queries are created in the Query Builder interface in Datasets.
Create a Dataset and upload data
On the Datasets panel, click on the name of one of the tables you will be querying.
The table that was initially selected will be pre-populated into the query. Additional tables can be added by linking values in the tables via join logic.
Step 3: Select Columns
The columns available from table selection and any join activity will appear in a dropdown. The values here can be selected for display in your output.
Rows can be aggregated in order to reformat the data in the table. Any columns created using aggregations will need to be named. Available aggregations are: Average, Count, Count Distinct, Minimum, Maximum, Sum, Concatenate, Concatenate Distinct, Boolean: All, Boolean: Any
Columns can also be inter-aggregated in order to create new outputs. In this example the columns First name and Last name have been combined into a single output
In order to access a specific subset of records, conditions can be set to include or exclude records based on values in specific fields. Complex filters can be designed leveraging AND and OR scenarios.
Group by allows a user to select the key fields to be output, and collapse non-key fields. This step is required for aggregations to be performed. In the context of an excel pivot table, Group by fields are the equivalent of the "Rows" used to construct a pivot. In this example, grouping by State and Employee Type, Our aggregations will tell us how many employees of each type are in a given state, and how much salary is being paid out to that group. This field will auto-populate based on the requirements of any aggregations that are to be performed.
Group By is required for aggregations to be performed
Once the query has been built, a user can select "Run Query" in order to confirm the output. If there are any issues with the query configuration, an error will populate at this point. If a query is successful, a user will receive a notification of the success and the number of rows returned.
If the output created by the query is correct, a user can now download their results as a .csv file. If a query is being built for a pipeline, the query can then be saved, to be run at set intervals.
If a query is likely to be used again in the future, either to create consistent inputs to a pipleine or to generate output ad hoc, the query can be saved. Selecting save query and providing a name will allow the query to be recalled using the dropdown on the top right of the query builder.