When using a Scheduled Trigger to run your ETL periodically, you will very likely need to implement the 'last runtime' method. This has been left out of the above workflow for the purposes of simplicity.
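As a rough illustration of the idea, the last runtime could be used inside a script step to build a Salesforce SOQL query that only fetches records modified since the previous run. This is a minimal sketch, assuming the previous run's timestamp has already been stored (e.g. in account-level data storage) and is passed in as input.lastRunTime; the input name and query fields are illustrative only:

exports.step = function (input, fileInput) {
  // Fall back to a very early date on the first ever run,
  // when no last runtime has been stored yet
  var lastRunTime = input.lastRunTime || '1970-01-01T00:00:00Z';

  // Build a SOQL query that only returns accounts modified since the last run
  return {
    soql: "SELECT Id, Name, External_ID__c FROM Account " +
          "WHERE LastModifiedDate > " + lastRunTime
  };
};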
The basic stages of this are:
1 - Extract the data from the source service (Salesforce accounts in this case), using the returned offset token to page through the results. This will come to an end when Salesforce no longer returns an offset token, indicating there are no more records to be found.
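Outside of the workflow builder, the paging pattern looks roughly like the sketch below. The fetchAccounts helper is hypothetical and simply stands in for the Salesforce 'list records' call, returning the records for one page plus an offset token, and omitting the token when there are no more pages:

// extractAllAccounts keeps requesting pages until no offset token is returned
function extractAllAccounts(fetchAccounts) {
  var allRecords = [];
  var offsetToken = null;

  do {
    var page = fetchAccounts(offsetToken);
    allRecords = allRecords.concat(page.records);
    offsetToken = page.offsetToken; // undefined when there are no more records
  } while (offsetToken);

  return allRecords;
}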
2 - Carry out any necessary transformations on each batch of data being returned. This can be done, for example, to meet the requirements of your destination database. In this example we are transforming the External_ID__c field from string to integer:
This uses a simple script to convert all External_ID__c values to integers:
exports.step = function (input, fileInput) {
  var list = input.list;

  // Convert the External_ID__c field on every record from a string to an integer
  for (var i = 0; i < list.length; i++) {
    list[i].External_ID__c = parseInt(list[i].External_ID__c, 10);
  }
  return list;
};
3 - Load the result into the destination database. The transformed data can then be loaded in one call, as multiple rows, into the destination database (PostgreSQL in this case):
A key thing to note here is that all the accounts are loaded in one single call, rather than looping through the data and making, say, 100 separate calls, one for each account.
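For illustration, a script step along the following lines could shape the transformed batch into the single multi-row payload passed to the database insert step. The column names and the rows property are assumptions for the sake of the sketch, not the exact PostgreSQL connector schema:

exports.step = function (input, fileInput) {
  var accounts = input.list;

  // Map each Salesforce account onto a row object whose keys
  // match the destination table's column names (assumed here)
  var rows = accounts.map(function (account) {
    return {
      external_id: account.External_ID__c,
      name: account.Name
    };
  });

  // The whole batch is returned at once, so the database step
  // can insert every row in a single call
  return { rows: rows };
};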
Please see our Redshift documentation for guidance on how to deal with issues such as dynamic payloads.
As mentioned in the note on scaling above, please also see our Redshift documentation for guidance on processing massive amounts of data via S3 and the COPY command.
Setting up error handling methods can help prevent failed workflows from holding up the completion of your threads.
The following example shows how we could take the details of an error encountered while getting contact data and add them to a Google Sheets record of errors:
In this case we are extracting the timestamp, contact name and error message to add to each entry in Google Sheets, before breaking the loop to move on to the next contact.
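A small script step along these lines could assemble that error record before it is passed to the Google Sheets connector. The input names (input.contact and input.error) are assumptions made for this sketch:

exports.step = function (input, fileInput) {
  // Build one row for the Google Sheets error log:
  // when the error occurred, which contact it relates to, and why it failed
  return {
    timestamp: new Date().toISOString(),
    contact_name: input.contact ? input.contact.Name : 'Unknown',
    error_message: (input.error && input.error.message) ? input.error.message : String(input.error)
  };
};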
A requirement for your ETL project may be to dynamically generate the table schema in the destination database - perhaps because there is a vast number of potential fields, which would make the manual process extremely labor-intensive.
That is to say that you will not be manually creating the table schema in your database before setting up the ETL system, and you want your Tray workflow to:
Pull data from the source service
Build the correct table schema (including the correct data types, allowed null values etc.) based on the data that has been returned
Automatically create the table in the database and input the first batch of data
Input all subsequent batches of data in accordance with the defined schema
Subsequent to this, you may also need to respond automatically to new fields being added to the schema, or even to changes such as fields being renamed.
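As a rough sketch of the idea (not a production implementation), a script step could inspect the first batch of records and infer a column type for each field before the table is created. The type mapping below is deliberately simplistic and is assumed purely for illustration:

exports.step = function (input, fileInput) {
  var sample = input.list[0] || {};
  var columns = [];

  // Infer a very rough SQL type for each field from the first record
  for (var field in sample) {
    var value = sample[field];
    var type = 'TEXT';
    if (typeof value === 'number') {
      type = 'NUMERIC';
    } else if (typeof value === 'boolean') {
      type = 'BOOLEAN';
    }
    columns.push({ name: field, type: type, nullable: value === null });
  }

  // The output can then drive a 'create table' step and the
  // column mapping used for all subsequent batches
  return { columns: columns };
};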
If dynamic schema creation is a requirement of your implementation please contact your Sales Engineer or Customer Success representative to discuss the exact requirements.
Some data warehouses, such as Google BigQuery, may have an input schema which is difficult to satisfy.
In the case of BigQuery, the following 'flat' input data:
1"value": [2{3"Email": "muhammad@tray.io",4"IsActive": true,5"Title": "Sales Engineer",6"Id": "0054xxxxxxxxWauSQAS",7"Name": "Muhammad Naqvi"8}9]
Would need to be converted to the following:
1"value": [2"row": {3"data": {4"Email": "muhammad@tray.io",5"IsActive": true,6"Title": "Sales Engineer",7"Id": "0054xxxxxxxxxWauSQAS",8"Name": "Muhammad Naqvi"9}10}11]
If you are having difficulties satisfying the schema requirements, please contact your Sales Engineer or Customer Success representative.