Error handling (Embedded)

Available methods

Different stakeholders need to be considered when debugging or troubleshooting Embedded solution instances: integration builders, customer support, and the clients themselves. Tray has several features that allow integration notifications or errors to be surfaced to these stakeholders:

  • Workflow step-level error handling

  • Solution instance-level alert workflow

  • Account-wide solution alert workflow

  • Log streaming to external services

Which combination of features you use will ultimately depend on the following:

  • What triggers do your solution workflows use?

  • Who's going to be receiving & handling execution errors?

  • Does each solution require different error handling, or do error notifications need to include execution context from an external system?

  • Does the workflow execute & respond synchronously or asynchronously? For instance, if a solution instance is executed by sending a webhook request to Tray, does the calling app expect an immediate response and then poll for updates, or does it expect the full execution information to be returned in the call?

  • Do you have a log ingestion tool (such as Datadog)?

Decision process charts

The following process charts show how you can take these factors into consideration when scoping your implementation.

Webhook-triggered workflows

Service or scheduled trigger workflows

Example implementations

1 - Realtime service-triggered solution instance workflows (partner's support team handle errors)

For example, listening to incoming lead events from Salesforce. As the partner's support team handles all errors on behalf of customers, they can use Alert / Solution Alert triggers to send a message to the support team for follow-up.


2 - Realtime service-triggered solution instance workflows (partner's End Users self-serve errors)

For example, listening to incoming lead events from Salesforce. As the partner wants their customers to self-serve errors, they will likely need to store execution / error logs on their end and then surface them to the customer.

If the partner has a log ingestion tool, such as Datadog, they can leverage Log streaming to ingest all errors.

If not, the partner can leverage Alert / Solution Alert triggers along with a database or queue connector to send logs to their own system.


3 - Solution instance workflows triggered from partner's app (webhook trigger), execution is synchronous, partner's End Users self-serve errors

In this case the partner's application will send event payloads to Tray for each solution instance execution, allowing realtime syncs or data fetches.

As the application is waiting for the workflow to execute and return a response, step-level error handling can be used to handle errors and send a customised message to the calling application.
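As a rough illustration of the calling application's side, the sketch below interprets a synchronous workflow reply. It assumes, purely for illustration, that the workflow's success and error branches return a JSON body with `status`, `data` and `message` fields. These field names and the function name are hypothetical and are not defined by Tray.

```python
import json

# Hypothetical default shown to the end user when no customised message is available.
DEFAULT_ERROR = "The integration failed; please contact support."


def interpret_workflow_response(status_code: int, body: str) -> dict:
    """Turn a synchronous workflow reply into a result the app can show its user."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    # Assumed convention: the workflow's success branch replies with status "success".
    if 200 <= status_code < 300 and payload.get("status") == "success":
        return {"ok": True, "data": payload.get("data")}

    # Assumed convention: the error branch puts its customised text in "message".
    return {"ok": False, "message": payload.get("message", DEFAULT_ERROR)}
```

Falling back to a default message keeps the end user informed even when the reply is not valid JSON, for example when the request never reached the workflow's error branch.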


4 - Solution instance workflows triggered from partner's app (webhook trigger), execution is asynchronous, partner's End Users self-serve errors

In this case the partner's application will also send event payloads to Tray for each solution instance execution, allowing realtime syncs or data fetches.

However, the application expects an instant response just to confirm that the webhook has been received by Tray.

It then likely polls an endpoint, awaits logs or polls a file bucket to check when the execution has finished.
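The polling side of this pattern could look something like the sketch below. `fetch_status` stands in for whatever call the partner's application makes to check progress (an HTTP request to its own endpoint, a bucket listing, and so on). The status values, function name and defaults are assumptions for illustration only.

```python
import time
from typing import Callable


def wait_for_execution(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the execution reports a final status or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        # "success" and "error" are hypothetical final states written by the workflow.
        if status in ("success", "error"):
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return "timeout"
```

Bounding the number of attempts matters here: if the workflow fails before it can report anything, the caller still gets a `"timeout"` result instead of waiting indefinitely.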

In this scenario, a service connector with the partner's static auth will likely be used in solution instances to send success / error payloads to an external service.

Errors can either be handled using workflow step-level error handling, or more scalably through a Solution Alert trigger.

A common approach to this is:

  1. Listen for an incoming request ID that is unique to each webhook event, and use Data Storage to save it under a key containing the Tray execution ID.

  2. Use that key to retrieve the original request ID inside the Alert trigger workflow.
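The correlation logic behind these two steps can be sketched as follows. The in-memory class is only a stand-in for a key-value store such as Data Storage, and the `request_id`, `execution_id` and `message` field names are assumptions rather than Tray's actual payload schema.

```python
class KeyValueStoreStub:
    """In-memory stand-in for a key-value store (not Tray's Data Storage API)."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)


def storage_key(execution_id: str) -> str:
    # The key contains the execution ID so the alert workflow can rebuild it later.
    return f"request_id:{execution_id}"


def on_webhook_received(storage, execution_id: str, event: dict) -> None:
    """Step 1: save the caller's request ID under a key derived from the execution ID."""
    storage.set(storage_key(execution_id), event["request_id"])


def on_alert(storage, alert: dict) -> dict:
    """Step 2: look up the original request ID and build an error payload for the caller."""
    request_id = storage.get(storage_key(alert["execution_id"]))
    return {
        "request_id": request_id,  # None if the execution failed before step 1 ran
        "status": "error",
        "message": alert.get("message", ""),
    }
```

Because the key is derived from the execution ID, the alert workflow needs nothing beyond the failed execution's ID to find out which request the error belongs to.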

Note on services failing

If a third-party service used in your workflow is having network issues, this can cause your whole workflow to fail.

A best practice approach to consider here is to set your service connectors to use Manual error handling so that you can take immediate appropriate action should this be the case.

Note on points of failure

You should be aware that setting up error handling systems within your workflow can add more points of failure. For example, if somebody changes the login credentials for the MySQL database you use to store your status messages, the error handling itself will start to fail.
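One possible mitigation, shown here only as a sketch, is to give status reporting a fallback path so that a failure in the primary store does not swallow the original error. The sink functions and names below are hypothetical. In practice the primary sink might be a database write and the fallback a simpler channel with fewer dependencies.

```python
import logging
from typing import Callable

logger = logging.getLogger("error_reporting")


def report_status(
    message: str,
    primary_sink: Callable[[str], None],
    fallback_sink: Callable[[str], None],
) -> str:
    """Send a status message, falling back if the primary store is unavailable."""
    try:
        primary_sink(message)
        return "primary"
    except Exception:
        # Record why the primary store failed (e.g. changed credentials),
        # then deliver the original message through the fallback channel.
        logger.exception("Primary status store failed; using fallback")
        fallback_sink(message)
        return "fallback"
```

The return value tells the caller which path was used, which makes it easier to notice when the primary store has silently stopped working.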