Re: Error Handling in Application Integration

PoornimaD · 11-15-2023 02:39 PM

How we handle errors is one of the most crucial aspects of designing a good integration, yet it is also one of the most overlooked. In this article, I will talk about the different error handling strategies that Google Cloud’s Application Integration provides and the best practices to follow.

Error Categories:

In general, we can classify errors into two main categories:

Permanent errors: Client-side errors such as authentication failures and data validation errors. These errors cause tasks to fail permanently, and some kind of intervention is required to resolve them.
Temporary errors: Server-side errors such as HTTP 503 (service unavailable) are considered temporary errors. Temporary errors cause only temporary task failures, so a retry may succeed.
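
As a rough illustration of the two categories, the sketch below sorts HTTP error statuses into permanent and temporary. The names `ErrorCategory`, `RETRYABLE_STATUS_CODES`, and `classify_http_error` are hypothetical and exist only for this example; they are not part of Application Integration, and the set of retryable codes is an assumption that should be checked against the error codes documentation.

```python
from enum import Enum


class ErrorCategory(Enum):
    PERMANENT = "permanent"  # needs intervention; retrying will not help
    TEMPORARY = "temporary"  # may succeed on a later attempt


# Assumption: an illustrative set, not the product's official list.
RETRYABLE_STATUS_CODES = frozenset({503})


def classify_http_error(status_code: int) -> ErrorCategory:
    """Classify an HTTP error status (400-599) as permanent or temporary."""
    if not 400 <= status_code <= 599:
        raise ValueError(f"{status_code} is not an HTTP error status")
    if status_code in RETRYABLE_STATUS_CODES:
        return ErrorCategory.TEMPORARY
    return ErrorCategory.PERMANENT
```

With this sketch, `classify_http_error(503)` falls into the temporary category, while client-side statuses such as 401 or 403 fall into the permanent one.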

It is also important to understand the different error codes so that you can identify an error correctly and quickly. For example, if you get a 403 error, you should know that it’s a permission denied error, most likely caused by insufficient permissions on the client side. You can learn more about different error codes here: application-integration/error-codes

Execution Modes:

Broadly speaking, there are two categories of execution:

Synchronous - In synchronous mode, the execution result of the integration is available immediately after the integration runs. Synchronous mode is helpful in scenarios where the integration invoker needs to wait for the execution result. You should use synchronous mode for real-time and transactional data exchange.

Asynchronous - Asynchronous executions use the fire-and-forget model. Asynchronous mode is helpful in scenarios where integrations can take a long time to run, or where the execution result is not required immediately after the integration runs. For example, you’re sending a courtesy email and don’t want to wait for that to complete before moving to the next step in your process.

Error Strategies:

The error handling strategy for a task specifies the action to take if the task fails due to a temporary error. In Application Integration, you can specify a different error handling strategy for each execution mode, so the same task can behave differently depending on whether the integration is invoked synchronously or asynchronously.

I have a sample flow below that creates an order. This flow uses a Cloud SQL - MySQL database to create new orders. In case of a database outage, the connector task will throw errors while inserting new orders into the database. To add the error handling logic, click the connector task (Create Order in this example) in the integration designer to open the task configuration pane.

Here I have selected None for Synchronous Executions and Retry with linear backoff for Asynchronous Executions.

With this configuration, the integration fails immediately when it is invoked synchronously and the associated task fails to execute. Please note: when you select None and an alternate path to the final task exists, the tasks in the alternate path are run. If all of them run successfully, the integration status is marked as Succeeded. If there is no alternate path and the task fails, the integration is marked as Failed.
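
To make the idea of one strategy per execution mode concrete, here is a minimal sketch of how such a per-task setting could be modelled. Everything in it (`ExecutionMode`, `ErrorStrategy`, `TaskErrorPolicy`, `create_order_policy`) is a hypothetical name for illustration; in the product this is configured in the task configuration pane, not in code.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


class ErrorStrategy(Enum):
    NONE = "none"
    RETRY_WITH_LINEAR_BACKOFF = "retry_with_linear_backoff"


@dataclass(frozen=True)
class TaskErrorPolicy:
    """Hypothetical holder for one task's strategy in each execution mode."""

    synchronous: ErrorStrategy
    asynchronous: ErrorStrategy

    def strategy_for(self, mode: ExecutionMode) -> ErrorStrategy:
        if mode is ExecutionMode.SYNCHRONOUS:
            return self.synchronous
        return self.asynchronous


# Mirrors the choices described for the create order task.
create_order_policy = TaskErrorPolicy(
    synchronous=ErrorStrategy.NONE,
    asynchronous=ErrorStrategy.RETRY_WITH_LINEAR_BACKOFF,
)
```

Looking up `create_order_policy.strategy_for(ExecutionMode.ASYNCHRONOUS)` returns the linear backoff strategy, while the synchronous lookup returns `ErrorStrategy.NONE`.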

If the integration is invoked asynchronously, the errored task should be retried, and the expected delay in between retries is as follows:

1st retry, at least 3 seconds after previous failed attempt
2nd retry, at least 6 seconds after previous failed attempt
3rd retry, at least 9 seconds after previous failed attempt
4th retry, at least 12 seconds after previous failed attempt
5th retry, at least 15 seconds after previous failed attempt
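
The delays listed above are consistent with a 3-second retry interval multiplied by the retry number. A minimal sketch of that arithmetic follows; `linear_backoff_delays` is a hypothetical helper, and the 3-second interval and five retries are assumptions read off the list rather than documented defaults.

```python
from typing import List


def linear_backoff_delays(interval_seconds: int, max_retries: int) -> List[int]:
    """Return the minimum wait before each retry: interval, 2*interval, ..."""
    return [interval_seconds * attempt for attempt in range(1, max_retries + 1)]


print(linear_backoff_delays(3, 5))  # [3, 6, 9, 12, 15]
```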

You can specify different strategies for error handling: you can choose to ignore the failure, restart the integration, or retry using one of the backoff mechanisms, such as the linear backoff used in the example above. In exponential backoff, the waiting time between retries increases exponentially after each failed retry rather than linearly. For example, if the specified retry interval is 3 seconds, the first retry occurs after 3 seconds, the second retry occurs after 9 seconds, the third retry after 27 seconds, and so on. The process continues until the maximum number of retries is reached or the task succeeds, whichever is earlier.
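
The retry loop itself can be sketched in a few lines. The version below assumes each delay is the retry interval raised to the retry number, so a 3-second interval gives 3, 9, and 27 seconds; the exact formula the product uses is not stated in this post. `exponential_delay` and `retry_with_backoff` are hypothetical names, and the `sleep` parameter exists only so the waiting can be swapped out when experimenting.

```python
import time
from typing import Any, Callable


def exponential_delay(interval_seconds: int, attempt: int) -> int:
    # Assumption: delay = interval ** attempt (3, 9, 27, ... for interval 3).
    return interval_seconds ** attempt


def retry_with_backoff(
    task: Callable[[], Any],
    interval_seconds: int,
    max_retries: int,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Run task; on failure wait and retry until it succeeds or retries run out."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted: surface the last error
            sleep(exponential_delay(interval_seconds, attempt))
```

The loop stops at whichever comes first: a successful run of `task`, or the point where `max_retries` retries have all failed, in which case the last error is re-raised to the caller.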

You can learn about different error handling strategies that you can use for a task here: application-integration/error-handling-strategy

Error Catcher:

An error catcher defines a customized way to handle the failure of an identified trigger, task, or edge condition in your integration. You can define one or more error catchers in a single integration to handle task errors and/or execution failures. Each error catcher can be invoked using a trigger, called the Error Catcher trigger, to run the set of configured integration tasks customized to handle the error. You can learn more about error catchers here: application-integration/error-catcher

As a best practice, you should use both an error handling strategy and an error catcher in your integration. For any error, the integration first follows the configured error handling strategy; once that strategy is exhausted, the error catcher logic is triggered. You can use variables like CommonErrorCode and ErrorMessage to capture the error code, error message, and so on, and send them to your error catcher flow.

In the example below, the integration flow uses an error catcher at the task level. When the Call REST Endpoint task errors out, it is retried 3 times, as per the configured error handling strategy. After the retry attempts are exhausted, the error catcher sends the messages to a Dead Letter Queue.
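
As a rough illustration in plain Python, separate from the integration designer, the retry-then-catch behaviour can be approximated like this. `run_with_error_catcher`, the list standing in for the Dead Letter Queue, and the `error_message` key are all hypothetical; backoff delays are left out for brevity.

```python
from typing import Any, Callable, Dict, List, Optional


def run_with_error_catcher(
    task: Callable[[Dict[str, Any]], Any],
    payload: Dict[str, Any],
    max_retries: int,
    dead_letter_queue: List[Dict[str, Any]],
) -> Optional[Any]:
    """Run task, retry up to max_retries times, then hand the failure to the catcher."""
    retries_done = 0
    while True:
        try:
            return task(payload)
        except Exception as error:
            if retries_done >= max_retries:
                # Error catcher: park the message for later inspection.
                dead_letter_queue.append(
                    {"payload": payload, "error_message": str(error)}
                )
                return None
            retries_done += 1


def call_rest_endpoint(payload: Dict[str, Any]) -> Any:
    raise TimeoutError("endpoint timed out")  # simulated outage


dlq: List[Dict[str, Any]] = []
run_with_error_catcher(call_rest_endpoint, {"order_id": 42}, 3, dlq)
```

With `max_retries=3` the task is called four times in total (one initial attempt plus three retries) before the message lands in `dlq`.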

Similar to configuring Task-Level error catchers, you can also define Event-Level error catchers to handle execution failures such as integration failures, edge condition failures, task failures, and retry execution failures. Error catchers at the event level are invoked when you have not defined or attached task-level error catchers to handle any task failures.

The example below shows a sample integration flow that uses an error catcher at the event level. When the API Trigger event fails, it calls an error catcher, which in turn calls another REST endpoint.
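
The fallback rule between the two levels (use the task-level catcher if one is attached, otherwise the event-level one) can be sketched as a simple lookup. All names here (`resolve_error_catcher`, `send_to_dead_letter_queue`, `call_fallback_endpoint`) are hypothetical and only model the rule described above, not how the product wires catchers internally.

```python
from typing import Callable, Dict, Optional

# A catcher receives the failed task's name and an error message.
ErrorCatcher = Callable[[str, str], str]


def resolve_error_catcher(
    failed_task: str,
    task_level_catchers: Dict[str, ErrorCatcher],
    event_level_catcher: Optional[ErrorCatcher],
) -> Optional[ErrorCatcher]:
    """Prefer the catcher attached to the task; otherwise fall back to the event level."""
    return task_level_catchers.get(failed_task, event_level_catcher)


def send_to_dead_letter_queue(task: str, message: str) -> str:
    return f"DLQ <- {task}: {message}"


def call_fallback_endpoint(task: str, message: str) -> str:
    return f"REST <- {task}: {message}"


task_catchers = {"Call REST Endpoint": send_to_dead_letter_queue}
catcher = resolve_error_catcher("Create Order", task_catchers, call_fallback_endpoint)
```

Here `Create Order` has no task-level catcher, so `catcher` resolves to the event-level `call_fallback_endpoint`.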

Please note that in the error catcher flow, you can specify any actions, such as sending an email, calling another REST endpoint, or suspending an execution.

To learn more about Google Cloud’s Application Integration error handling, please refer to the product documentation: application-integration-error-handling

pramodvallanur

Thanks, Poornima, for the details. Also cross-referencing another post related to error handling / error catcher, so that members are able to find related content quickly:
https://www.googlecloudcommunity.com/gc/Integration-Services/How-to-test-for-errors-on-a-Task-in-App...

phertzog

Hello,

I am trying to get my head around error catching.

I created a global error catcher attached to my integration trigger. If I understood correctly, this error handler will be triggered if any uncaught error occurs in any task. But I can't find any reference to global variables that would tell me which task failed and what the error message was.

I must be missing something, because as things stand the only thing I can do is send a message saying "something happened somewhere". That seems like a rather limited option to me.

Do you have references to integration examples that make use of error catchers? The official documentation just shows pictures of the flow, without any details of the tasks in the error catcher.

Regards

phertzog

Hello,

Can you please elaborate a little more on the SendEmail task you show in your example?

In my case, I have a REST task that can fail on a timeout, for example. I set up an error catcher and want to use a data mapping task to store the error message from the REST task, then use a connector to store the value in a database.

When I use the mapping task, I cannot find any variable that could be the error message from the REST task.