Bug? - API Steps producing duplicated results when...

WillowMobileSys · 07-01-2022 08:52 AM

Hi all!

I am going to submit this to support but thought I'd see if any other have input or similar experience. Forewarning....this post is long.

I have been battling this issue for a few days now and no closer to identifying the cause. In a nutshell, I am seeing a Bot error due to timeout. The step causing the timeout is duplicating the set of row add entries EXACTLY 4 times - even finishing any triggered Bots (step set to trigger other bots) before fully halting.

The issue DOES NOT occur if user selects smaller lists in Form (see background below).
If I disable the triggered Bot, the issue DOES NOT occur.
Issue DOES NOT occur in an undeployed app copy - so I can replicate it in a sample app.

A little background...

I am supporting a Crop Services app. In this app a user will create an Activity where they will choose a list of Fields and a list products to be applied. The Fields and Products are chosen in EnumList/Ref columns - one for Fields and one for Products. The process flow then triggers automation which uses API steps to expand the chosen lists to add associated rows to several tables (up to 8).

The tables related to this issue are Activity Farms and Activity Farm Products. We'll focus on these two. Based on the chosen Field list, the assigned Farms are extracted and entered into their own table with calculated Acreage. Then Activity Farm Products is populated based on chosen Products and calculated Quantity based on Farm acreage. Following so far?

The process flow...

A user will enter an Activity, choosing a list of Fields and a list of Products (among other choices). When the Activity is Saved, a Bot populates the Activity Fields table and a second step populates the Activity Farms table. Both use API Tasks in attempts to add rows in bulk and speed up the overall processing.

Since API row adds may not be available for subsequent steps in a Bot, the Activity Farms add step uses the EnumList of chosen Fields (as opposed to the newly entered Activity Fields rows) from which to select a unique list of Farms. See the API JSON Code excerpt below designed to handle both adds and updates:

<<START:UNIQUE([ReportTask_Fields_Included][Farm_Link] - [Related Activity_Farms][Farm])>>
{
"Activity": "<<[ReportTaskID]>>",
"Entities": "<<[GrowerFarmsID].[Entities]>>",
"Farm": "<<[GrowerFarmsID]>>",
"Add_Products": "<<[Related REC_Products][Rec_Products_ID]>>"
}
<<END>>

The add Activity Farms step is set to trigger other bots, and for any newly added Activity Farm row where the "Add_Products" column is not blank, another Bot triggers and uses the "Add_Products" column in another API Task to insert rows into the Activity Farm Products table.

The Analysis...

Reviewing data, only the Activity Farms table rows are being duplicated. Based on row time stamps and physically watching the process, it appears the "Save Activity Farms" step ONLY is being re-executed multiple times.

But only when the list of chosen Fields is 15+ and the chosen Products is 5+. When the chosen listes are smaller, then everything executes as expected.

As noted at the beginning, if I disable the triggered bot that adds the Activity Farm Products, no duplication of adding Activity Farms occurs and automation completes successfully. Of course no Activity Farm Products rows are added!

Turning off "Trigger other bots" didn't seem make any difference. Other bots were still triggered, duplication still happened and timeout error still occurred.

Conclusion...

It appears that when there are API steps adding rows that then trigger another Bot that also adds rows through an API step, the first API Task is being re-executed but only when there is a certain volume of these rows adds?

Any thoughts at all??

1minManager

So the ony thing I can add is that I think bots run time is capped at 300secs. I presume in your case this is also a per step time limit. You have 3 very rough options:

1 = Change the setup so some or all of the process is done using actions. The downside is your user might see donzens or hundreds of items waiting to sync

2 = Change it over so the Appsheet API calls the bot

3 = I'm guessing this bot doesn't add hundreds of new rows. So I'd assume the time taken to add each row is time consuming because you got various app formulas and initial values in there. See if you disable these wether the Bot can run without error. If so, work on makeing these more efficent, or doing a different way.

Simon@1minManager.com

WillowMobileSys

I may not have articulated the issue strongly enough. The timeout is caused because an API step is being needlessly replicated. It should only execute once and for smaller lists it executes perfectly. But in these scenarios where there are larger lists of Fields + Products chosen, the problem API step executes exactly 4 times (3 of them unexpectedly). It is these extra executions of step that are leading to the timeout.

As far as I can tell, there is nothing implemented to purposely run this step multiple times.

As noted above, if I disable the secondary Bot that is triggered from the main problem Bot, then that main Bot runs exactly as expected - of course I then don't get all of the entries inserted done by that secondary Bot. By the way, the secondary Bot is conditioned on the Adds from the problem step, so they are connected in that way.

It is very weird - especially since I cannot reproduce it in an app copy under my personal account.

1minManager

So, without really understanding your data or bot:

Its odd you say it runs 4 times. As in exactly 4 and not 3 or 5 times. If it was a random amount of time then I'd presume its related to the amount of Activities chosen. But if its exactly 4 does this correlate to the number of farms or the number of products?
Looks like your calling a webhook, does this setting have any affect - Max number of retries on failure - Should AppSheet retry the webhook if it fails or times out? If so, how many times?
Maybe your Bot start condition is choosing 4 rows rather than 1? Checkout the trigger for the Event
Or similar to the above, assuming your calling an action somewhere, checkout the "Only if this condition is true" for the action

Hope this helps, sorry can't be any more specific 😐

WillowMobileSys

Ah-ha! The max number of retries was set to 3 - that explains the 4 executions of that step only - 1 attempt that apparently fails and then 3 retries. Now I am wondering if the timeout is due to reaching max retries or if it is a true timeout.

I'll investigate that further by setting the retries to zero.

WillowMobileSys

UPDATE: As realized by @1minManager 's post, the API steps in my automation were set to 3 retries. It seems prudent then to assume that the reason the automation step repeated exactly 4 times each and every time was due to these retries. Because there are retries, it seems reasonable that there is some processing error occurring and, though there are none logged, maybe they are being obscured by the Timeout error that IS being logged.

So the thought was to remove the retries (reduce to zero), allow the step to error normally and see what's causing the problem.

Well, the dang thing ran to completion and produced ALL of the expected results!!!

I set the retries back to 3, and the next run halts with a Timeout after executing the "problem" step exactly 4 times.

I again set retries to zero and this time run with all possible Fields (127 rather than 20) and an increase of Products selected (8 yo 12). The automation fully ran (obviously much slower) and successfully completed without any errors.

As a final test, I set the retries to 1 and ran again. Yep, the "problem" step ran exactly twice and the Bot erred out with a Timeout message.

Again, I cannot reproduce this error in the test environment - a copy of the app under my personal account and not deployed.

I am convinced that one of two things is true:

1) This version of the app is corrupted somehow affecting the behavior of the Bot when retries are set.

or

2) More likely.... There is some Bug in production around the Retries parameter.

1minManager

It is kinda odd the symptoms you're getting. Every time I've had a Bot timeout issue the result was that NOTHING was updated or added. Even if the Bot was 99% complete. So my hypothesis is that the Bot code and API code are totally different. Kinda like a virtual column could show you one result in detail mode and another in a form view.

I think this is a an issue for support@, but don't hold your breath. Since google bought Appsheet the support is now almost nonexistant. My experience:

Send email
Hear nothing for 2 weeks
Email asking me if I can just recreate the entire app from scratch to demonstrate the issue
Me fuming for 2 weeks at how stupid a response this is
An email from appsheet saying they are closing the ticket beause of no response

😠

WillowMobileSys

I have never really thought that support was that great to begin with. I do agree it is much worse now.

I believe that they want to get to a self-help model but to do that the platform needs to be more anomaly free and better documented.

Marc_Dillon

I've run into this before as well. The best solution, I think, is to turn the "run asynchronously?" option on for your webhook tasks. Without this option turned on, you're forcing all of these nested and independent bots to run within the same maximum execution time period. With the option turned on, you get more execution time, because the subsequent tasks' execution time doesn't count towards the first bot's maximum window.

WillowMobileSys

Yes, I can do that now in this case. Originally, I did not have it turned on because I was trying to run steps synchronously where the the second step uses the output from the first. But I found out that even running synchronously, the output (new rows in this case) of a prior step are not available in the tables at the time the subsequent step runs. Which is odd since in a grouped action they are.

I was forced to find another way and now the steps are no longer inter-dependent. I believe I can now use the asynchronous processing but will need to test to be certain.

Marc_Dillon

What exactly do you mean by "prior" and "subsequent" steps?

Suppose you have 1 Bot, with 2 steps. All steps in a bot run on the same data set. And you can modify that current data set by triggering "local" data change actions, such that subsequent steps in the bot can act on the new data. However, by sending off a webhook, you're handing that execution off to another "process" which will be acting on a different data set. Any data change happening in that other process won't be available to subsequent steps in the original bot. This has nothing to do with synchronicity.

WillowMobileSys

What do you mean by "you're handing that execution off to another "process" which will be acting on a different data set"? What is the different data set? Do you mean a different table?

Marc_Dillon

No, not a different table. A snapshot of the entire set of data in your app, in a certain timeframe. A "session", or "state" are other ways to describe it. Just like how 2 different devices using the same app can see different sets of data based on when they last synced.

Imagine sending a webhook to Appsheet API is like a virtual device just opened and synced the app, getting the newest data, before acting on the contents of the webhook. While your app running on your physical device is still in the previous state, and won't know anything about the data changes being made on said virtual device until after a full sync is made.

WillowMobileSys

The server side of things is a black box to me. I have always thought of the server as having a single data source that all clients are operating against and then filtering occurs to send refreshed data back out to the clients through Syncs.

Have you confirmed your understanding with AppSheet? I accept your explanation of how the server side works because it makes sense with what I am seeing, but man, it's a HUGE overhead in attempts to support automation and keep all data synced to a final data source.

I wish we could get a clear picture of how the server side of things work. It would be a huge help in deciding how to organize our automated processes.

Marc_Dillon

>>" I have always thought of the server as having a single data source that all clients are operating against and then filtering occurs to send refreshed data back out to the clients through Syncs."

Appsheet servers don't hold the data. They temporarily load it from the actual data sources, on demand (i.e. on every sync), process it as necessary (like applying Security Filters, calculating VCs, combining partitions, etc), then send it to the device.

https://support.google.com/appsheet/answer/10108551?hl=en&ref_topic=10104486

But I'd say the server-side is not where your lack of understanding lies, for this particular case. It is more about how the app session on your physical/local device isn't able to immediately see the data that is added/changed by a webhook, because those changes are NOT executed by your local device.

>>"Have you confirmed your understanding with AppSheet? "

Not per se. It's just from my long time using the platform, reading the help articles over and over, experimenting a lot, and thinking about how the system would/should logically work.

Even someone with your similarly long experience with Appsheet may benefit from reading through some of the help articles again, such as the one posted above. There is a decent bit of info on them. Details that you probably glossed over on your first few reads because it didn't seem important at the time.

WillowMobileSys

"Appsheet servers don't hold the data."

This is not quite accurate. As mentioned in the article, the servers cache the data in memory. They just don't "persist" the data which is just a technical way of saying they don't move the data to disk storage - some servers might do this for efficiency since read/writing to a local disk is much faster than read/writing to a cloud service. Local disk becomes a security concern because data on disk opens up several avenues were a hacker could gain access to that data. It is much harder to get at the data while in memory.

It is the server side I am having issues with. My complaint all along is that when I add rows to a table in Step 1 of a Bot, those rows are NOT immediately available to Step 2 of the same Bot. This realization led me to an alternate solution but then I ran into the issue highlighted in my original post above.

But your explanations are starting to bring into focus what is actually happening on the server. Unfortunately, the highlighted article doesn't provide details of what IS truly going on in the server with regards to retrieval and usage of the data and how that applies to the various automated processes built in our apps. But I think I am starting to understand.

Marc_Dillon

>>"My complaint all along is that when I add rows to a table in Step 1 of a Bot, those rows are NOT immediately available to Step 2 of the same Bot."

They are available if you're using data change actions, because those operate on the local app session. They are not available if you're using webhooks to Appsheet API, because those don't operate on the local app session.

At the risk of repeating the same thing over and over again, here's an analogy. Forget about "the server". Pretend that instead of using a webhook to make the data changes, you yell over to your coworker in the other room "hey man, can you please add these rows to the app". He'll add those rows on the app on his own device, and at his own pace. Will those rows be immediately available to you on your device? Of course not, you need to first wait for him to be done, then sync your own app. "The server" is the same way, it's just another instance of the app, running in the cloud, that you're giving instructions to via webhooks.

WillowMobileSys

I do get what you are saying. Each API call is being treated as if it is coming from a different user and therefore gets its own little processing space.

greggillam

Hi @WillowMobileSys, were you ever able to figure out exactly why this was occurring? I'm having similar issues with a bot creating duplicate entries. I have an open post about it here.

WillowMobileSys

My issues with duplicates rows turns out to have been an issue with "timeout" errors when running the API call. API's can be configured with "retries". Mine was retrying 3 times. I changed the retry count to zero to prevent duplicated data. This doesn't eliminate the error, just prevents the step from re-processing and creating duplicate rows.

My bigger issue was that data from one API step was not available to the next API step. I solved this problem by breaking up the steps into separate Bots where the API step was dependent on a previous API step.

Bug? - API Steps producing duplicated results when supplied larger number of user input choices