Improving performance by getting rid of unnecessary virtual columns

Hey guys,
I am trying to tweek out some performance by working through the details in the performance profile. I do not use many of the virtual columns created by appsheet automatically but if I delete them they are regenerated on save. Some of them take up a second or so performance time. Is there a way to disable them?
Another question, is there a way to quickly identify the virtual columns that are not used in your app?
Thanks
Hyman

Solved Solved
3 29 4,652
1 ACCEPTED SOLUTION

Hi @Hyman_van_Zyl it is very unlikely that the auto-generated virtual columns are where your sync time is going. However, you may think so when you look at the performance profiler, but that is a flaw in how the data is being presented. Let me explain. Letโ€™s say your app has 10 tables. Our server will only fetch a few of them in parallel (the exact number depends on your subscription and the current load, but letโ€™s say that number is 3). So off it goes to fetch tables 1, 2, and 3 in parallel. When any of them is fetched and ready, that thread goes off to fetch 4, and so on.
But now hereโ€™s the tricky part โ€ฆ what happens with virtual columns? The virtual columns for table 1 are computed right after table 1 is fetched. But letโ€™s say one of them requires table 8. So that has to be fetched too. In the performance logs, this will count towards the compute time for the virtual column. But in reality, table 8 had to be fetched anyway at some point during the sync. In fact, youโ€™ll see this recorded if you drill down into the virtual column computation in the profiler.
The virtual columns that take a long time to run are those that involve inefficient multi-level SELECT formulas. The โ€œreverse referenceโ€ formulas are usually really fast โ€” they just need one pass through the data. And we do this in our server during sync so that we donโ€™t have to do them on the devices (slower and more of a battery drain on a device)

View solution in original post

29 REPLIES 29

If you are using a Ref field and you donโ€™t change that column type, it will regenerate the virtual list column after saving.

Hey Aleksi,
Is there anyway to kind of disable that virtual list so it does not get processed during app sync?
Thanks

Iโ€™m not aware of such a workaround.

Ok, Thanks

Hi @Hyman_van_Zyl it is very unlikely that the auto-generated virtual columns are where your sync time is going. However, you may think so when you look at the performance profiler, but that is a flaw in how the data is being presented. Let me explain. Letโ€™s say your app has 10 tables. Our server will only fetch a few of them in parallel (the exact number depends on your subscription and the current load, but letโ€™s say that number is 3). So off it goes to fetch tables 1, 2, and 3 in parallel. When any of them is fetched and ready, that thread goes off to fetch 4, and so on.
But now hereโ€™s the tricky part โ€ฆ what happens with virtual columns? The virtual columns for table 1 are computed right after table 1 is fetched. But letโ€™s say one of them requires table 8. So that has to be fetched too. In the performance logs, this will count towards the compute time for the virtual column. But in reality, table 8 had to be fetched anyway at some point during the sync. In fact, youโ€™ll see this recorded if you drill down into the virtual column computation in the profiler.
The virtual columns that take a long time to run are those that involve inefficient multi-level SELECT formulas. The โ€œreverse referenceโ€ formulas are usually really fast โ€” they just need one pass through the data. And we do this in our server during sync so that we donโ€™t have to do them on the devices (slower and more of a battery drain on a device)

Is there advantage, if I donโ€™t need the REFROWS, in using the enum basetype REF maneuver?

Can you please describe, with an example, what you mean by โ€œmulti-level SELECT formulaโ€?

Itโ€™s a SELECT() formula inside of another SELECT() formula.

Good evening, I came to this thread looking for help for an app that abruptly started displaying an unusual sync time. I found your description of the "physiology" of syncing in Appsheet very insightful and helpful.

That description of the process made me ask myself some questions that I would like if it were possible for you to help me answer, please:

Question 1.

Do you know how Google Sheets "virtual columns" (that is, those columns whose values โ€‹โ€‹come from formulas written in google sheets) affect the performance of appsheet synchronization?

Question 2.

I am a more or less old user of Google Sheets combined with Google Apps Script. Now, with Appsheet, I still use Google Apps Script a lot to add information in the tables through triggers (either by time or every time a file changes, being that those changes are, for example, editing, adding or deleting a record from Appsheet). The question here is how does this influence the sync times? Is adding or modifying data in the appsheet app databases using Google Apps Script automations something that is considered not recommended from any point of view?

Question 3.

On the other hand, I am also a fervent user of Google Sheets queries, and I use them frequently to solve in google sheets, queries that in appsheet I still do not feel capable of solving, and less when I see everywhere that the use is discouraged from SELECT on the virtual columns. In other words, I take that virtuality and solve them in google sheets, thus generating and updating tables that are later part of the appsheet apps database, generally as read-only tables. The question here is if this behavior is somehow something that could slow down the appsheet sync speed. From already thank you very much.

Marcos. 

hello @pravse , I see I omitted to mention you in the previous message

 

In general, I would recommend not trying to hack the REFROWS thing. Just treat it as an essential column added by the system. In no performance investigation have I ever found it to be an actual cause of slow performance.

Deal.

Thanks Praveen,
Any suggestions where I can learn about ways to avoid inefficient multi-level SELECT formulas? Like what parameters determine if a formula becomes heavy? How do I know if my app is too slow? What is the recommended amount of tables? Those kind of things.
Thanks
Hyman

I guess we should publish some general rules of thumb (adding @Derek FYI).

Iโ€™d say, assuming a PRO plan, and painting in very broad strokes here (there are many differences between SQL and google sheets and excel and โ€ฆ):

  1. If you have N tables, assume N/3 seconds of baseline latency
  2. Take your largest table in terms of cells (rows * columns) and add 1 second for every 5000 cells
  3. In #2, If you have security filters, and they are simple (ANDs, equality or IN conditions) and you use a SQL database, then you count the rows that pass the security filters only โ€ฆ otherwise you count all rows
  4. Avoid virtual columns with a SELECT and anything but an equality or IN filter condition

So thatโ€™s a good rule of thumb. Syncs should be under 10 seconds. Maybe 15 max. Otherwise, you most likely have (a) too many tables โ€” break up your app into sub apps with a launcher, (b) too much data โ€” use security filters, ยฉ inefficient SELECTS in the security filters, or (d) inefficient SELECTS in the virtual columns.

So me always trying to get them sub 5sec is sillyโ€ฆ
LOL

@Grant_Stead it is possible to be sub 5 second, but when using sheets, there is latency imposed by Google (or Dropbox or whoever) that isnโ€™t necessarily related to the number of rows.

@praveen

How do complex UX formatting rules factor into sync times?

Thanks for your input on this topic by the way!

@praveen
These are very valuable tips. Just to be clear on point 4, so equality and IN conditions are ok and any other kind of condition should be avoided as far as possible?
Does it affect sync time if you make a table read only?
Thanks lots
Hyman

UX formatting rules have no impact on sync time but have tremendous impact on display/rendering time. We render rows dynamically โ€“ as you scroll up and down, values are shown or not. Now for every one of them, formatting rules have to be run, and if they involve complicated expressions, things can get really laggy

You can do AND([Col1] = {something that evauates to a constant}, IN([Col2], {something that evaluates to a list}), etc).

Also, a very common pattern is to use a simple filter [Email] = UserEmail on TableA and then to use TableA itself as a filter on TableB like this in the security filter of TableB IN([ARef], TableA[Key])

For example, if you have WorkOrders and WorkOrderDetails, you can filter WorkOrders based on the Email of the assigned user and filter WorkOrderDetails based on the WorkOrder ids.

So a AND([Col1]=[THISROW].[Col2],โ€ฆ) would be ok?

So make a slice of TableA which you filter and then use that slice in Table B?

Does it help to make tables read only where possible?

Thanks

@praveen This would a be good article indeed to open the backend.

I agree @Aleksi

In terms of Appsheet performance, its a bit like a black box model at the moment. We have to create stuff and see what the results are

Skip2MiLu
Participant V

Please consider dedicating a office hours and / or short YouTube video tutorials dedicated to performance concepts and examples (eg. App examples with inefficient setup and how they can be modified to be more efficient)

aucforum
Participant V

Are the following are remarks about using multiple SELECT (in particular its variations) in a statement/formula correct in this context?

> Beware of LOOKUP() This is a compact equivalent of ANY(SELECT(Customers[Phone], [Name] = "John Doe")) where the filter is an equality condition.

> Expressions in format rules should be simple โ€” avoid SELECT or its derivatives (like FILTER and MAXROW)

The use of any of the following in a virtual column app formula or in a format rule can hurt performance in direct proportion to your data set size (the more data, the more performance suffers):





All of them use SELECT() behind the scenes.

To be clear, SELECT() and all these variants of SELECT() are useful and thatโ€™s why they exist. However, some SELECT() functions are efficient and some are not. This depends on the filter condition being used. Equality, IN(), and AND() of these are efficient filter conditions. Inequality and OR() functions lead to inefficient filter functions.
MAXROW() and MINROW() are also inefficient.

I tend to use MAXROW() where the database is a spreadsheet as I cannot reliably enforce a 1 to 1 table relationship (where there a no parent tables above it) Is that a problem where there could be hundreds or thousands of records?

That could definitely be a problem!

Realistically, if your MAXROW() (or any SELECT() variant) would be expected to always produce the same result (for instance, using MAXROW() to find the preceding row), youโ€™re much better off making it a normal (non-virtual) column to calculate it just once and reuse the result.

My specific recommendation would be a normal column with the expression as the app formula (yes, an app formula for a non-virtual column). This will cause the rowโ€™s column value to be recomputed any time that row is updated in any way, but not every time the app syncs. This way, if some sort of corruption sneaks in, thereโ€™s simple way to recalc and fix it on a case-by-case basis, rather than the massive, mostly-pointless effort of recomputing such virtual columns so frequently.

But, to stress, this recommendation is only for columns that need to be calculated, but only once or very infrequently under specific circumstances.

Top Labels in this Space