Missing records on large BigQuery data with Security Filters

Hello everyone! 

Some context first:

I have a BigQuery data source (around repair jobs on properties) with about half a million rows that I would like to bring to AppSheet to allow for easier searching by non-technical users. I've been trying to design a security filter for this data in order to display only the most relevant rows, but after testing, I've noticed some irregularities. 

I've been using the User Settings table to allow my users to choose from existing values like this (all fields below are EnumLists with suggested values from the base table): 

Rares_Lungu_0-1698913617675.png

Whenever a user selects some values and presses Save, the Security Filters should update and bring the relevant data.

My Security Filter for the data source looks something like this: 

Rares_Lungu_1-1698872621401.png

I've been testing for the past couple of hours to see if this system works, and I've noticed that sometimes the data displayed on AppSheet is incomplete (I've run BigQuery queries in parallel to check my results).

For the address in the image above (address hidden for security purposes), for example, I can only see a single record in AppSheet, while there are three on the BigQuery table. 

Could this be a potential bug I've run into, or is my Security Filter breaking some rules? I know that OR operations are to be avoided with Security Filters, but I've tried to only use ANDs and still got the same problem.  

I really can't figure this one out, so any help would be much appreciated.

Thank you! 

 

1 18 672
18 REPLIES 18

Although I have no clues as to what might be happening right now, have you found any consistency to your missing data or it is purely random.

Hello @TeeSee1 , and thank you for your reply! 

I haven't been able to find any common patterns at this point. The problem occurs with all of the fields and the rows that get returned seem random.

One thing I am wondering is if AppSheet is extracting the entire table from BiqQuery pre-security filtering or just a subset of it.

Well, it seems the issue is not how you have set up your application but most likely a system bug. I suppose you need to raise a support ticket with Google support to get them to look at it.

Sorry I cannot be of any help.

A few things you may want to take a look at are

1. If you are testing it in the emulator, the user settings persist only till page is refreshed. Please take a look at the last section on Lifetime of the following article.

User Settings: The Essentials - AppSheet Help

At times,  during the testing , this aspect could be missed. 

2. The "property address" field seems to be having comma separated address components for a single address. If the enumlist is also composed by comma separated values, then each individual address may be mistaken as multiple elements of the enumlist by the expression. You could try with some other item separator than comma (",") for this field.

Edit: I quickly tested with a comma separated enumlist filed inwhich each individual value is also comma separated and still the IN() function worked. So the comma in individual field value does not seem to be an issue. 

3. If you are using  "Sheet connected BigQuery" , then the freshness of the data will be based on sync frequency of BigQuery to Google sheets as mentioned in the article below.

Use data from Google BigQuery with AppSheet - AppSheet Help

 

Hello @Suvrutt_Gurjar, and thank you for your suggestions!

Regarding your points:

1. I was aware of this, but thank you for the reminder. I have been keeping this in mind while testing.

2. I hoped that was the problem, but I have tested this as well, and I couldn't see any improvements. 

3. The AppSheet app is connected directly to BigQuery, so this shouldn't affect me. 

Again, thanks for the ideas; I really appreciate any help!  


@Rares_Lungu wrote:

(all fields below are EnumLists with suggested values from the base table): 


Thank you.  Could you elaborate above?  Could you share the expressions and where you are using those to construct the security filters fields' enumlists?

Sure thing! 

Because the base table is so big, I have instead created a couple of auxiliary tables with all the possible distinct values. Because of that, I can use something like this for the suggested values: 

Rares_Lungu_3-1698937733748.png

All the other fields have similar formulas. 

These auxiliary tables have, at most, a couple tens of thousands rows (less than 50k), so I don't think that would cause any problems.  

Aurelien
Google Developer Expert
Google Developer Expert

Hi @Rares_Lungu 


@Rares_Lungu wrote:

I have a BigQuery data source (around repair jobs on properties) with about half a million rows


 

In addition:

Limits on data size - AppSheet Help

Aurelien_0-1698922355271.png

Makes me think you may have reached it?

Also, for reference: Limits on tables per application - AppSheet Help

I would file a ticket with support to better investigate about it, I think the data size is at play here.

I would use a table partition for your use case.

Here is more information:

Approaches to data scalability - AppSheet Help

Scale using data partitions - AppSheet Help

Hello @Aurelien, and thank you for pitching in! 

Regarding the size limit, I was under the impression that the source data size is not as important as AppSheet will initially load the data on their servers, apply any filtering, and then cap the results to 100k rows (and, as you mentioned, 5 to 10 MBs).

In terms of the partitions, I am not sure how I would be able to deploy that as a solution for filtering the data. The fields I want to use for filtering have a couple thousand distinct values, so I don't believe this approach would be that feasible. 

I haven't raised a support ticket yet, but I believe that will be my next step. 

Thank you for the help, though! 

When using OR() as you have it, it's probably not pushed to BigQuery. At least it doesn't work with other databases like SQL, etc. Then it fetches all the data first to AppSheet's server before the security filter. If you have a huge amount of data, that could explain your behavior. When using IN(), databases don't undersand that expressions either, and that's why AppSheet needs to convert it before it's pushed to BigQuery. AppSheet converts it to a list of parameters.. and MS limits it to 2100.

So, if you remove the OR() and then use only one of your IN(), it should show you the correct result if the IN() brings less than 2100. Worth to test. You can check what is pushed from your app's Performance analyzer, It's called "DataFilter"

AleksiAlkio_0-1698930143990.png

 

I don't think this would work as AppSheet doesn't do 'smart' transpiles using the BigQuery, connector. At least not as far as the documentation says. Is there a way to trace what queries get sent to the database provider so we can debug?

That's what you can check from the Analyzer. The sample that I took is using IN() with the SF. Though I'm using SQL (Azure), but it should work with BigQuery as far as I have understood.

I have used the Analyzer as you suggested, and this is what I saw when using my original Security Filter (with a three-clause OR statement): 

Rares_Lungu_0-1698935460951.png

Interestingly enough, the filter didn't appear at all.

This was the log for the same table when I used a single field in the Security Filter: 

Rares_Lungu_1-1698935997951.png

This was the filter for the second test, a single IN() command:

Rares_Lungu_2-1698936081515.png

The values I used in the filter should 100% return some rows (based on BigQuery tests), but in both cases, no data was displayed on AppSheet.

I feel that AppSheet only gets a maximum of 100k rows, no matter what. 

So.. the single IN() is pushed to BigQuery as you can see it from the DataFilter, but it's probably not finding the row because you have more than 100k rows to read. Let me check if there is such a limit.

Hello @AleksiAlkio! I have seen this before as well. I thought that the way I was using the security filters should allow me to filter through the entire data (i.e., more than 100k rows). 

As per your test, it sounds it doesn't go through all rows, just the first 100k rows.. probably.

I encounter the same problem, although it is specified that the filter applies before retrieving the 100,000 rows.

https://support.google.com/appsheet/answer/11905680?hl=en&ref_topic=10102124&sjid=416630947393922023...

 

 

Top Labels in this Space