Conditions of nested SELECTs

Hello,

I have a scenario, where I would like to select a set of records based on a certain number of conditions  AND also if the sum of a certain field in the SELECTED rows exceeds a certain threshold. For example, customers with a certain gross sales amount during a specific period.

I did this:

SELECT(theTable[customer],
  AND(
    [condition-1],
    [condition-2],
    [condition-3], 
    SUM( SELECT(theTable[sale], 
      [customer] = [_ThisRow-1].[customer]
    )) > [_ThisRow].[threshold]
  ), true
)

This didn't give correct result since the inner SELECT operated on all rows ignoring the preceding the outer SELECT's conditions, while I expected it to only work on the rows already selected by the outer SELECT, as SQL maybe? especially with the presence of the [_ThisRow-1] clue.

So I did:

SELECT(theTable[customer],
  AND(
    [condition-1],
    [condition-2],
    [condition-3], 
    SUM( SELECT(theTable[sale], AND(
      [customer] = [_ThisRow-1].[customer],
      [condition-1],
      [condition-2],
      [condition-3] 
    ))) > [_ThisRow].[threshold]
  ), true
)

This gave correct results as expected but the expression itself is very slow to execute. It is in a normal column, and just opening a form with line calculated takes over a minute. If the table has 1000 rows, ignoring the outer SELECT's conditions would make the calculation run 1 million times. 

At the end I ditched it and built a specific table to do selection and referencing before hand. 

Would you please share your thoughts? Would you expect the same behaviour? Do you think there are better ways/improvements? Thanks!

Solved Solved
0 8 451
2 ACCEPTED SOLUTIONS

So you are a bit stuck with how we have to write formulas with no option to store a value.  A simplier form of the problem is 

IF(
[A]+[B]+[C]>=7,
7,
[A]+[B]+[C]
)

Ideally you want to do [A]+[B]+[C], save that as a variable, then use the variable in the formula.  But lets not complain about what we don't have... 

The best I can come up with for you is to use a slice to split it up a bit.  Create a slice of theTable with

AND(
[condition-1],
[condition-2],
[condition-3]
)

Then change your formula to:

SELECT(SLICE[customer],
SUM(SELECT(SLICE[sale],[customer]=[_ThisRow-1].[customer]))>[_ThisRow].[threshold]
)

So the slice will run on sync.  But like a variable, this happens only once and hopfully will be more efficent. 

The only other addition is that I presume that with SUM(SELECT(SLICE[sale],[customer]=[_ThisRow-1].[customer])) you're adding up all the sales for each customer.  Do you not have a seperate customer table you could do the sum on once?  Or of this is simply a list of sale items, then consider using an action that calculates this on save.  That way the forumla could be to find the last updated record for this customer and then extract [sale].

Simon, 1minManager.com 

View solution in original post

Aurelien
Google Developer Expert
Google Developer Expert

Hi @Joseph_Seddik 

From my point of view, your calculation runs twice on a huge dataset and it would be great being able to reduce it.

Just some thought to share for help, is it possible to restrict with an intermediate virtual column the dataset to evaluate, and run the calculations over it?

For example:

1) let's name [_restrictedTheTable]

 

FILTER("theTable",
  AND(
    [condition-1],
    [condition-2],
    [condition-3]
  )
)

 

2) Then, I would go with:

SELECT(theTable[customer],
  AND(
   IN([id],[_THISROW].[restrictedTable]),
    SUM( SELECT(theTable[sale], 
      AND(
        [customer] = [_ThisRow-1].[customer],
        IN([id],[_THISROW].[restrictedTable])
      )
    )) > [_ThisRow].[threshold]
  ), true
)

This is a minor change, I'm not sure that would be efficient.

However, using the Reference properties, it may be usefeul to use a set of actions as described by @1minManager to flag relations between your customer table and your selection table.

For example, you could create a new column [_restrictedCustomer], type List of Ref with the expression:

[restrictedTables][customer]

Doing this could be used to track the rows of table that matches your selection criteria and establish a relationship with the sales spent with your customer, regarding the threshold.

I don't know if there is a kind of SELECT([Related XXX], condition) that exists, but I wish it was for your case! (I just tried it in case...but nope)

View solution in original post

8 REPLIES 8

So you are a bit stuck with how we have to write formulas with no option to store a value.  A simplier form of the problem is 

IF(
[A]+[B]+[C]>=7,
7,
[A]+[B]+[C]
)

Ideally you want to do [A]+[B]+[C], save that as a variable, then use the variable in the formula.  But lets not complain about what we don't have... 

The best I can come up with for you is to use a slice to split it up a bit.  Create a slice of theTable with

AND(
[condition-1],
[condition-2],
[condition-3]
)

Then change your formula to:

SELECT(SLICE[customer],
SUM(SELECT(SLICE[sale],[customer]=[_ThisRow-1].[customer]))>[_ThisRow].[threshold]
)

So the slice will run on sync.  But like a variable, this happens only once and hopfully will be more efficent. 

The only other addition is that I presume that with SUM(SELECT(SLICE[sale],[customer]=[_ThisRow-1].[customer])) you're adding up all the sales for each customer.  Do you not have a seperate customer table you could do the sum on once?  Or of this is simply a list of sale items, then consider using an action that calculates this on save.  That way the forumla could be to find the last updated record for this customer and then extract [sale].

Simon, 1minManager.com 

Hi Simon,

Thanks for taking the time to reply. I do have a customers table, naturally, but it is not sufficient. Conditions are dynamic so slices unfortunately are not a viable option either. 

This is where I'm doing the selects (on another sales table):

Key Period Threshold Condition1 Condition2 Condition3 Customer List
1            
2            
3            

The last column in red is where the calculation is made. The table itself is growing and all conditions can change from one row to another, also the period and the threshold. The conditions are basically some calculated or manually-entered business-specific metrics and are more than 3.

The calculated "Customer List" column selects a list of customers from a sales table. This selected list are those who, each, has accumulated a sum of sales exceeding the row's Threshold during the row's Period while also fulfilling all the row's other Conditions.

 

Aurelien
Google Developer Expert
Google Developer Expert

Hi @Joseph_Seddik 

From my point of view, your calculation runs twice on a huge dataset and it would be great being able to reduce it.

Just some thought to share for help, is it possible to restrict with an intermediate virtual column the dataset to evaluate, and run the calculations over it?

For example:

1) let's name [_restrictedTheTable]

 

FILTER("theTable",
  AND(
    [condition-1],
    [condition-2],
    [condition-3]
  )
)

 

2) Then, I would go with:

SELECT(theTable[customer],
  AND(
   IN([id],[_THISROW].[restrictedTable]),
    SUM( SELECT(theTable[sale], 
      AND(
        [customer] = [_ThisRow-1].[customer],
        IN([id],[_THISROW].[restrictedTable])
      )
    )) > [_ThisRow].[threshold]
  ), true
)

This is a minor change, I'm not sure that would be efficient.

However, using the Reference properties, it may be usefeul to use a set of actions as described by @1minManager to flag relations between your customer table and your selection table.

For example, you could create a new column [_restrictedCustomer], type List of Ref with the expression:

[restrictedTables][customer]

Doing this could be used to track the rows of table that matches your selection criteria and establish a relationship with the sales spent with your customer, regarding the threshold.

I don't know if there is a kind of SELECT([Related XXX], condition) that exists, but I wish it was for your case! (I just tried it in case...but nope)

@Aurelien @1minManager 

Thank you both for sharing your thoughts. I did the following: 

  1. I added a single column Linking table with a descriptive calculated key like:
      period1-criteria1
      period1-criteria2
      period1-criteria3
      period2-criteria1
      period2-criteria2
    ... and so forth
  2. I added descriptive keys as well to the table where I needed to do the large calculations with nested SELECTs. I also added a virtual Ref column in the Linking table to this table. 
  3. In the large sales table I added two additional Ref column, one to the newly added Linking table and another to the table where I'm doing the calculations. These two columns formulate a corresponding key based on the data in each row. They are normal columns so they get updated only when the row's data change. 
  4. I also added an automation to add rows to the Linking table as needed.
  5. At the end in the table where I need the calculations, I replaced most of them with simple math on dereferenced values or very simple SELECTs with most of the values coming from dereferencing. 

------

  • We have: <start:[keys]>> but we don't have SELECT([keys]). Unfortunately the former is available only within templates, and I see no reason for not having an equivalent in column formulas.
  •  Offloading and partitioning calculations among different tables worked very well. However, I'm still surprised that an inner SELECTs with [_ThisRow-1] operates on the whole table ignoring the context of the enclosing SELECT. To me, this was a very unexpected behavior!

Hi @Joseph_Seddik 

Thank you for sharing your solution. This must have been a immense mental gymnastics!

Does the result match your expectation regarding calculation time on form opening?

And, just curious: do you see the improvement on the manage/monitor/performance profile/Performance analyzer or Audit History feature of the app editor?

I'm thinking about this graph:

Aurelien_0-1691166302184.png

 

 

Merci à toi mon ami !


@Aurelien wrote:

This must have been a immense mental gymnastics!


I have been using descriptive keys and reverse referencing very frequently for a while now. I was just perplexed for a moment by the strange SELECT() behavior and was trying to figure it out. 

Regarding performance, here I'm only using VCs for simple text concatenations to formulate the the descriptive-key references I need, nothing more. Besides, the problem I was having was with the time of calculation of the normal columns. When all columns are set, clicking Edit to open a form view in the calculated table (thus forcing them to calculate) took comfortably over 2 minutes !! 

Due to the large dataset I'm surveilling the performance monitor all the time. I haven't had any issue or any noticeable impact due the reverse referencing columns, and I have a lot of them all over the app. 

 


@Joseph_Seddik wrote:

When all columns are set, clicking Edit to open a form view in the calculated table (thus forcing them to calculate) took comfortably over 2 minutes !! 


Your app be like: 🤣


@Joseph_Seddik wrote:

I haven't had any issue or any noticeable impact due the reverse referencing columns, and I have a lot of them all over the app. 


 

Perfect then! Good job again for the trick!

Exactly 😂

Top Labels in this Space