CONTAINS() v/s FIND() in terms of speed.

I wonder which one outputs/finishes its result faster, for example

NOT(CONTAINS("He only needs two more cards to complete the set.", "complete"))

*outputs a logical value

v/s

FIND("complete", "He only needs two more cards to complete the set.") = 0

*outputs a logical value

0 24 896
24 REPLIES 24

The difference in speed with which these two operate is negligible.  Both need to traverse the text string to find the first occurrence of the search string.

But these two have slightly different uses. 

  • CONTAINS() simply reports the text is there or not.   
  • FIND() tells you the position of that search string so that you can take another action - such as LEFT(), RIGHT(), MID() - to extract out a portion of the source text.

Thank you @WillowMobileSys for pointing out the slightly different uses that more than slightly enhances understanding of which to use for efficient applications.

Each expression has its own reason to exist, most of them are provided as building block and used in composition with the others providing the desired outputs.

However, they sometimes could be used interchangeably. This's why I came up with the question of speed because each expression has different algorithm and costs differently (in com-sci language). The performance may look slightly different in tiny-toy apps but would not be the case in huge apps packed with huge number of rows.

As you say, each expression has its own reason to exist, as @WillowMobileSys pointed out and differentiated. As you are trying to find if a string is contained in another string and not where it's located in that string the CONTAINS() operator would be more appropriate as it returns logical(true/false) value you need. Using the find operator would require determining whether the string contains the substring but would then have to calculate the position of the substring within the string and then the additional operation of converting a number to a logical which is unnecessary additional overhead. As for which would be faster, I am pretty sure that the function that returns the true false value directly instead of returning a number and then converting that number to a true false value would probably be faster.

Interesting reason, thanks.

IMO, the NOT(CONTAINS(...)) would be faster than FIND(...) = 0 because the NOT() would just replace the input value but the "=" would do a binary word comparison.

They both will perform exactly the same - speed "negligible".   The most expensive part of these operations BY FAR, is finding, or not finding, the substring in the source text. 

To do that, both operations MUST test each character starting from the beginning of the source text and marching down the list of characters one-by-one testing each time to see if the search string is satisfied.

In the case of the search string NOT being present,  the entire source string will be tested - in both functions.  The extra step of testing "= 0"  or flipping FALSE to TRUE in the NOT situation, adds minuscule time  to the overall run time of the functions.   

They both perform the same.

Thanks for coming by but it does not answer the question. I don't care the term 'negligible' which yourself as a user cannot prove it. Also, I know what the expressions do things for me.

I need some more legit info.

Seriously?

Test it yourself if you don't trust what I am saying.  You likely won't find any "legit" details, whatever that means,  anywhere or from anyone else with regards to those two functions.  No one will have tested the speed,  It's just not worth the effort.

I DO KNOW how those functions work  and confident in my response.  I say "negligible" because if you were to perform timing tests you will NEVER get the same exact speed readings - even for the same function over the same exact data.  There are too many environment variables to say they run at exactly the same speed.

If you want the answer in terms of Big "O" analysis (look it up) - they are exactly the same!!

@WillowMobileSys  just this: ๐Ÿ‘

 

@Swoopy you may want to give it a try by yourself with this tool:

Introducing Virtual Column calculation time in Per... - Google Cloud Community

Would you mind keeping up posted if you find a meaningful difference? Thank you

I hope you would know that the calculation results from the mentioned link is not constant because of many factors involved. I just need technical info from engineer team (if possible/available).

Don't mind keep looking up post if they come by for the technical info.

In short & respect your concern, I just need technical info from engineer team (if possible/available).

Steve
Platinum 4
Platinum 4

CONTAINS() is case-insensitive, FIND() is case-sensitive.

Thanks for coming by.

That's another fair reason the NOT(CONTAINS()) would process faster than FIND() = 0

Ummm... No. Case-sensitive comparison (char1 = char2) is faster than case-insensitive (upper(char1) = upper(char2)). I would expect FIND() to be faster than CONTAINS().

Ahh...right, after rethought your expectation would be correct, the CONTAIN() would somehow/somewhere embed the UPPER() or LOWER() in its algorithm, which would make it needs considerably more processing time.

Now comes to NOT(CONTAINS(...)) and FIND(...) = 0, my thought is that NOT() would be faster than "=" operator. After all there would be some compensation between these 2 expressions but I personally prefer FIND(...) = 0 when working with long-text.

Now that we are on this:

True vs 0=0
Isblank() vs =""
Isblank(List) vs Count(List)=0
Isnotblank() vs Not(Isblank())
Filter(Table) vs Select(Table[Id])

...to be honest, I don't care, but if this topic is important for some, I don't think these wouldn't.

Btw @Swoopy, this is a community, so expect community-level discussion. If you want a technical answer, write to support and wait for as long as you can, send a PM to someone on the team or be patient and accept comments that could not fit on your ideals

Geez guys the person asked a very simple and straight fwd question for valid reasons and it is simple in that if you do not know then just say so. 

So when a comment like "this is a community.... community level discussion" is it referring to the support personnel expectations or the community members' expectations and such assumptions may be construed as somewhat offensive and demeaning.

There are also, in addition,  the whole series of EXTRACT() functions to consider and the question is why the differences? Is it just no-code convenience or speed?

As APPSHEET code is designed to run on multiple microprocessor-based platforms one presumes it cannot be optimized as is done on say DSP platforms where the difference between assembler and c-code is very marginal.

0.0001 scaled by 100000 is 10 secs so one can imagine.

For example, try Google to find out what programming language APPSHEET is written in?

I presume AS is written for no-code comfort rather than speed.

Does AS make use of any onboard graphics card for faster processing and is both interesting and valid for both techies and "community" members.

Please note I am just chipping in and not seeking any answer or debate.

I end with a quote from Praveen

The smartest move weโ€™ve made so far: โ€œListening to customers. Not just what theyโ€™re saying, but what their true need is.โ€

The biggest mistake weโ€™ve made so far: โ€œIf weโ€™d listened to customers better, weโ€™d have pivoted earlier.โ€

Capture.PNG

I've never judged anyone's post for what level it's on or if it matters for who or what. If I don't care the post, I rather just ignore it, it's just that simple.

Many questions have never been answered or solved. Then what level those posts are in? Are they worthless to be answered? or they may be too much technical/difficult for @SkrOYC

This community also includes AppSheet staffs that they may come by and join for any kinds of question as long as they relate to AppSheet. This's why I posted this here for any replies from whom willing to share their knowledge or idea.

Any ideas regarding the question are always welcome.

Steve
Platinum 4
Platinum 4

@Swoopy wrote:

Now comes to NOT(CONTAINS(...)) and FIND(...) = 0, my thought is that NOT() would be faster than "=" operator. After all there would be some compensation between these 2 expressions but I personally prefer FIND(...) = 0 when working with long-text.


The cost of NOT(...) vs (... = 0) is likely to be comparable and negligible, and therefore insignificant in relation to the cost of .... The significant cost is ....


@SkrOYC wrote:

True vs 0=0


TRUE is better because no operation is involved.


@SkrOYC wrote:

Isblank() vs =""

Isblank(List) vs Count(List)=0

 Likely equivalent.


@SkrOYC wrote:

Isnotblank() vs Not(Isblank())

Filter(Table) vs Select(Table[Id])

Are in fact equivalent. ISNOTBLANK() and FILTER() are macros.


@gregdiana1 wrote:

There are also, in addition,  the whole series of EXTRACT() functions to consider and the question is why the differences? Is it just no-code convenience or speed?


Meaning why both EXTRACT("X", ...) and EXTRACTX(...)? I've wondered that myself. My guess is just for convenience. I suspect EXTRACTX(...) is just a macro for EXTRACT("X", ...).


@gregdiana1 wrote:

As APPSHEET code is designed to run on multiple microprocessor-based platforms


To my knowledge, there is no native code in AppSheet--it is written entirely in interpreted or byte-code-compiled languages. At the very least, I believe there is very little native code. But I may be wrong.


@gregdiana1 wrote:

0.0001 scaled by 100000 is 10 secs so one can imagine.


This is exactly why virtual columns are a prime cause of performance problems.


@gregdiana1 wrote:

For example, try Google to find out what programming language APPSHEET is written in?


As is true for most large projects, several languages are used.


@gregdiana1 wrote:

I presume AS is written for no-code comfort rather than speed.


I wouldn't make that assumption, myself.


@gregdiana1 wrote:

Does AS make use of any onboard graphics card for faster processing and is both interesting and valid for both techies and "community" members.


I don't know, but I strongly doubt it.


@gregdiana1 wrote:

I end with a quote from Praveen


From the pre-Google day, I believe. ๐Ÿ˜ž

Thank you so much for this precious information. I'll surely come back to this post again and again in future when design or tweak of new expressions is required.

I've an app packed with 10+ tables, 30+ slices and max 15,000+ rows in the bigest table. There are text-verifying expressions required in many places (virtual columns and slices). I found, in average based on real testing, that the FIND(...)=0 gives about 19-22 sec in sync time while NOT(CONTAINS(...)) gives 25-35 sec. Note that the test was done on the same 5 years old Android phone (I always test my apps on old phones because it's easier to perceive the best tweak).

If the difference of the 2 functions is behind the results, it's not small gap for me and may be for others also. Hopefully it'd be useful for anyone at anytime.


@Swoopy wrote:

I've never judged anyone's post for what level it's on or if it matters for who or what. If I don't care the post, I rather just ignore it, it's just that simple.

Many questions have never been answered or solved. Then what level those posts are in? Are they worthless to be answered? or they may be too much technical/difficult for @SkrOYC


I was just trying to make a point on the fact that @WillowMobileSys's answer was very valid and packed with experience but for no obvious reason you just came up with this:


@Swoopy wrote:

Thanks for coming by but it does not answer the question. I don't care the term 'negligible' which yourself as a user cannot prove it. Also, I know what the expressions do things for me.

I need some more legit info


That's why I told you that you would be better using other ways to get "more legit info" rather than asking the wide variety of users on this community for opinions and then throwing those opinions to the garbage.

Sadly AppSheet's backend is a black box, as many experienced users have mentioned before.
Even if we want to help you with your question we can only give you our experiences and opinions, that's all. The best source for answers will be common sense (after understanding the core ways in which the platform works) plus testing.

Just take a look at this example:


@Steve wrote:

CONTAINS() is case-insensitive, FIND() is case-sensitive.



@Swoopy wrote:

That's another fair reason the NOT(CONTAINS()) would process faster than FIND() = 0



@Steve wrote:

I would expect FIND() to be faster than CONTAINS().



@Swoopy wrote:

my thought is that NOT() would be faster than "=" operator.


Then:


@Swoopy wrote:

I found, in average based on real testing, that the FIND(...)=0 gives about 19-22 sec in sync time while NOT(CONTAINS(...)) gives 25-35 sec.


There you have it, the answer was always "do testing to check" but even then we have some variables to consider that might affect the result.

I just want this place to be an open area to have discussions without disqualifications between users, I guess things sound different based on how we read them or it's just a language barrier but no matter how much I try to give help to you @Swoopy, most of the times it feels you think your needs are on a higher standard than others and I get the lesson so I'l step aside from now on


@WillowMobileSys wrote:

Test it yourself if you don't trust what I am saying.  You likely won't find any "legit" details, whatever that means,  anywhere or from anyone else with regards to those two functions.  No one will have tested the speed,  It's just not worth the effort.

I DO KNOW how those functions work  and confident in my response.  I say "negligible" because if you were to perform timing tests you will NEVER get the same exact speed readings - even for the same function over the same exact data.  There are too many environment variables to say they run at exactly the same speed.


๐Ÿ’ฏ


@SkrOYC wrote:

most of the times it feels you think your needs are on a higher standard than others and I get the lesson so I'l step aside from now on


Fine !

Firstly @Steve thanks for responding although it was not necessary.

And as has become clear @Swoopy  had already conducted the tests and was simply asking why, how, or where the noted time differences he has seen arise!

Although technical in nature I believe such detail could be valuable to no-code community-based members.

AS community support has been the crux of AS success as has been the community feedback to make it a valuable tool without which Praveen and the AS team would never be able to progress so far and sell it off to Google (which was the plan all along as investors do not do charity). So AS under google will only p[rogress if it satisfies its user community and as we know Google can pull the plug as and when it feels and it does as history has shown if it no longer serves shareholder (not community) interest. It is not personal just a fact of how Google works and why it made up its own attempts and bought AS.

 

Top Labels in this Space