Solved: Selecting text to the right of text in a string

Lukevancleave · 04-28-2022 08:04 AM

Hi, I'm trying to use a column with a string created from an OCR scan of an image to populate other columns in a form. So far my string looks something like this

Order# 0068348 BOL# XXXXX o From: DOWAGIAC, MI 49047 To: IRON RIDGE, WI 53035 call Ship date: April 28, 2022, 0001 - 2359 Delivery date: April 29, 2022, 0001 - 2359 Carrier Instructions: , Loaded miles: 239, Empty miles: 167, leave one copy of the BOL DO NOT BOBTAIL INTO THIS LOCATION NO REEFER TRAILERS AT THIS LOCATION MUST BRING A DRY VAN

And I am trying to extract something like the "239" loaded miles. So I'm trying to make a virtual column that says find "Loaded miles: " and then return the number after it. Are there any text expressions I can use to do that? Thanks

Rafael_ANEIC-PY

Hello @Lukevancleave , try with this expression:

MID([stringColumn],
FIND(" Loaded miles: ",[stringColumn])+LEN(" Loaded miles: "),
FIND(", Empty miles:",[stringColumn])-FIND(" Loaded miles: ",[stringColumn])-LEN(" Loaded miles: ")
)

View solution in original post

Suvrutt_Gurjar

Nice one @Rafael_ANEIC-PY

@Lukevancleave : You could also try the following expression for loaded miles if you are not likely to have mentions (ie @) in your OCR string and your OCR pattern is standard. Please test well.

SUBSTITUTE(EXTRACTMENTIONS(SUBSTITUTE([OCRString],"Loaded miles: ","@")),"@","")

And this one for empty miles:

SUBSTITUTE(EXTRACTMENTIONS(SUBSTITUTE([OCRString],"Empty miles: ","@")),"@","")

https://help.appsheet.com/en/articles/3497820-extractmentions

View solution in original post

Suvrutt_Gurjar

Adding one more possibility , again with a EXTARCT() variant and with a condition.

One could use EXTRACTNUMBERS()

For loaded miles:

EXTRACTNUMBERS(MID([OCRString], FIND("Loaded miles: ",[OCRString])+14,6))

I believe the above will work well for all conditions.

For Empty miles, with two conditions:

If empty miles are not going to be greater then 3 digits as mentioned, and if the empty miles will always be followed by a text message, then one could use

EXTRACTNUMBERS(MID([OcrString], FIND("Empty miles: ",[OCRString])+13, 3))

These may or may not work in current post's requirement. Just thought of adding above in case someone else finds use cases for using EXTRAXT() functions for extracting part of strings.

View solution in original post

Rafael_ANEIC-PY

Hello @Lukevancleave , try with this expression:

MID([stringColumn],
FIND(" Loaded miles: ",[stringColumn])+LEN(" Loaded miles: "),
FIND(", Empty miles:",[stringColumn])-FIND(" Loaded miles: ",[stringColumn])-LEN(" Loaded miles: ")
)

Lukevancleave

That is exactly what I was looking for! But my next conundrum is this: How would I do the same thing for the numbers after "Empty Miles:" since the string after that is not always consistent.

*Edit* I'm thinking of just having it start at the first character over from "Empty Miles: " and then checking if it's a comma and including if it is not the comma.

**EDIT EDIT**

This seems to work okay for my use case! Since empty miles are never more than three digits, they will never have a comma in the string of numbers. So I just have it check if that number of characters includes a comma and IFS() my way back until it gets the number

IFS(
Not(Contains(MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
4),","))
,
MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
4)
,
Not(Contains(MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
3),","))
,
MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
3)
,
Not(Contains(MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
2),","))
,
MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
2)
,
Not(Contains(MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
1),","))
,
MID([Trip Page Text],
FIND(" Empty miles: ",[Trip Page Text])+LEN(" Empty miles: "),
1)
)

Suvrutt_Gurjar

Nice one @Rafael_ANEIC-PY

@Lukevancleave : You could also try the following expression for loaded miles if you are not likely to have mentions (ie @) in your OCR string and your OCR pattern is standard. Please test well.

SUBSTITUTE(EXTRACTMENTIONS(SUBSTITUTE([OCRString],"Loaded miles: ","@")),"@","")

And this one for empty miles:

SUBSTITUTE(EXTRACTMENTIONS(SUBSTITUTE([OCRString],"Empty miles: ","@")),"@","")

https://help.appsheet.com/en/articles/3497820-extractmentions

Lukevancleave

In this specific example, that does yield the correct results for loaded and empty too! Thank you! Now, it's rare but sometimes the character "@" may get included along with the notes at the end of the OCR string like part of an email or that someone may have added as a shorthand for "at" like for an appointment time. would that throw this expression off then?

Suvrutt_Gurjar

Yes, if there are going to be more mentions , it could fail. However you could use EXTRACTPRICES() instead, if $ symbols are not going to be there in the string.

SUBSTITUTE(EXTRACTPRICES(SUBSTITUTE([OCRString],"Loaded miles: ","$")),"$","")

Or else please continue with what solution you and @Rafael_ANEIC-PY have got. Those are nice solutions.

If I can think of any more alternative, apart from EXTRACT() functions variants, I will post tomorrow.

Suvrutt_Gurjar

Adding one more possibility , again with a EXTARCT() variant and with a condition.

One could use EXTRACTNUMBERS()

For loaded miles:

EXTRACTNUMBERS(MID([OCRString], FIND("Loaded miles: ",[OCRString])+14,6))

I believe the above will work well for all conditions.

For Empty miles, with two conditions:

If empty miles are not going to be greater then 3 digits as mentioned, and if the empty miles will always be followed by a text message, then one could use

EXTRACTNUMBERS(MID([OcrString], FIND("Empty miles: ",[OCRString])+13, 3))

These may or may not work in current post's requirement. Just thought of adding above in case someone else finds use cases for using EXTRAXT() functions for extracting part of strings.

Lukevancleave

That’s good info, happy to be learning about this. However the empty miles are also sometimes only 1 or 2 digits and there is not always text afterwards. There is always a comma at the end of the empty miles though.

Suvrutt_Gurjar

You are welcome. The suggested expression will work for 1 to 3 digits well.

Thank you for mentioning that there will always be a comma. If there will also always be a space after comma as highlighted by underscore before the next string starts "Empty miles: 167,_leave one" and if empty miles will always be at least denoted by 0, then I believe the suggested expression expression will work for Empty miles for all other conditions as well.