A couple of questions about ML prediction

Derek_N · 07-13-2020 07:40 PM

@tony (I hope it is OK to ping you directly - you seem to be the ML guy?)

1: Does training continue if I log out? I uploaded the Iris dataset (150 rows, 4 columns) an hour or so ago and it is still training. I can imagine that a larger dataset would take many days (depending on many variables, of course).

2: Is there a public doc describing training optimization? E.g. one that shows the options and benefits for feature scaling/normalization?

3: Would you describe the algorithm under the hood as production-ready ML? Or is it more of a ‘novelty’ implementation demonstrating how other Google Cloud services can be integrated with AppSheet?

4: I’m excited by the possibilities that ‘democratized’ ML apps can bring to businesses - especially integration with other Cloud ML services in the future. Would you say that the current implementation is ‘fast enough’ for training? Or would it be better for a dev to train their own models on another (faster?) platform and then go down the traditional coded-app path?

5: I’m an ML newbie so I don’t have any specific use cases in mind. I imagine that once I get my head around the tech I will be seeing opportunities everywhere.

Thanks and cheers

tony1

Hi @Derek_N.

1: Training should be relatively quick (a minute or two at most). The fact that the Iris dataset has not finished training suggests that there’s a bug. Could you create an isolated app that reproduces the issue and send the details to support@appsheet.com? We can take a look.
2: The platform takes care of scaling and normalization. Right now, we keep the details of the training algorithm and model hidden from the app creator. We’re considering future designs that include the ability to compare model alternatives, so your feedback is welcome.
3: Right now the performance of the system is below that of Google AutoML Tables, but we’re investing more in the ML features, so improvements should be coming.
4: Unless you’re dealing with truly big data, training time won’t be the bottleneck. The biggest bottleneck is developer time, which we hope to reduce by providing no-code features for ML.
5: Please share any ideas or questions you have with the community. It’s very helpful for us to learn about the use cases and scenarios that you’re interested in.
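
For readers wondering what the "scaling and normalization" mentioned above usually means: the sketch below shows two common techniques, z-score standardization and min-max scaling, in plain Python. This is only an illustration under stated assumptions. The thread does not say which method AppSheet uses, and the function names and sample values here are made up for the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative measurements only (not taken from the actual Iris file).
sample_column = [5.1, 4.9, 6.3, 5.8, 7.1]

standardized = standardize(sample_column)
rescaled = min_max_scale(sample_column)
```

After standardization the column has mean 0 and standard deviation 1. After min-max scaling every value lies between 0 and 1. Either way, features measured in different units end up on a comparable scale before training.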

Derek_N

Thanks Tony.

1: I have reproduced the issue and have sent an email to support with the app url.

2: Roger. May I suggest making the model feature importance visible as well? As I understand it (very, very poorly), model and local feature importance are analogous to ‘multivariate principal components analysis’, and would also allow the developer to engineer/verify that the ‘design of experiment’ / model features are necessary and sufficient?

3: Thanks for the AutoML Tables heads up. Very helpful. I’m guessing the models are currently assigned based on the type of the ‘column to predict’, e.g.:

Yes/No - Logistic Regression
Enum - Logistic - Multiclass
Ref - Logistic - Multiclass (is this correct? Not sure what Ref means)

Price - Linear Regression
Decimal - Linear Regression
Number - Linear Regression

4: Roger.

5: Will do.

Cheers

tony1

2: Feature importance is shown in the editor after training.
3: Three kinds of models get trained, depending on the type of the output column: binary classifiers (yes/no columns), multi-class classifiers (enum columns), and regressions (numeric columns). “Ref” means a reference column.
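
The rule described above can be written down as a small lookup. This is a hypothetical sketch, not AppSheet code: the dictionary and function names are invented, and treating Price, Decimal and Number as the "numeric columns" is an assumption based on the list earlier in the thread. Ref is deliberately left out because the reply only explains what a Ref column is, not which model it gets.

```python
# Hypothetical lookup: column type names come from the thread,
# the model-kind labels paraphrase the reply above.
MODEL_KIND_BY_COLUMN_TYPE = {
    "Yes/No": "binary classifier",
    "Enum": "multi-class classifier",
    # Assumption: these three are what "numeric columns" refers to.
    "Price": "regression",
    "Decimal": "regression",
    "Number": "regression",
}


def model_kind_for(column_type):
    """Return the kind of model trained for a given output column type."""
    try:
        return MODEL_KIND_BY_COLUMN_TYPE[column_type]
    except KeyError:
        # Types the thread does not cover (e.g. Ref) are reported, not guessed.
        raise ValueError(f"no model kind known for column type {column_type!r}")


iris_species_model = model_kind_for("Enum")
```

For the Iris example, the species column would be an Enum, so under this sketch it would get a multi-class classifier.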