Import data from CSV

Hi Phil,
Yes, things should happen one after another, step by step.
Once an error is detected the operation ceases, which should work as a safeguard against wrong data being (unexpectedly) uploaded to the data source and forcing the user to work in the app to delete data one row at a time… which would be an hours-and-days job, but it can be avoided by a validation process like that.

In the meantime, regarding the SQL and speed topics you mention here, I'm not perfectly sure what they are.
On this thread we are addressing “import CSV”, and you are introducing a new feature to improve the speed of dumping a large amount of data to SQL. For instance, adding and updating thousands of rows via CSV import will now perform better, likely massively so when the app is hosted over a cloud SQL database.

Is this also valid when we add / update a large amount of rows through the API as well?
I suppose your answer should be yes.

I'm wondering if there is also an improvement in the performance of an AppSheet “Action” that adds new rows or updates bulk rows, which is a similar operation to the API and CSV import.

Hi Koichi-san,

The original customer problem was that Import CSV into an AppSheet table that used SQL was taking too long. They reported that if they loaded a few hundred rows, the Import CSV dialog reported success but the import operation took too long. If they loaded a few thousand rows, the dialog reported failure but eventually the rows appeared to load.

When I investigated this problem, I learned that the performance problem was not specific to the Import CSV command. Performance was poor whenever we added lots of rows to any SQL-based table. This was true whether the rows were being added as a result of an Import CSV command, an ETL operation, or by someone invoking the REST API to add many rows.

When we are adding rows, we first check to see if each record already exists. If it does not exist, we insert the record. If it does exist, we update the record. I was able to significantly improve the performance of both checking whether each record exists and adding the record. We now check whether the added records exist by doing a SELECT WHERE query that specifies up to 200 keys at a time. If you have more than 200 added records, we do this in batches of 200 keys each. As a result, the performance depends on how many records you are adding and how many records are present in the existing table. It is IMPORTANT that you have a SQL index on the key fields to make the SQL SELECT WHERE query as efficient as possible for SQL to execute.
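To make that concrete, here is a minimal sketch of the batching idea, not the actual AppSheet implementation: Python with sqlite3 purely for illustration, and the table name, key column, and row data are whatever the caller supplies.

```python
import sqlite3

BATCH_SIZE = 200  # keys checked per SELECT ... WHERE key IN (...) query

def upsert_rows(conn, table, key_col, rows):
    """Sketch only: check existence in batches of up to 200 keys, then insert or update."""
    cur = conn.cursor()
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        keys = [r[key_col] for r in batch]

        # One existence query per batch of up to 200 keys, instead of one per row.
        marks = ",".join("?" * len(keys))
        cur.execute(f"SELECT {key_col} FROM {table} WHERE {key_col} IN ({marks})", keys)
        existing = {found[0] for found in cur.fetchall()}

        for r in batch:
            cols = [c for c in r if c != key_col]
            if r[key_col] in existing:
                # Key already present: update the non-key columns.
                assignments = ",".join(f"{c} = ?" for c in cols)
                cur.execute(f"UPDATE {table} SET {assignments} WHERE {key_col} = ?",
                            [r[c] for c in cols] + [r[key_col]])
            else:
                # Key not present: insert a new record.
                col_list = ",".join(r)
                value_marks = ",".join("?" * len(r))
                cur.execute(f"INSERT INTO {table} ({col_list}) VALUES ({value_marks})",
                            list(r.values()))
    conn.commit()
```

The point is simply that the cost of the existence check grows with the number of batches rather than the number of individual rows, which is why the index on the key fields matters so much.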

As a result of these performance improvements, I was able to significantly improve the performance of Import CSV, ETL, and the REST API when adding records to a SQL table. When I loaded 500 rows into an empty SQL table, the load time dropped from 6 minutes 45 seconds to under 4 seconds. I was able to load 3000 rows into an empty SQL table in 16 seconds. If you have existing rows in your SQL table, the load times will be higher. The more existing rows in your SQL table, the slower the load, because they need to be checked when determining whether the added rows already exist. That is why you need a SQL index on the key fields to make the SQL SELECT WHERE queries as efficient as possible.
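For reference, the index itself is ordinary SQL. A sketch with made-up table and column names (sqlite3 here just so the snippet runs end to end; SQL Server, MySQL, and Oracle have their own equivalents of the same statement):

```python
import sqlite3

conn = sqlite3.connect("example.db")

# Made-up table purely so the snippet runs; in practice the table
# already exists in your SQL Server / MySQL / Oracle database.
conn.execute("CREATE TABLE IF NOT EXISTS Orders (OrderId TEXT, Status TEXT)")

# The part that matters: an index on the key column, so the batched
# SELECT ... WHERE OrderId IN (...) existence checks can seek instead of scan.
conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_orderid ON Orders (OrderId)")
conn.commit()
```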

While I was testing my change, I noticed that the Import CSV dialog box had two problems. It was not displaying an error message if an input validation or other error occurred. It was not displaying a success message if Import CSV succeeded. I updated the Import CSV dialog to display an error if any was detected. I updated the dialog to display a success message and to report how many rows were loaded if the Import CSV succeeded.

As a result of your Import CSV testing, I investigated and fixed a few other problems affecting Import CSV. These included automatically assigning key values to newly added rows when the key value was omitted, provided an InitialValue was defined for the key field. I also fixed other data validation problems which you and I found.

To fix these problems I had to make fundamental changes. I tried to test these changes carefully, but there is risk with every change. These changes are no exception. We cope with that, in part, by rolling changes out gradually.

In this case, I am offering to let those of you who want to try these changes have early access to them. I can enable these changes on an account-by-account basis. If you are added to the rollout, your entire account and all of the applications within it will see the effects of these changes. That is why I suggest that if you wish to try the changes you use a test account, a test app, and test SQL data. I can then enable these changes for your test account. Once you have confirmed that the changes are working for your test application, I can add your production account to the rollout.


When we import, what is it looking for, simply the column names? What if they are out of order, have gaps, etc.?

Match on column name. Column order not important. Omitted columns allowed, provided required fields without InitialValues are specified.
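As an illustration of what matching on column name means in practice (a toy sketch using Python's csv module with made-up column names, not the actual import code):

```python
import csv
import io

# Header order differs from the table definition, and "Notes" is omitted
# entirely; matching is by name, so both are fine.
csv_text = """Status,OrderId
Shipped,A-100
Pending,A-101
"""

table_columns = ["OrderId", "Status", "Notes"]

for record in csv.DictReader(io.StringIO(csv_text)):
    # DictReader keys each row by header name, so column order is irrelevant.
    row = {col: record[col] for col in table_columns if col in record}
    print(row)  # {'OrderId': 'A-100', 'Status': 'Shipped'}, then the A-101 row
```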


I’ve got an issue where I have an app formula set to NOW() - I wanted this to trigger only if there was an actual update… but it appears this is getting updated even if nothing about the record changed. So it seems some form of update is being triggered even when nothing has changed.

Why not use a ChangeTimestamp instead?


I should actually be using that… (But, an app formula shouldn’t trigger unless the record is updated… right?)

It’s still a bug that needs to get fixed; I just noticed you had a more explicit and exact need for ChangeTimestamp instead.

Correct.

Actually, let’s think about an edge case… Suppose updates come in for a row, so the saved row is loaded, the updates are applied, and the app formulas are evaluated. Suppose the “updates” just happened to be the same values already present in the row, so following the “update”, none of those column values changed. But the columns with app formulas may have… Thus, the row has changed and the updated row is saved.
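A toy illustration of that edge case in plain Python (nothing AppSheet-specific; the column names are made up):

```python
from datetime import datetime

# The saved row, including a column whose app formula is NOW().
saved_row = {"OrderId": "A-100", "Status": "Shipped",
             "LastTouched": datetime(2020, 1, 1, 9, 0, 0)}

# The incoming "update" repeats the value that is already stored.
incoming_update = {"Status": "Shipped"}

updated_row = dict(saved_row)
updated_row.update(incoming_update)          # no user-visible change yet
updated_row["LastTouched"] = datetime.now()  # app formula is re-evaluated

print(updated_row != saved_row)              # True -> the row is saved again
```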


When I typed my previous response, I was thinking of exactly this, but couldn’t remember how to describe it at the time.

Also, I doubt any performance improvements could possibly come from @Phil having it check if any of the values for that record are actually changing, and dropping the update if all the data matches. But I don’t know for sure.


Maybe I need to play with it a bit more… maybe some finesse this time.


Hi Phil-san,

Thank you very much for your detailed explanations; I'm now fully caught up with you, I hope. I will create a new AppSheet account for testing purposes and will get back to you separately to conduct testing. Yes, all my production apps run on MS SQL as the data source, so it is better to avoid testing on the same account, where we might see unexpected behavior that could interfere with the use of the production apps.

The general performance improvement is surely good news for users like me who host data on SQL instead of Sheets.

For now, I have come up with two questions I hope I can get your advice on.

  1. SQL index

It is IMPORTANT that you have a SQL index on the key fields to make the SQL SELECT WHERE query as efficient as possible for SQL to execute.

To improve performance in general, I will take the suggestion to index the key fields. I just wondered if I should also index the other fields that could be used in the WHERE clause of a SQL SELECT statement. For instance, I have date fields, and in an AppSheet expression I filter out older data whose date field is older than date X. In that case, I assumed it would improve the query performance (on both the SQL and AppSheet sides) to index the “date” fields as well as the key?

Or will doing so not affect performance at all in the end?

  2. Possible workaround to make the performance far better

I understand the technical mechanism in terms of how the rows are added and updated by the API and CSV import. So, all in all, the update/add operation against a table with a “smaller amount of existing rows” should perform better than against a table with tons of existing rows.

To improve performance further, I thought there is a possible workaround like this:

a. First, on the SQL server, create a VIEW out of the table to which we wish to add new rows by API/CSV file.
b. This view is made from the table with a filter condition under which the number of matched rows is essentially zero. For instance, filter by date with a condition like date >= 2100/1/1, something like that, returning zero rows since we don't have existing rows matching such a condition.
c. Read the proper table as well as this VIEW into AppSheet.
d. Create the action for CSV file import, but associate this action with the VIEW instead of the Table.

Whenever the CSV file import is fired, the rows will be added to the VIEW instead of the table, where the existing rows are basically ZERO all the time, which should make the performance far better? The rows uploaded to this VIEW will then drop out of it after the import operation, as they do not match the condition.

If this works, then the “adding new rows” operation should perform better? It will cause a problem when we UPDATE rows, though.
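Roughly the kind of view definition I mean, with made-up table and column names (sqlite3 here only to show the shape of it; the real workaround assumes a database such as SQL Server where a simple single-table view like this is updatable, so inserts land in the underlying table):

```python
import sqlite3

conn = sqlite3.connect("example.db")
conn.execute("CREATE TABLE IF NOT EXISTS Orders (OrderId TEXT, OrderDate TEXT)")

# Filter chosen so that no existing row matches: the view is effectively
# always empty, which should keep the "does this key already exist?" check cheap.
conn.execute("""
CREATE VIEW IF NOT EXISTS Orders_ImportOnly AS
    SELECT OrderId, OrderDate
    FROM Orders
    WHERE OrderDate >= '2100-01-01'
""")
conn.commit()
```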

A little testing this morning. (Note: I’m using Oracle DB.) When I loaded 2 records, the app instantly went to the “Starting Up” screen, and then provided this. One thing is that the dialogue box only has an option for Cancel, which is misleading.

When loading in a file with 871 records, there is no confirmation or feedback that anything happened. (I have the biggest urge to keep uploading the file over and over again.) After a few minutes, when I decide to sync, the records are indeed there. Then, all of a sudden, after perusing the app for a few minutes, this dialogue pops up:

In general I would rather have some control over this dialogue box, or at least be able to define a “Display Table Name”. I’ve run into this in other places; a table display name would tidy that up a bit. (But of course, control over this message and its variables at the action level would be cool.)

One more thing, it seems the CSV Import Action is pushing a different time than it should be. The top record was manually edited, and the bottom record was updated by the import moments before.

@Phil @Steve @Bahbus I went ahead and created a super long video (10 min) showing what I’ve got going on. Would love to get this even semi-functional, especially from the change table perspective. I stripped out all my expressions and set the ChangeTimestamp to watch the main columns. It is still catching a change when rows are “Overwritten”. Let me know what you guys think the best way to set up a change table is.

Hmm, I had assumed ChangeTimestamp would internally run some sort of quick BEFORE<>AFTER in order to actually trigger.

Same, but apparently it’s just snagging “did it get updated”, unless I can figure out that “Changed values” section, but I don’t see that anywhere in the docs…

Hi Grant,

I have been tied up on other customer issues and have not gotten back to Import CSV.
I have added your account to the Import CSV rollout.
If testing goes well, you will be included in the rollout early next week.


Is there a demo app we can try for this please?

I don’t think any demos exist for it yet.

Hi @Phil
Could my account please be added for Import CSV when you get back to it, 129805, thanks.