How to remove duplicate rows from my data My ...

(Karl Rolfe) #1

How to remove duplicate rows from my data My data comes from a MySQL database and it has duplicate rows. I want to keep the original data for historical reasons but I need to apply updates to the latest data rows.

My Problem: I can use a View with a GROUP BY clause in MySQL to remove the duplicates however then I can’t update the data because MySQL doesn’t allow updates on Views with GROUP BY clauses in them. My Question: Is there a way in AppSheet to filter the duplicate rows?

The view I’m using in MySQL is: SELECT

  • FROM

admin_po._lines GROUP BY po_number , qty , rate , running_total;

In my sample data below the last three lines are a duplicate of the first three. I want to show my App user just the last three lines

(Praveen Seshadri (AppSheet)) #2

Hi Karl, there’s no reasonable way to do this if the rows are truly identical. There needs to be some column in the database that differentiates the rows. Otherwise, how do you know which are the “newest” ones and which are the “historical” ones?

(Karl Rolfe) #3

Thanks for your reply @praveen. Actually the rows are not truly identical, the data fields line_id and updated_UTC differentiate them so they can be delineated. The fields I am using to call them duplicates are po_number , qty , rate and running_total. If those three fields are identical then I am calling it a duplicate.

(Karl Rolfe) #4

I’m thinking if it’s at all possible to achieve in AppSheet then it would have to be a slice but I really have no idea of how I might go about it. In my example above I want to keep the three rows with the later updated_UTC date and filter out the earlier ones. But the criteria needs to be based on entries having identical data in po_number , qty , rate and running_total

(Karl Rolfe) #5

Thanks for that @Suvrutt_Gurjar it works perfectly. And for good measure I added one more criterium so my final slice formula looked like: AND(([line_id]=MAXROW(_lines, updated_UTC, [DuplicateEliminate]=[_THISROW].[DuplicateEliminate])), ([invoiced] <> “Y”))

(Suvrutt Gurjar) #6

Hi @Karl_Rolfe, thank you for the update and sharing your final formula. Good to know it works as per your needs.

(Suvrutt Gurjar) #7

Hi @Karl_Rolfe, Based on your latest post that you have

UTC_Updated is differentiating value for these columns,you may wish to try following approach,which I believe worked when I tested it on similar data pattern.

You may wish to create a virtual Appsheet column combining four fields po_number , qty , rate and running_total . This column is called say [DupEliminate]

Then you may wish to put the following expression in the slice to select the latest row by UTC_updated but have

fields po_number , qty , rate and running_total duplicating.

([Table Key Column]=MAXROW(Table Name, UTC_updated, [DupEliminate]=[_THISROW].[DupEliminate]))