What is the best way (performance-wise) to mark duplicate entries in a data source?

Hi,

Let’s say I have a data source where form entries from end users are stored. Each form entry has a phone number and an e-mail address. I would like to mark entries with the same phone and e-mail as duplicates.
I know that the ideal way would be to create, for example, a Person object to store phone/e-mail, and connect form entries to Person. Unfortunately, there are already many thousands of form entries stored.

Thank you

Hi!

There may be other, more efficient ways, but in general I would suggest the following (let’s call the object class “Form entry”):

  1. Identify duplicates when new form entries are saved. Form entry should have a property “Is Duplicate” (boolean datatype). Upon saving a new Form entry, read (from the database) other Form entries with the same Email and Phone (but not the same ID). If any object is found, mark the new Form entry as a duplicate (Is Duplicate set to true).
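The check in step 1 can be sketched roughly like this. It is only an illustration: the array `existingEntries` stands in for the database, and the names `findMatches`, `saveFormEntry`, `id`, `email`, `phone` and `isDuplicate` are made up for the example. In practice the lookup would be a filtered read against the database, not an in-memory filter.

```javascript
// Hypothetical in-memory stand-in for the "Form entry" object class.
// Returns the other entries with the same email and phone (but not the same ID).
function findMatches(existingEntries, newEntry) {
  return existingEntries.filter(
    (e) =>
      e.id !== newEntry.id &&
      e.email === newEntry.email &&
      e.phone === newEntry.phone
  );
}

// Saves the new entry and sets its Is Duplicate flag from the lookup above.
function saveFormEntry(existingEntries, newEntry) {
  const matches = findMatches(existingEntries, newEntry);
  const saved = { ...newEntry, isDuplicate: matches.length > 0 };
  existingEntries.push(saved);
  return saved;
}
```

Because the lookup runs once per save and only touches entries with a matching Email and Phone, the cost per new entry stays small as long as the database can filter on those two properties efficiently.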

Now you have taken care of all new entries, and just need to run a one-time update of those “many thousands” of old entries that do not have the Is Duplicate flag set.

  2. Create an Action to be executed once (it may be best to add it to a dedicated new App). You need a data source holding all Form entries; let’s call it “Form Entries (all)”. Give this data source a runtime property Is Duplicate (calculated), boolean datatype. Add the same data source as a function parameter (name it e.g. formEntries), with the filter Created Date < Form Entries (all).Created Date AND Email Equals Form Entries (all).Email AND Phone Equals Form Entries (all).Phone. The function itself should then be: return formEntries.length > 0
    With this setup, Is Duplicate (calculated) is true for every entry that has an earlier entry with the same Email and Phone, so the earliest entry in each group stays unmarked. The Action only needs to perform a single Update Object on that data source, setting the (database) property Is Duplicate to true for all entries where Is Duplicate (calculated) Equals true.
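
To make the logic of step 2 concrete, here is a rough sketch in plain JavaScript. The first function mirrors the calculated property described above. The second is not part of the suggestion: it is an assumed alternative that gives the same flags by remembering the earliest Created Date per Email/Phone pair, which avoids comparing every entry against every other one. The field names `email`, `phone`, `createdDate` (a `Date`) and `isDuplicate`, and both function names, are made up for the example.

```javascript
// Mirrors the calculated property: true when an earlier entry exists
// with the same email and phone.
function isDuplicateCalculated(entry, allEntries) {
  const formEntries = allEntries.filter(
    (other) =>
      other.createdDate < entry.createdDate &&
      other.email === entry.email &&
      other.phone === entry.phone
  );
  return formEntries.length > 0;
}

// Assumed alternative: one pass to find the earliest created date per
// email/phone pair, then one pass to flag every later entry.
function markDuplicates(allEntries) {
  const keyOf = (e) => JSON.stringify([e.email, e.phone]);
  const earliest = new Map();
  for (const e of allEntries) {
    const key = keyOf(e);
    const time = e.createdDate.getTime();
    if (!earliest.has(key) || time < earliest.get(key)) {
      earliest.set(key, time);
    }
  }
  // Returns copies with isDuplicate set; the input array is not modified.
  return allEntries.map((e) => ({
    ...e,
    isDuplicate: e.createdDate.getTime() > earliest.get(keyOf(e)),
  }));
}
```

Both versions treat entries with exactly the same Created Date as non-duplicates of each other, because the comparison is strictly "earlier than". If that case can occur in the data, the ID could be used as a tie-breaker.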