Performance in services/flows

Hello,

I have a performance-related question regarding services/flows. We have several heavy services that retrieve metadata for a large number of documents, checklists, certifications, etc. from external sites (usually 30,000–70,000 objects).

Based on previous discussions, documentation, ADFs, and conversations with AI-Ask, I understand that best practice is to actively use runtime data sources and work in batches against data connectors, avoid foreach loops where possible, read a limited number of objects at a time, persist data once, run asynchronously, and otherwise apply relevant optimizations.

However, in practice this often performs poorly, as it tends to result in server timeouts or overload. When I’ve used Ask or other AI tools, I often get suggestions that either don’t fit with the no-code architecture, or involve mapping via functions. The latter, for example, caused high server load for us in mid-March to the point where you (Appfarm) had to tell us to calm down a bit :blush:

A concrete example: I want a service or combination of services that:

  • retrieves document metadata from an API (Unique Document ID, strings that can be used to download/navigate to the document)

  • updates existing documents in our data source if any changes have occurred since the last run

  • deletes documents in the data source that are no longer included in the API response

Currently, I have split the API calls into a separate action to reduce complexity. The action below aims to create, update and delete, while a previous action has retrieved all data from the API and stored it in the Dokumenter i Landax (fra API) data source. The API data is read into a runtime data source (from API, temp), after which new documents are created directly in a data connector (alternatively, this could go via temp → persist).
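For reference, this is roughly the create/update/delete logic I am after, written as a minimal plain-TypeScript sketch rather than Appfarm actions. All interface and field names here are made up for illustration; the point is that both sides are keyed on the unique document ID, so the result does not depend on the order the objects arrive in.

```typescript
// Hypothetical shapes; the field names are assumptions, not the actual data model.
interface ApiDocument {
  uniqueDocumentId: string;
  downloadUrl: string;
  modifiedAt: string; // timestamp from the external system
}

interface StoredDocument extends ApiDocument {
  internalId: string; // primary key in the local data source
}

// Compute creates, updates and deletes by keying both sides on the unique document ID.
function diffSync(apiDocs: ApiDocument[], storedDocs: StoredDocument[]) {
  const apiById = new Map<string, ApiDocument>(
    apiDocs.map((d) => [d.uniqueDocumentId, d])
  );
  const storedById = new Map<string, StoredDocument>(
    storedDocs.map((d) => [d.uniqueDocumentId, d])
  );

  const toCreate = apiDocs.filter((d) => !storedById.has(d.uniqueDocumentId));
  const toUpdate = storedDocs.filter((s) => {
    const fromApi = apiById.get(s.uniqueDocumentId);
    return fromApi !== undefined && fromApi.modifiedAt !== s.modifiedAt;
  });
  const toDelete = storedDocs.filter((s) => !apiById.has(s.uniqueDocumentId));

  return { toCreate, toUpdate, toDelete };
}
```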

The problem occurs when I try to delete or update the document objects in the data connector. At this point the service consistently crashes with the error “server did not get a response in time,” and it also impacts the entire environment’s API.

Previously, I have attempted to read much smaller amounts of data into the runtime data sources and to use pagination in the API call to improve performance. This has usually caused uniqueness constraint issues, as an object from the API has not necessarily ended up in the same position as the corresponding object in the data source, causing Object State = New to be applied to existing objects.

I am also aware that the easiest way to solve this is to not retrieve so many damn objects, but the upside of having all of these readily available to filter and sort in our apps has outweighed this so far.

Any recommendations for how I should approach this?

Hi!

Are you reading 70k objects in a single API call every time, to check for existence (if not, create; if so, update) or non-existence (delete)? That is not very efficient. Is there any way to get only objects updated, created or deleted after a certain timestamp? That is the preferred option when syncing large quantities of data.
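Something along these lines, as a generic sketch (not Appfarm-specific): the service stores the timestamp of its last successful run and only asks the external API for objects changed after that point. The `modifiedSince` parameter name and the response shape are assumptions; the external API may expose this differently, if at all.

```typescript
// Assumed response shape for a delta endpoint; adjust to whatever the API returns.
interface ChangedObject {
  uniqueDocumentId: string;
  modifiedAt: string;
  isDeleted: boolean;
}

// Fetch only objects created, updated or deleted since the last successful sync.
async function fetchChangesSince(
  baseUrl: string,
  lastSyncTime: string
): Promise<ChangedObject[]> {
  const url = `${baseUrl}/documents?modifiedSince=${encodeURIComponent(lastSyncTime)}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`API call failed: ${response.status}`);
  return (await response.json()) as ChangedObject[];
}
```

Persist the timestamp of each successful run and pass it into the next one, so each sync only touches the objects that actually changed instead of the full 70k set.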

If not, the answer lies in batching, as you mention. If a persist or a delete is too large (note that cascade delete also comes into effect), you may temporarily take down the API service.
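In generic terms, batching just means splitting the work into fixed-size chunks and running them one at a time, so no single persist or delete holds the service for too long. A small sketch, where `handler` stands in for whatever persist or delete step the service uses:

```typescript
// Split a large list of objects into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run the batches sequentially to keep peak load low,
// e.g. 1,000 objects per persist or delete.
async function processInBatches<T>(
  items: T[],
  size: number,
  handler: (batch: T[]) => Promise<void>
) {
  for (const batch of chunk(items, size)) {
    await handler(batch);
  }
}
```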

Where possible, we have used webhooks to create and update objects in the data sources, which works nicely. However, external sites do not always support deletion events, which leaves some orphaned objects in Appfarm. This is more often than not unproblematic, but it would be good to check and clean the data at regular intervals.
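The periodic cleanup can be a simple "compare ID lists" job. A sketch under the assumption that there is some cheap way to list all IDs on the external side; the three function parameters are placeholders for the real service steps:

```typescript
// Delete stored objects whose ID no longer exists in the external system.
// Used when the external system sends create/update webhooks but no delete events.
async function cleanUpOrphans(
  fetchAllExternalIds: () => Promise<string[]>,
  listStoredIds: () => Promise<string[]>,
  deleteStoredDocuments: (ids: string[]) => Promise<void>
) {
  const externalIds = new Set(await fetchAllExternalIds());
  const storedIds = await listStoredIds();
  const orphans = storedIds.filter((id) => !externalIds.has(id));
  if (orphans.length > 0) {
    await deleteStoredDocuments(orphans); // ideally in batches, as above
  }
}
```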

Some endpoints do not support webhooks at all. This leaves us with two patterns: services that accept large batches in a single call (say 10,000 objects), or service actions that loop over API calls where each response is capped at 1,000 objects. Some endpoints offer timestamp-based filtering; others do not.
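The loop-over-API-calls pattern is essentially a pagination loop. A generic sketch, where the `page`/`pageSize` parameter names are assumptions about the endpoint:

```typescript
// Read a paginated endpoint capped at 1,000 objects per response,
// accumulating all pages before further processing.
async function fetchAllPages<T>(baseUrl: string, pageSize = 1000): Promise<T[]> {
  const all: T[] = [];
  let page = 1;
  while (true) {
    const response = await fetch(`${baseUrl}?page=${page}&pageSize=${pageSize}`);
    if (!response.ok) throw new Error(`API call failed: ${response.status}`);
    const items = (await response.json()) as T[];
    all.push(...items);
    if (items.length < pageSize) break; // last page reached
    page += 1;
  }
  return all;
}
```

If each page is processed as it arrives instead of accumulated, match objects by their unique ID rather than by position, so existing objects are updated instead of being treated as new.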

If I understand you correctly, the recommended approach for our case is to continue with webhooks or similar change-based updating where possible, and to use smaller batches where that is not an option. If neither approach is sufficient, the question becomes whether all this data actually needs to be stored locally, or whether it can be handled another way.

Yes, understood correctly!

It sounds like you have a good strategy and viewpoint on this.

For the cases where neither approach is sufficient:

  • In some cases, a “lazy sync” of data is possible, i.e. doing the API lookup when the data is needed, then doing the update/create/delete. For example, when adding a new sales order: search for the product via the API, save the product when the first order occurs, then run a nightly sync (update or mark deleted) on the existing set of products.
  • In other cases, a separate object class can be used for the “raw entries” from the external system (see the sketch below). This object class should have no inbound delete constraints or cascaded deletes, meaning each sync could just “delete all, recreate all” (the persist should still be done in batches). Another agent could then run on another schedule, processing these raw entries against the main entity that needs to be synced. This moves the external data “one step closer” at least, and gives you more flexibility to batch or filter the reads.
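To make the second point concrete, here is a generic sketch of the staging pattern: one scheduled step refreshes the "raw entries" class in batches, and a separate step later merges them into the main entity. All function parameters are placeholders for the real service steps, not Appfarm APIs.

```typescript
// Minimal shape for a raw entry coming straight from the external API.
interface RawEntry {
  uniqueDocumentId: string;
  payload: Record<string, unknown>;
}

// Step 1: wipe and rebuild the staging class. Safe to wipe because the class
// has no inbound delete constraints or cascaded deletes.
async function refreshRawEntries(
  apiDocs: RawEntry[],
  deleteAllRawEntries: () => Promise<void>,
  persistRawBatch: (batch: RawEntry[]) => Promise<void>,
  batchSize = 1000
) {
  await deleteAllRawEntries();
  for (let i = 0; i < apiDocs.length; i += batchSize) {
    await persistRawBatch(apiDocs.slice(i, i + batchSize));
  }
}

// Step 2: runs on its own schedule, reading the staging data in manageable
// slices and upserting each entry into the main entity.
async function mergeRawIntoMain(
  readRawBatch: (offset: number, limit: number) => Promise<RawEntry[]>,
  upsertMainEntity: (entry: RawEntry) => Promise<void>,
  batchSize = 1000
) {
  for (let offset = 0; ; offset += batchSize) {
    const batch = await readRawBatch(offset, batchSize);
    if (batch.length === 0) break;
    for (const entry of batch) {
      await upsertMainEntity(entry);
    }
  }
}
```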