Best practice for large CSV imports?

I have a 140 MB CSV file that I’d like to import. It seems to import into frontend memory fine but fails to persist the data to the server afterwards.

I didn’t see any file size limitations in the docs, but I might have missed them. Do you have a recommended best practice for dealing with large imports like this?

I could do it with GraphQL, but would prefer to make it easier for less technical users to also perform the same task.

This is the action being run, and it works for smaller data sets without any problems.

image

Hello Matt,

When I need to persist large amounts of data, I like to do it in chunks of 1000 objects at a time.
You can achieve this by using a while-loop and two counters stored in App Variables.

On the first iteration, set Counter Start to 0 and Counter End to 1000.
Then select all objects where Index >= Counter Start && Index < Counter End (this selects objects 0-999) and persist only the selected objects.

For the next iteration, increment both counters by 1000. The Select will now select objects 1000-1999. Again, persist only the selected objects.

Do this until you have reached the end of the array. When the end is reached, break out of the loop.
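For reference, the counter logic above can be sketched in plain Python. This is just an illustration of the loop, not the platform's API; `persist` stands in for whatever persist node you call per chunk:

```python
def persist_in_chunks(objects, persist, chunk_size=1000):
    """Mirror the Counter Start / Counter End loop described above:
    select a window of objects, persist it, then advance both counters."""
    counter_start = 0
    counter_end = chunk_size
    while counter_start < len(objects):
        # Select all objects where Index >= Counter Start && Index < Counter End
        chunk = objects[counter_start:counter_end]
        persist(chunk)  # persist only the selected objects
        # Increment both counters by the chunk size
        counter_start += chunk_size
        counter_end += chunk_size
```

The loop ends naturally once Counter Start passes the end of the array, which is the "break out of the loop" condition above.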

Hope this helps :slight_smile:

//Erik

Thanks, it helps! But the performance is really horrible. I followed your lead like this:

image

but each round of persisting 1k objects takes ~10 seconds, so importing this file is going to take at least 90 minutes. I’ll try again with 10k objects at a time and see how much that helps, I guess.

90 minutes is a long time, even for a large CSV file.
Is this import located in a big app? If it is, are there other datasources affected when persisting the data? That could explain the poor performance, as they would need to recalculate runtime properties and dependencies after each persist.

I recommend moving the import job to a service. With this approach you persist the CSV file in the app and have a service read it later. The service can then process the file in the background without the user having to monitor the app.

You could also move the import job to a new app with fewer datasources, but this would require the user to stay in the app while the data persists.

//Erik

This is a new, very small app, and there shouldn’t be any other datasources affected when persisting the data. I’ll look into the service approach a bit and also try importing via GraphQL I guess. I would expect this to work fine using the normal tools though. :frowning:

Do you have any examples of this kind of setup, where a user uploads a CSV file from the app, which triggers a separate service to persist the data?

Yes, I have a very simple showcase here. It consists of an app where the user can upload a file. When the file has been persisted, a service responsible for processing the file is called. It takes the File Content URL of the file as a Query Param.

Here are some images of the setup:

  1. The app-action for uploading the file and calling the service

  2. The setup of the Service Endpoint that is responsible for handling the query params. Note that the Query Param is stored in a Service Variable. The variable has the type Internet URL.

  3. The action that is run when the Endpoint is called. The Internet URL is passed as an Action Param and used in the Import Data action node to retrieve the file. You may of course add additional logic to this action.

Service Action Param. Bound to the Service Variable: Param - URL
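What the service does with that URL can be sketched in ordinary Python. This is an assumption-heavy illustration of the flow (fetch the file at the File Content URL, parse it as CSV, persist in chunks), not the platform's actual implementation; `persist` is a hypothetical stand-in:

```python
import csv
import io
import urllib.request

def fetch_csv_rows(file_url):
    """Download the file at the File Content URL passed as a
    query param and parse it as CSV into a list of row dicts."""
    with urllib.request.urlopen(file_url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

def import_in_chunks(rows, persist, chunk_size=1000):
    """Persist the parsed rows in chunks, as in the earlier loop."""
    for start in range(0, len(rows), chunk_size):
        persist(rows[start:start + chunk_size])
    return len(rows)
```

The point of the split is that the heavy part (fetching and persisting) now runs service-side, so the app only has to upload the file and pass its URL along.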

Since you expect the Service to take a while to process, you should probably enable Run Async on the Service Endpoint. With Run Async enabled, the App does not wait for the Service to complete before moving on to the next action node. Please set the timeout accordingly, as the default is 5 minutes.

If you’re worried about data races or if the service should not be run concurrently, turn on No Concurrency instead.
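As a rough mental model, Run Async is fire-and-forget and No Concurrency serializes invocations. A sketch with Python's standard library (the names here are illustrative, not the platform's API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# "No Concurrency" sketched as a lock: only one service run at a time.
_service_lock = threading.Lock()

def process_file(file_url, work):
    """The service body; the lock serializes concurrent invocations."""
    with _service_lock:
        return work(file_url)

def run_async(executor, file_url, work):
    """Run Async: submit the service call and return immediately with
    a future, so the app can continue to its next action node."""
    return executor.submit(process_file, file_url, work)
```

The app-side call returns right away; the actual processing happens in the background, which is why the timeout on the endpoint matters.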

//Erik

It looks like you are recasting the URL to a String in your service action parameter. Is that necessary, rather than using URLs throughout?

I almost have this working…I think…but I am getting 403 errors when the service tries to run. My user should have access to run this service, so I don’t understand why it’s giving me a 403. Do I have my data model objects set up incorrectly, so that the File URL is wrong? I’m using a database-connected object’s File Content URL here, as shown in the screenshot.

To others who might have the same issue: the solution was to use the File Content URL of the runtime object, not the persisted object, when calling the service :slightly_smiling_face:

//Erik
