Best practice for large CSV imports?

I have a 140 MB CSV file that I’d like to import. It seems to import into frontend memory fine but fails to persist the data to the server afterwards.

I didn’t see any file size limitations in the docs, but I might have missed them. Do you have a recommended best practice for dealing with large imports like this?

I could do it with GraphQL, but would prefer to make it easier for less technical users to also perform the same task.

This is the action being run, and it works for smaller data sets without any problems.

image

Hello Matt,

When I need to persist large amounts of data, I like to do it in chunks of 1000 objects at a time.
You can achieve this by using a while-loop and two counters stored in App Variables.

On the first iteration, set Counter Start to 0 and Counter End to 1000.
Then select all objects where Index >= Counter Start && Index < Counter End (this selects objects 0-999) and persist only the selected objects.

For the next iteration, increment both counters by 1000. The Select will now select objects 1000-1999. Again, persist only the selected objects.

Do this until you have reached the end of the array. When the end is reached, break out of the loop.
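For reference, the counter logic above can be sketched in plain Python. This is just an illustration of the loop, not the platform's API; `persist` stands in for whatever persist node you call per chunk:

```python
def persist_in_chunks(objects, persist, chunk_size=1000):
    """Mirror the Counter Start / Counter End loop described above:
    select a window of objects, persist it, then advance both counters."""
    counter_start = 0
    counter_end = chunk_size
    while counter_start < len(objects):
        # Select all objects where Index >= Counter Start && Index < Counter End
        chunk = objects[counter_start:counter_end]
        persist(chunk)  # persist only the selected objects
        # Increment both counters by the chunk size
        counter_start += chunk_size
        counter_end += chunk_size
```

The loop ends naturally once Counter Start passes the end of the array, which is the "break out of the loop" condition above.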

Hope this helps :slight_smile:

//Erik

Thanks, it helps! But the performance is really horrible. I followed your lead like this:

image

but each round of persisting 1k objects takes ~10 seconds, so importing this file is going to take at least 90 minutes. I’ll try again with 10k objects at a time and see how much that helps, I guess.

90 minutes is a long time, even for a large CSV file.
Is this import located in a big app? If it is, are there other datasources affected when persisting the data? That could explain the poor performance, as they would need to recalculate runtime properties and dependencies after each persist.

I recommend moving the import job to a service. With this approach you persist the CSV file in the app and have a service read it later. The service can then process the file in the background without the user having to monitor the app.

You could also move the import job to a new app with fewer datasources, but this would require the user to stay in the app while the data persists.

//Erik

This is a new, very small app, and there shouldn’t be any other datasources affected when persisting the data. I’ll look into the service approach a bit and also try importing via GraphQL I guess. I would expect this to work fine using the normal tools though. :frowning:

Do you have any examples of this kind of setup, where a user uploads a CSV file from the app, which triggers a separate service to persist the data?

Yes, I have a very simple showcase here. It consists of an app where the user can upload a file. When the file has been persisted, a service responsible for processing the file is called. It takes the File Content URL of the file as a Query Param.

Here are some images of the setup:

  1. The app-action for uploading the file and calling the service

  2. The setup of the Service Endpoint that is responsible for handling the query params. Note that the Query Param is stored in a Service Variable. The variable has the type Internet URL.

  3. The action that is run when the Endpoint is called. The Internet URL is passed as an Action Param and used in the Import Data action node to retrieve the file. You may of course add additional logic to this action.

Service Action Param. Bound to the Service Variable: Param - URL
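What the service does with that URL can be sketched in ordinary Python. This is an assumption-heavy illustration of the flow (fetch the file at the File Content URL, parse it as CSV, persist in chunks), not the platform's actual implementation; `persist` is a hypothetical stand-in:

```python
import csv
import io
import urllib.request

def fetch_csv_rows(file_url):
    """Download the file at the File Content URL passed as a
    query param and parse it as CSV into a list of row dicts."""
    with urllib.request.urlopen(file_url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

def import_in_chunks(rows, persist, chunk_size=1000):
    """Persist the parsed rows in chunks, as in the earlier loop."""
    for start in range(0, len(rows), chunk_size):
        persist(rows[start:start + chunk_size])
    return len(rows)
```

The point of the split is that the heavy part (fetching and persisting) now runs service-side, so the app only has to upload the file and pass its URL along.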

Since you expect the Service to take a while to process, you should probably enable Run Async on the Service Endpoint. With Run Async enabled, the App does not wait for the Service to complete before moving on to the next action node. Please set the timeout accordingly, as the default is 5 minutes.

If you’re worried about data races or if the service should not be run concurrently, turn on No Concurrency instead.
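As a rough mental model, Run Async is fire-and-forget and No Concurrency serializes invocations. A sketch with Python's standard library (the names here are illustrative, not the platform's API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# "No Concurrency" sketched as a lock: only one service run at a time.
_service_lock = threading.Lock()

def process_file(file_url, work):
    """The service body; the lock serializes concurrent invocations."""
    with _service_lock:
        return work(file_url)

def run_async(executor, file_url, work):
    """Run Async: submit the service call and return immediately with
    a future, so the app can continue to its next action node."""
    return executor.submit(process_file, file_url, work)
```

The app-side call returns right away; the actual processing happens in the background, which is why the timeout on the endpoint matters.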

//Erik

It looks like you are recasting the URL to a String in your service action parameter. Is that necessary, rather than using URLs throughout?

I almost have this working…I think…but I am getting 403 errors when the service tries to run. My user should have access to run this service, so I don’t understand why it’s giving me a 403. Do I have my data model objects set up incorrectly, so that the File URL is wrong? I’m using a database-connected object’s File Content URL here, as shown in the screenshot.

To others who might have the same issue: the solution was to use the File Content URL of the runtime object, not the persisted object, when calling the service :slightly_smiling_face:

//Erik
