I am developing a FastAPI application which will be deployed on EKS.
Purpose of this application: the API receives input in a request, the application extracts the filter, sort, and pagination information from that request, then uses a Snowflake connection to fetch the result (rows of around 10 MB). This result has to be converted into proper JSON, and then the response is returned.
Here, for pagination, to know whether a next page exists, I pull x+1 rows and drop the last row when sending the response. Also, I am not using OFFSET in SQL for pagination, as it is slow; instead I use a lexicographical WHERE clause, roughly as in the sketch below.
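A minimal sketch of that paging query (Snowpark; the table, column, and cursor names are made up for illustration, and real code should bind parameters instead of interpolating strings):

```python
# Simplified sketch of the keyset ("seek") paging described above.
# `session` is an existing snowflake.snowpark.Session.
PAGE_SIZE = 100

def build_page_query(last_key: str | None) -> str:
    where = f"WHERE sort_key > '{last_key}'" if last_key else ""
    # Fetch one extra row: it only signals that a next page exists.
    return f"SELECT * FROM my_table {where} ORDER BY sort_key LIMIT {PAGE_SIZE + 1}"

rows = session.sql(build_page_query(last_key)).collect()  # last_key comes from the request's cursor param
has_next = len(rows) > PAGE_SIZE
rows = rows[:PAGE_SIZE]                                   # drop the sentinel row
next_cursor = rows[-1]["SORT_KEY"] if has_next else None
```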
My pain point is that Snowflake's Snowpark library is not async, and pandas holds the GIL while executing (same for json.loads()), so the API performs poorly under load. Even a single request takes around 3-4 seconds, and in a load test this value just skyrockets.
I tried to implement threading so that the Snowflake collect() operation, the pandas work, and the other steps can run in parallel for multiple users (simplified sketch below). It has helped to some extent, but I have a single pod with 2 vCPUs, so at most 4 threads can run in parallel.
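Roughly, the threading looks like this (a simplified sketch; `build_page_query` is from the snippet above and the Snowpark `session` is assumed to already exist):

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()

def run_query(sql: str) -> list[dict]:
    # All the blocking work in one place: Snowpark collect(),
    # plus whatever pandas/JSON shaping happens afterwards.
    rows = session.sql(sql).collect()
    return [row.as_dict() for row in rows]

@app.get("/data")
async def get_data(cursor: str | None = None):
    sql = build_page_query(cursor)
    # Offload the blocking call to a worker thread so the event
    # loop can keep accepting other requests meanwhile.
    return await asyncio.to_thread(run_query, sql)
```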
Under the load test, I am still seeing slow response times. Am I doing something wrong here?
In the Snowflake query, I need the last row of the page to build the pagination cursor. But there is a scenario where I have to perform a join and then return, for each user, all of their actions, like:
{ "user1":{{action1}, {action2}},
"user2"...
}
If I try to do this aggregation directly in Snowflake and fetch the result, the pagination is lost for the actions table. So I am reading the flat join from Snowflake and then building this nested format in my application before returning the response (sketch below).
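The application-side grouping is roughly this (simplified; the column names are placeholders, and it assumes the join result is ordered by user):

```python
from collections import defaultdict

def group_actions(rows: list[dict]) -> dict[str, list[dict]]:
    # Turn the flat user/action join into {user_id: [action, ...]}.
    # Because paging cuts the ordered join at row boundaries, a page
    # can still split one user's actions across pages - that is the problem.
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        user_id = row.pop("USER_ID")
        grouped[user_id].append(row)   # remaining columns = action fields
    return dict(grouped)
```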
Is there a way to solve this issue?
`async` doesn't mean "faster" or "parallel". It means you need fewer threads to do the same work, instead of blocking some while waiting for IO. The IO work will still take the same time.

You didn't post any code, so we can't guess why things are slow. "this value just skyrockets" probably means the wrong paging technique is used: a skip/take approach has to find all the rows it needs to skip and so gets progressively slower. "lexicographical where clause" - what does this mean?

"rows of around 10 MB" - you mean each row is 10 MB? What does it contain? With 10 MB you'll get delays just from the network transfer, plus allocating such huge buffers that then need to be cleaned up, and then even larger JSON strings. If any of this gets converted to a pandas DataFrame, it will also take at least 10 MB.

We can't guess what's going on, but avoiding any intermediate conversions and just streaming the data would be faster. So would reading only the data you need, and using Snowflake instead of pandas for transformations. A rough sketch of the streaming idea follows.
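For example, a minimal streaming sketch, assuming the plain snowflake-connector-python cursor API instead of Snowpark and DataFrames (the connection details, query, and batch size are placeholders):

```python
import json

import snowflake.connector
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from snowflake.connector import DictCursor

app = FastAPI()

def row_stream(sql: str, params: tuple, batch_size: int = 1000):
    # Yield NDJSON lines batch by batch instead of materializing the
    # whole result (or a DataFrame, or one giant JSON string) in memory.
    conn = snowflake.connector.connect(
        account="...", user="...", password="...", warehouse="..."  # placeholders
    )
    try:
        cur = conn.cursor(DictCursor)
        cur.execute(sql, params)
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            for row in batch:
                yield json.dumps(row, default=str) + "\n"
    finally:
        conn.close()

@app.get("/users")
def list_users(after_id: str = ""):
    # Hypothetical keyset query; FastAPI iterates this sync generator
    # in a worker thread, so the event loop is not blocked.
    sql = "SELECT * FROM users WHERE user_id > %s ORDER BY user_id LIMIT 1000"
    return StreamingResponse(row_stream(sql, (after_id,)),
                             media_type="application/x-ndjson")
```

Nothing here ever holds the full 10 MB result, its DataFrame copy, or the serialized JSON string in memory at once.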