All Questions
Tagged with query-optimization python
139 questions
3
votes
2
answers
164
views
How do I speed up querying my >600 million rows?
My database has about 600 million entries that I want to query (pandas is too slow). This local dbSNP copy contains only rsIDs and genomic positions. I used:
import sqlite3
import gzip
import csv
rsid_db = ...
0
votes
1
answer
30
views
Filtering rows while joining two tables (DataFrames)
Currently I have two tables, let's say symbol_data and cve_data.
symbol_data is structured as below:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ...
0
votes
0
answers
143
views
How to implement join order optimization using dynamic programming in Python?
I am working on a project to optimize SQL queries by determining the optimal join order using dynamic programming in Python.
Context
I am connecting to a PostgreSQL database and need to calculate the ...
0
votes
0
answers
27
views
How to do bulk upload in particular key value for json field in django
I need to update the value of one particular key in a JSON field without affecting or overwriting the other fields.
variant_to_update = master_models.Variant.objects.filter( trust_markers__contains = {...
0
votes
1
answer
59
views
Why is subtracting pandas.timedelta from pandas.date not vectorized?
Running this on pandas 2.2, I see a PerformanceWarning only when subtracting a timedelta from a date, not when subtracting a timedelta from a datetime:
import pandas as pd
s1 = pd.DataFrame({"year"...
-1
votes
2
answers
199
views
Swiss Scheduling System for Ping-Pong league
I am working on scheduling 8 showcases for the year. I have a DataFrame for each showcase containing each player's rank and the conference they come from. Each showcase has a different number of players ...
1
vote
0
answers
67
views
Efficient Django query for model with foreign key to itself
I'm building a simple blog post website using Django. Basically there are two models: Post and Comment, where the comment model looks like this:
class Comment:
post = models.ForeignKey(Post, ...
0
votes
0
answers
244
views
Parallelizing Spark's Pandas API Operations
Spark's Pandas API allows for Pandas functions to be performed on top of a Spark dataframe that looks and behaves like a Pandas Dataframe. Pandas has functions that Spark does not have implementations ...
0
votes
2
answers
177
views
What is a Plain Text Translation of this SQL Query?
My SQL knowledge is really basic, and I need some help translating this rather long query (run from a Python script against an AWS Athena database, with some f-strings embedded) into plain English ...
2
votes
2
answers
171
views
Using index better than sequential scan when every hundredth row is needed, but only with explicit list of values
I have a table (under RDS Postgres v. 15.4 instance db.m7g.large):
CREATE TABLE MyTable (
    content_id integer,
    part integer,
    vector "char"[]
);
There is a B-Tree index on ...
1
vote
2
answers
121
views
How to address N+1 problem in django with prefetch related?
With the following code I am getting N queries, one per loop iteration. How can I avoid that?
I tried using prefetch_related, but that didn't work. Or am I doing it the wrong way?
models
class ...
0
votes
0
answers
1k
views
Optimizing INSERT and Data Retrieval Performance in DuckDB with Large Datasets
I am working on a project involving large datasets, and I am utilizing DuckDB to manage my data. I am encountering performance issues when executing INSERT operations compared to SELECT operations, ...
1
vote
1
answer
74
views
Setting multiple variables from one pandas dataframe.loc request
I'm writing a web app for the end-of-year laptop return to make our lives easier. I have a .csv file containing device serial numbers and the associated user's first and last name, user code, ...etc. ...
0
votes
1
answer
57
views
Improve MongoDB query in Python
I have a MongoDB database containing hiking trails that I visualize in a Jupyter Notebook. Each trail has a start and a destination and includes a timestamp of when it was updated. So there are multiple ...
0
votes
1
answer
357
views
Django prefetch related used inside a serializer method making too many queries. How to use prefetch related to reduce queries in following case?
While investigating why some APIs take too long to respond, I found that the number of DB queries grows as the queryset grows. In Django, select_related and prefetch_related ...