All Questions

3 votes
2 answers
164 views

How do I speed up querying my more than 600 million rows?

My database has about 600 million entries that I want to query (pandas is too slow). This local dbSNP contains only rsIDs and genomic positions. I used: import sqlite3 import gzip import csv rsid_db = ...
gernophil • 619
0 votes
1 answer
30 views

Filtering Rows While Joining Two Tables (DataFrames)

Currently I have two tables, let's say symbol_data and cve_data. symbol_data is structured as below: # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ...
fatih • 1
0 votes
0 answers
143 views

How to implement join order optimization using dynamic programming in Python?

I am working on a project to optimize SQL queries by determining the optimal join order using dynamic programming in Python. Context: I am connecting to a PostgreSQL database and need to calculate the ...
user25210056
0 votes
0 answers
27 views

How to bulk update a particular key's value in a JSON field in Django

I need to update a particular key's value in a JSON field without affecting or overwriting the other fields. variant_to_update = master_models.Variant.objects.filter( trust_markers__contains = {...
Ayush Kumar
0 votes
1 answer
59 views

Why is subtracting pandas.timedelta from pandas.date not vectorized?

Running this on pandas 2.2, I see a PerformanceWarning only when subtracting a timedelta from a date, but not when subtracting a timedelta from a datetime: import pandas as pd s1 = pd.DataFrame({"year"...
MinaMirz
-1 votes
2 answers
199 views

Swiss Scheduling System for a Ping-Pong League

I am working on scheduling 8 showcases for the year. I have a dataframe for each showcase that contains each player's rank and the conference they are from. Each showcase has a different number of players ...
WeAreChatGPT
1 vote
0 answers
67 views

Efficient Django query for model with foreign key to itself

I'm building a simple blog website using Django. There are two models, Post and Comment, where the Comment model looks like this: class Comment: post = models.ForeignKey(Post, ...
The 2nd • 123
0 votes
0 answers
244 views

Parallelizing Spark's Pandas API Operations

Spark's Pandas API allows pandas functions to be run on a Spark dataframe that looks and behaves like a pandas DataFrame. Pandas has functions that Spark does not have implementations ...
Brian Anderson
0 votes
2 answers
177 views

What is a Plain Text Translation of this SQL Query?

My SQL knowledge is really basic, and I need some help translating this rather long query (run from a Python script against an AWS Athena database, with some f-strings embedded) into plain English ...
Della • 1,668
2 votes
2 answers
171 views

Using index better than sequential scan when every hundredth row is needed, but only with explicit list of values

I have a table (on an RDS Postgres v. 15.4 instance, db.m7g.large): CREATE TABLE MyTable ( content_id integer, part integer, vector "char"[] ); There is a B-Tree index on ...
AlwaysLearning
1 vote
2 answers
121 views

How to address the N+1 problem in Django with prefetch_related?

With the following code I am getting N queries, one per iteration of the loop. How can I avoid that? I tried using prefetch_related, but that didn't work. Or am I doing it the wrong way? models class ...
D_P • 862
0 votes
0 answers
1k views

Optimizing INSERT and Data Retrieval Performance in DuckDB with Large Datasets

I am working on a project involving large datasets and am using DuckDB to manage my data. INSERT operations perform poorly compared to SELECT operations, ...
Fabio G.
1 vote
1 answer
74 views

Setting multiple variables from one pandas DataFrame.loc request

I'm writing a web app for the end-of-year laptop return to make our lives easier. I have a .csv file containing device serial numbers and the associated user's first and last name, user code, ...etc. ...
Scobbo • 63
0 votes
1 answer
57 views

Improve MongoDB query in Python

I have a MongoDB database of hiking trails that I visualize in a Jupyter Notebook. Each trail has a start and a destination and includes a timestamp of when it was updated. So there are multiple ...
Robbert • 139
0 votes
1 answer
357 views

Django prefetch_related used inside a serializer method makes too many queries. How can I use prefetch_related to reduce the queries in the following case?

While investigating why some APIs take too long to respond, I found that the number of DB queries increases as the queryset grows. In Django select related and prefetch related ...
Aswany Mahendran
