
Questions tagged [etl]

Extract, Transform, Load - process in a database

-1 votes
2 answers
195 views

There is an ETL-like task of importing data from CSV into the database in a project with a legacy codebase and a legacy database. Data should be validated before being persisted to the database. Validation includes ...
Rui
  • 1,935
7 votes
5 answers
455 views

We have this architecture: queue -> message processor (horizontal scaling) -> RDBMS. Sometimes external systems dump 10k messages onto the queue and the message processor of course dutifully ...
jcollum
  • 229
-1 votes
1 answer
229 views

I have a wide CSV file of about 350 MB, and want to load it into a SQL database and properly model the data to make it easier to use for analysis. I could split the data into tables with Python and ...
HappilyCoding
0 votes
3 answers
278 views

I'm a software developer and new to data engineering, so this may be a newbie question, but I'm wondering why data integrity checks (for instance, dbt tests) are run on the data warehouse, rather than ...
samdouble
  • 253
1 vote
1 answer
235 views

I need to build a data pipeline to populate a database from various files. This is a common scenario. However, I want expert opinions on implementing a pipeline that is robust, modular and ...
Imtiaz
  • 23
0 votes
1 answer
98 views

I have an import that needs to grab data from a REST service and import it into a web store. It's basically an ETL type of service, but because the REST service can be slow and I don't want to call it ...
user204588
1 vote
1 answer
531 views

In our project we are using Django and Django Rest Framework as the main application to query data from the database and send it to the frontend. Those endpoints are very fast, as they should be. ...
Alex T
  • 161
2 votes
2 answers
619 views

I have a Django application of 2 GB running and I need to receive a CSV file of more than 1 GB, read it and load the data to a PostgreSQL DB in IBM Cloud. The problem is that if I receive the file, it ...
Elvin Quero
0 votes
1 answer
90 views

I have a folder called logs which contains N folders. Each folder contains events for a specific event type, and each folder has N .log files where each file has multiple ...
Sriram R
-1 votes
1 answer
696 views

What are the pros and cons of using an agile/iterative approach in the development of ETL/ELT (Extract Transform Load or Extract Load Transform) data warehouse, data lake, and lakehouse systems? I often find that ...
Eugene Lycenok
-2 votes
2 answers
495 views

I have thousands of .csv files with the same structure and, in most cases, the same column values recur across files. Each file represents a report on some structures, with numeric ...
BoardsOfConsulting
4 votes
2 answers
127 views

I'm trying to think of a scalable solution for my current system. The current system is: 3 microscopes, 1 processing machine. 1. 60-100 GB files come from 2-3 microscopes every 30 minutes. 2. That data ...
user3145912
-1 votes
1 answer
37 views

I'm developing an ETL process in Python and Pandas to pull data from a REST API, and then dump it into a relational database. A few of the fields that come back contain sensitive data that I do not want to ...
einkleindatagal
1 vote
2 answers
407 views

We receive product data from vendors on a regular basis to be incorporated into our catalog. The data looks like this: [ { id: 123, collection: Spring, name: New Beginnings, size: 8, price:...
user2468842
0 votes
1 answer
53 views

I am looking for general guidelines on allocating tablespace quota to the different layers/schemas in the ETL flow of a data warehouse (% of total space in each layer). As per my research, ETL flow can ...
Curious_Mind
