
Questions tagged [big-data]

2 votes
1 answer
310 views

I have a web application that contains products and users. There are 10,000+ products and 100,000+ users, to give a sense of the scale that's required. For some application-specific reasons, I need to ...
kitkat's user avatar
  • 29
0 votes
1 answer
124 views

I want to collect a lot of files (file data + metadata) from local servers to a central server. The files are important, so I need to ensure that none are lost. Local servers: implement a collector to ...
kietheros's user avatar
  • 117
3 votes
1 answer
1k views

We have an application producing 5k-10k datapoints per second. Each datapoint has more than one metric, alongside its time of creation. We are looking for an efficient, scalable way to store this huge ...
Paul Benn's user avatar
  • 147
5 votes
1 answer
1k views

I have around 125 million event records on s3. The s3 bucket structure is: year/month/day/hour/*. Inside each hour directory, we have files for every minute. A typical filename looks like this: ...
Namah's user avatar
  • 61
1 vote
0 answers
485 views

Note: All of this would be in AWS. Hi everyone, what would you suggest for building something that: takes in several different input file types (e.g. csv, json, jsonl, xml, .gz, ...); that can be ...
0 votes
0 answers
70 views

I want to create an aggregation job that executes a big DB query and flushes the result into BigQuery. My question is: should I include only the IDs of the entities (campaign id, advertiser id, user id) or should ...
Avi L's user avatar
  • 109
0 votes
1 answer
106 views

I have mainly three groups of CSV files (each file is divided into several small files): The first group of CSV files is 600+ GB in total (maybe 200+ GB if stored as int, because CSV size is counted per character, right?), ...
heisthere's user avatar
  • 101
2 votes
2 answers
3k views

I have an existing production Oracle Database. However, there are performance issues for certain kinds of operations, because of the volume of the data or the complexity of the queries. That's why I ...
Klun's user avatar
  • 31
1 vote
1 answer
956 views

I have a general question about loading data into a data warehouse (DW). This is basically a follow-up to an older question of mine. I have trouble understanding how to fill the [Date] ...
Steffen Mangold's user avatar
3 votes
2 answers
183 views

I have a general question about design patterns for an enterprise application. I have read a lot about it, but it's actually hard to find an answer because most of what you find is rather about how to design a data ...
Steffen Mangold's user avatar
3 votes
2 answers
2k views

I have an eCommerce-like system which produces 5000 user events (of different kinds, like product search/product view/profile view) per second. Now, for reporting, business users would like to view the ...
M Sach's user avatar
  • 267
1 vote
3 answers
318 views

I have 30-ish million HTML documents in a file system. There is no emergency: the files are in a reasonable directory tree, and it's not breaking the file system. But I'd like to be able to organize and ...
Martin K's user avatar
  • 2,947
0 votes
0 answers
94 views

I have to expose some sensitive data containing a PII column that holds a 25-digit number. The rest of the columns aren't PII. This is done so that the data can be safely shared with the larger ...
stormfield's user avatar
2 votes
0 answers
37 views

I have a reporting system which gets time-series data from numerous meters (referred to here as raw_data). I need to generate several reports based on different combinations of the incoming ...
Remis Haroon - رامز's user avatar
2 votes
2 answers
1k views

How do you design a website that allows users to query a large amount of user data? More specifically: there are ~100 million users with ~100 TB of data; the data is stored in HDFS (not a database); number ...
Minh Thai's user avatar
  • 141
