- My computer has 64 cores
- Microsoft SQL Server Data Tools 16.0.62007.23150 installed
- I have a 500 Mb/s SSD for the moment
One initial question: Which SQL version would be best for 64 cores?
I am new to SQL databases and have understood that how you structure the database matters for how fast you can later search and extract the information you need (queries).
All data will be used only for CPU calculations and will not be displayed visually in any dataGridView, report etc. The data will be used for artificial intelligence/random forest.
I believe I have also understood that using data types that take up less memory is good for speed later on, for example using a smallint instead of an int whenever a smallint is large enough.
I would like to ask whether the structure I have in mind is well designed for extracting information later, or whether I should do this differently. The database will store stock symbol data and, as I have noticed, it will be extremely big, which is the reason for this question.
This is the whole structure that I have in mind (an example comes after the explanation):
- I will use 4 columns. (DateTime|Symbol|FeatureNr|Value)
- DateTime has a format down to the minute: 201012051545
- Symbol and FeatureNr are smallint. For example: MSFT = 1, IBM = 2, AAPL = 3. So instead of using strings in the columns, I have put smallint values that represent the symbols/featureNr, so that search queries go faster later.
- The database will, for example, have 50 symbols where each symbol has 5000 features.
- The database will have 15 years of data.
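To make the encoding in the list above concrete, here is a minimal Python sketch of how one row could be built. It assumes the 12-digit stamp means year, month, day, hour, minute (so 201012051545 is 2010-12-05 15:45); the names `SYMBOL_IDS`, `parse_minute_stamp` and `make_row` are only illustrative and not part of any real setup.

```python
from datetime import datetime

# Illustrative symbol-to-id mapping, using the ids from the example above.
SYMBOL_IDS = {"MSFT": 1, "IBM": 2, "AAPL": 3}

# Upper bound of a signed 16-bit integer (SQL Server smallint).
SMALLINT_MAX = 32767


def parse_minute_stamp(stamp: str) -> datetime:
    """Parse a 12-digit stamp, assumed to be yyyyMMddHHmm."""
    return datetime.strptime(stamp, "%Y%m%d%H%M")


def make_row(stamp: str, symbol: str, feature_nr: int, value: float) -> tuple:
    """Build one (DateTime, Symbol, FeatureNr, Value) row as a tuple."""
    if not 1 <= feature_nr <= SMALLINT_MAX:
        raise ValueError(f"featureNr {feature_nr} does not fit in a smallint")
    return (parse_minute_stamp(stamp), SYMBOL_IDS[symbol], feature_nr, float(value))


row = make_row("201012051545", "IBM", 1, 75.123456789)
print(row)
```

Both 50 symbols and 5000 features fit comfortably in a smallint, which is why the range check only guards against accidental overflow.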
Now I have a few big questions:
If we fill this database with data for just 1 symbol, it will have this many rows:
1440 minutes(1 day) * 365 days * 15 years * 5000 features = 39,420,000,000
Question 1:
39,420,000,000 rows in a database seems like a lot. Is this a problem or not?
Question 2:
The above was just for 1 symbol. I have 50 symbols, which would mean:
39,420,000,000 * 50 = 1,971,000,000,000 rows.
I don't know what to say about this. Is this too many rows or is it okay? Should I have 1 database per symbol, for example, and not all 50 symbols in one database?
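The row counts in Questions 1 and 2 can be double-checked with a few lines of Python. The sketch below also gives a rough lower bound on raw data size, assuming only the fixed storage sizes of the column types in my layout (smalldatetime = 4 bytes, smallint = 2 bytes, float(53) = 8 bytes). It ignores per-row overhead, indexes and compression, so the real size on disk will differ.

```python
MINUTES_PER_DAY = 1440
DAYS_PER_YEAR = 365
YEARS = 15
FEATURES = 5000
SYMBOLS = 50

# Assumption: column payload only, no row header, index or page overhead.
# smalldatetime (4) + smallint (2) + smallint (2) + float(53) (8)
BYTES_PER_ROW = 4 + 2 + 2 + 8

timestamps_per_symbol = MINUTES_PER_DAY * DAYS_PER_YEAR * YEARS
rows_per_symbol = timestamps_per_symbol * FEATURES
rows_total = rows_per_symbol * SYMBOLS
raw_bytes_total = rows_total * BYTES_PER_ROW

print(f"timestamps per symbol: {timestamps_per_symbol:,}")  # 7,884,000
print(f"rows per symbol:       {rows_per_symbol:,}")        # 39,420,000,000
print(f"rows for all symbols:  {rows_total:,}")             # 1,971,000,000,000
print(f"raw column data:       {raw_bytes_total / 1e12:.1f} TB")  # 31.5 TB
```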
Question 3:
Leaving aside how many rows there are in the database: do you think the database is well structured for fast search queries? What I will ALWAYS search for is one symbol ONLY at one specific datetime, and this will return 5000 rows (features).
I will always do this exact search, and never any other type of search, so I would welcome any idea on how I should best structure the database with those 50 stock symbols.
I will need all 5000 rows/features, where each row is a feature that needs to be fed to the random forest algorithm. This means that each symbol has 5000 features/values at each update, such as 201012051546.
As in Question 2: should I have one table per symbol? Would this result in faster searches, for example?
The search is (symbol = 2, smalldatetime = 201012051546), where I want to return the featureNr and value, which would be the lines below (I will ALWAYS ONLY do this exact search):
201012051546 | 2 | 1 | 76.123456789
201012051546 | 2 | 2 | 76.123456789
201012051546 | 2 | 3 | 76.123456789
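To show the exact search in runnable form, here is a minimal sketch using Python's built-in sqlite3 module as a small stand-in. It is not SQL Server and says nothing about performance at this scale. The table name `features`, the column names `ts` and `feature_nr`, the helper `fetch_features`, and the key order (symbol, ts, feature_nr) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Four columns as in my layout; the composite key order is an assumption.
conn.execute(
    """
    CREATE TABLE features (
        ts         INTEGER NOT NULL,  -- minute stamp, e.g. 201012051546
        symbol     INTEGER NOT NULL,  -- e.g. IBM = 2
        feature_nr INTEGER NOT NULL,
        value      REAL    NOT NULL,
        PRIMARY KEY (symbol, ts, feature_nr)
    )
    """
)

# A few of the example rows, plus rows that must NOT be returned.
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [
        (201012051545, 2, 1, 75.123456789),
        (201012051546, 2, 1, 76.123456789),
        (201012051546, 2, 2, 76.123456789),
        (201012051546, 2, 3, 76.123456789),
        (201012051546, 3, 1, 86.123456789),
    ],
)


def fetch_features(connection, symbol, ts):
    """Return (feature_nr, value) pairs for one symbol at one minute."""
    cursor = connection.execute(
        "SELECT feature_nr, value FROM features "
        "WHERE symbol = ? AND ts = ? ORDER BY feature_nr",
        (symbol, ts),
    )
    return cursor.fetchall()


result = fetch_features(conn, 2, 201012051546)
print(result)
```

With the real data this query would return 5000 pairs instead of 3.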
Question 4:
Wouldn't it be optimal to have 1 table for each symbol and datetime?
In other words: 1 table for symbol = 2 and smalldatetime 1546 which holds 5000 rows of features and then do this for each symbol and datetime?
This will result in 7,884,000 tables per symbol.
Or is this a bad idea for some other reason? Note that I will later need to retrieve, in a loop, all features (5000 per table) from all those tables (7,884,000 tables), and it is very important that this goes as fast as possible. I know it might be difficult to say, but approximately how long could a process like this take with my structure on a 64-core computer?
1440 minutes(1 day) * 365 days * 15 years = 7,884,000 tables per symbol
My idea for the database/table structure:
smalldatetime | symbol (smallint) | featureNr (smallint) | value (float(53))
201012051545 | 1 | 1 | 65.123456789
201012051546 | 1 | 1 | 66.123456789
201012051547 | 1 | 1 | 67.123456789
201012051545 | 1 | 2 | 65.123456789
201012051546 | 1 | 2 | 66.123456789
201012051547 | 1 | 2 | 67.123456789
201012051545 | 1 | 3 | 65.123456789
201012051546 | 1 | 3 | 66.123456789
201012051547 | 1 | 3 | 67.123456789
201012051545 | 2 | 1 | 75.123456789
201012051546 | 2 | 1 | 76.123456789
201012051547 | 2 | 1 | 77.123456789
201012051545 | 2 | 2 | 75.123456789
201012051546 | 2 | 2 | 76.123456789
201012051547 | 2 | 2 | 77.123456789
201012051545 | 2 | 3 | 75.123456789
201012051546 | 2 | 3 | 76.123456789
201012051547 | 2 | 3 | 77.123456789
201012051545 | 3 | 1 | 85.123456789
201012051546 | 3 | 1 | 86.123456789
201012051547 | 3 | 1 | 87.123456789
201012051545 | 3 | 2 | 85.123456789
201012051546 | 3 | 2 | 86.123456789
201012051547 | 3 | 2 | 87.123456789
201012051545 | 3 | 3 | 85.123456789
201012051546 | 3 | 3 | 86.123456789
201012051547 | 3 | 3 | 87.123456789
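Since every returned row is one feature for the random forest, this is roughly how I picture turning the (featureNr, value) pairs of one search into a single input vector. It is only a sketch: `to_feature_vector` is an illustrative name, and it assumes featureNr runs from 1 to the number of features with no gaps.

```python
def to_feature_vector(pairs, n_features):
    """Place each value at index featureNr - 1 and fail on missing features."""
    vector = [None] * n_features
    for feature_nr, value in pairs:
        vector[feature_nr - 1] = value
    missing = [i + 1 for i, v in enumerate(vector) if v is None]
    if missing:
        raise ValueError(f"missing featureNr values: {missing}")
    return vector


# Rows for symbol = 2 at 201012051546 from the example table (3 features
# here instead of 5000), deliberately given out of order.
pairs = [(3, 76.123456789), (1, 76.123456789), (2, 76.123456789)]
vector = to_feature_vector(pairs, n_features=3)
print(vector)
```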