We use RDBMS as another name for SQL DBMS but SQL isn't relational. That isn't news, see this web site and book. SQL allows for but doesn't require relational and 1NF or 3NF are optional. JSON is in the SQL:2106 spec. What would Codd think?
Using Oracle as a SQL DBMS example. First there was support for collection data types, then there was XML and eventually JSON arrived. These let you violate 1NF. I won't argue whether these should be used. I only claim they can be used.
Have there been surveys to document how often the relational approach is used with a SQL DBMS? I assume it is better to think of a distribution of approaches (a value between 0 and 1 where 0 is SQL and 1 is relational) rather than a binary approach of relational vs SQL (not relational). I might call the SQL endpoint the pragmatic approach, but that introduces bias. While I have spent a long time working on SQL DBMS I am usually working under the hood and don't design applications.
Showing posts with label rant. Show all posts
Showing posts with label rant. Show all posts
Wednesday, March 4, 2020
Tuesday, February 11, 2020
Websites that report your browsing to Facebook
There are many things I like about the Facebook app but recording my browsing history via Off Facebook Activity isn't one of them. I know that Google and Microsoft also have the ability to collect my browsing history but I assume they do it directly via their browsers (Edge, Chrome) and I can easily disable that by using Firefox or logging out when using their browser. With Facebook I don't have that option. I can and do opt-out but most users won't.
Facebook doesn't sell your data, data brokers do that. That is worthy of attention but off topic here.
Facebook has collected call logs in the past. It still collects contacts, as do many other apps. This rant is only about the collection of browsing history. I think it is creepy. I hope one day that it becomes illegal. All of this data (call logs, address book, browsing history) is high-risk as in there are external parties (lawyers, insurance companies, government agencies) that would love to access that data.
I get why websites send data back to Facebook (FB). It helps them buy ads targeting FB users who visited their website. I get why FB does this. It helps them sell targeted ads. Neither of these benefit the user.
I don't want websites for hospitals and disease advocacy to report visits to FB. But some do. I hope the law changes to prevent this but GDPR and CCPA have not stopped it.
Who does this?
The short answer is many but not all web sites. I did an experiment and visited web sites that are relevant to me to see who reported my visits to FB. I was happy to learn that some web sites did not.
Some results are ironic.
Some results are ironic, but less than above. Progressive advocacy websites that report my visits to FB include actblue.com, emilyslist.org, citizensforethics.org, motherjones.com, hrw.org, amnesty.org, greenpeace.org, democrats.org, prochoiceamerica.org, Planned Parenthood Action Fund and thinkprogress.org. I am happy, but not surprised, to learn that eff.org doesn't report visits to FB.
Facebook doesn't sell your data, data brokers do that. That is worthy of attention but off topic here.
Facebook has collected call logs in the past. It still collects contacts, as do many other apps. This rant is only about the collection of browsing history. I think it is creepy. I hope one day that it becomes illegal. All of this data (call logs, address book, browsing history) is high-risk as in there are external parties (lawyers, insurance companies, government agencies) that would love to access that data.
I get why websites send data back to Facebook (FB). It helps them buy ads targeting FB users who visited their website. I get why FB does this. It helps them sell targeted ads. Neither of these benefit the user.
I don't want websites for hospitals and disease advocacy to report visits to FB. But some do. I hope the law changes to prevent this but GDPR and CCPA have not stopped it.
Who does this?
The short answer is many but not all web sites. I did an experiment and visited web sites that are relevant to me to see who reported my visits to FB. I was happy to learn that some web sites did not.
Some results are ironic.
- Progressive politicians who are frequent critics on FB privacy are happy to report web site visits to FB including elizabethwarren.com, berniesanders.com, amyklobuchar.com and Ocasio2018.com. Note that Ocasio2018.com redirects to gigalixrapp which is the site that reported my visit. Mayor Pete also reports to FB from peteforamerica.com but I don't recall whether he has been a privacy critic.
- The Guardian has been a source of many "breaking" stories on FB privacy including one on this topic. The Guardian also reports my visits to FB.
Some results are ironic, but less than above. Progressive advocacy websites that report my visits to FB include actblue.com, emilyslist.org, citizensforethics.org, motherjones.com, hrw.org, amnesty.org, greenpeace.org, democrats.org, prochoiceamerica.org, Planned Parenthood Action Fund and thinkprogress.org. I am happy, but not surprised, to learn that eff.org doesn't report visits to FB.
Some results made me sad. Web sites that provide health care services should not report my visits to FB. I prefer that disease advocacy web sites not report visits. The sites listed here are relevant to me. I have visited in the past to make donations and because I was a caregiver for someone with cancer. All of them reported my visits to FB:
- Planned Parenthood does advocacy and is a portal for health care services
- MD Anderson is one of the top cancer hospitals in the world.
- Others that reported visits include Seattle Cancer Care Alliance, The American Cancer Society, Leukemia and Lymphoma Society and the Canadian Cancer Society
I won't list them here but many addiction recovery businesses report visits to FB and so do advocacy web sites for other diseases.
Friday, January 31, 2020
Copyleft vs the DeWitt Clause
There is recent benchmarketing drama between AWS and Microsoft.
Section 1.8 of the AWS service terms includes:
I hope David DeWitt doesn't mind the attention that the DeWitt Clause receives. He has done remarkable database research that has generated so much -- great PhD topics, better DBMS products, a larger CS department at UW-Madison and many jobs. But he is also famous for the DeWitt Clause.
Section 1.8 of the AWS service terms includes:
(ii) agree that we may perform and disclose the results of Benchmarks of your products or services, irrespective of any restrictions on Benchmarks in the terms governing your products or services.Some software includes a DeWitt Clause to prevent users and competitors from publishing benchmark results. I am not a lawyer but wonder if section 1.8 of the AWS service terms allows Amazon to counter with their own benchmark results when their competitors software and services use a DeWitt Clause. This would be similar to the effect of copyleft.
I hope David DeWitt doesn't mind the attention that the DeWitt Clause receives. He has done remarkable database research that has generated so much -- great PhD topics, better DBMS products, a larger CS department at UW-Madison and many jobs. But he is also famous for the DeWitt Clause.
Sunday, January 12, 2020
480p is my friend - video streaming when you don't have fiber
I live in a rural area and doubt I will ever get fiber. I have fixed wireless broadband. Maybe low earth orbit satellites will be an option in the future whether that is via Starlink, Amazon or OneWeb.
I get 16Mbps which is shared by 4 people. One user plays online video games and cares about latency. Video streaming can be a challenge as it frequently consumes too much download bandwidth. Upload is usually not an issue except for FaceTime and other video chat apps.
I put video streaming apps into one of 3 classes -- polite, knows better and rude.
I get 16Mbps which is shared by 4 people. One user plays online video games and cares about latency. Video streaming can be a challenge as it frequently consumes too much download bandwidth. Upload is usually not an issue except for FaceTime and other video chat apps.
I put video streaming apps into one of 3 classes -- polite, knows better and rude.
- Polite apps have video quality settings that are sticky. Set the video quality once to 480p and things are good.
- Knows better apps know better. Set video quality to 480p today and it resets to auto the next time you use the app. Why? Because the apps knows that HD video makes you happy even if you don't have the network bandwidth to support that.
- Rude apps have no video quality settings. They use as much bandwidth as they can get.
To make up for the lack of throttling in many of the video streaming apps I have tried traffic shaping with a Gargoyle router. That is a lot of work, must get MAC addresses for all devices, and the results weren't great. I might have to try it again.
I am not a fan of Auto video quality. Maybe that works great when there are not multiple apps coming and going on the network. But I am skeptical about the ability for it to deal with congestion and I doubt it has any concern for the impact on latency sensitive apps (like online video games).
I am not a fan of Auto video quality. Maybe that works great when there are not multiple apps coming and going on the network. But I am skeptical about the ability for it to deal with congestion and I doubt it has any concern for the impact on latency sensitive apps (like online video games).
Polite apps are great but you still have to select lower quality for apps X users for web and then apps X devices for phone/tablet. Assume you need to revisit this once per year because apps change. Another thing to do is disable autoplay for as many apps as possible. I won't explain how to do that below.
The Apps
Buggy
- Facebook - Video Settings has a Video Default Quality option. AFAIK this setting has no effect for videos in Feed and Watch. Some might call that a bug. I set the option to SD Only but Watch videos start in 720p or 1080p and Feed videos use a variety, some are HD.
Polite
- YouTube on a browser and iOS
- YouTube TV on a browser
- Apple TV - in Settings -> Apps -> iTunes Movies and TV Shows see Limit Purchases and Rentals to SD
- iTunes - purchase movies and TV shows in SD
- Netflix on iOS - go to More -> App Settings and set Video Quality to Standard
- Netflix on web - go to Account -> Playback settings
- Hulu on iOS - go to Account -> Settings -> Cellular Data Usage and choose Data Saver
- Hulu on web - start a video, click on the gear -> Quality -> Data Saver
- Amazon Prime on iOS - select Settings -> Streaming & Downloading
- Amazon Prime on web - start a video, click the gear
- Zoom - defaults to non-HD on a Mac (hope others follow this trend)
- Twitter on web - select Settings -> Data Usage -> Data Saver. The setting doesn't mention whether it changes video quality. While there disable autoplay as well.
- Twitter on iOS - select Settings -> Data usage to disable autoplay and select lower quality videos and images
Knows Better
- YouTube TV on iOS
Rude
- FaceTime
- Chromecast - the only choice is Auto. If you are watching something in 480p on a browser and then start casting the device will try for higher quality.
Unknown
- Snapchat, Instagram, TikTok - I will figure this out when my kids get home
- Facebook Video Chat
Friday, October 25, 2019
Nitpicking papers in the LSM space
Nits:
- LSM compaction does merge, not merge sort. Compaction inputs are already sorted. Merge sort is O(NlogN) for N records. Naive merge is O(NlogK) for N records and K input streams and in some cases RocksDB can do better than naive merge -- see the reference to optimized binary heap. Compaction merges for RocksDB with leveled compaction have two forms -- L0 to L1 and Ln to Ln+1. For L0 to L1 there might be ~4 streams from the L0 and 1 stream from the L1 and size(L1) is usually >= size(L0). For Ln to Ln+1 there is one stream from each and size(Ln+1) is approximately 10 * size(Ln).
- LSMs do large random writes, not sequential writes. Writes are sequential from perspective of a single-file but a busy LSM does compaction and memtable flush concurrently. So there are concurrent reads and writes from perspective of device -- each stream of reads and writes is sequential but they are interleaved at the device level.
- The statement write amplification is approximately 10 * num-levels is some of: the worst case, a hypothesis, an unvalidated performance model. I am a frequent user of unvalidated performance models but they have their limits. There is an amazing paper that measures write-amp in practice and then provides a brilliant explanation. I wish there were more reporting of write-amp from production workloads. LevelDB overstates write-amp because it uses too many levels because the L0 and L1 are small (~10mb).
- It is great to read about new index structures that don't support range query in pursuit of less write-amp. If a workload doesn't need range query then maybe an LSM isn't the best choice. The other reason to use an LSM with leveled compaction is low space-amp, so I hope that some alternatives consider ways to keep space-amp low while also getting low write-amp.
- Consider CPU and IO overhead. That is there are IO and CPU components for read and write amplification. IO gets a lot of attention. CPU needs more attention.
- Benchmarking is hard. See here for an example of a horrible mistake I made which lead to many bogus reports of lousy MySQL performance regressions. I explained some of the problems for LevelDB and RocksDB benchmarks here and here. LevelDB doesn't target high performance, RocksDB is hard to configure, papers rarely provide enough details to support reproduction and even if they did nobody is volunteering their time to reproduce results. I assume that benchmark result accuracy is inversely related to the number of systems that have been tested for a given paper -- expertise is in short supply. My offer for advice still stands. I have read about possible programs that would volunteer grad students to reproduce results. While I think that would be a great education, I wonder if it would advance their academic careers.
Sunday, January 20, 2019
Bugs in Windows 10 parental controls
I use Windows 10 parental controls with my two children. Sometimes I am surprised at the bugs I encounter, but I can't rant too much because of glass houses and stones. My old favorite was that a hard reset before the time limit reached zero allowed my clever child to get more time. Apparently Microsoft takes storage efficiency very seriously and didn't want to waste a disk write and/or fsync on persisting the usage counter every few minutes. I haven't tried to reproduce this recently but never heard back after filing a bug report.
Now I have a new favorite bug. I am 5 hours behind their timezone and granted another hour to my daughter. It is 4pm here and 9pm there. The landing page after granting the time tells me my child can use the computer until 5pm (my timezone). Child tries to login and immediately encounters the timeout dialog. Apparently timezones are a hard problem. But less screen time is a good thing.
Now I have a new favorite bug. I am 5 hours behind their timezone and granted another hour to my daughter. It is 4pm here and 9pm there. The landing page after granting the time tells me my child can use the computer until 5pm (my timezone). Child tries to login and immediately encounters the timeout dialog. Apparently timezones are a hard problem. But less screen time is a good thing.
Monday, January 7, 2019
Define "better"
Welcome to my first rant of 2019, although I have written about this before. While I enjoy benchmarketing from a distance it is not much fun to be in the middle of it. The RocksDB project has been successful and thus becomes the base case for products and research claiming that something else is better. While I have no doubt that other things can be better I am wary about the definition of better.
There are at least 3 ways to define better when evaluating database performance. The first, faster is better, ignores efficiency, the last two do not. I'd rather not ignore efficiency. The marginal return of X more QPS eventually becomes zero while the benefit of using less hardware is usually greater than zero.
There are at least 3 ways to define better when evaluating database performance. The first, faster is better, ignores efficiency, the last two do not. I'd rather not ignore efficiency. The marginal return of X more QPS eventually becomes zero while the benefit of using less hardware is usually greater than zero.
- Optimize for throughput and ignore efficiency (faster is better)
- Get good enough performance and then optimize for efficiency
- Get good enough efficiency and then optimize for throughput
Call to action
I forgot to include this before publishing. Whether #1, #2 or #3 is followed I hope that more performance results include details on the HW consumed to create that performance. How much memory and disk space were used? What was the CPU utilization? How many bytes were read from and written to storage? How much random IO was used? I try to report both absolute and relative values where relative values are normalized by the transaction rate.
Wednesday, July 25, 2018
Default options in MyRocks
We need to make MyRocks easier to configure -- this isn't a new idea. If you are using MyRocks with default options in mid-2018 then you are probably not using bloom filters, compression or my favorite compaction policy.
You can fix all of that by setting rocksdb_default_cf_options. I wish this were the default.
To set rocksdb_default_cf_options but disable compression use:
Some information on my.cnf for MyRocks is here and much more info on RocksDB is on the wiki. If you want to tune, then start with the tuning guide.
Another potential problem with the default configuration is that rocksdb_max_background_jobs=2 so there are 2 threads for compaction, you usually want more
Finally, there were some changes to MyRocks to make it better at avoiding too many tombstones and it looks like that is disabled by default:
If you are only using MyRocks then you might want to set transaction-isolation=READ-COMMITTED because repeatable-read in MyRocks doesn't use gap locks (yet).
Finally, you probably want to set rocksdb_block_cache_size.
You can fix all of that by setting rocksdb_default_cf_options. I wish this were the default.
rocksdb_default_cf_options=The above will enable the default compression type for all levels of the LSM tree which is Snappy in a recent MyRocks build with FB MySQL. But one of the proper distros only provides zlib and doing that for the small levels in the LSM tree (L0, L1, L2) might slow down compaction too much.block_based_table_factory={ cache_index_and_filter_blocks= 1;filter_policy=bloomfilter: 10:false;whole_key_filtering= 1};level_compaction_dynamic_ level_bytes=true;optimize_ filters_for_hits=true; compaction_pri= kMinOverlappingRatio
To set rocksdb_default_cf_options but disable compression use:
rocksdb_default_cf_options=block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio;compression=kNoCompressionTo set rocksdb_default_cf_options and use fast compression for all levels use this after changing $fast to one of kLZ4Compression or kSnappyCompression:
rocksdb_default_cf_options=block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio;compression=$fastAnd to set rocksdb_default_cf_options with a fast compression configuration (no compression for smallest levels, fast compression for mid levels, slower compression for max level) try this after changing $fast and $slow to the appropriate values:
rocksdb_default_cf_options=block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio;compression_per_level=kNoCompression:kNoCompression:kNoCompression:kCompression:$fast:$fast;bottommost_compression=$slowYou can determine which compression libraries are supported by you MyRocks build by looking in $datadir/.rocksdb/LOG. This is from an FB MySQL build that I use:
Compression algorithms supported:
kZSTDNotFinalCompression supported: 1
kZSTD supported: 1
kXpressCompression supported: 0
kLZ4HCCompression supported: 1
kLZ4Compression supported: 1
kBZip2Compression supported: 1
kZlibCompression supported: 1
kSnappyCompression supported: 1
Some information on my.cnf for MyRocks is here and much more info on RocksDB is on the wiki. If you want to tune, then start with the tuning guide.
Another potential problem with the default configuration is that rocksdb_max_background_jobs=2 so there are 2 threads for compaction, you usually want more
Finally, there were some changes to MyRocks to make it better at avoiding too many tombstones and it looks like that is disabled by default:
rocksdb_compaction_sequential_deletes 0rocksdb_compaction_sequential_deletes_count_sd OFFrocksdb_compaction_sequential_deletes_file_size 0rocksdb_compaction_sequential_deletes_window 0Setting rocksdb_max_open_files to -1 can be great for performance but there are side-effects (more open file descriptors, more untracked memory consumption).
If you are only using MyRocks then you might want to set transaction-isolation=READ-COMMITTED because repeatable-read in MyRocks doesn't use gap locks (yet).
Finally, you probably want to set rocksdb_block_cache_size.
Friday, January 12, 2018
XFS, nobarrier and the 4.13 Linux kernel
tl;dr
My day
The Story
I have a NUC cluster at home for performance tests with 3 NUC5i3ryh and 3 NUC7i5bnh. I recently replaced the SSD devices in all of them because previous testing wore them out. I use Ubuntu 16.04 LTS and recently upgraded the kernel on some of them to get the fix for Meltdown.
The NUC7i5bnh server has a Samsung 960 EVO SSD that uses NVMe. I use the HWE kernel to make wireless work. The old kernel without the Meltdown fix is 4.8.0-36 and the kernel with the Meltdown fix is 4.13.0-26. Note that with the old kernel I used XFS with the nobarrier option. With the new kernel I assumed I was still getting nobarrier, but I was not. I have since switched from XFS to ext4.
The NUC5i3ryh server has a Samsung 850 EVO SSD that uses SATA. The old kernel without the Meltdown fix is 4.4.0-38 and the kernel with the Meltdown fix is 4.4.0-109. I continue to use XFS on these.
Results sysbench for NUC5i3ryh show not much regression from the Meltdown fix. Results for the NUC7i5bnh show a lot of regression for the write-heavy tests and not much for the read-heavy tests.
My day
- nobarrier isn't supported as a mount option for XFS in kernel 4.13.0-26 with Ubuntu 16.04. I assume this isn't limited to Ubuntu. Read this for more detail on the change.
- write throughput is much worse on my SSD without nobarrier
- there is no error on the command line when mounting a device that uses the nobarrier option
- there is an error message in dmesg output for this
There might be two workarounds:
- switch from XFS to ext4
- echo "write through" > /sys/block/$device/queue/write_cache
The Story
I have a NUC cluster at home for performance tests with 3 NUC5i3ryh and 3 NUC7i5bnh. I recently replaced the SSD devices in all of them because previous testing wore them out. I use Ubuntu 16.04 LTS and recently upgraded the kernel on some of them to get the fix for Meltdown.
The NUC7i5bnh server has a Samsung 960 EVO SSD that uses NVMe. I use the HWE kernel to make wireless work. The old kernel without the Meltdown fix is 4.8.0-36 and the kernel with the Meltdown fix is 4.13.0-26. Note that with the old kernel I used XFS with the nobarrier option. With the new kernel I assumed I was still getting nobarrier, but I was not. I have since switched from XFS to ext4.
The NUC5i3ryh server has a Samsung 850 EVO SSD that uses SATA. The old kernel without the Meltdown fix is 4.4.0-38 and the kernel with the Meltdown fix is 4.4.0-109. I continue to use XFS on these.
Results sysbench for NUC5i3ryh show not much regression from the Meltdown fix. Results for the NUC7i5bnh show a lot of regression for the write-heavy tests and not much for the read-heavy tests.
- I started to debug the odd 7i5bnh results and noticed that write IO throughput was much lower for servers with the Meltdown fix using 4.13.0-26.
- Then I used sysbench fileio to run IO tests without MySQL and noticed that read IO was fine, but write IO throughput was much worse with the 4.13.0-26 kernel.
- Then I consulted my local experts, Domas Mituzas and Jens Axboe.
- Then I noticed the error message in dmesg output
Tuesday, November 28, 2017
Marketing and the Dunning-Kruger effect
I am wary of user reports that claim product X was lousy for them, then they moved to product Y and everything was awesome. Sometimes this means that product X was lousy -- in general or for their use case. Other times it means the team using product X did a lousy job deploying it. It is hard for the reader to figure this out. It can also be hard for some authors to figure this out thanks to the Dunning-Kruger effect so lousy reports will continue to be published. These reports are not my favorite form of marketing and some of the bad ones linger for years. We deserve better especially in the open-source database market where remarkable progress is being made.
I have written before on benchmarketing. Other posts that mention it are here.
I have written before on benchmarketing. Other posts that mention it are here.
Monday, May 29, 2017
The history of low-concurrency performance regressions in MySQL 5.6, 5.7 and 5.8
Update - the regression isn't as bad as I have been reporting. Read this post to understand why.
What is a reasonable goal for performance regressions between major releases of MySQL (5.6, 5.7, 5.8)? Some regressions are expected because more features means longer code paths. But what if MySQL 5.7 and 5.8 only get 2/3 of the QPS versus MySQL 5.6 at low concurrency?
The low-concurrency performance regressions from MySQL 5.6 to 5.7 continue in MySQL 8. That isn't a surprise for two reasons. First, more features usually make a DBMS slower. Second, all of the performance marketing for new MySQL releases focuses on high-concurrency workloads. One example of the regression occurs for in-memory sysbench where MySQL 5.7 and MySQL 8 get 60% to 70% of the QPS compared to 5.6, which is the same as writing that 5.6 gets 1.43X to 1.67X more QPS than MySQL 5.7 and MySQL 8.
A deployment that I care about uses MySQL 5.6 and we want to use MySQL 8. Alas, we need to reduce the performance regressions for that to happen. I am excited that the community has another chance to make MySQL better because I don't think this will be fixed without us. I have been writing about this problem since 2013. I am also happy to note that the performance schema isn't the problem. I rediscovered this problem when using MySQL 8 on my home test servers and filed bug 86215. While I have published results showing the regression from earlier releases to MySQL 5.6 -- at this point I just want to get back to the low-concurrency QPS we get from MySQL 5.6. I have good memories from MySQL 4.0, 5.0 and 5.1, but I can do without the excitement of running those releases at web-scale.
I first wrote about this problem in 2013 when comparing MySQL 5.6 to previous releases and filed bugs 68825 and 69236. My favorite low-concurrency performance regression will always be bug 29921. I still remember debugging that over a weekend. Parsers are hard to profile when code is generated and everything is in one function. Fortunately I was able to use rdtsc.
Bugs for low-concurrency performance regressions
Bugs filed by me:
Content for low-concurrency performance regressions
Reports from me
The low-concurrency performance regressions from MySQL 5.6 to 5.7 continue in MySQL 8. That isn't a surprise for two reasons. First, more features usually make a DBMS slower. Second, all of the performance marketing for new MySQL releases focuses on high-concurrency workloads. One example of the regression occurs for in-memory sysbench where MySQL 5.7 and MySQL 8 get 60% to 70% of the QPS compared to 5.6, which is the same as writing that 5.6 gets 1.43X to 1.67X more QPS than MySQL 5.7 and MySQL 8.
A deployment that I care about uses MySQL 5.6 and we want to use MySQL 8. Alas, we need to reduce the performance regressions for that to happen. I am excited that the community has another chance to make MySQL better because I don't think this will be fixed without us. I have been writing about this problem since 2013. I am also happy to note that the performance schema isn't the problem. I rediscovered this problem when using MySQL 8 on my home test servers and filed bug 86215. While I have published results showing the regression from earlier releases to MySQL 5.6 -- at this point I just want to get back to the low-concurrency QPS we get from MySQL 5.6. I have good memories from MySQL 4.0, 5.0 and 5.1, but I can do without the excitement of running those releases at web-scale.
I first wrote about this problem in 2013 when comparing MySQL 5.6 to previous releases and filed bugs 68825 and 69236. My favorite low-concurrency performance regression will always be bug 29921. I still remember debugging that over a weekend. Parsers are hard to profile when code is generated and everything is in one function. Fortunately I was able to use rdtsc.
Bugs for low-concurrency performance regressions
Bugs filed by me:
- 68825 - April 2013 -- performance regressions for single-threaded workloads
- 69236 - May 2013 -- performance regressions for single-threaded workloads, part 2
- 74325 - October 2014 (fixed) -- updates to indexed column much slower in 5.7.5
- 74342 - October 2014 -- InnoDB disk reads at 1 thread much slower in 5.7.5
- 86215 - May 2017 -- MySQL is much slower in 5.7 than 5.6
Bugs not filed by me:
- 71130 - December 2013 -- 5.6 SQL thread is much slower than 5.1/5.5
- 78176 - August 2015 -- 5.6.20 is almost twice as slow as 5.0.96
Content for low-concurrency performance regressions
Reports from me
- March 2013 - MySQL 5.6: single-threaded, read-only
- April 2013 - MySQL 5.6: single-thread, update-only
- May 2013 - MySQL 5.6 versus 4.0 for a read-only workload
- May 2013 - MySQL 5.6: single-threaded performance regressions
- September 2013 - MySQL 5.7.2 single threaded performance needs improvement
- September 2013 - MySQL 4.1 forever
- October 2014 - Single thread performance in MySQL 5.7.5 versus older releases via sql-bench
- October 2014 - Low-concurrency performance for point lookups: MySQL 5.7.5 vs previous releases
- October 2014 - Low-concurrence performance regressions for range queries: MySQL 5.7 vs previous releases
- October 2014 - Low-concurrency performance for updates and the Heap engine: MySQL 5.7 vs previous releases
- October 2014 - Low-concurrency performance for updates with InnoDB: MySQL 5.7 vs previous releases
- October 2014 - Page read performance: MySQL 5.7 vs previous releases
- October 2014 - Sysbench cached updates: MySQL 5.7 vs previous releases
- October 2014 - Sysbench IO-bound updates: MySQL 5.7 vs previous releases
- October 2014 - Updates with secondary index maintenance: 5.7 vs previous releases
- August, 2015 - Single-threaded linkbench performance for MySQL 5.7, 5.6, WebScale and MyRocks
- February 2017 - Using modern sysbench to compare MyRocks and InnoDB on a small server
- February 2017 - Part 2 - sysbench, MyRocks, InnoDB and a small server
- May 2017 - Sysbench, in-memory & Intel NUC
- May 2017 - Sysbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
- June 2017 - Insert benchmark, in-memory & Intel NUC for MySQL 5.6, 5.7 and 8
- June 2017 - Insert benchmark, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
- June 2017 - Linkbench, in-memory & Intel NUC for MySQL 5.6, 5.7 & 8
- June 2017 - Linkbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
- June 2017 - Sysbench for MySQL 5.0, 5.1, 5.5, 5.6, 5.7 and 8
- June 2017 - Impact of perf schema on sysbench at low concurrency
- June 2017 - MyISAM, small servers and sysbench at low concurrency
- June 2017 - One more time with sysbench, a small server & MySQL 5.7, 5.7 and 8.0
- June 2017 - Linux perf and the CPU regression in MySQL 5.7
Reports from others:
- March 2013 - Why MySQL Performance at Low Concurrency is Important
- December 2013 - Single thread performance regression in 5.6 - Replication
Saturday, October 15, 2016
scons verbose command line
Hopefully I can find this blog post the next time I get stuck. How do you see command lines when building your favorite open source project? Try one of variants below. I am sure this list will grow over time. The scons variant is my least favorite. I use too many tools for source configuration and compiling. I am barely competent with most of them, but it is easy to find answers for popular tools. I get to use scons with MongoDB. It is less fun searching for answers to problems with less popular tools.
make V=1
make VERBOSE=1
scons --debug=presub
Pagerank seems to be busted for scons. Top results are for too-old versions of scons. Top-ranked results usually tell you how to solve the problem with Python, but users aren't writing scons input files, we are doing things via the command line. At least with MongoDB's use of scons, the separator for construction variables is a space, not a colon. So do LIBS="lz4 zstd" but not LIBS="lz4:zstd".
This is my second scons inspired post. Just noticed my previous one.
make V=1
make VERBOSE=1
scons --debug=presub
Pagerank seems to be busted for scons. Top results are for too-old versions of scons. Top-ranked results usually tell you how to solve the problem with Python, but users aren't writing scons input files, we are doing things via the command line. At least with MongoDB's use of scons, the separator for construction variables is a space, not a colon. So do LIBS="lz4 zstd" but not LIBS="lz4:zstd".
This is my second scons inspired post. Just noticed my previous one.
Thursday, September 15, 2016
Peak benchmarketing season for MySQL
Maybe this is my XKCD week. With Oracle Open World and Percona Live Amsterdam we are approaching peak benchmarketing season for MySQL. I still remember when MySQL 4.0 was limited to about 10k QPS on 4 and 8 core servers back around 2005, so the 1M QPS results we see today are a reminder of the great progress that has been made thanks to investments by upstream and the community.
But getting 1.5M QPS today compared to 1M QPS last year isn't at the top of the list for many (potential) users of MySQL. I use performance, usability, mangeability, availability and efficiency to explain what matters for web-scale DBMS users. My joke is that each of these makes a different group happy: performance -> marketing, usability -> developers, manageability -> operations, availability -> end users, efficiency -> management.
The benchmarketing results mostly focus on performance. Whether InnoDB does a bit more QPS than Amazon Aurora isn't going to make Aurora less popular. Aurora might have excellent performance but I assume people are deploying it for other reasons. I hope we make it easier to market usability, manageability, availability and efficiency in the MySQL community. MongoDB has gone a long way by marketing and then delivering usability and manageability.
Even when limited to performance we need to share more than peak QPS. Efficiency and quality-of-service (QoS) are equally important. QPS without regard to response time is frequently a bogus metric. I get more IOPs from a disk by using a too large queue depth. But more IOPs at the cost of 100 millisecond disk read response times is an expensive compromise. Even when great QPS is accompanied by a good average response time I want to know if there is lousy QoS from frequent stalls leading to lousy 99th percentile response times. Percona has built their business in part by being excellent at documenting and reducing stalls in InnoDB that occur on benchmarks and real workloads.
I have been guilty of sharing too many benchmark reports in the past that ignored efficiency and QoS. I have been trying to change that this year and hope that other providers of MySQL performance results do the same. This is an example of a result that includes performance, efficiency and QoS.
A lot of the RocksDB marketing message has been about performance. Database access is faster with an embedded database than client/server because you avoid network latency. The MyRocks message has been about efficiency. The target has been better compression and less write amplification than InnoDB so you can use less SSD and lower-endurance SSD. For a workload I care about we see 2X better compression and 1/10 the write rate to storage. This is a big deal.
When starting the project we had many discussions about the amount of performance loss (reduced QPS, higher response time) we could tolerate to get more efficiency. While we were vague the initial goal was to get similar QPS and response time to InnoDB for real workloads, but we were willing to accept some regressions. It turned out that there was no regression and similar performance with much better efficiency is a big deal.
But benchmarks aren't real workloads and there will soon be more benchmark results. Some of these will repeat what I have claimed, others will not. I don't expect to respond to every result that doesn't match my expectations. I will consult when possible.
One last disclaimer. If you care about read-mostly/in-memory workloads then InnoDB is probably an excellent choice. MyRocks can still be faster than InnoDB for in-memory workloads. That is more likely when the bottleneck for InnoDB is page write-back performance. So write-heavy/in-memory can still be a winner for MyRocks.
Seriously, this is the last disclaimer. While we are bickering about benchmark results others are focusing on usability and manageability and getting all of the new deployments.
In General
But getting 1.5M QPS today compared to 1M QPS last year isn't at the top of the list for many (potential) users of MySQL. I use performance, usability, mangeability, availability and efficiency to explain what matters for web-scale DBMS users. My joke is that each of these makes a different group happy: performance -> marketing, usability -> developers, manageability -> operations, availability -> end users, efficiency -> management.
The benchmarketing results mostly focus on performance. Whether InnoDB does a bit more QPS than Amazon Aurora isn't going to make Aurora less popular. Aurora might have excellent performance but I assume people are deploying it for other reasons. I hope we make it easier to market usability, manageability, availability and efficiency in the MySQL community. MongoDB has gone a long way by marketing and then delivering usability and manageability.
Even when limited to performance we need to share more than peak QPS. Efficiency and quality-of-service (QoS) are equally important. QPS without regard to response time is frequently a bogus metric. I get more IOPs from a disk by using a too large queue depth. But more IOPs at the cost of 100 millisecond disk read response times is an expensive compromise. Even when great QPS is accompanied by a good average response time I want to know if there is lousy QoS from frequent stalls leading to lousy 99th percentile response times. Percona has built their business in part by being excellent at documenting and reducing stalls in InnoDB that occur on benchmarks and real workloads.
I have been guilty of sharing too many benchmark reports in the past that ignored efficiency and QoS. I have been trying to change that this year and hope that other providers of MySQL performance results do the same. This is an example of a result that includes performance, efficiency and QoS.
MyRocks and RocksDB
A lot of the RocksDB marketing message has been about performance. Database access is faster with an embedded database than client/server because you avoid network latency. The MyRocks message has been about efficiency. The target has been better compression and less write amplification than InnoDB so you can use less SSD and lower-endurance SSD. For a workload I care about we see 2X better compression and 1/10 the write rate to storage. This is a big deal.
When starting the project we had many discussions about the amount of performance loss (reduced QPS, higher response time) we could tolerate to get more efficiency. While we were vague the initial goal was to get similar QPS and response time to InnoDB for real workloads, but we were willing to accept some regressions. It turned out that there was no regression and similar performance with much better efficiency is a big deal.
But benchmarks aren't real workloads and there will soon be more benchmark results. Some of these will repeat what I have claimed, others will not. I don't expect to respond to every result that doesn't match my expectations. I will consult when possible.
One last disclaimer. If you care about read-mostly/in-memory workloads then InnoDB is probably an excellent choice. MyRocks can still be faster than InnoDB for in-memory workloads. That is more likely when the bottleneck for InnoDB is page write-back performance. So write-heavy/in-memory can still be a winner for MyRocks.
Seriously, this is the last disclaimer. While we are bickering about benchmark results others are focusing on usability and manageability and getting all of the new deployments.
Monday, June 23, 2014
Benchmark(et)ing
Benchmarking and benchmarketing both have a purpose. Both also have a bad reputation. A frequently expressed opinion is that benchmark results are useless. I usually disagree. I don't mind benchmarketing and think it is a required part of product development but I am not fond of benchmarketing disguised as benchmarking.
Benchmarketing is a common activity for many DBMS products whether they are closed or open source. Most products need new users to maintain viability and marketing is part of the process. The goal for benchmarketing is to show that A is better than B. Either by accident or on purpose good benchmarketing results focus on the message A is better than B rather than A is better than B in this context. Note that the context can be critical and includes the hardware, workload, whether both systems were properly configured and some attempt to explain why one system was faster.
I spend a lot of time running benchmarks. They are useful when the context for the result is explained and the reader has sufficient expertise. Many benchmark results don't explain the context and not everyone has the time and ability to understand the results. Thus many benchmark results are misunderstood and perhaps benchmarks deserve a bad reputation. Another problem is that benchmarks usually focus on peak performance while peak efficiency is more important in the real world. Our services have a finite demand and we want to improve quality of service and reduce cost while meeting that demand. But that is hard to measure in a benchmark result.
One obvious result that a benchmark can provide is the peak rate of performance that won't be exceeded in the real world. This is useful when doing high level capacity planning or when debugging performance problems in production. Narrow benchmark tests, like read-only with point-lookups, help to provide simple performance models to which more complex workloads can be mapped.
Another result is that performance comparisons are less likely to be useful as the number of systems compared increases. It takes a lot of time and expertise to explain benchmark results and to confirm that best practices were used for all of the systems compared. In my experience results that compare more than two systems tend to be benchmarketing rather than benchmarking.
For a few years my tests were limited to MySQL with InnoDB, but recently I have run tests to compare different products including MySQL/InnoDB, WiredTiger, RocksDB, LevelDB, TokuDB, TokuMX and MongoDB. I am comfortable publishing results for new releases of MySQL & InnoDB and they can be compared to results I previously published. I am less willing to publish results that compare products from multiple vendors especially when small vendors are involved. I don't want to be a jerk to a small vendor and these take more time to evaluate.
I have some advice for people who run benchmarks even though I don't always follow of of it. Some of the advice takes a lot of time to follow.
Benchmarketing is a common activity for many DBMS products whether they are closed or open source. Most products need new users to maintain viability and marketing is part of the process. The goal for benchmarketing is to show that A is better than B. Either by accident or on purpose good benchmarketing results focus on the message A is better than B rather than A is better than B in this context. Note that the context can be critical and includes the hardware, workload, whether both systems were properly configured and some attempt to explain why one system was faster.
I spend a lot of time running benchmarks. They are useful when the context for the result is explained and the reader has sufficient expertise. Many benchmark results don't explain the context and not everyone has the time and ability to understand the results. Thus many benchmark results are misunderstood and perhaps benchmarks deserve a bad reputation. Another problem is that benchmarks usually focus on peak performance while peak efficiency is more important in the real world. Our services have a finite demand and we want to improve quality of service and reduce cost while meeting that demand. But that is hard to measure in a benchmark result.
One obvious result that a benchmark can provide is the peak rate of performance that won't be exceeded in the real world. This is useful when doing high level capacity planning or when debugging performance problems in production. Narrow benchmark tests, like read-only with point-lookups, help to provide simple performance models to which more complex workloads can be mapped.
Another result is that performance comparisons are less likely to be useful as the number of systems compared increases. It takes a lot of time and expertise to explain benchmark results and to confirm that best practices were used for all of the systems compared. In my experience results that compare more than two systems tend to be benchmarketing rather than benchmarking.
For a few years my tests were limited to MySQL with InnoDB, but recently I have run tests to compare different products including MySQL/InnoDB, WiredTiger, RocksDB, LevelDB, TokuDB, TokuMX and MongoDB. I am comfortable publishing results for new releases of MySQL & InnoDB and they can be compared to results I previously published. I am less willing to publish results that compare products from multiple vendors especially when small vendors are involved. I don't want to be a jerk to a small vendor and these take more time to evaluate.
I have some advice for people who run benchmarks even though I don't always follow of of it. Some of the advice takes a lot of time to follow.
- Explain the results. This is the most important suggestion and I try to ignore results that are not explained. Why was one system faster than another (better algorithm, less code bloat, simple perf bug, etc)? Simple monitoring tools like vmstat and iostat are a start but you will eventually need to look at code. Run PMP to understand where threads are busy or waiting. Run Linux perf to see what consumes CPU time.
- Explain whether you made any attempt to properly configure the systems tested. The quality of a benchmark result is inversely related to the number of systems tested because it is less likely that the people doing the test have expertise in all of the systems. Publish the configuration files.
- Explain whether you have expertise in using the benchmark client.
- Explain the context for the test. The result can be described as A is faster than B in this context so you need to explain that context.
- What was the workload?
- What hardware was used (CPU description, #sockets, cores per socket, clock rate, amount of RAM, type of storage)? What rate can the storage sustain independent of the DBMS for both IOPs and MB/second?
- What product versions were used? Comparing your beta versus their GA release that runs in production might be bogus. It can also be bogus to compare production systems with research systems that have no overhead from monitoring, error logging, optimization, parsing and other features required in the real world.
- How was the test deployed? Did clients share the same server as the DBMS? If not what was the network distance & throughput between them.
- Were the file structures and storage devices aged? A b-tree fragments from random updates over time. Write-optimized databases and flash get garbage to be collected. Doing a sequential load into a b-tree or LSM and then immediately running tests means the file structure isn't in a steady state.
- How full was the storage device? Flash and spinning disk perform very differently when full than when empty.
- Was buffered or direct IO used? If buffered IO was used are you sure...
- a good value was used for filesystem readahead?
- a good posix_fadvise calls were done to enable or disable readahead?
Subscribe to:
Posts (Atom)
CPU-bound sysbench on a large server: Postgres 12 to 19 beta1
This has results from sysbench on a small server with Postgres versions 12 through 19 beta1. Sysbench is run with high concurrency (40 conne...
-
I previously used math to explain the number of levels that minimizes write amplification for an LSM tree with leveled compaction. My answe...
-
This has results to measure the impact of calling fsync (or fdatasync) per-write for files opened with O_DIRECT. My goal is to document the ...
-
I need stable performance from the servers I use for benchmarks. I also need servers that don't run too hot because too-hot servers caus...