Timeline for Bubble sort slower with -O3 than -O2 with GCC
Current License: CC BY-SA 4.0
62 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Jun 24, 2022 at 18:49 | comment | added | ddyer | If you're benchmarking a bubble sort, you're using the wrong algorithm. | |
| S Jan 22, 2022 at 3:01 | history | bounty ended | CommunityBot | ||
| S Jan 22, 2022 at 3:01 | history | notice removed | user17242583 | ||
| Jan 21, 2022 at 2:53 | comment | added | Peter Cordes | (update on that last comment: it is sometimes relevant for performance questions to include actual disassembly, so we can look for code alignment issues wrt. 32-byte boundaries, especially on Skylake-family CPUs where the JCC erratum mitigation can create performance pot-holes unexpectedly. You won't see that from Godbolt, and its binary output won't necessarily be linked with identical CRT code so addresses may differ.) | |
| S Jan 21, 2022 at 2:41 | history | bounty started | CommunityBot | ||
| S Jan 21, 2022 at 2:41 | history | notice added | user17242583 | Reward existing answer | |
| Nov 6, 2021 at 21:53 | audit | First questions | |||
| Nov 6, 2021 at 21:53 | |||||
| Nov 6, 2021 at 21:53 | audit | First questions | |||
| Nov 6, 2021 at 21:53 | |||||
| Nov 5, 2021 at 15:37 | audit | First questions | |||
| Nov 5, 2021 at 15:37 | |||||
| Nov 5, 2021 at 15:07 | audit | First questions | |||
| Nov 5, 2021 at 15:07 | |||||
| Nov 4, 2021 at 8:48 | audit | First questions | |||
| Nov 4, 2021 at 8:48 | |||||
| Nov 4, 2021 at 6:50 | audit | First questions | |||
| Nov 4, 2021 at 6:51 | |||||
| Nov 3, 2021 at 17:16 | audit | First questions | |||
| Nov 3, 2021 at 17:17 | |||||
| Nov 3, 2021 at 11:22 | audit | First questions | |||
| Nov 3, 2021 at 13:37 | |||||
| Nov 2, 2021 at 7:26 | audit | First questions | |||
| Nov 2, 2021 at 8:09 | |||||
| Oct 29, 2021 at 12:50 | audit | First questions | |||
| Oct 29, 2021 at 12:52 | |||||
| Oct 26, 2021 at 22:25 | audit | First questions | |||
| Oct 26, 2021 at 22:45 | |||||
| Oct 26, 2021 at 7:45 | audit | First questions | |||
| Oct 26, 2021 at 8:27 | |||||
| Oct 26, 2021 at 4:57 | audit | First questions | |||
| Oct 26, 2021 at 5:18 | |||||
| Oct 24, 2021 at 21:36 | audit | First questions | |||
| Oct 24, 2021 at 21:37 | |||||
| Oct 24, 2021 at 0:29 | audit | First questions | |||
| Oct 24, 2021 at 0:29 | |||||
| Oct 23, 2021 at 13:19 | audit | First questions | |||
| Oct 23, 2021 at 13:32 | |||||
| Oct 23, 2021 at 6:00 | audit | First questions | |||
| Oct 23, 2021 at 6:03 | |||||
| Oct 20, 2021 at 12:50 | audit | First questions | |||
| Oct 20, 2021 at 12:51 | |||||
| Oct 19, 2021 at 14:22 | audit | First questions | |||
| Oct 19, 2021 at 14:22 | |||||
| Oct 19, 2021 at 9:23 | audit | First questions | |||
| Oct 19, 2021 at 9:51 | |||||
| Oct 18, 2021 at 2:43 | audit | First questions | |||
| Oct 18, 2021 at 2:43 | |||||
| Oct 17, 2021 at 17:44 | audit | First questions | |||
| Oct 17, 2021 at 18:00 | |||||
| Oct 17, 2021 at 15:59 | audit | First questions | |||
| Oct 17, 2021 at 15:59 | |||||
| Oct 17, 2021 at 14:48 | history | edited | Peter Mortensen | CC BY-SA 4.0 | Active reading [<https://en.wikipedia.org/wiki/GNU_Compiler_Collection>]. Removed the shell prompts to avoid confusion. Added some context. Expanded. |
| Oct 17, 2021 at 9:13 | audit | First questions | |||
| Oct 17, 2021 at 9:13 | |||||
| Oct 16, 2021 at 12:53 | audit | First questions | |||
| Oct 16, 2021 at 13:35 | |||||
| Oct 16, 2021 at 10:40 | audit | First questions | |||
| Oct 16, 2021 at 11:52 | |||||
| Oct 14, 2021 at 20:32 | audit | First questions | |||
| Oct 14, 2021 at 21:24 | |||||
| Oct 14, 2021 at 16:11 | audit | First questions | |||
| Oct 14, 2021 at 16:28 | |||||
| Oct 14, 2021 at 8:54 | audit | First questions | |||
| Oct 14, 2021 at 8:54 | |||||
| Oct 14, 2021 at 2:16 | audit | First questions | |||
| Oct 14, 2021 at 3:57 | |||||
| Oct 13, 2021 at 11:26 | audit | First questions | |||
| Oct 13, 2021 at 11:26 | |||||
| Oct 12, 2021 at 11:58 | history | edited | anon | CC BY-SA 4.0 | fix godbolt link |
| Oct 12, 2021 at 1:54 | history | edited | Peter Cordes | it's not [swap] in general that's relevant, it's swapping adjacent items. Would like to tag [bubble-sort] and [cpu-architecture], but tags are very limited. [auto-vectorization] would be nice, too, but let's go with [cpu-architecture] in case that helps future readers find info about SF stalls. | |
| Oct 11, 2021 at 20:41 | comment | added | Peter Cordes | @user253751: disagree; as long as the querent picked the same GCC version on Godbolt as they have locally so the instructions are the same, Godbolt's nice filtering of directives is better. And linking the source+asm on Godbolt makes it better for anyone who wants to see what other GCC versions / options do. | |
| Oct 11, 2021 at 12:27 | history | edited | Wai Ha Lee | CC BY-SA 4.0 | Embiggened formatting |
| Oct 11, 2021 at 10:57 | audit | First questions | |||
| Oct 11, 2021 at 11:21 | |||||
| Oct 11, 2021 at 9:49 | comment | added | Stack Exchange Broke The Law | You should include the assembly code that your actual compiler outputs, not from godbolt.org. | |
| Oct 11, 2021 at 6:21 | audit | First questions | |||
| Oct 11, 2021 at 6:36 | |||||
| Oct 11, 2021 at 4:08 | audit | First questions | |||
| Oct 11, 2021 at 4:08 | |||||
| Oct 10, 2021 at 18:18 | audit | First questions | |||
| Oct 10, 2021 at 18:34 | |||||
| Oct 10, 2021 at 2:31 | audit | First questions | |||
| Oct 10, 2021 at 3:16 | |||||
| Oct 9, 2021 at 22:23 | comment | added | Peter Cordes | @DavidConrad: -Os would make GCC choose not to auto-vectorize, so it would be about the same as -O2 I'd expect, not shooting itself in the foot with store-forwarding stalls and increased latency before it can detect branch mispredicts. | |
| Oct 9, 2021 at 22:16 | history | edited | chqrlie | CC BY-SA 4.0 | added 2 characters in body |
| Oct 9, 2021 at 21:01 | audit | First questions | |||
| Oct 9, 2021 at 21:25 | |||||
| Oct 9, 2021 at 18:47 | audit | First questions | |||
| Oct 9, 2021 at 19:00 | |||||
| Oct 9, 2021 at 18:07 | comment | added | David Conrad | At least on older versions of gcc, -Os (optimize for space) sometimes produced the fastest code because of the size of the instruction cache on x86-64. I don't know if that would matter here or if it's still applicable in current versions of gcc but it might be interesting to try it and compare. | |
| Oct 9, 2021 at 15:01 | vote | accept | anon | ||
| Oct 9, 2021 at 10:36 | history | became hot network question | |||
| Oct 9, 2021 at 6:54 | history | edited | Peter Cordes | This is specific to the swap of adjacent elements. We don't have room in tags for [bubble-sort] and [sort] as well, or [cpu-architecture] [amd-processor] etc. :/ | |
| Oct 9, 2021 at 3:18 | history | edited | anon | CC BY-SA 4.0 | added 30 characters in body |
| Oct 9, 2021 at 3:09 | answer | added | Peter Cordes | timeline score: 176 | |
| Oct 9, 2021 at 2:56 | comment | added | Peter Cordes | @Abel: gcc -Ofast is just a shortcut for -O3 -ffast-math, but there's no FP math here. If you're going to try anything, try -O3 -march=native to let it use AVX2 in case GCC's vectorization strategy could help with wider vectors instead of hurt, whatever it's trying to do. Although I don't think so; it's just doing a 64-bit load and shuffle, not even 128-bit with SSE2. | |
| Oct 9, 2021 at 2:49 | history | edited | phuclv | edited tags | |
| S Oct 9, 2021 at 2:35 | review | First questions | |||
| Oct 9, 2021 at 3:19 | |||||
| S Oct 9, 2021 at 2:35 | history | asked | anon | CC BY-SA 4.0 |