
Intro

I've watched Branch Education's video about computer memory, and read the first few sections of *What Every Programmer Should Know About Memory* to understand memory internals.

I cannot go further in my research without understanding one thing. My question below is about DDR5 DRAM specifically, since the video also discusses DDR5.

What I know currently:

  • Memory cells are grouped into memory banks; each bank is a massive 2D array of cells.
  • We can read from or write to a memory bank 8 bits at a time, and a DDR5 memory bank requires a 31-bit address to read or write an 8-bit value.
  • On the x86-64 architecture, an address is 64 bits wide.

Question

Suppose that we want to read a 64-bit integer from the memory.

Is this integer spread across multiple banks, or is it stored in a single bank?

  • Are you familiar with protected mode and memory-mapping fundamentals? This seems more a question about these intermediate layers than about what's tagged (SDRAM, DDR) specifically. || This question might also be a better fit on Stack or CS. (That's not to say it's off-topic here, but mind that most embedded users here are not using such deeply managed and secured systems as desktop and server CPUs in their work; and if they are, it's probably as a ready-made platform, not at a deep hardware-design level.) Commented Dec 23, 2024 at 15:26
  • I am kinda familiar with them. However, I don't think it is related to intermediate layers. How is it related? Commented Dec 23, 2024 at 15:29
  • @jtxkopt-StandWithPalestine It's intimately related: memory addressing modes specify how the memory management unit (and related logic) converts the addresses your software sees into the addresses the memory controller sees. That is the whole answer to your second question: it's defined in the memory model for the different addressing modes of x86_64, and it's in fact explained in the (by now slightly dated) "What every programmer should…" document, section 4. Commented Dec 23, 2024 at 15:34
  • Having two orthogonal questions in a single post makes it too broad, and you already have the material you need to read, so I'll just remove that second question. That way we don't have to close this question as too broad (too many different questions) nor downvote it (having the right document but not having read it on the very topic you ask about is insufficient own research). Commented Dec 23, 2024 at 15:35
  • Got it, but as I already said, I'm in the first sections (not first chapters, sorry) and at the beginning of my research. What about my first question? I cannot find any answer to it. Commented Dec 23, 2024 at 15:39

2 Answers


Is this integer spread across multiple banks, or is it stored in a single bank?

Switching banks takes time, so in general compilers will take care to make objects fit in a single bank. In fact, 64-bit integers usually have to be 8-byte aligned on PC architectures (especially x86), and the MMU only translates addresses page-wise, where a page is a multiple of 8 bytes. So yes: all bits of the integer will fall into the same bank.

However, DDR5 specifies dual-(sub)channel DIMMs: one DIMM carries, in the end, two independent memory "entities", each with a 32-bit-wide data bus. The two are identical, so a natural organization is for the 64-bit integer to be stored half and half, but in the same bank group/bank. (Modern RAM has so many banks that they are organized into bank groups.)

To make matters more complicated, there are multi-channel memory controllers which, when connected to the right number of memory modules, can execute two requests simultaneously, increasing the memory bus width as seen from the CPU to 128 bits. (More channels are also a thing, but you'll find those on server boards.)

  • Just as an example: my employer makes a software product that requires very high memory and I/O bandwidth to work as intended, and the servers we certify all have 12 memory channels. They also make an FPGA-based product where we chose not to use (G)DDRx at all, but instead use HBM, which is 1024 bits wide. Commented Dec 24, 2024 at 0:18
  • "have to be 8-byte-aligned on PC architectures (esp. x86)" Is that actually true? I'm under the impression that x86 is the exception that allows unaligned access, while most other architectures do not. Commented Dec 24, 2024 at 14:50
  • @jpa I might be mixing that up with C alignment restrictions in common x86_64 compilers/ABIs; gcc.godbolt.org/z/Gzrf9zv8h Commented Dec 24, 2024 at 15:18
  1. The 64-bit integer is read from a fetch buffer; think of it as an L0 D-cache, not RAM.
  2. If the item is not in the fetch buffer, it's read from the L1 D-cache. If it's not there, the L2 and L3 caches are checked.
  3. If no cache contains the cache line holding the integer, one or more cache lines are fetched from RAM in parallel to fill the caches.

Access to RAM is not really done by the computing part of the CPU. It is done by the cache system, independently of what the CPU's functional units are doing.

RAM has not been accessed in 8-bit widths on PCs for decades now. A DDR5 module has a 64-bit-wide data path. But data is not fetched into the cache 64 bits at a time; it is fetched 64 bytes at a time. So, with just one module, there will be 8 successive 64-bit reads to fill one cache line.

The CPU sees memory in units of cache lines. It doesn’t see individual bytes until the data gets into fetch buffers. The “word size” of the memory hierarchy is 64 bytes or 512 bits.

You will want to read the rest of *What Every Programmer Should Know About Memory* to get a better big picture. All of this is covered there, IIRC.

That document is worth reading multiple times. At first, don't stress over details; just make sure you read the whole thing. Then you can read it again, focusing more on the details.

One "big idea" is that accessing one byte of memory costs the same as accessing 64 (naturally aligned) bytes. Reading randomly distributed values wastes bandwidth, since a whole cache line is fetched for each access most of the time: random 8-byte reads consume 8x as much memory bandwidth as the data actually used, and random single-byte reads up to 64x.

Another big idea is that the memory is hierarchical, and each higher layer is roughly an order of magnitude slower than the layer below it.

Finally, modern CPUs have extensive built-in performance measurement systems. When you run your code in a profiler, you can see exactly how many cache misses there were at each level of the cache, how many branches were mispredicted, how many speculated results were discarded (a waste of energy!), how much cache latency there was, what the instruction throughput was, and so on. A modern Intel CPU has far more transistors dedicated to performance monitoring than an entire 80386 had in total.

  • I really appreciate the good advice, but it doesn't actually answer my question: Is this integer spread across multiple banks, or is it stored in a single bank? I don't know; maybe I'm unnecessarily focusing on the wrong question. Commented Dec 23, 2024 at 16:09
  • @jtxkopt-StandWithPalestine The assertion in the question, "We can read from or write to a memory bank 8 bits at a time", is an oversimplification to the point of being misleading/wrong in this context, so the question is effectively nonsensical. The CPU reads an entire 512-bit cache line as a single operation from memory on a single bank. Thus reading a 64-bit integer is a single read from one cache line, not eight individual byte reads. Commented Dec 23, 2024 at 18:58
  • I reported what the video says. Also, I'm considering the case where we don't have a cache hit, and thus eventually read from main memory. For DDR5, the video says that 8 bits are read from and/or written to a memory bank. Commented Dec 23, 2024 at 19:37
  • You might be latching onto something that was simplified for the purposes of discussion. In the case of a cache miss, the required data from DDR5 is read into a cache line first. DDR memory is not set up to read/write one byte at a time; it is too inefficient. Commented Dec 24, 2024 at 11:38
  • "the video says" The video says nonsense. A DDR5 stick has a 64-bit-wide interface. The CPU will do 8 consecutive 64-bit reads to fill the 512-bit cache line. Then, when the CPU needs to access a 64-bit integer, it accesses it from the cache. The CPU doesn't see RAM; the only "memory" it sees is the cache. The cache system then manages getting data into and out of the cache. I think it's best you forget anything from random videos. Read WSEPKAM cover to cover several times and you'll have most answers, I hope. Then you can come back and ask more questions :) Not before, though. Commented Dec 24, 2024 at 22:38
