I have heard that smlnj uses continuation passing style and that is one reason it has been dropped from the proof assistant Isabelle.
Makarius, the author of the answer above, also gave a detailed explanation of why CPS has bad performance properties.
At first approximation, there is a difference in "locality" of memory access, when a program just runs forward on the heap in CPS style, instead of the traditional growing and shrinking of the stack.
However, it's not clear to me why continuation-passing-style implementations need to use the heap in the sense of using the abstraction provided by malloc and free and not some other kind of allocator.
My understanding is that the malloc/free abstraction is maximally flexible in the sense that no constraints at all are imposed on the lifetime of different fragments of memory, as opposed to the stack which is strictly FIFO.
I'm wondering whether all that flexibility is really necessary though, or if more efficient implementations of CPS are possible.
As a quick straw man, if we took an ordinary language without continuations like Python or C, and translated it to CPS (in an extension of the language with continuations), then we could implement that subset of the expanded language using a stack.
In other words, a CPS-y language or intermediate language that doesn't use its newfound powers in exotic ways should theoretically be able to be treated like a non-CPS language.
This makes me wonder whether there are efficient strategies for implementation CPS-y languages.