11
$\begingroup$

Many commonly used programming languages offer built-in mutable container types, or other mutable types that embed references to other objects, and make it trivial to build structures that contain a cycle. For example, in Python a list can easily be made to contain itself:

>>> x = []
>>> x.append(x)
>>> x
[[...]]

(Here, Python detects the cycle while representing the object, and uses ... to denote that; it's not the Ellipsis object.)

However, to my knowledge this sort of result cannot be created from Python code with tuples, which are immutable at the Python level.

In my language design, I want to have a strong representation of the concept of immutability in the type system, with most types having mutable and immutable versions, and the immutable versions potentially having different underlying memory layouts (e.g. to avoid the overhead of exponentially-resizing backing storage, add a precomputed hash for when they are used as keys in a mapping, etc.). It would be nice to be able to allow for immutable, cyclic object graphs.

Do any existing programming languages have a system for this? What might the syntax look like to describe such a structure as a literal expression? (I assume that implementation should be straightforward, since immutability is only a language-level construct and the underlying generated bytecode/intermediate code/machine code can just mutate things anyway.)

$\endgroup$
2
  • 1
    $\begingroup$ Note that Python at least can do that in a single statement x = x[0] = [...], even though it still relies on mutation behind the scenes. $\endgroup$ Commented May 20 at 11:45
  • 1
    $\begingroup$ @MisterMiyagi I've never been a fan of having multiple assignment operators on the same line, in any language. But the point is well taken. $\endgroup$ Commented May 21 at 16:10

7 Answers 7

12
$\begingroup$

Common Lisp has a sharpsign reader macro that can be used to create self-referential data structures at read time. #n= (where n is an integer) creates a label for a structure, and #n# expands to a reference to that structure.

(defconstant *x* '#1=(#1#))

will create a self-referential list.

When printing, you can enable *PRINT-CIRCLE* to have it produce this syntax.

$\endgroup$
5
  • 1
    $\begingroup$ This is definitely the closest thing so far to what I had in mind. The variety in answers makes me wonder about how I phrased the question, but they all still seem valuable. $\endgroup$ Commented May 20 at 19:29
  • 1
    $\begingroup$ Funny how every complicated feature can be answered by "oh, you can easily do that in ALGOL68/LISP" $\endgroup$ Commented May 23 at 5:03
  • 1
    $\begingroup$ @KarlKnechtel In some languajes (at least in C++) you can reference the variable you are initializing inside the initialization expression. E.g. A u = { 0, &u }; for some struct A { int a; A* next; };. $\endgroup$ Commented May 23 at 13:33
  • $\begingroup$ @PabloH this does, however, require an initialization statement. It would be that much better to have syntax that works for an expression. But I guess some languages have let-expressions that could be used as a model... $\endgroup$ Commented May 23 at 16:59
  • $\begingroup$ @PabloH The difference is that it requires a variable to hold the item being referenced. While my example happens to refer to the top-level object, I could just as easily do it with nested elements that aren't in a named variable. I can also use the notation in an anonymous expression such as a function argument. $\endgroup$ Commented May 23 at 18:02
8
$\begingroup$

Lazy evaluation is one way programming languages achieve infinitely self-contained immutable lists.

One way of thinking about lazy evaluation is that expressions are put in stasis until they absolutely need to be evaluated. The expression can be referred to as if it were fully evaluated, but in reality evaluation is deferred. This is in contrast to eager evaluation where the entire expression is fully evaluated when it is encountered.

Many languages, especially those focused on functional programming, contain lazy evaluation. Languages with lazy evaluation include Haskell, OCaml, and Scala.

Take for example Scala:

lazy val x: LazyList[Any] = x #:: LazyList.empty

(Note that in Scala, the LazyList class is immutable)

This creates a lazily evaluated LazyList which has itself as the head of the List, which of course is the entire list. Basically, it's saying "the LazyList is a list of itself, which is a list of itself, which is a list of itself, which is...".

The lazy before the val ensures that the evaluation of the list is deferred until later. Printing the list, depending on implementation, should print something akin to:

LazyList(<not computed>)

(Try it online at scastie)

Notably, the type annotation LazyList[Any] is required in this case. As the list is recursive, inference of the type of the list's contents isn't possible - the type can't be easily calculated automatically. This isn't always a necessity though...the compiler will tell you if it can't figure things out.

In summary, lazy evaluation is a common way of achieving the kind of self recursive list you desire while maintaining immutability. It is far from the only use case of lazy evaluation, but it gets the job done by allowing safe compile time self-referential expressions.

$\endgroup$
3
  • $\begingroup$ My design paradigm is closer to Python's, but this kind of lazy evaluation is definitely worth mentioning here. Syntactically, it seems the interesting part is that an x is available on the left-hand side that the definition can refer to. Is there also a way in Scala to have a "local name" while defining an anonymous value? Like, suppose I construct a mapping where the x from this example is a key; can I do that all in one go? How about more complex structures, where the self-reference is to something internal? $\endgroup$ Commented May 20 at 11:12
  • $\begingroup$ @KarlKnechtel it is possible, but you'd need to make sure that any data structures you use also lazily evaluate their arguments. For example, lazy val x: Map[Any, String] = Map(x -> "Hello") is perfectly valid Scala (it creates a map of x to "Hello"). But attempting to retrieve the value with x.get(x)) hangs in an infinite loop. In terms of design, you could ensure that the standard collections are lazy friendly if that's what you're after. $\endgroup$ Commented May 20 at 11:26
  • 4
    $\begingroup$ Perhaps it would be worth mentioning the idea of Tying the knot in this answer (not necessarily with this link), as it's a useful keyword to know for further reading. $\endgroup$ Commented May 22 at 13:36
6
$\begingroup$

Many languages support referring to identifiers before they are fully defined.

Example in C++:

struct recursive_type {
    recursive_type *x;
    int y;
    recursive_type(recursive_type *_x, int _y) : x(_x), y(_y) {}
} a(&a, 1);

// Or mutual recursive references:
extern recursive_type b;
recursive_type c(&b, 2), b(&c, 3);

PHP:

<?php
$a = [ &$a, 1 ];

$b = [ &$c, 2 ];
$c = [ &$b, 3 ];

var_dump result (for some reasons it has one extra level displayed on tio):

array(2) {
  [0]=>
  *RECURSION*
  [1]=>
  int(1)
}
array(2) {
  [0]=>
  &array(2) {
    [0]=>
    *RECURSION*
    [1]=>
    int(3)
  }
  [1]=>
  int(2)
}
array(2) {
  [0]=>
  &array(2) {
    [0]=>
    *RECURSION*
    [1]=>
    int(2)
  }
  [1]=>
  int(3)
}

Combinatorial Game Suite is an interesting example, that it is not referring to variables, but internal markers on the definition of games, a kind of mathematical object that this language is mostly concerned about.

It has the keyword pass to refer to the innermost level of the game surrounding the keyword, But you could also label it explicitly:

a{a|}               // equivalent to {pass|} = on
a{{a|}|}            // simplifies to {pass|} = on
a{0|{0|{{a|*}|*}}}  // a game that doesn't simplify, from the example in the
                    // help file, but rewritten to make its structure clearer

It works even if a was previously defined, and it would not use the previously defined value. (But note that v, vv, vvv, etc, are not valid variable or label names in CGSuite.)

$\endgroup$
2
  • 1
    $\begingroup$ The C++ example would be better with as much as possible marked const, I think. Does PHP even have a concept of immutability? Regardless, thanks for the syntax examples. (I haven't heard of CGSuite, but I have heard of the corresponding game theory - for example, as used in studying Go endgames or Nim heaps.) $\endgroup$ Commented May 20 at 16:57
  • $\begingroup$ @KarlKnechtel: const everywhere possible is pretty...everywhere. godbolt.org/z/5Yv6a9ars $\endgroup$ Commented May 23 at 15:18
5
$\begingroup$

Let's consider a realistic example: how could you implement a binary tree where nodes have properties left, right and parent in an immutable data structure?

One solution is to actually implement an immutable directed graph with labeled edges. An immutable directed graph with labeled edges could be implemented as an immutable map where keys are nodes and values are immutable maps from string to maybe-node.

If we want to build the binary tree

              X1
             /  \
           X2    X3

We first create objects X1, X2, X3, then create the immutable map:

b = {
  X1 -> {"parent"->null, "left"->X2, "right"->X3},
  X2 -> {"parent"->X1, "left"->null, "right"->null},
  X3 -> {"parent"->X1, "left"->null, "right"->null}
}

Plainly that immutable map can be built up from smaller immutable maps. (And one of those maps we can reuse, and gain some advantages of a persistent data structure.)

And now b[X2]["parent"] is X1. The downside here is that to say the moral equivalent of X1.left.parent we need to say b[b[X1]["left"]]["parent"] which is rather gross. One mitigation for that is to take a cue from languages such as JavaScript or Python and make .s a sugar for ["s"]. That reduces the expression to b[b[X1].left].parent. There are other sugars you might create to make that a little easier on the eyes.

$\endgroup$
2
  • 3
    $\begingroup$ This seems to be approaching the problem from the program developer's perspective rather than the language implementor's. But it's good to highlight that supporting the actual structure might not be strictly necessary, and that the language syntax can make it easier to use workarounds. $\endgroup$ Commented May 20 at 19:28
  • 5
    $\begingroup$ @KarlKnechtel Yes that is exactly what I'm attempting to express in this answer. As a language designer and a compiler developer and a user of the languages I design, it's helpful for me to think about how compiler implementation choices will affect the user experience. $\endgroup$ Commented May 20 at 20:21
3
$\begingroup$

I see two basic ways here

Allow (immutable) initialization of objects independent on order of definition

If one wants data structures with references or cycles one simply could spell out the cycles/references explicitly and assign the intermediate objects a name. This is the way all programmers are used to. The ordinary inline expression syntax only allows for tree-like data structures. I would not change that.

In go-like syntax this could look like (this is just a syntax example)

type A struct {
    Bref *B
}

type B struct {
    Aref *A
}

const Aobj = A{Bref: &Bobj}
const Bobj = B{Aref: &Aobj}

Note: There is no fundamental reason why the above could not work. Currently in go this fails, because go allows arbitrary code expressions in initialization and tries to figure out the initialization order. However for the plain system-defined constructor A{...} or B{...} the order of initialization does not matter and it would, in this special case, be ok to accept this program as legal go.

But you were only asking for the syntax how this could look like. And this is one example.

Also one could envision to define local constants this way, too. Key is to give up the ordering of the definitions.

Two stage initialization/construction

This is not directly related on how to literally specify cyclic data structures, but I think it is important nontheless and completemts the other above.

Clearly, one can construct non-tree datastructures with deeply immutable types when one allows the objects to be mutable in the construction phase.

Or in other words have a two stage construction.

First create the object with all fields initialized with default values and then secondly fill in the correct values (perhaps this is even user-code).

Just after these two phases the object would become immutable. Of course this has the complication that the constructor code has to take care that dependent objects may be not fully constructed yet.

In C++ it is already almost like this. Upon entry of the constructor the 'this' pointer points to an allocated object. In the body of the constructor some recursive algorithm could run, which then uses the 'this' pointer to create cyclic references even though the object is not fully constructed yet.

Key enabler here would be to allow the constructor (and only the constructor) to modify the fields even for immutable types.

Similar thing exists in Python with frozen dataclasses. Since frozen dataclasses are not really a first-class immutable type and just emulate immutability one can in the constructor indeed set the fields. But that just highlights the general problem on how to extend or subclass immutable types.

Remark

BTW your question is missing an important detail. You do not specify what kind of "immutability" you refer to.

Is it "shallow immutability" or "deep immutability" [1]?

Python: tuple itself can contain mutable objects and these objects can be mutated -- they just cannot be replaced in the tuple.

The shallow immutability already solves the covariance / contravariance problems of container types. But the deep immutability is needed for the thread safety. This needs to be taken into account in your design.

$\endgroup$
7
  • 2
    $\begingroup$ I had shallow immutability in mind; I didn't expect it to matter to the question. Of course, "deep immutability" is just applying shallow immutability at every step. I haven't really thought about concurrency at all; I know that's expected in language design nowadays, but it just isn't my field of expertise. It does occur to me that keys for mappings should probably be deeply and not just shallowly immutable. So, at construction, the immutable objects should check whether they have any mutable children, and give themselves a special invalid hashcode value if so. $\endgroup$ Commented May 21 at 16:08
  • $\begingroup$ Yes, this is the way Python does it. Python does have the (additional) concept of hashability which implies this deep immutability. This is stronger concept than the normal (shallow) immutability of types. See docs.python.org/3/glossary.html#term-hashable $\endgroup$ Commented May 22 at 7:17
  • 1
    $\begingroup$ Well, no; it's what certain built-in types do in the reference implementation of Python. User-defined types let you define __hash__ even though you can't ever protect them from mutation, and you can even add explicit APIs for mutation at the same time. Immutability isn't actually reified in the Python type system, even as a runtime-checkable property. Dataclasses at least try to encourage good practice and consistent design, though. $\endgroup$ Commented May 22 at 10:45
  • 1
    $\begingroup$ @AndreasH. "Python does have the (additional) concept of hashability which implies this deep immutability." That generalisation is not true. There are many mutable objects that are hashable. For example, generators are hashable (hash(x for x in range(3)) is well-defined) even though they are stateful and their entire point is mutation. The glossary you link to is actually very conservative in making the immutable <=> hashable link; it even mentions that "instances of user-defined classes are hashable by default", and notably these are all mutable by default as well. $\endgroup$ Commented May 22 at 18:17
  • $\begingroup$ @MisterMiyagi That's what I was trying to explain but I think you did a better job. $\endgroup$ Commented May 23 at 17:00
3
$\begingroup$

In OCaml, bindings shadow each other by default, but an initializer has access to the previous declarations with the same name:

let a = 40
let a = a + 2             (* a = 42 *)

However, you can use the rec keyword to allow a binding's initializer to refer to the binding's name itself.

let a = 40                (* This declaration is basically useless *)
let rec a = 1 :: 2 :: a   (* Here, "a" refers to its own declaration *)

The following code effectively defines a single binding "a" that contains a cyclic list. Note that OCaml is not lazily evaluated. This cyclic initialization is possible because most values in OCaml — including (linked) list nodes — are boxed (heap-allocated), so it's possible for the runtime to know the address of a value before it's actually initialized. The language specifically prohibits using a let rec declaration if the right-hand-side expression could potentially try to read the binding before the value is fully constructed.

$\endgroup$
2
$\begingroup$

Instead of focusing on the syntax of literal expressions for cyclic object graphs (and possibly restricting the facility only for this case), consider a more general approach.

You are already considering having most types with mutable and immutable versions, with different underlying memory layouts. Probably there will be some form of automatic conversion between these types. This alone allows for first constructing the mutable version, and "freezing" the object, say assigning a mut var into a non mut var.

This is nice and suffice for small objects, and overlaps with the case of literal expressions, that tends to be on the small size. But this fails for constructing big objects, objects with very complicated building processes (an async step, for example), and also annoys the programmer with the notion of building "many" dynamic objects only to throw them away after using them for constructing the immutable image.

Keyword here, "constructing". The modern notion of a constructor includes the "all or nothing" effect. Or the object is fully functional after the construction ends, with all the static invariants validated, or the object never comes to be. No intermediary state is visible.

But inside the constructor the object is allowed to be in a invalid state. Say, property a is assigned first, while construction of object for property b is not even started, and can still fail.

Enriching your language with an tmpmut version of immutable types, that have the same nominal fields but have all invariant restrictions lifted, allows for accumulating state for a immutable object, in immutable memory layout, that latter gets "converted" and validated into the typed immutable version.

Use example:

def foo( List mut ) { ... }
def bar( list imm ) { ... }
def fillwith( List tmpmut , source ) { ... }

var li mut = new List( string );   # This is mutable list
foo( l1 );                         # Mutate away

# We want to build a custom immutable list,
# but a partially constructed immutable object cannot
# be accepted where a proper immutable object is expected.

var l2 tmpmut = new List( 10 );  # Note the `tmpmut`
foo( l2 );                       # Fails

fillwith( l2 , source );  # Ok
var l3 List imm = l2;     # Calls tmpmut -> imm operation
bar( l3 );                # Ok

Definitions example:

# Dynamic lists are heap allocated reference objects,
# so its hashcode does not depend on contents.

def List type mut

    var HashCode u32;

    def constructor()
        this.HashCode = System.HashCode.GetOne();

# Immutable lists have a different memory layout,
# and obviously different methods. Immutable lists
# has an automatically generated hashcode that depends
# on lists contents *after* it is completed.

def List type imm

    var ... ;
    var HashCode u32 = lang.auto.HashCode32;

# Temporary immutable Lists have the same memory
# memory layout of immutable counterpart, and a
# different set of methods of mutable and immutable
# versions. In particular, the conversion operation
# from `tmpmut` into `imm`.

def List type tmpmut 

    def _ explicit ( source List tmpmut ) List imm
    {
        # Running dynamic checks

        if source.X then ...

        # Return an `tmpmut` where an `imm` is specified.
        # Here the compiler checks the static invariants of
        # `imm` type, before returning.

        return source;
    }
$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.