10
$\begingroup$

Consider the following example in Rust:

const A: usize = 23;

match 42 {
    A => println!("Matched the constant"),
    x => println!("Matched something else: {x}"),
}

The two patterns A and x mean quite different things. The first pattern matches only the value of the constant A, while second pattern matches any (other) value and binds the matched value to a new variable named x.

Let's call these constant patterns vs. binding patterns. How do various languages distinguish between constant and binding patterns, whether syntactically or otherwise? How do different solutions to this language design problem affect readability, and the ability of a static checker to detect typos or other mistakes?

$\endgroup$
1
  • $\begingroup$ A somewhat similar question would be about binding vs assignment. In Rust I can write (x, y) = (1, 2) which assigns, or let (x, y) = (1, 2); which binds... but I am not even sure there's a syntax allowing a mix of binding & assigning. $\endgroup$ Commented Aug 20 at 17:27

6 Answers 6

11
$\begingroup$

Casing

In Haskell, constant patterns are those where the name begins with an uppercase letter, and binding patterns are those beginning with a lowercase letter.

Human readers can likewise tell the two kinds of pattern apart by the casing of the identifier. If the programmer makes a typo on the name of a constant (or constructor, etc.) then it can be detected at compile-time; it is unlikely that a typo would change the casing of the first letter, but even then the resulting variable would be unused, so there can be a warning.

Scope

In Rust, there is no syntactic difference ─ both kinds of pattern use a bare identifier name. They are distinguished based on whether a constant (or enum member, etc.) of the given name exists in scope.

It is convention that constants (and enum members, etc.) use names beginning with uppercase letters, and other names begin with lowercase letters. This convention is enforced by the linter, and allows readers to distinguish the two kinds of pattern. However, the casing of the identifier does not affect the actual meaning of the pattern.

If the programmer makes a typo in the name of a constant pattern, this results in a binding pattern, which might compile without errors and (incorrectly) match any value. In many situations there will be other patterns to match other cases, which will be unreachable due to the binding pattern matching everything, so the compiler can report an error. In some situations, only the linter will catch the typo. Either way, the diagnostic will be somewhat misleading.

Besides typos, there is also a risk that a wildcard import could silently bring a new name into scope, unintentionally changing a binding pattern into a constant pattern. Some libraries (e.g. Diesel) do declare structs or other constants with names beginning with lowercase letters, while suppressing the linter warning. This will generally result in a type error when such a name is used in a pattern.

Sigils

If a sigil such as $ is used for variable names, then an identifier like A without the sigil is always a constant pattern, and an identifier like $x is a binding pattern.

Human readers can tell apart constant and binding patterns by the presence of the sigil. It is a common enough mistake to accidentally omit the sigil, but it is unlikely that a typo would accidentally introduce the sigil. So the syntax at least protects against the more dangerous mistake of accidentally matching everything (and binding to a variable) when the programmer intended only to match one specific value. The other kind of mistake can be detected at compile-time since a misspelled variable name, without the sigil, will usually not be the name of an existing constant in scope.

$\endgroup$
3
  • $\begingroup$ For historical reasons, uppercase constructors in Haskell aren’t quite the same as constants. It’s nice to define constants as pattern synonyms like pattern RightwardsArrow = '\x2192' so they can also be used as values, but if you want a pattern that compares with an existing constant like rightwardsArrow = '\x2192', the best you can do is a view pattern: (== rightwardsArrow) -> True. Until pattern synonyms can be parameterised, we also can’t factor this out into a synonym like *pattern Eq :: (Eq a) => a -> a; pattern Eq x <- ((== x) -> True) where { Eq x = x }. $\endgroup$ Commented Aug 20 at 15:59
  • 1
    $\begingroup$ You can also use sigils the other way around: x refers to a variable, $x refers to its value. For which there is much more extensive precedent. $\endgroup$ Commented Aug 20 at 17:05
  • $\begingroup$ A binding pattern that doesn't use the binding is also a good indicator of an error, but you'd then have to enforce the use of a specific identifier like _ if you really want to throw away the value. $\endgroup$ Commented Aug 22 at 14:16
6
$\begingroup$

Python's match Statements and Pattern Syntax

Python recently added a match statement and pattern syntax. (Well, it was added in Python 3.10 in October 2021, but that's recent in Python's history!) It has two types of patterns that roughly line up with what you've called binding and constant patterns:

  • A bare name like x is a capture pattern, which always binds x as a variable.
  • A dotted name like x.y is a value pattern, which evaluates x.y as an attribute access and compares the value without binding anything.

(These two can of course be composed with other types of patterns into larger structures.) The latter is maybe a little odd, but several much earlier language design choices shaped the design of Python's pattern syntax.

Python Design

Python is a dynamically-typed interpreted language. Python doesn't distinguish between variables and constants—neither statically nor at runtime. They're both name bindings, and a constant is simply a name that everyone agrees not to reassign, enforced through convention and documentation. This includes things that would almost certainly be constant (if even reified) in less dynamic languages, such as classes, functions, and enumeration members.

Python also doesn't have any special syntax to declare variables. A variable is created when the name is first assigned in the scope, which can happen at any point during the program's execution. A variable's existence is not deterministic, since a name assigned inside a conditional body won't create a variable unless it runs. Even imported names aren't bound until the import statement is evaluated.

Capture Patterns

So, given that it's not possible to distinguish variables and constants at a language level, Python could not reasonably choose to have patterns like x and A mean different things—not without changing a bunch of other well-established language features related to name bindings, or promoting conventions like SHOUTY_CASE to actual syntax.

The designers of Python's match statement decided that capture patterns should look like names. This makes sense, since other language features that bind names (like for loops and except blocks) don't use any special sigils or declarations. Python also already had an existing, very limited pattern-like syntax for assigning multiple names in one statement, which also used bare names for bindings. (But those aren't actually patterns, and there are other subtle but important differences!)

The designers shared their thoughts in PEP 635 (Structural Pattern Matching: Motivation and Rationale):

There were calls to explicitly mark capture patterns and thus identify them as binding targets. According to that idea, a capture pattern would be written as, e.g. ?x, $x or =x. The aim of such explicit capture markers is to let an unmarked name be a value pattern (see below). However, this is based on the misconception that pattern matching was an extension of switch statements, placing the emphasis on fast switching based on (ordinal) values. Such a switch statement has indeed been proposed for Python before (see PEP 275 and PEP 3103). Pattern matching, on the other hand, builds a generalized concept of iterable unpacking. Binding values extracted from a data structure is at the very core of the concept and hence the most common use case. Explicit markers for capture patterns would thus betray the objective of the proposed pattern matching syntax and simplify a secondary use case at the expense of additional syntactic clutter for core cases.

Value Patterns

For value patterns, then, the designers had to choose a different syntax. They chose to use the . of a dotted attribute access to distinguish them. This makes some sense again. It's common (though certainly not ubiquitous) in Python to import modules with a single name like math, and then refer to members of the module as attributes like math.pi. Also, Python enumerations are implemented as classes with their members as class attributes, so a member would be referred to as MyEnum.A.

The designers shared their thoughts here as well in PEP 635:

The observation that constants tend to be written in uppercase letters or collected in enumeration-like namespaces suggests possible rules to discern constants syntactically. However, the idea of using upper- vs. lowercase as a marker has been met with scepticism since there is no similar precedence in core Python (although it is common in other languages). We therefore only adopted the rule that any dotted name (i.e., attribute access) is to be interpreted as a value pattern, for example HttpStatus.OK above. This precludes, in particular, local variables and global variables defined in the current module from acting as constants.

A proposed rule to use a leading dot (e.g. .CONSTANT) for that purpose was criticised because it was felt that the dot would not be a visible-enough marker for that purpose. Partly inspired by forms found in other programming languages, a number of different markers/sigils were proposed (such as ^CONSTANT, $CONSTANT, ==CONSTANT, CONSTANT?, or the word enclosed in backticks), although there was no obvious or natural choice. The current proposal therefore leaves the discussion and possible introduction of such a ‘constant’ marker for a future PEP.

Distinguishing the semantics of names based on whether it is a global variable (i.e. the compiler would treat global variables as constants rather than capture patterns) leads to various issues. The addition or alteration of a global variable in the module could have unintended side effects on patterns. Moreover, pattern matching could not be used directly inside a module’s scope because all variables would be global, making capture patterns impossible.

As I mentioned above, the dotted syntax can cover many things that might be matched using a value pattern, but it's definitely not exhaustive. For one, it doesn't allow matching against constants declared in the same module without introducing a nested namespace, such as a class, just to let you use dotted access. (Python doesn't have any simple syntax to refer to “the current module” from within.)

And while importing modules is common, it's at least as common to import module members directly. Thus there is a common gotcha that

import math

match 42:
    case math.pi:  # Value pattern
        print("It's a match!")
    case _:
        print("Not a match.")

and

from math import pi

match 42:
    case pi:  # Capture pattern!
        print("It's a match!")
    case _:
        print("Not a match.")

do two different things.

Other Rejected Ideas

It may also be interesting to read PEP 622 (Structural Pattern Matching), which was an earlier attempt at the same feature. PEP 622 was superseded by PEP 634 a few months later, but PEP 622 includes an extensive list of rejected ideas that shed some additional light on the final design of the match statement. This includes a list of rejected alternative designs for value patterns.

$\endgroup$
3
  • 4
    $\begingroup$ "Python also already had an existing, very limited pattern-like syntax for assigning multiple names in one statement, which also used bare names for bindings." Note that those also support other assignments targets, prominently attributerefs ("dotted names"). obj.x, obj.y = 1, 2 is perfectly valid for assignment. $\endgroup$ Commented Aug 20 at 7:44
  • 1
    $\begingroup$ Another relevant source is the paper they published on this model "Dynamic Pattern Matching with Python", which also goes into some comparisons with other languages and how their goals differed. There's also the video of the accompanying presentation from a couple of years later (due to Covid). $\endgroup$ Commented Aug 20 at 21:37
  • $\begingroup$ @MisterMiyagi Fair point! I was focused on the fact that multiple assignment doesn't require extra syntax to bind variables (so no ^x or =x), but it's good to point out that they're really different in other ways. It's definitely multiple assignment rather than pattern assignment (like, e.g., Rust has). $\endgroup$ Commented Aug 21 at 0:27
4
$\begingroup$

The pattern-matching system in Grace1 allows only actual literals to be used unadorned as patterns:

1 Patterns as Objects in Grace. Michael Homer, James Noble, Kim B. Bruce, Andrew P. Black, David J. Pearce. In Dynamic Language Symposium (DLS), 2012. https://doi.org/10.1145/2480360.2384581

method fact(x) {
    match (x)
        case { 0 -> 1 }
        case { n -> n * fact(n - 1) }
}

Numbers and strings are "autozygotic" values and match themselves, and their literals can be written directly as patterns to do that. The n above is a binding case, and in fact exactly the syntax for a lambda block of one untyped parameter.

Other patterns can be written following a colon: {x : A -> ...} will bind x if the value of A matches the target, either because that value is a literal, or because A is some other kind of pattern. This also overlaps exactly the syntax for a lambda block of one typed parameter:

match (x)
    case { x : A -> 1 }
    case { y : String -> y.size }

This was a deliberate choice to allow the same mechanic to build up from simple cases to more complex user-defined patterns, and from different starting points (e.g. introduction through simple typecase or literals). Later work explored this point further. The binding and usage cases are syntactically distinct, so typos or semantic errors are no harder to detect than elsewhere.

At run time, the name of a type refers to a pattern that matches instances of that type, so the type namespace is a subset of the ordinary namespace and there is no additional ambiguity. Types are presumably statically known to an analysis tool, so it will be able to distinguish a type and another kind of pattern in this position if required (but all patterns identify the type of the value they will bind anyway).

There is also a second escape hatch: a pattern expression can be surrounded by parentheses to force it to be evaluated, so { (a) -> ...} will have the same effect as { _ : a -> ...}. There doesn't seem to be significant additional value to this, and I don't think I'd bother with it again.


There was an additional kind of binding pattern in the original design from that paper, and it did cause some issues that led to its being removed from later versions. That was mostly because the implementation overhead wasn't worthwhile when in practice the revealed preference of users (including me!) was to avoid using it in favour of alternatives. That preference may have come in part because of difficulty disambiguating uses and mentions as a human, but not so much because of issues of static analysis or typos.

As originally proposed, "function call" syntax in this position, like { A(B("x")) -> ... }, would also be interpreted as a pattern, and so would the "arguments" to allow complex nested destructuring patterns. This did create an ambiguity for the user, if not for the compiler. Although it looked like a method call, it wouldn't actually be one, but rather each "function name" named a pattern (i.e. a noun, not a verb), with the "arguments" also patterns to be combined later.

case { Operator("+", ASTNumber(0), y) -> "Just {y}" }

Here Operator names a pattern that matches an object and extracts three subvalues, each of which is then matched against the corresponding subpattern. The y subpattern is a binding instance, the "+" is a literal, and ASTNumber(0) is another pattern with its own subpattern. This is where there is an ambiguity: ASTNumber(0) is not calling ASTNumber with argument 0, but rather identifying two pattern objects ASTNumber and 0 which the matching system will combine (and _ : ASTNumber(0) would be the version that invoked the single-argument function). y, as an unadorned name, was again not the value of an existing y but a binding instance that would create a variable y in the match scope.

This binding syntax turned out not to get much use: because it's an object-oriented language, people preferred to match and then use the object's accessors to extract data from it afterwards. The semantics is also quite complex, and for example ASTNumber(x) and ASTNumber((x)) have different meanings here but not outside of the pattern context. Users did experience confusion about the exact semantics and either avoided it because of that, or weren't interested in it anyway.

The semantics of nested bindings were also complicated around pattern combinators: the pattern A(x) & B(y) would bind names x and y (if both A and B matched), but A(x) | B(y) would bind neither, because which side had matched was not known statically. The pattern system was also discussed in more depth in Chapter 4 of my thesis, and Section 4.7.7 discusses an alternative approach that unrolls all match-case blocks in a way that would address some of these binding cases, but was found too complex to type and to understand as a user.

Later implementations did not include the binding subpatterns at all, and would fully-evaluate the pattern expression as an ordinary method call. That is, programmers preferred to write, and later implementations only supported, writing the above example as:

case { nd : Operator("+", ASTNumber(0), Number) -> "Just {nd.rhs}" }

where Operator("+", ASTNumber(0), Any) is a single call to a method that combines its three arguments somehow to create a pattern; those arguments may or may not be patterns. There is no privileged kind of destructuring, and no way to bind a new name in the middle. The semantic and syntactic complexity of destructuring binding may have been part of why users tended not to use it, but preference for using the matched value "as" an object seemed to be the commonly articulated reason.

On the implementation side, given the very limited usage, supporting binding patterns was not worth the complexity it imposed on the back-end code. All of the use cases other than binding names were served equally well, or better, by evaluating the pattern expressions as ordinary expressions.


In a language that more ubiquitously used pattern-matching, those concerns might be less impactful, but in this one where there was an easy alternative to access properties of the matched value it appeared that, regardless of experience, programmers preferred to avoid the versions that used nested binding patterns. The "top-level" bindings are syntactically and semantically unambiguous because only actual literals and expressions in "type position" are treated as patterns, and these do not seem to cause any issues with readability or understanding.

$\endgroup$
1
  • $\begingroup$ I think that patterns like Foo::Bar(a, b, c) to bind the fields of a structure, are most useful in languages like Rust where Bar is an enum variant, such that merely binding the whole value to a variable like foo wouldn't allow you to access its fields (as its type would be the Foo enum, not just the Bar variant of it). This could be solved if individual enum variants were types themselves, so that one could bind the whole Foo::Bar structure and then, knowing statically that it's that variant, access its fields. $\endgroup$ Commented Aug 20 at 9:02
4
$\begingroup$

Syntactically, in pattern matching, F# (and others) makes the choice to have identifiers always be "binding patterns" that introduce a new variable, as opposed to evaluating them and match against their value (if they are the name of an existing variable).

let a = 8 // this line could be omitted
match 75 with
| a -> printfn "a: %d" a // a is a new variable.
| b -> failwith "unreachable (statically known)" // warning
// prints "a: 75"

When the other behavior is desired, a guard (when clause) can be used with a condition. The condition in the when is an expression, so it evaluates a normally.

let a = 33
match 33 with
| b when b = a -> printfn "b: %d" b // a is the use of the existing variable
| a -> failwith "unreachable, but that's not known statically"
// prints "b: 33"

Now, regarding the ability of a static checker to detect typos or other mistakes. The simplest mistake here would be the one of the first example: using | a -> when wanting to evaluate an existing a, when instead that would unfortunately match everything and introduce a new binding for a, which would compiles just fine. This mistake is however caught by the visible warning on the next pattern, b, that comes from it being statically known as unreachable. The entire line is also greyed out, at least in my IDE.

A more elaborate mistake would be to try to use the identifier in an expression, which would also be understood as a binding pattern.

let a = 33 // this line could be omitted
match Some 33 with
| Some a -> printfn ":-("
| None -> failwith "unreachable, but that's not known statically"
// prints ":-("

This time the compiler cannot help. Hopefully the developer will have encountered the first mistake first, and learned the way patterns work...


To detect this mistake, I think that there is a way F# could be modified. The alternate design would require that shadowing in variable declaration be made explicit. (More about that topic here)

let a = 8
match Some 75 with
| Some a -> failwith "unreachable because a evaluates to 8, not 75"
| Some shadowing a -> printfn "a: %d" a // a is a new variable here
| None -> failwith "unreachable"
// prints "a: 75"

Here shadowing is a keyword I just made up, that applies to the identifier after it to indicate that it's ok if it shadows. And without it, you'd get a compile error or warning, about the shadowing that's taking place. Mistake: detected.

That's my idea. Of course this won't happen in F#, it would be a large breaking change, contrary to the tradition of F# and its Ocaml ancestry. And it is after all sometimes useful to shadow an identifier by "refining" its meaning using a match, in a way that can often be similar to what is achieved with flow-sensitive type refinement in other languages. Making these cases syntactically heavier in my design might or might not be worth the benefits of the additional error detection and of the simpler way to use a as evaluated.

$\endgroup$
4
  • $\begingroup$ I think the question is more about how the compiler distinguishes between the b and None - aren't both of them identifiers? $\endgroup$ Commented Aug 20 at 12:09
  • $\begingroup$ @Bergi In F#, the cases of a discriminated union take precedence over variable binding, but that is not as error-prone as for the variables, because they usually start with an uppercase character (there's a warning for this). I didn't mention that, because casing is already part of kaya3's answer. $\endgroup$ Commented Aug 20 at 13:40
  • $\begingroup$ An explicit shadowing would help a bit but would still not prevent accidental binding on typo. That is, if I write Some s instead of Some a (my finger slipped on the keyboard), then this is a binding, and I get no warning about accidental shadowing. $\endgroup$ Commented Aug 20 at 17:25
  • $\begingroup$ @MatthieuM. Yes, but you would get a warning about s being unused $\endgroup$ Commented Aug 20 at 18:55
2
$\begingroup$

Implicit versus explicit declaration

The underlying problem is that the binding pattern syntax has the effect of potentially introducing a new variable identifier, with a syntax that does not have the appearance of an identifier declaration. The readability problem, and the detecting of typos and mistakes are also related.

The real problem is that the binding pattern declaration may be shadowed by the availability of an unrelated identifier. And if the problem is cause by the implicit declaration, a solution is to make ti explicit:

const A: usize = 23;
var x;   // explicit declared, yet untyped

match 42 {
    A => println!("Matched the constant"),
    x => println!("Matched something else: {x}"),
}

Now x cannot refer to identifiers declared elsewhere, and neither can be used outside of a context that statically defines/restricts its type.

If there is a keyword for variable declaration, a more localized solution could be:

const A: usize = 23;

match 42 {
    A     => println!("Matched the constant"),
    var x => println!("Matched something else: {x}"),
}

Wildcard pattern

For a Rust-like language, where there is a reserved identifier like _, with the semantics of wildcard pattern, this can be syntactly solved with:

const A: usize = 23;

match 42 {
    A => println!("Matched the constant"),
    _ => println!("Matched something else: {_}"),
}
$\endgroup$
2
$\begingroup$

C# chose to distinguish between constant and binding patterns by reusing its var keyword, which is normally used outside of a pattern matching.

const int a = 11; // a has to be const, to be used in a pattern matching
Console.WriteLine(33 switch
{
    a => "a",
    //var a => a, // error: that variable name is already used
    var b => $"b: {b}"
});
// prints "b: 33"

Note that it is impossible to make the mistake of shadowing an existing variable rather than using its value, because C# simply forbids shadowing.

This is a bit similar to the sigil approach you described in your answer, and remains remarkably consistent with the existing language (and IDE UX: the var can be used for type information tooltip on mouse over, and for go-to-definition or find-all-references, just like in any other variable declaration).

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.