8

Here is a simplified example:

using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var data = new[]
        {
            new { Name = "Alice", Age = 30 },
            new { Name = "Bob", Age = 25 },
            new { Name = "Charlie", Age = 25 }
        };

        var sorted = data
            .OrderBy(x => Tracer("First criterion", x.Age))
            .ThenBy(x => Tracer("Second criterion", x.Name));

        foreach (var item in sorted)
        {
            Console.WriteLine($"{item.Name} - {item.Age}");
        }
    }

    static T Tracer<T>(string label, T value)
    {
        Console.WriteLine($"Computing {label} key: {value}");
        return value;
    }
}

// Output displayed on the screen:
// Computing First criterion key: 30
// Computing First criterion key: 25
// Computing First criterion key: 25
// Computing Second criterion key: Alice
// Computing Second criterion key: Bob
// Computing Second criterion key: Charlie
// Bob - 25
// Charlie - 25
// Alice - 30

We can observe that LINQ evaluates the second criterion (ThenBy) for all elements, and not only for those that remain after the first OrderBy. In other words, the sort keys of all ThenBy clauses are computed for the entire collection.

Is there a way to work around this behavior?

New contributor
zaher al chami is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
7
  • 1
    "for those that remain after the first OrderBy" - what does that mean? What is "remain"ing? Are you expecting "Alice - 30, Bob - 25, Charlie - 25"? Then you might want to swap first and second criiterion. Otherwise, I might have misunderstood the question. As is - that's exactly the result I would have expected from that query. Commented yesterday
  • 1
    What is it that you're expecting the first OrderBy to do? That method only sorts the elements and doesn't do any filtering, therefore the collection will naturally be the same size after the OrderBy as before. So what is the desired behavior that you're trying to achieve? Commented yesterday
  • 4
    Is your question, I am sorting a collection with primary and secondary sort keys. Some elements in my collection have unique primary sort keys so secondary sorting is not necessary. Nevertheless LINQ is evaluating the secondary sort keys for them. This work seems unnecessary, how can I prevent it? Commented yesterday
  • 1
    Please, note, that in your code Tracer is called on map, not on ordering: having item we want to obtain criterium to sort by, say, get Age and this must be done for each item and only then ordering comes. dotnetfiddle.net/lxUUXm Commented yesterday
  • 1
    Is the secondary sort key quick to evaluate, or expensive? Commented yesterday

4 Answers 4

4

BLUF: Define an extension function .ThenByLazy() and supporting helper classes that make use of the System.Lazy<T> class. This will defer property access to only when needed, while caching the result to prevent repeated accesses.

(Note: The extension functions below owe partial credit to Ivan Petrov's earlier answer. I have combined his ideas with mine in this answer.)

I will assume that:

  1. You wish to minimize accesses to the Name property, because it is a relatively expensive property to evaluate in your real-world case.
  2. When that property is accessed during the sort, it should be accessed no more than once per item, and then only for items where Age is not unique.

You can accomplish this by using the Lazy class to defer accessing the property until its value is actually needed. The Lazy class will also cache the property result once it is accessed, preventing repeated property getter references.

Because the default comparator for the Lazy class compares the object references instead of the actual values, you will need to either define a custom comparator or wrap Lazy up in a subclass that defines its own value comparator. In both cases, a .ThenByLazy() extension function can be written to simplify the call.

For the first option, you can define the following custom comparator class:

public class LazyValueComparator<TKey>: Comparer<Lazy<TKey>>
    where TKey : IComparable
{
    public override int Compare(Lazy<TKey> x, Lazy<TKey> y)
    {
        return x.Value.CompareTo(y.Value);
    }
}

With a matching extension function:

public static class Extensions {
    public static IOrderedEnumerable<TSource> ThenByLazy<TSource, TKey>
        (this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        where TKey : IComparable
    {
        return source.ThenBy(
            x => new Lazy<TKey>(() => keySelector(x)),
            new LazyValueComparator<TKey>());
    }
}

The second option is to define a custom Lazy subclass that has a .CompateTo() override:

public class LazyValueComparable<TKey>
    : Lazy<TKey>, IComparable<LazyValueComparable<TKey>> where TKey : IComparable
{
    public LazyValueComparable(Func<TKey> keySelector) : base(keySelector) {}

    public int CompareTo(LazyValueComparable<TKey> other)
    {
        return this.Value.CompareTo(other.Value);
    }
}

With a matching extension function:

public static class Extensions {
    public static IOrderedEnumerable<TSource> ThenByLazy<TSource, TKey>
        (this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        where TKey : IComparable
    {
        return source.ThenBy(x => new LazyValueComparable<TKey>(() => keySelector(x)));
    }
}

In both cases, you can then update your code to:

var sorted = data
    .OrderBy(x => Tracer("First criterion", x.Age))
    .ThenByLazy(x => Tracer("Second criterion", x.Name));

If needed, a matching .ThenByLazyDescending() extension can also be easily added. Corresponding .OrderByLazy() and .OrderByLazyDescending() are also possible, but I envision no particular need for them, as there is little to be saved at the top level of the sort.

See this .NET Fiddle and this .NET Fiddle for demos of the first and second options (with added test data).

Both versions produce same output. Note that the Name property is never referenced more than once for a given item and is not referenced at all for items having unique Age values.

Computing First criterion key: 30
Computing First criterion key: 25
Computing First criterion key: 25
Computing First criterion key: 93
Computing First criterion key: 93
Computing First criterion key: 93
Computing First criterion key: 93
Computing First criterion key: 89
Computing First criterion key: 89
Computing First criterion key: 89
Computing First criterion key: 89
Computing First criterion key: 89
Computing First criterion key: 89
Computing First criterion key: 89
Computing First criterion key: 1
Computing First criterion key: 2
Computing First criterion key: 3
Computing First criterion key: 3
Computing Second criterion key: Happy
Computing Second criterion key: Doc
Computing Second criterion key: Sleepy
Computing Second criterion key: Bashful
Computing Second criterion key: Sneezy
Computing Second criterion key: Dopey
Computing Second criterion key: Curley
Computing Second criterion key: Moe
Computing Second criterion key: Larry
Computing Second criterion key: Shemp
Computing Second criterion key: Grumpy
Computing Second criterion key: Bob
Computing Second criterion key: Charlie
Computing Second criterion key: Three-A
Computing Second criterion key: Three-B
One - 1
Two - 2
Three-A - 3
Three-B - 3
Bob - 25
Charlie - 25
Alice - 30
Bashful - 89
Doc - 89
Dopey - 89
Grumpy - 89
Happy - 89
Sleepy - 89
Sneezy - 89
Curley - 93
Larry - 93
Moe - 93
Shemp - 93

After you've confirmed the desired behavior, you can remove your Tracer() function references, yielding the following final code:

var sorted = data
    .OrderBy(x => x.Age)
    .ThenByLazy(x => x.Name);

Note that using the Lazy class and other techniques above have their own costs, including the creation of closures for the reference functions () => x.Name for each item. There may be some better ways that explicitly pass item references and property getter functions to a wrapper classes, but I have pursued those options at this time.

Addendum:

As Theodor Zoulias mentioned in the comments, the Lazy class may have more overhead baggage that we would like. As a simpler and presumably more efficient alternative, the following code implements an simplified lazy class that accomplished what we want with less overhead.

public class SimpleLazyComparable<T> : IComparable where T : IComparable
{
    Func<T> _factory;
    bool isInitialized = false;
    T _value;

    public SimpleLazyComparable(Func<T> factory)
    {
        _factory = factory;
    }
    
    T Value => isInitialized ? _value : CreateValue();
    
    T CreateValue()
    {
        _value = _factory();
        isInitialized = true;
        return _value;
    }

    public int CompareTo(object? other)
    {
        // For its intended use, the types should always be the same
        var otherLazy = other as SimpleLazyComparable<T>;
        return this.Value.CompareTo(otherLazy.Value);
    }
}

The .ThenByLazy() extension function would then be:

public static class Extensions {
    public static IOrderedEnumerable<TSource> ThenByLazy<TSource, TKey>
        (this IOrderedEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        where TKey : IComparable
    {
        return source.ThenBy(x => new SimpleLazyComparable<TKey>(() => keySelector(x)));
    }
}

See this fiddle for a demo. The results are the same.

Sign up to request clarification or add additional context in comments.

2 Comments

The ​Lazy<T> class has embedded thread-safety features, that are dead weight in a single-thread scenario like this one. You could consider instantiating the ​Lazy<T> with the LazyThreadSafetyMode.None option, or use something custom and lighter (the ​Lazy<T> caches exceptions, which is also redundant in this case).
You could optimize the SimpleLazyComparable<T> by removing the isInitialized field, and using the _factory for this purpose T Value => _factory is null ? _value : CreateValue();.
4

I don't claim that this is production-ready, but you can hack it a bit by introducing an extension method such as ThenByCustom which just saves the keySelector and replaces it with x => x. The actual keySelector is used only when comparing values that have the same primary key in a custom comparer.

This is the idea:

public static class Extensions {
    public static IOrderedEnumerable<TSource> ThenByCustom<TSource, TKey>(
        this IOrderedEnumerable<TSource> source,
        Func<TSource, TKey> keySelector) {
        var customComparer = new KeyComparer<TSource, TKey>(keySelector);
        return source.ThenBy(x => x, customComparer);
    }

    private class KeyComparer<TSource, TKey> : IComparer<TSource> {
        private readonly Func<TSource, TKey> _keySelector;
        private readonly IComparer<TKey> _keyComparer;

        public KeyComparer(Func<TSource, TKey> keySelector) {
            _keySelector = keySelector;
            _keyComparer = Comparer<TKey>.Default;
        }

        public int Compare(TSource x, TSource y) {
            return _keyComparer.Compare(_keySelector(x), _keySelector(y));
        }
    }
}

Your (slightly modified) example then:

static void Main() {
    var data = new[]
    {
            new { Name = "Alice", Age = 30 },
            new { Name = "Charlie", Age = 25 },
            new { Name = "Bob", Age = 25 },
        };

    var sorted = data
        .OrderBy(x => Tracer("First criterion", x.Age))
        .ThenByCustom(x => Tracer("Second criterion", x.Name));

    foreach (var item in sorted) {
        Console.WriteLine($"{item.Name} - {item.Age}");
    }
}

static T Tracer<T>(string label, T value) {
    Console.WriteLine($"Computing {label} key: {value}");
    return value;
}

outputs

Computing First criterion key: 30
Computing First criterion key: 25
Computing First criterion key: 25
Computing Second criterion key: Charlie
Computing Second criterion key: Bob
Bob - 25
Charlie - 25
Alice - 30

if you make the age different, you'd see no Tracer calls for the second criterion.

EDIT:

As mentioned in the comments - we will necessarily invoke the keySelectors multiple times if there are 3 or more elements that need comparing. See fiddle by T. N.

This can be addressed if we introduce some caching on our side with e.g. ConditionalWeakTable<TKey,TValue> for self-cleaning / not causing memory leaks if TSource is reference-type.

public static class Extensions {
    public static IOrderedEnumerable<TSource> ThenByCustom<TSource, TKey>(
        this IOrderedEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
        where TSource : class {
        var cache = new ConditionalWeakTable<TSource, StrongBox<TKey>>();
        var customComparer = new KeyComparer<TSource, TKey>(keySelector, cache);
        return source.ThenBy(x => x, customComparer);
    }

    private class KeyComparer<TSource, TKey> : IComparer<TSource>
        where TSource : class {
        private readonly Func<TSource, TKey> _keySelector;
        private readonly IComparer<TKey> _keyComparer;
        private readonly ConditionalWeakTable<TSource, StrongBox<TKey>> _cache;

        public KeyComparer(
            Func<TSource, TKey> keySelector,
            ConditionalWeakTable<TSource, StrongBox<TKey>> cache) {
            _keySelector = keySelector;
            _cache = cache;
            _keyComparer = Comparer<TKey>.Default;
        }

        public int Compare(TSource x, TSource y) {
            var keyX = GetCachedKey(x);
            var keyY = GetCachedKey(y);
            return _keyComparer.Compare(keyX, keyY);
        }

        private TKey GetCachedKey(TSource obj) {
            if (_cache.TryGetValue(obj, out var boxed))
                return boxed.Value;

            var key = _keySelector(obj);
            _cache.Add(obj, new StrongBox<TKey>(key));
            return key;
        }
    }
}

Fiddle with caching

6 Comments

This works nicely for two sorting criteria, but if you add a third criterion then you'll get multiple invocations of the keySelector for the second criterion.
Yes, this wrapper technique will defer the initial property access, but at the cost of repeatedly accessing that property for each item when three or more items have the same Age value. See this fiddle.
@TN thanks for the example. Something's got to give at some point if we are not going to rewrite the whole OrderBy implementation to support OP's use case. You can obviously create some WeakDictionary or something self-cleaning that can cache the TKey by TSource that we can have access in when asked to compare by TKey to avoid the repeated reevaluation.
@TN see updated answer
@TheodorZoulias maybe I don't understand fully your comment, but does the caching added in the edited answer address the point of multiple invocation of keySelector?
|
3

I am sorting a collection with primary and secondary sort keys. Some elements in my collection have unique primary sort keys so secondary sorting is not necessary. Nevertheless LINQ is evaluating the secondary sort keys for them. This work seems unnecessary, how can I prevent it?

I'll answer the above specific question, as formulated by @dbc in a comment under the question. It's not possible to get the desirable behavior using only the native LINQ operators OrderBy and ThenBy. Apparently these operators where implemented under the assumption that the keySelector will be inexpensive, and will be dwarfed by the higher number of comparisons between the keys, as well as the core sorting algorithm (swapping elements in arrays). So the keySelector is invoked invariably for all the elements in the sequence, before starting the sorting. You can study the source code here and here. It's not very likely that Microsoft will reconsider these assumptions any time soon.

What you could do now is to preserve the information about which of the elements in the source IOrderedEnumerable<TSource> sequence have unique primary sort keys, by using the GroupBy before the OrderBy. Here is how it can be done:

var sorted = data
    .GroupBy(x => Tracer("First criterion", x.Age))
    .OrderBy(g => g.Key)
    .Select(g =>
    {
        if (g.Count() == 1) return g.AsEnumerable();
        return g.OrderBy(x => Tracer("Second criterion", x.Name));
    })
    .SelectMany(x => x);

Output:

Computing First criterion key: 30
Computing First criterion key: 25
Computing First criterion key: 25
Computing Second criterion key: Bob
Computing Second criterion key: Charlie
Bob - 25
Charlie - 25
Alice - 30

Now this is obviously ugly, and adding a third or fourth sorting criterion would be a coding nightmare, but you can add a couple of custom LINQ operators and make it almost as pretty as the standard LINQ:

var sorted = data
    .GroupOrderBy(x => Tracer("First criterion", x.Age))
    .ThenBy(x => Tracer("Second criterion", x.Name))
    .ThenBy(x => Tracer("Third criterion", x.Surname))
    .SelectMany(x => x);

Here are the two custom operators GroupOrderBy and ThenBy:

public static IEnumerable<IGrouping<TKey, TSource>> GroupOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> equalityComparer = default,
    IComparer<TKey> comparer = default)
{
    return source
        .GroupBy(keySelector, equalityComparer)
        .OrderBy(g => g.Key, comparer);
}

public static IEnumerable<IGrouping<TNewKey, TSource>> ThenBy<TSource, TKey, TNewKey>(
    this IEnumerable<IGrouping<TKey, TSource>> source,
    Func<TSource, TNewKey> keySelector,
    IEqualityComparer<TNewKey> equalityComparer = default,
    IComparer<TNewKey> comparer = default)
{
    return source.Select(g =>
    {
        if (g.Count() == 1) return g.GroupBy(_ => default(TNewKey));
        return g.GroupOrderBy(keySelector, equalityComparer, comparer);
    })
    .SelectMany(x => x);
}

Online demo.

These two operators have a weakness. If you want to customize the sorting behavior of some property, for example by specifying a case-insensitive string comparison for the Name, you have to pass two different and compatible comparers: one IEqualityComparer<Tkey> and one IComparer<Tkey>. This is because the GroupBy does equality comparisons, while the OrderBy does ordinal comparisons. So you will have to do this:

.ThenBy(x => x.Name, StringComparer.OrdinalIgnoreCase, StringComparer.OrdinalIgnoreCase)

If you pass two incompatible comparers by mistake, the behavior is undefined.

Obviously the above mechanism is significantly heavier than the standard OrderBy and ThenBy. So it makes sense to use it only if you are dealing with really heavy keySelectors.

1 Comment

"implemented under the assumption that the keySelector will be inexpensive, [so it] is invoked invariably for all the elements in the sequence, before starting the sorting." - I would argue that the assumption was that the selector call is expensive, not inexpensive, which is why it is executed up-front only once per element, not for each comparison operation during the sorting.
0

Let's assume that we have a Person class instead of an anonymous type.

class Person(string name, int age)
{
    public string Name { get; } = name;
    public int Age { get; } = age;
}

Then, another possibility is to use Enumerable.Order<T>(IEnumerable<T>, IComparer<T>) instead of OrderBy followed by ThenBy. This allows you to create your own comparison logic accessing and comparing only the necessary properties.

class PersonComparer : IComparer<Person>
{
    public static readonly PersonComparer Instance = new();

    private PersonComparer() { }

    public int Compare(Person? x, Person? y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        int ageOrder = Tracer("First criterion x.Age", x.Age).CompareTo(Tracer("First criterion y,Age", y.Age));
        return ageOrder == 0
            ? Tracer("Second criterion x.Name", x.Name).CompareTo(Tracer("Second criterion y.Name", y.Name))
            : ageOrder;
    }
}

We test like this:

var data = new[]
{
    new Person ( "Alice",  30 ),
    new Person( "Bob",  25 ),
    new Person( "Charlie", 25 )
};

var sorted = data
    .Order(PersonComparer.Instance);

The output is longer. This in part dependent on the sorting algorithm and also because we are not capturing the value in a lambda, I assume.

Computing First criterion x.Age key: 30
Computing First criterion y,Age key: 25
Computing First criterion x.Age key: 25
Computing First criterion y,Age key: 25
Computing Second criterion x.Name key: Bob
Computing Second criterion y.Name key: Charlie
Computing First criterion x.Age key: 30
Computing First criterion y,Age key: 25
Bob - 25
Charlie - 25
Alice - 30

But it shows that the second property is only accessed when the first ones are equal.

You could create a Person record instead, then comparison logic would be created by the compiler for free and you could call Order without any parameters.

2 Comments

I had the same idea as well. My tests revealed that it causes multiple invocations of the keySelector for the first sorting criterion, which defeats the whole purpose of this exercise. Add a third criterion, and now you have multiple invocations for the second criterion as well.
See this fiddle with extra test data. If the data contains groups with three or more items having the same age, the number of Name property accesses actually increases.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.