Bucket sort in Rust

Question

First of all, the code:

struct Bucket<H, V> where H: Ord
{
    hash: H,
    values: Vec<V>
}

impl<H, V> Bucket<H, V> where H: Ord {
    fn new(hash: H) -> Bucket<H, V> {
        Bucket {
            hash: hash,
            values: vec![],
        }
    }
}

pub fn bucket_sort<T, F, H>(values: Vec<T>, hasher: F) -> Vec<T>
    where T: Ord, F: Fn(&T) -> H, H: Ord
{

    let mut buckets: Vec<Bucket<H, T>> = vec![];

    for value in values.into_iter() {
        let hash = hasher(&value);
        match buckets.binary_search_by(|bucket| bucket.hash.cmp(&hash)) {
            Ok(index) => {
                buckets[index].values.push(value);
            },
            Err(index) => {
                let mut bucket = Bucket::new(hash);
                bucket.values.push(value);
                buckets.insert(index, bucket);
            }
        }
    }

    let mut sorted_values = Vec::new();
    for bucket in buckets.into_iter() {
        let mut bucket = bucket;
        bucket.values.sort();
        sorted_values.extend(bucket.values);
    }
    sorted_values
}

#[test]
fn test_bucket_sort() {
    let values = vec![5, 10, 2, 99, 32, 1, 7, 9, 92, 135, 0, 54];
    let sorted_values = bucket_sort(values, |int| int / 10);
    assert_eq!(sorted_values, vec![0, 1, 2, 5, 7, 9, 10, 32, 54, 92, 99, 135]);
}

The thing that bothers me about this implementation is that it's destructive: bucket_sort takes ownership of a vector and builds another one. I first tried to implement a bucket sort with this signature:

pub fn bucket_sort<T, F, H>(values: &[T], hasher: F) -> Vec<T>
where T: Ord, F: Fn(&T) -> H, H: Ord

But I realized that passing a slice means that:

either sorting must happen in place
or elements must implement Copy

Copy-ing is too restrictive and probably inefficient on a large number of values, and I don't know if it's possible to implement in-place sorting with bucket sort.

Apart from this, are there any obvious style/performance flaws in this implementation?

Hooray, more Rust questions! Hope you get the review you want, and welcome to Code Review. (Dan, commented Oct 24, 2016 at 16:25)

Shepmaster · Accepted Answer · 2016-10-24 17:58:51Z

Great work! One thing that stuck out to me was the use of extend; that shows that you've looked through the API docs to see what's available. Most of the time, people start out by just pushing all the elements from one vector to the other.
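The sketch below shows the two approaches mentioned above. It uses two hypothetical helpers. One pushes the elements one at a time. The other hands the whole vector to `Vec::extend`, which accepts any `IntoIterator`.

```rust
/// Hypothetical helper: appends every element of `src` to `dst` one at a time.
fn append_with_push<T>(dst: &mut Vec<T>, src: Vec<T>) {
    for value in src {
        dst.push(value);
    }
}

/// Hypothetical helper: appends every element of `src` to `dst` in one call.
/// `extend` takes any `IntoIterator`, so the `Vec` is passed directly.
fn append_with_extend<T>(dst: &mut Vec<T>, src: Vec<T>) {
    dst.extend(src);
}
```

Both helpers produce the same vector. `extend` is shorter, and it can use the source's size hint to reserve capacity up front.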

> But I realized that passing a slice means that:
>
> either sorting must happen in place
>
> or elements must implement Copy

This analysis is correct, congratulations! In fact, this is something I like about Rust's type system. If I hand a method a &mut [T], then I know that it is very likely that it will be doing something in-place, presumably in an efficient manner.
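The question doubted whether bucket sort can work in place. It can, at least when buckets can be numbered. Below is a minimal sketch in the style of American flag sort. It makes two assumptions that differ from the question's generic `H: Ord`:

- the hasher returns a bucket index in `0..num_buckets`;
- the hasher is monotone: `a <= b` implies `bucket_of(a) <= bucket_of(b)`.

The name `bucket_sort_in_place` and its parameters are hypothetical.

```rust
/// In-place bucket sort in the style of American flag sort (a sketch).
///
/// Assumptions (not from the original question): `bucket_of` returns an index
/// in `0..num_buckets` (otherwise this panics), and it is monotone, i.e.
/// `a <= b` implies `bucket_of(a) <= bucket_of(b)` (otherwise the output is
/// grouped by bucket but not sorted).
pub fn bucket_sort_in_place<T, F>(values: &mut [T], num_buckets: usize, bucket_of: F)
where
    T: Ord,
    F: Fn(&T) -> usize,
{
    // Count how many values fall into each bucket.
    let mut counts = vec![0usize; num_buckets];
    for value in values.iter() {
        counts[bucket_of(value)] += 1;
    }

    // Bucket b will occupy values[starts[b]..starts[b + 1]].
    let mut starts = vec![0usize; num_buckets + 1];
    for b in 0..num_buckets {
        starts[b + 1] = starts[b] + counts[b];
    }

    // next[b] is the first slot in bucket b not yet known to hold a bucket-b value.
    let mut next = starts.clone();

    // Swap each misplaced value straight into the next free slot of its own bucket.
    for b in 0..num_buckets {
        while next[b] < starts[b + 1] {
            let target = bucket_of(&values[next[b]]);
            if target == b {
                next[b] += 1;
            } else {
                values.swap(next[b], next[target]);
                next[target] += 1;
            }
        }
    }

    // Buckets are now contiguous and in order; sort each run without allocating.
    for b in 0..num_buckets {
        values[starts[b]..starts[b + 1]].sort_unstable();
    }
}
```

Each swap puts one value into its final bucket for good, so the distribution pass does at most n swaps. The only allocations are the O(num_buckets) bookkeeping vectors, never a second copy of the values.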

> Copy-ing is [...] probably inefficient on a large number of values

Types that implement Copy are those that can be semantically duplicated by copying bits, without executing any additional code. Generally, this is something that processors are good at. Types that are only Clone, on the other hand, may run arbitrary code when they are duplicated (cloning a String, for example, allocates and copies its heap buffer). These are the ones you want to be careful about duplicating needlessly. In general, I wouldn't worry about doing some copies.

Your point about Copy potentially being overly restrictive is quite valid though.
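Here is a minimal sketch of the slice-taking signature from the question, with `T: Clone` in place of `Copy`, so that non-`Copy` types such as `String` also work:

- the name `bucket_sort_cloned` is hypothetical;
- for brevity it replaces the `Bucket` struct with a `(hash, values)` tuple;
- each input value is cloned exactly once, into its bucket, and the caller's slice is left untouched.

```rust
/// Sketch: bucket sort over a borrowed slice, cloning each value once.
pub fn bucket_sort_cloned<T, F, H>(values: &[T], hasher: F) -> Vec<T>
where
    T: Ord + Clone,
    F: Fn(&T) -> H,
    H: Ord,
{
    // Buckets kept sorted by hash: (hash, values that produced that hash).
    let mut buckets: Vec<(H, Vec<T>)> = Vec::new();

    for value in values {
        let hash = hasher(value);
        match buckets.binary_search_by(|(h, _)| h.cmp(&hash)) {
            Ok(index) => buckets[index].1.push(value.clone()),
            Err(index) => buckets.insert(index, (hash, vec![value.clone()])),
        }
    }

    // Sort each bucket, then concatenate them in hash order.
    buckets
        .into_iter()
        .flat_map(|(_, mut bucket)| {
            bucket.sort();
            bucket
        })
        .collect()
}
```

As with the original, the result is only fully sorted if the hasher preserves order. For strings, the first character is an order-preserving hasher, but the length is not.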

- where clauses go on a separate line, and each additional restriction also goes on a separate line. This helps readability and understanding as those restrictions can dramatically change how a function should be called.
- Use trailing commas just about everywhere in Rust.
- I prefer to not add type restrictions on structs or methods that don't need them.
- I prefer to not use into_iter when I can just pass the collection directly to something that accepts an IntoIterator.
- Pick one of Vec::new or vec![] and be consistent with it.
- Prefer for mut bucket in buckets instead of rebinding the variable just to add mutability.
- Extend Bucket::new to take a value. You don't really need the ability to create an empty bucket.
- The match arms can be one-lined.
- flat_map and collect can be used to glom all of the intermediate vectors into one.
- It's OK to compare a Vec<T> against an array of T. Use this in the tests to avoid needlessly allocating another vector.

struct Bucket<H, V> {
    hash: H,
    values: Vec<V>,
}

impl<H, V> Bucket<H, V> {
    fn new(hash: H, value: V) -> Bucket<H, V> {
        Bucket {
            hash: hash,
            values: vec![value],
        }
    }
}

pub fn bucket_sort<T, F, H>(values: Vec<T>, hasher: F) -> Vec<T>
    where T: Ord,
          F: Fn(&T) -> H,
          H: Ord
{
    let mut buckets: Vec<Bucket<H, T>> = Vec::new();

    for value in values {
        let hash = hasher(&value);
        match buckets.binary_search_by(|bucket| bucket.hash.cmp(&hash)) {
            Ok(index) => buckets[index].values.push(value),
            Err(index) => buckets.insert(index, Bucket::new(hash, value)),
        }
    }

    buckets.into_iter().flat_map(|mut bucket| {
        bucket.values.sort();
        bucket.values
    }).collect()
}

#[test]
fn test_bucket_sort() {
    let values = vec![5, 10, 2, 99, 32, 1, 7, 9, 92, 135, 0, 54];
    let sorted_values = bucket_sort(values, |int| int / 10);
    assert_eq!(sorted_values,
               [0, 1, 2, 5, 7, 9, 10, 32, 54, 92, 99, 135]);
}

Thank you so much for such a detailed review, especially with regard to Copy and Clone. flat_map is great! (little-dude, commented Oct 24, 2016 at 19:22)
