First of all, the code:
struct Bucket<H, V> where H: Ord
{
hash: H,
values: Vec<V>
}
impl<H, V> Bucket<H, V> where H: Ord {
fn new(hash: H) -> Bucket<H, V> {
Bucket {
hash: hash,
values: vec![],
}
}
}
pub fn bucket_sort<T, F, H>(values: Vec<T>, hasher: F) -> Vec<T>
where T: Ord, F: Fn(&T) -> H, H: Ord
{
let mut buckets: Vec<Bucket<H, T>> = vec![];
for value in values.into_iter() {
let hash = hasher(&value);
match buckets.binary_search_by(|bucket| bucket.hash.cmp(&hash)) {
Ok(index) => {
buckets[index].values.push(value);
},
Err(index) => {
let mut bucket = Bucket::new(hash);
bucket.values.push(value);
buckets.insert(index, bucket);
}
}
}
let mut sorted_values = Vec::new();
for bucket in buckets.into_iter() {
let mut bucket = bucket;
bucket.values.sort();
sorted_values.extend(bucket.values);
}
sorted_values
}
#[test]
fn test_bucket_sort() {
let values = vec![5, 10, 2, 99, 32, 1, 7, 9, 92, 135, 0, 54];
let sorted_values = bucket_sort(values, |int| int / 10);
assert_eq!(sorted_values, vec![0, 1, 2, 5, 7, 9, 10, 32, 54, 92, 99, 135]);
}
The thing that bothers me about this implementation is that it's destructive: bucket_sort takes ownership of a vector, and builds another one. I first tried to implement a bucket sort with this signature:
pub fn bucket_sort<T, F, H>(values: &[T], hasher: F) -> Vec<T>
where T: Ord, F: Fn(&T) -> H, H: Ord
But I realized that passing a slice means that:
- either sorting must happen in place
- or elements must implement
Copy
Copy-ing is too restrictive and probably inefficient on a large number of values, and I don't know if it's possible to implement in-place sorting with bucket sort.
Apart from this, are there any obvious style/performance flaws in this implementation?