2
\$\begingroup\$

The following code is meant to generate ranges from arbitrary integer sequences and print them in a bash-like syntax:

main.rs

use ranges::Ranges;
use std::io::stdin;
use std::ops::RangeInclusive;

fn main() {
    println!(
        "{}",
        Ranges::from(read_integers())
            .map(range_to_bash_literal)
            .collect::<Vec<_>>()
            .join(" ")
    )
}

fn read_integers() -> impl Iterator<Item = i64> {
    stdin()
        .lines()
        .take_while(|line| line.is_ok())
        .map(|line| line.unwrap())
        .flat_map(|line| {
            line.split_ascii_whitespace()
                .map(str::to_owned)
                .collect::<Vec<_>>()
        })
        .map(|line| line.parse::<i64>())
        .take_while(|result| result.is_ok())
        .map(|result| result.unwrap())
}

fn range_to_bash_literal(range: RangeInclusive<i64>) -> String {
    if range.start() == range.end() {
        range.start().to_string()
    } else {
        format!("{{{}..{}}}", range.start(), range.end())
    }
}

lib.rs

use std::ops::RangeInclusive;

/// Generate ranges from integer sequences
///
/// # Examples
///
/// ```
/// use std::ops::RangeInclusive;
/// use ranges::Ranges;
///
/// let sequence: Vec<i64> = vec![1, 2, 3, 6, 7, 9, 9, 9, 11, 20, 21, 22, 24, 23, 22];
/// let target: Vec<RangeInclusive<i64>> = vec![1..=3, 6..=7, 9..=9, 9..=9, 9..=9, 11..=11, 20..=22, 24..=22];
/// let ranges: Vec<RangeInclusive<i64>> = Ranges::from(sequence.into_iter()).collect();
///
/// assert_eq!(ranges, target);
/// ```
#[derive(Debug)]
pub struct Ranges<T>
where
    T: Iterator<Item = i64>,
{
    numbers: T,
    start: Option<i64>,
}

#[derive(Eq, PartialEq)]
enum Order {
    Descending,
    Ascending,
}

impl<T> Ranges<T>
where
    T: Iterator<Item = i64>,
{
    pub fn new(numbers: T) -> Self {
        Self {
            numbers,
            start: None,
        }
    }
}

impl<T> Iterator for Ranges<T>
where
    T: Iterator<Item = i64>,
{
    type Item = RangeInclusive<i64>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut order: Option<Order> = None;
        let mut end: Option<i64> = None;

        loop {
            match self.numbers.next() {
                None => {
                    return match self.start {
                        None => None,
                        Some(start) => {
                            self.start = None;

                            match end {
                                None => Some(start..=start),
                                Some(end) => Some(start..=end),
                            }
                        }
                    }
                }
                Some(next) => match self.start {
                    None => {
                        self.start = Some(next);
                    }
                    Some(start) => match &order {
                        None => {
                            if next == end.unwrap_or(start) + 1 {
                                end = Some(next);
                                order = Some(Order::Ascending);
                            } else if next == end.unwrap_or(start) - 1 {
                                end = Some(next);
                                order = Some(Order::Descending);
                            } else {
                                self.start = Some(next);
                                return Some(start..=end.unwrap_or(start));
                            }
                        }
                        Some(order) => {
                            if (order == &Order::Ascending && next == end.unwrap_or(start) + 1)
                                || (order == &Order::Descending && next == end.unwrap_or(start) - 1)
                            {
                                end = Some(next)
                            } else {
                                self.start = Some(next);
                                return Some(start..=end.unwrap_or(start));
                            }
                        }
                    },
                },
            }
        }
    }
}

impl<T> From<T> for Ranges<T>
where
    T: Iterator<Item = i64>,
{
    fn from(value: T) -> Self {
        Self::new(value)
    }
}

Usage

$ echo 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 40 | ~/.cargo/bin/ranges
{1..11} {13..38} 40

How may I improve the code?

\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

main.rs

Overall, main.rs looks pretty good, except for performance. There's a lot of intermediaries.

The biggest culprit here is the range_to_bash_literal function, which allocates a new String every single time. Using a custom BashRange structure would allow you to implement the Display trait for it with the same logic as range_to_bash_literal:

impl fmt::Display for BashRange {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> Result<(), fmt::Error> {
        if self.start == self.end {
            write!(f, "{}", self.start)
        } else {
            write!(f, "{{{}..{}}}", self.start, self.end)
        }
    }
}

Then, we can eliminate the collection into a Vec by using for or for_each, and printing fragments.

And of course, you'll want to buffer stdin/stdout, as otherwise reading/writing can be a bit slowish.

All in all, this means:

fn main() -> Result<(), Box<dyn Error>> {
    let mut stdout = io::BufWriter::new(io::stdout().lock());

    let mut separator = "";
    BashRanges::from(read_integrals())
        .try_for_each(|bash| {
            let result = write!(stdout, "{separator}{bash}");

            separator = " ";

            result
        })?;

    Ok(())
}

And the start of read_integrals will need an io::BufReader::new(io::stdin().lock()).

lib.rs

lib.rs is not quite as idiomatic as it good be, and honestly the logic is hard to follow between the state and the levels of nesting.

Make copyable objects Copy

The Order enum could derive quite a few more traits, in particular Debug (always), and because it's a simple enum: Copy.

If a type is copyable, small, and non-public, there's no reason NOT to make it Copy. Then you don't have to be careful about references, you can just copy it without issues, making your code easier.

Minimally constrain structures

The Ranges structure can exist whether T is an Iterator, or not.

Hence, its definition should be:

struct Ranges<T> {
    numbers: T,
    start: Option<i64>,
}

This makes it easier to implement operations that care nothing about iterating T, for example equality, display, ...

Different types of loops

Rust has 3 types of loops: loop, while, and for, in decreasing order of freedom.

In general, it's better to use the loop with the least amount of freedom, as it means your readers get higher level semantics and don't need to dive deeper into the iteration structure itself.

In your case, you should be using for:

impl<T> Iterator for Ranges<T>
where
    T: Iterator<Item = i64>,
{
    type Item = RangeInclusive<i64>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut order: Option<Order> = None;
        let mut end: Option<i64> = None;

        for next in &mut self.numbers {
            //  Former "Some" case logic.
        }

        //  Former "None" case logic.
    }
}

Short-circuiting is cool, and so are the APIs.

In Rust, the ? operator will get the value inside an Option or Result, or return from the current function.

Hence, the former "self.start == None" block can be rewritten from:

match self.start {
    None => None,
    Some(start) => {
        self.start = None;

        match end {
            None => Some(start..=start),
            Some(end) => Some(start..=end),
        }
    }
}
let start = self.start?;

Some(start..=end.unwrap_or(start))

Note: the assignment self.start = None is pointless whenever self.numbers is a "fused" iterator, which is most of them. If really you want to handle non-fused ones, self.start.take() is better than your current match + assign.

end is weird

Your end variable is declared as an Option, yet every time it's used, it ends up being end.unwrap_or(start).

This means, then, than effectively there's always a good value for it, in all cases it's used. This suggests, thus, than a better way to structure the code should exist. We'll see it later.

Use sub-functions

The logic to determine whether orders match (or not) is scattered throughout next, instead you can create a constructor for Order:

impl Order {
    fn new(current: i64, next: i64) -> Option<Order> {
        if next == current + 1 {
            Some(Order::Ascending)
        } else if next == current - 1 {
            Some(Order::Descending)
        } else {
            None
        }
    }
}

Use if let or let else over match when dealing with Option or Result.

Similar advice to loop: match is the most open-ended form of pattern matching. Instead, when possible, you can use more "constraining" forms, to guide the reader.

Thus, let's rewrite the for loop body:

for next in &mut self.numbers {
    let Some(start) = self.start else {
        self.start = Some(next);
        continue
    };

    ...
}

match on several items at once, to "group" logic.

Matching on tuples is perhaps one of my favorite ways. It cuts down your match + 5 if cases to:

for next in self.numbers {
    let Some(start) = self.start else {
        self.start = Some(next);
        continue
    };

    let current = end.unwrap_or(start);
    let new_order = Order::new(current, next);

    match (order, new_order) {
        (None, Some(_)) => {
            end = Some(next);
            order = new_order;
        }
        (Some(left), Some(right)) if left == right => {
            end = Some(next);
        }
        (_, _) => {
            self.start = Some(next);
            return Some(start..=current);
        }
    }
}

Simplicity & Performance

At the moment, the optional self.start in Ranges causes quite a bit of branching.

Looking closer at the state transitions, though, one can observe that it will be set from the first number of numbers, and from then be set until numbers runs out.

This suggests that a simple modification of the constructor of Ranges could simplify the iterator by making Ranges slightly eager:

impl<T> Ranges<T>
where
    T: Iterator<Item = i64>,
{
    pub fn new(mut numbers: T) -> Self {
        let start = numbers.next();

        Self { numbers, start }
    }
}

Now, we don't need to re-check whether there's a first number at every iteration, as long as self.start not None there was one, and self.start being None will mean we're done.

And because self.start being None means we're done, we can check it first things in next, and otherwise we have a value to initialize end with, and do away with all the end.unwrap_or(start).

This gives us:

impl<T> Iterator for Ranges<T>
where
    T: Iterator<Item = i64>,
{
    type Item = RangeInclusive<i64>;

    fn next(&mut self) -> Option<Self::Item> {
        let start = self.start?;
    
        let mut order: Option<Order> = None;
        let mut end = start;

        for next in &mut self.numbers {
            let new_order = Order::new(end, next);

            match (order, new_order) {
                (None, Some(_)) => {
                    end = next;
                    order = new_order;
                }
                (Some(left), Some(right)) if left == right => {
                    end = next;
                }
                (_, _) => {
                    self.start = Some(next);
                    return Some(start..=end);
                }
            }
        }

        //  Just one last item for the road, and from then on
        //  calling `next` will return `None`.
        self.start = None;

        Some(start..=end)
    }
}

A lot more concise, no?

Disclaimer: not tested, so... the conciseness may hide defects!

\$\endgroup\$
4
  • \$\begingroup\$ Thank you. The general suggestion are good, but your refactored code does not work for me. There are several borrowing, move and trait bound issues that I was unable to resolve, especially regarding for next in self.numbers. Also the iterator never terminates, since it will never return None. \$\endgroup\$ Commented May 31, 2023 at 15:11
  • 1
    \$\begingroup\$ @RichardNeumann: I should have tried to compile... You need to mutably borrow self.numbers, so for next in &mut self.numbers will do the trick, I'll amend the answer. And I completely missed the fact that the optional self.start was used as sentinel for termination, that's indeed important. I need to double-check what to do on that one. \$\endgroup\$ Commented May 31, 2023 at 16:15
  • \$\begingroup\$ Thank you. I already applied some of your suggestions to the upstream code. Link in the original question. \$\endgroup\$ Commented May 31, 2023 at 16:23
  • \$\begingroup\$ @RichardNeumann: I fixed the loop termination issue, I think. It requires start to be Option, but next can bail out early regardless so end shouldn't need to. Untested code though -- hence why I never noticed the absence of termination -- so no guarantee on correctness! \$\endgroup\$ Commented May 31, 2023 at 16:25

You must log in to answer this question.