Here's two implementations to consider. One comes from both your own and TonyM's consideration about using priority encoders. The other is from a sequential approach using an adder. Both work well in simulation.
(However, keep in mind that digital simulators aren't exactly the same as actual logic. Sometimes, the tricks needed for a simulator aren't the same tricks one uses when building something from parts.)
The upper diagram is combinatorial/combinational. The lower diagram is sequential. The adder in the lower diagram is a simplified adder accepting its own latched output as its input, along with the masked bit to add if present.
(In the upper diagram case, note that the bits are provided to the priority encoders in reverse order. Note, these are example cases with some testing applied. Not polished work. Ask questions and I'll try to make some better sense.)
(Image produced using Neemann's DIGITAL program.)
The only complicated bit for the sequential schematic is the section that develops the clocking for the counter. The clocking for the adder is straight-forward. Keep in mind that this is set up for a simulator to function well. So if this were to be built with real ICs and parts, the behavioral schematic may require some changes. But the basic idea would remain. (Don't let that fact deter you. Details matter, of course. Particularly in a system that depends both on rising and falling clock edges. But that is reality. Using both edges of a clock is a thing worth learning about.)


