I do not know the answer to your main question, but I will try to answer the underlying question:
"I want to know if it is feasible to implement more efficient trigonometric functions for specific use cases."
Yes, and the main reason is that some microcontrollers have no hardware
implementation of trig functions. The AVRs powering many
Arduinos are among them: they don't even have hardware floating point.
The libc provides software implementations of the trig functions that are
both quite accurate (about 24 bits) and very slow (about
100 µs for a cos() on an Uno).
There are also situations where you do not need that level of accuracy. So yes, custom trig functions that strike the right balance between speed and accuracy can be helpful on Arduinos. Note that the right balance is project-specific: there is no one-size-fits-all solution, so rolling your own can make sense.
For reference, I once wrote a fixed-point cos() accurate to
9.53 × 10⁻⁵ that ran in 6.77 µs on average on
an Uno. I can provide a link if someone is interested. If you can do
better (faster at same accuracy, or more accurate at same speed) I
definitely would love to see that.
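
To make the speed/accuracy trade-off concrete, here is a minimal sketch of what a fixed-point cosine can look like. It is not the implementation mentioned above, just an illustration under my own assumptions: the name cos_q15 is hypothetical, the angle is a 16-bit "binary angle" (65536 units per full turn), the result is in Q15 (scaled by 32768), and the polynomial is a simple fourth-order fit, 1 - A·x² + B·x⁴, with A = 2 - π/4 and B = 1 - π/4. That fit is only good to a few parts in 10³, so it is far less accurate than the figures quoted above, but it uses only three 16×16-bit multiplications and no floating point.

```c
#include <stdint.h>

/*
 * Illustrative fixed-point cosine (hypothetical helper, not a library API).
 *
 * angle : binary angle, 0..65535 maps to 0..2*pi (65536 = one full turn)
 * return: cos(angle) in Q15 (value * 32768), clamped to [-32767, 32767]
 *
 * Approximates cos(pi/2 * x) on x in [0, 1] by 1 - A*x^2 + B*x^4,
 * with A = 2 - pi/4 and B = 1 - pi/4 (exact at x = 0 and x = 1).
 */
int16_t cos_q15(uint16_t angle)
{
    /* Q14 constants: A = 1.214602 * 16384, B = 0.214602 * 16384. */
    const uint32_t A_Q14 = 19900;
    const uint32_t B_Q14 = 3516;

    uint16_t a = angle;
    uint8_t negate = 0;

    /* cos(-x) = cos(x): fold [pi, 2*pi) onto (0, pi]. */
    if (a & 0x8000u) {
        a = (uint16_t)(0u - a);
    }
    /* cos(pi - x) = -cos(x): fold (pi/2, pi] onto [0, pi/2). */
    if (a > 0x4000u) {
        a = (uint16_t)(0x8000u - a);
        negate = 1;
    }

    /* Now a is in [0, 16384], i.e. x = a / 16384 in Q14. */
    uint32_t x2 = ((uint32_t)a * a) >> 14;        /* x^2 in Q14       */
    uint32_t t  = A_Q14 - ((B_Q14 * x2) >> 14);   /* A - B*x^2 in Q14 */
    uint32_t p  = (x2 * t) >> 13;                 /* x^2 * t in Q15   */
    int32_t  y  = (int32_t)32768 - (int32_t)p;    /* 1 - ... in Q15   */

    if (y > 32767) {
        y = 32767;  /* 1.0 is not representable in Q15 */
    }
    return (int16_t)(negate ? -y : y);
}
```

For example, cos_q15(0x2000) evaluates the cosine of 45°, and dividing the result by 32768.0 gives the value as a fraction of one. Whether something this crude is good enough, or whether you need a higher-order polynomial or a lookup table, is exactly the project-specific balance discussed above.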