> 1. Why can there be such huge differences that alter the characters and their interactions so drastically, even though the staging is supposedly the same?

Opera scores tend to have very little in the way of stage directions.  The earlier the opera, the more likely that is to be true.  This is also true of plays.  For example, in Shakespeare's _As You Like It_, nearly every production has the players on stage gathered around Jaques in a circle when he explains that _ducdame_ is "an invocation to draw fools into a circle," but this stage business is inferred from the spoken line, not made explicit by a stage direction.

As a consequence, one of the parameters of opera that has the most room for original creativity is the staging.  Some directors use willfully anachronistic settings for one reason or another, a notable example being Peter Sellars' modern settings of Mozart operas (see https://en.wikipedia.org/wiki/Peter_Sellars).  Even without such bold statements, the inference of various stage business from the material itself is a fine art.  Some directors miss obvious opportunities -- perhaps knowingly.  Other implied action may be debatable.

> 2. To what extent are singers free to shape their roles? Are they given only a framework of stage directions, with room left for personal interpretation or improvisation? (And doesn't this lead to chaos if every singer does that?)

This depends entirely on the director.  Some productions _are_ chaos; other directors will encourage personal contributions but have the skill to control any chaos that might arise; and yet others will give firmer constraints.  (Compared with theatrical productions, opera is more likely to lean toward firmer control by the director for a few reasons, notably the tendency of opera productions to be "double cast": to have most or even all major roles sung by multiple singers over the course of the run.  Another reason is the varying level of dramatic skill in the cast -- typically a secondary consideration in casting opera singers -- such that some singers may have difficulty adjusting to others' improvisation.)

Considering the example of whether Cherubino and the Countess should kiss, you might ask yourself whether this would have happened in a court performance in Mozart's day.  But regardless of whether the answer is _yes_ or _no,_ we can argue that a modern production, for a modern audience, is free to do the opposite.  Modern audiences have modern sensibilities and varying awareness of the etiquette of the 18th century, and opera companies need to fill seats.  An academic production is more likely to have a goal of historical accuracy, but an opera company is likely to put that aside (to some degree or another) in the interest of attracting patrons.

> 3. How is an audience to evaluate a production when it can vary so much from one performance to another?

I don't quite understand why differences would be a problem.  As with plays, various considerations can affect a production, notably budget and other constraints on set construction.  Yet we can still evaluate different productions of the same play.  An obvious point on which to evaluate the Mozart example is whether you agree with the decision for them to kiss -- or not to kiss -- and with the manner of the kiss, etc.  Another is whether, regardless of your view of that decision, the director, the conductor, and the cast made the scene "work" dramatically.  This is a matter of opinion.  Different people will reach different conclusions.

I hope you don't mean to ask how audiences can judge how "correct" a given production is.  I judge opera productions on how _successful_ they are.  I love the idea of historically accurate productions of 18th-century opera.  But I don't know how 18th-century operas were staged.  Do you?  I don't particularly like the idea of Mozart in a New York City laundry room, yet Peter Sellars' productions are dramatically compelling and thought-provoking (and, as a consequence, I've warmed to "unconventional" settings both in opera and in theater -- they're often self-conscious and artificial, but sometimes they work, and they can work very well).

If the music is performed well, the drama is cohesive, and the key dramatic moments succeed in moving me without seeming too artificial or contrived, I am satisfied.  If two productions meet these goals but have significant differences in their staging, I will evaluate them as both excellent but different.