In my opinion a fundamental insight into the nature of the Brachistochrone problem was put forward by Jacob Bernoulli.
Jacob Bernoulli was the older brother of Johann Bernoulli. Johann Bernoulli was the mathematician who put out the Brachistochrone problem as a challenge to the mathematicians of his time.
Jacob Bernoulli opened his discussion of the Brachistochrone problem with a lemma illustrated with the following diagram:

Diagram 1
Lemma. Let ACEDB be the desired curve along which a heavy point falls from A to B in the shortest time, and let C and D be two points on it as close together as we like. Then the segment of arc CED is among all segments of arc with C and D as end points the segment that a heavy point falling from A traverses in the shortest time. Indeed, if another segment of arc CFD were traversed in a shorter time, then the point would move along AGFDB in a shorter time than along ACEDB, which is contrary to our supposition.
(The diagram is copied from the file with the scan of the actual diagram, as published by Jacob Bernoulli in the Acta Eruditorum, May 1697, pp. 211-217)
As of this writing available on Archive.org
Source of the diagram
So what Jacob Bernoulli was pointing out:
If you have the solution, a specific curve, and you take any subsection of that overall curve, then that subsection is by itself an instance of the Brachistochrone problem, and that is valid down to infinitesimally small subsections.
The same from another angle:
The Brachistochrone curve has a property that I will refer to as being 'concatenable'.
The catenary problem is well suited to illustrate that property:
The curve that is the solution to the catenary problem has the property that every subsection of it is an instance of the catenary problem, down to infinitesimally small subsections.
The catenary problem should therefore yield to an approach where you set up a single equation that holds for every one of the infinitesimally small subsections.
As we know: the type of equation that meets those demands is a differential equation.
A differential equation is a global equation in the following sense: the curve that is the solution to the differential equation satisfies the differential relation along the entire length of the curve all in one go.
More generally, the solution space of a differential equation is a space of functions, instead of a space of values, as is the case with, say, the type of equation that finds the roots of a polynomial.
In contrast with the above: there is the well known 'Traveling salesman problem'. Let's say you have a set of 10 cities. Now divide the total area into two adjacent areas A and B, such that each area encompasses 5 cities. Set up an optimized itinerary for area A and an optimized itinerary for area B. What happens if you put those two itineraries back to back? It is unlikely in the extreme that that would result in an itinerary that is optimized for the total set of 10 cities. The problem is non-concatenable.
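The non-concatenable character can be checked by brute force on a small case. The sketch below is an illustration under assumptions that are not in the text above: a made-up layout of 8 cities (4 per area, to keep the brute force quick), straight-line distances, and open itineraries that do not return to the starting city.

```python
from itertools import permutations
from math import dist

def path_length(route):
    """Total length of an open itinerary visiting the cities in the given order."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))

def best_itinerary(cities):
    """Brute-force the shortest open itinerary (only feasible for a handful of cities)."""
    return min(permutations(cities), key=path_length)

# Hypothetical layout: area A on the left, area B on the right, 4 cities each.
area_a = [(0, 0), (1, 3), (2, 0), (3, 3)]
area_b = [(4, 0), (5, 3), (6, 0), (7, 3)]

best_a = best_itinerary(area_a)
best_b = best_itinerary(area_b)

# Put the two optimized itineraries back to back,
# trying both travel directions of each and keeping the shortest result.
stitched = min(
    (pa + pb for pa in (best_a, best_a[::-1]) for pb in (best_b, best_b[::-1])),
    key=path_length,
)

# Optimize over all 8 cities at once.
overall = best_itinerary(area_a + area_b)

stitched_length = path_length(stitched)
overall_length = path_length(overall)
print(f"stitched: {stitched_length:.3f}   overall optimum: {overall_length:.3f}")
```

For this layout the overall optimum runs along the bottom row and comes back along the top row, which neither of the two per-area optima does, so the stitched itinerary comes out longer.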
Specific to the Brachistochrone problem:
The Brachistochrone problem straddles two categories of application of calculus of variations. It's a Statics problem in the sense that the solution is a static shape: which shape has the property that sliding along it happens in the fastest time? But to get at the solution Dynamics must be applied: given a curve, how fast will an object slide down the curve?
So yeah, the Brachistochrone problem is rather a falling-in-between-categories case.
About the dynamics of the Brachistochrone problem:
As an object slides down the incline it accumulates velocity, in accordance with $F=ma$.
The Brachistochrone problem would be significantly harder to solve if the acceleration were in any way a function of the velocity that the object has already acquired.
Fortunately that is not the case: gravity supplies the same force regardless of how fast the object is already moving, so the kinetic energy that the sliding object gains per unit of height is independent of whatever velocity it has previously gained.
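As a worked illustration of why this matters (the notation is an assumption for this sketch: $y$ is the depth below the release point, the object starts from rest, and friction is ignored), energy conservation makes the speed a function of depth only, so the travel time becomes an integral over the shape of the curve alone:

$$
\tfrac{1}{2} m v^2 = m g y
\quad\Longrightarrow\quad
v = \sqrt{2 g y},
\qquad
T = \int \frac{ds}{v} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}}{\sqrt{2 g y}}\, dx
$$

The integrand at a point depends only on the local depth and the local slope, not on how the object got there.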
Further reading:
Raúl Rojas, 2014
The straight line, the catenary, the brachistochrone, the circle, and Fermat
Preetum Nakkiran:
Geometric derivation of the Euler-Lagrange equation
Preetum Nakkiran uses the catenary problem as motivating example.
The Euler-Lagrange equation is a differential equation. The explicit expression states a local condition, but the demand is that it is satisfied along the entire curve at once, which makes it a global demand.
Since the explicit statement states a local condition it should be possible to derive the Euler-Lagrange equation using differential reasoning only.
Preetum Nakkiran proceeds to do that: the Euler-Lagrange equation is derived using differential reasoning only; no involvement of integration by parts, or any integration for that matter.
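For reference, this is the equation in the usual textbook notation (the symbols are the standard ones, not taken from Nakkiran's text): for a functional $\int F(x, y, y')\,dx$ the stationary curve $y(x)$ satisfies

$$
\frac{\partial F}{\partial y} - \frac{d}{dx}\,\frac{\partial F}{\partial y'} = 0
$$

Every term is evaluated at a single point $x$ of the curve, which is the sense in which the condition is local.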
What it takes to have solid expectation that a single extremum exists
Some time ago I implemented what is in effect a numerical analysis of the catenary problem. (The implementation is an interactive diagram, part of an article about calculus of variations that is available on my website.)
I divided the catenary into straight subsections that connect nodes. In the interactive diagram the nodes can be moved vertically. Moving a node changes the length of the total catenary. Counterweights exert a constant force.
When all the nodes are moved down the potential energy of the counterweights is the dominant term; pulling the nodes further down increases the total potential energy.
When all the nodes are up high the weight of the catenary has large mechanical advantage, so moving the nodes down lowers the total potential energy.
To find the node arrangement with the lowest possible potential energy: iterate to converge onto the extremum.
Suppose there are 4 nodes:
To the left of node 1 is the suspension point, to the right of node 1 is node 2.
Move node 1, finding the extremum, relative to the current position of node 2.
Proceed to adjust node 2: find the extremum, relative to the current positions of node 1 and node 3.
Rinse and repeat.
Keep cycling the sequence of nodes.
This iterative process converges onto a global extremum. The fact that the solution space is subdividable ensures that there is a single, unique solution.
The larger the number of nodes, the better the approximation to the analytical solution.
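The node-by-node sweep described above can be sketched in Python. This is not the code behind the interactive diagram; it is a minimal sketch under stated assumptions: the energy of each straight subsection is its weight acting at its midpoint plus a constant counterweight pull times its length, both suspension points are at height 0, and the values of `W`, `TENSION`, `SPAN` and `N_SEG` are made up for the illustration. Each node is only searched within a limited reach of its current height.

```python
from math import cosh, hypot

W = 1.0          # weight per unit length of chain (assumed value)
TENSION = 2.0    # constant pull of the counterweights (assumed value)
SPAN = 2.0       # horizontal distance between the suspension points (assumed)
N_SEG = 16       # number of straight subsections
DX = SPAN / N_SEG

def segment_energy(y0, y1):
    """Potential energy attributed to one straight subsection.

    The weight of the subsection acts at its midpoint; every unit of chain
    length paid out raises the counterweights, costing TENSION per unit length.
    """
    length = hypot(DX, y1 - y0)
    return length * (W * 0.5 * (y0 + y1) + TENSION)

def total_energy(heights):
    return sum(segment_energy(a, b) for a, b in zip(heights, heights[1:]))

def minimize_1d(f, lo, hi, iterations=30):
    """Ternary search for the minimum of a single-dip function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def relax(heights, sweeps=300, reach=0.5):
    """Keep cycling over the interior nodes, moving each to its local minimum."""
    heights = list(heights)
    for _ in range(sweeps):
        for i in range(1, len(heights) - 1):
            def local(y):
                # Only the two subsections touching node i depend on its height.
                return (segment_energy(heights[i - 1], y)
                        + segment_energy(y, heights[i + 1]))
            candidate = minimize_1d(local, heights[i] - reach, heights[i] + reach)
            if local(candidate) < local(heights[i]):  # accept only improvements
                heights[i] = candidate
    return heights

start = [0.0] * (N_SEG + 1)   # flat start; the two end values stay fixed at 0
result = relax(start)

# Analytic comparison: y = a*cosh(x/a) - TENSION/W, with a fixed by the end points.
# Bisection bracket [1, 3] is chosen by hand for the values above.
lo, hi = 1.0, 3.0
for _ in range(60):
    a = 0.5 * (lo + hi)
    if a * cosh(0.5 * SPAN / a) < TENSION / W:
        lo = a
    else:
        hi = a
analytic_mid = a - TENSION / W
numeric_mid = result[N_SEG // 2]
print(f"midpoint height: numeric {numeric_mid:.4f}, analytic {analytic_mid:.4f}")
```

Raising `N_SEG` should bring the two midpoint values closer together, at the cost of more sweeps before the nodes settle.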
Porting the numerical analysis reasoning to the Brachistochrone problem
The crudest implementation would be a single node in between the two fixed end points.
That would give a profile consisting of two straight sections, joined at the node. Obtain the gained velocity from the height difference. Changing the height of the node changes the length of each section. Under those circumstances the existence of an extremum is guaranteed.
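The single-node case is small enough to write out in full. The following is a minimal sketch, assuming release from rest, no friction, depth measured downward from the start point, and made-up end point coordinates; the node sits at a fixed horizontal position and only its depth is varied. On a straight section the acceleration is uniform, so the time is the length divided by the average of the entry and exit speeds.

```python
from math import hypot, sqrt

G = 9.81                   # gravitational acceleration, m/s^2
X_END, D_END = 2.0, 1.0    # end point: 2 m across, 1 m down (assumed values)
X_NODE = 1.0               # horizontal position of the single node (assumed)

def segment_time(x0, d0, x1, d1):
    """Time to slide along a straight section between depths d0 and d1.

    The speed at depth d is sqrt(2*G*d) (release from rest at depth 0).
    With uniform acceleration, time = length / average speed.
    """
    v0, v1 = sqrt(2.0 * G * d0), sqrt(2.0 * G * d1)
    return 2.0 * hypot(x1 - x0, d1 - d0) / (v0 + v1)

def two_segment_time(depth):
    """Total time for start -> node at the given depth -> end point."""
    return (segment_time(0.0, 0.0, X_NODE, depth)
            + segment_time(X_NODE, depth, X_END, D_END))

def minimize_1d(f, lo, hi, iterations=60):
    """Ternary search for the minimum of a single-dip function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# The depth must stay positive, otherwise the object never gets moving.
best_depth = minimize_1d(two_segment_time, 1e-9, 5.0)
t_best = two_segment_time(best_depth)
t_straight = segment_time(0.0, 0.0, X_END, D_END)

print(f"straight line: {t_straight:.4f} s")
print(f"best single node at depth {best_depth:.4f} m: {t_best:.4f} s")
```

The travel time grows without bound when the node is raised towards the starting height and also when it is pushed far down, with a single dip in between, which is what the search relies on.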
With a large number of nodes:
Then each triplet of nodes is converged to its extremum, iterating over all the nodes. In the converged end state each triplet is at its extremum, therefore the converged state is a global extremum.
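The triplet-by-triplet iteration can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: release from rest, no friction, equally spaced nodes that only move vertically, and an end point chosen as $(\pi, 2)$ so that the exact answer is a half arch of a cycloid with rolling radius 1, for which the descent time is $\pi\sqrt{1/g}$.

```python
from math import hypot, pi, sqrt

G = 9.81                  # gravitational acceleration, m/s^2
X_END, D_END = pi, 2.0    # end point reached by a cycloid with R = 1 at theta = pi
N_SEG = 16                # number of straight sections
DX = X_END / N_SEG

def segment_time(d0, d1):
    """Time along one straight section between depths d0 and d1 (uniform acceleration)."""
    v0, v1 = sqrt(2.0 * G * d0), sqrt(2.0 * G * d1)
    return 2.0 * hypot(DX, d1 - d0) / (v0 + v1)

def total_time(depths):
    return sum(segment_time(a, b) for a, b in zip(depths, depths[1:]))

def minimize_1d(f, lo, hi, iterations=40):
    """Ternary search for the minimum of a single-dip function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def relax(depths, sweeps=300):
    """Cycle over the interior nodes; each is moved to the extremum of its triplet."""
    depths = list(depths)
    for _ in range(sweeps):
        for i in range(1, len(depths) - 1):
            def local(d):
                # Only the two sections touching node i depend on its depth.
                return segment_time(depths[i - 1], d) + segment_time(d, depths[i + 1])
            candidate = minimize_1d(local, 1e-9, 3.0 * D_END)
            if local(candidate) < local(depths[i]):  # accept only improvements
                depths[i] = candidate
    return depths

start = [D_END * i / N_SEG for i in range(N_SEG + 1)]   # straight line from A to B
result = relax(start)

t_start = total_time(start)
t_final = total_time(result)
t_cycloid = pi * sqrt(1.0 / G)    # exact brachistochrone time for this end point

print(f"straight line : {t_start:.4f} s")
print(f"relaxed nodes : {t_final:.4f} s")
print(f"cycloid       : {t_cycloid:.4f} s")
```

Expressed in the square root of the depth, the time along each section is a convex function of its two end values, which is one way to see that this discretized problem has a single minimum for the sweep to settle into.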