Generalized additive models with automatic smoothness selection take some of the guess-work out of specifying exactly how many degrees of freedom you expect $f(x)$ to use.
The idea is that the user specifies that the smooth function $f(x)$ should be represented in the model with $K$ basis functions, where $K$ is an upper limit on the expected wiggliness of the true (but unknown) function. A wiggliness penalty is then used to balance fit to the data against the complexity of the function. A smoothing parameter, $\lambda$, is introduced as a hyperparameter that is optimised during model fitting, wherein we find estimates of the model coefficients (for any parametric terms plus the coefficients of the spline's basis functions) that maximise the penalised log-likelihood
$$
\mathcal{L}_p(\boldsymbol{\beta}) = \mathcal{L}(\boldsymbol{\beta}) - \frac{1}{2} \lambda \boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta}
$$
where $\mathcal{L}$ is the log-likelihood of the data and $\mathcal{L}_p$ the penalised log-likelihood given $\boldsymbol{\beta}$, $\boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta}$ is the wiggliness penalty, and $\lambda$ is the smoothing parameter that controls the fit-to-data vs complexity tradeoff.
By default, the wiggliness penalty $\boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta}$ measures the integrated squared second derivative of the function over the range of $x$.
We would typically choose $\lambda$ with REML or ML smoothness selection, but generalised cross validation (GCV) or AIC are other options, among many criteria available depending on the specific software being used.
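For concreteness, here's a minimal mgcv sketch of what this looks like in practice (the data come from gamSim(), the simulation helper that ships with mgcv, so the variable names below are from that simulation rather than anything canonical):

```r
library(mgcv)

## simulate example data; gamSim()'s first example has a response y
## depending smoothly on covariates x0, ..., x3
set.seed(1)
df <- gamSim(1, n = 400, verbose = FALSE)

## fit a GAM; the smoothing parameter lambda is estimated by REML
m <- gam(y ~ s(x2, k = 10), data = df, method = "REML")
summary(m)  # reports the effective degrees of freedom (EDF) of s(x2)
```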
The point of all this is that, to some extent, you can relax a little about getting exactly the right degrees of freedom for the spline term(s). What is important here (and with splines in general) is that the basis for $f(x)$ is rich enough to contain the true function or a close approximation to it. The general advice is to set $K$, the number of basis functions for the smooth $f(x)$, a little larger than the expected wiggliness of the true but unknown function: if you think $f$ uses roughly 5 degrees of freedom, a basis of size 7 or 8 contains many more functions of ~5 EDF than a basis of size 5 does. Hence, one might set $K = 8$ and then let the wiggliness penalty shrink away any unneeded wiggliness implied by the 8 basis functions.
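Continuing the sketch above, offering the smooth some headroom and letting the penalty do the shrinking might look like this (in this simulated example the effect of $x_1$ happens to be quite smooth):

```r
## offer more basis functions than you think f needs; the wiggliness
## penalty shrinks the fit back down if they are not required
m8 <- gam(y ~ s(x1, k = 8), data = df, method = "REML")
summary(m8)  # the EDF reported for s(x1) sits well below the k - 1 = 7 maximum
```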
Of course, like anything in statistics, there is no free lunch; automatic smoothness selection methods are not infallible, and care is needed when specifying $K$. If there is unmodelled autocorrelation in the response, or other clustering, things can go wrong, so tell the model about these features of the data, or use something like neighbourhood cross-validation (NCV) to do the smoothness selection. Likewise, if you know the function has to be monotonic but don't enforce this in the model, $\hat{f}(x)$ may be too wiggly: again, the solution is to tell the model about this and enforce shape constraints on the $\boldsymbol{\beta}$.
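As a hedged sketch of both fixes (the data frame dat and its time variable are hypothetical here; corAR1() comes from the nlme package, and scam is one package offering shape-constrained smooths):

```r
library(mgcv)
library(nlme)  # for the corAR1() correlation structure

## tell the model about residual autocorrelation via gamm(), which
## fits the GAM as a mixed model and accepts an nlme correlation
## structure; dat is assumed to hold y, x, and an ordering variable time
m_ar <- gamm(y ~ s(x, k = 10), data = dat,
             correlation = corAR1(form = ~ time))

## recent versions of mgcv also offer neighbourhood cross validation
## via gam(..., method = "NCV"); check your installed version's docs

## for a smooth known a priori to be monotone increasing, one option
## is the scam package, e.g. a monotone-increasing P-spline basis:
## scam::scam(y ~ s(x, bs = "mpi"), data = dat)
```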
Likewise, don't set $K$ really large and hope the smoothness selection will fix things so that you don't actually have to think. All this will do is
- make your model fitting take much longer, because there are many more $\boldsymbol{\beta}$ to keep track of, and
- increase the risk that the model adapts to the specific sample of data you observed, reducing its effectiveness as a model for unseen data and suggesting the relationship between $y$ and $x$ is more complex than it truly is.
There are other modelling choices you often need to make with splines, like choosing where to place the knots. This can be avoided if you use a low-rank eigen basis such as the low-rank thin plate regression spline basis of Wood (2003), which is the default in the mgcv package for R, a popular, performant, and extensive suite of functions for fitting GAMs and related models.
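So in mgcv a bare s(x) gets you this basis and there are no knots to place; a sketch, reusing the simulated data from above:

```r
## s(x2) uses the low-rank thin plate regression spline basis
## (bs = "tp") by default, so no knot locations need choosing
m_tp <- gam(y ~ s(x2), data = df, method = "REML")

## other bases can be requested; a cubic regression spline (bs = "cr")
## does use knots, but mgcv places them at quantiles of x2 for you
## unless you supply your own
m_cr <- gam(y ~ s(x2, bs = "cr"), data = df, method = "REML")
```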
Using AIC in the manner you mention is going to get problematic if you want to use the model for more than prediction: if you are interested in the p-values of the estimated smooth(s) or effects, this form of model selection is going to make a complete mess of things. Simon Wood and others have developed theory that allows computing p-values for smooths subject to automatic smoothness selection that have approximately the right type I error rate, so the automatic selection I describe above can still result in p-values you can use, if you really must use p-values. Not so for the kind of AIC selection you mention.
A further problem is that dealing with each covariate in turn is going to be suboptimal, in the sense that reducing the degrees of freedom for $x_2$ might mean $x_1$ should now use more degrees of freedom than you initially worked out when you looked at $x_1$ without fiddling with $x_2$. Automatic smoothness selection methods avoid this problem because all the smoothing parameters are estimated jointly.
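With mgcv, for example, the smoothing parameters in a model with several smooths are estimated together in one fit (continuing the simulated-data sketch from above):

```r
## both smoothing parameters are estimated jointly, so the EDF used by
## s(x1) adjusts to whatever s(x2) ends up using, and vice versa
m2 <- gam(y ~ s(x1, k = 10) + s(x2, k = 10), data = df, method = "REML")
summary(m2)
```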
In summary, in a GAM, the first choice is to decide whether the relationship between a covariate and the response is potentially non-linear. If so, represent that relationship as a spline, and choose the number of basis functions, $K$, to use for that spline, using the points I mention above. Then fit the model using REML or ML smoothness selection (or NCV if $y$ is autocorrelated). Then you do the usual model checks, plus you need to check that your choice(s) of $K$ were OK; there are heuristic tests for this, e.g. Pya & Wood (2016), as implemented in mgcv::k.check() for example, and other checks are possible too, such as refitting the model with the residuals in place of $y$ and a larger $K$.
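Continuing the sketch, the basis-dimension check looks like this:

```r
## heuristic basis-dimension check of Pya & Wood (2016);
## k.check() is also run as part of gam.check()
k.check(m2)    # a k-index well below 1 with a low p-value suggests k too small
gam.check(m2)  # residual diagnostics plus the same k check
```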
I somewhat disagree with Peter Flom's comment
> Yes, they [splines] … can be hard to interpret
While it is true that the effect of $x$ in $f(x)$ can't be summarised in a single coefficient, the component functions of a GAM can be plotted, so we can visualise exactly how the response changes as a function of $x$ while holding all other effects constant.
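For example, with the two-smooth fit from the sketch above:

```r
## one panel per smooth: the estimated effect of each covariate on the
## response, holding the other terms constant
plot(m2, pages = 1, shade = TRUE, seWithMean = TRUE)

## the gratia package offers ggplot2-based versions of these plots:
## gratia::draw(m2)
```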
References
Pya, N., Wood, S.N., 2016. A note on basis dimension selection in generalized additive modelling. arXiv [stat.ME].
Wood, S.N., 2003. Thin plate regression splines. J. R. Stat. Soc. Series B Stat. Methodol. 65, 95–114. https://doi.org/10.1111/1467-9868.00374