single responsibility
I'm accustomed to seeing lots of SqlAlchemy model classes
that are narrowly focused on representing rows
and on relational integrity (like foreign keys).
Such classes will only occasionally need a """docstring""",
as they are self-explanatory.
But in your case, it sounds like the first thing you need
to do is write a one-sentence docstring explaining what the
single
responsibility of that code is.
RDBMS integrity, like UNIQUE index or
FK,
is definitely within scope.
Here are some things that I feel are out of scope:
- supporting JOINs that a higher-level library routine, or app-level routine, should handle. (Consider creating a named JOIN query by issuing CREATE VIEW and defining a SqlAlchemy model for that relation.)
- ML APIs (Consider creating ML report table(s) that JOIN to the Article table using an FK.)
- other APIs (same as for ML APIs)
- Redis queueing (other than possibly offering a field where some higher-level library handler can store an essential Redis ID, if the PK does not suffice. Or perhaps a Redis queueing table on the side, with an FK back to Article.)
- any presentation layer code that converts to an ES-compatible dict
- any ES handlers that interact with an ElasticSearch backend
In other words, the Article model should focus on
Codd & Date relational theory,
without being distracted by the many things in the world
that will sometimes happen to interact with a stored relation.
transition
I recommend doing this under a new name, like class Article1,
since it will clearly be a Breaking Change.
Now, go clean up all the breakage by implementing
additional code layered atop the Article1 model.
Hopefully your existing automated integration tests
will help guide this effort.
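Here is a sketch of what one such layer might look like. It is a separate module that only reads attributes from an article row and builds the ES-compatible dict, so Article1 never needs to know about search. The `ArticleRow` protocol, the `to_search_doc` function, and the field names are hypothetical. The module uses only the standard library and never talks to ElasticSearch itself.

```python
"""Search presentation layer: turns article rows into ES-compatible dicts.

Sits above the persistence layer and reads only plain attributes;
it never opens a connection to ElasticSearch.
"""
from datetime import datetime
from typing import Any, Protocol


class ArticleRow(Protocol):
    """The subset of Article1 columns this layer reads (hypothetical names)."""

    id: int
    title: str
    url: str
    published_at: datetime


def to_search_doc(article: ArticleRow) -> dict[str, Any]:
    """Convert one article row into a dict ready to be indexed."""
    return {
        "article_id": article.id,
        "title": article.title,
        "url": article.url,
        # ISO-8601 strings serialize cleanly to JSON.
        "published_at": article.published_at.isoformat(),
    }
```

Because this module only depends downward, a further ES-handler layer can call `to_search_doc` without Article1 ever importing anything search-related.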
Alternatively, break things a little bit at a time.
Start by evicting any ES interactions to some new layer.
Then any dict presentation.
Then APIs, including those for machine learning.
Then calls to other models (prefer declarative SQL, like FK or
a VIEW that contains JOIN(s), in order to tell SqlAlchemy
and the RDBMS backend how those models relate to an article).
When you add new code or a new module, be sure to write
a docstring for it. That way you, and future maintenance
engineers, will know what is in- or out-of-scope,
and won't be tempted to def kitchen_sink()
in a place where it doesn't belong.
layers
"Spread the code out among other models. ... move much of the code from Article into other related model."
Your second approach is the one I most closely agree with.
But it sounds like you contemplate pushing e.g. ES interactions
into some other SqlAlchemy model that currently happens to be "small".
Don't do that.
Organize your code in layers, and only allow dependencies
in one direction: down the abstraction hierarchy.
So "app" is most abstract and sits at the top,
above a "search" layer that interacts with ES,
and that's above a "persistence" layer that knows about table rows.
Possibly a machine_learning.py module would similarly be
in the middle, and would not interact with "search" at all.
The OP doesn't offer enough detail, so it's hard to say
exactly where the isolation boundaries would go.
Occasionally audit your import statements to
verify that code only depends on lower abstraction layers.
If you ever encounter a cyclic dependency, you will know
at once that you have done the Wrong Thing and should undo it.
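The import audit can be partly automated. This sketch parses module sources with the standard `ast` module and flags any import that points sideways or up the hierarchy. The layer names and ranks in `LAYER_RANK` are assumptions based on the layering described above, not a prescribed layout.

```python
import ast

# Hypothetical layer ranks: a module may import only from strictly lower ranks.
LAYER_RANK = {
    "app": 3,
    "search": 2,
    "machine_learning": 2,
    "persistence": 1,
}


def imported_layers(source: str) -> set[str]:
    """Return the names of known layers that a module's source imports."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in LAYER_RANK:
                found.add(top)
    return found


def layering_violations(sources: dict[str, str]) -> list[str]:
    """List every import that points sideways or up the abstraction hierarchy."""
    problems = []
    for module, source in sorted(sources.items()):
        for target in sorted(imported_layers(source)):
            if LAYER_RANK[target] >= LAYER_RANK[module]:
                problems.append(f"{module} imports {target}")
    return problems
```

Every permitted edge points strictly downward. A codebase that passes this check therefore cannot contain an import cycle among those layers.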
Be sure to write automated
test suites
which exercise the newly created layers.
Keep running the tests while you're authoring new target code,
and you'll wind up with decent
coverage.
Pointing the SqlAlchemy connect string at
a throw-away sqlite RDBMS file offers a very
convenient way to produce small integration tests.
If, for example, your ES backend is not running
or is otherwise unavailable, the tests that don't need it should
still offer a Green bar.
documentation
Write a ReadMe.md or Confluence wiki entry describing
- the old design
- pain points induced by the old design
- the new design (can be an evolving section!)
- architectural summary of the new design
The first three will be helpful to you and your colleagues
as you work on paying down some of the accumulated technical debt.
The fourth item will help future maintenance engineers figure out
if they are "doing it the Right Way" or "falling down that same
old rabbit hole again" when they add new features and fix bugs.