
I have a Django and Django REST Framework powered RESTful API (talking to a PostgreSQL DB backend) which supports filtering on a specific model.

Now I want to add full-text search functionality.

Is it possible to use Elasticsearch for full-text search and then apply my existing API filters on top of the search results?

2 Answers


I would suggest using PostgreSQL alone to do what you asked for.

In my opinion it is the best solution because the data and the search indexes live directly inside PostgreSQL, and you are not forced to install and maintain additional software (such as Elasticsearch) or to keep the data and indexes in sync.

This is the simplest code example you can have to perform a full-text search in Django with PostgreSQL:

Entry.objects.filter(body_text__search='Cheese')
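
If you need to combine the search with the filters your API already applies, the search lookup chains onto the queryset like any other filter. A minimal sketch, assuming "django.contrib.postgres" is in INSTALLED_APPS and reusing the hypothetical Entry model from above (the status filter stands in for whatever filters your API already has):

from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

from myapp.models import Entry  # hypothetical app and model

vector = SearchVector("body_text")
query = SearchQuery("cheese")

# Rank the matches by relevance and chain an existing API filter on top;
# "status" is only a placeholder for one of your real filter fields.
results = (
    Entry.objects
    .annotate(rank=SearchRank(vector, query))
    .filter(body_text__search="cheese", status="published")
    .order_by("-rank")
)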

For the basics of full-text search in Django with PostgreSQL, see the official documentation: "Full text search"

If you want to dig deeper, you can read an article I wrote on the subject:

"Full-Text Search in Django with PostgreSQL"

0

Your question is too broad to be answered with code, but it's definitely possible.

You can easily query Elasticsearch for documents matching your full-text criteria.

Then take those documents' PK fields (or any other candidate key that uniquely identifies rows in your PostgreSQL DB) and filter your Django ORM-backed models for PKs matching the ones Elasticsearch returned.

Pseudocode would be:

from elasticsearch import Elasticsearch  # assumes the official elasticsearch-py client

es = Elasticsearch()

def get_chunk(items, length):
    # Yield successive slices of `length` items (Python 3 range instead of xrange).
    for i in range(0, len(items), length):
        yield items[i:i + length]

res = es.search(index="index", body={"query": {"match": ...}})

# The client nests hits under res['hits']['hits']; this assumes each document
# stores the PostgreSQL primary key in its _source.
pks = [hit['_source']['pk'] for hit in res['hits']['hits']]

for chunk_10k in get_chunk(pks, 10000):
    DjangoModel.objects.filter(pk__in=chunk_10k, **the_rest_of_your_api_filters)

EDIT
To handle the case in which your Elasticsearch query returns a very large number of PKs, you can define a generator that yields successive chunks of 10K results, so you don't step over your DB query limits and you get the best __in query performance. I've defined it above as the function get_chunk.

Something like that would also work for alternatives such as Redis, MongoDB, etc.
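
To tie this back to the original question about applying existing API filters on top of the search results, here is a rough sketch of how the same idea could plug into a DRF list view. The model, serializer, index name, document fields and filter fields are all assumptions, not anything from your project:

from elasticsearch import Elasticsearch
from django_filters.rest_framework import DjangoFilterBackend
from rest_framework import generics

from myapp.models import Entry                  # hypothetical model
from myapp.serializers import EntrySerializer   # hypothetical serializer

es = Elasticsearch()

class EntrySearchList(generics.ListAPIView):
    serializer_class = EntrySerializer
    filter_backends = [DjangoFilterBackend]      # your existing API filtering setup
    filterset_fields = ["author", "status"]      # placeholder filter fields

    def get_queryset(self):
        term = self.request.query_params.get("q", "")
        res = es.search(index="entries",
                        body={"query": {"match": {"body_text": term}}})
        # Assumes each indexed document stores the PostgreSQL primary key in _source.
        pks = [hit["_source"]["pk"] for hit in res["hits"]["hits"]]
        # DRF applies the filter backends on top of whatever this returns.
        return Entry.objects.filter(pk__in=pks)

DRF applies filter_backends after get_queryset(), so restricting the queryset to the Elasticsearch hits first leaves your existing filters untouched.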

5 Comments

Yeah, this can be a solution. Potentially problematic, though, for example if the search returns thousands of results: stackoverflow.com/questions/1009706/….
@dm295 the first of a long series of problematic issues with Elasticsearch.
@dm295 of course. You should have a means of chunking requests of more than n-K PKs in your PostgreSQL DB. I had the same requirement for MongoDB and found that __in queries with chunks of 10K~50K yielded the best performance. Also, MongoDB had its own limit of 16 MB per query, so in many cases doing a vanilla __in would have resulted in failure. It's easy to do and I'll add an edit just in case.
@DušanMaďar I stumbled across this question today and noticed you hadn't left any feedback on it. Did you find a better solution? It would be great if you could share whatever you ended up implementing, if it was better than this (without ditching PostgreSQL or ...). Thanks
@SpiXel basically we went with ES-only search and had to rewrite filters, etc. to work with ES.
