
I have a Django and Django REST Framework powered RESTful API (talking to a PostgreSQL DB backend) which supports filtering on a specific model.

Now I want to add full-text search functionality.

Is it possible to use Elasticsearch for full-text search and then apply my existing API filters on top of the search results?

2 Answers


I would suggest using PostgreSQL alone to do what you asked for.

In my opinion it is the best solution because the data and the search indexes live directly inside PostgreSQL, and you are not forced to install and maintain additional software (such as Elasticsearch) or to keep the data and indexes in sync.

This is the simplest code example you can have to perform a full-text search in Django with PostgreSQL:

Entry.objects.filter(body_text__search='Cheese')
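
If you need to combine the search with the filters your API already applies, the search lookup chains onto the queryset like any other filter. A minimal sketch, assuming "django.contrib.postgres" is in INSTALLED_APPS and reusing the hypothetical Entry model from above (the status filter stands in for whatever filters your API already has):

from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

from myapp.models import Entry  # hypothetical app and model

vector = SearchVector("body_text")
query = SearchQuery("cheese")

# Rank the matches by relevance and chain an existing API filter on top;
# "status" is only a placeholder for one of your real filter fields.
results = (
    Entry.objects
    .annotate(rank=SearchRank(vector, query))
    .filter(body_text__search="cheese", status="published")
    .order_by("-rank")
)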

For the basics of full-text search in Django with PostgreSQL, see the official documentation: "Full text search"

If you want to dig deeper, you can read an article I wrote on the subject:

"Full-Text Search in Django with PostgreSQL"

0

Your question is too broad to be answered with code, but it's definitely possible.

You can easily query Elasticsearch for documents matching your full-text criteria.

Then take those documents' PK fields (or any other candidate key that uniquely identifies rows in your PostgreSQL DB) and filter your Django ORM-backed models for PKs matching the ones Elasticsearch returned.

Pseudocode would be:

from elasticsearch import Elasticsearch  # assumes the official elasticsearch-py client

es = Elasticsearch()

def get_chunk(items, length):
    # Yield successive slices of `length` items (Python 3 range instead of xrange).
    for i in range(0, len(items), length):
        yield items[i:i + length]

res = es.search(index="index", body={"query": {"match": ...}})

# The client nests hits under res['hits']['hits']; this assumes each document
# stores the PostgreSQL primary key in its _source.
pks = [hit['_source']['pk'] for hit in res['hits']['hits']]

for chunk_10k in get_chunk(pks, 10000):
    DjangoModel.objects.filter(pk__in=chunk_10k, **the_rest_of_your_api_filters)

EDIT
To handle the case in which your Elasticsearch query returns a very large number of PKs, you can define a generator that yields successive chunks of 10K results, so you don't step over your DB query limits and you get the best __in query performance. I've defined it above as the function get_chunk.

Something like that would also work for alternatives such as Redis, MongoDB, etc.
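
To tie this back to the original question about applying existing API filters on top of the search results, here is a rough sketch of how the same idea could plug into a DRF list view. The model, serializer, index name, document fields and filter fields are all assumptions, not anything from your project:

from elasticsearch import Elasticsearch
from django_filters.rest_framework import DjangoFilterBackend
from rest_framework import generics

from myapp.models import Entry                  # hypothetical model
from myapp.serializers import EntrySerializer   # hypothetical serializer

es = Elasticsearch()

class EntrySearchList(generics.ListAPIView):
    serializer_class = EntrySerializer
    filter_backends = [DjangoFilterBackend]      # your existing API filtering setup
    filterset_fields = ["author", "status"]      # placeholder filter fields

    def get_queryset(self):
        term = self.request.query_params.get("q", "")
        res = es.search(index="entries",
                        body={"query": {"match": {"body_text": term}}})
        # Assumes each indexed document stores the PostgreSQL primary key in _source.
        pks = [hit["_source"]["pk"] for hit in res["hits"]["hits"]]
        # DRF applies the filter backends on top of whatever this returns.
        return Entry.objects.filter(pk__in=pks)

DRF applies filter_backends after get_queryset(), so restricting the queryset to the Elasticsearch hits first leaves your existing filters untouched.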

5 Comments

Yeah, this can be a solution. Potentially problematic, though, for example if the search returns thousands of results: stackoverflow.com/questions/1009706/….
@dm295 the first of a long series of problematic issues with Elasticsearch.
@dm295 of course. You should have a means of chunking requests of more than n-K PKs in your PostgreSQL DB. I had the same requirement for MongoDB and found that __in queries with chunks of 10K~50K yielded the best performance. Also, MongoDB had its own limit of 16 MB per query, so in many cases doing a vanilla __in would have resulted in failure. It's easy to do and I'll add an edit just in case.
@DušanMaďar I stumbled across this question today and noticed you hadn't left any feedback on it. Did you find a better solution? It would be great if you could share whatever you ended up implementing, if it was better than this (without ditching PostgreSQL or ...). Thanks
@SpiXel basically we went with ES-only search and had to rewrite filters, etc. to work with ES.
