I have an application that represents user-defined data in two views:

  • A hierarchical view (parent-child nodes)

  • A flat view (a table where all attributes are listed in a row)

PostgreSQL handles one of these views well, but not both, on a set of 1,000,000 records. The hierarchical view performs well: it's easy to retrieve all nodes at a certain level of the hierarchy without much overhead. However, generating the flat view (which requires joining fields across multiple levels) is slow, especially with complex or deeply nested hierarchies.

A user might define a hierarchy of books like this:

Author
└── Year published
   └── Book category
      └── Book title

Example hierarchy:

Jake A.
├── 2002
│  └── Romance
│     ├── "The Day of Love"
│     ├── "Broken Heart"
│     └── "Unveiled Secret"
└── 2005
   └── Romance
      └── "The Last Hope"

In the flat view, this would be represented as:

Author    | Year | Category | Book Title
--------------------------------------------
Jake A.   | 2002 | Romance  | The Day of Love
Jake A.   | 2002 | Romance  | Broken Heart
Jake A.   | 2002 | Romance  | Unveiled Secret
Jake A.   | 2005 | Romance  | The Last Hope

Users should be able to sort and filter the flat data by any column.

To maintain performance, I am considering two separate tables, one for the hierarchical view and one for the flat view, and keeping them in sync. However, this would require data duplication, which might be suboptimal.

How do I maintain good performance representing user-defined data in both the hierarchical and the flat view?

  • Do not delete & repost questions; edit them per feedback. Especially do not subvert site protocols; the original version was closed when you deleted it. You are wasting the time of readers. NB: poorly received question posts count more towards question posting rate limits when deleted. Commented Jun 14 at 7:41
  • This question is similar to: What are the options for storing hierarchical data in a relational database?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Jun 14 at 7:45
  • Please read the edit help & advanced help to learn to format properly (using Markdown when possible) & reflect that in your posts. Commented Jun 14 at 7:47
  • Ask exactly 1 (clear, specific, researched, non-duplicate) question per question post. Etc. etc. per feedback on your original post. Commented Jun 14 at 8:00

1 Answer

This is the table schema I propose without duplication:

  • authors(id, name)
  • categories(id, name)
  • books(id, author_id, year, category_id, title)

The above assumes a book was written by a single author and is of a single category. If this is inaccurate for your example, then we can elaborate on the schema according to your actual scenario.

Then this is how you can write your query:

-- authors.name and categories.name share a column name, so alias them
select authors.name as author, books.year, categories.name as category, books.title
from authors
join books
on authors.id = books.author_id
join categories
on books.category_id = categories.id
order by books.author_id, books.year, books.category_id;

And of course you need an index on books(author_id, year, category_id) to speed it up. To display the results flat, loop over them and emit a <tr> for each record and a <td> for each field, or something along those lines.
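As a rough, self-contained sketch of the schema, the index, and the flat query together: the snippet below uses Python's built-in sqlite3 module purely as a stand-in so it runs without a PostgreSQL server. The column types, the index name `books_author_year_category_idx`, and the extra `books.id` tie-breaker in the ORDER BY are my assumptions, not something taken from the question.

```python
import sqlite3

# SQLite stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id          INTEGER PRIMARY KEY,
    author_id   INTEGER NOT NULL REFERENCES authors(id),
    year        INTEGER NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    title       TEXT NOT NULL
);
-- Composite index matching the ORDER BY below (the index name is made up).
CREATE INDEX books_author_year_category_idx
    ON books (author_id, year, category_id);

INSERT INTO authors VALUES (1, 'Jake A.');
INSERT INTO categories VALUES (1, 'Romance');
INSERT INTO books VALUES
    (1, 1, 2002, 1, 'The Day of Love'),
    (2, 1, 2002, 1, 'Broken Heart'),
    (3, 1, 2002, 1, 'Unveiled Secret'),
    (4, 1, 2005, 1, 'The Last Hope');
""")

# Same joins as the query above; books.id is added only to make the
# order of titles inside one category deterministic (an assumption).
FLAT_QUERY = """
SELECT authors.name AS author, books.year, categories.name AS category, books.title
FROM authors
JOIN books ON authors.id = books.author_id
JOIN categories ON books.category_id = categories.id
ORDER BY books.author_id, books.year, books.category_id, books.id
"""

flat_rows = conn.execute(FLAT_QUERY).fetchall()
for row in flat_rows:
    print(" | ".join(str(field) for field in row))
```

Each fetched tuple is one row of the flat view, so the same result set can feed either a table or the hierarchical rendering. On PostgreSQL it would be worth checking with EXPLAIN whether the index is actually used for your data volume.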

If you need to display them hierarchically, then while you loop your records, always store the previous author_id, year and category_id values so you know when you need to close the inner <ul>s.
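A minimal sketch of that loop, assuming the rows arrive already sorted by author, year, and category. The function name `render_nested_list` is hypothetical, and for brevity it compares the displayed values rather than author_id and category_id; with real data you would compare the ids, since two authors can share a name.

```python
from html import escape


def render_nested_list(rows):
    """Render rows sorted by (author, year, category) as nested <ul> markup."""
    out = []
    prev = ()  # grouping values (author, year, category) of the previous row
    for *groups, title in rows:
        # Count how many leading grouping levels match the previous row.
        # Stop at the first difference: a new year starts a new category
        # list even if the category value happens to be the same.
        shared = 0
        while shared < len(prev) and prev[shared] == groups[shared]:
            shared += 1
        # Close the lists the previous row opened below the shared levels.
        for _ in range(len(prev) - shared):
            out.append("</ul></li>")
        # Open a new nested list for every level that changed.
        for value in groups[shared:]:
            out.append(f"<li>{escape(str(value))}<ul>")
        out.append(f"<li>{escape(title)}</li>")
        prev = tuple(groups)
    # Close whatever is still open after the last row.
    out.extend("</ul></li>" for _ in prev)
    return "<ul>" + "".join(out) + "</ul>"


sample_rows = [
    ("Jake A.", 2002, "Romance", "The Day of Love"),
    ("Jake A.", 2002, "Romance", "Broken Heart"),
    ("Jake A.", 2002, "Romance", "Unveiled Secret"),
    ("Jake A.", 2005, "Romance", "The Last Hope"),
]
print(render_nested_list(sample_rows))
```

Because the hierarchy is rebuilt from the sorted flat result in a single pass, no second table is needed for the hierarchical view in this sketch.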
