I have an application that represents user-defined data in two views:
- A hierarchical view (parent-child nodes)
- A flat view (a table in which each row lists all attributes of a record)
With a data set of 1,000,000 records, PostgreSQL handles one of these views well, but not both. The hierarchical view performs well: retrieving all nodes at a given level of the hierarchy is cheap. Generating the flat view, however, requires joining fields across multiple levels, and it is slow, especially for complex or deeply nested hierarchies.
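A minimal sketch of why the two views cost so differently, assuming an adjacency-list table. The table `node` and its columns `parent_id`, `level`, and `value` are hypothetical, the rows are the sample books from the example later in this question, and Python's built-in sqlite3 stands in for PostgreSQL (the CREATE TABLE and SELECT statements are plain SQL that PostgreSQL also accepts).

```python
import sqlite3

# Hypothetical adjacency-list schema: each node stores its parent and its level
# (1 = Author, 2 = Year, 3 = Category, 4 = Title). sqlite3 stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    level     INTEGER NOT NULL,
    value     TEXT NOT NULL
);
CREATE INDEX node_level_idx ON node(level);
CREATE INDEX node_parent_idx ON node(parent_id);
""")

nodes = [
    (1, None, 1, "Jake A."),
    (2, 1, 2, "2002"),
    (3, 2, 3, "Romance"),
    (4, 3, 4, "The Day of Love"),
    (5, 3, 4, "Broken Heart"),
    (6, 3, 4, "Unveiled Secret"),
    (7, 1, 2, "2005"),
    (8, 7, 3, "Romance"),
    (9, 8, 4, "The Last Hope"),
]
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", nodes)

# Hierarchical view: fetching all nodes at one level is a single indexed filter.
years = [row[0] for row in conn.execute(
    "SELECT value FROM node WHERE level = 2 ORDER BY id")]

# Flat view: one self-join per level, walking from each leaf up to the root.
FLAT_SQL = """
SELECT a.value AS author, y.value AS year, c.value AS category, t.value AS title
FROM node t
JOIN node c ON c.id = t.parent_id
JOIN node y ON y.id = c.parent_id
JOIN node a ON a.id = y.parent_id
WHERE t.level = 4
ORDER BY t.id
"""
flat_rows = conn.execute(FLAT_SQL).fetchall()

print(years)
for row in flat_rows:
    print(row)
```

Fetching one level touches a single index, while the flat query needs one self-join per level. Because the hierarchy is user-defined, the number of joins also changes whenever a user adds or removes a level.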
A user might define a hierarchy of books like this:
Author
└── Year published
    └── Book category
        └── Book title
Example hierarchy:
Jake A.
├── 2002
│   └── Romance
│       ├── "The Day of Love"
│       ├── "Broken Heart"
│       └── "Unveiled Secret"
└── 2005
    └── Romance
        └── "The Last Hope"
In the flat view, this would be represented as:
Author  | Year | Category | Book Title
--------+------+----------+----------------
Jake A. | 2002 | Romance  | The Day of Love
Jake A. | 2002 | Romance  | Broken Heart
Jake A. | 2002 | Romance  | Unveiled Secret
Jake A. | 2005 | Romance  | The Last Hope
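Conceptually, each flat row is one root-to-leaf path through the hierarchy. A small sketch in plain Python, assuming the hierarchy is held as nested dicts whose leaves are lists of titles (an illustrative representation, not necessarily how the application stores it):

```python
from typing import Iterator, Tuple

# Illustrative nested representation of the example: dict keys are node values,
# and a list holds the leaf titles of a category.
example = {
    "Jake A.": {
        "2002": {"Romance": ["The Day of Love", "Broken Heart", "Unveiled Secret"]},
        "2005": {"Romance": ["The Last Hope"]},
    }
}


def flatten(tree: dict, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield one tuple per root-to-leaf path, i.e. one flat-view row per book."""
    for key, child in tree.items():
        path = prefix + (key,)
        if isinstance(child, dict):
            # Branch node: descend one level, carrying the path so far.
            yield from flatten(child, path)
        else:
            # Category node: each title completes one row.
            for title in child:
                yield path + (title,)


rows = list(flatten(example))
for row in rows:
    print(" | ".join(row))
```

Every flat row can be derived from the hierarchy, so a separate flat table would hold derived data that has to be rebuilt or updated whenever the hierarchy changes.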
Users should be able to sort and filter the flat data by any column.
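Once the rows live in a flat table, sorting and filtering by any column becomes a WHERE and ORDER BY on that table. A sketch, again with sqlite3 standing in for PostgreSQL and a hypothetical table `flat_book`. Column names cannot be bound as query parameters, so the user-chosen column is checked against a whitelist before it is placed in the SQL text, while the filter values are bound normally.

```python
import sqlite3

# Hypothetical denormalized table holding one row per book (the flat view).
COLUMNS = ("author", "year", "category", "title")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat_book (author TEXT, year TEXT, category TEXT, title TEXT)")
conn.executemany("INSERT INTO flat_book VALUES (?, ?, ?, ?)", [
    ("Jake A.", "2002", "Romance", "The Day of Love"),
    ("Jake A.", "2002", "Romance", "Broken Heart"),
    ("Jake A.", "2002", "Romance", "Unveiled Secret"),
    ("Jake A.", "2005", "Romance", "The Last Hope"),
])


def query_flat(conn, filters=None, sort_by=None, descending=False):
    """Filter by exact column values and sort by one user-chosen column.

    Column names are validated against COLUMNS before being written into the
    SQL text; filter values are passed as bound parameters.
    """
    filters = filters or {}
    for col in list(filters) + ([sort_by] if sort_by else []):
        if col not in COLUMNS:
            raise ValueError(f"unknown column: {col}")
    sql = "SELECT author, year, category, title FROM flat_book"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    if sort_by:
        sql += f" ORDER BY {sort_by} {'DESC' if descending else 'ASC'}"
    return conn.execute(sql, list(filters.values())).fetchall()


titles = [row[3] for row in query_flat(conn, {"year": "2002"}, sort_by="title")]
print(titles)
```

On a large table, each column that users can filter or sort by would typically need its own index to stay fast, which adds to the write cost of the flat copy.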
To maintain performance, the best approach I can think of is two separate tables, one for the hierarchical view and one for the flat view, kept in sync. However, that duplicates data, which might be suboptimal.
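One way to keep the two tables in sync without application code is a trigger on the hierarchy table. The sketch below uses sqlite3 in place of PostgreSQL, and the tables `node` and `flat_book` are hypothetical names; the trigger copies each new level-4 (title) node together with its three ancestors into the flat table. In PostgreSQL the same logic would live in a PL/pgSQL trigger function attached with CREATE TRIGGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    level     INTEGER NOT NULL,
    value     TEXT NOT NULL
);

-- Hypothetical flat copy: one row per leaf (book), keyed by the leaf node id.
CREATE TABLE flat_book (
    leaf_id  INTEGER PRIMARY KEY REFERENCES node(id),
    author   TEXT,
    year     TEXT,
    category TEXT,
    title    TEXT
);

-- When a level-4 (title) node is inserted, copy it and its ancestors into flat_book.
CREATE TRIGGER node_leaf_insert AFTER INSERT ON node
WHEN NEW.level = 4
BEGIN
    INSERT INTO flat_book (leaf_id, author, year, category, title)
    SELECT NEW.id, a.value, y.value, c.value, NEW.value
    FROM node c
    JOIN node y ON y.id = c.parent_id
    JOIN node a ON a.id = y.parent_id
    WHERE c.id = NEW.parent_id;
END;
""")

# Parents are inserted before children, so each leaf's ancestors already exist
# when the trigger fires for that leaf.
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", [
    (1, None, 1, "Jake A."),
    (2, 1, 2, "2002"),
    (3, 2, 3, "Romance"),
    (4, 3, 4, "The Day of Love"),
    (5, 3, 4, "Broken Heart"),
    (6, 3, 4, "Unveiled Secret"),
    (7, 1, 2, "2005"),
    (8, 7, 3, "Romance"),
    (9, 8, 4, "The Last Hope"),
])

flat = conn.execute(
    "SELECT author, year, category, title FROM flat_book ORDER BY leaf_id"
).fetchall()
print(flat)
```

Only inserts are covered here. Renames and deletes would need their own triggers, and renaming a node high in the tree (for example an author) rewrites every flat row beneath it, which is the main write-time cost of the duplicated table.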
How do I maintain good performance representing user-defined data in both the hierarchical and the flat view?