alkis commented May 29, 2024

  1. Make ColumnMetaData.type optional
  2. Make ColumnMetaData.path_in_schema optional
  3. Add ColumnMetaData.schema_index: the ordinal of the FileMetaData.schema element that this column corresponds to. This allows a row group to list its columns sparsely.
  4. Deprecate ColumnMetaData.encoding_stats and replace it with ColumnMetaData.is_fully_dict_encoded.
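Item 4 replaces per-page encoding statistics with a single flag. The sketch below shows one way a writer might derive `is_fully_dict_encoded` from what `encoding_stats` records today. It assumes "fully dictionary encoded" means every data page (v1 or v2) uses a dictionary encoding. The Python classes are hypothetical stand-ins for the Thrift structs, not generated bindings, and only a subset of each enum is reproduced.

```python
from dataclasses import dataclass
from enum import Enum


class PageType(Enum):
    # Subset of the Parquet Thrift PageType enum (same numeric values).
    DATA_PAGE = 0
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


class Encoding(Enum):
    # Subset of the Parquet Thrift Encoding enum (same numeric values).
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE_DICTIONARY = 8


@dataclass
class PageEncodingStats:
    page_type: PageType
    encoding: Encoding
    count: int


DATA_PAGE_TYPES = {PageType.DATA_PAGE, PageType.DATA_PAGE_V2}
DICT_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}


def is_fully_dict_encoded(stats: list[PageEncodingStats]) -> bool:
    """Derive the proposed flag from legacy encoding_stats entries.

    Dictionary pages are skipped; only data pages decide the result, and
    every one of them must use a dictionary encoding.
    """
    data_pages = [s for s in stats if s.page_type in DATA_PAGE_TYPES and s.count > 0]
    # Conservatively report False when the chunk has no data pages at all.
    return bool(data_pages) and all(s.encoding in DICT_ENCODINGS for s in data_pages)
```

A chunk whose writer fell back to PLAIN partway through (a data-page entry with `Encoding.PLAIN`) yields `False` here, which is the case a reader most needs to detect before relying on the dictionary alone.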

ref Parquet Metadata evolution

Jira

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do
alkis force-pushed the t2-metadata-improvements branch from 9f5b94e to f0c75b9 on May 30, 2024 10:24
```thrift
* This implies that ColumnMetaData can be sparse in a rowgroup, if for example
* a column does not have any data pages in a rowgroup.
*/
17: optional i32 schema_index;
```
alkis (contributor, PR author) commented:

FWIW, I accidentally discovered https://issues.apache.org/jira/browse/PARQUET-183, which this field can fix.
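To illustrate the sparse representation that `schema_index` enables, here is a minimal sketch of how a reader might locate column chunks when a row group lists only some columns. The types and the `resolve_columns` helper are hypothetical. It assumes `schema_index` is the position of the column's leaf element in the flattened `FileMetaData.schema` list, and that a leaf with no entry has no data pages in that row group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaElement:
    name: str
    num_children: int = 0  # 0 marks a leaf (primitive) column


@dataclass
class ColumnMetaData:
    schema_index: int  # proposed field: ordinal into FileMetaData.schema
    num_values: int


@dataclass
class RowGroup:
    columns: list[ColumnMetaData]  # may omit columns that have no data pages


def resolve_columns(
    schema: list[SchemaElement], row_group: RowGroup
) -> dict[int, Optional[ColumnMetaData]]:
    """Map every leaf's schema ordinal to its chunk metadata, or None if absent.

    Keys are schema ordinals rather than names because leaf names need not be
    unique in nested schemas.
    """
    by_index = {c.schema_index: c for c in row_group.columns}
    return {
        i: by_index.get(i)
        for i, element in enumerate(schema)
        if element.num_children == 0
    }
```

Because the lookup goes through `schema_index` instead of the position within `columns`, a writer can simply omit entries; a `None` result is where a reader would apply whatever "no data in this row group" semantics the format settles on.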
