Missing base64_decode on PDF-Keyword-Reading (AppleKeywords) #32

@stevenbuehner

When a PDF has keywords added by Apple, some of them are read incorrectly, especially when non-ASCII characters are involved. I attached a Test.pdf to demonstrate this.

There are three "normal" keywords labeled Test1, Test2, and Test3. A fourth keyword is a bit more complex. It contains a comma (!) and some German umlauts: "Base64 encoded äöü and comma, foo bar".

This is the exiftool XML output of the Keywords and AppleKeywords sections:

<PDF:Keywords>
 <rdf:Bag>
  <rdf:li>Test1</rdf:li>
  <rdf:li>Test2</rdf:li>
  <rdf:li>Base64 encoded äöü and comma</rdf:li>
  <rdf:li>foo bar</rdf:li>
  <rdf:li>Test3</rdf:li>
 </rdf:Bag>
</PDF:Keywords>
<PDF:AppleKeywords>
 <rdf:Bag>
  <rdf:li>Test1</rdf:li>
  <rdf:li>Test2</rdf:li>
  <rdf:li rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary'>
/v8AQgBhAHMAZQA2ADQAIABlAG4AYwBvAGQAZQBkACAA5AD2APwAIABhAG4A
ZAAgAGMAbwBtAG0AYQAsACAAZgBvAG8AIABiAGEAcg==
</rdf:li>
  <rdf:li>Test3</rdf:li>
 </rdf:Bag>
</PDF:AppleKeywords>

As you can see, there are two problems involved:

  1. In the "PDF:Keywords" section, the comma INSIDE THE KEYWORD ITSELF is treated as a separator, so the keyword is split into two separate keywords. That is an exiftool problem and not part of this issue.
  2. The "PDF:AppleKeywords" section is recognized correctly, and exiftool base64-encodes the keyword containing the umlauts. Up to here everything is fine. The issue is that PHPExiftool does not decode that value and returns the raw base64 string.
    Instead, I would expect PHPExiftool to recognize the attribute rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary' and decode the string automatically.
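
To make the expected behaviour concrete, here is a minimal Python sketch (not PHPExiftool code) that reads the AppleKeywords bag from above and decodes any item marked as base64Binary. The PDF namespace URI, the function names, and the charset handling are assumptions made for illustration: the payload in this sample starts with a UTF-16 byte order mark, so the sketch decodes BOM-prefixed data as UTF-16 and falls back to UTF-8 otherwise.

```python
import base64
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE64_TYPE = "http://www.w3.org/2001/XMLSchema#base64Binary"

# The PDF namespace URI below is a placeholder (assumption); only the
# rdf namespace matters for this sketch.
SAMPLE = """<PDF:AppleKeywords xmlns:PDF="urn:example:pdf"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Bag>
  <rdf:li>Test1</rdf:li>
  <rdf:li>Test2</rdf:li>
  <rdf:li rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary'>
/v8AQgBhAHMAZQA2ADQAIABlAG4AYwBvAGQAZQBkACAA5AD2APwAIABhAG4A
ZAAgAGMAbwBtAG0AYQAsACAAZgBvAG8AIABiAGEAcg==
</rdf:li>
  <rdf:li>Test3</rdf:li>
 </rdf:Bag>
</PDF:AppleKeywords>"""


def decode_item(li):
    """Return the text of one rdf:li, decoding it if it is base64Binary."""
    text = (li.text or "").strip()
    if li.get(f"{{{RDF_NS}}}datatype") != BASE64_TYPE:
        return text
    # b64decode ignores the line break inside the payload by default.
    raw = base64.b64decode(text)
    # Assumption: a UTF-16 BOM means UTF-16, anything else is UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return raw.decode("utf-16")
    return raw.decode("utf-8", errors="replace")


def read_bag(xml_text):
    """Collect every rdf:li value of a multi-valued tag as a list."""
    root = ET.fromstring(xml_text)
    return [decode_item(li) for li in root.iter(f"{{{RDF_NS}}}li")]


if __name__ == "__main__":
    for keyword in read_bag(SAMPLE):
        print(keyword)
```

The point of the sketch is that the datatype check happens per list item, so plain and base64-encoded entries can be mixed inside the same bag.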

If I read the code correctly, this behaviour is already implemented for Mono types (see source), but I guess it also needs to be implemented for Multi types.

This is the test file mentioned above: Test.pdf
