Missing base64_decode on PDF-Keyword-Reading (AppleKeywords) #32
Open
Description
When a PDF contains keywords added by Apple software, some of them are read incorrectly, especially when non-ASCII characters are involved. I have attached Test.pdf to demonstrate this.
It has three "normal" keywords labeled Test1, Test2, and Test3. A fourth keyword is a bit more complex: it contains a comma (!) and some German umlauts: Base64 encoded äöü and comma, foo bar.
This is the exiftool XML output of the Keywords and AppleKeywords sections:
<PDF:Keywords>
<rdf:Bag>
<rdf:li>Test1</rdf:li>
<rdf:li>Test2</rdf:li>
<rdf:li>Base64 encoded äöü and comma</rdf:li>
<rdf:li>foo bar</rdf:li>
<rdf:li>Test3</rdf:li>
</rdf:Bag>
</PDF:Keywords>
<PDF:AppleKeywords>
<rdf:Bag>
<rdf:li>Test1</rdf:li>
<rdf:li>Test2</rdf:li>
<rdf:li rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary'>
/v8AQgBhAHMAZQA2ADQAIABlAG4AYwBvAGQAZQBkACAA5AD2APwAIABhAG4A
ZAAgAGMAbwBtAG0AYQAsACAAZgBvAG8AIABiAGEAcg==
</rdf:li>
<rdf:li>Test3</rdf:li>
</rdf:Bag>
</PDF:AppleKeywords>
As you can see, there are two problems involved:
- In the "PDF:Keywords" section, the comma IN THE KEYWORD ITSELF is treated as a separator, so the keyword is split into two separate keywords. That is an exiftool problem and not part of this issue.
- "PDF:AppleKeywords" is recognized correctly, but the keyword containing the umlauts is base64-encoded. Up to here everything is fine. The issue, though, is that PHPExiftool does not decode the string and returns the raw base64 text.
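Decoding the payload by hand shows what the fourth keyword actually contains. The decoded bytes start with FE FF, a UTF-16 big-endian byte-order mark, so a plain base64 decode (PHP's base64_decode() included) yields UTF-16 bytes rather than UTF-8 text, and a charset conversion is needed on top of it. This is a minimal Python check using only the standard library, not PHPExiftool code:

```python
import base64

# The base64 text from the AppleKeywords rdf:li above, line breaks removed.
payload = (
    "/v8AQgBhAHMAZQA2ADQAIABlAG4AYwBvAGQAZQBkACAA5AD2APwAIABhAG4A"
    "ZAAgAGMAbwBtAG0AYQAsACAAZgBvAG8AIABiAGEAcg=="
)

raw = base64.b64decode(payload)
print(raw[:4])  # b'\xfe\xff\x00B': UTF-16BE byte-order mark, then "B"

# The "utf-16" codec reads the byte-order mark, picks big-endian, and drops the mark.
text = raw.decode("utf-16")
print(text)  # Base64 encoded äöü and comma, foo bar
```

The decoded value is the single keyword exactly as entered, comma included.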
Instead, I would expect PHPExiftool to recognize the attribute rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary' and decode the string automatically.
If I have read the source correctly, this behaviour is already implemented for Mono types (see source). But I guess it also needs to be implemented for Multi types.
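The behaviour requested for multi-valued tags could look roughly like the Python sketch below (an illustration, not PHPExiftool's actual implementation). It walks every rdf:li of a bag, base64-decodes only the items whose rdf:datatype is the XML Schema base64Binary URI, and converts the bytes to text. Assumptions: the helper names decode_payload and bag_values are hypothetical, the PDF namespace URI bound in the sample is assumed, and payloads without a UTF-16 byte-order mark are assumed to be UTF-8.

```python
import base64
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE64_TYPE = "http://www.w3.org/2001/XMLSchema#base64Binary"
# Assumption: namespace URI bound to the PDF: prefix in the sample below.
PDF = "http://ns.exiftool.org/PDF/PDF/1.0/"


def decode_payload(data: bytes) -> str:
    """Hypothetical helper: turn decoded base64 bytes into text."""
    # A UTF-16 byte-order mark (FE FF or FF FE) selects UTF-16; the "utf-16"
    # codec uses the mark to pick the byte order and strips it from the result.
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return data.decode("utf-16")
    # Assumption: payloads without a UTF-16 mark are UTF-8.
    return data.decode("utf-8")


def bag_values(element: ET.Element) -> list:
    """Hypothetical helper: collect every rdf:li value below element,
    base64-decoding the items flagged with rdf:datatype=base64Binary."""
    values = []
    for li in element.iter(f"{{{RDF}}}li"):
        text = li.text or ""
        if li.get(f"{{{RDF}}}datatype") == BASE64_TYPE:
            # Drop the line breaks and indentation inside the element first.
            raw = base64.b64decode("".join(text.split()))
            values.append(decode_payload(raw))
        else:
            values.append(text)
    return values


# The AppleKeywords excerpt, wrapped in a root element that declares
# the rdf: and PDF: prefixes so it parses on its own.
SAMPLE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:PDF='http://ns.exiftool.org/PDF/PDF/1.0/'>
<PDF:AppleKeywords>
<rdf:Bag>
<rdf:li>Test1</rdf:li>
<rdf:li>Test2</rdf:li>
<rdf:li rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary'>
/v8AQgBhAHMAZQA2ADQAIABlAG4AYwBvAGQAZQBkACAA5AD2APwAIABhAG4A
ZAAgAGMAbwBtAG0AYQAsACAAZgBvAG8AIABiAGEAcg==
</rdf:li>
<rdf:li>Test3</rdf:li>
</rdf:Bag>
</PDF:AppleKeywords>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
apple = root.find(f"{{{PDF}}}AppleKeywords")
print(bag_values(apple))
# ['Test1', 'Test2', 'Base64 encoded äöü and comma, foo bar', 'Test3']
```

Checking the attribute per item rather than per tag matters here, because only one of the four AppleKeywords entries is base64-encoded while the others are plain text.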
This is the test file mentioned above: Test.pdf