
I want to modify some XML values. For example, I want to remove the "<![CDATA[" and "]]>" markers from the values.

The strange thing is that this works for title, price and image_link, but not for url.

This is my code:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load('data/kinguin.xml');

$past = time();
echo '(Kinguin) - Starting to remove tags' . "\n";
deleteChildren($dom, 'id');
echo '(Kinguin) - id removed' . "\n";
deleteChildren($dom, 'description');
echo '(Kinguin) - description removed' . "\n";
deleteChildren($dom, 'google_product_category');
echo '(Kinguin) - google_product_category removed' . "\n";
deleteChildren($dom, 'brand');
echo '(Kinguin) - brand removed' . "\n";
deleteChildren($dom, 'mpn');
echo '(Kinguin) - mpn removed' . "\n";
deleteChildren($dom, 'condition');
echo '(Kinguin) - condition removed' . "\n";
deleteChildren($dom, 'product_type');
echo '(Kinguin) - product_type removed' . "\n";
deleteChildren($dom, 'availability');
echo '(Kinguin) - availability removed' . "\n";
deleteChildren($dom, 'quantity');
echo '(Kinguin) - quantity removed' . "\n";
deleteChildren($dom, 'identifier_exists');
echo '(Kinguin) - identifier_exists removed' . "\n";

removeCDATA($dom, 'title');
echo '(Kinguin) - title CDATA removed' . "\n";
removeCDATA($dom, 'price');
echo '(Kinguin) - price CDATA removed' . "\n";
removeCDATA($dom, 'image_link');
echo '(Kinguin) - image_link CDATA removed' . "\n";
removeCDATA($dom, 'url');
echo '(Kinguin) - url CDATA removed' . "\n";

$dom->saveXML();
$dom->save('data/kinguin.xml');

$xml = file_get_contents('data/kinguin.xml');
renameTags($xml, 'link', 'url', 'data/kinguin.xml');
echo '(Kinguin) - Renamed link' . "\n";

$now = time();
echo "(Kinguin) - Time needed: " . ($now - $past) . "s" . "\n";
echo "\n";

Functions:

function deleteChildren($dom, $children){
    $root = $dom->documentElement;
    $marker = $root->getElementsByTagName($children);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $child = $marker->item($i);
        $marker->item($i)->parentNode->removeChild($child);
    }
}

function renameTags($xml, $old, $new, $path){
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dom->loadXML($xml);

    $nodes = $dom->getElementsByTagName($old);
    $toRemove = array();
    foreach ($nodes as $node) {
        $newNode = $dom->createElement($new);
        foreach ($node->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }

        foreach ($node->childNodes as $child) {
            $newNode->appendChild($node->removeChild($child));
        }

        $node->parentNode->appendChild($newNode);
        $toRemove[] = $node;
    }

    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }

    $dom->saveXML();
    $dom->save($path);
}

function removeCDATA($dom, $tagName){
    $root = $dom->documentElement;
    $marker = $root->getElementsByTagName($tagName);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $rename = $marker->item($i)->textContent;
        $newValue = preg_replace('/(<!\[CDATA\[)/', '', $rename);
        $newValue = preg_replace('/(]]>)/', '', $newValue);
        $newValue = preg_replace('/( EUR)/', '', $newValue);
        //ey-Shop\Cronjob.php on line 350 PHP Warning:  preg_replace(): Delimiter must not be alphanumeric or backslash in 351

        $marker->item($i)->nodeValue = $newValue;
    }
}

This is the XML Output:

<?xml version="1.0" encoding="UTF-8"?>
<rss>
  <channel xmlns:g="http://base.google.com/ns/1.0" version="2.0">
    <title>google_EUR_english_1</title>
    <item>
      <title>Anno 2070 Uplay CD Key</title>
      <g:price>3.27</g:price>
      <g:image_link>http://cdn.kinguin.net/media/catalog/category/anno_8.jpg</g:image_link>
      <url><![CDATA[http://www.kinguin.net/category/4/anno-2070/?nosalesbooster=1&country_store=1&currency=EUR]]></url>
    </item>
    <item>
      <title>Anno 2070: Deep Ocean DLC Uplay CD Key</title>
      <g:price>4.75</g:price>
      <g:image_link>http://cdn.kinguin.net/media/catalog/category/anno-2070-deep-ocean-releasing-this-spring-1089268_1.jpg</g:image_link>
      <url><![CDATA[http://www.kinguin.net/category/5/anno-2070-deep-ocean-expansion-pack-dlc/?nosalesbooster=1&country_store=1&currency=EUR]]></url>
    </item>
    <item>

This is the error message:

Warning: removeCDATA(): unterminated entity reference  All Stars-Racing Transformed RU VPN in C:\Users\Jan\PhpstormProjects\censored\Cronjob.php on line 353
PHP Warning:  removeCDATA(): unterminated entity reference  SUV DLC Steam Gift in C:\Users\Jan\PhpstormProjects\censored\Cronjob.php on line 353

Line 353:

$marker->item($i)->nodeValue = $newValue;

Greetings and Thanks!

  • Why do you think you need to remove a CDATA section from an XML document? Any XML parser can handle it. If you still think you need to do it, then $marker->item($i)->textContent = $marker->item($i)->textContent; should suffice, since textContent is a plain string anyway. Commented Jan 3, 2017 at 10:44

2 Answers


If you really need to remove the CDATA section(s) from an element node, simply assign the node's text content back to itself: $foo->textContent = $foo->textContent. See http://sandbox.onlinephpfunctions.com/code/cca5093433218c7c134f120725988fe6808f906c, which runs

function removeCDATA($dom, $tagName){

    $marker = $dom->getElementsByTagName($tagName);
    for($i = $marker->length - 1; $i >= 0 ; $i--){
        $marker->item($i)->textContent = $marker->item($i)->textContent;
    }
}

$xml = '<root><items><item><url><![CDATA[http://example.com/search?a=1&b=2&c=3]]></url></item><item><url><![CDATA[http://example.com/search?a=4&b=5&c=6]]></url></item></items></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

removeCDATA($doc, 'url');

echo $doc->saveXML();

and outputs (XML declaration omitted)

<root><items><item><url>http://example.com/search?a=1&amp;b=2&amp;c=3</url></item><item><url>http://example.com/search?a=4&amp;b=5&amp;c=6</url></item></items></root>
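A possible alternative, not part of the sandbox example above: libxml has a LIBXML_NOCDATA option that merges CDATA sections into ordinary text nodes at parse time, so no post-processing loop is needed. The sketch below assumes the document is loaded from a string; the sample XML and URL are placeholders modeled on the example above.

```php
<?php
// Placeholder document; the structure mirrors the example above.
$xml = '<root><item><url><![CDATA[http://example.com/search?a=1&b=2]]></url></item></root>';

$doc = new DOMDocument();
// LIBXML_NOCDATA asks libxml to merge CDATA sections into plain text nodes while parsing.
$doc->loadXML($xml, LIBXML_NOCDATA);

$url = $doc->getElementsByTagName('url')->item(0);

// The decoded value is unchanged; only the serialized form differs.
echo $url->textContent, "\n";   // http://example.com/search?a=1&b=2
echo $doc->saveXML($url), "\n"; // <url>http://example.com/search?a=1&amp;b=2</url>
```

The same option can be passed as the second argument to DOMDocument::load() when reading from a file. Either way, the & is serialized as &amp; because it is no longer protected by a CDATA section.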

If you remove the CDATA section, you end up with an element containing a naked & character. This is not legal: in element content, a literal & may only appear escaped as an entity reference (&amp;) or inside a CDATA section.

This is why the CDATA section is there in the first place, and it should probably be left as is for the consuming parser to handle.
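To illustrate the point, here is a minimal sketch that parses the same URL three ways: with a naked &, with &amp;, and wrapped in CDATA. The helper name readUrlText and the sample URLs are made up for this example.

```php
<?php
// Collect libxml parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns the root element's text, or null if the XML is ill-formed.
function readUrlText(string $xml): ?string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        libxml_clear_errors();
        return null;
    }
    return $doc->documentElement->textContent;
}

$naked   = readUrlText('<url>http://example.com/?a=1&b=2</url>');
$escaped = readUrlText('<url>http://example.com/?a=1&amp;b=2</url>');
$cdata   = readUrlText('<url><![CDATA[http://example.com/?a=1&b=2]]></url>');

var_dump($naked);              // NULL: the naked & makes the document ill-formed
var_dump($escaped === $cdata); // bool(true): both decode to the same string
echo $escaped, "\n";           // http://example.com/?a=1&b=2
```

A consuming parser sees the same URL in the escaped and CDATA forms, so there is usually nothing to gain from stripping the CDATA markers by hand.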

  • It's not a question of whether the link works. Before the link can work, you need a well-formed XML document; if you edit your document to make it ill-formed, there is no way of even extracting the link to see whether it works. Commented Jan 3, 2017 at 13:11
