Wednesday, April 11, 2012

Parsing BIG XML in PHP

I need to parse a large XML file, for example 100 MB (it can be even larger).

For example, the XML looks like this:

<body>Don't forget me this weekend!</body>

repeated for 1,000,000 different notes (or even more)


Each note has a unique ID. When I parse the XML, I first need to check whether a note with that ID already exists in the DB, and if not, INSERT it.

The problem is performance (it takes 2 hours). I tried loading all ids from the DB (which is also big) with a single SELECT, so that I don't query the DB for every note and instead keep them in a PHP array (in memory).

$sql = "SELECT id FROM `notes`";
$ids = array(); // filled with all ids returned by the query

I have also parsed the XML with xml_parser in a loop:

while ($data = fread($Xml, 512)) {
    xml_parse($xmlParser, $data, feof($Xml)); // third argument marks the final chunk
}
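
For context, here is a minimal sketch of how handlers could be attached to such a chunked xml_parse loop so that only the note ids are kept in memory. The sample XML, the <id> child element and the variable $foundIds are assumptions made for illustration; the post does not show the real note layout.

```php
<?php
// Hypothetical sample data standing in for the real (much larger) file.
$sample = '<notes><note><id>1</id><body>a</body></note>'
        . '<note><id>2</id><body>b</body></note></notes>';

$foundIds = array(); // ids collected while streaming
$inId     = false;   // true while the parser is inside an <id> element
$buffer   = '';      // character data of the current <id>

$xmlParser = xml_parser_create();
// Keep tag names as written instead of upper-casing them.
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $xmlParser,
    function ($parser, $name, $attrs) use (&$inId, &$buffer) {
        if ($name === 'id') {
            $inId   = true;
            $buffer = '';
        }
    },
    function ($parser, $name) use (&$inId, &$buffer, &$foundIds) {
        if ($name === 'id') {
            $foundIds[] = trim($buffer);
            $inId       = false;
        }
    }
);

// Text may arrive in several pieces, so it is appended to the buffer.
xml_set_character_data_handler(
    $xmlParser,
    function ($parser, $data) use (&$inId, &$buffer) {
        if ($inId) {
            $buffer .= $data;
        }
    }
);

// An in-memory stream replaces the real file handle for this sketch.
$Xml = fopen('php://memory', 'r+');
fwrite($Xml, $sample);
rewind($Xml);

while (($data = fread($Xml, 512)) !== false && $data !== '') {
    xml_parse($xmlParser, $data, false);
}
xml_parse($xmlParser, '', true); // signal the end of the document
fclose($Xml);

print_r($foundIds);
```

Because the handlers only store the id strings, memory use depends on the number of ids rather than on the size of the XML file.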

I think that parsing the XML with SimpleXML may produce a variable that is too big for PHP to handle.
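
One way around that concern, sketched here as an assumption rather than as the method used in the post, is to stream the file with XMLReader and hand only one note at a time to SimpleXML. The element names <notes>, <note> and <id> and the sample data are hypothetical.

```php
<?php
// Hypothetical sample data; a real run would use $reader->open($path) instead.
$sample = '<notes><note><id>7</id><body>x</body></note>'
        . '<note><id>9</id><body>y</body></note></notes>';

$reader = new XMLReader();
$reader->XML($sample);

$seen = array();

// Advance to the first <note> element.
while ($reader->read() && $reader->name !== 'note') {
    // nothing to do, just skipping ahead
}

// Visit each <note> in turn; only the current note is held in memory.
while ($reader->name === 'note') {
    $note   = new SimpleXMLElement($reader->readOuterXml());
    $seen[] = (string) $note->id;
    $reader->next('note'); // jump to the next sibling <note>, skipping its subtree
}

$reader->close();

print_r($seen);
```

Each SimpleXMLElement here covers a single note, so the convenience of `$note->id` is kept without loading the whole document.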

Then, when I have a note ID, I check whether it exists in $ids:

if (array_search($note->id, $ids) === false) { // strict check: a match at key 0 is falsy
    // then insert it
}
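
array_search scans the array linearly, so a million lookups against a million ids add up to a very large number of comparisons. A minimal sketch of an alternative, assuming the ids are integers or strings: flip the array once so that the ids become keys, then test membership with isset, which is a hash lookup. The sample values and the names $known, $incoming and $toInsert are made up for illustration.

```php
<?php
// Stand-in for the ids loaded from the DB with one SELECT.
$ids = array(10, 20, 30);

// Flip once: ids become keys, so membership tests are hash lookups.
$known = array_flip($ids);

// Stand-in for the ids encountered while parsing the XML.
$incoming = array('10', '40', '30', '50');

$toInsert = array();
foreach ($incoming as $noteId) {
    if (!isset($known[$noteId])) {
        $toInsert[]     = $noteId; // new note: queue it for INSERT
        $known[$noteId] = true;    // also guards against duplicates within the file
    }
}

print_r($toInsert);
```

Numeric strings such as '10' are cast to integer keys by PHP, so ids read from the XML as strings still match integer ids coming from the DB.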

But it takes too long. I found that PHP has special arrays called Judy arrays, but I don't know whether they are meant for this, that is, for working quickly with big arrays.

I am also thinking about Memcached, to store the ids from the DB in many variables, but I want to find a proper solution.

The DB table also has indexes to speed up the process. The XML grows every week :) and each time it contains all notes from the previous XML plus the new notes.

How can I process big arrays quickly in PHP? Are Judy arrays meant for this?
