I need to parse a big XML file, around 100 MB (it can be even more).
The XML looks like this:
<notes>
<note>
<id>cdsds32da435-wufdhah</id>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<!-- ... x 1,000,000 different notes (or even more) ... -->
</notes>
Each note has a unique ID. When I parse the XML, I first need to check whether a note with that ID already exists in the DB; if not, I INSERT it.
The problem is performance (it takes 2 hours). To avoid asking the DB each time, I try to fetch all ids from the DB (which is also big) with one SELECT and keep them in a PHP array in memory:
$sql = "SELECT id FROM `notes`";
...
// $ids = array with all ids
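In full it's roughly this (a sketch, assuming PDO and a MySQL table `notes`; the connection details are placeholders):

// Sketch: load every known id into one flat PHP array.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder credentials
$stmt = $pdo->query('SELECT id FROM `notes`');
$ids  = $stmt->fetchAll(PDO::FETCH_COLUMN); // one flat array of all ids, kept in memory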
I've also parsed the XML with xml_parser in a loop:
while ($data = fread($Xml, 512)) {
    xml_parse($xmlParser, $data);
}
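The full streaming setup looks roughly like this (a sketch; the handler logic, variable names and file name are mine, not my exact code):

// Sketch: stream-parse notes.xml chunk by chunk with xml_parser,
// collecting one <note> at a time instead of loading the whole file.
$parser  = xml_parser_create();
$current = '';  // tag we are currently inside
$note    = [];  // fields of the note being built

xml_set_element_handler(
    $parser,
    function ($p, $tag, $attrs) use (&$current, &$note) {
        $current = strtolower($tag); // case folding uppercases tag names by default
        if ($current === 'note') {
            $note = []; // start a fresh note
        }
    },
    function ($p, $tag) use (&$current, &$note) {
        if (strtolower($tag) === 'note') {
            // a complete <note> is in $note here:
            // check $note['id'] against $ids and INSERT if missing
        }
        $current = '';
    }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$current, &$note) {
    // cdata can arrive in several pieces, so append
    if ($current !== '' && $current !== 'note' && $current !== 'notes') {
        $note[$current] = ($note[$current] ?? '') . $data;
    }
});

$fh = fopen('notes.xml', 'rb');
while (($data = fread($fh, 8192)) !== false && $data !== '') {
    xml_parse($parser, $data, false);
}
xml_parse($parser, '', true); // signal end of document
xml_parser_free($parser);
fclose($fh);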
I think that parsing the XML with SimpleXML would generate a variable too big for PHP to handle.
And then, when I have a note ID, I check whether it exists in $ids:
if (array_search($note->id, $ids) === false) {
    // then insert it
}
But it takes too long. Then I found that PHP comes with special arrays called Judy Arrays (http://php.net/manual/en/book.judy.php), but I don't know whether they are meant for this, i.e. for fast lookups in BIG arrays.
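If I understand the manual right, using them would look roughly like this (untested sketch; I'm taking the Judy class, the STRING_TO_INT type and the array-style access from the docs, so treat this as an assumption):

// Untested sketch based on the PECL judy manual page.
$judy = new Judy(Judy::STRING_TO_INT);
foreach ($ids as $id) {
    $judy[$id] = 1; // store every known id as a key
}
// per parsed note:
if (!isset($judy[$note->id])) {
    // insert it
}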
I've also thought about Memcached, storing the ids from the DB across many variables, but I want to find a proper solution.
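Another idea I had is to flip $ids once so the ids become keys; then every check is a hash lookup with isset() instead of a linear array_search() (sketch):

// Sketch: build the lookup once, then each check is O(1) instead of O(n).
$lookup = array_flip($ids); // id => original position
if (!isset($lookup[$note->id])) {
    // insert it
}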
There are also indexes on the DB table to speed things up. The XML grows every week :) and each time it contains all the notes from the previous XML plus the new ones.
QUESTION:
How can I do fast lookups in BIG arrays in PHP? Are Judy Arrays meant for this?