Simple Parallel Processing with PHP and proc_open()
You can find a reasonably sensible example of how to perform parallel processing with PHP using the popen()
function at wellho.net. Sometimes, though, the requirement to pass all the parameters to your function via command line arguments can be limiting. For example, maybe you are processing a lot of HTML pages, or large associative arrays. In these cases, you can get slightly more flexibility by using the proc_open()
function and passing data over Unix pipes.
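For comparison, the popen() technique can be sketched roughly as follows. This is a hedged sketch, not the wellho.net example: the inline `php -r` worker that upper-cases its argument is a hypothetical stand-in for a real worker script, and it shows the limitation mentioned above, since each item has to travel as a command line argument.

```php
<?php
// Rough sketch of the popen() technique, assuming a Unix-like shell.
// The inline "php -r" worker is a stand-in for a real worker script;
// each item must be passed as a command line argument.
$items   = array('one', 'two', 'three');
$handles = array();
foreach ($items as $item) {
    $cmd = "php -r 'echo strtoupper(\$argv[1]);' ".escapeshellarg($item);
    // 'r' means we read the worker's STDOUT; the workers now run in parallel.
    $handles[] = popen($cmd, 'r');
}

$results = array();
foreach ($handles as $handle) {
    $results[] = stream_get_contents($handle);
    pclose($handle);
}

print_r($results);
```

All the workers are started before any of their output is collected, which is what makes this parallel rather than sequential.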
With both techniques, you need an array of items you want to process in parallel, and a separate file that can process each item for you. Let’s say, for example, you want to read a bunch of files, then append some complicated data to them. Your PHP class without parallel processing might look something like the following:
<?php
/**
 * File Appender.
 */
class FileAppender
{
    /**
     * Files for appending.
     *
     * @var array
     */
    public $fileList
        = array(
            'textfile-number-1.txt',
            'textfile-number-2.txt',
            '../other-data/a-different-text-file.txt',
        );

    /**
     * Data to append.
     *
     * @var array
     */
    public $forAppending
        = array(
            'Some long and complicated data…',
            'Strange and wonderful things like snowmen ☃ and clouds ☁',
            'Chess pieces: ♚♛♜♝♞♟♔♕♖♗♘♙',
        );

    /**
     * Append all the data.
     *
     * @return array
     */
    public function appendAll()
    {
        $files   = $this->fileList;
        $strings = $this->forAppending;
        $output  = array();
        foreach ($files as $i => $file) {
            $output[] = self::appendData($file, $strings[$i]);
        }

        return $output;

    }//end appendAll()

    /**
     * Append data.
     *
     * @param string $filename The file to append data to.
     * @param string $data     The data to append.
     *
     * @return string The contents of $filename with $data appended.
     */
    public static function appendData($filename, $data)
    {
        $contents  = file_get_contents($filename);
        $contents .= $data;
        return $contents;

    }//end appendData()
}//end class
This is very simple, but can be rather slow. To run in parallel, we first create a separate PHP file to do the processing:
<?php
/**
* Append data to a file.
*
* @file append_to_file.php
*/
// Load our File Appender Class.
require_once 'FileAppender.class.php';
// We expect the filename to be passed as the first command line argument.
$filename = $argv[1];
// We read the data to append from STDIN.
$data = file_get_contents('php://stdin');
// Push the result back out to STDOUT.
echo FileAppender::appendData($filename, $data);
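It is worth testing the worker’s contract from the command line before wiring it into proc_open(). Here is a hedged sketch: the real invocation is shown in a comment, and the runnable part mimics the same filename-in-argv, data-on-STDIN, result-on-STDOUT contract with plain shell tools, so it works even without the worker script in place. The paths and strings are just examples.

```shell
# Create a scratch file to play with (the path is just an example).
printf 'original contents' > /tmp/scratch.txt

# The worker's contract: filename in argv[1], data to append on STDIN,
# result on STDOUT. With append_to_file.php in place you would run:
#   printf ' plus more' | php append_to_file.php /tmp/scratch.txt
# The same contract, exercised with plain shell tools as a stand-in:
result=$(printf ' plus more' | { cat /tmp/scratch.txt; cat; })
echo "$result"
```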
We now change our append all function to run in parallel:
/**
 * Append all in parallel.
 *
 * @return array
 */
public function appendAllParallel()
{
    $files   = $this->fileList;
    $strings = $this->forAppending;

    // Descriptor specification. This sets up our Unix pipes so that PHP can
    // pass data to the worker as if it were writing to a file. It gets
    // data back the same way, by reading from the file. The final entry tells
    // PHP to pass any errors straight through to STDERR.
    $descriptorSpec = array(
        0 => array('pipe', 'r'),
        1 => array('pipe', 'w'),
        2 => array('file', 'php://stderr', 'a'),
    );

    // Kick off the parallel processing.
    $handles = array();
    foreach ($files as $i => $file) {
        // Create the command to run, being careful to use escapeshellarg().
        $cmd = 'php append_to_file.php '.escapeshellarg($file);

        // Run the process. This will modify $pipes so that it contains file
        // handles as specified by $descriptorSpec.
        $procHandle = proc_open($cmd, $descriptorSpec, $pipes);

        // We will just assume that $procHandle was created OK. Really, you
        // should check that proc_open() does not return false.
        // Note the perspective: we write to the worker's STDIN ($pipes[0])
        // and read from its STDOUT ($pipes[1]).
        $writeFileHandle = $pipes[0];
        $readFileHandle  = $pipes[1];

        // Keep track of the read and process handles for later.
        $handles[] = array(
            'process' => $procHandle,
            'file'    => $readFileHandle,
        );

        // Pass the data to the process so it can read it from STDIN.
        fwrite($writeFileHandle, $strings[$i]);

        // Close the write handle because we don't need it any more. This
        // also signals end-of-input to the worker.
        fclose($writeFileHandle);
    }//end foreach

    // We've kicked all the processes off. Now we need to get the data back.
    $output = array();
    foreach ($handles as $handleData) {
        // Read the data back from our process and close the file handle.
        // stream_get_contents() rather than fgets(), because the data may
        // span more than one line.
        $output[] = stream_get_contents($handleData['file']);
        fclose($handleData['file']);

        // Close the process handle. Handles from proc_open() must be closed
        // with proc_close(), not pclose().
        proc_close($handleData['process']);
    }

    return $output;

}//end appendAllParallel()
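The comments above gloss over error handling. Here is a minimal, self-contained sketch of the checks you would add in real code: verify that proc_open() succeeded, and use proc_close() to collect the worker’s exit status. The inline `php -r` worker, which simply echoes its STDIN back, is a hypothetical stand-in for append_to_file.php.

```php
<?php
// Hedged sketch of the error checking mentioned in the comments above.
$descriptorSpec = array(
    0 => array('pipe', 'r'),
    1 => array('pipe', 'w'),
    2 => array('file', 'php://stderr', 'a'),
);

// Stand-in worker: echoes STDIN back to STDOUT.
$cmd        = "php -r 'echo stream_get_contents(STDIN);'";
$procHandle = proc_open($cmd, $descriptorSpec, $pipes);
if ($procHandle === false) {
    die('Failed to start worker process');
}

fwrite($pipes[0], 'round trip');
fclose($pipes[0]);

$output = stream_get_contents($pipes[1]);
fclose($pipes[1]);

// proc_close() waits for the process and returns its exit code; 0 is success.
$exitCode = proc_close($procHandle);
```

A non-zero `$exitCode` tells you the worker failed even when it produced some output, which is information the pclose()-style approach throws away.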
As you can see, this makes the code a lot more complicated. It is more work to make things run in parallel, but sometimes to make things fast, that’s what you have to do.