So, here is a first program in MPI, which I wrote about six months ago in Prof. Calvin Lin’s Parallel Systems class.

MPI (Message Passing Interface) is a standard for communication between processes that do not share memory, typically spread across multiple processors. In short, there is a root process (somewhat like the main thread in a multi-threaded application), but unlike threads, all MPI processes exist from the start rather than being spawned on demand, just as the physical processors already exist. The root process divides up the work and sends it to the other processes via messages, hence message passing.
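
Since every process exists from the start and runs the same program, even the smallest example already shows the shape of things. Here is a minimal sketch (just an illustration, not part of the class code) in which every process reports its own rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes are there in total? */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                          /* shut the runtime down cleanly */
    return 0;
}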

Each process can send and receive messages, but this calls for more control and discipline than shared-memory threading: it is trickier to get the messages in the right order and structure. But, hands-on, here is a basic example.

This is roughly how a basic application should be structured. main() initializes MPI (the MPI_Init call and the error check right after it), and then gets the total number of processes participating in the computation as well as the current process number. This is where MPI differs from multi-threading: instead of spawning threads and telling each one which function to execute, every MPI process executes the same code, so you need the process ID to divide the work. In this example, doInitialize() handles the initialization and division of work, and doWork() does the actual work. So if you were to sum an array, the initialization would read the array from a file, and doWork() would sum the portion of the data allocated to its process. The initialization can be done by any process; here I have used MY_ROOT_PROCESS to represent the root process.

#include <mpi.h>
#include <stdio.h>

#define MY_ROOT_PROCESS 0    /* rank of the root process; conventionally rank 0 */

/* placeholders for the real work, as described above */
void doInitialize();
void doWork();

int main(int argc, char** argv)
{
    int mynode, totalnodes;
    int mpiResCode = MPI_Init(&argc, &argv);
    if (mpiResCode != MPI_SUCCESS)
    {
        printf("\nFailed to initialize MPI.");
        MPI_Abort(MPI_COMM_WORLD, mpiResCode);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);   /* how many processes in total */
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);       /* which one this process is */
    for (int i = 0; i < 100; i++)
    {
        if (mynode == MY_ROOT_PROCESS)
            doInitialize();    /* the root divides up and hands out the work */
        else
            doWork();          /* the other processes work on their share */
    }
    MPI_Finalize();
    return 0;
}
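
If you want to try it out, the usual workflow (assuming an MPI implementation such as Open MPI or MPICH is installed) is to compile through the compiler wrapper, e.g. mpicxx for C++ or mpicc for C, and then launch several copies of the program with something like mpirun -np 4 ./myprogram; the program name here is just a placeholder, and the exact launcher and flags depend on your installation.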

The trick is getting the messages right. For instance, in the worker process, which is waiting for work to be allocated to it, you would have something like the following:

/* worker side: first learn how big the incoming chunk is... */
MPI_Recv(&localDataSize, 1, MPI_INT, MY_ROOT_PROCESS, MY_SIGNAL_CONTROL, MPI_COMM_WORLD, &status);
/* ...then allocate a buffer of that size and receive the actual data */
localbuff = new int[localDataSize];
MPI_Recv(localbuff, localDataSize, MPI_INT, MY_ROOT_PROCESS, MY_SIGNAL_CONTROL, MPI_COMM_WORLD, &status);
   ...
   ...
   ...
/* root side: send the size first, then the chunk itself, to worker i+1 */
MPI_Send(&numVals, 1, MPI_INT, i+1, MY_SIGNAL_CONTROL, MPI_COMM_WORLD);
MPI_Send(globalData+startIndex, numVals, MPI_INT, i+1, MY_SIGNAL_CONTROL, MPI_COMM_WORLD);
		

and in the root process, which is responsible for sending out work, you would have similar send commands (the last two lines of the snippet above). Notice that the sequence must match: the message passing is stateless and maintains no information between communications.

The receives at the top of the snippet belong to the recipient of the work: in this example it first receives the size of the work it should expect, then allocates a buffer of that size, and then receives the work itself. The sender (the root process in our example) mirrors this, sending the size first and then the data. Once the data transfer has taken place, each worker can process its data. After the processing is complete, a similar transfer has to happen in the other direction so the results can be combined. Doing that naively is straightforward; doing it with a tree-structured reduce operation is just a bit trickier.
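
As a sketch of that combine step (illustrative only: localSum and globalSum are names I am making up here, and I am assuming each process has computed its partial result in doWork(); mynode, totalnodes, MY_ROOT_PROCESS and MY_SIGNAL_CONTROL are the same as in the listing above), the naive version has every worker send its partial sum to the root, which receives and accumulates them one by one:

int localSum = 0;     /* in a real program, the partial sum computed in doWork() */
int globalSum = 0;    /* only meaningful on the root afterwards */
MPI_Status status;

if (mynode != MY_ROOT_PROCESS)
{
    /* workers: send the partial result back to the root */
    MPI_Send(&localSum, 1, MPI_INT, MY_ROOT_PROCESS, MY_SIGNAL_CONTROL, MPI_COMM_WORLD);
}
else
{
    /* root: start from its own partial result and add everyone else's */
    globalSum = localSum;
    for (int p = 0; p < totalnodes; p++)
    {
        if (p == MY_ROOT_PROCESS)
            continue;
        int partial;
        MPI_Recv(&partial, 1, MPI_INT, p, MY_SIGNAL_CONTROL, MPI_COMM_WORLD, &status);
        globalSum += partial;
    }
}

MPI also ships this pattern as a single collective call, MPI_Reduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, MY_ROOT_PROCESS, MPI_COMM_WORLD), and a good implementation will typically do the tree-structured reduction mentioned above for you.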