Producer-consumer scheme with MPI

Below is a simple implementation of the producer-consumer parallel strategy with MPI. It is a toy example with plenty of room for improvement, but it illustrates the producer-consumer model as well as the use of MPI_ANY_SOURCE, MPI_ANY_TAG, and MPI_Status.

#include <stdio.h>
#include <mpi.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>

// Producer-consumer scheme

CUDA: efficient parallel reduction

CUDA is NVIDIA's platform and API for running highly parallel software on NVIDIA GPUs. It is typically used to accelerate specific operations such as matrix multiplication, matrix decomposition, or neural network training; the functions that run on the GPU are called kernels. One common operation is a reduction: combining all elements of a long array into a single value, for example by adding them up. One simple implementation
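To make the idea of a reduction concrete, here is a minimal sequential sketch in plain C of the pairwise (tree-shaped) summation pattern that parallel reductions are commonly built on. It is not CUDA code and not the implementation from the post; the function name `tree_reduce_sum` is a hypothetical helper introduced only for illustration. Each pass of the outer loop stands for one parallel step, and the iterations of the inner loop are independent of each other, so on a GPU they could be handled by separate threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): sums n floats in place
 * using a pairwise, tree-shaped pattern. The input array is overwritten
 * with partial sums; the total ends up in data[0]. */
float tree_reduce_sum(float *data, size_t n)
{
    assert(data != NULL || n == 0);
    if (n == 0) {
        return 0.0f;
    }
    /* The stride doubles on every pass: 1, 2, 4, ...
     * so about log2(n) passes are needed. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Element i absorbs its partner at i + stride, if one exists.
         * These additions touch disjoint elements, which is what makes
         * a single pass safe to run in parallel. */
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

Calling `tree_reduce_sum` on an array holding the values 1 through 5 returns 15 after three passes (strides 1, 2, and 4). The sketch trades the n - 1 sequential additions of a plain loop for roughly log2(n) passes, which only pays off when the work inside each pass actually runs in parallel.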