QuickMP API


The QuickMP API is composed of the following C++ macros. Code samples are also given here to demonstrate basic usage.

Parallel Loop Definition

QMP_PARALLEL_FOR(indexName, start, numIter[, schedule]): Defines the beginning of a parallel for loop. The arguments are the name of the integer index variable (accessible within the loop), the starting value of the index, the number of iterations to perform, and (optionally) the schedule hint. The index counts up from the starting value. The valid schedule hints are: quickmp::SEQUENTIAL (default, better for equal-duration loop iterations; similar to OpenMP "static" schedule with default (equal) chunk size) and quickmp::INTERLEAVED (better for non-equal-duration loop iterations; similar to OpenMP "static" schedule with chunk size 1).

QMP_END_PARALLEL_FOR: Defines the end of a parallel for loop.

QMP_PARALLEL_FOR(i, 0, 10000)
  processData(i);
QMP_END_PARALLEL_FOR
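
To make the difference between the two schedule hints concrete, the following sketch computes which loop indices a given thread would run under each one. It is not QuickMP's implementation: Schedule and indicesForThread are hypothetical names, and the rule that any remainder goes to the lowest-numbered threads under the sequential schedule is an assumption made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the two schedule hints; not QuickMP's own types.
enum class Schedule { Sequential, Interleaved };

// Returns the loop indices one thread would execute, in order.
// Assumption: with Sequential, leftover iterations go to the lowest-numbered threads.
std::vector<int> indicesForThread(Schedule schedule, unsigned int threadNum,
                                  unsigned int numThreads, int start,
                                  unsigned int numIter)
{
  assert(numThreads > 0 && threadNum < numThreads);
  std::vector<int> indices;
  if (schedule == Schedule::Sequential)
  {
    // One contiguous chunk per thread.
    unsigned int base = numIter / numThreads;
    unsigned int extra = numIter % numThreads;
    unsigned int count = base + (threadNum < extra ? 1 : 0);
    unsigned int first = threadNum * base + (threadNum < extra ? threadNum : extra);
    for (unsigned int k = 0; k < count; ++k)
      indices.push_back(start + static_cast<int>(first + k));
  }
  else
  {
    // Chunk size 1: thread t takes iterations t, t + numThreads, t + 2 * numThreads, ...
    for (unsigned int k = threadNum; k < numIter; k += numThreads)
      indices.push_back(start + static_cast<int>(k));
  }
  return indices;
}
```

With 10 iterations starting at 0 and 4 threads, thread 1 gets indices 3, 4, 5 under the sequential sketch and 1, 5, 9 under the interleaved one. When iteration cost grows with the index, the interleaved pattern spreads the expensive iterations across all threads instead of leaving them to the last one.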

Thread Pool-Related

QMP_SET_NUM_THREADS(numThreads): Specifies the number of threads to use in subsequent parallel for loops. This is optional; if it is never called, the system uses one thread per processor. If used, it must be called outside any parallel for loop. It can be called any number of times, but each call destroys and recreates the internal thread pool, which might take time, so use it sparingly.

unsigned int numProcs = QMP_GET_NUM_PROCS();
QMP_SET_NUM_THREADS(numProcs / 2);
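
On a machine with a single processor, numProcs / 2 in the sample above evaluates to 0, and this page does not say how QMP_SET_NUM_THREADS treats a value of 0. A cautious caller might clamp the value first. The helper below is a hypothetical sketch, not part of QuickMP.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: half the processors, but never fewer than one thread.
unsigned int halfTheProcessors(unsigned int numProcs)
{
  unsigned int numThreads = std::max(1u, numProcs / 2);
  assert(numThreads >= 1);  // postcondition: always a usable thread count
  return numThreads;
}
```

It would be used as QMP_SET_NUM_THREADS(halfTheProcessors(QMP_GET_NUM_PROCS()));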

QMP_GET_NUM_THREADS(): Returns the number of threads currently being used. In sequential code sections this returns 1; in parallel for loops this returns the total number of threads allocated for use in parallel for loops.

unsigned int numThreadsSerial = QMP_GET_NUM_THREADS();
QMP_PARALLEL_FOR(i, 0, 10000)
  unsigned int numThreadsParallel = QMP_GET_NUM_THREADS();
  processData(i);
QMP_END_PARALLEL_FOR

QMP_GET_MAX_THREADS(): Returns the total number of threads allocated for use in all parallel for loops.

unsigned int threadPoolSize = QMP_GET_MAX_THREADS();

QMP_THREAD_NUM: The zero-based index of the current thread. This is only valid within a parallel for loop code section. Note: unlike most of the other macros, this is not used like a function call (i.e. don't put () at the end).

QMP_PARALLEL_FOR(i, 0, 10000)
  unsigned int threadNum = QMP_THREAD_NUM;
  QMP_CRITICAL(0);
  std::cout << threadNum << std::endl;
  QMP_END_CRITICAL(0);
QMP_END_PARALLEL_FOR

Miscellaneous

QMP_GET_NUM_PROCS(): Returns the number of processors in the current machine at runtime.

unsigned int numProcs = QMP_GET_NUM_PROCS();
QMP_SET_NUM_THREADS(numProcs / 2);

QMP_IN_PARALLEL(): Returns true if called within a parallel for loop and false otherwise.

void foo()
{
  if (QMP_IN_PARALLEL())
  {
    QMP_CRITICAL(0);
    std::cout << "parallel foo()" << std::endl;
    QMP_END_CRITICAL(0);
  }
  else
  {
    std::cout << "serial foo()" << std::endl;
  }
}
...
foo();
QMP_PARALLEL_FOR(i, 0, 10000)
  foo();
QMP_END_PARALLEL_FOR

Shared Data and Synchronization

QMP_CRITICAL(id): Defines the beginning of a critical section used for synchronization. This is necessary to protect shared variables which are read and written by multiple threads. The given id should be unique for each critical section within a parallel for loop. Keep the ids low to avoid allocating too many internal critical sections. Be very careful to use matching ids for the begin and end.

QMP_END_CRITICAL(id): Defines the end of a critical section used for synchronization. The given id must match the id given at the beginning of the critical section. Keep the ids low to avoid allocating too many internal critical sections. Be very careful to use matching ids for the begin and end.

int sum = 0;
QMP_SHARE(sum);
QMP_PARALLEL_FOR(i, 0, 10000)
  QMP_USE_SHARED(sum, int);
  QMP_CRITICAL(0);
  ++sum;
  QMP_END_CRITICAL(0);
QMP_END_PARALLEL_FOR
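
For readers more familiar with the standard library, the same pattern can be sketched with std::thread and std::mutex. This is only an analogy for what a critical section guarantees, not a description of how QuickMP implements it; countWithLock is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Each of numThreads workers increments a shared counter perThread times.
// The lock_guard plays the role of the critical section.
int countWithLock(unsigned int numThreads, int perThread)
{
  assert(numThreads > 0);
  int sum = 0;          // shared: read and written by every worker
  std::mutex sumMutex;  // one mutex per protected resource, like one id per section

  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([&sum, &sumMutex, perThread]()
    {
      for (int k = 0; k < perThread; ++k)
      {
        // Lock is taken here and released at the end of each iteration.
        std::lock_guard<std::mutex> lock(sumMutex);
        ++sum;
      }
    });
  }
  for (std::thread& w : workers)
    w.join();
  return sum;
}
```

Without the lock, two threads could read the same old value of sum and both write back the same new value, losing an increment. With it, countWithLock(4, 1000) always returns 4000.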

QMP_BARRIER(): Defines a barrier routine used to synchronize threads. Each thread blocks at the barrier until all threads have reached it. This can be expensive and can often be avoided by splitting one parallel for loop into two.

QMP_PARALLEL_FOR(i, 0, 10000)
  processDataStep1(i);
  QMP_BARRIER();
  processDataStep2(i);
QMP_END_PARALLEL_FOR
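
The advice to split one loop into two can be sketched with plain std::thread. The helper runParallel is hypothetical and stands in for a parallel for loop; the sketch assumes that, like this helper, a parallel loop returns only once all of its iterations are done.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for i in [0, numIter) on numThreads
// threads (interleaved) and returns only after every iteration has finished.
void runParallel(unsigned int numThreads, int numIter,
                 const std::function<void(int)>& body)
{
  assert(numThreads > 0);
  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([t, numThreads, numIter, &body]()
    {
      for (int i = static_cast<int>(t); i < numIter; i += static_cast<int>(numThreads))
        body(i);
    });
  }
  for (std::thread& w : workers)
    w.join();
}

// Step 2 for element i reads the step 1 result of its right-hand neighbor,
// so all of step 1 must be complete before any of step 2 starts.
std::vector<int> twoStep(unsigned int numThreads, int n)
{
  assert(n >= 0);
  std::vector<int> step1(n), step2(n);
  runParallel(numThreads, n, [&](int i) { step1[i] = i * i; });
  // The join at the end of the first loop does the job of the barrier.
  runParallel(numThreads, n, [&](int i) { step2[i] = step1[i] + step1[(i + 1) % n]; });
  return step2;
}
```

Because the second loop starts only after the first has finished, no thread can read a step 1 value that has not been written yet, and no explicit barrier is needed inside either loop body.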

QMP_SHARE(variableName): Exposes the given variable to any parallel for loops later in the same scope. The only argument is the variable's name. This must be called outside the loop. The variable must remain valid for as long as it is being accessed by any loop. The variable name used here must match the one given to QMP_USE_SHARED. Data members of objects must be given alone; instead of QMP_SHARE(myObject.x), make a reference int& x = myObject.x and then QMP_SHARE(x). Statically-allocated arrays must be given as pointers; for example, int myData[50] requires a pointer int* myDataPtr = myData, then QMP_SHARE(myDataPtr), not QMP_SHARE(myData).

QMP_USE_SHARED(variableName, variableType): Provides access within the parallel for loop to the given variable, which must have been exposed with QMP_SHARE before the beginning of the loop. This must be called within the loop. The arguments are the variable's name and its type. The variable name used here must match the one given to QMP_SHARE. Statically-allocated arrays must be given as pointers; for example, int myData[50] requires a pointer int* myDataPtr = myData exposed via QMP_SHARE(myDataPtr), then accessed via QMP_USE_SHARED(myDataPtr, int*).

int sharedValue = 8;
QMP_SHARE(sharedValue);
QMP_PARALLEL_FOR(i, 0, 10000)
  QMP_USE_SHARED(sharedValue, int);
  foo(sharedValue);
QMP_END_PARALLEL_FOR
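
The two workarounds described above, a reference for a data member and a pointer for a statically-allocated array, do not copy anything; both refer to the original storage. The sketch below shows this in plain C++ without any QuickMP macros. Particle, loopBody, and runWithAliases are hypothetical names, and loopBody merely stands in for the body of a parallel for loop.

```cpp
#include <cassert>

struct Particle { int x; };  // hypothetical type for illustration

// Stand-in for a loop body: it sees only the reference and the pointer.
void loopBody(int& x, int* myDataPtr, int i)
{
  x += 1;
  myDataPtr[i] = i * 10;
}

// Returns myObject.x + myData[2] after running the body for i = 0..4.
int runWithAliases()
{
  Particle myObject = {0};
  int myData[5] = {0, 0, 0, 0, 0};

  int& x = myObject.x;      // plain name that refers to the member
  int* myDataPtr = myData;  // pointer to the first array element

  for (int i = 0; i < 5; ++i)
    loopBody(x, myDataPtr, i);

  // The aliases point at the originals; nothing was copied.
  assert(&x == &myObject.x && myDataPtr == &myData[0]);
  return myObject.x + myData[2];
}
```

Here x and myDataPtr are ordinary named variables, which is the form QMP_SHARE asks for, and writes made through them land in myObject.x and myData. In a real parallel loop, concurrent writes such as x += 1 would still need a critical section.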