Batch Systems on MacOS X
Torque
Torque is an attractive batch system because using it would capitalize on LAL's existing experience with managing the grid resources through Torque. It would also allow direct integration into the existing Computing Elements (CEs) at LAL.
The latest version of Torque (2.4.6) compiles on MacOS X without too many problems. The code uses the
deprecated stat64 structure, so the flag --disable-gcc-warnings must be passed to configure. The configure
command used to build Torque was:
./configure --disable-gcc-warnings --disable-gui --prefix=/usr/local/torque --with-server-home=/var/spool/pbs --disable-drmaa
Unfortunately, when the server is actually run, the log file fills with the following error message:
03/15/2010 08:23:24;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::PBS_Server, wait_request failed
and the server is not functional. The error comes from a low-level routine within Torque that waits for data on a socket and then processes
that data with a callback function. The routine is wait_request, defined in net_server.c. Unfortunately, the actual error code
is not returned, so without debugging it isn't possible to know exactly what has gone wrong.
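To illustrate the pattern described above, the following is a minimal sketch in C of a routine that waits for data on a descriptor, hands it to a callback, and passes the underlying errno back to the caller on failure. It is not Torque's wait_request code: the names wait_for_data, ready_cb, count_bytes, and bytes_seen are hypothetical, and the use of select() is an assumption made for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: invoked with a descriptor that has data ready. */
typedef void (*ready_cb)(int fd);

/* Example callback (hypothetical): drains up to 64 bytes and counts them. */
long bytes_seen = 0;

void count_bytes(int fd)
{
    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0)
        bytes_seen += (long)n;
}

/*
 * Wait up to timeout_sec seconds for data on fd, then hand fd to cb.
 * Returns 1 if the callback ran, 0 on timeout, -1 on error.
 * On error the errno value is stored in *err_out and also logged, so the
 * caller is not left with a bare "failed" message.
 */
int wait_for_data(int fd, long timeout_sec, ready_cb cb, int *err_out)
{
    fd_set readset;
    struct timeval tv;
    int n;

    /* select() cannot handle descriptors outside [0, FD_SETSIZE). */
    if (fd < 0 || fd >= FD_SETSIZE) {
        *err_out = EBADF;
        return -1;
    }

    FD_ZERO(&readset);
    FD_SET(fd, &readset);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &readset, NULL, NULL, &tv);
    if (n < 0) {
        int saved = errno;  /* save before any call that might change it */

        fprintf(stderr, "wait_for_data failed: %s (errno %d)\n",
                strerror(saved), saved);
        *err_out = saved;
        return -1;
    }
    if (n == 0)
        return 0;           /* timeout, nothing to process */

    if (FD_ISSET(fd, &readset))
        cb(fd);
    return 1;
}
```

With the errno value available to the caller, a log line such as the one above could include the reason for the failure rather than only the name of the failing routine.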
Sun Grid Engine
Slurm
Slurm is a batch system used at many supercomputing centers and is noted for its simplicity and scalability. In the grid context, Torque wrappers exist for the Slurm command line utilities, which would make integration into the grid straightforward.
Although MacOS X is listed on the site as one of the supported platforms, Slurm uses a non-POSIX call (a GNU extension)
within the C library that does not exist on MacOS X: getgrent_r(). This call is in the file partition_mgr.c.
getgrent_r() is intended to be a reentrant (thread-safe) version of getgrent() that allows looping over the defined groups.
Because the function is missing, the code does not compile on MacOS X.
The manpages for similar C functions explicitly mark them as thread-safe. The manpage for getgrent() carries no such marking, however, so testing would be needed to determine whether it is indeed thread-safe on MacOS X. If it is, rewriting the code to use getgrent() would be fairly easy.
This problem occurs with both the production release of slurm (2.1.4) and the development release (2.2.0-0.pre2).
