The master process divides the work into sub-searches run on the slave hosts, and generates the .com files for the sub-searches. The following BatchMin procedures may be run in distributed fashion:

MCMM - Multi-minimum conformational searching.
MULT - Multi-conformer minimization.
FEAV, FESA - Free-energy perturbation.
When attempting to use the distributed BatchMin facility for the first time, users should first make certain that simple test jobs can be started up on the remote machines using rbm. To make certain of this, the following must be in place:

- Communication between remote hosts uses the "rbm" mechanism also used by MacroModel to initiate BatchMin jobs on remote hosts. This mechanism is described in the MacroModel User Manual.
- Users must be able to access the remote hosts via the rhost mechanism. The most common way of arranging this is by means of .rhosts files in the users' home directories; a sketch of such a file is shown after this list.
- The .com files must use relative, not absolute, pathnames for the names of the input and output files.
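A minimal .rhosts entry on a slave host might look as follows; the host name masterhost and the account name jsmith are illustrative assumptions:

    masterhost jsmith

This line permits user jsmith, connecting from masterhost, to start processes on the slave host without supplying a password.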
A BatchMin job is distributed simply by adding a single command, NPRC, to the command file, and by creating a single additional setup file giving the names of the hosts over which the job is to be distributed. In addition, depending on how UNIX is configured at a given site, the name of the system on which the master process is running may have to appear in the user's .rhosts file on the slave machines. The entire procedure is best understood by means of an example. The .com file shown here distributes a 5000-step MCMM search of cyclodecane over five hosts.
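The listing below is an illustrative reconstruction: the file names cdec.dat and cdec.out, the SEED value, and the exact column layout are assumptions, while the NPRC arguments request five hosts, 100-step sub-searches, 60-second monitoring and the energy-consistency check, as described in the paragraphs that follow:

    cdec.dat
    cdec.out
     NPRC       5     100      60       1
     SEED    9999
     MCMM    5000
     (remaining op-codes exactly as in the single-host search)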
With the exception of the NPRC command, this file is identical to the .com file which would be used to perform the same task on a single host.

In addition to the .com file, a file called dhostfiles.dat is needed. This file contains a list of the hosts which will be used for the slave processes. Since the NPRC command specifies distribution over five hosts, BatchMin will use the first five hosts named in the file. A dhostfiles.dat file is normally created by the user in the directory from which the job is to be run. If BatchMin does not find such a file there, it looks in the directory given by the environment variable BATCH_ROOT; however, in our experience a local file is usually preferable. A machine having N processors is listed in the file N times, allowing the program to create up to N sub-processes on it. A sample dhostfiles.dat is shown here:
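In this reconstruction the host names, the slave executable name bmin and the user-id jsmith are illustrative assumptions; the layout follows the description in the next paragraphs:

    host1  bmin.auto
    host2  bmin.auto
    host1  bmin
    host3  bmin
    host4  bmin
    host1  bmin  jsmith
    host1  bmin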
Here we assume that host1 is a four-processor machine, so it is listed four times in the file; in this run, five processors are requested, but only two processes will be assigned to host1 at any one time, because of the sequence in which the hosts are listed in the file.
The second token in each line of the dhostfiles.dat file specifies which executable is to be used on the remote machine. The remote server searches for an executable program bearing this name first in $MMOD_ROOT/run/exec and then, if not found, in $MMOD_ROOT/run/mmdat. (These directory names are actually taken from the inetd.conf entry for the bminrd process.) If the second token is not present, then bmin.auto will be used by default. In the sample file shown above, the first two entries would be found in the .../run/mmdat directory and the remaining entries would be found in .../run/exec. As demonstrated by the first two entries, the bmin.auto family of scripts may be specified. As discussed in Chapter 3, Running BatchMin, these select a BatchMin of appropriate size and/or optimization level for the molecule being simulated and the platform in use.

The third token specifies the user-id on the remote machine under whose account the job is to be run. This token, like the second, is optional. If absent, the user-id which launched the master process is assumed. Of course, appropriate rhosts permissions must be in effect to use either the default or a non-default user name.
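Each line of dhostfiles.dat thus has the general form shown below, where brackets denote optional tokens:

    hostname  [executable]  [user-id]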
If fewer hosts are listed in the file than the number requested in the NPRC command, a warning will be issued in the log file and the job will be distributed over the number of hosts listed. In most cases the name of the host running the master process can also be included in the file dhostfiles.dat. This will cause a slave BatchMin job to be run on that host, along with the master; the master, however, will be largely dormant, since little CPU time is required by BatchMin to control the sub-searches and collate the results.

The second argument of the NPRC command indicates that the MCMM job is to be broken up into sub-searches of 100 steps each. The value of this parameter is important for achieving maximum efficiency in the distributed searching procedure; see the discussion of efficiency below. For a multi-conformer minimization, rather than a conformational search, this argument represents the number of structures to be minimized by each slave process.

The third argument of the NPRC command indicates that the progress of the sub-searches will be monitored every 60 seconds. This is suitable for most jobs, but could be increased if the job is very large, in order to avoid the overhead associated with monitoring.

Note the use of a SEED op-code in the command file. The value given here will be used for generating unique seeds for the sub-searches' random-number generators.
If the fourth argument of the NPRC command is set to a non-zero value, then before initiating the search BatchMin will perform an energy calculation on the first structure in the .dat file on each host. If any significant discrepancy between the hosts is detected, the job will be terminated and a warning message printed in the log file. We strongly suggest that this facility be turned on before beginning any project with distributed procedures.

If a force-field or solvent file is present in the directory from which the distributed process is run, it will be copied to the working directory on all the remote hosts and used in each sub-search. This is meant to ensure consistency on all hosts in the event that the user is providing a modified force field or solvent parameterization; recall that local .fld and .slv files override the default files in the BATCH_ROOT directory.
While the distributed job is running, temporary files will be created. These have the form filename_@n.sfx, where sfx is a filename suffix from the set {com, dat, log, out} and n is an integer which identifies one of the sub-processes. These files will be removed by the master process once the job has successfully completed, unless DEBG 940 has been specified.
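For example, if the distributed job above were launched from cdec.com (the illustrative job name used earlier), the second sub-process would use the temporary files:

    cdec_@2.com  cdec_@2.dat  cdec_@2.log  cdec_@2.out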
In general, smaller sub-searches allow the work to be balanced more evenly over the available hosts. Reducing the size of the sub-searches too much, however, can reduce the efficiency of the overall procedure. There are two reasons for this. First, there is a fixed overhead associated with each sub-search, resulting from reading the force field and assigning the parameters for each sub-search. Second, it is usual to employ a "usage-directed" strategy in the MCMM search algorithm. In order to reduce the need for inter-process communication, each sub-process builds up usage information only from the structures it itself has generated. If each sub-process generates only a very small number of structures, insufficient information is created within a sub-process to provide meaningful usage direction.

These factors are illustrated in the following results for the 10000-step search of cycloheptadecane. First, the search distributed over 5 workstations achieved a speedup of 4.5 over that of a single processor; this represents 90% efficiency in the distribution process. Second, reducing the size of the sub-searches to 200 MCMM steps slightly reduced the efficiency of the search, both in terms of structures searched per minute and in terms of the total number of unique structures found. We recommend that each sub-search constitute at least 5% of the total search; for the 10000-step search, for example, this corresponds to sub-searches of at least 500 steps. Finally, the comparison with the MicroVAX and Cray computers shows that using the distributed procedure on even a modest number of workstations gives a two-order-of-magnitude speed-up over the "standard" single-processor platforms of a few years ago, and in fact gives performance in the supercomputer range.
Installation of Distributed BatchMin
Please refer to the MacroModel User Manual, Appendix 5.