Running libEnsemble¶
Note
You do not need the mpi communication mode to use the
MPI Executor. The communication modes described
here only refer to how the libEnsemble manager and workers communicate.
Local mode uses Python’s built-in multiprocessing module.
The comms type local and the number of workers nworkers
may be provided in libE_specs.
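For example, a sketch of the relevant libE_specs fields (the values shown are illustrative):
libE_specs = {"comms": "local", "nworkers": 4}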
Run:
python myscript.py
Or, if the script uses the parse_args function
or an Ensemble object with Ensemble(parse_args=True),
the number of workers can be specified on the command line:
python myscript.py -n N
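In that case, a minimal calling-script sketch (assuming sim_specs, gen_specs, and exit_criteria are constructed elsewhere in the script):
from libensemble import Ensemble

ensemble = Ensemble(parse_args=True)  # picks up -n/--nworkers and --comms from the command line
ensemble.sim_specs = sim_specs
ensemble.gen_specs = gen_specs
ensemble.exit_criteria = exit_criteria
ensemble.run()  # results are then retrieved per the Ensemble interface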
libEnsemble will run on one node in this scenario. To prevent
applications from being launched onto this node (for example, if
libEnsemble itself is running on a compute node),
set libE_specs["dedicated_mode"] = True.
This mode can also be used to run on a launch node of a three-tier
system (e.g., Summit), ensuring the whole compute-node allocation is available for
launching apps. Make sure there are no imports of mpi4py in your Python scripts.
Note that on macOS and Windows, the default multiprocessing start method is "spawn"
instead of "fork"; to avoid the issues this can cause, we recommend placing
calling script code in an if __name__ == "__main__": block.
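A minimal sketch of that guard (the specs are assumed to be defined above or imported):
from libensemble.libE import libE

if __name__ == "__main__":
    # Keeping the ensemble start-up under this guard lets spawned worker
    # processes re-import the script safely.
    libE_specs = {"comms": "local", "nworkers": 4}
    H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, libE_specs=libE_specs)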
Limitations of local mode
Workers cannot be distributed across nodes.
In some scenarios, any import of mpi4py will cause this to break.
Does not have the potential scaling of MPI mode, but is sufficient for most users.
MPI mode uses mpi4py for the Manager/Worker communication. It is used automatically if you run your libEnsemble calling script with an MPI runner such as:
mpirun -np N python myscript.py
where N is the number of processes. This will launch one manager and
N-1 workers.
This option requires mpi4py to be installed to interface with the MPI on your system.
It works on a standalone system, and with both
central and distributed modes of running libEnsemble on
multi-node systems.
It also potentially scales the best when running with many workers on HPC systems.
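Although this mode is selected automatically under an MPI runner, it can also be requested explicitly; a minimal sketch (the commented-out mpi_comm option is an assumption about running on a specific communicator):
from libensemble.libE import libE

libE_specs = {"comms": "mpi"}
# libE_specs["mpi_comm"] = my_sub_comm  # assumption: run libEnsemble on a given sub-communicator
H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, libE_specs=libE_specs)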
Limitations of MPI mode
If launching MPI applications from workers, then MPI is nested. This is not supported with Open MPI. This can be overcome by using a proxy launcher. This nesting does work with MPICH and its derivative MPI implementations.
It is also unsuitable to use this mode when running on the launch nodes of
three-tier systems (e.g., Summit). In that case local mode is recommended.
TCP mode runs the Manager on one system and launches workers on remote
systems or nodes over TCP. Configure through
libE_specs, or on the command line
if using an Ensemble object with
Ensemble(parse_args=True).
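For example (the hostnames are placeholders; the flags assume the parse_args command-line interface):
python myscript.py --comms tcp --workers machine1 machine2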
Reverse-ssh interface
Set comms to ssh to launch workers on remote ssh-accessible systems. This
co-locates workers, functions, and any applications. User
functions can also be persistent, unlike when launching remote functions via
Globus Compute.
The remote working directory and Python executable need to be specified. This may resemble:
python myscript.py --comms ssh --workers machine1 machine2 --worker_pwd /home/workers --worker_python /home/.conda/.../python
Limitations of TCP mode
There cannot be two calls to
Ensemble.run() or libE() in the same script.
Further Command Line Options¶
See the parse_args function in Convenience Tools for
further command line options.
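A typical calling-script usage of parse_args (a sketch of its documented return values):
from libensemble.tools import parse_args

# Reads options such as --comms and --nworkers/-n from the command line
nworkers, is_manager, libE_specs, misc_args = parse_args()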
Environment Variables¶
Environment variables required in your run environment can be set in your Python sim or gen function. For example:
os.environ["OMP_NUM_THREADS"] = "4"
Set in your sim function before the Executor submit command, this will export the setting
to your run. For running a bash script in a sub-environment when using the Executor, see
the env_script option to the MPI Executor.
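As an illustrative sketch of such a sim function (the executor retrieval, app name, and argument values here are assumptions, not a fixed recipe):
import os
from libensemble.executors.executor import Executor  # assumed import path

def run_sim(H, persis_info, sim_specs, libE_info):
    os.environ["OMP_NUM_THREADS"] = "4"   # exported to the application launched below
    exctr = Executor.executor             # assumption: executor registered in the calling script
    task = exctr.submit(app_name="my_app", num_procs=4)  # "my_app" is a hypothetical registered app
    task.wait()
    # ... assemble and return the sim output here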
Running on Multi-Node Systems¶
For running on multi-node platforms and supercomputers, there are alternative ways to configure libEnsemble for the available resources. See the Running on HPC Systems guide for more information, including examples for specific systems.