Get Started with aiida-trains-pot#

Welcome to aiida-trains-pot! This guide will walk you through setting up and running the graphene example located in examples/graphene on GitHub.

Prerequisites#

To use aiida-trains-pot, ensure you have the following prerequisites installed:

  • AiiDA

  • QuantumESPRESSO, MACE, and LAMMPS codes installed on your computing environment and configured in AiiDA

  • Access to a GPU-enabled HPC cluster with SLURM support (e.g., Leonardo cluster at CINECA)

Note: The example in this guide uses QuantumESPRESSO, MACE, and LAMMPS workflows configured with the appropriate GPU parameters.

Step 1: Setup the Environment#

Before starting, load your AiiDA profile and import the necessary dependencies. In this example, the graphene structure gr8x8.xyz and the required configuration files are located in the examples/graphene directory (referred to as script_dir in the snippets below).

from aiida.orm import load_code, load_computer, Str, Dict, List, Int, Bool, Float
from aiida import load_profile
from aiida.engine import submit
from aiida.plugins import WorkflowFactory, DataFactory
from ase.io import read
import yaml
import os
from aiida_trains_pot.utils.restart import models_from_trainingwc, models_from_aiidatrainspotwc
from aiida_trains_pot.utils.generate_config import generate_lammps_md_config

load_profile()

# Directory containing gr8x8.xyz and the YAML configuration files
script_dir = os.path.dirname(os.path.abspath(__file__))

# Workflow and dataset classes used in the snippets below; the entry-point
# names here are assumptions -- check them with `verdi plugin list aiida.workflows`
# and `verdi plugin list aiida.data`
TrainsPot = WorkflowFactory('trains_pot')
PESData   = DataFactory('pesdata')

Step 2: Define and Load the Codes#

In this example, we use Quantum ESPRESSO (QE), MACE, LAMMPS, and committee evaluation codes. Make sure these are installed and available as AiiDA codes. Examples of configuration YAML files can be found in examples/setup_codes:

QE_code                 = load_code('qe7.2-pw@leo1_scratch_bind')
MACE_train_code         = load_code('mace_train@leo1_scratch_mace')
MACE_preprocess_code    = load_code('mace_preprocess@leo1_scratch_mace')
MACE_postprocess_code   = load_code('mace_postprocess@leo1_scratch_mace')
LAMMPS_code             = load_code('lmp4mace@leo1_scratch')
EVALUATION_code         = load_code('committee_evaluation_portable')

Step 3: Set Machine Parameters#

Customize machine parameters for each code (time, nodes, GPUs, memory, etc.). Here’s an example for configuring Quantum ESPRESSO:

QE_machine = {
    'time': "00:05:00",
    'nodes': 1,
    'gpu': "1",
    'taskpn': 1,
    'cpupt': "8",
    'mem': "70GB",
    'account': "***",
    'partition': "boost_usr_prod",
    'qos': "boost_qos_dbg"
}

Repeat this process for each code (MACE, LAMMPS and committee evaluation), adapting the parameters as needed.
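For instance, a machine dictionary for the MACE training code could look like the following (the walltime, QOS, and memory values here are illustrative assumptions, not recommendations; tune them for your cluster and dataset size):

```python
# Illustrative machine settings for MACE training; all values are
# assumptions to be adapted to your cluster and dataset size
MACE_machine = {
    'time': "04:00:00",
    'nodes': 1,
    'gpu': "1",
    'taskpn': 1,
    'cpupt': "8",
    'mem': "70GB",
    'account': "***",
    'partition': "boost_usr_prod",
    'qos': "normal"
}
```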

Step 4: Load the Graphene Structure#

Load the graphene structure gr8x8.xyz:

input_structures = PESData([read(os.path.join(script_dir, 'gr8x8.xyz'))])

Step 5: Setup the TrainsPot Workflow#

The TrainsPot workflow combines several tasks. Use get_builder() to obtain the workflow builder and pass it the input structures.

Note: When passing pseudopotentials, make sure to provide one for each atomic species present in the dataset. Hence, when retrieving pseudopotentials from the SSSP library, pass the get_pseudos method a structure containing all the atomic species present in the dataset.

builder = TrainsPot.get_builder(
            abinitiolabeling_code     = QE_code,
            abinitiolabeling_protocol = 'stringent',
            pseudo_family             = 'SSSP/1.3/PBE/precision',
            md_code                   = LAMMPS_code,
            md_protocol               = 'vdw_d2',
            dataset                   = input_structures,
        )

The workflow has several steps, each of which can be enabled or disabled by setting the corresponding flag. You can also specify a maximum number of active-learning loops, thresholds on energy, forces, and stress, and a maximum number of frames to select for labelling:

builder.do_dataset_augmentation = Bool(True)
builder.do_ab_initio_labelling  = Bool(True)
builder.do_training             = Bool(True)
builder.do_exploration          = Bool(True)
builder.max_loops               = Int(1)

builder.thr_energy              = Float(2e-3)
builder.thr_forces              = Float(5e-2)
builder.thr_stress              = Float(1e-2)
builder.max_selected_frames     = Int(1000)
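Conceptually, these thresholds drive the active-learning selection: frames on which the committee of potentials disagrees by more than a threshold are sent back for ab initio labelling, capped at max_selected_frames. A minimal sketch of that idea, not the actual aiida-trains-pot implementation:

```python
# Hypothetical sketch of threshold-based frame selection: keep frames whose
# committee disagreement on forces exceeds thr_forces, up to max_selected
def select_frames(force_deviations, thr_forces=5e-2, max_selected=1000):
    selected = [i for i, dev in enumerate(force_deviations) if dev > thr_forces]
    return selected[:max_selected]

# Frames 1 and 3 exceed the 5e-2 force threshold
select_frames([0.01, 0.08, 0.03, 0.12])  # -> [1, 3]
```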

Step 6: Configure Dataset Augmentation#

Dataset augmentation starts from a few input configurations (just one graphene structure in this example) and grows the dataset by generating new configurations. The augmented dataset can contain the input structures, isolated atoms (one per atomic species present in the input structures), and distorted configurations. Various parameters for dataset augmentation can be adjusted:

builder.dataset_augmentation.do_rattle_strain_defects           = Bool(True)
builder.dataset_augmentation.do_input                           = Bool(True)
builder.dataset_augmentation.do_isolated                        = Bool(True)
builder.dataset_augmentation.do_clusters                        = Bool(True)
builder.dataset_augmentation.do_slabs                           = Bool(True)
builder.dataset_augmentation.do_replication                     = Bool(True)
builder.dataset_augmentation.do_check_vacuum                    = Bool(True)
builder.dataset_augmentation.do_substitution                    = Bool(True)

builder.dataset_augmentation.rsd.params.rattle_fraction         = Float(0.6)
builder.dataset_augmentation.rsd.params.max_sigma_strain        = Float(0.3)
builder.dataset_augmentation.rsd.params.n_configs               = Int(80)
builder.dataset_augmentation.rsd.params.frac_vacancies          = Float(0.2)
builder.dataset_augmentation.rsd.params.vacancies_per_config    = Int(1)
builder.dataset_augmentation.clusters.n_clusters                = Int(80)
builder.dataset_augmentation.clusters.max_atoms                 = Int(30)
builder.dataset_augmentation.clusters.interatomic_distance      = Float(1.5)
builder.dataset_augmentation.slabs.miller_indices               = List([[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 0, 1]])
builder.dataset_augmentation.slabs.min_thickness                = Float(10)
builder.dataset_augmentation.slabs.max_atoms                    = Int(600)
builder.dataset_augmentation.replicate.min_dist                 = Float(24)
builder.dataset_augmentation.replicate.max_atoms                = Int(600)
builder.dataset_augmentation.vacuum                             = Float(15)
builder.dataset_augmentation.substitution.switches_fraction     = Float(0.2)
builder.dataset_augmentation.substitution.structures_fraction   = Float(0.1)
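As a concept check, "rattling" perturbs atomic positions with small random displacements. A stdlib-only sketch of that step, where plain coordinate lists stand in for ASE Atoms objects (this is not the plugin's implementation):

```python
import random

def rattle(positions, sigma=0.05, seed=0):
    """Add Gaussian noise with standard deviation `sigma` (in Angstrom)
    to every Cartesian coordinate."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in pos] for pos in positions]

# Two carbon atoms at the graphene bond length
positions = [[0.0, 0.0, 0.0], [1.42, 0.0, 0.0]]
rattled = rattle(positions, sigma=0.05)
```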

Step 7: Configure Ab Initio Labelling (Quantum ESPRESSO)#

The PW parameters are already populated when the builder is created, according to the chosen pseudo_family and protocol.

Step 8: Configure MACE and LAMMPS for Training and Exploration#

MACE parameters can be written in a YAML file, as in mace_config.yml. Additional information about the MACE parameters can be found in the MACE documentation.

Note: In the latest release of MACE (v0.3.8), training on multiple GPUs can fail and stop early due to the patience criterion. To avoid this issue, when using multiple GPUs set the patience parameter to a large value (e.g., 1000).

Here we load the MACE configuration file and the preprocess and postprocess codes, and set the number of potentials in the committee:

MACE_config                                   = os.path.join(script_dir, 'mace_config.yml')
with open(MACE_config, 'r') as yaml_file:
    mace_config = yaml.safe_load(yaml_file)
builder.training.mace.train.mace_config       = Dict(mace_config)

builder.training.mace.train.code              = MACE_train_code
builder.training.mace.train.preprocess_code   = MACE_preprocess_code
builder.training.mace.train.postprocess_code  = MACE_postprocess_code
builder.training.mace.train.do_preprocess     = Bool(True)


builder.training.num_potentials = Int(3)

As with MACE, the LAMMPS simulation parameters can be loaded from a file, e.g. lammps_md_params.yml. Additional information about the LAMMPS parameters can be found in the LAMMPS documentation:

lammps_params_yaml = os.path.join(script_dir, 'lammps_md_params.yml')
with open(lammps_params_yaml, 'r') as yaml_file:
    lammps_params_list = yaml.safe_load(yaml_file)
builder.exploration.params_list = List(lammps_params_list)

Alternatively, generate_lammps_md_config can be used to generate simple LAMMPS parameters for either NVT or NPT simulations:

temperatures                     = [30, 35, 40, 45]
pressures                        = [0]
steps                            = [500]
styles                           = ["npt"]
timestep                         = 0.001
builder.exploration.params_list  = generate_lammps_md_config(temperatures, pressures, steps, styles, timestep)
builder.exploration.parameters   = Dict({'control':{'timestep': timestep,},})
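The lists above are presumably expanded combinatorially, one MD parameter set per (temperature, pressure, style) combination; this is an assumption about generate_lammps_md_config, whose exact output schema is not reproduced here. The expansion idea in plain Python:

```python
from itertools import product

temperatures = [30, 35, 40, 45]
pressures = [0]
styles = ["npt"]

# One MD run per (temperature, pressure, style) combination: 4 in this example
combos = list(product(temperatures, pressures, styles))
```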

Step 9: Setup Committee Evaluation#

Since committee evaluation uses a portable code, the computer should be explicitly set:

builder.committee_evaluation.code = EVALUATION_code
builder.committee_evaluation.metadata.computer = load_computer('leo1_scratch')

Step 10: Submit the Workflow#

Once everything is set up, submit the workflow:

calc = submit(builder)

This guide should help you get started with aiida-trains-pot! For more information on AiiDA workflows, check the AiiDA documentation.