Creating a template

This page describes how to create a template. A similar walkthrough is provided in the Tutorials, which also contain additional information on how to use Retropaths.

A template describes the essential substructure of a chemical reaction in terms of wildcards (R-groups).

A template is built from the following objects:

1. Name
2. Conditions
3. Molecule (the structure you start from)
4. Changes
5. Rules

Name

The name of a template is a simple string. It must be unique for each reaction and typically reflects the common name of the reaction.

Conditions

Conditions is a Python object that can be used to filter the templates that apply in a Reaction Pot via the Library. Conditions also contain additional information, such as references (DOIs).

The conditions are:

- Acidity
- Solvent

Temperature and light could also be used as flags, but they are currently not used.

Acidity

Acidity is an Enum class with a few simple functions, such as is_compatible_with.

from enum import Enum

class Acidity(Enum):
    acid = 1
    neutral = 2
    not_basic = 3
    basic = 4
    not_neutral = 5
    not_acid = 6
    all_pH = 7

Compatibility works as follows:

a.is_compatible_with(b)

means that a is compatible with the condition imposed by b. For example:

- if a is all_pH (any pH) and b is acid, the result is False;
- if a is acid and b is all_pH, the result is True;
- if a is not_acid and b is basic, the result is False, because not_acid also permits neutral, which basic does not.
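The three cases above are consistent with reading each Acidity value as a set of allowed pH regimes (acid, neutral, basic) and a.is_compatible_with(b) as "every regime a permits is also permitted by b". The sketch below reproduces that reading in plain Python; the _ALLOWED table and the allowed() helper are assumptions for illustration, not the Retropaths implementation.

```python
from enum import Enum


class Acidity(Enum):
    acid = 1
    neutral = 2
    not_basic = 3
    basic = 4
    not_neutral = 5
    not_acid = 6
    all_pH = 7

    def allowed(self):
        """Hypothetical helper: the set of base pH regimes this value permits."""
        return _ALLOWED[self]

    def is_compatible_with(self, other):
        # Assumed semantics: compatible when every regime permitted by self
        # is also permitted by the condition imposed by other (subset test).
        return self.allowed() <= other.allowed()


# Assumed mapping from each Acidity value to the regimes it permits.
_ALLOWED = {
    Acidity.acid: frozenset({"acid"}),
    Acidity.neutral: frozenset({"neutral"}),
    Acidity.basic: frozenset({"basic"}),
    Acidity.not_basic: frozenset({"acid", "neutral"}),
    Acidity.not_neutral: frozenset({"acid", "basic"}),
    Acidity.not_acid: frozenset({"neutral", "basic"}),
    Acidity.all_pH: frozenset({"acid", "neutral", "basic"}),
}

# The three cases from the text.
print(Acidity.all_pH.is_compatible_with(Acidity.acid))     # False
print(Acidity.acid.is_compatible_with(Acidity.all_pH))     # True
print(Acidity.not_acid.is_compatible_with(Acidity.basic))  # False
```

Modelling the values as sets makes the asymmetry explicit: a narrow condition fits inside a broad one, but not the other way round.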

Solvent

Solvent is an Enum class with the same kind of functionality as Acidity.

class Solvent(Enum):
    water = 1
    no_water = 2
    any = 3
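Read in the same subset sense as the Acidity examples above, a specific solvent requirement fits inside any, but any does not fit inside water. The sketch below is a hypothetical illustration of that reading; the _SOLVENT_ALLOWED table is an assumption, not taken from Retropaths.

```python
from enum import Enum


class Solvent(Enum):
    water = 1
    no_water = 2
    any = 3

    def is_compatible_with(self, other):
        # Assumed semantics: self is compatible when every solvent it permits
        # is also permitted by the condition imposed by other.
        return _SOLVENT_ALLOWED[self] <= _SOLVENT_ALLOWED[other]


# Assumed mapping from each Solvent value to the solvents it permits.
_SOLVENT_ALLOWED = {
    Solvent.water: frozenset({"water"}),
    Solvent.no_water: frozenset({"no_water"}),
    Solvent.any: frozenset({"water", "no_water"}),
}

print(Solvent.water.is_compatible_with(Solvent.any))  # True
print(Solvent.any.is_compatible_with(Solvent.water))  # False
```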

Conditions Example

conditions_library = {
    'temperature': 2,      # not used
    'light': False,        # not used
    'pH': Acidity.acid,
    'solvent': Solvent.water,
    'catalyst': '',        # non-functional
    'doi': ['https://en.wikipedia.org/wiki/Amadori_rearrangement'],
}

Molecule

A Molecule is a Python object that represents a molecule. Under the hood it is a networkx graph, and it interfaces with RDKit and OpenEye to translate between SMILES strings and other formats.

You can create a Retropaths Molecule from a SMILES string with the Molecule.from_smiles() function.

With a Retropaths Molecule you can, among other things:

- add molecules together to create a new molecule;
- draw the molecule in one of two modes, rdkit and d3.

The default drawing mode is rdkit.

a = Molecule.from_smiles('CCCC')
b = Molecule.from_smiles('O')
p_without_r = a + b
p_without_r.draw()

docs/Example rdkit representation
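Because a Molecule is a graph, a + b can be pictured as a disjoint union: the result contains both fragments, unconnected, with the second fragment's node indices shifted past the first's. The plain-Python sketch below models a molecule as an element list plus a bond dictionary; the ToyMolecule class is hypothetical, and the assumption that Retropaths renumbers the second operand this way should be checked by drawing with node_index=True before writing changes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMolecule:
    """Hypothetical stand-in for a Retropaths Molecule graph."""
    elements: list                             # node index -> element symbol
    bonds: dict = field(default_factory=dict)  # (i, j) -> bond order

    def __add__(self, other):
        # Disjoint union: keep self's indices, shift other's by len(self.elements).
        offset = len(self.elements)
        bonds = dict(self.bonds)
        for (i, j), order in other.bonds.items():
            bonds[(i + offset, j + offset)] = order
        return ToyMolecule(self.elements + other.elements, bonds)


# Butane heavy atoms (hydrogens omitted for brevity) plus a water oxygen.
a = ToyMolecule(["C", "C", "C", "C"], {(0, 1): 1, (1, 2): 1, (2, 3): 1})
b = ToyMolecule(["O"])
p_without_r = a + b
print(p_without_r.elements)  # ['C', 'C', 'C', 'C', 'O']
print(p_without_r.bonds)     # {(0, 1): 1, (1, 2): 1, (2, 3): 1}
```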

In d3 mode you can show node indices with node_index=True and charges with charges=True. You can also change the drawing size with size (a smaller number gives a bigger drawing).

p_without_r.draw(mode='d3',charges=True, node_index=True,size=(500,500))

In d3 mode, single bonds are drawn as one line, double bonds as two lines, triple bonds as three lines, and aromatic bonds as a single wide line. Charges are shown when charges=True is passed.

docs/Example d3 representation

You can substitute atoms in a molecule (for example, replace a hydrogen with an R group) using the substitute_group() or substitute_groups() functions.

docs/Example substitution
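Conceptually, substitution relabels chosen atoms as R-group placeholders so that they become wildcards in the template. The sketch below models a molecule as a list of element symbols indexed by node and takes (index, label) pairs; the function substitute_groups_sketch is hypothetical and only illustrates the idea, not the actual Retropaths signature.

```python
def substitute_groups_sketch(elements, substitutions):
    """Return a copy of `elements` with selected atoms replaced by R labels.

    elements:      list of element symbols, indexed by node
    substitutions: iterable of (node_index, label) pairs, e.g. [(3, 'R1')]
    """
    new_elements = list(elements)  # leave the input untouched
    for index, label in substitutions:
        if not 0 <= index < len(new_elements):
            raise IndexError(f"no atom with index {index}")
        new_elements[index] = label
    return new_elements


# Methanol with explicit hydrogens: C0, O1, H2..H4 on carbon, H5 on oxygen.
methanol = ["C", "O", "H", "H", "H", "H"]
print(substitute_groups_sketch(methanol, [(2, "R1"), (3, "R2")]))
# ['C', 'O', 'R1', 'R2', 'H', 'H']
```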

Changes

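Changes is an object created from a dictionary that must contain the keys "delete", "single", "double", "triple", "aromatic", and "charges". Each bond key maps to a list of (atom index, atom index) pairs, and "charges" maps to a list of (atom index, charge) pairs.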

changes_d = {'delete':[(0,4),(1,5)],
             'single':[(2,3),(3,4),(2,5)],
             'double':[(0,1)],
           'aromatic':[],
             'triple':[],
            'charges':[]}
changes = Changes.from_dict(changes_d)

To work out which changes to make, draw the molecule with mode='d3' and node_index=True so that you can read off the atom indices.

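The changes object can be printed. For example, for a changes object that deletes the bond (0, 1), forms a single bond (1, 2), and sets the charges of atoms 0 and 2 to -1 and +1, the output is: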

print(changes)

bonds=BondsChanges(delete=[(0, 1)], single=[(1, 2)], aromatic=[], double=[], triple=[]) charges=ChargeChanges(charges=[(0, -1), (2, 1)])
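One way to picture what a Changes object does is to apply it to a bond map keyed by atom-index pairs: delete the listed bonds, set the listed bonds to the given order, and update the listed charges. The sketch below does this with plain dictionaries; the function apply_changes, the encoding of aromatic bonds as 1.5, the order in which the lists are applied, and the reading of each charge entry as the new formal charge of that atom are all assumptions, not documented Retropaths behaviour.

```python
def apply_changes(bonds, charges, changes_d):
    """Return new (bonds, charges) after applying a changes dictionary.

    bonds:   dict mapping frozenset({i, j}) -> bond order
    charges: dict mapping atom index -> formal charge
    """
    orders = {"single": 1, "double": 2, "triple": 3, "aromatic": 1.5}
    new_bonds = dict(bonds)
    new_charges = dict(charges)
    for i, j in changes_d.get("delete", []):
        new_bonds.pop(frozenset((i, j)), None)  # break the bond if present
    for key, order in orders.items():
        for i, j in changes_d.get(key, []):
            new_bonds[frozenset((i, j))] = order  # form or re-order the bond
    for atom, charge in changes_d.get("charges", []):
        new_charges[atom] = charge  # assumed: absolute formal charge
    return new_bonds, new_charges


# Toy proton transfer matching the printed example: H1 moves from atom 0 to atom 2.
bonds = {frozenset((0, 1)): 1}
changes_d = {"delete": [(0, 1)], "single": [(1, 2)], "double": [],
             "aromatic": [], "triple": [], "charges": [(0, -1), (2, 1)]}
new_bonds, new_charges = apply_changes(bonds, {}, changes_d)
print(frozenset((1, 2)) in new_bonds)  # True
print(new_charges)                     # {0: -1, 2: 1}
```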

Rules

Rules restrict the application of a template to certain reactants and products.

We currently have three types of rules:

1. Enforce Rules
2. At least one rule
3. Avoid Formation rule

Of these, enforce rules are by far the most important.

Enforce Rules

An enforce rule is a list of Matching Groups for a specific R-group label (e.g. R1). A template that is subgraph isomorphic to a molecule can only be applied to produce a product if the value of each R group in the matched molecule satisfies its respective enforce rule.

The matching groups that compose an enforce rule can contain regular elements and A atoms, and must contain a link atom (L).

For example, in a template we can enforce that R1 must be HL, CA3L, or CA2L:

- HL: a hydrogen
- CA3L: an aliphatic carbon that is single bonded to three A atoms
- CA2L: an aromatic carbon that is aromatic bonded to two A atoms
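The sketch below checks an enforce rule in a deliberately simplified way: instead of a full subgraph-isomorphism test, each R-group value is summarised by the atom bonded to L (its element, the type of its bonds to other neighbours, and how many such neighbours it has), and the three matching groups from the example are written as the same kind of summary. AttachmentSummary, MATCHING_GROUP_SUMMARIES, and enforce_rule_satisfied are hypothetical names; atoms with mixed bond types are not modelled. The point is only the logic "every R group must satisfy at least one group in its own rule".

```python
from typing import Dict, List, NamedTuple, Optional


class AttachmentSummary(NamedTuple):
    """Hypothetical summary of the atom an R group attaches through."""
    element: str
    bond_type: Optional[str]  # type of its bonds to non-L neighbours, None if none
    n_neighbours: int         # neighbours other than the link atom L


# The three matching groups from the example, written as summaries.
MATCHING_GROUP_SUMMARIES = {
    "HL": AttachmentSummary("H", None, 0),
    "CA3L": AttachmentSummary("C", "single", 3),
    "CA2L": AttachmentSummary("C", "aromatic", 2),
}


def enforce_rule_satisfied(r_values: Dict[str, AttachmentSummary],
                           enforce_rules: Dict[str, List[str]]) -> bool:
    """True only if every R label's value matches one of its allowed groups."""
    for label, allowed_groups in enforce_rules.items():
        value = r_values[label]
        if not any(MATCHING_GROUP_SUMMARIES[g] == value for g in allowed_groups):
            return False
    return True


rules = {"R1": ["HL", "CA3L", "CA2L"]}
methyl = AttachmentSummary("C", "single", 3)    # -CH3: three H neighbours
phenyl = AttachmentSummary("C", "aromatic", 2)  # ipso carbon of -C6H5
hydroxyl = AttachmentSummary("O", "single", 1)  # -OH
print(enforce_rule_satisfied({"R1": methyl}, rules))    # True
print(enforce_rule_satisfied({"R1": phenyl}, rules))    # True
print(enforce_rule_satisfied({"R1": hydroxyl}, rules))  # False
```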

TODO: say how to create an enforce rule and what its functions are.

At least one rule

This can be used in conjunction with enforce rules to ensure that at least one R group, in a set of R groups, has a particular value. This can be useful for reactions such as nucleophilic aromatic substitution and the Diels-Alder reaction.
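A minimal sketch of the at-least-one check, assuming each R-group value has already been classified into the name of the matching group it satisfies. The classification step, the group name "NO2L", and the function at_least_one_satisfied are hypothetical; the example mimics the nucleophilic aromatic substitution case, where at least one ring position should carry an electron-withdrawing group such as a nitro group.

```python
from typing import Dict, Iterable, Set


def at_least_one_satisfied(r_values: Dict[str, str],
                           labels: Iterable[str],
                           required_groups: Set[str]) -> bool:
    """True if at least one R group among `labels` takes a required value."""
    return any(r_values[label] in required_groups for label in labels)


# Hypothetical check: at least one of R2/R3 must carry a nitro group.
values = {"R1": "HL", "R2": "HL", "R3": "NO2L"}
print(at_least_one_satisfied(values, ["R2", "R3"], {"NO2L"}))  # True
values["R3"] = "HL"
print(at_least_one_satisfied(values, ["R2", "R3"], {"NO2L"}))  # False
```

Unlike an enforce rule, which constrains every listed R group individually, this constraint is on the set as a whole, so it is naturally expressed with any().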

Avoid Formation

We can also ensure that the products produced by a template don't match a particular matching group. This can be useful for ensuring that certain high-energy products aren't formed.
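As a minimal sketch, assume an upstream subgraph-matching step has already reported which matching groups occur in each candidate product; the avoid-formation rule then discards any product containing a forbidden group. The function filter_products and the group names "OHL" and "OOL" (a peroxide-like group) are hypothetical.

```python
from typing import Dict, List, Set


def filter_products(products: Dict[str, Set[str]], avoid: Set[str]) -> List[str]:
    """Keep products that contain none of the matching groups in `avoid`.

    products: product identifier -> names of matching groups found in it
              (assumed to come from an upstream subgraph-matching step)
    """
    return [name for name, groups in products.items() if not groups & avoid]


products = {
    "product_a": {"OHL", "CA3L"},
    "product_b": {"OOL", "CA3L"},  # contains the hypothetical peroxide group
}
print(filter_products(products, avoid={"OOL"}))  # ['product_a']
```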

Matching Groups

A matching group is a molecule with a link atom, L, that we use in rules and template generators. For example, when we compute the subgraph isomorphisms of a template molecule containing R groups onto a real molecule, we first remove the R groups and obtain a mapping between the template skeleton and the real molecule. Removing the part of the real molecule that is isomorphic to the template skeleton then leaves the real values of the R groups. We also keep a mapping between each real R-group value and its R-group label in the template, e.g. R1.

Before applying the changes through the isomorphism mapping to the real molecule's skeleton, we check that each R-group value, capped with an L, is subgraph isomorphic to the matching groups it must satisfy. This check is what we call an enforce rule.
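The check described above can be sketched as a small, naive backtracking matcher over labelled graphs: each matching-group atom is mapped to a distinct atom of the capped R-group value, A atoms match any element in element_matching, L matches only L, and every matching-group bond must exist with the same order. This is a hypothetical illustration (make_graph and find_embedding are not Retropaths functions, and the algorithm Retropaths actually uses is not described here); in networkx, comparable checks are available through networkx.algorithms.isomorphism.GraphMatcher.

```python
from typing import Dict, List, Optional, Tuple

# Typical A-atom element list quoted on this page.
ELEMENT_MATCHING = ["H", "C", "N", "O", "S", "F", "Cl", "Br", "I"]


def make_graph(elements: List[str], bond_list: List[Tuple[int, int, int]]):
    """Build (elements, bonds) from a symbol list and (i, j, order) triples."""
    return dict(enumerate(elements)), {frozenset((i, j)): o for i, j, o in bond_list}


def atoms_match(pattern_el: str, target_el: str) -> bool:
    # An A atom matches any element in ELEMENT_MATCHING; L matches only L.
    if pattern_el == "A":
        return target_el in ELEMENT_MATCHING
    return pattern_el == target_el


def find_embedding(pattern, target) -> Optional[Dict[int, int]]:
    """Map each pattern atom to a distinct target atom so that labels match
    and every pattern bond exists in the target with the same order.
    Extra target atoms and bonds are allowed (non-induced matching)."""
    p_el, p_bonds = pattern
    t_el, t_bonds = target
    order = list(p_el)

    def consistent(mapping, p, t):
        if not atoms_match(p_el[p], t_el[t]):
            return False
        for q, u in mapping.items():
            wanted = p_bonds.get(frozenset((p, q)))
            if wanted is not None and t_bonds.get(frozenset((t, u))) != wanted:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        used = set(mapping.values())
        for t in t_el:
            if t not in used and consistent(mapping, p, t):
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack
        return None

    return extend({})


# CA3L: a carbon single bonded to three A atoms and to the link atom L.
CA3L = make_graph(["C", "A", "A", "A", "L"], [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])
# Methyl and hydroxyl R-group values, each capped with L.
methyl = make_graph(["L", "C", "H", "H", "H"], [(0, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 1)])
hydroxyl = make_graph(["L", "O", "H"], [(0, 1, 1), (1, 2, 1)])

print(find_embedding(CA3L, methyl))    # {0: 1, 1: 2, 2: 3, 3: 4, 4: 0}
print(find_embedding(CA3L, hydroxyl))  # None
```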

When a matching group is used in a template generator, the link atom marks where the bond between the skeleton and the matching group is created. By convention, the matching groups are always stored in the variable MG.
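A minimal sketch of that attachment step, assuming the skeleton carries a placeholder atom with a single neighbour: the placeholder and L are dropped, and the skeleton atom that was bonded to the placeholder is bonded to the group atom that was bonded to L. The function attach_matching_group, the placeholder symbol "X", the choice of L's bond order for the new bond, and the renumbering scheme are all assumptions for illustration.

```python
def attach_matching_group(skeleton, placeholder, group):
    """Replace node `placeholder` of `skeleton` with `group`, joined at its L atom.

    Both graphs are (elements, bonds): elements maps index -> symbol and
    bonds maps frozenset({i, j}) -> bond order. Assumes the placeholder and
    the L atom each have exactly one neighbour.
    """
    skel_el, skel_bonds = skeleton
    mg_el, mg_bonds = group

    # The single neighbour of the placeholder in the skeleton.
    (skel_nb,) = [next(iter(b - {placeholder})) for b in skel_bonds if placeholder in b]
    # The link atom L and its single neighbour inside the matching group.
    link = next(i for i, e in mg_el.items() if e == "L")
    (mg_nb,) = [next(iter(b - {link})) for b in mg_bonds if link in b]

    # Start from the skeleton without the placeholder.
    elements = {i: e for i, e in skel_el.items() if i != placeholder}
    bonds = {b: o for b, o in skel_bonds.items() if placeholder not in b}

    # Copy the group (minus L), renumbered after the largest skeleton index.
    offset = max(skel_el) + 1
    new_index = {}
    for i in sorted(mg_el):
        if i != link:
            new_index[i] = offset + len(new_index)
            elements[new_index[i]] = mg_el[i]
    for b, o in mg_bonds.items():
        if link not in b:
            i, j = tuple(b)
            bonds[frozenset((new_index[i], new_index[j]))] = o

    # Bond skeleton and group where the placeholder and L used to be,
    # using the order of the group's L bond (an assumption).
    bonds[frozenset((skel_nb, new_index[mg_nb]))] = mg_bonds[frozenset((link, mg_nb))]
    return elements, bonds


# Skeleton: a carbon (0) bonded to a placeholder (1); group: O([L])[R].
skeleton = ({0: "C", 1: "X"}, {frozenset((0, 1)): 1})
o_group = ({0: "O", 1: "L", 2: "R"}, {frozenset((0, 1)): 1, frozenset((0, 2)): 1})
elements, bonds = attach_matching_group(skeleton, 1, o_group)
print(elements)  # {0: 'C', 2: 'O', 3: 'R'}
```

Indices are not compacted after the placeholder is removed, which keeps the skeleton's original numbering stable for the changes that refer to it.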

You can view the matching groups by loading them into a notebook and using its draw functions:

import os

# TEMPLATE_FOLDER must point to the folder that holds the matching-group data
Matching_groups_folder = os.getenv('TEMPLATE_FOLDER')
MG = MatchingGroupsData.load(Matching_groups_folder)
MG.draw('OSO2[L]')

docs/Example matching group

A matching group can also contain special atoms including "R" groups, and "A" atoms.

A Atom

An A atom is a special node that carries an element_matching list and matches any of the elements in that list. element_matching is typically defined as [H, C, N, O, S, F, Cl, Br, I]. Matching groups that contain R groups, however, are never used in enforce rules, only in template generators.

Template Example

In this example, the template is stored in a variable named "prova", but the name is arbitrary: any valid Python variable name works. The template is saved as a pickle file with the save function.

Once the template has been created, it can be drawn with its draw function.

reaction_name = "Amadori-Rearrangement"

conditions_library = {
    'temperature': 2,
    'light': False,
    'pH': 5,         # Acidity(5) is Acidity.not_neutral
    'solvent': 3,    # Solvent(3) is Solvent.any
    'catalyst': '',
    'doi': ['https://en.wikipedia.org/wiki/Amadori_rearrangement'],
}
conditions = Conditions(**conditions_library)

a = Molecule.from_smiles('OCC=N')
p_without_r = a
p_precursor = p_without_r.substitute_groups([(6, 'R1')])

changes_d = {'delete':[(0,4),(1,5)],
             'single':[(2,3),(3,4),(2,5)],
             'double':[(0,1)],
           'aromatic':[],
             'triple':[],
            'charges':[]}
changes = Changes.from_dict(changes_d)

rules=Rules.create_default_rules_for_molecule(p_precursor, MG, allow_aromatic=True)

prova = ReactionTemplate.from_components(reaction_name,
                                         p_precursor, 
                                         changes_d, 
                                         conditions, 
                                         rules=rules, 
                                         MG=MG,
                                         side_reaction=True)
prova.save(folder=saving_folder)
prova.draw(size=(400,400),charges=True,node_index=True)

docs/Example template with indices

Template Generator

Writing every template by hand can be arduous. Fortunately, Retropaths provides some techniques that make this easier.

A chemical reaction is often described in terms of generic groups that stand for a whole family of specific groups. For example, in nucleophilic substitution we can generalize the nucleophile and the leaving group as follows:

Template Generator

The nucleophile can be one of a variety of groups, such as alcohols, thiols, and anhydrides, among others. The leaving group can be a group such as a halide, anhydride, thiol, or alcohol, among others.

Example template hierarchy

In Retropaths, we explicitly generate a separate template for each of these possibilities.

(Alternatively, we could create enforce rules for the R groups, but we prefer to keep the R-group rules simple: H, aliphatic carbon, and aromatic carbon.)

To do this efficiently, Retropaths has code that takes lists of Matching Groups to substitute. In contrast to the matching groups that are used in rules, the matching groups for template generators can contain R-groups, but cannot contain A-Atoms.
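When a template has several generic groups, the set of replacements is the Cartesian product of the candidates for each group. The sketch below builds a repl_list in the same shape as the one in the generator example that follows (a list of lists of (label, matching group) tuples), but for two labels at once. The labels and matching-group strings are illustrative placeholders rather than entries from the Retropaths library, and whether template_generator accepts several labels per entry is an assumption.

```python
from itertools import product

# Hypothetical candidate matching groups for each generic label.
candidates = {
    "Nucleophile": ["O([L])[R]", "S([L])[R]"],
    "LeavingGroup": ["Cl[L]", "Br[L]", "I[L]"],
}

labels = list(candidates)
repl_list = [list(zip(labels, combo)) for combo in product(*candidates.values())]

print(len(repl_list))  # 6
print(repl_list[0])    # [('Nucleophile', 'O([L])[R]'), ('LeavingGroup', 'Cl[L]')]
```

The number of generated templates grows multiplicatively with each generic group, which is why generating them programmatically pays off.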

Template Generator Example

reaction_name = "H-Donor-Water"
conditions_thing = {'temperature' : 3,
                    'light' : False,
                    'pH' : 7,
                    'solvent' : 1,
                    'catalyst' : '',
                    'doi' : ['https://en.wikipedia.org/wiki/Acid%E2%80%93base_reaction']}
conditions = Conditions(**conditions_thing)

a = Molecule.from_smiles('[[[H](%5BH)([H])')
b = Molecule.from_smiles('[H]')
p_without_r = a + b
p_precursor = p_without_r.substitute_groups()

changes_d = {'delete' : [(0,1)],
             'single' : [(1,4)],
             'double' : [],
           'aromatic' : [],
             'triple' : [],
            'charges' : [(0,-1),(4,1)]}
changes = Changes.from_dict(changes_d)
rules = Rules.create_default_rules_for_molecule(p_precursor, MG)
preprova = ReactionTemplate.from_components(reaction_name,
                                            p_precursor,
                                            changes_d,
                                            conditions,
                                            rules)

Acceptor = ['NR2L','NR3L','C(=O|[=O)(OL)[R](OL)%5BR)(OL)[R]][H]',
            'C=[O+][L]','C(=[O+][L])N','RCOL','RCOHL+','O([L])[R]','OR2L+','C=NL+'] 
repl_list = [[('Acceptor', x)] for x in Acceptor]

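# a holds three parallel sequences: template names, precursor molecules, and changes
a = template_generator(p_precursor, changes, repl_list, MG)

bs = ''  # collects the drawings returned with string_mode=True
for tup, name, p, new_changes in zip(repl_list,*a):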
    full_name = reaction_name + '-' + name
    rules = Rules.create_default_rules_for_molecule(p, MG,allow_aromatic=True)
    if tup[0][1] == 'O([L])[R]':
        rules.add_new_R_pattern_to_enforce_rule('R17',['CZZZL','NZZL','OZL'],MG)
    prova = ReactionTemplate.from_components(full_name,
                                            p, 
                                            new_changes.to_dict(), 
                                            conditions, 
                                            rules, 
                                            MG = MG, 
                                            spectators = [])
    prova.save(folder = saving_folder)
    bs += prova.draw(size=(300,300), node_index=True, charges=True, string_mode=True)