Input Generators
Input generators provide several capabilities for preparing text input for external programs, including syntax highlighting rules and the ability to create multiple files as part of one input (e.g., separate files for geometry and keywords).
Avogadro calls input generator scripts with several command-line arguments, and the scripts respond with JSON describing both the user interface form and the input to send to the external program.
Script Entry Points
The script must handle the following command-line arguments:
--debug
Enable extra debugging output. Used with other commands. The script is not required to support extra debugging, but it should not crash when this option is passed.
--print-options
Print the available options supported by the script, e.g. simulation parameters. See below for more details.
--generate-input
Read an option block from stdin and print input files to stdout. See below for more details.
--display-name
Print a user-friendly name for the input generator. This is used in the GUI for menu entries, window titles, etc.
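The four entry points above can be dispatched with a small argparse-based skeleton. This is a minimal sketch, not an official template: the display name "My Example Generator", the empty option set, and the placeholder "job.inp" output are hypothetical. It uses parse_known_args so that an unexpected argument does not crash the script, and it accepts --debug even though this sketch does nothing extra with it.

```python
import argparse
import json
import sys


def print_options():
    # Hypothetical minimal option set; a real script lists its userOptions here.
    return {"userOptions": {}, "highlightStyles": [], "inputMoleculeFormat": "cjson"}


def generate_input(request):
    # Hypothetical placeholder: a real generator builds files from request["options"].
    return {"files": [{"filename": "job.inp", "contents": "! placeholder\n"}]}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")  # accepted, unused here
    parser.add_argument("--print-options", action="store_true")
    parser.add_argument("--generate-input", action="store_true")
    parser.add_argument("--display-name", action="store_true")
    # Ignore unknown arguments instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)

    if args.display_name:
        print("My Example Generator")
    elif args.print_options:
        print(json.dumps(print_options()))
    elif args.generate_input:
        request = json.loads(sys.stdin.read())
        print(json.dumps(generate_input(request)))


if __name__ == "__main__":
    main()
```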
Specifying parameters with --print-options
The format of the --print-options output must be a JSON object of the following form:
{
"userOptions": {
...
},
"highlightStyles": [
{
"style": "Descriptive name",
"rules": [
{
"patterns": [ ... ],
"format": { ... }
},
...
]
},
...
],
"inputMoleculeFormat": "cjson"
}
The userOptions block contains a JSON object keyed with option names (e.g. “First option name”), which are used in the GUI to label simulation parameter settings. Various parameter types are supported.
Special Parameters
Some parameters are common to most calculation codes. If the following parameter names are found, they are handled specially when creating the GUI (e.g., the charge and spin multiplicity are placed on one line).
It is recommended to use the names below for these options to provide a consistent interface and ensure that MoleQueue job staging uses correct values where appropriate.
Option Name | Type | Description
---|---|---
“Title” | string | Input file title comment, MoleQueue job description.
“Filename Base” | string | Input file base name, e.g. “job” in “job.inp”.
“Processor Cores” | integer | Number of cores to use. Will be passed to MoleQueue.
“Calculation Type” | stringList | Type of calculation, e.g. “Single Point” or “Equilibrium Geometry”.
“Theory” | stringList | Levels of QM theory, e.g. “RHF”, “B3LYP”, “MP2”, “CCSD”, etc.
“Basis” | stringList | Available basis sets, e.g. “STO-3G”, “6-31G**”, etc.
“Charge” | integer | Charge on the system.
“Multiplicity” | integer | Spin multiplicity of the system.
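A userOptions block using the special names above might be assembled as follows. This is a sketch under an explicit assumption: the per-option fields "type", "values", and "default" (with "default" as an index into "values" for stringList options) are not defined in this section, so check them against an existing generator script before relying on them. The option names and the types column come from the table.

```python
import json

# Assumption: each option is an object with "type", optional "values",
# and "default" fields. Only the option names and types come from the table.
user_options = {
    "Title": {"type": "string", "default": ""},
    "Filename Base": {"type": "string", "default": "job"},
    "Processor Cores": {"type": "integer", "default": 1},
    "Calculation Type": {"type": "stringList",
                         "values": ["Single Point", "Equilibrium Geometry"],
                         "default": 0},
    "Theory": {"type": "stringList", "values": ["RHF", "B3LYP", "MP2"], "default": 1},
    "Basis": {"type": "stringList", "values": ["STO-3G", "6-31G**"], "default": 0},
    "Charge": {"type": "integer", "default": 0},
    "Multiplicity": {"type": "integer", "default": 1},
}


def print_options():
    """Top-level object printed for --print-options."""
    return {
        "userOptions": user_options,
        "highlightStyles": [],
        "inputMoleculeFormat": "cjson",
    }


print(json.dumps(print_options(), indent=2))
```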
Syntax Highlighting
Rules for syntax highlighting can be specified as a collection of regular expressions, wildcard patterns, or exact strings, together with text format specifications, in the “highlightStyles” array. Each entry of this array names a style and holds its list of rules:
"highlightStyles": [
{
"style": "Style 1",
"rules": [ (list of highlight rules, see below) ]
},
{
"style": "Style 2",
"rules": [ (list of highlight rules, see below) ]
},
...
],
Each style name must be unique; it is used to associate a set of highlighting rules with particular output files. See the --generate-input documentation for more details.
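A script can build the highlightStyles array programmatically and check that style names do not repeat. This is an illustrative sketch: the style name "Example Style" and its two rules are hypothetical, and the preset names are the ones listed later in this section.

```python
import json


def make_style(name, rules):
    """Build one entry of the "highlightStyles" array."""
    return {"style": name, "rules": rules}


# Hypothetical style for a keyword-driven input format.
example_style = make_style("Example Style", [
    {"patterns": [{"regexp": r"^\s*\$[A-Za-z]+"}], "format": {"preset": "keyword"}},
    {"patterns": [{"regexp": r"^!.*$"}], "format": {"preset": "comment"}},
])

highlight_styles = [example_style]

# Files refer to styles by name, so names should not repeat.
names = [s["style"] for s in highlight_styles]
assert len(names) == len(set(names)), "duplicate highlight style name"

print(json.dumps({"highlightStyles": highlight_styles}, indent=2))
```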
The general form of a highlight rule is:
{
"patterns": [
{ "regexp": "^Some regexp?$" },
{ "wildcard": "A * wildcard expression" },
{ "string": "An exact string to match.",
"caseSensitive": false
},
...
],
"format": {
"preset": "<preset name>"
}
}
or,
{
"patterns": [
...
],
"format": {
"foreground": [ 255, 128, 64 ],
"background": [ 0, 128, 128 ],
"attributes": ["bold", "italic", "underline"],
"family": "serif"
}
}
The patterns array contains a collection of fixed strings, wildcard expressions, and regular expressions (using the QRegExp syntax flavor; see the QRegExp documentation) that identify the strings to be formatted. Each pattern object must contain one of the following members:
regexp: A QRegExp-style regular expression. If no capture groups (“(…)”) are defined, the entire match is formatted. If one or more capture groups are defined, only the captured text is formatted.
wildcard: A wildcard expression.
string: An exact string to match.
Any pattern object may also set a boolean caseSensitive member to indicate whether the match should consider character case. If omitted, a case-sensitive match is assumed.
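The matching rules above (whole match versus capture groups, and the caseSensitive flag) can be illustrated in Python. This is an approximation, not Avogadro's code: Python's re module stands in for QRegExp, whose syntax differs in places, re.MULTILINE approximates line-by-line highlighting, and "wildcard" patterns are left out of this sketch.

```python
import re


def pattern_spans(text, pattern):
    """Return the (start, end) spans a single pattern object would format."""
    flags = 0 if pattern.get("caseSensitive", True) else re.IGNORECASE
    if "regexp" in pattern:
        # MULTILINE lets ^ and $ match at each line, like per-line highlighting.
        regex = re.compile(pattern["regexp"], flags | re.MULTILINE)
    elif "string" in pattern:
        regex = re.compile(re.escape(pattern["string"]), flags)
    else:
        raise ValueError("only 'regexp' and 'string' are handled in this sketch")

    spans = []
    for match in regex.finditer(text):
        if regex.groups == 0:
            spans.append(match.span())  # no capture groups: whole match
        else:
            for i in range(1, regex.groups + 1):
                if match.group(i) is not None:
                    spans.append(match.span(i))  # only captured text
    return spans


text = "basis 6-31G\nBASIS sto-3g"
print(pattern_spans(text, {"string": "basis", "caseSensitive": False}))  # [(0, 5), (12, 17)]
print(pattern_spans(text, {"regexp": r"^basis\s+(\S+)"}))  # [(6, 11)]
```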
The preferred form of the format member is simply a specification of a preset format. This allows for consistent color schemes across input generators. The recognized presets are:
“title”: A human readable title string.
“keyword”: Directives defined by the target input format specification to have special meaning, such as tags indicating where coordinates are to be found.
“property”: A property of the simulation, such as level of theory, basis set, minimization method, etc.
“literal”: A numeric literal (i.e. a raw number, such as a coordinate).
“comment”: Sections of the input that are ignored by the simulation code.
If advanced formatting is desired, the second form of the format member allows fine-tuning of the font properties:
foreground: Color as an RGB tuple, with each component ranging from 0 to 255.
background: Color as an RGB tuple, with each component ranging from 0 to 255.
attributes: Array of font attributes; valid strings are “bold”, “italic”, and “underline”.
family: Font family. Valid values are “serif”, “sans”, and “mono”.
Any of the font property members may be omitted, in which case the default QTextCharFormat settings are used.
The input generator extension applies the rules in the order they appear, so later rules override the formatting of earlier rules when they conflict.
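The ordering behavior can be pictured as painting a per-character format array, where each rule overwrites whatever an earlier rule chose. This is a simplified sketch, not Avogadro's implementation: it uses Python's re in place of QRegExp, handles only "regexp" patterns with "preset" formats, and ignores capture groups for brevity.

```python
import re


def apply_rules(text, rules):
    """Return a per-character list of preset names (None means unformatted)."""
    formats = [None] * len(text)
    for rule in rules:  # rules are applied in list order
        preset = rule["format"].get("preset")
        for pattern in rule["patterns"]:
            regex = re.compile(pattern["regexp"], re.MULTILINE)
            for match in regex.finditer(text):
                start, end = match.span()
                for i in range(start, end):
                    formats[i] = preset  # a later rule overwrites an earlier one
    return formats


rules = [
    {"patterns": [{"regexp": r"[0-9.]+"}], "format": {"preset": "literal"}},
    {"patterns": [{"regexp": r"!.*$"}], "format": {"preset": "comment"}},
]
fmt = apply_rules("1.5 ! 2.0", rules)
print(fmt[0], fmt[6])  # literal comment
```

Here the number after the “!” is first marked as a literal and then overwritten by the comment rule, because the comment rule comes later in the list.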
Requesting Full Structure of Current Molecule
The inputMoleculeFormat member is optional and can be used to request a representation of the current molecule’s geometry when --generate-input is called. Its value indicates the format in which the script expects the molecule. If this value is omitted, no representation of the structure will be provided.
Note: Currently valid options for inputMoleculeFormat are “cjson” for Chemical JSON and “cml” for Chemical Markup Language.
Handling User Selections: --generate-input
When --generate-input is passed, the information needed to generate the input file is written to the script’s standard input as a JSON string of the following form:
{
"cjson": {...},
"options": {
"First option name": "Value 2",
"Second option name": "Value 1",
...
}
}
The cjson entry contains a Chemical JSON representation of the molecule if inputMoleculeFormat is set to “cjson” in the --print-options output. Similarly, a cml entry containing a CML string is present if a Chemical Markup Language representation was requested. The molecule entry is omitted entirely if inputMoleculeFormat is not set.
The options block contains key/value pairs for each of the options specified in the userOptions block of the --print-options output.
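Reading this request in a script takes only a few lines. This is a sketch: read_request and atom_list are hypothetical helper names, and the Chemical JSON layout used in atom_list (atomic numbers in atoms.elements.number, a flat x, y, z list in atoms.coords["3d"]) is an assumption about the cjson payload that is not described in this section.

```python
import json
import sys


def read_request(stream=None):
    """Parse the JSON request written to stdin for --generate-input."""
    request = json.load(stream if stream is not None else sys.stdin)
    options = request.get("options", {})
    # Present only if inputMoleculeFormat was "cjson" in --print-options.
    molecule = request.get("cjson")
    return options, molecule


def atom_list(cjson):
    """Return (atomic_number, [x, y, z]) pairs from a Chemical JSON molecule.

    Assumption: atomic numbers are in atoms.elements.number and Cartesian
    coordinates are a flat x, y, z list in atoms.coords["3d"].
    """
    atoms = cjson["atoms"]
    numbers = atoms["elements"]["number"]
    flat = atoms["coords"]["3d"]
    return [(z, flat[3 * i:3 * i + 3]) for i, z in enumerate(numbers)]
```

In a real script, read_request() would be called with no argument so that it reads the script's actual standard input.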
If the script is called with --generate-input, it must write a JSON string to standard output with the following format:
{
"files": [
{
"filename": "file1.ext",
"contents": "...",
"highlightStyles": [ ... ]
},
{
"filename": "file2.ext",
"filePath": "/path/to/file/on/local/filesystem"
},
...
],
"warnings": ["First warning.", "Second warning.", ... ],
"mainFile": "file2.ext"
}
The files block is an array of objects that define the actual input files. The filename member provides the name of the file, and either contents or filePath provides the text that goes into the file. The contents string is used directly as the file contents, whereas filePath should contain an absolute path to a file on the local filesystem whose contents are read and used as the input file contents.
The optional highlightStyles member is an array of strings naming the highlight styles to apply to the file (see the --print-options documentation). Each string in this array must match the “style” name of an entry in the highlightStyles array of the --print-options output. Zero or more highlighting styles may be applied to any file.
The order of the files in the GUI will match the order of the files in the array, and the first file will be displayed first.
The warnings member provides an array of strings that describe non-fatal warnings to be shown to the user. This is useful for describing the resolution of conflicting options, e.g. “Ignoring basis set for semi-empirical calculation.” This member is optional and should be omitted if no warnings are present.
The mainFile member points to the primary input file for a calculation. This is the file that will be used as a command-line argument when executing the simulation code (if applicable), and it is used by MoleQueue to set the $$inputFileName$$ and $$inputFileBaseName$$ input template keywords. This member is optional; if present, the filename must exist in the files array. If it is absent and only one file is specified in files, that single input file is used. Otherwise, the main file is left unspecified.
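Putting the output rules together, a script might assemble its response like this. The sketch is hypothetical: the “.inp” extension and the “Example Style” name are illustrative, and only a single file using contents (not filePath) is produced. It includes warnings only when there are some and points mainFile at an entry that exists in files.

```python
import json


def build_output(contents, base="job", warnings=None):
    """Assemble the JSON object printed to stdout for --generate-input."""
    filename = base + ".inp"
    output = {
        "files": [
            # "Example Style" is hypothetical; it must match a "style" name
            # from the --print-options output.
            {"filename": filename, "contents": contents,
             "highlightStyles": ["Example Style"]},
        ],
        # If present, mainFile must name one of the entries in "files".
        "mainFile": filename,
    }
    # "warnings" is optional and should be omitted when there are none.
    if warnings:
        output["warnings"] = list(warnings)
    return output


print(json.dumps(build_output("! single point\n")))
```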
Automatic Generation of Geometry
The generation of molecular geometry descriptions may be skipped in the script and deferred to the InputGenerator class by using a special keyword. The “contents” string may contain a keyword of the form
$$coords:[coordSpec]$$
where [coordSpec] is a sequence of characters that indicate the information needed about each atom in the coordinate block.
Other keywords that can be used in the input files are:
$$atomCount$$: Number of atoms in the molecule.
$$bondCount$$: Number of bonds in the molecule.
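Because these keywords are expanded by the InputGenerator class, a script can emit them literally in its contents string. The template below is a hypothetical example for an imagined input format (the “* xyz” line is illustrative, not a format defined here). It shows a coordinate block requested as $$coords:Sxyz$$ alongside $$atomCount$$.

```python
def make_contents(title, charge, multiplicity):
    """Build a contents string; the $$...$$ keywords are left for Avogadro to expand."""
    return (
        f"! {title}\n"
        "! atoms: $$atomCount$$\n"
        f"* xyz {charge} {multiplicity}\n"
        "$$coords:Sxyz$$\n"  # element symbol followed by x, y, z per atom
        "*\n"
    )


print(make_contents("water", 0, 1))
```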
Coordinate Blocks
The characters in the specification string indicate the information needed about each atom in the coordinate block.
#: Atom index (one-based)
Z: Atomic number (e.g. “6” for carbon)
G: GAMESS-style atomic number (e.g. “6.0” for carbon)
S: Element symbol (e.g. “C” for carbon)
N: Element name (e.g. “Carbon”)
x: X Cartesian coordinate
y: Y Cartesian coordinate
z: Z Cartesian coordinate
a: ‘a’ lattice coordinate (unit cell required)
b: ‘b’ lattice coordinate (unit cell required)
c: ‘c’ lattice coordinate (unit cell required)
0: A literal “0”. Useful for optimization flags.
1: A literal “1”. Useful for optimization flags.
_: A space character. Useful for alignment.
For example, the specification string
__SZxyz110
will be replaced by a molecule-specific block of text similar to the following:
C 6 1.126214 0.765886 0.000000 1 1 0
C 6 0.819345 -0.564955 0.000000 1 1 0
C 6 -0.598383 -0.795127 0.000000 1 1 0
C 6 -1.310706 0.370165 0.000000 1 1 0
S 16 -0.285330 1.757144 0.000000 1 1 0
H 1 2.130424 1.185837 0.000000 1 1 0
H 1 1.548377 -1.375303 0.000000 1 1 0
H 1 -1.033768 -1.794407 0.000000 1 1 0
H 1 -2.396173 0.450760 0.000000 1 1 0
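To make the character codes concrete, here is a rough Python re-creation of how a coordSpec maps to one line per atom. It is not Avogadro's implementation: it separates fields with single spaces instead of Avogadro's column alignment, uses a hypothetical three-element lookup table, and does not support the lattice characters a, b, and c.

```python
# Hypothetical element table covering only the elements used in the example.
ELEMENTS = {1: ("H", "Hydrogen"), 6: ("C", "Carbon"), 16: ("S", "Sulfur")}


def expand_coords(spec, atoms):
    """Expand a coordSpec for a list of (atomic_number, x, y, z) tuples."""
    lines = []
    for index, (number, cx, cy, cz) in enumerate(atoms, start=1):
        coords = {"x": cx, "y": cy, "z": cz}
        line = ""
        for ch in spec:
            if ch == "_":
                line += " "  # explicit alignment space
                continue
            if ch == "#":
                token = str(index)
            elif ch == "Z":
                token = str(number)
            elif ch == "G":
                token = f"{float(number):.1f}"
            elif ch == "S":
                token = ELEMENTS[number][0]
            elif ch == "N":
                token = ELEMENTS[number][1]
            elif ch in coords:
                token = f"{coords[ch]:.6f}"
            elif ch in "01":
                token = ch
            else:
                raise ValueError(f"unsupported coordSpec character: {ch!r}")
            # Separate consecutive fields with a single space.
            if line and not line.endswith(" "):
                line += " "
            line += token
        lines.append(line)
    return "\n".join(lines)


print(expand_coords("__SZxyz110", [(6, 1.126214, 0.765886, 0.0)]))
```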
Error Handling
In general, these scripts should be written robustly so that they will not fail under normal circumstances. However, if for some reason an error occurs that must be reported to the user, simply write the error message to standard output as plain text (i.e. not JSON), and it will be shown to the user.
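One way to follow this advice is to wrap the generation step and fall back to a plain-text message. In this sketch, generate and respond are hypothetical names, and the multiplicity check is only an example of a user-facing error. A script would print respond(sys.stdin.read()) when called with --generate-input.

```python
import json


def generate(options):
    # Hypothetical validation that a generator might report to the user.
    if options.get("Multiplicity", 1) < 1:
        raise ValueError("Multiplicity must be at least 1.")
    return {"files": [{"filename": "job.inp", "contents": "! placeholder\n"}]}


def respond(request_text):
    """Return the text to print for --generate-input: JSON, or plain text on error."""
    try:
        request = json.loads(request_text)  # JSONDecodeError is a ValueError
        return json.dumps(generate(request.get("options", {})))
    except (ValueError, KeyError) as err:
        # Plain text (not JSON) is shown to the user as an error message.
        return f"Error: {err}"
```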
Debugging
Debugging may be enabled by defining AVO_PYTHON_SCRIPT_DEBUG in the process’s environment. This causes the --debug option to be passed in all calls to generator scripts, and prints extra information to the qDebug() stream from within Avogadro. The script is free to handle the debug flag as the author wishes.