Libraries
Libraries are reusable building blocks for building parsers.
Libraries contain rules, entities and other logic that can be reused across applications. They are built in, defined by users, or configured dynamically at runtime. Libraries make it possible not only to reuse declarative grammars but also to introduce algorithmic processes and procedures whose internal operation is opaque to the application. There are five types of grammar library:
- User defined grammar libraries as described in Grammars
- Internal grammar libraries built into NETS
- External grammar libraries supplied as a DLL or shared library
- System context grammars, which include runtime parameters and environment variables
- The system command grammar, which defines common operating system commands
Internal Libraries
Internal grammar libraries are built into NETS and cannot be changed. They provide features like character classification and common encoding/decoding algorithms.
Library Name | Description |
---|---|
Character Type Library | The nets-ctype grammar library defines common character classification rules and entity references. Variants are provided for multi-byte and wide character strings. |
Utility Library | The nets-utility grammar library defines rules for echoing input to output, progress monitoring, sending output to the null stream, line counting and hexdumps. |
Encoding Library | The nets-encoding grammar library defines common encoding and decoding rules. These include asciitowchar, wchartoascii, base64encode and base64decode, to name a few. |
XML Library | The nets-XML grammar library defines common XML entity references for XML-derived grammars. This package also includes XML encoding. |
External Libraries
Executable external libraries provide a way of extending NETS. The 'iconv' library is included with NETS. Users can develop their own external grammar libraries (see the Code section of this guide for more details). NETS searches the -grammar_libpath path, or the current directory if no -grammar_libpath is specified, for DLLs and shared libraries that conform to the NETS grammar library interface specification. Any that are found are loaded automatically, and their rules and entities are made available to the NETS runtime.
Library Name | Description |
---|---|
Iconv Library | The nets-iconv grammar library supports conversion of strings between any combination of source and target character encodings using the POSIX standard 'iconv' function. The Iconv Library is an external grammar library. |
Encoder-Decoder Library | An extensible package of utilities. |
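The library search described above can be pictured with a short Python sketch. It is illustrative only and is not how NETS is implemented: the function name `find_library_candidates`, the set of file extensions, and the treatment of -grammar_libpath as a single directory are all assumptions, and the sketch makes no attempt to check that a file conforms to the NETS grammar library interface.

```python
from pathlib import Path

# File extensions treated as loadable libraries (an assumption for this sketch).
LIBRARY_SUFFIXES = {".dll", ".so", ".dylib"}


def find_library_candidates(grammar_libpath=None):
    """Return candidate library files in one directory, sorted by path.

    Falls back to the current directory when no path is given, mirroring
    the documented behaviour when -grammar_libpath is not specified.
    """
    root = Path(grammar_libpath) if grammar_libpath else Path.cwd()
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in LIBRARY_SUFFIXES
    )
```

A real loader would then open each candidate and verify its exported interface before registering its rules and entities.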
System Runtime Grammars
In addition to the built-in and external grammars, other grammars are dynamic and defined at runtime.
Library Name | Description |
---|---|
Context Grammar | The context grammar defines entities derived from command line and environment parameters. It includes entities like 'grammar', 'input' and 'output'. |
Command Grammar | The command grammar defines operating system commands that can be referenced within a grammar. |
To view the full list of installed grammars and their rules and entities, run NETS with the -printlibrary and -grammar_xml=grammar-xml parameters. This generates an XML file containing all the user defined grammars and rules in the current execution, including the context grammar.
>nets-parser -loglevel=3 -grammar=default.g -grammar_xml=default.xml -printlibrary -input=default.in -output=default.out
Context Grammars
The context grammar is generated at runtime from the NETS command line parameters and system environment variables. The -loadenv command line parameter forces all environment variables to be loaded into the context grammar. A typical example of the context grammar is shown below. The -grammar_xml command line parameter can be used to report on the context for any run of the parser. Treating command parameters and environment variables in this way enables them to be referenced easily within the grammar through Entity References. The locale (shown below) is defined when NETS runs and is taken from the current system locale.
# Command Line
>nets-parser -loglevel=3 -grammar=default.g -input=default.in -output=default.out -grammar_xml=grammar.xml -error=stderr
<!-- grammar.xml -->
<grammar id="context">
<entity id="ini">default.ini</entity>
<entity id="error">pipe:stderr</entity>
<entity id="grammar_start">start</entity>
<entity id="loglevel">3</entity>
<entity id="input">default.in</entity>
<entity id="output">default.out</entity>
<entity id="grammar">default.g</entity>
<entity id="grammar_xml">grammar.xml</entity>
<entity id="locale">English_United States.1252</entity>
</grammar>
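To make the mapping from parameters to entities concrete, here is a minimal Python sketch that turns `-name=value` arguments (and, optionally, environment variables) into a `<grammar id="context">` element. It is an illustration, not the NETS implementation: the function name `build_context_grammar` is hypothetical, the rule that command line parameters override environment variables of the same name is an assumption, and the sketch does not reproduce NETS defaults or value rewriting (for example the `ini` entity or the `pipe:stderr` form shown above).

```python
import xml.etree.ElementTree as ET


def build_context_grammar(argv, environ=None):
    """Build a <grammar id="context"> element from '-name=value' arguments.

    environ, when given, is merged in first so that command line
    parameters win on a name clash (an assumption, not documented behaviour).
    """
    entities = dict(environ or {})
    for arg in argv:
        if arg.startswith("-") and "=" in arg:
            name, value = arg[1:].split("=", 1)
            entities[name] = value
    grammar = ET.Element("grammar", {"id": "context"})
    for name, value in entities.items():
        entity = ET.SubElement(grammar, "entity", {"id": name})
        entity.text = value
    return grammar


argv = ["-loglevel=3", "-grammar=default.g", "-input=default.in", "-output=default.out"]
print(ET.tostring(build_context_grammar(argv), encoding="unicode"))
```

Passing `os.environ` as the second argument would approximate the effect described for -loadenv.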
Command Grammars
The command grammar is loaded at runtime from the NETS 'command.g' grammar. It includes the 'command' rule, which is part of the built-in library. NETS searches the -grammar_libpath path for the command.g grammar file. In some installations (such as cloud environments) -grammar_libpath is not available and so cannot be overridden, so only a fixed set of commands is made available to users.
<!-- GXML -->
<grammar id="nets_command">
<rule id="cp">
<ruleref idref="command" name="copy"/>
</rule>
<rule id="rm">
<ruleref idref="command" name="del"/>
</rule>
<rule id="command"/>
</grammar>
Name | Description |
---|---|
rule command |
input none
output none
attributes name, parameters
Executes the command identified by name in 'command.g' with the parameters identified by the 'parameters' Attribute. (* NEBNF *)
|
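One way to read the GXML above is as a table of aliases: each rule forwards to the 'command' rule with a `name` attribute. The Python sketch below extracts that table from the same GXML using the standard library's ElementTree. It only illustrates the structure of the grammar; the function name `command_aliases` is hypothetical, and nothing here executes a command or reflects how NETS resolves rule references internally.

```python
import xml.etree.ElementTree as ET

GXML = """
<grammar id="nets_command">
  <rule id="cp"><ruleref idref="command" name="copy"/></rule>
  <rule id="rm"><ruleref idref="command" name="del"/></rule>
  <rule id="command"/>
</grammar>
"""


def command_aliases(gxml_text):
    """Map each rule id to the command name its ruleref passes to 'command'."""
    aliases = {}
    for rule in ET.fromstring(gxml_text).findall("rule"):
        ref = rule.find("ruleref[@idref='command']")
        if ref is not None:
            aliases[rule.get("id")] = ref.get("name")
    return aliases


print(command_aliases(GXML))
```

The 'command' rule itself has no ruleref child, so it does not appear in the resulting mapping.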
Core Attributes
Core attributes are the set of attributes used by all rules and include id, input, output, encoding, echo, predicate, onbefore, onafter, onerror, loginfo and logmessage. They are denoted by 'coreattrs' in the library definitions that follow.
Character Type Library
The nets-ctype grammar library defines common character classifications and entity references. This library also supports custom character ranges defined by a minimum and a maximum value. Variants are provided for multi-byte and wide character strings.
Name | Description |
---|---|
rule alnum |
input single multi-byte character in the current locale
output none
attributes coreattrs
A union of the alpha and digit character classifications |
rule alpha |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any alphabetic character in the current locale |
rule ascii |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any ASCII 7 bit character in the current locale |
rule char |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any character in the current locale |
rule cntrl |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any control character in the current locale |
rule digit |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any decimal digit in the current locale |
rule lower |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any lower case letter in the current locale |
rule print |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any printable character in the current locale |
rule punct |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any punctuation character in the current locale |
rule space |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any space character in the current locale |
rule upper |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any upper case character in the current locale |
rule xdigit |
input single multi-byte character in the current locale
output none
attributes coreattrs
Any hexadecimal digit in the current locale |
rule range |
input single multi-byte character in the current locale
output none
attributes coreattrs, min, max
Any character within the min and max range defined in the current locale |
rule walnum |
input single wide character in the current locale
output none
attributes coreattrs
A union of the walpha and wdigit character classifications |
rule walpha |
input single wide character in the current locale
output none
attributes coreattrs
Any alphabetic wide character in the current locale |
rule wascii |
input single wide character in the current locale
output none
attributes coreattrs
Any ASCII 7 bit wide character in the current locale |
rule wchar |
input single wide character in the current locale
output none
attributes coreattrs
Any multibyte wide character in the current locale |
rule wcntrl |
input single wide character in the current locale
output none
attributes coreattrs
Any control wide character in the current locale |
rule wdigit |
input single wide character in the current locale
output none
attributes coreattrs
Any decimal wide digit in the current locale |
rule wlower |
input single wide character in the current locale
output none
attributes coreattrs
Any lower case wide character in the current locale |
rule wprint |
input single wide character in the current locale
output none
attributes coreattrs
Any printable wide character in the current locale |
rule wpunct |
input single wide character in the current locale
output none
attributes coreattrs
Any punctuation wide character in the current locale |
rule wspace |
input single wide character in the current locale
output none
attributes coreattrs
Any space wide character in the current locale |
rule wupper |
input single wide character in the current locale
output none
attributes coreattrs
Any upper case wide character in the current locale |
rule wxdigit |
input single wide character in the current locale
output none
attributes coreattrs
Any hexadecimal wide digit in the current locale |
rule range |
input single wide character in the current locale
output none
attributes coreattrs, min, max
Any wide character within the min and max range defined in the current locale |
entity \a |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x07 beep |
entity \b |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x08 backspace |
entity \t |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x09 horizontal tab |
entity \n |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0A newline |
entity \v |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0B vertical tab |
entity \f |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0C form feed |
entity \r |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0D carriage return |
entity \" |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x22 double quote |
entity \' |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x27 single quote |
entity \? |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x3F question mark |
entity \\ |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x5C backslash |
rule \xhh |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Any character code hh defined in hexadecimal |
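The classifications in this table follow the familiar C `ctype` model. The Python sketch below shows the idea behind three of them ('range', 'xdigit' and 'alnum') and records the escape entities with their character codes. It is a rough analogy under stated assumptions: Python's `str` methods classify by Unicode property rather than by the current locale, the 'range' check is assumed to be inclusive at both ends, and all function and variable names are hypothetical.

```python
def in_range(ch, minimum, maximum):
    """Sketch of 'range': True when minimum <= ch <= maximum by code point."""
    return ord(minimum) <= ord(ch) <= ord(maximum)


def is_xdigit(ch):
    """Sketch of 'xdigit': a hexadecimal digit, built from three ranges."""
    return in_range(ch, "0", "9") or in_range(ch, "a", "f") or in_range(ch, "A", "F")


def is_alnum(ch):
    """Sketch of 'alnum' as the union of the alpha and digit classes."""
    return ch.isalpha() or ch.isdigit()


# Escape entities from the table, keyed by name, with their character codes.
ESCAPES = {
    "\\a": 0x07, "\\b": 0x08, "\\t": 0x09, "\\n": 0x0A, "\\v": 0x0B,
    "\\f": 0x0C, "\\r": 0x0D, '\\"': 0x22, "\\'": 0x27, "\\?": 0x3F,
    "\\\\": 0x5C,
}
```

Building 'xdigit' from 'range' shows how the named classes can be seen as fixed combinations of ranges, at least for ASCII input.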
XML Library
The nets-xml grammar library defines common entity references for grammars using XML grammar notation.
Name | Description |
---|---|
entity &lt; |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Less than |
entity &gt; |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Greater than |
entity &amp; |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Ampersand |
entity &apos; |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Single quote |
entity &quot; |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Double quote |
entity &#nnn; |
input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Any character code nnn defined in decimal |
rule toXML |
input streamed DOM (SDOM)
output wide character XML
attributes coreattrs
Converts a streamed DOM (SDOM) to wide character XML. |
rule toTextTree |
input streamed DOM (SDOM)
output wide character XML
attributes coreattrs
Converts a streamed DOM (SDOM) to a text tree showing all DOM details. |
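The five named entity references in this table (less than, greater than, ampersand, single quote, double quote) together with the decimal form `&#nnn;` are the standard XML character escapes. The following Python sketch shows the same encoding and decoding using only the standard library. It is not the nets-xml implementation; `xml_encode` and `xml_decode` are hypothetical names chosen for this example.

```python
import html
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quotes are passed explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def xml_encode(text):
    """Replace the five XML special characters with entity references."""
    return escape(text, QUOTES)


def xml_decode(text):
    """Resolve named and numeric character references in a single pass.

    html.unescape also accepts HTML-only names such as &nbsp;, so it is
    more permissive than a strict XML decoder.
    """
    return html.unescape(text)


print(xml_encode('<a href="x">Tom & Jerry\'s</a>'))
print(xml_decode("&lt;&#65;&amp;&gt;"))
```

Decoding in a single pass matters: resolving `&amp;` first and numeric references afterwards would wrongly turn `&amp;#65;` into `A`.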
Utility Library
The nets-utility internal library contains utility rules.
Name | Description |
---|---|
rule echo |
input sequence of bytes
output sequence of bytes
attributes coreattrs, echo_file, max
'echo' copies bytes from the input stream to the output stream without modification. The echo_file attribute defines a stream which will contain a copy of the input. This is useful when debugging pipes between processes. The max attribute sets the maximum number of bytes to be copied. (* NEBNF *)
|
rule pv |
input sequence of bytes
output sequence of bytes
attributes coreattrs, sample_time
'pv' is modeled on the Unix/Linux pv command, which monitors the progress of data through a pipe. pv echoes bytes from input to output and generates progress information in the error stream. If the input stream is a file, pv reports a percentage complete in the error stream. If the input stream is a pipe, pv reports bytes processed in the error stream. pv reports progress every n seconds, as determined by the sample_time Attribute. The following example reports progress every ten seconds. (* NEBNF *)
|
rule null |
input sequence of bytes
output sequence of bytes to null
attributes coreattrs
'null' takes a sequence of bytes from the input and outputs them to the null device. (* NEBNF *)
|
rule newline, nl |
input none
output none
attributes none
Increments the input stream's newline count and newline offset. Used in grammars to keep track of the line number and byte count for input lines. 'newline' and 'nl' are synonyms. |
rule newlineout, nlo |
input none
output none
attributes none
Increments the output stream's newline count and newline offset. Used in grammars to keep track of the line number and byte count for output lines. 'newlineout' and 'nlo' are synonyms. |
rule hexdump |
input sequence of bytes
output ASCII hexdump of bytes
attributes none
'hexdump' takes a sequence of bytes from the input and creates hexdump-style output. (* NEBNF *)
Output
|
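The behaviour described for 'echo' and 'hexdump' can be sketched in a few lines of Python. This is an illustration of the general technique, not the NETS code: the parameter names `max_bytes` and `chunk_size`, and the exact hexdump layout (offset, hex bytes, printable ASCII between bars), are assumptions made for the example.

```python
import io


def echo(src, dst, echo_file=None, max_bytes=None, chunk_size=4096):
    """Copy bytes from src to dst unchanged; optionally tee them to echo_file.

    max_bytes caps the number of bytes copied (None means copy everything).
    Returns the number of bytes copied.
    """
    copied = 0
    while max_bytes is None or copied < max_bytes:
        want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - copied)
        chunk = src.read(want)
        if not chunk:
            break
        dst.write(chunk)
        if echo_file is not None:
            echo_file.write(chunk)
        copied += len(chunk)
    return copied


def hexdump(data, width=16):
    """Return lines of 'offset  hex bytes  |printable ASCII|'."""
    pad = width * 3 - 1  # length of a full row of hex pairs joined by spaces
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in row).ljust(pad)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part}  |{text}|")
    return "\n".join(lines)


src, dst, tee = io.BytesIO(b"hello world"), io.BytesIO(), io.BytesIO()
echo(src, dst, echo_file=tee, max_bytes=5)
print(hexdump(dst.getvalue()))
```

Writing the same chunk to both streams is what makes the copy useful for inspecting a pipe without disturbing the data flowing through it.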
Iconv Library
The nets-iconv external grammar library provides access to the commonly used 'iconv' character encoding/decoding utility.
Name | Description |
---|---|
rule iconv |
input sequence of characters
output sequence of characters
attributes coreattrs, input_encoding, output_encoding
Iconv converts a sequence of characters in the input stream to a sequence of characters in the output stream. The process stops either when the sequence of source characters is exhausted or when an error in the conversion occurs. The Attributes input_encoding and output_encoding refer to any valid combination of input and output encoding supported by the 'iconv' application installed on the system. The following example shows iconv configured to convert ASCII input to WCHAR_T output. (* NEBNF *)
|
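As a rough analogy for what the 'iconv' rule does, the Python sketch below converts bytes from one encoding to another with Python's built-in codecs. It does not call the system iconv, the encoding names are Python codec names rather than iconv names, and `utf-32-le` is used only as a stand-in for a 4-byte wide character representation; the function name `convert` is hypothetical.

```python
def convert(data, input_encoding, output_encoding):
    """Decode bytes from input_encoding and re-encode them as output_encoding.

    Raises UnicodeError on the first invalid or unrepresentable character,
    loosely matching the documented stop-on-error behaviour.
    """
    return data.decode(input_encoding).encode(output_encoding)


# ASCII in, UTF-32 little-endian out (a stand-in for a 4-byte wchar_t).
print(convert(b"Hi", "ascii", "utf-32-le"))
```

Unlike a streaming converter, this sketch raises an exception on bad input rather than emitting the characters converted so far.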
Encoder (Decoder) Library
The nets-encoder built-in grammar library provides access to commonly used encoding/decoding utilities.
Name | Description |
---|---|
rule asciitowchar |
input sequence of ASCII characters
output sequence of wide characters
attributes coreattrs
'asciitowchar' converts a sequence of ASCII characters in the input stream to a sequence of wide characters in the output stream. The process stops when the sequence of source characters is exhausted. No checking is carried out on the validity of the ASCII characters. (* NEBNF *)
|
rule wchartoascii |
input sequence of wide characters
output sequence of ASCII characters
attributes coreattrs
'wchartoascii' converts a sequence of wide characters in the input stream to a sequence of ASCII characters in the output stream. The process stops when the sequence of source characters is exhausted. No checking is carried out on whether each wide character maps to a valid ASCII character. (* NEBNF *)
|
rule base64encode |
input sequence of bytes
output sequence of base64 encoded bytes
attributes coreattrs
'base64encode' converts a sequence of bytes in the input stream to a sequence of base64 encoded bytes in the output stream. The process stops when the sequence of source characters is exhausted. (* NEBNF *)
|
rule base64decode |
input sequence of base64 encoded bytes
output sequence of bytes
attributes coreattrs
'base64decode' converts a sequence of base64 encoded bytes in the input stream to a sequence of bytes in the output stream. The process stops either when the sequence of source characters is exhausted or when an invalid base64 sequence is encountered. (* NEBNF *)
|
rule runlengthencode |
input sequence of bytes
output sequence of run length encoded bytes
attributes coreattrs
'runlengthencode' converts a sequence of bytes in the input stream to a sequence of run length encoded bytes in the output stream. The process stops when the sequence of source characters is exhausted. (* NEBNF *)
|
rule runlengthdecode |
input sequence of run length encoded bytes
output sequence of bytes
attributes coreattrs
'runlengthdecode' converts a run length encoded sequence of bytes in the input stream to bytes in the output stream. The process stops either when the sequence of source characters is exhausted or when an error in the encoding scheme is found. (* NEBNF *)
|
rule hextochar |
input sequence of two hexadecimal single byte characters
output sequence of multi-byte characters
attributes coreattrs
'hextochar' converts a sequence of hex encoded characters (2 input bytes at a time) in the input stream to a multi-byte character in the output stream. The process stops either when the sequence of source characters is exhausted or when an invalid input sequence is encountered. (* NEBNF *)
|
rule chartohex |
input sequence of ASCII characters
output sequence of wide characters
attributes coreattrs
'chartohex' converts a sequence of multi-byte characters in the input stream to a sequence of hexadecimal encoded characters in the output stream. The process stops when the sequence of source characters is exhausted. (* NEBNF *)
|
rule hextowchar |
input sequence of four hexadecimal single byte characters
output sequence of wide characters
attributes coreattrs
'hextowchar' converts a sequence of hex encoded characters (4 input bytes at a time) in the input stream to a sequence of wide characters in the output stream. The process stops either when the sequence of source characters is exhausted or when an invalid input sequence is encountered. (* NEBNF *)
|
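For readers unfamiliar with these encodings, the Python sketch below shows round trips for base64, hexadecimal and a simple run length scheme. The base64 and hex parts use the standard library. The run length format, pairs of (count, value) bytes with counts from 1 to 255, is an assumption made for this example, since the format used by NETS is not described here; the function names are hypothetical.

```python
import base64
import binascii


def run_length_encode(data):
    """Encode as (count, value) byte pairs, with count limited to 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def run_length_decode(encoded):
    """Invert run_length_encode; reject truncated input or a zero count."""
    if len(encoded) % 2:
        raise ValueError("truncated run-length data")
    out = bytearray()
    for count, value in zip(encoded[0::2], encoded[1::2]):
        if count == 0:
            raise ValueError("zero-length run")
        out += bytes([value]) * count
    return bytes(out)


sample = b"aaaabbbcca"
assert run_length_decode(run_length_encode(sample)) == sample
assert base64.b64decode(base64.b64encode(sample), validate=True) == sample
assert binascii.unhexlify(binascii.hexlify(sample)) == sample
```

The decoders raise an error on malformed input, which corresponds to the "stops when an invalid sequence is encountered" behaviour described for the decoding rules.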