Libraries

Libraries are reusable building blocks for building parsers.


Libraries contain rules, entities and other logic that can be reused across applications. They may be built in, defined by users, or configured dynamically at runtime. Libraries make it possible to reuse not only declarative grammars but also algorithmic processes and procedures whose operation is opaque to the application. There are five types of grammar library:

Internal Libraries

Internal grammar libraries are built into NETS and cannot be changed. They provide features like character classification and common encoding/decoding algorithms.

Library Name Description
Character Type Library The nets-ctype grammar library defines common character classification rules and entity references. Variants are provided for multi-byte and wide character strings.
Utility Library The nets-utility grammar library defines rules for echoing input to output, progress monitoring, sending output to the null stream, line counting and hexdumps.
Encoding Library The nets-encoding grammar library defines common encoding and decoding rules. These include asciitowchar, wchartoascii, base64encode and base64decode to name a few.
XML Library The nets-xml grammar library defines common XML entity references for XML-derived grammars. This package also includes XML encoding.

External Libraries

Executable external libraries provide a way of extending NETS. Included in NETS is the 'iconv' library. Users can develop their own external grammar libraries (see the Code section of this guide for more details). NETS searches the -grammar_libpath path, or the current directory if no -grammar_libpath is specified, for DLLs and shared libraries that conform to the NETS grammar library interface specification. Any that are found are loaded automatically and their rules and entities are made available to the NETS runtime.

Library Name Description
Iconv Library The nets-iconv grammar library supports conversion of strings between any combination of source and target character encodings using the POSIX standard 'iconv' function. The Iconv Library is an external grammar library.
Encoder-Decoder Library An extensible package of utilities.

System Runtime Grammars

In addition to the built-in and external grammars, other grammars are dynamic and defined at runtime.

Library Name Description
Context Grammar The context grammar defines entities derived from command line and environment parameters. It includes entities like 'grammar', 'input' and 'output'.
Command Grammar The command grammar defines operating system commands that can be referenced within a grammar.

To view the full list of installed grammars and their rules and entities, run NETS with the -printlibrary and -grammar_xml parameters. The -grammar_xml parameter names an XML file that will contain all the user-defined grammars and rules in the current execution, including the context grammar.

>nets-parser -loglevel=3 -grammar=default.g -grammar_xml=default.xml -printlibrary -input=default.in -output=default.out

Context Grammars

The context grammar is generated at runtime from the NETS command line parameters and system environment variables. The -loadenv command line parameter forces all environment variables to be loaded into the context grammar. A typical example of the context grammar is shown below. The -grammar_xml command line parameter can be used to report on the context for any run of the parser. Treating command parameters and environment variables in this way enables them to be referenced easily within the grammar through Entity References. The locale (shown below) is defined when NETS runs and is taken from the current system locale.

# Command Line
>nets-parser -loglevel=3 -grammar=default.g -input=default.in -output=default.out -grammar_xml=grammar.xml -error=stderr

<!-- grammar.xml -->
<grammar id="context">
  <entity id="ini">default.ini</entity>
  <entity id="error">pipe:stderr</entity>
  <entity id="grammar_start">start</entity>
  <entity id="loglevel">3</entity>
  <entity id="input">default.in</entity>
  <entity id="output">default.out</entity>
  <entity id="grammar">default.g</entity>
  <entity id="grammar_xml">grammar.xml</entity>
  <entity id="locale">English_United States.1252</entity>
</replace_placeholder_never_emitted>
</grammar>
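
A context file written by -grammar_xml can be read back with any XML parser. The following is a minimal Python sketch, assuming the file is well-formed and that each entity is closed by a matching entity end tag. The load_context helper and the inline sample are illustrative only and are not part of NETS.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a context grammar file.
CONTEXT_XML = """<grammar id="context">
  <entity id="loglevel">3</entity>
  <entity id="input">default.in</entity>
  <entity id="output">default.out</entity>
</grammar>"""


def load_context(xml_text: str) -> dict:
    """Return a mapping of entity id to entity value."""
    root = ET.fromstring(xml_text)
    return {e.get("id"): (e.text or "") for e in root.iter("entity")}


context = load_context(CONTEXT_XML)
print(context["input"])
```

A dictionary keyed by entity id is a convenient form for reporting which parameters were in effect for a given run.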

Command Grammars

The command grammar is loaded at runtime from the NETS 'command.g' grammar. It includes the 'command' rule, which is part of the built-in library. NETS searches the -grammar_libpath path for the command.g grammar file. In some installations (such as cloud environments) -grammar_libpath is not available and cannot be overridden, so only a fixed set of commands is made available to users.

<!-- GXML -->
<grammar id="nets_command">
  <rule id="cp">
    <ruleref idref="command" name="copy"/>
  </rule>
  <rule id="rm">
    <ruleref idref="command" name="del"/>
  </rule>
  <rule id="command"/>
</grammar>

Name Description
rule command input none
output none
attributes name, parameter
Executes the command identified by name in 'command.g' with the parameters given by the 'parameter' Attribute.
(* NEBNF *)
start = command name="cp" parameter="${input} default.out" ;
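
The ${input} reference in the example above is resolved from the context grammar. How NETS performs the expansion is not described here. The Python sketch below only illustrates the idea using string.Template, which happens to use the same ${name} syntax. The expand helper and the context values are assumptions for illustration.

```python
from string import Template

# Hypothetical context values, as they might come from the command line.
context = {"input": "default.in", "output": "default.out"}


def expand(value: str, entities: dict) -> str:
    """Replace each ${name} reference with the matching entity value."""
    # substitute() raises KeyError for an unknown name; whether NETS treats
    # an unknown entity reference as an error is not stated in this guide.
    return Template(value).substitute(entities)


print(expand("${input} default.out", context))
```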

Core Attributes

Core attributes are the set of attributes used by all rules and include id, input, output, encoding, echo, predicate, onbefore, onafter, onerror, loginfo and logmessage. They are denoted by 'coreattrs' in the library definitions that follow.

Character Type Library

The nets-ctype grammar library defines common character classifications and entity references. This library also supports custom character ranges defined by a minimum and a maximum value. Variants are provided for multi-byte and wide character strings.

Name Description
rule alnum input single multi-byte character in the current locale
output none
attributes coreattrs
A union of the alpha and digit character classifications
rule alpha input single multi-byte character in the current locale
output none
attributes coreattrs
Any alphabetic character in the current locale
rule ascii input single multi-byte character in the current locale
output none
attributes coreattrs
Any ASCII 7 bit character in the current locale
rule char input single multi-byte character in the current locale
output none
attributes coreattrs
Any character in the current locale
rule cntrl input single multi-byte character in the current locale
output none
attributes coreattrs
Any control character in the current locale
rule digit input single multi-byte character in the current locale
output none
attributes coreattrs
Any decimal digit in the current locale
rule lower input single multi-byte character in the current locale
output none
attributes coreattrs
Any lower case letter in the current locale
rule print input single multi-byte character in the current locale
output none
attributes coreattrs
Any printable character in the current locale
rule punct input single multi-byte character in the current locale
output none
attributes coreattrs
Any punctuation character in the current locale
rule space input single multi-byte character in the current locale
output none
attributes coreattrs
Any space character in the current locale
rule upper input single multi-byte character in the current locale
output none
attributes coreattrs
Any upper case character in the current locale
rule xdigit input single multi-byte character in the current locale
output none
attributes coreattrs
Any hexadecimal digit in the current locale
rule range input single multi-byte character in the current locale
output none
attributes coreattrs, min, max
Any character within the range from min to max in the current locale
rule walnum input single wide character in the current locale
output none
attributes coreattrs
A union of the walpha and wdigit character classifications
rule walpha input single wide character in the current locale
output none
attributes coreattrs
Any alphabetic wide character in the current locale
rule wascii input single wide character in the current locale
output none
attributes coreattrs
Any ASCII 7 bit wide character in the current locale
rule wchar input single wide character in the current locale
output none
attributes coreattrs
Any wide character in the current locale
rule wcntrl input single wide character in the current locale
output none
attributes coreattrs
Any control wide character in the current locale
rule wdigit input single wide character in the current locale
output none
attributes coreattrs
Any decimal wide digit in the current locale
rule wlower input single wide character in the current locale
output none
attributes coreattrs
Any lower case wide character in the current locale
rule wprint input single wide character in the current locale
output none
attributes coreattrs
Any printable wide character in the current locale
rule wpunct input single wide character in the current locale
output none
attributes coreattrs
Any punctuation wide character in the current locale
rule wspace input single wide character in the current locale
output none
attributes coreattrs
Any space wide character in the current locale
rule wupper input single wide character in the current locale
output none
attributes coreattrs
Any upper case wide character in the current locale
rule wxdigit input single wide character in the current locale
output none
attributes coreattrs
Any hexadecimal wide digit in the current locale
rule range input single wide character in the current locale
output none
attributes coreattrs, min, max
Any wide character within the range from min to max in the current locale
entity \a input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x07 beep
entity \b input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x08 backspace
entity \t input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x09 horizontal tab
entity \n input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0A newline
entity \v input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0B vertical tab
entity \f input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0C form feed
entity \r input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x0D carriage return
entity \" input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x22 double quote
entity \' input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x27 single quote
entity \? input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x3F question mark
entity \\ input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
x5C backslash
rule \xhh input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Any character code hh defined in hexadecimal
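
The classification rules above follow the familiar C character-class names. As a rough illustration of what each class accepts, the Python sketch below maps several of the names onto Python string predicates. This is an assumption-laden approximation: Python's predicates are Unicode-based rather than locale-based, so results can differ from NETS for non-ASCII input. The classify and in_range helpers are hypothetical.

```python
import string

# Approximate stand-ins for a subset of the classification rules.
CLASSIFIERS = {
    "alnum": str.isalnum,
    "alpha": str.isalpha,
    "ascii": str.isascii,
    "digit": str.isdigit,
    "lower": str.islower,
    "upper": str.isupper,
    "space": str.isspace,
    "print": str.isprintable,
    "punct": lambda ch: ch in string.punctuation,
    "xdigit": lambda ch: ch in string.hexdigits,
}


def classify(ch: str) -> list:
    """Return the sorted names of all classes that accept one character."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return sorted(name for name, test in CLASSIFIERS.items() if test(ch))


def in_range(ch: str, minimum: str, maximum: str) -> bool:
    """Approximate the 'range' rule: minimum <= ch <= maximum."""
    return minimum <= ch <= maximum


print(classify("a"))
print(in_range("m", "a", "z"))
```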

XML Library

The nets-xml grammar library defines common entity references for grammars using XML grammar notation.

Name Description
entity &lt; input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Less than
entity &gt; input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Greater than
entity &amp; input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Ampersand
entity &apos; input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Single quote
entity &quot; input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Double quote
entity &#nnn; input single multi-byte or wide character in the current locale
output single multi-byte or wide character in the current locale
attributes coreattrs
Any character code nnn defined in decimal
rule toXML input streamed DOM (SDOM)
output wide character XML
attributes coreattrs
Converts a streamed DOM (SDOM) to wide character XML.
rule toTextTree input streamed DOM (SDOM)
output wide character XML
attributes coreattrs
Converts a streamed DOM (SDOM) to a text tree showing all DOM details.
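
The entity references in this table are the five predefined XML entities plus numeric character references. The Python sketch below shows the same mapping using the standard library, purely as an illustration of what each reference stands for. It does not use or reproduce the NETS XML encoding rules, and the xml_encode helper is hypothetical.

```python
from html import unescape
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def xml_encode(text: str) -> str:
    """Replace the five XML special characters with entity references."""
    return escape(text, QUOTES)


print(xml_encode("a < b & 'c'"))
print(unescape("&lt;&#65;&gt;"))  # decimal reference 65 is the letter A
```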

Utility Library

The nets-utility internal library contains utility rules.

Name Description
rule echo input sequence of bytes
output sequence of bytes
attributes coreattrs, echo_file, max
'echo' copies bytes from the input stream to the output stream without modification. The echo_file attribute defines a stream which will contain a copy of the input. This is useful when debugging pipes between processes. The max attribute sets the maximum number of bytes to be copied.

(* NEBNF *)
start = echo echo_file="tempfile.tmp" ;
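
The behaviour described for 'echo' resembles a tee with an optional byte limit. The Python sketch below is a minimal illustration under that assumption. The function signature, chunk size and return value are hypothetical and do not describe the NETS implementation.

```python
import io


def echo(src, dst, echo_file=None, max_bytes=None, chunk_size=4096):
    """Copy bytes from src to dst, optionally teeing them to echo_file.

    Stops at end of input or after max_bytes bytes, whichever comes first.
    Returns the number of bytes copied.
    """
    copied = 0
    while max_bytes is None or copied < max_bytes:
        want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - copied)
        chunk = src.read(want)
        if not chunk:
            break
        dst.write(chunk)
        if echo_file is not None:
            echo_file.write(chunk)
        copied += len(chunk)
    return copied


source = io.BytesIO(b"Hello World")
out, tee = io.BytesIO(), io.BytesIO()
count = echo(source, out, echo_file=tee, max_bytes=5)
print(count, out.getvalue(), tee.getvalue())
```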
rule pv input sequence of bytes
output sequence of bytes
attributes coreattrs, sample_time
'pv' is modeled on the Unix/Linux pv command, which monitors the progress of data through a pipe. pv echoes bytes from input to output and generates progress information in the error stream. If the input stream is a file, pv reports a percentage complete in the error stream. If the input stream is a pipe, pv reports bytes processed in the error stream. pv reports progress every n seconds, as determined by the sample_time Attribute. The following example reports progress every ten seconds.

(* NEBNF *)
start = pv sample_time="10" ;
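
The reporting logic described for 'pv' can be sketched in Python as a copy loop that writes a progress line whenever sample_time seconds have elapsed. This is a simplified illustration: the total argument stands in for a known file size, the message format is invented, and none of it reflects the actual NETS or Unix pv output.

```python
import io
import sys
import time


def pv(src, dst, total=None, sample_time=10.0, report=sys.stderr, chunk_size=4096):
    """Copy src to dst, reporting progress at most every sample_time seconds.

    With a known total (a file) a percentage is reported; otherwise (a pipe)
    the number of bytes processed is reported. Returns the bytes copied.
    """
    done = 0
    last = time.monotonic()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        now = time.monotonic()
        if now - last >= sample_time:
            if total:
                print(f"{100 * done // total}%", file=report)
            else:
                print(f"{done} bytes", file=report)
            last = now
    return done


log = io.StringIO()
copied = pv(io.BytesIO(b"x" * 10), io.BytesIO(), total=10,
            sample_time=0, report=log, chunk_size=5)
print(copied, log.getvalue().split())
```

Setting sample_time to zero, as in the usage lines, forces a report after every chunk, which is convenient for testing the loop.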
rule null input sequence of bytes
output sequence of bytes to null
attributes coreattrs
'null' takes a sequence of bytes from the input and outputs them to the null device.

(* NEBNF *)
start = null;
rule newline, nl input none
output none
attributes none
Increments the input stream's newline count and newline offset. Used in grammars to keep track of the line number and byte count for input lines. 'newline' and 'nl' are synonyms.
rule newlineout, nlo input none
output none
attributes none
Increments the output stream's newline count and newline offset. Used in grammars to keep track of the line number and byte count for output lines. 'newlineout' and 'nlo' are synonyms.
rule hexdump input sequence of bytes
output ASCII hexdump of bytes
attributes none
'hexdump' takes a sequence of bytes from the input and writes a hexdump-style representation of them to the output.

(* NEBNF *)
start = hexdump;

Output
00000000 | 48 65 6C 6C 6F 20 57 6F 72 6C 64 | Hello World
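
The output line above can be reproduced with a few lines of Python. The sketch below is an approximation based only on that single sample line. The 16-byte line width, the use of '.' for non-printable bytes and the lack of padding on short lines are assumptions.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as 'offset | hex bytes | printable text' lines."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and anything else as a dot (assumption).
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X} | {hex_part} | {text_part}")
    return "\n".join(lines)


print(hexdump(b"Hello World"))
```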

Iconv Library

The nets-iconv external grammar library provides access to the commonly used 'iconv' character encoding/decoding utility.

Name Description
rule iconv input sequence of characters
output sequence of characters
attributes coreattrs, input_encoding, output_encoding
'iconv' converts a sequence of characters in the input stream to a sequence of characters in the output stream. The process stops either when the sequence of source characters is exhausted or when an error in the conversion occurs. The Attributes input_encoding and output_encoding refer to any valid combination of input and output encoding supported by the 'iconv' application installed on the system. The following example shows iconv configured to convert ASCII input to WCHAR_T output.

(* NEBNF *)
start = iconv input_encoding="ASCII" output_encoding="WCHAR_T";
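
For readers unfamiliar with iconv-style conversion, the Python sketch below shows the same decode-then-encode idea using Python's own codecs. Python codec names differ from iconv names, and WCHAR_T has no Python equivalent because its width is platform dependent, so "utf-32-le" is used here purely as an assumed stand-in for a 4-byte little-endian wide character. The convert helper is hypothetical.

```python
def convert(data: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Decode data from input_encoding and re-encode it as output_encoding.

    Raises UnicodeDecodeError or UnicodeEncodeError when a character cannot
    be converted, mirroring the 'stops on error' behaviour described above.
    """
    return data.decode(input_encoding).encode(output_encoding)


wide = convert(b"Hi", "ascii", "utf-32-le")
print(wide)
```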

Encoder (Decoder) Library

The nets-encoder built-in grammar library provides access to commonly used encoding/decoding utilities.

Name Description
rule asciitowchar input sequence of ASCII characters
output sequence of wide characters
attributes coreattrs
'asciitowchar' converts a sequence of ASCII characters in the input stream to a sequence of wide characters in the output stream. The process stops when the sequence of source characters is exhausted. No checking is carried out on the validity of the ASCII characters.

(* NEBNF *)
start = asciitowchar;
rule wchartoascii input sequence of wide characters
output sequence of ASCII characters
attributes coreattrs
'wchartoascii' converts a sequence of wide characters in the input stream to a sequence of ASCII characters in the output stream. The process stops when the sequence of source characters is exhausted. No checking is carried out on whether each wide character can be represented as a valid ASCII character.

(* NEBNF *)
start = wchartoascii;
rule base64encode input sequence of bytes
output sequence of base64 encoded bytes
attributes coreattrs
'base64encode' converts a sequence of bytes in the input stream to a sequence of base64 encoded bytes in the output stream. The process stops when the sequence of source characters is exhausted.

(* NEBNF *)
start = base64encode;
rule base64decode input sequence of base64 encoded bytes
output sequence of bytes
attributes coreattrs
'base64decode' converts a sequence of base64 encoded bytes in the input stream to a sequence of bytes in the output stream. The process stops either when the sequence of source characters is exhausted or an invalid base64 sequence is encountered.

(* NEBNF *)
start = base64decode;
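
The behaviour of the two base64 rules can be illustrated with Python's standard base64 module. This sketch assumes the standard base64 alphabet with padding, which this guide does not state explicitly. Strict validation is enabled so that an invalid sequence raises an error instead of being skipped.

```python
import base64
import binascii

encoded = base64.b64encode(b"Hello World")
print(encoded)

# validate=True rejects characters outside the base64 alphabet.
decoded = base64.b64decode(encoded, validate=True)
print(decoded)

try:
    base64.b64decode(b"SGVsbG8*", validate=True)
except binascii.Error as exc:
    print("invalid base64 sequence:", exc)
```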
rule runlengthencode input sequence of bytes
output sequence of run length encoded bytes
attributes coreattrs
'runlengthencode' converts a sequence of bytes in the input stream to a sequence of run-length encoded bytes in the output stream. The process stops when the sequence of source characters is exhausted.

(* NEBNF *)
start = runlengthencode;
rule runlengthdecode input sequence of run length encoded bytes
output sequence of bytes
attributes coreattrs
'runlengthdecode' converts a run length encoded sequence of bytes in the input stream to bytes in the output stream. The process stops either when the sequence of source characters is exhausted or an error in the encoding scheme is found.

(* NEBNF *)
start = runlengthdecode;
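
This guide does not specify the run-length encoding scheme that NETS uses. The Python sketch below illustrates the general technique with one simple scheme chosen for the example: each run is stored as a (count, value) byte pair with a count from 1 to 255. The function names and the pair format are assumptions.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs, with count in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Reverse rle_encode; raise ValueError on a malformed sequence."""
    if len(encoded) % 2:
        raise ValueError("truncated run-length pair")
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        if count == 0:
            raise ValueError("zero-length run")
        out += bytes([value]) * count
    return bytes(out)


packed = rle_encode(b"aaabcc")
print(packed, rle_decode(packed))
```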
rule hextochar input sequence of two hexadecimal single byte characters
output sequence of multi-byte characters
attributes coreattrs
'hextochar' converts a sequence of hex encoded characters (2 input bytes at a time) in the input stream to a multi-byte character in the output stream. The process stops either when the sequence of source characters is exhausted or when an invalid input sequence is encountered.

(* NEBNF *)
start = hextochar;
rule chartohex input sequence of multi-byte characters
output sequence of hexadecimal encoded characters
attributes coreattrs
'chartohex' converts a sequence of multi-byte characters in the input stream to a sequence of hexadecimal encoded characters in the output stream. The process stops when the sequence of source characters is exhausted.

(* NEBNF *)
start = chartohex;
rule hextowchar input sequence of four hexadecimal single byte characters
output sequence of wide characters
attributes coreattrs
'hextowchar' converts a sequence of hex encoded characters (4 input bytes at a time) in the input stream to a sequence of wide characters in the output stream. The process stops either when the sequence of source characters is exhausted or when an invalid input sequence is encountered.

(* NEBNF *)
start = hextowchar;
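
The three hexadecimal rules above can be illustrated in Python as follows. This is a sketch under stated assumptions: upper-case hex digits on output, no separators between digit groups, and each group of four hex digits in 'hextowchar' read as a single character code. The helper names are hypothetical and mirror the rule names only for readability.

```python
def hex_to_char(hex_text: str) -> bytes:
    """Two hex digits per output byte; ValueError on invalid input."""
    return bytes.fromhex(hex_text)


def char_to_hex(data: bytes) -> str:
    """Each input byte becomes two upper-case hex digits."""
    return data.hex().upper()


def hex_to_wchar(hex_text: str) -> str:
    """Four hex digits per output character; ValueError on invalid input."""
    if len(hex_text) % 4:
        raise ValueError("input length must be a multiple of four")
    return "".join(
        chr(int(hex_text[i:i + 4], 16)) for i in range(0, len(hex_text), 4)
    )


print(hex_to_char("48656C6C6F"))
print(char_to_hex(b"Hello"))
print(hex_to_wchar("00480069"))
```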