XML files must begin with up to three declarations and processing instructions, in the following order...
All XML files must begin with a valid XML declaration, for example:
<?xml version='1.0'?>
If an XSL stylesheet processing instruction is required, it should follow the XML declaration and be constructed like this:
<?xml-stylesheet type="text/xsl" href="bpg4-0.xsl"?>
The standard doctype declaration for an XML file whose top-level element is "content" is:
<!DOCTYPE content PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 4.0//EN" "bpg4-0.dtd" []>
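The ordering described above (XML declaration first, then the optional stylesheet instruction, then the doctype) can be checked mechanically before a file is sent to a parser. The following is a minimal sketch in Python, not part of any supplied tooling: the function name check_prolog and the deliberately loose regular expressions are assumptions made for illustration, and the check is no substitute for a real parse.

```python
import re

# Loose patterns for the three constructs described above (illustrative only).
XML_DECL = re.compile(r"""<\?xml\s+version=["']1\.[0-9]+["'][^?]*\?>""")
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\s[^?]*\?>")
DOCTYPE = re.compile(r"<!DOCTYPE\s+(?P<root>[^\s>\[]+)")


def check_prolog(text):
    """Return a list of problems found at the start of an XML document."""
    declaration = XML_DECL.match(text)
    if declaration is None:
        return ["file does not begin with an XML declaration"]

    # Skip whitespace, then the optional stylesheet processing instruction.
    rest = text[declaration.end():].lstrip()
    stylesheet = STYLESHEET_PI.match(rest)
    if stylesheet is not None:
        rest = rest[stylesheet.end():].lstrip()

    # The doctype declaration must come next.
    if DOCTYPE.match(rest) is None:
        return ["no doctype declaration after the XML declaration"]
    return []
```

An empty list means the start of the file has the expected shape; anything else is a human-readable list of problems.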
It is preferable not to specify local network paths in the XML's doctype declaration. To avoid this, a simple two-line DTD-redirect file can be inserted in the directory being parsed to redirect the parser to the DTD stored on the local network.
To do this, create a file called "bpg4-0.dtd" and place it in the directory with the XML file. The contents of this file would look like this (replacing "\\server..." with an actual network path):
<!ENTITY % dtd SYSTEM "\\server\dtds\bpgdtd\4-0\bpg4-0.dtd">
%dtd;
Similarly, an XSL-redirect file called "bpg4-0.xsl" can be created:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:include href="\\server\xsls\styles\bpg4-0.xsl"/>
</xsl:stylesheet>
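If many directories need the two redirect files, they can be generated rather than copied by hand. The sketch below is one possible way to do this in Python. The function names are invented for illustration, and the two network paths are the same "\\server..." placeholders used above, so they must be replaced with actual locations before use.

```python
from pathlib import Path

# Placeholder targets copied from the examples above; replace with real paths.
DTD_TARGET = r"\\server\dtds\bpgdtd\4-0\bpg4-0.dtd"
XSL_TARGET = r"\\server\xsls\styles\bpg4-0.xsl"


def dtd_redirect(target):
    """Return the contents of the two-line DTD-redirect file."""
    return f'<!ENTITY % dtd SYSTEM "{target}">\n%dtd;\n'


def xsl_redirect(target):
    """Return the contents of the XSL-redirect stylesheet."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
        f'  <xsl:include href="{target}"/>\n'
        "</xsl:stylesheet>\n"
    )


def write_redirects(directory, dtd_target=DTD_TARGET, xsl_target=XSL_TARGET):
    """Write bpg4-0.dtd and bpg4-0.xsl into the directory being parsed."""
    directory = Path(directory)
    (directory / "bpg4-0.dtd").write_text(dtd_redirect(dtd_target), encoding="utf-8")
    (directory / "bpg4-0.xsl").write_text(xsl_redirect(xsl_target), encoding="utf-8")
```

Calling write_redirects(".") would place both files next to the XML files in the current directory.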
James Clark's SP parser, which parses both SGML and XML, can be downloaded from http://www.jclark.com. Full documentation is included with the download, but a quick guide to parsing is given here.
The command to parse a file using nsgmls, the command-line parser in the SP package, is:
nsgmls -f<error filename> -sv -c<catalog filename> <sgml or xml declaration> <file to parse>
(-sv is optional: -s suppresses the parser's normal on-screen output, so that only error messages are reported, and -v prints the version number of the parser. Other options are also available and are described in the parser's documentation.)
To parse an SGML file called "test.sgm", type the following at the command prompt:
nsgmls -ferror.txt -sv test.sgm
This will parse "test.sgm" in the current directory and write any errors to a file called "error.txt". If you don't specify an error file, the errors are displayed in the DOS window.
If there are any public identifiers in the SGML file, a catalog file must be created to map those public identifiers to system identifiers. The format of the catalog file is:
PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 3.0//EN" "\\server\dtds\bpg3-0.dtd"
For example, if the doctype declaration in "test.sgm" is...
<!DOCTYPE doc PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 3.0//EN">
and the catalog file is "c:\sp\bin\catalog.txt", the parsing command would be...
nsgmls -ferror.txt -sv -cc:\sp\bin\catalog.txt test.sgm
Some SGML DTDs come with their own SGML declaration file. If that is the case, the declaration should be specified in the command line (see below).
To parse XML with nsgmls, the SGML declaration for XML which comes with the parser ("xml.dcl") must be specified in the command line.
Given an XML file called "test.xml" and the SGML declaration for XML located at "c:\sp\pubtext\xml.dcl", the command to parse the file is
nsgmls -ferror.txt -sv c:\sp\pubtext\xml.dcl test.xml
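When the same parse is run repeatedly, the command lines above can be assembled by a small script. The following Python sketch builds the argument list in the order shown in the command syntax, using only the options described in this guide (-f, -sv and -c). The function names are hypothetical, and run_nsgmls assumes that nsgmls is installed and on the search path.

```python
import subprocess


def build_nsgmls_command(source, error_file=None, catalog=None,
                         declaration=None, quiet=True):
    """Build the nsgmls argument list described in this guide."""
    command = ["nsgmls"]
    if error_file:
        command.append(f"-f{error_file}")   # write errors to this file
    if quiet:
        command.append("-sv")               # optional, as noted above
    if catalog:
        command.append(f"-c{catalog}")      # maps public to system identifiers
    if declaration:
        command.append(declaration)         # e.g. xml.dcl when parsing XML
    command.append(source)                  # the file to parse comes last
    return command


def run_nsgmls(*args, **kwargs):
    """Run the parser and return its exit status (assumes nsgmls is on the path)."""
    return subprocess.run(build_nsgmls_command(*args, **kwargs)).returncode


if __name__ == "__main__":
    # Show the XML example from this guide as a single command line.
    print(" ".join(build_nsgmls_command(
        "test.xml", error_file="error.txt",
        declaration=r"c:\sp\pubtext\xml.dcl")))
```

Passing the arguments as a list, rather than as one string, avoids quoting problems if a path contains spaces.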
Each line of the error file that nsgmls creates has the format
program name:filename:line number:character number:error flag:error description
For example
C:\SP\BIN\NSGMLS.EXE:test.sgm:4:14:E: there is no attribute "NAME"
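Because the fields are separated by colons and a Windows path may itself contain a colon after the drive letter, splitting a line on ":" alone gives the wrong result. The Python sketch below shows one way to read a line in the format described above. The names ParsedError and parse_error_line are invented for illustration, and lines that do not fit the format are simply returned as None.

```python
import re
from typing import NamedTuple, Optional


class ParsedError(NamedTuple):
    program: str
    filename: str
    line: int
    column: int
    flag: str
    description: str


# Program and file names may each start with a drive letter such as "C:".
ERROR_LINE = re.compile(
    r"^(?P<program>(?:[A-Za-z]:)?[^:]*):"
    r"(?P<filename>(?:[A-Za-z]:)?[^:]*):"
    r"(?P<line>\d+):(?P<column>\d+):"
    r"(?P<flag>[A-Za-z]):\s*(?P<description>.*)$"
)


def parse_error_line(text: str) -> Optional[ParsedError]:
    """Split one line of an nsgmls error file into its six fields."""
    match = ERROR_LINE.match(text.strip())
    if match is None:
        return None  # not in the format described above
    return ParsedError(
        program=match["program"],
        filename=match["filename"],
        line=int(match["line"]),
        column=int(match["column"]),
        flag=match["flag"],
        description=match["description"],
    )
```

Applied to the example line above, this gives the file name "test.sgm", line 4, character 14 and error flag "E".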
The DTD can be downloaded from the download page.