Doctypes, DTD location and parsing

The XML file

XML files must begin with the following declarations and processing instructions, in this order (the second is needed only when a stylesheet is used):

1. XML declaration

All XML files must begin with a valid XML declaration, for example:

<?xml version='1.0'?>

2. XSL processing instruction

If an XSL processing instruction is required, it should be constructed like this:

<?xml-stylesheet type="text/xsl" href="bpg4-0.xsl"?>

3. DOCTYPE declaration

The standard doctype for an XML file whose top-level element is "content" is:

<!DOCTYPE content PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 4.0//EN" "bpg4-0.dtd" []>
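
The order of the three items above can be checked mechanically. The following Python sketch is illustrative only: the function name check_prolog and the regular expressions are assumptions made for this example, and it checks only the presence and order of the items as this guide prescribes them, not whether the file is valid.

```python
import re

# Patterns for the three prolog items (illustrative; this is not an XML parser).
XML_DECL = re.compile(r"<\?xml\s+version=(['\"])1\.[0-9]+\1[^?]*\?>")
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\s[^?]*\?>")
DOCTYPE = re.compile(r"<!DOCTYPE\s+(?P<root>[^\s>\[]+)")


def check_prolog(text):
    """Return a list of problems found at the start of an XML document string."""
    problems = []
    decl = XML_DECL.match(text)
    if decl is None:
        problems.append("file does not begin with an XML declaration")
        pos = 0
    else:
        pos = decl.end()
    doctype = DOCTYPE.search(text, pos)
    if doctype is None:
        problems.append("no DOCTYPE declaration found")
        return problems
    # The stylesheet instruction is optional; if present, this guide
    # places it before the DOCTYPE declaration.
    pi = STYLESHEET_PI.search(text, pos)
    if pi is not None and pi.start() > doctype.start():
        problems.append("xml-stylesheet instruction appears after the DOCTYPE")
    return problems
```

An empty list means the three items were found in the expected order.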

DTD/stylesheet location

It is preferable not to specify local network paths in the XML file's doctype declaration. Instead, a simple two-line DTD-redirect file can be placed in the same directory as the XML file being parsed, to redirect the parser to the DTD stored on the local network.

To do this, create a file called "bpg4-0.dtd" and place it in the directory with the XML file. The contents of this file would look like this (replacing "\\server..." with an actual network path):

<!ENTITY % dtd  SYSTEM  "\\server\dtds\bpgdtd\4-0\bpg4-0.dtd">
%dtd;

Similarly, an XSL-redirect file called "bpg4-0.xsl" can be created:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:include href="\\server\xsls\styles\bpg4-0.xsl"/>
</xsl:stylesheet>
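
If many directories need redirect files, they can be generated rather than written by hand. The following Python sketch assumes the file names "bpg4-0.dtd" and "bpg4-0.xsl" used above; the function names dtd_redirect, xsl_redirect and write_redirects are hypothetical, and the network paths are supplied by the caller.

```python
from pathlib import Path


def dtd_redirect(dtd_path):
    """Return the text of a two-line DTD-redirect file pointing at dtd_path."""
    return f'<!ENTITY % dtd SYSTEM "{dtd_path}">\n%dtd;\n'


def xsl_redirect(xsl_path):
    """Return the text of an XSL-redirect stylesheet that includes xsl_path."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
        f'\t<xsl:include href="{xsl_path}"/>\n'
        '</xsl:stylesheet>\n'
    )


def write_redirects(directory, dtd_path, xsl_path):
    """Write both redirect files into directory, next to the XML file."""
    target = Path(directory)
    (target / "bpg4-0.dtd").write_text(dtd_redirect(dtd_path), encoding="utf-8")
    (target / "bpg4-0.xsl").write_text(xsl_redirect(xsl_path), encoding="utf-8")
```

Calling write_redirects with the directory of the XML file and the two network paths produces files equivalent to the examples above.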

Parsing

SP parser (aka "nsgmls")

James Clark's SP parser, which parses both SGML and XML, can be downloaded from http://www.jclark.com. Full documentation is included with the download; a quick guide to parsing is given here.

The command to parse a file using nsgmls is:

nsgmls -f<error filename> -sv -c<catalog filename> <sgml or xml declaration> <file to parse>

(-sv is optional: -s suppresses the parser's normal output, so that only error messages are reported, and -v prints the version number of the parser. Other options are also available and are described in the parser's documentation.)
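
The general form of the command can also be assembled by a script. The following is a minimal Python sketch, assuming nsgmls is on the PATH; the function name nsgmls_command and its parameters are hypothetical, chosen for illustration. It only builds the argument list and uses no options beyond those shown above.

```python
def nsgmls_command(source, error_file=None, catalog=None, declaration=None,
                   quiet=True, program="nsgmls"):
    """Build the nsgmls argument list in the order of the general command."""
    args = [program]
    if error_file:
        args.append("-f" + error_file)   # file that receives error messages
    if quiet:
        args.append("-sv")               # optional, as noted above
    if catalog:
        args.append("-c" + catalog)      # catalog mapping public identifiers
    if declaration:
        args.append(declaration)         # SGML or XML declaration file
    args.append(source)                  # the file to parse comes last
    return args
```

The resulting list can be passed to Python's subprocess.run to invoke the parser.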

Parsing SGML

To parse an SGML file called "test.sgm", type the following at the command prompt:

nsgmls   -ferror.txt   -sv   test.sgm

This parses "test.sgm" in the current directory and writes any errors to a file called "error.txt". If you don't specify an error file, the errors are displayed in the DOS window.

If there are any public identifiers in the SGML file, a catalog file must be created to map those public identifiers to system identifiers. The format of the catalog file is:

PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 3.0//EN"    "\\server\dtds\bpg3-0.dtd"

For example, if the DTD declaration in "test.sgm" is...

<!DOCTYPE doc PUBLIC "-//BLACKWELL PUBLISHING GROUP//DTD 3.0//EN">

and the catalog file is "c:\sp\bin\catalog.txt", the parsing command would be...

nsgmls   -ferror.txt   -sv   -cc:\sp\bin\catalog.txt   test.sgm
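
To illustrate the mapping a catalog provides, the following Python sketch reads PUBLIC entries of the form shown above and looks up a system identifier. It is an assumption-laden simplification: the name read_public_entries is hypothetical, and only PUBLIC entries with both identifiers in double quotes are recognised, whereas the catalog format supported by the parser is richer (see the parser's documentation).

```python
import re

# One entry of the form: PUBLIC "public identifier" "system identifier"
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def read_public_entries(catalog_text):
    """Map each public identifier in catalog_text to its system identifier."""
    return dict(PUBLIC_ENTRY.findall(catalog_text))
```

Looking up "-//BLACKWELL PUBLISHING GROUP//DTD 3.0//EN" in the resulting dictionary returns the path of the DTD that the parser would be directed to.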

Some SGML DTDs come with their own SGML declaration file. If that is the case, the declaration should be specified in the command line (see below).

Parsing XML

To parse XML with nsgmls, the SGML declaration for XML which comes with the parser ("xml.dcl") must be specified in the command line.

Given an XML file called "test.xml" and an XML declaration located in "c:\sp\pubtext\xml.dcl", the command to parse the file is:

nsgmls   -ferror.txt   -sv   c:\sp\pubtext\xml.dcl   test.xml

Nsgmls error files

Each line of an error file created by nsgmls has the format:

program name:filename:line number:character number:error flag:error description

For example

C:\SP\BIN\NSGMLS.EXE:test.sgm:4:14:E: there is no attribute "NAME"
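
Because the program name may itself contain a colon (the drive letter in the example above), splitting an error line on every colon gives the wrong fields. The following Python sketch shows one way to separate them; the names NsgmlsError and parse_error_line are hypothetical, and the pattern assumes every line follows the six-field format described above, so lines in any other form are simply skipped.

```python
import re
from typing import NamedTuple, Optional

# An optional drive letter ("C:") is allowed at the start of the program
# name and of the filename; otherwise neither field may contain a colon.
ERROR_LINE = re.compile(
    r"^(?P<program>(?:[A-Za-z]:)?[^:]+):"
    r"(?P<filename>(?:[A-Za-z]:)?[^:]+):"
    r"(?P<line>\d+):(?P<column>\d+):"
    r"(?P<flag>[A-Z]):\s*(?P<message>.*)$"
)


class NsgmlsError(NamedTuple):
    program: str
    filename: str
    line: int
    column: int
    flag: str
    message: str


def parse_error_line(text) -> Optional[NsgmlsError]:
    """Split one error-file line into its fields, or return None if it does not fit."""
    match = ERROR_LINE.match(text.strip())
    if match is None:
        return None
    return NsgmlsError(
        program=match["program"],
        filename=match["filename"],
        line=int(match["line"]),
        column=int(match["column"]),
        flag=match["flag"],
        message=match["message"],
    )
```

Applied to the example line, this yields the filename "test.sgm", line 4, character 14 and error flag "E".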

Downloading the DTD

The DTD can be downloaded from the download page.
