<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html">
<style type="text/css">
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
-->
</style>
<title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips
to help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and might be helpful more generally. I welcome <a
href="mailto:[email protected]">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, as a storage format, for data
exchange, checking or transformations. An increasing number of new XML
formats have been defined, but not all the steps needed to truly gain the
benefits of XML have been taken, possibly for lack of documentation.
These guidelines hope to improve the matter and provide a better overview of
XML processing and of the associated steps needed to deploy it
successfully.</p>

<p>Table of contents:</p>
<ol>
  <li><a href="#Design">Design guidelines</a></li>
  <li><a href="#Canonical">Canonical URL</a></li>
  <li><a href="#Catalog">Catalog setup</a></li>
  <li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the XML format itself. It may arrive
a bit too late, since the structure of the document may already be cast in
existing and deployed code. Still, here are a few rules which might be helpful
when designing a new XML vocabulary or revising an existing
format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
try to look up existing XML vocabularies for similar data. Ideally this allows
you to reuse them, in which case a lot of existing tooling such as DTDs, schemas
and stylesheets may already be available. If you are looking for a
documentation format, <a href="http://www.docbook.org/">DocBook</a> should
handle your needs. If reuse is not possible because some semantic or use case
aspects are too different, the survey will still help you avoid design errors such as
targeting the vocabulary at the wrong abstraction level. During this format
design phase, try to be concise, be sure to express the real content of
your data, and use the XML structure to express the semantics and context of
that data.</p>

<h3>DTD rules:</h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed in instances is the core of the design process of the
vocabulary. Here are a few tips:</p>
<ul>
  <li>use meaningful words for element and attribute names.</li>
  <li>do not use attributes for general textual content: attribute values
    are normalized by the parser before reaching the application,
    so whitespace and line breaks will be modified.</li>
  <li>use a single element for every string that might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:
    <pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
  </li>
  <li>use attributes to refine the content of an element but avoid them for
    more complex tasks: parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex, while
    attributes have to remain very simple.</li>
</ul>

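<p>As a rough illustration of how an application might consume the sibling
<code>xml:lang</code> elements shown in the list above, here is a minimal
Python sketch using only the standard library. The helper name
<code>pick_message</code> and the fallback to English are assumptions made
for this example, not part of any GNOME API.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<welcome>
  <msg xml:lang="en">hello</msg>
  <msg xml:lang="fr">bonjour</msg>
</welcome>"""


def pick_message(parent, lang, fallback="en"):
    """Return the text of the <msg> child for lang, else for the fallback.

    Hypothetical helper: a real application would likely take the language
    from the user's locale and handle region variants such as fr_CA.
    """
    by_lang = {msg.get(XML_LANG): msg.text for msg in parent.findall("msg")}
    return by_lang.get(lang, by_lang.get(fallback))


root = ET.fromstring(DOC)
print(pick_message(root, "fr"))  # a language present in the document
print(pick_message(root, "de"))  # a missing language uses the fallback
```

<p>Keeping each translation in its own element means a new language can be
added without touching the code that reads the document.</p>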
<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define will be usable
for future extensions that you may not be considering for the current version. There
are two parts to this:</p>
<ul>
  <li>Make sure the instance contains a version number, which makes
    backward compatibility easy to handle. Something as simple as having a
    <code>version="1.0"</code> attribute on the root element of the instance is
    sufficient.</li>
  <li>While designing the code that analyzes the data provided by the
    XML parser, make sure you can work with unknown versions: generate a UI
    warning and process only the tags recognized by your version, but
    do not break on unknown elements if the version
    attribute is not in the recognized set.</li>
</ul>

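<p>The second rule can be sketched as follows in Python. This is only one
possible reading of it: the element names (<code>list</code>,
<code>title</code>, <code>item</code>, <code>priority</code>) and the
function <code>read_items</code> are invented for the example, and a Python
warning stands in for the UI warning.</p>

```python
import warnings
import xml.etree.ElementTree as ET

KNOWN_VERSIONS = {"1.0"}        # versions this reader was written for
KNOWN_TAGS = {"title", "item"}  # hypothetical element names


def read_items(xml_text):
    """Read the children of the root element, tolerating newer versions."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    recognized = version in KNOWN_VERSIONS
    if not recognized:
        # Stand-in for the UI warning mentioned in the guidelines.
        warnings.warn(f"unknown format version {version!r}, reading known tags only")
    result = []
    for child in root:
        if child.tag in KNOWN_TAGS:
            result.append((child.tag, child.text))
        elif recognized:
            # In a version we know, an unexpected element is a real error.
            raise ValueError(f"unexpected element <{child.tag}>")
        # Otherwise: unknown element in an unknown version, skip it quietly.
    return result


NEWER = """<list version="2.0">
  <title>Groceries</title>
  <item>milk</item>
  <priority>high</priority>
</list>"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    items = read_items(NEWER)

print(items)
print(len(caught), "warning(s) raised")
```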
<h3>Other design parts:</h3>

<p>While defining your vocabulary, try to think in terms of other uses of your
data, for example how XSLT stylesheets could be used to build an HTML
view of your data or to convert it into a different format. It is also worth
looking at XML Schemas and considering an XML Schema with more complete
validation and datatyping of your data structures; this helps
avoid some mistakes in the design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example, binding specific processing from a graphical shell
like Nautilus to an instance of your data), then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project; see the next section about this. In practice this
means that XML parsers will not handle your element names as-is but as a
pair made of the namespace name and the element name. This allows tools to
recognize the vocabulary and disambiguate processing. Uniqueness of the namespace name can
for the most part be guaranteed by the use of the DNS registry. Namespaces can
also be used to carry versioning information, for example:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use a namespace is to make it the default namespace on the
root element of the XML instance, for example:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, structure and all descendant elements like data are in
the given namespace.</p>

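<p>To make the "pair of namespace name and element name" concrete, here is a
small Python sketch showing how the standard <code>xml.etree.ElementTree</code>
module reports names for a document like the one above. The
<code>example</code> text content is a placeholder added for this
illustration.</p>

```python
import xml.etree.ElementTree as ET

NS = "http://www.gnome.org/project/projectname/1.0/"

DOC = f"""<structure xmlns="{NS}">
  <data>example</data>
</structure>"""

root = ET.fromstring(DOC)

# ElementTree writes the (namespace, local name) pair as "{namespace}local".
print(root.tag)

# Lookups must use the qualified name ...
data = root.find(f"{{{NS}}}data")
print(data.text)

# ... because the bare local name does not match a namespaced element.
print(root.find("data"))
```

<p>Other parsers use a different notation for the pair, but the principle is
the same: code that matches on the bare name <code>data</code> will not see
elements from this vocabulary.</p>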
<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web, there is a natural synergy between the two. XML was designed to
be available on the Web, and keeping the infrastructure that way helps
when deploying XML resources. The core of this issue is the notion of
the "Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource; the canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually when processing XML a
copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt, or even in C:\projectname\
(horror!). The key point is that the way to name that resource should be
independent of the actual place where it resides on disk, if it is available
there at all, and that the processing should still work if there is no local copy
(provided that the machine where the processing takes place is connected to the Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it, but always use the canonical URL. For example, in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL for the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>

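<p>A project could check its own documents for this mistake automatically.
The Python sketch below uses the standard <code>xml.parsers.expat</code>
module to read the SYSTEM identifier of the DOCTYPE and reports whether it is
an http(s) URL rather than a local path. The function names are made up for
this example, and treating "has an http or https scheme" as "canonical" is a
simplifying assumption.</p>

```python
import xml.parsers.expat
from urllib.parse import urlparse


def doctype_system_id(xml_text):
    """Return the SYSTEM identifier declared in the DOCTYPE, or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append(system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # expat does not fetch the external DTD unless asked to, so this
    # works without network access or a local copy of the DTD.
    parser.Parse(xml_text, True)
    return found[0] if found else None


def uses_canonical_url(xml_text):
    """Simplified check: the DTD is referenced through an http(s) URL."""
    system_id = doctype_system_id(xml_text)
    return system_id is not None and urlparse(system_id).scheme in ("http", "https")


LOCAL = """<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "/usr/share/xml/docbook/4.2/docbookx.dtd">
<article/>"""

CANONICAL = """<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article/>"""

print(uses_canonical_url(LOCAL))
print(uses_canonical_url(CANONICAL))
```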
<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
generate HTML, and the canonical URL should be used:</p>
<pre>&lt;?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?&gt;</pre>

<p>Defining the canonical URL for the resources needed should obey a few
simple rules similar to those used to design namespace names:</p>
<ul>
  <li>use a DNS name you know is associated with the project and will be
    available in the long term</li>
  <li>within that server space, reserve the subtree where you
    intend to keep that data</li>
  <li>version the URL so that multiple concurrent versions of the resources
    can be hosted simultaneously</li>
</ul>

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism which allows XML processing
tools to use a local copy of a resource if it is available, even if the
instance document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code>, or
one defined by the user). They form a tree of XML documents defining the mappings
between the canonical naming space and the locally installed resources; this can be
seen as a static cache structure.</p>

<p>When the XML processor is asked to process a resource, it
automatically tests for a locally available version in the catalog, starting
from the root catalog and possibly fetching sub-catalog resources, until it
determines whether or not the catalog has that resource. If not, the default
processing of fetching the resource from the Web is done, which in most
cases allows recovery from a catalog miss. The key point is that the document
instances are totally independent of the availability of a catalog and of
the actual place where the local resources they reference may be installed.
This greatly improves the management of the documents in the long run, making
them independent of the platform or toolchain used to process them. The
figure below tries to express that mechanism:<img src="catalog.gif"
alt="Picture describing the catalog"></p>

<h3>Usual catalog setup:</h3>

<p>Usually catalogs for a project are set up as a two-level hierarchical cache,
with the root catalog containing only "delegates" pointing to a separate subcatalog
dedicated to the project. The goal is to keep the root catalog clean and
to simplify catalog maintenance by using separate catalogs per
project. For example, when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference matching them, it has to look up a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or in a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses a
PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the given
subcatalog. Similarly, the second and third entries define the
delegation rules for SYSTEM identifiers (such as those in a DOCTYPE) and for normal URI references when the URL
starts with the <code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring,
which indicates the location on the W3C server where the XHTML1 resources are
stored. That prefix is the beginning of all Canonical URLs for XHTML1 resources.
Those three rules are sufficient in practice to capture all references to XHTML1
resources and direct the processing tools to the right subcatalog.</p>

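<p>The delegation step amounts to prefix matching. The Python sketch below
models the three root catalog entries as plain tuples and returns the
subcatalogs to consult for a given identifier. The tuple layout and the
function name are assumptions made for this illustration. Ordering matches
by longest prefix first follows my understanding of the OASIS XML Catalogs
specification; a real resolver such as libxml2 handles more cases.</p>

```python
# Delegate rules from the root catalog example: (kind, prefix, subcatalog).
XHTML1_CATALOG = "file:///usr/share/sgml/xhtml1/xmlcatalog"
DELEGATES = [
    ("public", "-//W3C//DTD XHTML 1.0", XHTML1_CATALOG),
    ("system", "http://www.w3.org/TR/xhtml1/DTD", XHTML1_CATALOG),
    ("uri", "http://www.w3.org/TR/xhtml1/DTD", XHTML1_CATALOG),
]


def delegated_catalogs(kind, identifier, delegates=DELEGATES):
    """Return the subcatalogs to search for identifier, longest prefix first."""
    matches = [
        (prefix, catalog)
        for entry_kind, prefix, catalog in delegates
        if entry_kind == kind and identifier.startswith(prefix)
    ]
    matches.sort(key=lambda match: len(match[0]), reverse=True)
    return [catalog for _, catalog in matches]


# A PUBLIC identifier covered by the first rule.
print(delegated_catalogs("public", "-//W3C//DTD XHTML 1.0 Strict//EN"))

# A SYSTEM identifier from another project: no delegate applies, so the
# processor would fall back to fetching the resource from the Web.
print(delegated_catalogs("system", "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
```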
<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;</pre>

<p>There are a few things to notice:</p>
<ul>
  <li>this is an XML resource: it points to its DTD using Canonical URLs, and the
    root element defines a namespace (but based on a URN, not an HTTP
    URL).</li>
  <li>it contains 5 rules: the first 3 are direct mappings for the 3
    PUBLIC identifiers defined by the XHTML1 specification, associating
    them with the local resource containing the DTD; the last 2 are
    rewrite rules which allow the local filename to be built for any URL based on
    "http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules by
    keeping the same structure as the on-line server at the Canonical URL.</li>
  <li>the local resources are designated using URI references (the uri or
    rewritePrefix attributes), the base being the containing subcatalog URL.
    Relative references are resolved against the directory holding the
    subcatalog file, which means that in practice the copy of the XHTML1 strict
    DTD is stored locally in
    <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>

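<p>To show how the <code>public</code> and <code>rewriteSystem</code> entries
turn a canonical reference into a local one, here is a deliberately
simplified Python sketch. It handles only those two entry types in a single
catalog, ignores delegation and the <code>prefer</code> setting, and the
function name <code>resolve</code> is invented for the example. Real tools
should rely on a full implementation such as the one in libxml2.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CAT_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
CATALOG_URL = "file:///usr/share/sgml/xhtml1/xmlcatalog"

# Two entries taken from the XHTML1 subcatalog.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
</catalog>"""


def resolve(catalog_xml, base_url, public_id=None, system_id=None):
    """Return the local URL for a reference, or None on a catalog miss."""
    root = ET.fromstring(catalog_xml)
    if system_id is not None:
        for entry in root.iter(CAT_NS + "rewriteSystem"):
            prefix = entry.get("systemIdStartString")
            if system_id.startswith(prefix):
                # Swap the canonical prefix for the local one, keep the rest.
                local_prefix = urljoin(base_url, entry.get("rewritePrefix"))
                return local_prefix + system_id[len(prefix):]
    if public_id is not None:
        for entry in root.iter(CAT_NS + "public"):
            if entry.get("publicId") == public_id:
                # Relative URIs are resolved against the catalog's own URL.
                return urljoin(base_url, entry.get("uri"))
    return None  # the caller would then fetch the canonical URL


print(resolve(CATALOG, CATALOG_URL, public_id="-//W3C//DTD XHTML 1.0 Strict//EN"))
print(resolve(CATALOG, CATALOG_URL,
              system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

<p>Because the local tree mirrors the layout of the on-line server, a single
rewrite rule is enough for every file below the canonical prefix.</p>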
<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. The catalog files, being XML
resources, should be processed with XML-based tools to avoid problems with the
generated files; the xmlcatalog command shipped with libxml2 allows you to create
catalogs and to add or remove rules at that time. Here is a complete example,
taken from the post-install script of the RPM for the XHTML1 DTDs. While this example
is platform and packaging specific, it can be useful as an example in
other contexts:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r "$ROOTCATALOG" ]
then
    /usr/bin/xmlcatalog --noout --create "$ROOTCATALOG"
fi

if [ -w "$ROOTCATALOG" ]
then
    /usr/bin/xmlcatalog --noout --add "delegatePublic" \
        "-//W3C//DTD XHTML 1.0" \
        "file://$CATALOG" "$ROOTCATALOG"
    /usr/bin/xmlcatalog --noout --add "delegateSystem" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" "$ROOTCATALOG"
    /usr/bin/xmlcatalog --noout --add "delegateURI" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" "$ROOTCATALOG"
fi</pre>

<p>The XHTML1 subcatalog is not created on the fly in that case; it is
installed as part of the files of the package. So the only work needed is to
make sure the root catalog exists and to register the delegate rules.</p>

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w "$ROOTCATALOG" ]
    then
        /usr/bin/xmlcatalog --noout --del \
            "-//W3C//DTD XHTML 1.0" "$ROOTCATALOG"
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" "$ROOTCATALOG"
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" "$ROOTCATALOG"
    fi
fi</pre>

<p>Note the test against $1: RPM passes 0 only when the last installed
version of the package is being removed, so the test keeps the delegate rules
in place when the package is upgraded.</p>

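<p>On platforms where the xmlcatalog command is not available, the same
register and unregister steps can be done with any XML-aware tool. The Python
sketch below is one such assumption-laden illustration, working on an
in-memory catalog: the function names are invented, only
<code>delegatePublic</code> is handled, and it is not a replacement for
xmlcatalog. It also shows why registering twice (for example on upgrade)
should not duplicate rules.</p>

```python
import xml.etree.ElementTree as ET

CAT_URI = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
DELEGATE_PUBLIC = f"{{{CAT_URI}}}delegatePublic"

# Serialize the catalog namespace as the default namespace, without a prefix.
ET.register_namespace("", CAT_URI)


def add_delegate_public(root, prefix, catalog):
    """Register a delegatePublic rule, updating it if the prefix exists."""
    for entry in root.findall(DELEGATE_PUBLIC):
        if entry.get("publicIdStartString") == prefix:
            entry.set("catalog", catalog)
            return
    ET.SubElement(root, DELEGATE_PUBLIC,
                  {"publicIdStartString": prefix, "catalog": catalog})


def remove_delegate_public(root, prefix):
    """Remove every delegatePublic rule registered for prefix."""
    for entry in root.findall(DELEGATE_PUBLIC):
        if entry.get("publicIdStartString") == prefix:
            root.remove(entry)


# An empty root catalog, as created by "xmlcatalog --create".
root = ET.fromstring(f'<catalog xmlns="{CAT_URI}"/>')

# Installing twice must leave a single rule behind.
for _ in range(2):
    add_delegate_public(root, "-//W3C//DTD XHTML 1.0",
                        "file:///usr/share/sgml/xhtml1/xmlcatalog")
after_install = len(root.findall(DELEGATE_PUBLIC))
print(ET.tostring(root, encoding="unicode"))

remove_delegate_public(root, "-//W3C//DTD XHTML 1.0")
after_removal = len(root.findall(DELEGATE_PUBLIC))
print(after_install, after_removal)
```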
<p>Following the guidelines and tips provided in this document should
help you deploy XML resources in the GNOME framework without much pain and
ensure a smooth evolution of the resources and instances.</p>

<p><a href="mailto:[email protected]">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>