Summary: This module describes XML (eXtensible Markup Language) and the rules that govern its usage. It also explains what a well-formed and valid document is.
<para>
This is a paragraph in <term>CNXML</term>. Notice that the markup
contains tags that express the meaning of the text.
</para>
In this example, <para> and
</para> are the tags that
enclose the text. In XML, tags are always marked by angle
brackets, the < and > characters. Tags generally come
in pairs. An opening tag looks like
<tagname>. A closing tag looks
like </tagname>, with a
/ preceding the tag name.
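To see how tag pairs become a document structure, here is a minimal sketch that parses the CNXML paragraph above with Python's standard-library ElementTree parser. Python and ElementTree are our choice of illustration, not part of CNXML itself.

```python
import xml.etree.ElementTree as ET

# The CNXML paragraph from the example, as a Python string.
source = ("<para>This is a paragraph in <term>CNXML</term>. Notice that the markup "
          "contains tags that express the meaning of the text.</para>")

para = ET.fromstring(source)   # each opening/closing tag pair becomes an element
print(para.tag)                # para
print(para.text)               # This is a paragraph in
for child in para:             # elements nested between <para> and </para>
    print(child.tag, child.text)   # term CNXML
```

The outer <para>...</para> pair becomes one element, and the <term>...</term> pair nested inside it becomes its child.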
HTML, for example, has tags such as <u> and
<i>, which underline and italicize
text respectively. Such tags express only formatting, not
information about the content. XML allows you to define your
own vocabulary of tags to represent content. You could create
a tag called <book> to represent book titles,
and a stylesheet (a separate formatting document) that
says every <book> element should
be italicized or underlined. Then, when you want to change the
presentation of that type of content, you change only one
small part of the stylesheet. Tags that convey the content of
a document also enable better searching: for example, you
could find the author of a document by looking at its author tag.
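The searching argument can be made concrete with a small sketch. The document below and its element names (document, title, author) are hypothetical, chosen only to illustrate the idea. The code uses Python's ElementTree to find content by tag name rather than by formatting.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose tags describe content, not formatting.
doc = ET.fromstring("""
<document>
  <title>A hypothetical module</title>
  <author>Jane Doe</author>
  <para>Compare this with <book>Oliver Twist</book>.</para>
</document>
""")

# The author is a direct lookup by tag name, not a guess from italics.
print(doc.findtext("author"))               # Jane Doe
# iter() walks the whole tree, so nested <book> elements are found too.
print([b.text for b in doc.iter("book")])   # ['Oliver Twist']
```

A stylesheet could style every book element the same way, while a search tool reads the same tags to find titles.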
Every element must be closed: each opening tag needs a
matching closing tag. For example, an opening tag looks like
<module> and the matching
closing tag looks like
</module>. There is a
shortcut for an empty element, one that contains no text
and no other tags: type a / just before the closing >
of the opening tag and delete the closing tag. For example,
<media></media>
can be abbreviated
<media/>.
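As a quick check that the two spellings of an empty element are equivalent, this sketch parses both with Python's ElementTree. Python is used here only for illustration.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<media></media>")
short_form = ET.fromstring("<media/>")

# Both spellings give an element with no text and no children.
for elem in (long_form, short_form):
    print(elem.tag, elem.text, len(elem))   # media None 0

# A space between the tags is character data, so this element is not empty.
print(repr(ET.fromstring("<media> </media>").text))   # ' '
```

The last line shows why <media> </media>, with a space inside, is not strictly the same as <media/>.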
Tags must be properly nested: an element that opens inside
another element must also close inside it. For example,
<b>red <i>and</i>
blue</b> is fine, but
<b>red <i>and</b>
blue</i> is incorrect because the
<b> and
<i> tags overlap.
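An XML parser enforces the nesting rule by refusing overlapping tags. The sketch below uses a hypothetical helper, is_well_formed, built on Python's ElementTree, which raises ParseError on such syntax errors.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<b>red <i>and</i> blue</b>"))   # True: <i> closes inside <b>
print(is_well_formed("<b>red <i>and</b> blue</i>"))   # False: the tags overlap
```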
Attribute values must be enclosed in quotation marks, either
double or single. For example,
<module id="m0001">
and
<module id='m0001'>
are fine, but
<module id=m0001>
is incorrect.
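The quoting rule can be checked the same way. This sketch, again using Python's ElementTree for illustration, parses the three module tags (written as empty elements so each is a complete document). The two quoted forms produce the same attribute and the unquoted form is rejected.

```python
import xml.etree.ElementTree as ET

samples = ['<module id="m0001"/>',   # double quotes
           "<module id='m0001'/>",   # single quotes
           "<module id=m0001/>"]     # no quotes

results = {}
for text in samples:
    try:
        results[text] = ET.fromstring(text).attrib   # parsed attributes as a dict
    except ET.ParseError as err:
        results[text] = f"rejected: {err}"
    print(text, "->", results[text])
```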
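An XML document should begin with an XML declaration, which
states the version of XML in use:
<?xml version="1.0"?>
The declaration can also state the character encoding of the
document (for example, encoding="UTF-8") and, with
standalone="yes" or standalone="no", whether the document
depends on declarations in other files.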
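To show what a parser sees in the declaration, here is a sketch using the XmlDeclHandler callback of Python's standard-library expat binding. The helper name read_declaration is hypothetical.

```python
from xml.parsers import expat

def read_declaration(data: bytes):
    """Hypothetical helper: return (version, encoding, standalone) from the
    XML declaration, or None if the document has no declaration."""
    found = {}

    def on_decl(version, encoding, standalone):
        # Expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted).
        found["decl"] = (version, encoding, standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)   # True marks this as the final (only) chunk
    return found.get("decl")

print(read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><module/>'))
print(read_declaration(b'<?xml version="1.0"?><module/>'))
```

Only version is required. The encoding and standalone parts are optional extras, as the text describes.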
Every XML document must have exactly one root element that
contains all other elements. In an XHTML document, for example,
<html> and
</html> must surround all of
the other tags. Some things at the top of the document, such
as the XML declaration, are not tags and therefore sit
outside the root element.
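The single-root rule in practice, sketched with Python's ElementTree for illustration. A declaration followed by one root element parses fine, but two top-level elements with nothing surrounding them are rejected.

```python
import xml.etree.ElementTree as ET

with_declaration = '<?xml version="1.0"?><html><head/><body/></html>'
two_roots = "<head/><body/>"   # nothing surrounds both elements

root = ET.fromstring(with_declaration)   # the declaration sits outside <html>
print(root.tag, [child.tag for child in root])   # html ['head', 'body']

try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError as err:
    two_roots_ok = False        # a second top-level element is a syntax error
    print("rejected:", err)
```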
XML has five predefined entity references:
&amp; refers to an ampersand (&)
&lt; refers to a less-than symbol (<)
&gt; refers to a greater-than symbol (>)
&quot; refers to a double-quote mark (")
&apos; refers to an apostrophe (')
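A short sketch of how a parser resolves the five predefined entity references, using Python's ElementTree for illustration. Each reference is replaced by the character it stands for.

```python
import xml.etree.ElementTree as ET

# One element whose text uses all five predefined entity references.
elem = ET.fromstring("<p>&amp; &lt; &gt; &quot; &apos;</p>")
print(elem.text)   # & < > " '
```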
For example, in the paragraph
<para id="p1">The firm was known as Scrooge and Marley.</para>
you could replace 'and' with the entity reference &amp;:
<para id="p1">The firm was known as Scrooge &amp; Marley.</para>
Every entity reference begins with & and ends with ;.
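Because a bare & must start an entity reference, writing Scrooge & Marley directly is an error. This sketch shows that in Python, and uses the standard-library xml.sax.saxutils.escape function (which replaces &, < and >) to produce the &amp; form. The choice of Python and escape() is ours, for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

sentence = "The firm was known as Scrooge & Marley."

try:
    ET.fromstring(f'<para id="p1">{sentence}</para>')   # bare & in the text
    bare_ok = True
except ET.ParseError:
    bare_ok = False                                      # the parser rejects it

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
markup = f'<para id="p1">{escape(sentence)}</para>'
print(markup)                       # ...Scrooge &amp; Marley.</para>
print(ET.fromstring(markup).text)   # the parser turns &amp; back into &
```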
Character references begin with &# or &#x and end with a semicolon (;). A character reference represents a Unicode code point: after &# the code point is written in decimal, and after &#x it is written in hexadecimal.
For example, the Unicode code point for the small letter o with a stroke (ø) is 00F8 in hexadecimal and 248 in decimal. Therefore, the character references for ø are &#248; and &#x00F8;.
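The arithmetic behind both references can be sketched in Python: ord() gives a character's code point, which is then formatted in decimal or hexadecimal. The helper name char_refs is hypothetical.

```python
def char_refs(ch: str) -> tuple[str, str]:
    """Hypothetical helper: decimal and hexadecimal character references for ch."""
    code_point = ord(ch)                                  # Unicode code point as an int
    return f"&#{code_point};", f"&#x{code_point:04X};"    # pad hex to 4 digits

print(ord("ø"), hex(ord("ø")))   # 248 0xf8
print(char_refs("ø"))            # ('&#248;', '&#x00F8;')
```

The leading zeros in &#x00F8; are optional; &#xF8; refers to the same character.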
So you could write
<emphasis>The majestik m&#248;&#248;se</emphasis>
or
<emphasis>The majestik m&#x00F8;&#x00F8;se</emphasis>
or even
<emphasis>The majestik møøse</emphasis>
to get
The majestik møøse
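To confirm that the three spellings are interchangeable, this sketch parses each with Python's ElementTree (our choice of tool) and compares the resulting text.

```python
import xml.etree.ElementTree as ET

spellings = [
    "<emphasis>The majestik m&#248;&#248;se</emphasis>",      # decimal references
    "<emphasis>The majestik m&#x00F8;&#x00F8;se</emphasis>",  # hexadecimal references
    "<emphasis>The majestik møøse</emphasis>",                # literal characters
]

texts = [ET.fromstring(s).text for s in spellings]
print(texts)
print(len(set(texts)) == 1)   # True: all three decode to the same string
```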