Which should we use, HTML or XHTML, and why?
The history of HTML at W3C starts with HTML 3.2, code-named Wilbur, which was followed a few years later by HTML 4.0, then HTML 4.01. HTML 4.01 is the last version of HTML, and is also the final W3C specification to define the semantics of markup. From HTML 3.2 to HTML 4.01, the language has improved a great deal, focusing on such issues as:
XHTML 1.0 was created shortly after HTML 4.01 to help the transition of hypertext to a new generation of markup languages for text. XHTML 1.1 is an additional step toward a more flexible version of hypertext with the full benefits of XML architecture and integration of different technologies. Note that XHTML 1.1 has slightly improved the semantics of HTML 4.01 by including the Ruby module, which is used in particular with languages such as Japanese (read the Ruby Specification for more information). For practical purposes, the discussion here will focus on HTML 4.01 and XHTML 1.0.
When we refer to the “semantics” of a language, we're referring to the meaning of a given tag. HTML 4.01 and XHTML 1.0 assign the same semantics to their elements and attributes. For example, the address element has exactly the same meaning in HTML 4.01 and XHTML 1.0: in both languages it is used to mark up addresses. Only parts of the syntax vary between the two languages. For example:
HTML 4.01:
<img alt="Portrait Murakami Haruki" src="/images/murakami.jpg">
<p lang="fr">Je levai la tête pour regarder les étoiles. Leur vue apaisa peu à peu les battements de mon coeur.</p>
<p><cite class="title">Chroniques de l'oiseau à ressort</cite> - <cite class="author">Haruki Murakami</cite></p>
XHTML 1.0:
<img alt="Portrait Murakami Haruki" src="/images/murakami.jpg" />
<p xml:lang="fr">Je levai la tête pour regarder les étoiles. Leur vue apaisa peu à peu les battements de mon coeur.</p>
<p><cite class="title">Chroniques de l'oiseau à ressort</cite> - <cite class="author">Haruki Murakami</cite></p>
The syntax in these examples is still very similar; there are only a few differences between them.
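The point that the two fragments share their elements but differ in syntax can be checked mechanically. The following is a minimal Python sketch, not part of the original article: it shortens the quoted paragraph, uses the standard library's tolerant HTML parser to list the elements in each fragment, and then asks an XML parser whether each fragment is well-formed. The helper names (element_names, is_well_formed_xml) and the wrapping root element are assumptions made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Shortened versions of the two fragments quoted in the article.
HTML_401 = (
    '<img alt="Portrait Murakami Haruki" src="/images/murakami.jpg">\n'
    '<p lang="fr">Je levai la tête pour regarder les étoiles.</p>'
)
XHTML_10 = (
    '<img alt="Portrait Murakami Haruki" src="/images/murakami.jpg" />\n'
    '<p xml:lang="fr">Je levai la tête pour regarder les étoiles.</p>'
)


class StartTagCollector(HTMLParser):
    """Records the name of every start tag seen by the tolerant HTML parser."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


def element_names(markup):
    """Return the element names found in the markup, in document order."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def is_well_formed_xml(fragment):
    """Return True if the fragment parses as XML once wrapped in one root."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


print(element_names(HTML_401), is_well_formed_xml(HTML_401))
print(element_names(XHTML_10), is_well_formed_xml(XHTML_10))
```

Both fragments yield the same list of elements, but only the XHTML one is accepted by the XML parser, because the HTML img element is never closed.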
Both languages come in three flavors: Frameset, Transitional and Strict. The Strict version is strongly recommended by the W3C for regular documents. Using the Strict versions removes problematic elements and enforces a clear separation between the structure of your document and its presentation. The Transitional versions allow deprecated elements, to help implementers upgrade their software or their content smoothly.
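In practice, a document announces its flavor through the public identifier in its DOCTYPE declaration. The Python sketch below is an illustration only: the identifiers are those published in the HTML 4.01 and XHTML 1.0 specifications (they are not quoted in this article), and the function name identify_flavor and the sample documents are hypothetical.

```python
import re

# Public identifiers from the HTML 4.01 and XHTML 1.0 specifications,
# mapped to (language, flavor).
DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": ("HTML 4.01", "Strict"),
    "-//W3C//DTD HTML 4.01 Transitional//EN": ("HTML 4.01", "Transitional"),
    "-//W3C//DTD HTML 4.01 Frameset//EN": ("HTML 4.01", "Frameset"),
    "-//W3C//DTD XHTML 1.0 Strict//EN": ("XHTML 1.0", "Strict"),
    "-//W3C//DTD XHTML 1.0 Transitional//EN": ("XHTML 1.0", "Transitional"),
    "-//W3C//DTD XHTML 1.0 Frameset//EN": ("XHTML 1.0", "Frameset"),
}

# Matches <!DOCTYPE html PUBLIC "..."> and captures the public identifier.
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE
)


def identify_flavor(document):
    """Return (language, flavor) for a known DOCTYPE, or None otherwise."""
    match = DOCTYPE_PATTERN.search(document)
    if match is None:
        return None
    return DOCTYPES.get(match.group(1))


# Hypothetical sample documents, truncated after the opening tag.
html_strict = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n<html>'
)
xhtml_transitional = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
)

print(identify_flavor(html_strict))
print(identify_flavor(xhtml_transitional))
```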
Is there any advantage to using HTML 4.01 over XHTML 1.0? There is no simple answer and the benefits you will gain are tied to how you're using the language in a given situation.
Switching from HTML 4.01 to XHTML 1.0 brings almost no direct benefits for the visitors of your Web site; still, there are several good reasons for Web authors to make the switch:
XML syntax rules are far more rigorous than those of HTML. As a result, XHTML makes authors work more precisely, having to address issues such as:
In HTML, inconsistent case, missing quotes, unterminated elements and uncontained elements are allowed and commonplace. The margin for error in HTML is much broader than in XHTML, where the rules are very clear. As a result, XHTML is easier to author and to maintain, since the structure is more apparent and problematic syntax is easier to spot.
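To illustrate how strict rules make problematic syntax easier to spot, here is a small Python sketch that hands a few fragments to an XML parser and reports where the first error occurs. It is an illustration under assumptions: the sample fragments and their labels are invented for this example, and the function name first_syntax_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_syntax_error(document):
    """Return (line, column, message) for the first XML error, or None."""
    try:
        ET.fromstring(document)
    except ET.ParseError as error:
        line, column = error.position
        return line, column, str(error)
    return None


# Invented fragments: one well-formed, the others showing habits that
# HTML tolerates but XML syntax rejects.
SAMPLES = {
    "well-formed": '<p class="note">one<br />two</p>',
    "unquoted attribute value": '<p class=note>text</p>',
    "unterminated empty element": '<p>one<br>two</p>',
    "mismatched case": '<P>text</p>',
    "overlapping elements": '<p><em><strong>text</em></strong></p>',
}

for label, markup in SAMPLES.items():
    print(label, "->", first_syntax_error(markup))
```

An XML parser stops at the first violation and reports its position, whereas an HTML parser would typically recover silently from each of these fragments.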
As you are probably aware by now, XHTML 1.0 is the reformulation of HTML 4.01 in XML. Therefore, XHTML documents are both hypertext documents and XML documents. A powerful technology has been developed at W3C to manipulate and transform XML documents: Extensible Stylesheet Language Transformations (XSLT). This technology is tremendously useful for creating various new resources automatically from an XHTML document. For example
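One such derived resource could be an outline of a page's headings. The sketch below is only an illustration of the idea that an XHTML document can be processed with generic XML tools. It deliberately uses Python's standard xml.etree.ElementTree module rather than XSLT, since the Python standard library does not include an XSLT processor. The sample document and the function name outline are assumptions.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")

# A hypothetical XHTML 1.0 document used only for this example.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Sample page</title></head>
  <body>
    <h1>HTML or XHTML?</h1>
    <h2>Semantics</h2>
    <p>Some text.</p>
    <h2>Syntax</h2>
  </body>
</html>"""


def outline(document):
    """Return (level, text) for each XHTML heading, in document order."""
    root = ET.fromstring(document)
    prefix = "{" + XHTML_NS + "}"
    result = []
    for element in root.iter():
        # ElementTree reports namespaced tags as "{namespace}localname".
        if not element.tag.startswith(prefix):
            continue
        local_name = element.tag[len(prefix):]
        if local_name in HEADINGS:
            text = "".join(element.itertext()).strip()
            result.append((int(local_name[1]), text))
    return result


for level, text in outline(SAMPLE):
    # Indent each entry according to its heading level.
    print("  " * (level - 1) + text)
```

The same extraction would not be reliable on HTML 4.01 input with this parser, because the parser rejects anything that is not well-formed XML.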
The syntax rules defined by XML are far more consistent than those found in HTML and therefore easier to explain than the SGML rules on which HTML is based.
When the new version of XHTML becomes a recommendation, XHTML 1.0 documents will be easily upgradable to this new version, allowing you to take advantage of its exciting new features. It's likely that an XSLT style sheet will be available by then to help you move your XHTML 1.0 (Strict) documents to XHTML 2.0.
Yes, HTML 4.01 is as valuable as XHTML 1.0 in daily use. The syntax proposed by XHTML 1.0 nevertheless has several important benefits. The weight of these benefits has to be evaluated in the context of your project: use the right tool for the right job.
For a Web designer, starting to use XHTML 1.0 will be helpful in some circumstances and will certainly help you to smoothly negotiate the future. XHTML 1.0 gives a wonderful opportunity to learn about XML languages and their possibilities without having to learn new semantics because you're working with familiar tags and attributes.
Please see the WaSP article Common ideas between HTML and XHTML.
You can read more about using XSLT and XHTML together at the W3C's Web site.
For clarification and discussion on this topic, please address your comments and questions to the W3C Web Standards Education list.
To subscribe to the list, send an email to [email protected] with “Subject: subscribe”. You can read archived posts at http://lists.w3.org/Archives/Public/public-evangelist/.