Sunday, April 11, 2010

Exercise_Week 5

XML: eXtensible Markup Language

What is XML?

  • XML is not a markup language like HTML; it is a language used to describe markup languages. The technical term for such a language is a metalanguage.
  • Using XML a developer can define markup languages which describe electronic circuitry, information for electronic data interchange, the files produced by Web servers, mechanical parts of aircraft and so on.
  • A developer defines a particular language using XML. A tool or utility then takes XML documents containing text expressed in that language and carries out some process, such as converting them to an MS Word document or some other form.
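
As a minimal sketch of that last point, the Python below plays the role of the "tool or utility". The `parts` vocabulary, the sample data, and the function name `parts_to_lines` are all invented for illustration; only the standard-library `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET

# A document written in a small, made-up "parts" vocabulary (hypothetical data).
PARTS_XML = """\
<parts>
  <part id="A100"><name>Wing spar</name><quantity>2</quantity></part>
  <part id="B200"><name>Landing gear strut</name><quantity>3</quantity></part>
</parts>
"""

def parts_to_lines(xml_text):
    """Convert each <part> element into one comma-separated line of text."""
    root = ET.fromstring(xml_text)
    lines = []
    for part in root.findall("part"):
        fields = [part.get("id"), part.findtext("name"), part.findtext("quantity")]
        lines.append(",".join(fields))
    return lines

for line in parts_to_lines(PARTS_XML):
    print(line)
```

The same document could just as well be fed to a different tool that produces some other output; the XML itself only describes the data.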

XML as Internet standard

The major goals of the language are as follows.

  • That it should be easy to use on the Internet.
  • That it should be capable of supporting a large number of applications, e.g. browsers and search engines.
  • That it should be compatible with SGML, the text processing language which was the inspiration for HTML.
  • That it should not be a complicated process to develop processors for documents written in languages defined in XML.
  • That it should be easy to write a program to check that a source text reflects its definition.
  • That the number of optional facilities of the language should be very low.
  • That XML documents should be easy to read and understand.
  • That documents written using a language defined by XML should be easy to develop using simple editors.
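
To give a feel for how little code a basic processor needs, here is a hedged sketch that only checks whether a text is well-formed XML. It does not check a document against its language definition (that would need a validating parser, which is not shown here). The function name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise.

    This is a well-formedness check only; it does not validate the
    document against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<note><to>Ann</to></note>"))  # matching tags
print(is_well_formed("<note><to>Ann</note>"))       # <to> is never closed
```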

XML at work

  • For structuring data on the Web
  • Hierarchical model
  • Define your own tags
  • Tags are like field names in a database
  • Tags define elements
  • Elements may have attributes
  • Tags are case sensitive
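
The points above can be sketched with Python's standard-library parser. The `library` vocabulary and its data are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a hierarchy of self-defined tags.
LIBRARY_XML = """\
<library>
  <book isbn="000-0">
    <title>First Book</title>
    <author>A. Writer</author>
  </book>
  <book isbn="000-1">
    <title>Second Book</title>
    <author>B. Writer</author>
  </book>
</library>
"""

root = ET.fromstring(LIBRARY_XML)

# Tags act like field names: each <book> becomes one record.
records = [
    {
        "isbn": book.get("isbn"),            # attribute on the element
        "title": book.findtext("title"),     # child element
        "author": book.findtext("author"),   # child element
    }
    for book in root.findall("book")
]

# Tag names are case sensitive, so "Title" does not match <title>.
wrong_case = root.find("book/Title")

print(records)
print(wrong_case)
```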

Building blocks of XML

  • Elements
  • Tags
  • Attributes
  • Entities
  • PCDATA
  • CDATA

Elements and tags

  • Elements
    • The main building block
    • May hold data, other elements, or be empty.
  • Tags
    • “Mark up” (delimit) elements
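
A small sketch of the three kinds of element content, using an invented `order` document and a hypothetical helper named `describe`:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one element of each kind.
ORDER_XML = (
    "<order>"
    "<customer>Ann</customer>"            # holds data
    "<items><item>Bolt</item></items>"    # holds other elements
    "<giftwrap/>"                         # empty
    "</order>"
)

def describe(element):
    """Classify an element by what it holds."""
    if len(element) > 0:                  # len() counts child elements
        return "holds other elements"
    if element.text and element.text.strip():
        return "holds data"
    return "empty"

order = ET.fromstring(ORDER_XML)
kinds = {child.tag: describe(child) for child in order}
print(kinds)
```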

Attributes

  • Provide extra information about elements
  • Placed inside the starting tag of an element
  • Always come in name/value pairs
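
As an illustration, the sketch below reads the attributes of a made-up `image` element. It also shows that the parser rejects an attribute value written without quotes, since XML requires attribute values to be quoted.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: three attributes inside the start tag.
image = ET.fromstring('<image src="logo.png" width="120" height="40"/>')

# Attributes are exposed as name/value pairs; values are always strings.
attributes = dict(image.attrib)
print(attributes)

# An unquoted attribute value is not well-formed XML.
try:
    ET.fromstring("<image width=120/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)
```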

PCDATA

  • Parsed character data
  • Text between the start tag and the end tag of an XML element
  • Will be parsed by a parser
    • Tags inside the text will be treated as markup
    • Entities will be expanded
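
A short sketch of both effects, using an invented paragraph element: the entity references are expanded, and the `<b>` tag inside the text becomes a child element rather than literal characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose content is parsed character data.
para = ET.fromstring("<p>Fish &amp; chips cost &lt; 10 <b>today</b></p>")

text_before_b = para.text                    # entities are expanded here
child_tags = [child.tag for child in para]   # <b> was treated as markup

print(repr(text_before_b))
print(child_tags)
```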

CDATA

  • Character data
  • Will NOT be parsed by a parser
    • Tags NOT treated as markup
    • Entities NOT expanded
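
One place this behaviour can be seen is a CDATA section inside a document, which is what the sketch below assumes. The element name `code` and its content are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose content is wrapped in a CDATA section.
code = ET.fromstring("<code><![CDATA[&amp; <b>bold</b>]]></code>")

raw_text = code.text          # kept exactly as written
child_count = len(code)       # no child elements were created

print(repr(raw_text))
print(child_count)
```

Compared with the PCDATA case, the `&amp;` stays as five literal characters and `<b>` does not become an element.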

What is XSLT?

  • A language for transforming between XML vocabularies
  • Standardized by the World Wide Web Consortium
  • Integrated with .NET, Apache, IE, Java, Oracle, …
  • XSLT is a slightly new twist:
    • Not a scripting language
    • Not, strictly speaking, “open source”
  • But not radically different:
    • Typically interpreted
    • Open source implementations
    • Not vendor dominated

XSLT as a programming language

  • XSLT has
    • looping constructs
    • function calling
    • variables
    • parameters
    • math functions
    • module combination

XSLT summary

  • XSLT can be used to compute anything that can be computed
  • It is “Turing complete”
  • But:
    • Certain coding paradigms are VERY awkward
    • Non-textual Input/Output is typically not possible
  • Scripting languages are designed to be general purpose
  • “Modern” scripting languages go well beyond “scripting”
  • They are general purpose multi-paradigm languages
    • But XSLT wins for specificity
  • XSLT has deep native support for XML
  • “Built-in”, highly consistent parser
  • “XPath” XML navigation (“query”) language
  • Special syntax for working with elements
  • XSLT makes input-based recursion easy:
    • Sections within sections within …
    • Part descriptions within part descriptions …
  • XSLT automatically selects the right rule to go with the right element in the input document
  • XSLT keeps track of context: namespaces, current node, current node list etc.
  • Relevant contexts are in both the stylesheet and the document.
  • In most traditional programming languages, the programmer would have to track this context explicitly
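
For contrast, here is a rough sketch of the "sections within sections" case in a general-purpose language. This is Python, not XSLT, and the document and the function name `outline` are invented. The recursion, the choice of which children to visit, and the nesting depth are all written out by hand, which is the bookkeeping the slides say XSLT handles for you. The last lines use the XPath subset that `xml.etree.ElementTree` supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: sections nested within sections.
DOC_XML = """\
<section>
  <title>Aircraft</title>
  <section>
    <title>Wing</title>
    <section>
      <title>Flap</title>
    </section>
  </section>
  <section>
    <title>Tail</title>
  </section>
</section>
"""

def outline(section, depth=0):
    """Return indented titles for a section and all nested sections."""
    lines = ["  " * depth + section.findtext("title")]
    for child in section.findall("section"):      # direct children only
        lines.extend(outline(child, depth + 1))   # depth is tracked by hand
    return lines

root = ET.fromstring(DOC_XML)
result = outline(root)
print("\n".join(result))

# XPath-style query: every <title> at any depth, in document order.
all_titles = [title.text for title in root.findall(".//title")]
print(all_titles)
```
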
Reference: Eustace, K. (2004). Lecture 5: XML: eXtensible Markup Language [PowerPoint slides]. Retrieved from Charles Sturt University, Melbourne Study Centre

1 comment:

  1. XML (Extensible Markup Language) is a set of rules for encoding documents electronically. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards.
    XML’s design goals emphasize simplicity, generality, and usability over the Internet. Good article Kunal!
