External data representation and marshalling
External Data Representation (XDR) is a standard data serialization format, for uses such as computer network protocols. It allows data to be transferred between different kinds of computer systems.
Marshalling is the process of transforming the in-memory representation of an object into a data format suitable for storage or transmission. It is typically used when data must be moved between different parts of a computer program or from one program to another.
Three approaches to external data representation and marshalling are covered here:
- CORBA’s common data representation
- Java’s object serialization
- XML
01. CORBA’s common data representation
CORBA’s Common Data Representation (CDR) is concerned with an external representation of data that can be passed as arguments and results of remote invocations. CDR is the external data representation defined with CORBA 2.0. It can represent all of the data types that can be used as arguments and return values in remote invocations in CORBA. These consist of 15 primitive types, which include short (16-bit), long (32-bit), unsigned short, unsigned long, float (32-bit), double (64-bit), char, boolean (TRUE, FALSE), octet (8-bit), and any (which can represent any basic or constructed type), together with a range of composite types. Each argument or result in a remote invocation is represented by a sequence of bytes in the invocation or result message. The interfaces of objects are described in CORBA IDL, which is compiled by the CORBA interface compiler so that the marshalling and unmarshalling operations are generated automatically.
Marshalling in CORBA
Marshalling operations can be generated automatically from the specification of the types of data items to be transmitted in a message. The types of the data structures and the types of the basic data items are described in CORBA IDL, which provides a notation for describing the types of the arguments and results of RMI methods.
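To make the idea of "a sequence of bytes" concrete, here is a minimal hand-written sketch of what generated marshalling code might do for a hypothetical Person value with two strings and a 32-bit year. It is loosely modelled on CDR conventions (big-endian byte order, 4-byte alignment, strings sent as a length followed by the characters and a terminating NUL). The class and method names (CdrSketch, PersonData, marshalPerson, unmarshalPerson) are assumptions made for illustration, not part of any CORBA API, and a real ORB would generate this code from IDL and handle many more cases.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CdrSketch {
    /** Plain holder for the unmarshalled fields (hypothetical, for illustration). */
    static final class PersonData {
        final String name;
        final String place;
        final int year;

        PersonData(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Move the position up to the next multiple of `boundary`.
    // When writing, the skipped bytes stay zero because allocate() zero-fills.
    static void align(ByteBuffer buf, int boundary) {
        int rem = buf.position() % boundary;
        if (rem != 0) {
            buf.position(buf.position() + (boundary - rem));
        }
    }

    // Write a string as: aligned 32-bit length (counting the NUL), characters, NUL.
    static void putString(ByteBuffer buf, String s) {
        byte[] chars = s.getBytes(StandardCharsets.ISO_8859_1);
        align(buf, 4);
        buf.putInt(chars.length + 1);
        buf.put(chars);
        buf.put((byte) 0);
    }

    // Read a string written by putString.
    static String getString(ByteBuffer buf) {
        align(buf, 4);
        int length = buf.getInt();
        byte[] chars = new byte[length - 1];
        buf.get(chars);
        buf.get(); // skip the terminating NUL
        return new String(chars, StandardCharsets.ISO_8859_1);
    }

    // Flatten the three fields, in a fixed agreed order, into one byte sequence.
    static byte[] marshalPerson(String name, String place, int year) {
        // One byte per character in ISO-8859-1, plus lengths, NULs and padding.
        ByteBuffer buf = ByteBuffer.allocate(name.length() + place.length() + 24)
                                   .order(ByteOrder.BIG_ENDIAN);
        putString(buf, name);
        putString(buf, place);
        align(buf, 4);
        buf.putInt(year);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Rebuild the fields; the receiver must already know their order and types.
    static PersonData unmarshalPerson(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.BIG_ENDIAN);
        String name = getString(buf);
        String place = getString(buf);
        align(buf, 4);
        int year = buf.getInt();
        return new PersonData(name, place, year);
    }

    public static void main(String[] args) {
        byte[] message = marshalPerson("Smith", "London", 1984);
        System.out.println(message.length + " bytes"); // 28 bytes
        PersonData p = unmarshalPerson(message);
        System.out.println(p.name + ", " + p.place + ", " + p.year);
    }
}
```

Note that the message carries no field names or type tags: both sides rely on the shared type description to interpret the bytes, which is why this style of representation depends on an agreed interface definition.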
02. Java object serialization
Java provides a mechanism called object serialization, in which an object can be represented as a sequence of bytes that includes the object’s data as well as information about the object’s type and the types of data stored in the object.
After a serialized object has been written into a file, it can be read from the file and deserialized; that is, the type information and bytes that represent the object and its data can be used to recreate the object in memory.
Notably, the entire process is JVM independent, meaning an object can be serialized on one platform and deserialized on an entirely different platform.
In Java RMI, both objects and primitive data values may be passed as arguments and results of method invocations. An object is an instance of a Java class. For example, the Java class equivalent to the Person struct defined in CORBA IDL might be:
import java.io.Serializable;

public class Person implements Serializable {
    private String name;
    private String place;
    private int year;

    public Person(String aName, String aPlace, int aYear) {
        name = aName;
        place = aPlace;
        year = aYear;
    }
    // followed by methods for accessing the instance variables
}
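The round trip described above can be sketched with the standard ObjectOutputStream and ObjectInputStream classes. This is a minimal illustration rather than a definitive implementation: it serializes to an in-memory byte array instead of a file to stay self-contained, and the wrapper class SerializationDemo, its helper methods, and the sample values are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationDemo {
    /** Same shape as the Person class above, nested here to keep the example self-contained. */
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final String place;
        private final int year;

        Person(String aName, String aPlace, int aYear) {
            name = aName;
            place = aPlace;
            year = aYear;
        }

        String getName() { return name; }
        String getPlace() { return place; }
        int getYear() { return year; }
    }

    // Flatten an object, with its type information, into a byte sequence.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild an object from bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize", e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person("Smith", "London", 1984);
        byte[] data = serialize(original);
        Person copy = (Person) deserialize(data);
        System.out.println(copy.getName() + ", " + copy.getPlace() + ", " + copy.getYear());
    }
}
```

The deserialized copy is a new object with the same field values, not a reference to the original. Swapping the byte array streams for FileOutputStream and FileInputStream would give the file-based variant described in the text.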
03. Extensible Markup Language (XML)
XML is a markup language that was defined by the World Wide Web Consortium (W3C) for general use on the Web. In general, the term markup language refers to a textual encoding that represents both a text and details as to its structure or its appearance. Both XML and HTML were derived from SGML (Standard Generalized Markup Language), a very complex markup language. HTML was designed for defining the appearance of web pages. XML was designed for writing structured documents for the Web.
XML data items are tagged with ‘markup’ strings. The tags are used to describe the logical structure of the data and to associate attribute-value pairs with logical structures. That is, in XML, the tags relate to the structure of the text that they enclose, in contrast to HTML, in which the tags specify how a browser could display the text. For a specification of XML, see the pages on XML provided by W3C [www.w3.org VI].
XML is used to enable clients to communicate with web services and for defining the interfaces and other properties of web services. However, XML is also used in many other ways, including in archiving and retrieval systems. Although an XML archive may be larger than a binary one, it has the advantage of being readable on any computer.
Other uses of XML include the specification of user interfaces and the encoding of configuration files in operating systems.
XML is extensible in the sense that users can define their own tags, in contrast to HTML, which uses a fixed set of tags. However, if an XML document is intended to be used by more than one application, then the names of the tags must be agreed between them. For example, clients usually use SOAP messages to communicate with web services. SOAP is an XML format whose tags are published for use by web services and their clients.
Some external data representations (such as CORBA CDR) do not need to be self-describing, because it is assumed that the client and server exchanging a message have prior knowledge of the order and the types of the information it contains. However, XML was intended to be used by multiple applications for different purposes. The provision of tags, together with the use of namespaces to define the meaning of the tags, has made this possible. In addition, the use of tags enables an application to select just those parts of a document that it needs to process: it will not be affected by the addition of information relevant to other applications.
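The point about selecting only the needed parts of a document can be illustrated with the DOM parser that ships with the JDK. The following is a minimal sketch: the person document, its tag names, and the XmlSelectDemo class are assumptions made for illustration, and the nickname element stands in for information added for some other application.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XmlSelectDemo {
    // A self-describing record: every value is wrapped in a tag naming it.
    static final String PERSON_XML =
          "<person id=\"123456789\">"
        + "<name>Smith</name>"
        + "<place>London</place>"
        + "<year>1984</year>"
        + "<nickname>Smithy</nickname>" // extra element intended for another application
        + "</person>";

    // Return the text of the first element with the given tag, or null if absent.
    static String textOf(String xml, String tag) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = doc.getElementsByTagName(tag);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // This application only cares about name and year; other elements are ignored.
        System.out.println(textOf(PERSON_XML, "name"));
        System.out.println(textOf(PERSON_XML, "year"));
    }
}
```

Because the lookup is by tag name rather than by byte position, adding or reordering elements that this application does not use leaves its behaviour unchanged, in contrast to a positional format such as CDR.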