DSML Tools Design Document


Overview


Introduction


This product provides facilities for working with DSML and LDAP-supporting directories. DSML is Directory Services Markup Language, an XML dialect for describing directory information, including directory schema information.


Specifically, this set of tools will provide:


  1. Directory-contents to DSML conversion (LDAP2DSML), allowing export of any subset of a directory’s information as DSML

  2. DSML to Directory-modification-requests (DSML2LDAP), allowing modification of a directory based on the contents of a DSML file

  3. Diffs between two DSML files (DSMLDiff), producing the changes needed to turn the data in one file into the data in the other.


For a list of command-line switches available for each of these three programs, please see the Functional Specification.


Terms Used


  1. DSML – Directory Services Markup Language

  2. LDAP – Lightweight Directory Access Protocol, a protocol for directory access

  3. XML – eXtensible Markup Language, a meta-language

  4. DOM – Document Object Model, a World Wide Web Consortium recommendation for an API for manipulating trees

  5. DTD – Document Type Definition, a file which formally lays down the tagging rules for a given XML-based language. E.g. an <entry> tag contains only <objectclass> tags and <attr> tags.


References


Functional Specification:

DSMLToolsFS.doc

Netscape LDAP API

http://docs.iplanet.com/docs/manuals/dirsdk/jsdk41/Reference/

Apache Xerces API

http://xml.apache.org/apiDocs/




Architecture


These programs will be written in Java.


Libraries


We intend to use libraries from a number of different sources:


For command-line parsing, we are using GNU GetOpt. It is available under the GNU Lesser General Public License.


We need to interface with LDAP directories. For this, we will use the Netscape Java LDAP SDK – it provides LDAP client functionality, allowing connection to, and searching of, an LDAP directory. This is available under the Mozilla Public License.


We also need to parse and write DSML data. For this, we are using The Apache Group’s Xerces XML parser. It provides XML parsing capabilities, and contains an implementation of the DOM API for access to the parsed data. It is available under a BSD-style license.


Overview


In XML processing, the largest conceptual unit of data is the Document (org.w3c.dom.Document) object, made up of Node objects. This is a representation of an XML (in this case, DSML) document, as might be obtained from a Parser. Document is actually an interface; we are extending an implementation of Document to produce DSMLDocumentImpl, a class which adds validation of the DSML information stored within itself, and serializability. All Documents we create from scratch (for example, as a result of doing diffs) will be created using this class.


On the LDAP side, the Netscape LDAP libraries use LDAP-oriented objects, so we will have to convert these to a Document representation. To achieve this, we are implementing two classes, DSMLLDAPReader and DSMLLDAPWriter, which implement add/delete/search methods (like the Netscape API) that take or return Document objects containing DSML (instead of Netscape’s more directory-oriented structures). This allows us to present a clean interface to the LDAP directory. Internally, those classes will convert between the two representations.


Given the above, the main sections of code to be written are:

  1. Conversion of directory data from an XML to an LDAP representation, and back again

  2. Validation of DSML in a directory-oriented manner (e.g. entries have only those attributes permitted by their objectclass)

  3. Driving classes for all three tools – command-line handling, option setting, error checking.

  4. Transformation of a Document representation into a stream of valid DSML.

  5. Code to do a directory-oriented diff of two trees.


Validation


On many occasions, we need to make sure that DSML data is valid in a directory sense, i.e.:


  1. The schema only contains attributes referenced in one or more objectclasses

  2. Each entry has only objectclasses present in the schema

  3. Each entry has only the attributes permitted by its objectclasses

  4. Each entry has all the mandatory attributes for each of its objectclasses


To enforce this, we take the following actions on finding invalid pieces of DSML. Note that the following requires an understanding of the structure of DSML:


  1. <attribute-type> referenced in <class> but not present – die with error.

  2. <attribute-type> unreferenced by any <class> – remove the <attribute-type>.

  3. <oc-value> references a <class> which is not present – remove the <oc-value>.

  4. Required <attr> missing from <entry> – remove the <entry>.

  5. <attr> in <entry> not permitted by its objectclasses – remove the <attr>.
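
As an illustration of actions 3, 5 and 4 above (applied in that order), here is a minimal sketch rather than the planned validate() implementation. It makes several assumptions that are not part of this design: the DSML is written without a namespace prefix, each <class> lists its attributes as <attribute ref="#name" required="true|false"/> children, a class id equals the objectclass name used in <oc-value>, and names are compared case-sensitively. The class name DsmlValidateSketch and its helpers are hypothetical. Actions 1 and 2 (schema-level checks) are not covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: simplified, prefix-free DSML; names are compared case-sensitively.
class DsmlValidateSketch {

    /** Parses XML text with the JDK parser; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse DSML", e);
        }
    }

    /** Snapshots a live NodeList so that nodes can be removed while iterating. */
    static List<Element> elements(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add((Element) nodes.item(i));
        }
        return result;
    }

    /** Applies actions 3, 5 and 4 in that order; returns the number of nodes removed. */
    static int validate(Document doc) {
        Element root = doc.getDocumentElement();
        Map<String, Set<String>> allowed = new HashMap<>();
        Map<String, Set<String>> required = new HashMap<>();
        for (Element cls : elements(root, "class")) {
            Set<String> may = new HashSet<>();
            Set<String> must = new HashSet<>();
            for (Element a : elements(cls, "attribute")) {
                String name = a.getAttribute("ref").replaceFirst("^#", "");
                may.add(name);
                if ("true".equals(a.getAttribute("required"))) {
                    must.add(name);
                }
            }
            allowed.put(cls.getAttribute("id"), may);
            required.put(cls.getAttribute("id"), must);
        }
        int removed = 0;
        for (Element entry : elements(root, "entry")) {
            Set<String> may = new HashSet<>();
            Set<String> must = new HashSet<>();
            for (Element oc : elements(entry, "oc-value")) {
                String cls = oc.getTextContent().trim();
                if (!allowed.containsKey(cls)) {        // action 3: unknown objectclass
                    oc.getParentNode().removeChild(oc);
                    removed++;
                } else {
                    may.addAll(allowed.get(cls));
                    must.addAll(required.get(cls));
                }
            }
            Set<String> present = new HashSet<>();
            for (Element attr : elements(entry, "attr")) {
                String name = attr.getAttribute("name");
                if (!may.contains(name)) {              // action 5: attribute not permitted
                    attr.getParentNode().removeChild(attr);
                    removed++;
                } else {
                    present.add(name);
                }
            }
            if (!present.containsAll(must)) {           // action 4: required attribute missing
                entry.getParentNode().removeChild(entry);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<dsml><directory-schema><class id=\"person\">"
                + "<attribute ref=\"#cn\" required=\"true\"/></class></directory-schema>"
                + "<directory-entries><entry dn=\"cn=Fred,o=Example\">"
                + "<objectclass><oc-value>person</oc-value></objectclass>"
                + "<attr name=\"cn\"><value>Fred</value></attr>"
                + "<attr name=\"shoeSize\"><value>9</value></attr>"
                + "</entry></directory-entries></dsml>");
        System.out.println("nodes removed: " + validate(doc));
    }
}
```

Unknown objectclasses are removed first so that they contribute neither permitted nor required attributes to the later checks.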


Directory-level Diffing


Let A and B be the two input Documents. We create two new DSML Documents, called here DocAdd and DocDelete, which start with no schema and no entries. In each of A and B, go to the <directory-entries> container node and sort the <entry> nodes beneath it alphabetically by DN, then compare the two sorted lists. For each entry in A but not in B, move that node from A to DocDelete. For each entry in B but not in A, move that node from B to DocAdd. For each entry present in both, look for changes. If any are found, annotate the node in B with XML comments recording the old values, then move that node from B to DocAdd.
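
The procedure above can be sketched roughly as follows. This is an illustration under stated assumptions, not the planned DSMLDOMDiff code: it uses prefix-free DSML, matches DNs as exact strings (no DN normalization), copies nodes into DocAdd and DocDelete instead of moving them, and uses a sorted map in place of an explicit sort. The class DsmlDiffSketch, its helper names and the comment format are all hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: DNs are matched as exact strings, and entries are copied, not moved.
class DsmlDiffSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse DSML", e);
        }
    }

    /** Creates a DSML document with no schema and an empty entries container. */
    static Document emptyDsml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("dsml");
            root.appendChild(doc.createElement("directory-entries"));
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    /** Indexes the <entry> elements by DN; the TreeMap keeps them sorted by DN. */
    static Map<String, Element> entriesByDn(Document doc) {
        Map<String, Element> byDn = new TreeMap<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            byDn.put(entry.getAttribute("dn"), entry);
        }
        return byDn;
    }

    /** Copies an entry into the <directory-entries> container of a document from emptyDsml(). */
    static Element copyInto(Document target, Element entry) {
        Element container = (Element) target.getDocumentElement().getFirstChild();
        return (Element) container.appendChild(target.importNode(entry, true));
    }

    /** Renders the old attribute values as comment text. */
    static String oldValues(Element oldEntry) {
        StringBuilder text = new StringBuilder(" old values:");
        NodeList values = oldEntry.getElementsByTagName("value");
        for (int i = 0; i < values.getLength(); i++) {
            Element value = (Element) values.item(i);
            String name = ((Element) value.getParentNode()).getAttribute("name");
            text.append(' ').append(name).append('=').append(value.getTextContent()).append(';');
        }
        // "--" is not allowed inside an XML comment, so break it up.
        return text.append(' ').toString().replace("--", "- -");
    }

    /** Returns { DocAdd, DocDelete } describing how to turn A into B. */
    static Document[] diff(Document a, Document b) {
        Document docAdd = emptyDsml();
        Document docDelete = emptyDsml();
        Map<String, Element> inA = entriesByDn(a);
        Map<String, Element> inB = entriesByDn(b);
        for (Map.Entry<String, Element> e : inA.entrySet()) {
            if (!inB.containsKey(e.getKey())) {
                copyInto(docDelete, e.getValue());           // in A but not in B
            }
        }
        for (Map.Entry<String, Element> e : inB.entrySet()) {
            Element old = inA.get(e.getKey());
            if (old == null) {
                copyInto(docAdd, e.getValue());              // in B but not in A
            } else if (!old.isEqualNode(e.getValue())) {     // in both, but changed
                Element copy = copyInto(docAdd, e.getValue());
                copy.insertBefore(docAdd.createComment(oldValues(old)), copy.getFirstChild());
            }
        }
        return new Document[] { docAdd, docDelete };
    }

    public static void main(String[] args) {
        Document a = parse("<dsml><directory-entries>"
                + "<entry dn=\"cn=X,o=Example\"/></directory-entries></dsml>");
        Document b = parse("<dsml><directory-entries>"
                + "<entry dn=\"cn=Z,o=Example\"/></directory-entries></dsml>");
        Document[] result = diff(a, b);
        System.out.println("to add: " + entriesByDn(result[0]).keySet());
        System.out.println("to delete: " + entriesByDn(result[1]).keySet());
    }
}
```

Because isEqualNode compares whole subtrees, differences in whitespace-only text nodes would also count as changes in this sketch.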


Classes

Key library classes


org.w3c.dom.Document

This interface represents a complete XML Document, in the form of a tree of Nodes. Each Node can have any number of children, and there are several different types of Node, such as Element, Attribute or Text nodes. For example, <A href="fred.html">Fred</A> would be represented as an Element node for the <A> tag, with an associated Attribute node for the href attribute, and a child Text node for the value “Fred”. As you can see, the multiplicity of different node types (and the fact that Attribute nodes are not strictly part of the tree) makes Document trees fairly complex.


org.apache.xerces.parsers.DOMParser

This class is a parser which has the ability to produce a Document as output. We will be treating it mostly as a black box.


Data: a document representation of XML data

Methods: This class has many methods which we are not using, due to the way we are accessing the resultant DSML data. The key ones to be used are:

  1. void parse(InputSource) – parses the given InputSource

  2. Document getDocument() – gets the Document object from the parser
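
To show the parse-then-fetch-the-Document flow without requiring the Xerces jar, the following sketch uses the JDK's own javax.xml.parsers.DocumentBuilder as a stand-in for DOMParser. With Xerces, the two methods listed above would take the place of the single builder.parse call. The class name DsmlParseSketch and the sample DSML (written without a namespace prefix) are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch only: the JDK's JAXP parser stands in for Xerces' DOMParser.
class DsmlParseSketch {

    /** Parses DSML text into a DOM Document; checked exceptions are wrapped for brevity. */
    static Document parse(String dsml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // An InputSource can wrap a Reader, an InputStream or a system identifier.
            return builder.parse(new InputSource(new StringReader(dsml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse DSML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<dsml><directory-entries>"
                + "<entry dn=\"cn=Fred,o=Example\"/>"
                + "</directory-entries></dsml>");
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
        System.out.println("entries: " + doc.getElementsByTagName("entry").getLength());
    }
}
```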


gnu.getopt.Getopt

This class is a Java port of the C getopt function and behaves the same way: it implements a command-line parser that supports both short and long option names.


Data: Details of command-line options given.

Methods:

  1. int getopt() – returns the next option

  2. String getOptarg() – returns the argument to the last option retrieved



Internal classes


datcon.dsml.DSMLDocumentImpl extends org.apache.xerces.dom.DocumentImpl

We extend the DocumentImpl class (a class which implements the Document interface) to add a method validate() which checks the validity of the directory information (either data, schema or both). That is, it makes sure all entries have only the attributes permitted by their objectclasses, and so on. We also add a method serialize() which will write the DSML to a given stream.


Data: DSML Document

Methods:

  1. int validate(int mode) – checks the Document tree to make sure it is valid from a directory point of view

  2. static int validate(Document, int mode)

  3. void serialize(OutputStream) – turns the tree into a stream of DSML data to write to OutputStream

  4. static void serialize(OutputStream, Document)
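
One possible shape for the static serialize(OutputStream, Document) method is sketched below. It assumes the JDK's javax.xml.transform identity Transformer is an acceptable way to write a DOM tree; the real method may well use a Xerces serializer instead. The class name DsmlSerializeSketch, the sampleDocument() helper and the sample data are hypothetical.

```java
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch only: uses the JDK's identity Transformer to write the tree.
class DsmlSerializeSketch {

    /** Writes the Document to the stream as UTF-8 XML. */
    static void serialize(OutputStream out, Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    /** Builds a tiny one-entry document to serialize (illustrative data only). */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("dsml");
            Element entries = doc.createElement("directory-entries");
            Element entry = doc.createElement("entry");
            entry.setAttribute("dn", "cn=Fred,o=Example");
            entries.appendChild(entry);
            root.appendChild(entries);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    public static void main(String[] args) {
        serialize(System.out, sampleDocument());
    }
}
```

The instance method serialize(OutputStream) could then simply delegate to the static form, passing the document itself.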


datcon.dsml.DSMLLDAPWriter

Implements add/delete/modify methods to take Document objects containing DSML data. This class will have internal methods to convert from the DSML Document to the LDAP representation of directory data.


The schema reconciling algorithm for inserting DSML data into the directory will be as follows:

  1. Read in both schemas (the DSML schema and the LDAP schema).

  2. Search for each objectclass from the DSML schema in the LDAP schema. If a match, fine. If no match, remove from LDAP schema and print an error.

  3. Search for each attribute from the DSML schema in the LDAP schema. If a match, fine. If no match, remove from LDAP schema and print an error.

  4. For each entry in the DSML data, check that all of its objectclasses are still present in the (possibly modified) LDAP schema. For each one missing, remove it from the entry.

  5. For each attribute in the entry, check that it is valid for at least one objectclass. Remove any attributes that are invalid.

  6. Add the entry to the LDAP directory.

Steps 4 and 5 are equivalent to running a data-mode validate() on the remaining DSML data against the modified LDAP schema, so we can create a hybrid document of the two and validate() it to achieve those steps.


Data: LDAP server connection information (LDAPConnection object)

Methods:

  1. void connect(connect info) – connects to LDAP directory

  2. void add(Document) – adds directory info in the Document to the directory

  3. void delete(Document) – deletes directory info in the Document from the directory


datcon.dsml.DSMLLDAPReader

Implements a search method that returns a Document object containing DSML data. This class will have internal methods to convert from the LDAP to the DSML Document representation of directory data.


Data: LDAP server connection information (LDAPConnection object)

Methods:

  1. void connect(connect info) – connects to LDAP directory

  2. DSMLDocumentImpl search(options) – performs an LDAP search
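
The internal LDAP-to-DSML conversion could look roughly like the sketch below. To stay runnable without the Netscape SDK, a DN string plus a Map from attribute name to values stands in for the SDK's entry object; that stand-in, the class name LdapToDsmlSketch and its helpers are assumptions. The element layout (an <entry> carrying a dn attribute, an <objectclass> of <oc-value> children, and <attr name="..."> elements holding <value> children) is assumed to follow DSML 1.0, written here without a namespace prefix.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch only: a DN plus a Map of attributes stands in for the LDAP SDK's entry object.
class LdapToDsmlSketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    /** Appends one child element named tag per value, each holding the value as text. */
    static void appendValues(Document doc, Element parent, String tag, List<String> values) {
        for (String value : values) {
            Element child = doc.createElement(tag);
            child.appendChild(doc.createTextNode(value));
            parent.appendChild(child);
        }
    }

    /** Builds a DSML <entry> element (not yet attached to the tree) from LDAP-style data. */
    static Element toDsmlEntry(Document doc, String dn, Map<String, List<String>> attrs) {
        Element entry = doc.createElement("entry");
        entry.setAttribute("dn", dn);
        for (Map.Entry<String, List<String>> attribute : attrs.entrySet()) {
            if (attribute.getKey().equalsIgnoreCase("objectclass")) {
                // Objectclasses get their own container rather than an <attr>.
                Element objectclass = doc.createElement("objectclass");
                appendValues(doc, objectclass, "oc-value", attribute.getValue());
                entry.appendChild(objectclass);
            } else {
                Element attr = doc.createElement("attr");
                attr.setAttribute("name", attribute.getKey());
                appendValues(doc, attr, "value", attribute.getValue());
                entry.appendChild(attr);
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        Map<String, List<String>> attrs = new LinkedHashMap<>();
        attrs.put("objectclass", Arrays.asList("top", "person"));
        attrs.put("cn", Arrays.asList("Fred"));
        Element entry = toDsmlEntry(newDocument(), "cn=Fred,o=Example", attrs);
        System.out.println("objectclasses: " + entry.getElementsByTagName("oc-value").getLength());
    }
}
```

DSMLLDAPWriter would need the reverse mapping, from an <entry> element back to the SDK's entry representation.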


datcon.dsml.DSMLDOMDiff

Implements either the first or the second, depending on time, of the DOM TreeDiff algorithms, as set out above.


Methods:

  1. void diff(InDocs, OutDocs) – does the diff



End-user-accessible classes


These tie the above classes together. Because most of the complexity is encapsulated in the internal classes above, these classes are relatively simple. Each of them handles routine tasks such as command-line parsing and stores the options in member variables. These classes are also expected to request validation where necessary. Each has a main() method.


datcon.dsml.LDAP2DSML

Creates a DSMLLDAPReader, performs a search according to the given criteria, and serializes the resulting Document object.


datcon.dsml.DSML2LDAP

Creates a parser and uses it to parse the input DSML. It then creates a DSMLLDAPWriter and calls add/delete/modify (depending on parameters), passing it the Document object.


datcon.dsml.DSMLDiff

Creates a parser and uses it to parse the input DSML, creating two Document objects. These are then passed to a TreeDiff implementation – probably ours, but possibly IBM’s. Because IBM’s approach is slightly different from the one we would take, using their code will require a bit of extra work on the output. Either way, the result is two DSML files which allow the data in the first document to be transformed into the data in the second. This class will then serialize them.