SDMetrics is primarily a UML design quality measurement solution. It contains an XMI parser to read UML models from XMI files. The XMI parser has attracted some interest on its own. In this post, I’ll talk a bit about the reasoning behind the design decisions of the XMI parser, and discuss the consequences for practical applications other than UML design measurement.
I wrote the XMI parser as part of SDMetrics almost 10 years ago. Back then, XMI versions 1.0, 1.1, and 1.2 were in use. Meta modeling was apparently not yet well understood by UML tool vendors, because XMI files tended to be vastly incompatible between tools. I guess that’s what happens when standard specifications are difficult to understand, as often seems to be the case.
Anyway, for SDMetrics, I had the following requirements for the XMI parser:
- Must be able to deal with large XMI files, and process them quickly.
- Must deal with all XMI versions, and be able to iron out the “idiosyncrasies” of the various XMI exporters that hamper XMI interchange.
- Must be extensible by end users to adapt to new XMI exporters and their idiosyncrasies.
- Must be extensible by end users to extract any information they need from XMI files.
- Must be able to modify the meta model and XMI import configuration between model reloads at runtime. To keep things simple, on-the-fly Java byte code generation from meta models and class loader orgies were right out.
- Does not need to provide an exact 1:1 representation of the UML model; simplifications and approximations are OK for the purpose of design measurement.
For the overall design I more or less ignored the standard specifications (see above). Instead, I looked at as many XMI files from as many different sources as I could get my hands on: Where is the pertinent information stored in the files? How can it be extracted? From that point on, I just followed the path of least resistance to pull this data into Java objects in an orderly manner.
XSLT met almost all of the above requirements, but it turned out to be very slow and could not deal with larger input files at all. So, using a SAX parser was the next natural choice. The information about what to extract from the XMI files could not be hard-coded, but had to be driven by a configuration script. And since I was dealing with XML input files anyway, it was obvious to use XML for those scripts, too.
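To illustrate the idea of a SAX handler whose behavior is driven by configuration rather than hard-coded, here is a minimal sketch. It is not SDMetrics' actual code or API: the class name `ConfigDrivenExtractor`, the `rules` table, and the `extract` helper are all made up for this example, and the configuration is passed in as a plain Java map instead of being read from an XML script.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that only knows what to extract
// from the rule table it is given at construction time.
class ConfigDrivenExtractor extends DefaultHandler {
    // Assumed configuration format: XML element name -> attribute names to keep.
    private final Map<String, List<String>> rules;
    private final List<Map<String, String>> extracted = new ArrayList<>();

    ConfigDrivenExtractor(Map<String, List<String>> rules) {
        this.rules = rules;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        List<String> wanted = rules.get(qName);
        if (wanted == null) {
            return; // element is not mentioned in the configuration, skip it
        }
        Map<String, String> element = new LinkedHashMap<>();
        element.put("type", qName);
        for (String name : wanted) {
            String value = atts.getValue(name);
            if (value != null) {
                element.put(name, value);
            }
        }
        extracted.add(element);
    }

    List<Map<String, String>> getExtracted() {
        return extracted;
    }

    // Convenience helper: parse an XML string with the given rules.
    static List<Map<String, String>> extract(String xml, Map<String, List<String>> rules)
            throws Exception {
        // The default factory is not namespace-aware, so qName is the raw tag name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ConfigDrivenExtractor handler = new ConfigDrivenExtractor(rules);
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getExtracted();
    }
}
```

Because SAX streams through the input and the handler keeps only what the rules ask for, memory use depends on the extracted data rather than on the size of the file. Swapping the rule table changes what gets extracted without touching the Java code.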
The result is what you find documented here: an excruciatingly simple meta model and “XMI transformations” to pull the model information from XMI files. The design of the XMI parser hasn’t changed at all since its conception. The implementation hasn’t really changed much, either. The only major extension was to deal with XMI 2.0 serializations when UML 2.0 was released.
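To give a rough feel for what a very simple meta model that can be changed at runtime might look like, here is a sketch in which element types are plain data (a name plus a set of attribute names) rather than generated Java classes. This is my own illustration under that assumption; `SimpleMetaModel`, `ElementType`, and `ModelElement` are hypothetical names and not the classes SDMetrics ships.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a meta model held as data, so it can be
// redefined between model reloads without generating any classes.
class SimpleMetaModel {
    // An element type is just a name and the attribute names it allows.
    static final class ElementType {
        final String name;
        final Set<String> attributeNames;

        ElementType(String name, Set<String> attributeNames) {
            this.name = name;
            this.attributeNames = attributeNames;
        }
    }

    // A model element stores its attribute values in a map keyed by name.
    static final class ModelElement {
        final ElementType type;
        private final Map<String, String> values = new LinkedHashMap<>();

        ModelElement(ElementType type) {
            this.type = type;
        }

        void set(String attribute, String value) {
            if (!type.attributeNames.contains(attribute)) {
                throw new IllegalArgumentException(
                        "Type '" + type.name + "' has no attribute '" + attribute + "'");
            }
            values.put(attribute, value);
        }

        String get(String attribute) {
            return values.get(attribute);
        }
    }

    private final Map<String, ElementType> types = new LinkedHashMap<>();

    // Define (or redefine) an element type at runtime.
    void defineType(String name, String... attributeNames) {
        types.put(name, new ElementType(name,
                new LinkedHashSet<>(Arrays.asList(attributeNames))));
    }

    // Create an element of a previously defined type.
    ModelElement create(String typeName) {
        ElementType type = types.get(typeName);
        if (type == null) {
            throw new IllegalArgumentException("Unknown element type: " + typeName);
        }
        return new ModelElement(type);
    }
}
```

With something like this, adding a new element type or attribute is a matter of calling `defineType` again, for instance after re-reading a configuration file, which fits the requirement of changing the meta model between reloads without byte code generation.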
The parser meets all of the above requirements, especially in terms of speed, despite being written in Java :-). I had users report they were delighted that they could read XMI files of several hundred megabytes in a matter of minutes, where other tools ran for several hours, or simply died on them. The implementation is really tiny: 10 Java classes, 1400 LOC (twice that much with comments and blank lines). Add another 1000 lines of XML code for the scripts that drive the UML2 model import.
But of course there are also some limitations. The solution supports XMI import only, and only provides an approximate representation of the UML model. This limits, for example, its use for model transformations. XMI output is not supported at all, as the mapping from the official UML meta model to the simplified meta model is strictly one way. These simplifications during model import cannot be undone. Other outputs such as source code or database scripts can be generated, but the parser does not provide any dedicated support for this (e.g. some sort of templates) – you’d have to implement that part yourself from scratch.
So, SDMetrics’ XMI parser is certainly no replacement for fully MOF-compliant meta modeling facilities such as EMF. If you require detailed model-to-model transformations, and only need to support one or a few XMI sources, something like EMF is clearly the tool of choice.
But if you need to flexibly deal with a large variety of XMI sources, can live with a subset of the UML, and want to be really fast, SDMetrics’ XMI parser is definitely worth a look. It is open-source, AGPL-licensed, and available here.