Target Language Support in ArgoUML

The purpose of drawing UML diagrams with ArgoUML is to design something abstractly. This 'something' might be a business process or a software system. If the user (a development team or a single programmer) has a software system in mind, then the final output of the development process is (mainly) a collection of source code files. (Build scripts, documentation and installation routines are also important, but are not covered here.) Since UML is a formal notation with machine-processable syntax and semantics, it is desirable to provide a mapping between the model and the source code.

There are many ways to implement this mapping, as the different UML modeling tools available show. The strategy in this area has to be chosen carefully, because it is well known that many approaches have not been accepted by developers. They sometimes judge such a mapping to be practically useless, if not disturbing, and conclude that a tool-based connection between source code and model should be avoided on principle.

The Vision

1. Be independent of development process methodologies

Each development team follows its own development process. It might be modeled on one of the well-known ones (RUP, Waterfall, XP), but it need not be. ArgoUML allows the process to be chosen freely. For example, we should not assume that modeling happens before coding. Sometimes developers concentrate on coding only and are not willing to 'feed' their UML modeling tool for a couple of weeks. Although UML freaks and modeling purists might cry out that this is a no-no, it should be allowed without making the model useless.

2. Make the existence of source code optional

Some people will use ArgoUML as a design tool only (although we have such nice target language support, you know...), and they don't care about source code. Or they don't even design a software system at all. UML is not aimed only at software design, so let's not bother those users with unnecessary source-code-related settings.

3. Be cognitive and transparent

ArgoUML was designed as a cognitive, easy-to-use tool, and this also applies to dealing with target source code. The user will always know what this UML tool does with his/her sources. Everything should be easy to understand, which means that every step results in a meaningful tool response (e.g. messages). Decisions are made by the user, and ArgoUML presents the options in a clear way.

4. Be powerful

In a UML-based development process, changes can occur in two places: either in the model (not only in the static part, but also in the dynamic part!) or in the source files. Whenever such a change requires synchronization, the user should at least be guided through this step. Ideally, this should happen automatically if no decisions have to be made. ArgoUML has full control when a single model element is changed. In this case, source code synchronization could be invoked immediately; it is still not easy, since all source files might be affected. However, provided this is well implemented, ArgoUML is also a powerful refactoring tool. Changes in the source files are more difficult to handle, because other applications (an editor or an IDE) can modify source files. ArgoUML therefore examines the time stamps of all files.
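
The time stamp check can be sketched as follows. This is only a minimal illustration under stated assumptions, not ArgoUML code: the class SourceFileWatch and its methods are hypothetical names, and the sketch assumes that remembering one time stamp per file is enough to notice that another application has touched it. By default it reads the time stamp with the standard java.io.File.lastModified() call; the constructor that takes a function exists only so that the time stamp source can be replaced.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical helper: remembers time stamps and reports files changed since then. */
class SourceFileWatch {
    private final Map<File, Long> knownTimestamps = new LinkedHashMap<>();
    private final ToLongFunction<File> timestampOf;

    /** Uses the file system's last-modified time. */
    SourceFileWatch() {
        this(File::lastModified);
    }

    /** Allows a different time stamp source (an assumption made for this sketch). */
    SourceFileWatch(ToLongFunction<File> timestampOf) {
        this.timestampOf = timestampOf;
    }

    /** Record the current time stamp of a file that is in sync with the model. */
    void markSynchronized(File file) {
        knownTimestamps.put(file, timestampOf.applyAsLong(file));
    }

    /** Return all tracked files whose time stamp differs from the recorded one. */
    List<File> findModified() {
        List<File> modified = new ArrayList<>();
        for (Map.Entry<File, Long> entry : knownTimestamps.entrySet()) {
            if (timestampOf.applyAsLong(entry.getKey()) != entry.getValue()) {
                modified.add(entry.getKey());
            }
        }
        return modified;
    }
}
```

A tool built this way would call markSynchronized after every successful synchronization and findModified before touching any source file, so that externally edited files are handed to the user instead of being overwritten.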

5. Don't mess with UML-refusing coders

In a development team, not everybody is necessarily a fan of the UML notation. This implies that the sources might evolve without any model changes. This will be detected and properly handled (instead of prohibited). A feature is required that under all circumstances prevents the model from being irreparably invalidated. A report of 'differences' between model and source code (this is not a precise formulation) will be generated, and a GUI for resolving them will be presented. Not much automation should be expected here, but a lot of help.

6. Be a composition of optional subcomponents

ArgoUML is not meant to support just one fixed target language (historically, only Java was supported). It is extensible to allow UML design in any OO domain. For the sake of extensibility, the complex language support that meets the requirements described above cannot be implemented as a monolithic feature monster. Instead, the functionality is divided into subcomponents. Each of them can be implemented separately and is useful even if the others are missing. In this way, target language support can be achieved in smaller steps, which makes it simpler to implement.

Three subprojects

ArgoUML 0.10.1 is already capable of both reading and writing Java source code. Of course, it also reads and writes UML models; Novosoft's NSUML 0.4.19 provides the internal representation of the model. It is the technology that makes ArgoUML 'understand' UML. On the other hand, the technology for 'understanding' Java source code is a parser generated with Antlr 2.7.1 from a Java grammar. The parser identifies all Java language constructs, from the overall structure down to the atomic expressions. This enables us to build all desired features and implement the vision presented above.

Next we will identify a few subprojects that build upon that technology. The criteria for them are both useful functionality in the sense of our vision and being helpful (reusable) in the other subprojects. In fact, they are meant to form a roadmap for the specification and implementation of the target language support in ArgoUML. We'll concentrate on Java as the target language, without forgetting that the approach must be easily applicable to other languages.

1. Refactoring of existing CG/RE

ArgoUML 0.10.1 contains two parsers that solve different tasks. One is for reverse engineering (RE): it identifies the code pieces that need to be represented in the model (e.g. classes, operations, generalizations), collects relevant data and then calls appropriate model-manipulating methods (e.g. addOperation(...), passing the method body). After a single pass over a source code file, all model elements for the static structure model are generated or, if they existed before, updated. (Associations are an exception; they need a second pass for algorithmic reasons.) The other Java parser resides in code generation (CG), where it collects all pieces of code that can be associated with a model element, in order to replace them with code generated from the model. These two Java parsers need to be merged into one that is capable of solving both tasks, and whatever else is needed.

Another task is to reorganize the implementation of the existing CG/RE functionality, such that subtasks are done by separate components, and such that extensibility and maintainability are optimized. This is important for supporting target languages other than Java. Also, sophisticated 'glue' is needed between those components, in the sense that the components are optional and any combination of existing/nonexisting components results in useful behaviour. Of course, these components will be ArgoUML modules.
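
One possible way to let a single parser pass serve both tasks is a shared callback interface. The sketch below only illustrates that idea and is not taken from the ArgoUML sources: ParseListener, ModelUpdater, CodePieceCollector and ListenerChain are hypothetical names, the parser itself is left out, and only the 'operation found' event is modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback interface that one merged Java parser could drive. */
interface ParseListener {
    void operationFound(String className, String operationName, String body);
}

/** RE side: records operations per class, standing in for calls like addOperation(...). */
class ModelUpdater implements ParseListener {
    final Map<String, List<String>> operationsByClass = new LinkedHashMap<>();

    public void operationFound(String className, String operationName, String body) {
        operationsByClass.computeIfAbsent(className, k -> new ArrayList<>())
                         .add(operationName);
    }
}

/** CG side: remembers the code piece that belongs to each model element. */
class CodePieceCollector implements ParseListener {
    final Map<String, String> bodyByOperation = new LinkedHashMap<>();

    public void operationFound(String className, String operationName, String body) {
        bodyByOperation.put(className + "." + operationName, body);
    }
}

/** Forwards every parser event to all registered listeners in one pass. */
class ListenerChain implements ParseListener {
    private final List<ParseListener> listeners = new ArrayList<>();

    ListenerChain add(ParseListener listener) {
        listeners.add(listener);
        return this;
    }

    public void operationFound(String className, String operationName, String body) {
        for (ParseListener listener : listeners) {
            listener.operationFound(className, operationName, body);
        }
    }
}
```

With such a chain, either side can be left out without changing the parser, which fits the requirement that components remain optional.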

...

2. Model Code Synchronizer

  • diff analyzer
  • GUI for the diff results, providing controls for synchronization
  • synchronization actions
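
As a rough idea of what the diff analyzer might report, the following sketch compares the element names known to the model with those found in the sources. It is a deliberately simplified assumption: DiffAnalyzer is a hypothetical name, elements are identified by plain strings such as "A.clip", and changed (as opposed to added or missing) elements are not detected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical diff analyzer working on element names only. */
class DiffAnalyzer {
    /** Lists elements present on one side only, sorted by name within each group. */
    static List<String> diff(Set<String> modelElements, Set<String> sourceElements) {
        List<String> report = new ArrayList<>();
        for (String name : new TreeSet<>(modelElements)) {
            if (!sourceElements.contains(name)) {
                report.add("only in model: " + name);
            }
        }
        for (String name : new TreeSet<>(sourceElements)) {
            if (!modelElements.contains(name)) {
                report.add("only in source: " + name);
            }
        }
        return report;
    }
}
```

Each report entry would then become one row in the GUI, with the synchronization actions offered per row.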

3. Model listener (requires new event model in ArgoUML)

A Few Examples

Let us consider the following two classes:

     +-----------------------+         +---------------------------------+
     |           A           |         |                B                |
     +-----------------------+         +---------------------------------+
     |- ok : boolean = false |-------->|- minValue : int = -1000         |
     |- x : int              |     blob|- maxValue : int = 1000          |
     +-----------------------+         +---------------------------------+
     |+ A(value : int) : void|         |+ B(min : int, max : int) : void |
     |             <<create>>|         |                       <<create>>|
     |+ checkRange() : void  |         |+ getMinValue() : int            |
     |+ clip() : void        |         |+ getMaxValue() : int            |
     +-----------------------+         |+ checkRangeOf(x : int) : boolean|
                                       +---------------------------------+

The corresponding source code files are the following:

    // File a/package_/A.java
    package a.package_;
    import another.package_.B;
    public class A {
      private boolean ok = false;
      private int x;
      private B blob;  // implements the association from A to B
      public A(int value) {
        x = value;
        blob = new B(10,20);
      }
      public void checkRange() {
        ok = blob.checkRangeOf(x);
      }
      public void clip() {
        if (ok)
          return;
        if (x < blob.getMinValue())
          x = blob.getMinValue();
      }
    }

    // File another/package_/B.java
    package another.package_;
    public class B {
      private int minValue = -1000;
      private int maxValue = 1000;
      public B(int min, int max) {
        minValue = min;
        maxValue = max;
      }
      public int getMinValue() { return minValue; }
      public int getMaxValue() { return maxValue; }
      public boolean checkRangeOf(int x) {
        return x >= minValue && x <= maxValue;
      }
    }

1. Model: adding something new

If a static element like a class, an association or an operation is added, then the source code changes are simple. A new class/interface results in a new file, an attribute/operation is a local change in a source file, the same holds for a derivation, and an association changes up to two files (depending on navigability). So, these simple cases are less interesting than the following ones.
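
The dependence on navigability can be made concrete with a small sketch. It assumes the usual Java mapping in which the class that can navigate to the other end holds a field referring to it (like the blob field in class A above) and that every class lives in a file named after it; AssociationImpact and affectedFiles are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: which source files change when an association is added. */
class AssociationImpact {
    /**
     * @param endA          name of the class at the first end
     * @param navigableToA  true if the second end can navigate to the first
     * @param endB          name of the class at the second end
     * @param navigableToB  true if the first end can navigate to the second
     */
    static List<String> affectedFiles(String endA, boolean navigableToA,
                                      String endB, boolean navigableToB) {
        List<String> files = new ArrayList<>();
        // A class that can navigate to the other end needs a field referring to it.
        if (navigableToB) {
            files.add(endA + ".java");
        }
        if (navigableToA) {
            files.add(endB + ".java");
        }
        return files;
    }
}
```

For the association in the example, which is navigable only from A to B, affectedFiles("A", false, "B", true) yields just A.java; a bidirectional association would yield both files.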

[Any comments? Maybe I'm wrong...]

Things might become difficult if something is added to the dynamic model.

2. Model: deleting elements

This is more difficult since all references to the deleted model element must be considered, too. Two steps must be performed:

First, search all references to this element in

----------------Draft-Notes-------------------

Target language support consists of the following pluggable modules:

1. diagram notation
2. model element generator
3. source code file parser

ArgoUML behaves reasonably with any combination of provided/missing modules.
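
How 'reasonable behaviour' for any combination of modules might look is sketched below. All names are hypothetical (Notation, ElementGenerator, FileParser, TargetLanguageSupport), a missing module is represented by null, and the rule that synchronization needs both the generator and the parser is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interfaces for the three pluggable modules. */
interface Notation { String render(String modelElement); }
interface ElementGenerator { String generate(String modelElement); }
interface FileParser { List<String> parse(String fileName); }

/** Offers only those features whose required modules are present. */
class TargetLanguageSupport {
    private final Notation notation;          // null if the module is missing
    private final ElementGenerator generator; // null if the module is missing
    private final FileParser parser;          // null if the module is missing

    TargetLanguageSupport(Notation notation, ElementGenerator generator, FileParser parser) {
        this.notation = notation;
        this.generator = generator;
        this.parser = parser;
    }

    List<String> availableFeatures() {
        List<String> features = new ArrayList<>();
        if (notation != null) {
            features.add("language-specific notation");
        }
        if (generator != null) {
            features.add("code generation");
        }
        if (parser != null) {
            features.add("source import");
        }
        // Assumption: synchronization needs both directions to be available.
        if (generator != null && parser != null) {
            features.add("model/code synchronization");
        }
        return features;
    }
}
```

With no modules installed the feature list is simply empty, so a user who only draws diagrams is never confronted with source-code-related functions.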

