The interfaces in UIMA have seemed rather obtuse recently.
To process a document, an analysis module is given a CAS (Common Analysis Structure) and returns a CAS. The CAS contains a reference to the original document and any annotations that have been made so far. A module can ask the CAS for the original document, perform its analysis, and add its annotations back into the CAS. At the end of the process, the UIMA framework returns a CAS object that can be inspected for the annotations and their properties.
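To make that flow concrete, here is a minimal toy model in Java. Every class in it (SimpleCas, Annotation, AnalysisModule, TokenModule, AcronymModule, Framework) is a hypothetical stand-in invented for illustration, not part of the UIMA API. It only assumes what the paragraph above says: a CAS holds the document plus the annotations made so far, each module takes a CAS and hands one back, and annotations are created for a particular CAS by passing it to their constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-in for a CAS: the original document plus the annotations so far.
// Hypothetical class for illustration only; this is not UIMA's CAS interface.
final class SimpleCas {
    private final String document;
    private final List<Annotation> annotations = new ArrayList<>();

    SimpleCas(String document) { this.document = document; }

    String getDocument() { return document; }

    void add(Annotation a) { annotations.add(a); }

    // Returns a copy, so a module can iterate while adding new annotations.
    List<Annotation> getAnnotations() { return new ArrayList<>(annotations); }
}

// A typed span over the document. The CAS is passed to the constructor.
final class Annotation {
    final SimpleCas cas;
    final String type;
    final int begin, end;

    Annotation(SimpleCas cas, String type, int begin, int end) {
        this.cas = cas;
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    String coveredText() { return cas.getDocument().substring(begin, end); }

    // Registers this annotation with the CAS it was created for.
    void addToCas() { cas.add(this); }
}

// An analysis module receives a CAS and returns a CAS.
interface AnalysisModule {
    SimpleCas process(SimpleCas cas);
}

// Hypothetical module: marks every whitespace-separated token.
final class TokenModule implements AnalysisModule {
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    public SimpleCas process(SimpleCas cas) {
        Matcher m = TOKEN.matcher(cas.getDocument());
        while (m.find()) {
            new Annotation(cas, "Token", m.start(), m.end()).addToCas();
        }
        return cas;
    }
}

// Hypothetical module: builds on earlier annotations, marking all-caps tokens.
final class AcronymModule implements AnalysisModule {
    public SimpleCas process(SimpleCas cas) {
        for (Annotation a : cas.getAnnotations()) {
            String text = a.coveredText();
            boolean allCaps = text.equals(text.toUpperCase(Locale.ROOT))
                    && !text.equals(text.toLowerCase(Locale.ROOT));
            if (a.type.equals("Token") && allCaps) {
                new Annotation(cas, "Acronym", a.begin, a.end).addToCas();
            }
        }
        return cas;
    }
}

// The "framework": runs each module in turn and returns the final CAS.
final class Framework {
    static SimpleCas run(String document, List<AnalysisModule> modules) {
        SimpleCas cas = new SimpleCas(document);
        for (AnalysisModule module : modules) {
            cas = module.process(cas);
        }
        return cas;
    }
}
```

Running Framework.run("UIMA passes a CAS", List.of(new TokenModule(), new AcronymModule())) yields four Token annotations followed by two Acronym annotations (UIMA and CAS). The second module only works because the first module's annotations are already in the CAS it receives.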
Adding annotations to a CAS is easy for a module. An instance of the appropriate Annotation subclass is created for the current CAS (the CAS object is passed as a parameter to the Annotation's constructor). It is also easy to read these annotations after UIMA has finished. However, the CAS objects in each case are accessed through different classes, with different interfaces. In an analysis module I have a JCas object, but the UIMA run returns its results as a TCAS object instead.
TCAS is actually just a specialization of a standard CAS for handling text, but I didn't see that immediately (I should have paid closer attention).
A JCas object has a method called getCAS(), and a CAS has a getJCas() method, as these objects have a 1:1 correspondence. Why have two objects? I think it's to separate the CAS concepts from the Java-specific information needed to manipulate a CAS. But don't quote me.
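If that guess is right, the relationship can be pictured as a pair of objects that each hold a reference to the other. The sketch below is a hypothetical toy model: ToyCas and ToyJCas are invented names, not UIMA classes, and nothing here describes how UIMA actually implements the pairing. It only mirrors the getCAS()/getJCas() round trip described above, with the Java-side object created lazily so there is exactly one per CAS.

```java
// Hypothetical toy model of a 1:1 CAS/JCas pairing; not UIMA's implementation.
final class ToyCas {
    private final String documentText;
    private ToyJCas jcas; // created on first request, then reused

    ToyCas(String documentText) { this.documentText = documentText; }

    String getDocumentText() { return documentText; }

    // Always hands back the same Java-side view for this CAS.
    ToyJCas getJCas() {
        if (jcas == null) {
            jcas = new ToyJCas(this);
        }
        return jcas;
    }
}

// Java-specific view: holds no document data of its own, just delegates to its CAS.
final class ToyJCas {
    private final ToyCas cas;

    // Package-private: callers are expected to obtain a view via ToyCas.getJCas().
    ToyJCas(ToyCas cas) { this.cas = cas; }

    ToyCas getCAS() { return cas; }

    String getDocumentText() { return cas.getDocumentText(); }
}
```

In this model cas.getJCas().getCAS() returns the original cas, and repeated getJCas() calls return the same view, so code holding either object can always reach the other. The lazy initialization is not thread-safe; it is only there to show the round trip.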
Separating these two classes and providing different types in different circumstances has tripped me up a bit. Maybe someone can explain the reasoning, but so far I've just found it annoying.
Monday, January 09, 2006