Semantics and Data Modeling: Modeling Workflows Semantically

Workflows are essential parts of nearly all document-centric systems, to the extent that many applications tend to encode workflows directly into the operational logic of the application itself. However, over the years I've found that it makes a great deal of sense to look at "soft" workflows - where the specific actions that a document can take at any time can be modeled as a graph.

A standard publishing workflow provides a good example of what such a graph looks like:

Publishing State Diagram

One of the first things that becomes evident with such a graph is that a workflow is very seldom a linear sequence of items. Instead, you can think of such a graph as a combination of states (such as New, Editable or Published) that identify a particular view (possibly editable) that a document is in, along with a set of actions (Create, Edit, Review, Publish, etc.) that connect these states.

The actions are links that connect nodes, and it's entirely possible for a single node to have multiple outbound and inbound links, and even re-entrant links, as is the case for the Editable State and the Edit Action. However, it is precisely the presence of these referential links that makes describing such workflows problematic in languages such as XML ... and makes RDF ideal for the same task.

From a conceptual standpoint, each state, action and workflow "term" should be considered unique. From an RDF standpoint, each action is a predicate that joins a subject node (the starting state) and an object node (the ending state). Consequently, you could describe the state diagram via a series of RDF triples:

<State:New> <Action:Edit> <State:Editable>.

<State:New> <Action:Review> <State:Approval>.

<State:Approval> <Action:Publish> <State:Published>.

<State:Approval> <Action:Edit> <State:Editable>.

<State:Editable> <Action:Review> <State:Approval>.

<State:Editable> <Action:Edit> <State:Editable>.

...
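
To make the graph shape concrete outside of a triple store, here is a minimal Python sketch that holds the six transitions listed above as plain tuples. This is only an illustration: the tuple representation and the helper names outbound and is_reentrant are my own assumptions, not part of any RDF library, and the Editable state name follows the text of this post.

```python
# Transitions from the state diagram as (start state, action, end state) tuples.
# A plain-Python stand-in for a triple store; the names mirror the post's terms.
TRANSITIONS = {
    ("State:New", "Action:Edit", "State:Editable"),
    ("State:New", "Action:Review", "State:Approval"),
    ("State:Approval", "Action:Publish", "State:Published"),
    ("State:Approval", "Action:Edit", "State:Editable"),
    ("State:Editable", "Action:Review", "State:Approval"),
    ("State:Editable", "Action:Edit", "State:Editable"),
}


def outbound(state):
    """Return {action: end_state} for every link leaving `state` (hypothetical helper)."""
    return {action: end for start, action, end in TRANSITIONS if start == state}


def is_reentrant(state):
    """True when some action leads from `state` back to itself (hypothetical helper)."""
    return any(start == state == end for start, _, end in TRANSITIONS)
```

Calling outbound("State:Approval") returns both the Publish and Edit links, which shows why the workflow is a graph rather than a linear sequence.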

Each term or resource identified above will carry additional information. The set of all states and actions within a given workflow should be identified as being part of that workflow, and the class of a given state or action should also be identified. For instance, the following identifies a given workflow, binds the Approval State to that workflow, and provides a label for that state:

<Workflow:Publishing> <rdf:type> <Class:Workflow>.

<State:Approval> <rdf:type> <Class:State>.
<State:Approval> <State:Workflow> <Workflow:Publishing>.
<State:Approval> <rdfs:label> "Document Review".

Similarly, each action can also be defined:

<Action:Publish> <rdf:type> <Class:Action>.

<Action:Publish> <Action:Workflow> <Workflow:Publishing>.

<Action:Publish> <rdfs:label> "Publish".

Finally, a helper SPARQL update statement can make the relationship between actions and states explicit at the class level.

insert {?startState <State:HasAction> ?action} where
{
?startState ?action ?endState.
?startState <rdf:type> <Class:State>.
?endState <rdf:type> <Class:State>.

}

For the Approval State, this will add the following:

<State:Approval> <State:HasAction> <Action:Publish>.

<State:Approval> <State:HasAction> <Action:Edit>.

This makes it possible, for a given state, to determine what particular actions are available on that state.
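
The same derivation can be sketched in plain Python to show what the update is doing: find every triple whose subject and object are both typed as states, and record its predicate as an available action. The set-of-tuples graph and the function names add_has_action and actions_for are assumptions made for this illustration only.

```python
TYPE = "rdf:type"

# A small slice of the workflow graph as (subject, predicate, object) tuples.
triples = {
    ("State:Approval", TYPE, "Class:State"),
    ("State:Editable", TYPE, "Class:State"),
    ("State:Published", TYPE, "Class:State"),
    ("State:Approval", "Action:Publish", "State:Published"),
    ("State:Approval", "Action:Edit", "State:Editable"),
    ("State:Approval", "State:Workflow", "Workflow:Publishing"),
}


def add_has_action(graph):
    """Return the graph plus a State:HasAction triple for each state-to-state link."""
    states = {s for s, p, o in graph if p == TYPE and o == "Class:State"}
    derived = {
        (s, "State:HasAction", p)
        for s, p, o in graph
        if s in states and o in states  # both ends must be states, as in the WHERE clause
    }
    return graph | derived


def actions_for(graph, state):
    """List the actions recorded for `state`, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == state and p == "State:HasAction")
```

Note that the State:Workflow triple is skipped because its object is a workflow, not a state, which is the same job the two rdf:type patterns do in the SPARQL version.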

While this describes the particular states, it's important to understand that a given document or resource participates in a workflow by being bound to a given state (with the workflow itself typically bound to the class of the document). For instance, consider a blog entry called "My First Blog Post", with identifier <BlogPost:MyFirstBlogPost>. The document would be bound to its state, and its class to a workflow, using the properties <WorkflowDoc:WorkflowState> and <WorkflowDoc:Workflow> respectively:

<BlogPost:MyFirstBlogPost> <rdf:type> <Class:BlogPost>;
<WorkflowDoc:WorkflowState> <State:Approval>.

<Class:BlogPost> <WorkflowDoc:Workflow> <Workflow:Publishing>.

I'm using the WorkflowDoc: namespace here as a base type for all documents that participate in workflows in the system, with the assumption that the BlogPost class is a subclass:

<Class:BlogPost> <rdfs:subClassOf> <Class:WorkflowDoc>.

Note that if $doc and $action are the terms for a specific document and action, you can set the next state as follows:

delete {$doc <WorkflowDoc:WorkflowState> ?oldState}
insert {$doc <WorkflowDoc:WorkflowState> ?newState}
where
{
$doc <WorkflowDoc:WorkflowState> ?oldState.
?oldState $action ?newState.
}

(The DELETE INSERT construct is especially useful for replacing old property assertions with new ones, as both the DELETE and INSERT clause use the variables that are in scope in the WHERE clause.)
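
As a rough analogy for what that update does, the following Python sketch replaces a document's state only when the current state actually has the requested action. The dictionaries and the apply_action function are hypothetical stand-ins for the triple store and the SPARQL update, not an implementation of either.

```python
def apply_action(doc_states, transitions, doc, action):
    """Move `doc` to its next state if `action` is defined on its current state.

    doc_states:  {document id: current state}
    transitions: {(state, action): next state}
    Returns True when the state was replaced, False when nothing matched.
    """
    old_state = doc_states.get(doc)
    new_state = transitions.get((old_state, action))
    if new_state is None:
        # Mirrors a WHERE clause with no solutions: nothing is deleted or inserted.
        return False
    doc_states[doc] = new_state  # the old binding is dropped and the new one added
    return True


transitions = {
    ("State:Approval", "Action:Publish"): "State:Published",
    ("State:Approval", "Action:Edit"): "State:Editable",
}
doc_states = {"BlogPost:MyFirstBlogPost": "State:Approval"}
```

With this data, applying Action:Publish moves the post to State:Published, while applying it a second time leaves the state alone because Published has no Publish action in the transitions shown.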

This is a fairly simplified model, and doesn't take into account user or group permissions, which in general make use of a surprisingly similar model. I'll be covering that in the next posting.

The publishing workflow described here is pretty standard, but it should be noted that any workflow can be described in a similar manner, from wizards to complex business orchestration. In the latter situation, where the action is initiated by a system event rather than a user one, the action taken may very well be determined by testing against query conditions. Unless your SPARQL database supports triggers, this will likely be initiated externally, for example by a call from an XQuery engine that retrieves the items in a given state that also satisfy additional conditions, such as the following:

# for the $action <Action:Promote>, this will change the state
# of the document to <State:Promoted> if this action is supported
# in the workflow and if the blog post has more than three "favorites"
# on the post itself

delete {?doc <WorkflowDoc:WorkflowState> ?oldState}
insert {?doc <WorkflowDoc:WorkflowState> ?newState}
where
{
$action <owl:sameAs> <Action:Promote>.
?doc <WorkflowDoc:WorkflowState> ?oldState.
?oldState $action ?newState.
?doc <BlogPost:numFavorites> ?numFavorites.
FILTER (?numFavorites > 3)
}
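
For readers who want to see the selection logic on its own, here is a small Python sketch of the same conditional transition. The document records, the promote_popular function, and the assumption that Promote leads from State:Published to State:Promoted are all hypothetical; the post does not say which state Promote starts from.

```python
def promote_popular(docs, transitions, action="Action:Promote", min_favorites=3):
    """Apply `action` to every document whose state supports it and whose
    favorite count is strictly greater than `min_favorites`.

    docs:        {document id: {"state": ..., "numFavorites": ...}}
    transitions: {(state, action): next state}
    Returns the ids of the documents that changed state.
    """
    changed = []
    for doc_id, doc in docs.items():
        new_state = transitions.get((doc["state"], action))
        if new_state is not None and doc["numFavorites"] > min_favorites:
            doc["state"] = new_state
            changed.append(doc_id)
    return changed


# Assumed for illustration: Promote is only defined on published documents.
transitions = {("State:Published", "Action:Promote"): "State:Promoted"}
```

A post with exactly three favorites is left untouched, matching the strict comparison in the FILTER.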

The <owl:sameAs> predicate, by the way, is immensely useful for switches and conditions with parameters, as illustrated above. It provides an identity relationship, and for this pattern to work every term should have an assertion of the form:

<foo:bar> <owl:sameAs> <foo:bar>.

One additional consideration is indicating where a given document starts and terminates within a workflow. This can be established on the workflow object itself:

<Workflow:Publishing> <Workflow:StartState> <State:New>.

<Workflow:Publishing> <Workflow:EndState> <State:Purge>.

The workflow models given here make no assumptions about user or group permissions, which should be seen as being orthogonal to the workflow - a document must take into account user permissions when transitioning between states, but these are conditions in the WHERE clause. My next SPARQL post will cover user and group permissions, and show how they tie into workflow models.

Finally, it should be noted that workflows as discussed here use the same underlying concept as they do in most enterprise systems - a graph that describes the transitions between states via actions over time, rather than the somewhat vaguer definition of how a person in his or her day to day routine accomplishes a task, though for a sufficiently complex graph the former should approach the latter.


Sunday, January 6, 2013
