Wednesday, March 14, 2012

About business logic

Introduction
An interesting and far from trivial question in application development, especially important for business applications, is how to package and organize the business logic. By business logic I mean the code that does business-related work, like paying an invoice or processing a purchase order.
I will review the choices we have and explain the direction chosen for VITA applications.

VITA entities and business logic
If you have looked at the VITA sample code, you have seen the implementation of some entities - simple data objects that carry the values from database table rows. You might have asked: how do I turn these entities into business objects with some custom methods? There does not seem to be a way to do it, simply because of the way things are organized. Entities are defined in your code as interfaces; the real objects are created from classes that are dynamically IL-generated at runtime. You cannot add a method implementation to an interface, and you cannot define a method in an inherited class - the base class implementing the entity interface does not exist yet at design time.
The answer is - you can't do this, you can't turn entities into "business objects". But it is not necessarily a show stopper - encapsulating logic with data might not be such a good idea after all. I will explain why in the following sections.

Classic OOP approach - how things can go wrong
Classic OOP suggests packaging the functionality related to data into the classes that contain the data. This is called "encapsulation" in OOP land. Such packaging makes it possible to implement polymorphic behavior by sub-classing and overriding methods - which is a good thing, no doubt. This all makes sense as long as we stay with textbook examples about animals, mammals, cats and dogs - perfect for illustrating the principles.
This classic OOP approach works extremely well in some real-life scenarios, for example in libraries of UI controls. The very base _Control_ class defines a number of properties and behaviors (methods) which are progressively overridden and extended in sub-classes, up to very sophisticated pieces like grids and tree views.
However, this approach does not always work well - complex line-of-business (LOB) applications are one example. Let us illustrate the problem with a fictitious anti-pattern story.
We start a new application from scratch. The initial "data" objects - things like Invoice, PurchaseOrder, etc. - are derived from real-world objects; their lists of properties are figured out and implemented. We create a database schema, and write some data access methods and SQL to read/write the objects.
The next step is building some rough UI for editing and viewing the objects. The UI elements need some code support for properly showing the data, so "business logic" appears. We start adding methods and "smart" properties to our classes that used to be simple data containers - but not anymore, they are now "business objects". Everything works well so far, all according to OOP principles - things are encapsulated and isolated, and we have a nice clean API.
Next, it is time for more complex processing, some of which should be done in "background" batches. We start adding more methods for these processes, right into our business objects. When we start running the batches, we notice that in some cases the UI-related behavior gets activated - like the automatic loading of referenced objects that we implemented for the UI. It worked great for the UI, but now it is annoyingly too automatic - a lot of extra queries are executed, loading data that is never used, and this slows down the process considerably. Quite possibly we can fix it with a few tweaks, and things work well again, both for the UI and for the background process.
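To make the batch slowdown concrete, here is a minimal sketch of the effect. It is in Python for brevity and is not VITA code; `FakeDatabase`, `InvoiceBusinessObject`, `is_payable` and `batch_total_payable` are hypothetical names invented for this illustration. A "smart" property written for the UI loads the referenced customer on first access, and a shared method touches that property as a side effect.

```python
from typing import Dict, List, Optional


class FakeDatabase:
    """Hypothetical stand-in for a real database; counts the queries it serves."""

    def __init__(self, customer_names: Dict[int, str]) -> None:
        self._customer_names = customer_names
        self.query_count = 0

    def load_customer_name(self, customer_id: int) -> str:
        self.query_count += 1  # each call models one round trip to the database
        return self._customer_names[customer_id]


class InvoiceBusinessObject:
    """Data plus logic in one class, as in the story above."""

    def __init__(self, db: FakeDatabase, customer_id: int, amount: int) -> None:
        self._db = db
        self.customer_id = customer_id
        self.amount = amount  # in cents
        self.status_text = ""
        self._customer_name: Optional[str] = None

    @property
    def customer_name(self) -> str:
        # "Smart" property added for the UI: loads the customer on first access.
        if self._customer_name is None:
            self._customer_name = self._db.load_customer_name(self.customer_id)
        return self._customer_name

    def is_payable(self) -> bool:
        # Shared method written for the UI: it also refreshes a display string.
        self.status_text = f"{self.customer_name} owes {self.amount}"
        return self.amount > 0


def batch_total_payable(invoices: List[InvoiceBusinessObject]) -> int:
    # The batch only needs amounts, but is_payable() silently loads every customer.
    return sum(inv.amount for inv in invoices if inv.is_payable())


db = FakeDatabase({1: "Acme", 2: "Globex", 3: "Initech"})
invoices = [InvoiceBusinessObject(db, cid, 1000 * cid) for cid in (1, 2, 3)]
total = batch_total_payable(invoices)
print(total, db.query_count)  # 6000 3
```

The batch never reads a customer name, yet it pays for one query per invoice, because the logic it reuses was shaped by what the UI needed.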
But then we start adding more processes (billing, invoicing, posting to the general ledger, payments, receipts, refunds) - and things start to get worse. The processes start to step on each other. The business objects become bloated, and we may even have a hard time finding a good name for a method - it is already taken by another process, and "Adjust" for billing is not the same as "Adjust" for receipting. We also find that it is often not easy to choose the location for the code among the many business objects involved - the methods don't seem to belong to a particular class, they work with data across entity boundaries.
Another problem starts to surface - the shared business methods become more and more aware of the "context", that is, the process in which they are invoked. They originated as generic shared methods, but now have to contain more and more conditional logic that depends on external context. It is no longer "operations over encapsulated data" - it is dependency on context. No longer OOP.
But the real trouble unfolds when we start to create "variations" of the objects and business processes. Here is what I mean. The "invoice" in the real world is not a single standard thing. There are many different kinds of invoices. The trouble is not that there are many types, but that the classification does not fit into the tree-like single-inheritance schema we have in C# or similar OOP languages. It varies by industry - manufacturing, retail, services, utility company - and in each case the invoice is quite a different thing. It varies by company type, business type, procurement channel type, etc. And an invoice that a company receives from its vendor (AP invoice) is recorded differently from the invoice the company sends out to a customer (AR invoice). The point is that the variation is multi-dimensional: each particular case is a combination of dimensions, not a specialization of one more generic parent. We may have a very hard time modeling this variability with a plain inheritance tree. The "inherit and override" pattern does not quite work anymore.
(Note: multiple inheritance or an interface-based API would not make much difference, take my word for it.)
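A quick back-of-the-envelope sketch (Python, purely illustrative) shows how fast the combinations grow. The industry and AP/AR values come from the text above; the `CHANNELS` values are hypothetical, made up only to have a third dimension to count.

```python
from itertools import product

# Dimension values taken from the text, except CHANNELS, which is hypothetical.
INDUSTRIES = ["manufacturing", "retail", "services", "utility"]
DIRECTIONS = ["AP", "AR"]
CHANNELS = ["direct", "reseller", "online"]

dimensions = [INDUSTRIES, DIRECTIONS, CHANNELS]

# Every concrete invoice kind is one combination of values, one per dimension.
invoice_kinds = list(product(*dimensions))

# A single-inheritance tree needs a separate leaf class for each combination.
leaf_classes_needed = len(invoice_kinds)

print(leaf_classes_needed)  # 4 * 2 * 3 = 24
print(invoice_kinds[0])     # ('manufacturing', 'AP', 'direct')
```

Each additional dimension multiplies the count again, and whichever dimension is placed at the top of the tree forces the code for all the others to be repeated in every branch.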
You might say that this invoice example is an extreme case, relevant only to a full-blown ERP system "built for all" - which is not what our project is. But it illustrates a very common pattern which you may encounter, maybe to a lesser degree. Even in a more limited case the consequences might be quite troubling for your code - it may get really messy.
The point I'm trying to make is that classic OOP structures do not work well for business applications when they are used in "business entities". What, then, would be a viable alternative? The alternative is to separate data from code: the data is stored in dumb entities, and the business logic lives in separate "processing" classes that hold no data except for process parameters.
                 
Data objects and processor objects
The proposed alternative is to keep the "data" in dumb objects free of any logic (entities), and to put the processing code into separate classes that do all the processing. In fact, you may discover that these processing classes actually fit quite well into the inheritance paradigm of classic OOP - and you will find yourself building trees of processing sub-classes. The variation can be achieved either by sub-classing or by using pluggable sub-processes for specific operations.
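As a rough sketch of this arrangement (in Python, and not VITA code), the example below keeps an `Invoice` entity as a plain data container and puts the logic into processor classes. All names here (`Invoice`, `TaxRule`, `flat_tax`, `InvoiceProcessor`, `CreditNoteProcessor`) are hypothetical. It shows both kinds of variation mentioned above: a pluggable sub-process (the tax rule) and a sub-classed processor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Invoice:
    """Dumb entity: values only, no behavior."""
    number: str
    net_amount: int  # in cents
    tax_amount: int = 0
    posted: bool = False


# A pluggable sub-process: any function that computes tax for an invoice.
TaxRule = Callable[[Invoice], int]


def flat_tax(rate_percent: int) -> TaxRule:
    def rule(invoice: Invoice) -> int:
        return invoice.net_amount * rate_percent // 100
    return rule


class InvoiceProcessor:
    """Holds no data except process parameters (here, the tax rule)."""

    def __init__(self, tax_rule: TaxRule) -> None:
        self.tax_rule = tax_rule

    def validate(self, invoice: Invoice) -> None:
        if invoice.net_amount <= 0:
            raise ValueError("an invoice must have a positive net amount")

    def post(self, invoice: Invoice) -> None:
        self.validate(invoice)
        invoice.tax_amount = self.tax_rule(invoice)
        invoice.posted = True


class CreditNoteProcessor(InvoiceProcessor):
    """Variation by sub-classing: credit notes carry negative amounts."""

    def validate(self, invoice: Invoice) -> None:
        if invoice.net_amount >= 0:
            raise ValueError("a credit note must have a negative net amount")


invoice = Invoice(number="INV-1", net_amount=10000)
InvoiceProcessor(flat_tax(20)).post(invoice)
print(invoice.tax_amount, invoice.posted)  # 2000 True
```

The entity stays the same no matter how many processes are added; a new process is a new class, and two processes can each have their own `validate` or `adjust` without competing for a name.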
What is interesting is that complex data-connected systems seem to drift towards this separation over time, even if they start with classic OOP-style business objects. I have seen this once, in a big ERP system - after facing more and more problems with business objects as the system added more and more features, the team started implementing new functionality in new processes, without modifying the old business objects holding the data and the old code. The other system I worked on had adopted this style before I joined the team, and I witnessed the development of new features almost exclusively in separate process classes. The business objects holding the data had very limited code supporting UI editing.
Another interesting observation - the MVC architectural pattern suggests this separation as well. The Model is mostly data, while all editing functionality is confined to Controller classes. The UI-support code is mostly in Views.
REST also seems to suggest this style of API design. With a limited set of HTTP verbs, you do not have much room to invent fancy methods on top of entities; instead, you go with processes (as resources), use verbs to activate them, and use entities/resources as pure data containers that serve as process parameters.
I have been using this approach for some time already. In the Irony project (http://irony.codeplex.com), the parsing automaton is constructed using numerous classes like ParserState, ParserAction, etc. There are two major kinds of activities happening with these objects:

  • Construction of the automaton at startup - an extensive, computationally heavy process implementing some tricky algorithms
  • Using the automaton during parsing - the LALR parsing algorithm working with the graph of ParserState and related objects.
One choice would have been to put all the logic for both activities into the automaton objects. That would really mix up the code for two almost unrelated activities, bloat the objects (obscuring their "essence"), and generally break the rule of "separation of concerns". Instead, I chose to place the code into separate processing areas: there are a number of builder classes for building the state graph, and a Parser class that does the parsing. The classes constituting the "parsing automaton" (ParserData) are mostly code-free property containers. The same approach is used inside VITA - there are classes that hold data and others that do stuff.
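The shape of that split can be sketched with a toy example. This is not Irony's API and not an LALR automaton: it is a tiny keyword recognizer in Python, and apart from the borrowed name `ParserState`, all names (`AutomatonBuilder`, `Recognizer`) are hypothetical. The point is only the division of labor: the state class holds data, one class builds the graph, another walks it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class ParserState:
    """Code-free property container: one node of the automaton graph."""
    name: str
    transitions: Dict[str, "ParserState"] = field(default_factory=dict)
    is_final: bool = False


class AutomatonBuilder:
    """Startup-time activity: constructs the state graph."""

    def build(self, words: Iterable[str]) -> ParserState:
        root = ParserState(name="S0")
        count = 0
        for word in words:
            state = root
            for ch in word:
                if ch not in state.transitions:
                    count += 1
                    state.transitions[ch] = ParserState(name=f"S{count}")
                state = state.transitions[ch]
            state.is_final = True  # a whole word ends in this state
        return root


class Recognizer:
    """Run-time activity: walks the prebuilt graph and never modifies it."""

    def __init__(self, root: ParserState) -> None:
        self.root = root

    def accepts(self, text: str) -> bool:
        state = self.root
        for ch in text:
            next_state = state.transitions.get(ch)
            if next_state is None:
                return False
            state = next_state
        return state.is_final


root = AutomatonBuilder().build(["if", "in", "int"])
recognizer = Recognizer(root)
print(recognizer.accepts("int"), recognizer.accepts("into"))  # True False
```

Neither the builder nor the recognizer needs to know how the other works; they share only the data classes.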


Business code in VITA applications
Long story short - this is the pattern I am planning to follow in VITA. The entities are code-free collections of values, with some smart auto-loading functionality underneath - but this is hidden in the framework. The application is assembled from entity modules - containers for a group of entities implementing a functional block. All business code should be placed into modules and helper processing classes.
One early example is the error logging module in the _Vita.Modules_ assembly. It manages the read/write operations for the error log table internally, and it exports two facilities:

  • ErrorLogService - a "processor" service for logging errors
  • ErrorLogViewer - a service for displaying the error log in the browser

Both facilities are pure "processors" - they work with the _ErrorInfo_ entity, but do not "extend" or encapsulate it in classic OOP style. That said, the design particulars of processing classes are not clear yet and are subject to further research. One foggy area for now is how to expose the functionality through REST.