Thursday, January 7, 2010

Caveats in using Ellipse Business Connector

Ellipse Business Connector (MIMSX) is an integration component to allow developers to integrate with their ERP system.

This is a COM STA (single-threaded apartment) component originally designed to work in, say, Excel. However, it has found its way into many different types of applications, such as web servers and multi-threaded daemon processes. This blog post does not show how to use the component properly; it describes a number of caveats that can cause problems - even when using it in Excel.

These caveats exist because MIMSX, together with some of its internal components, can cause problems that are generally related to the absence of concurrency control. One may ask how an STA component can have concurrency issues. Doesn't COM provide synchronization protection when an STA component is accessed from multiple apartments?

Read on, because MIMSX is not your typical spec-conforming COM STA component.

Caveats on using MIMSX

MIMSX Reentrancy

The main problem with using MIMSX in a GUI application, including Excel, is that MIMSX is re-entrant; this problem is best illustrated by MSQDSK in Ellipse 6, which uses MIMSX. Many of the methods on the MIMSX interfaces do not return until results are ready, as all normal COM methods should, but internally they run a message pump required by the legacy home-grown IPC mechanism, which is based on window messaging. This message pump is the root cause of MIMSX, an in-process COM component, being reentrant. To be fair, this is not entirely MIMSX's problem: whenever you have a modal method or code modality, you have a reentrant situation.

If your program, even in Excel, sends a command to the back end and waits for a result, you should implement appropriate reentrancy control. Failure to do so can cause unpredictable problems.
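One simple form of reentrancy control is a guard that rejects a handler invoked while an earlier call is still in flight. The following is only a minimal sketch in Python, not MIMSX code: `FakeBackend`, `ReentrancyGuard` and the `demo-command` string are hypothetical stand-ins, and the fake `execute` imitates a blocking call that pumps queued UI events before it returns.

```python
class ReentrantCallError(RuntimeError):
    """Raised when a handler is entered while an earlier call is still running."""


class ReentrancyGuard:
    """Context manager that refuses to be entered twice at the same time."""

    def __init__(self):
        self._busy = False

    def __enter__(self):
        if self._busy:
            raise ReentrantCallError("reentrant call rejected")
        self._busy = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self._busy = False
        return False  # never swallow exceptions from the guarded block


class FakeBackend:
    """Hypothetical stand-in for a blocking call that pumps messages internally."""

    def __init__(self):
        self.pending_events = []

    def execute(self, command):
        # The internal message pump dispatches queued UI events
        # before the call returns to its caller.
        while self.pending_events:
            handler = self.pending_events.pop(0)
            handler()
        return f"result of {command}"


def simulate_double_click():
    guard = ReentrancyGuard()
    backend = FakeBackend()
    log = []

    def on_click():
        try:
            with guard:
                log.append("start")
                log.append(backend.execute("demo-command"))
        except ReentrantCallError as exc:
            log.append(str(exc))

    # A second click is already queued when the first handler starts.
    backend.pending_events.append(on_click)
    on_click()
    return log
```

Without the guard, the second click would start a new back-end call in the middle of the first one. Whether to reject, queue or ignore the reentrant request is a design choice that depends on the application.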

Using MIMSX in a multi-threaded program

More and more people are trying to use MIMSX in, say, ASP/ASP.NET, web services or service daemons to provide server-side integration solutions, or to move their scripts from desktop deployment to centralized server operation. All these environments operate in the MTA (multi-threaded apartment) and expect all the required COM components to support the MTA.

However, MIMSX is an STA component. When a MIMSX object is created in this situation, known as mixed-model, the object has to be created in its own STA while its clients live in the MTA. The end effect is that COM serializes the calls from these client threads to the object, nullifying the benefit of multi-threading.

When operating in a mixed-model environment, COM does not raise any warning or error. All one notices is that throughput fails to improve under load even though more threads are being used.
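The serializing effect can be imitated without COM. In this rough Python sketch, a single-worker executor plays the role of the STA thread and `StaHostedObject` is a hypothetical stand-in for a MIMSX object; it is an analogy for the behaviour described above, not how COM is implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StaHostedObject:
    """Hypothetical object whose methods only ever run on one 'apartment' thread."""

    def __init__(self):
        self._apartment = ThreadPoolExecutor(max_workers=1)
        self.executing_threads = set()
        self.max_concurrent = 0
        self._active = 0

    def _do_work(self, n):
        # Runs on the single apartment thread, so these counters need no lock.
        self._active += 1
        self.max_concurrent = max(self.max_concurrent, self._active)
        self.executing_threads.add(threading.get_ident())
        result = n * n
        self._active -= 1
        return result

    def call(self, n):
        # Every client thread queues its call and blocks until it has run.
        return self._apartment.submit(self._do_work, n).result()

    def close(self):
        self._apartment.shutdown()


def run_clients(client_count=8):
    obj = StaHostedObject()
    with ThreadPoolExecutor(max_workers=client_count) as clients:
        results = list(clients.map(obj.call, range(client_count)))
    obj.close()
    return results, len(obj.executing_threads), obj.max_concurrent
```

However many client threads are used, the work is executed by one thread and never overlaps, which is why adding threads does not add throughput.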

Alleviating this is quite straightforward and does not require re-architecting the solution. All one has to do is apply the COM aggregation technique to aggregate MIMSX in a specially crafted COM executable server. Mincom has made one called EllipseCOM, but it may not be generally available. If that is the case, just follow the instructions to build your own. COM is an open industry standard, and there is not much a vendor can do to forbid you from applying a proper COM idiom.

This is actually a good technique to build your own ERP SOA server so that your application is not directly bound to Mincom's idiosyncrasies and products.

This technique is not optimal and certainly not perfect. It would be nicer if the MIMSX object model were published by TpAgent, which acts as a COM local server. But this is not the case, and using aggregation is the next best option.

While it avoids serializing all calls from all these threads to the MIMSX objects, this option has the following issues:
1) It incurs a heavy toll in marshalling and context switching, because MIMSX's object model is designed to be traversed in-process and hence employs very chatty interfaces. With the aggregation server, those in-process calls are replaced by inter-process calls.

If code modification is an option, try to reduce the use of dotted expressions to traverse the object model. Don't forget that memory is allocated on the aggregation server, marshalled across to the client process and only then made available to the client. This can exact a heavy toll.

In comparison, though, using an aggregation server prevents one thread's call from blocking other threads from doing their work and calling the Ellipse back end, and on balance this has been measured to increase throughput.

2) If the clients run in a .NET environment, you are now dealing with two memory management schemes: the COM one is deterministic and based on reference counting, while the CLR one is based on garbage collection, which is non-deterministic. As a result, dotted expressions can create a large number of temporary CLR objects, each of which manages an associated COM object. These CLR objects do not release their associated COM objects until they are garbage collected. This can make system memory demand greater than in an environment based solely on reference counting.
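Both issues get worse the more often a dotted expression walks the object model. The effect of holding an intermediate object in a local variable can be sketched with a counting proxy. This is an illustration in Python only: `RemoteProxy` is a hypothetical stand-in for a marshalled object, and the `reply`, `fields`, `name`, `status` and `site` members are invented for the example and are not the MIMSX object model.

```python
from types import SimpleNamespace


class CallCounter:
    def __init__(self):
        self.count = 0


class RemoteProxy:
    """Hypothetical out-of-process object: every member access is one round trip."""

    def __init__(self, target, counter):
        self._target = target
        self._counter = counter

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        self._counter.count += 1
        value = getattr(self._target, name)
        if isinstance(value, SimpleNamespace):
            return RemoteProxy(value, self._counter)  # a new temporary wrapper
        return value


def make_root(counter):
    fields = SimpleNamespace(name="PUMP-01", status="OK", site="NORTH")
    return RemoteProxy(SimpleNamespace(reply=SimpleNamespace(fields=fields)), counter)


def chatty_read(root):
    # Each line walks the whole chain again: 3 round trips per value.
    return (root.reply.fields.name,
            root.reply.fields.status,
            root.reply.fields.site)


def cached_read(root):
    # Walk the chain once, then reuse the intermediate object.
    fields = root.reply.fields
    return (fields.name, fields.status, fields.site)


def count_round_trips(read):
    counter = CallCounter()
    values = read(make_root(counter))
    return values, counter.count
```

In this toy model the chatty version costs 9 round trips and the cached version 5 for the same three values, and the cached version also creates fewer temporary wrapper objects.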

The biggest attraction of using an aggregation server is that you only have to change one line of code: just change the ProgID or CLSID of MIMSX to that of the aggregation server. This assumes that your application is coded against the MIMSX interfaces, as a well-written COM application should be, and not against implementation-language wrapper classes. If you are prepared to switch the entire server environment over to the aggregation server, you do not even have to modify your code; you can leverage the COM emulation technique instead.

While this solution uses multiple STA servers to remove the mixed-model problem, for MIMSX it is not the same as implementing a multiple-STA server solution, because MIMSX uses components that are downright dangerous in this kind of environment. They are covered by the other caveats below.

Caveat in using MSYMCACH

MSYMCACH is an optional internal COM component used by MIMSX. Its sole purpose is to 'cache' host commands and their results on the client machine to avoid expensive trips to the back end. The principle is sound, but in practice it is not well executed.

The cached data is saved in files on the client's machine, in a rather primitive manner with little concern for performance. The files are not isolated by process. That is, if ApplicationA and ApplicationB cache data for the same identifier, they compete for access to the same pair of files.

In a well-written multi-threaded program, such access would be controlled by a suitable synchronization mechanism to keep it orderly. Not so in MSYMCACH, which seems to be a leftover from the era of Windows 3.x, a non-preemptive operating system. In those days, a running application kept the CPU until it yielded, and nothing, including Windows 3.x itself, could take it away. So why implement any synchronization mechanism to control concurrent access? However, the world has changed, and we now use preemptive operating systems running on Hyper-Threading or multi-core machines. A collision is a matter of when, not if.

The use of a set of very old, first-generation C I/O functions lends credence to this speculation, even though newer ones exist that let programmers control the file access mode.

So even two Excel scripts running at the same time can cause problems. These range from one application overwriting changes made by another to access errors when both try to write to the same pair of files.
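For contrast, here is roughly what orderly access to a shared file looks like. This is a generic Python sketch of the lock-file idiom and not something that can be retrofitted onto MSYMCACH; the `exclusive_lock` helper and the file names are hypothetical, and threads stand in for the competing applications.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, timeout=10.0, poll=0.001):
    """Hold an exclusive lock by atomically creating a lock file."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another holder already created it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def run_counter_demo(workers=4, increments=25):
    """Several writers do read-modify-write on one shared file under the lock."""
    with tempfile.TemporaryDirectory() as folder:
        data_path = os.path.join(folder, "cache.dat")
        lock_path = os.path.join(folder, "cache.lock")
        with open(data_path, "w") as handle:
            handle.write("0")

        def worker():
            for _ in range(increments):
                with exclusive_lock(lock_path):
                    with open(data_path) as handle:
                        value = int(handle.read())
                    with open(data_path, "w") as handle:
                        handle.write(str(value + 1))

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        with open(data_path) as handle:
            return int(handle.read())
```

With the lock in place no update is lost. Remove it, and the read-modify-write steps can interleave and overwrite one another, which is the overwriting problem described above.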

Because of the poor implementation and the doubtful benefit it brings, users are advised to unregister this component to avoid unpredictable problems caused by the absence of any concurrency control.

The EllipseCOM or aggregation server design is of no help in this situation. Files are essentially a singleton resource and should be treated as such.

Caveat for CBR32.DLL

Once again, this component is used internally by MIMSX, but unlike MSYMCACH it is mandatory. In brief, it is basically Ellipse's marshaller, and MIMSX offers a COM-conforming object model representation of the Ellipse model.

Once again, this appears to be another leftover from days gone by, coupled with a gratuitous dependency on global variables (nasty) even where it uses the dynamic heap. Naturally, it lacks any basic form of concurrency control.

This component assumes that it always lives in a single-threaded process, possibly with only one client using it. A single thread does not equate to a single client using a component; you could have several clients on a Windows form using CBR32.

Because it is devoid of any concurrency control, one should never use it in a multi-threaded program or in a program with multiple STAs hosting MIMSX. Such an environment can result in total corruption of CBR32's data. Don't forget that CBR32 is not designed to be used only by MIMSX; other applications use it too, as it offers C-style exported functions.

If you need multiple STAs to host MIMSX, it is recommended that each STA use either EllipseCOM or a MIMSX aggregation server.

EllipseCOM or a MIMSX aggregation server ensures that there is only one thread per process, which is the kind of environment CBR32 expects. Failure to observe this will result in data corruption.
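The idea of giving each client its own single-threaded process can be sketched in a few lines. This Python example is only an analogy and does not involve COM or CBR32: the `WORKER` script is a hypothetical stand-in for a component that keeps unguarded global state, and each job is run in a separate child interpreter so that the state is never shared.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: 'buffer' stands in for unguarded global state
# that is only safe when a single client uses the process.
WORKER = """
import sys
buffer = []
buffer.append(sys.argv[1])
print(",".join(buffer))
"""


def run_isolated(job):
    """Run one job in its own child process with its own copy of the globals."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER, job],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def run_all(jobs):
    # The callers are concurrent, but each worker process serves exactly one job.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(run_isolated, jobs))
```

Each result contains only its own job's data, because no two jobs ever touch the same global `buffer`. The price is the process start-up and inter-process cost discussed earlier.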
