Monday, November 13, 2006

Transactions, concurrency, performance

In a distributed environment, performance is an important criterion for software. To achieve it, our software has to work fast and efficiently. I will try to describe a problem that you may encounter when developing such an application and, of course, a solution that may be helpful.

Let's say you have an application that works with a large amount of user data and where frequent database access is foreseen. The application needs fast access to existing data. Because it is distributed, data must be accessible from several locations. You may run into trouble with concurrent data access or dirty reads, and concurrent modifications of the same data should be avoided.

One option is to use a database, but this will not satisfy you when it comes to performance. Suppose you need to handle data operations: as soon as data is available, it will be processed by your application from the other locations. Concurrent access will be managed by the database, and its transactions will slow you down even more. In principle, when performance is the issue, a database is not the best choice; a cache would be a better one.

A distributed cache would suit your needs in this case (see JBoss Cache).

Operations on a cache are performed at high speed. Using JBoss Cache, you can also have your data distributed across many locations, and data access stays fast because the data is replicated and available at the locations you choose to configure. To increase the performance of your application further, you can choose whether or not to use transactions. JBoss Cache provides the following transaction configuration options (analogous to database isolation levels): NONE, READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, or SERIALIZABLE (for detailed description of levels please see here). REPEATABLE_READ is the default isolation level. You should know that by default JBoss Cache uses a pessimistic locking scheme to guard against concurrent data access.
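To make the idea of pessimistic locking more concrete, here is a minimal sketch in plain Java of a single cache node guarded by a read-write lock. It is only an illustration of the general technique: `LockedNode` and its methods are hypothetical names, not part of the JBoss Cache API, and the sketch says nothing about how JBoss Cache implements its locks internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration, not JBoss Cache code: one cache node whose
// data is protected by a read-write lock (pessimistic locking).
final class LockedNode {
    private final Map<String, String> data = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer needs the exclusive write lock and blocks until all
    // readers and other writers have released theirs.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Every access pays for acquiring a lock, and writers wait for readers. That waiting is the cost you trade against data integrity when you choose a locking scheme and isolation level.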

The idea behind using a distributed cache for performance is to have a cache structure in which writing and reading can be performed without causing exceptions. More precisely, the application running at a given location should always write to the same cache node (read here more about the cache structure). This means you need a cache structure based on the number of locations your application is distributed over. The cache nodes should be created on the fly: when the application is not available at a location, that location should not have a cache entry. Now we have a write-safe cache.

As for reading, we can choose from several strategies. You could use the default isolation level, which uses a read-write lock, but you do not need all of it, since writes to the cache are already safe. So for reading you could use READ_COMMITTED or READ_UNCOMMITTED, depending on how your application handles data integrity. If your application can cope with dirty data and you are after performance, you could pick the isolation level NONE, where no transaction support is provided. To improve concurrency and performance, you should also consider the configurable locking mode OPTIMISTIC; when it is enabled, transaction isolation levels are ignored, but you still have transactions.
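The write-safe structure described above could be sketched roughly as follows. This is plain Java that only models the idea in a single process; it does not use JBoss Cache and does no replication. `PartitionedCache`, `LocalView` and the location names in the usage note are hypothetical, chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not JBoss Cache code: one node per location.
// A location writes only to its own node but may read any node.
final class PartitionedCache {
    // location name -> that location's node (key/value data)
    private final Map<String, Map<String, String>> nodes = new ConcurrentHashMap<>();

    // Handle bound to one location; it can write only to that location's node.
    final class LocalView {
        private final String location;

        private LocalView(String location) {
            this.location = location;
        }

        // The node is created on the fly, on the first write from this location.
        void put(String key, String value) {
            nodes.computeIfAbsent(location, l -> new ConcurrentHashMap<>()).put(key, value);
        }

        // Reads take no explicit lock; returns null if the node or key is absent.
        String read(String otherLocation, String key) {
            Map<String, String> node = nodes.get(otherLocation);
            return node == null ? null : node.get(key);
        }
    }

    LocalView viewFor(String location) {
        return new LocalView(location);
    }

    // True only once the given location has written at least one entry.
    boolean hasNode(String location) {
        return nodes.containsKey(location);
    }
}
```

For example, a view obtained with `cache.viewFor("berlin")` can call `put("user:1", "alice")`, and a view for `"paris"` can then call `read("berlin", "user:1")`. No `"paris"` node exists until that location writes something itself. Because two locations never write to the same node, writers do not compete for it, which is why a weaker isolation level may be acceptable for reads.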

The correct configurations, though, are application and requirement specific.

Technorati tags: design, JBoss Cache, transactions, concurrency, performance.