In phase 1 of the two-phase commit protocol (2PC), the transaction controller (TC) asks every participant whether it can commit. The responses decide what happens in phase 2: if all the participants agreed to commit, the TC confirms that decision with all of them; otherwise it sends a rollback decision to all of them. In addition, by the end of phase 1 each participant, as well as the TC, has saved enough information in its transaction log to be able to roll back in case of failure. After phase 1, every participant that agreed to commit remains blocked until it receives the phase 2 message.
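The two phases described above can be sketched roughly as follows. This is a minimal in-memory illustration, not how any particular application server implements it: the `Participant` and `Coordinator` classes, their method names, and the list used as a "transaction log" are all hypothetical, and a real log would be written to disk.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical resource taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []          # stands in for a durable transaction log
        self.state = "active"

    def prepare(self, tx_id):
        # Phase 1: record the vote before answering the controller.
        if self.can_commit:
            self.log.append((tx_id, "prepared"))
            self.state = "prepared"   # blocked until the phase 2 message
            return Vote.YES
        self.log.append((tx_id, "aborted"))
        self.state = "aborted"
        return Vote.NO

    def commit(self, tx_id):
        self.log.append((tx_id, "committed"))
        self.state = "committed"

    def rollback(self, tx_id):
        self.log.append((tx_id, "rolled_back"))
        self.state = "rolled_back"


class Coordinator:
    """Hypothetical transaction controller (TC)."""

    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self, tx_id):
        # Phase 1: ask every participant whether it can commit.
        votes = [p.prepare(tx_id) for p in self.participants]
        all_yes = all(v is Vote.YES for v in votes)
        decision = "commit" if all_yes else "rollback"
        self.log.append((tx_id, decision))   # log the decision first

        # Phase 2: send the same decision to every participant.
        for p in self.participants:
            if decision == "commit":
                p.commit(tx_id)
            else:
                p.rollback(tx_id)
        return decision
```

A single "no" vote is enough to roll everyone back: `Coordinator([Participant("A"), Participant("B", can_commit=False)]).run("tx1")` returns `"rollback"`, and participant A ends up rolled back even though it voted yes.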
Failure conditions: What if the transaction controller fails after phase 1? The participants will remain in a locked state. Have you ever noticed that when you start an application server, it tries to recover incomplete transactions? Look at the startup logs the next time you start one. What it does is look at the transaction logs and try to recover the previous state. This is the main reason each participant saves enough information to disk after phase 1: so that it can recover from a failed transaction. There are other failure scenarios, such as “What if resource A fails after phase 1?” Scenarios of this kind result in a transaction timeout, and the transaction rolls back. When resource A comes back, it should read the transaction logs it saved before it crashed and bring itself to a consistent state.
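To make the recovery step a bit more concrete, here is a small sketch of what "reading the transaction logs at startup" might look like. The record layout (`(tx_id, state)` tuples), the function names, and the `coordinator_decisions` lookup are assumptions made for illustration. The sketch also assumes that a transaction with no logged commit decision is rolled back, which matches the timeout-and-rollback behaviour described above but is not the only possible policy.

```python
def find_in_doubt(records):
    """Return ids of transactions that were prepared but never finished."""
    last_state = {}
    for tx_id, state in records:
        last_state[tx_id] = state     # later records overwrite earlier ones
    return sorted(tx for tx, s in last_state.items() if s == "prepared")


def recover(records, coordinator_decisions):
    """Resolve every in-doubt transaction found in the log.

    coordinator_decisions is a hypothetical mapping of tx_id -> "commit"
    or "rollback" taken from the controller's log. Assumption: a missing
    entry means no commit decision was logged, so the transaction is
    rolled back.
    """
    outcome = {}
    for tx_id in find_in_doubt(records):
        decision = coordinator_decisions.get(tx_id)
        outcome[tx_id] = "committed" if decision == "commit" else "rolled_back"
        records.append((tx_id, outcome[tx_id]))   # log the final state
    return outcome
```

For a log in which `tx1` finished but `tx2` and `tx3` stopped at "prepared", `find_in_doubt` reports `tx2` and `tx3`, and `recover` then commits or rolls back each one depending on whether a commit decision exists for it.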
How about transactions running in a cluster? Each node in a cluster can start and manage its own transactions. If one of the nodes crashes (say Node1), the HA Manager (some kind of highly available P2P implementation that controls all the nodes), with the help of Node2's transaction manager, reads the shared transaction logs and recovers the locked resources (locked as a result of the transaction controller's failure). A locked resource could be a poisoned message in a JMS queue or a locked row in a database (sometimes many rows get locked because of page-level locking, as DB2 does).
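The peer-recovery idea can be sketched along these lines. Everything here is hypothetical: the `shared_logs` dictionary stands in for transaction logs on shared storage, the resource names are made up, and the sketch simply assumes that the surviving node rolls back whatever the failed node left in the prepared state. A real HA manager would also have to detect the failure and pick the recovering peer, which is left out.

```python
def peer_recover(shared_logs, failed_node, locked_resources):
    """Run on a surviving node (e.g. Node2) on behalf of a crashed one.

    shared_logs: node name -> list of (tx_id, state, resource) records.
    locked_resources: set of resource names currently held locked.
    Returns the resources released, in log order.
    """
    records = shared_logs[failed_node]
    finished = {tx for tx, state, _ in records
                if state in ("committed", "rolled_back")}
    in_doubt = [(tx, res) for tx, state, res in records
                if state == "prepared" and tx not in finished]

    for tx_id, resource in in_doubt:
        # Assumption: in-doubt work of the failed node is rolled back.
        locked_resources.discard(resource)   # e.g. a locked row or message
        records.append((tx_id, "rolled_back", resource))
    return [res for _, res in in_doubt]
```

Because the outcome is appended to the shared log, running the recovery a second time finds nothing left to do, which is what you would want if the recovering node itself restarts halfway through.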