COMMIT PROTOCOL
The centralized version of the simulator used a very simple commit protocol, which was possible because there is only one scheduler. In a distributed database, it is common for more than one scheduler to participate in executing a transaction. A distributed commit protocol therefore has to be used to make sure that all participating schedulers reach the same result: either all commit, or all abort. Two well-known protocols are two-phase commit (2PC) and three-phase commit (3PC). We have employed 2PC, the protocol used by most commercially available distributed database systems.
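The voting round of 2PC can be sketched as follows. This is a minimal illustration, not the simulator's code; the Participant interface (prepare/commit/abort) and all names are our assumptions.

```python
class Participant:
    """A scheduler taking part in a distributed transaction (illustrative)."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if this scheduler can commit locally.
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects votes from every participant.
    if all(p.prepare() for p in participants):
        decision = "commit"
    else:
        decision = "abort"
    # Phase 2: the coordinator broadcasts one decision; every participant
    # applies the same one, so all commit or all abort.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision
```

A single "no" vote in phase 1 forces every participant to abort, which is exactly the all-or-nothing property the text requires.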
The bookkeeper is used for collecting statistics. Among other things, it counts the number of completed and aborted transactions, whether they are short or long, and the average execution time. It also collects statistics about transactions still active (not yet finished) at the time the simulation ends.
Simulation Parameters
The performance of real database systems is determined by several parameters. Some are determined by the hardware and operating-system architecture, some are set by the administrator, and others result from the characteristics of the transactions run on the system. If the simulation results are to have value in real life, these parameters have to be part of the model.
Number of Nodes: The number of nodes in the simulated system. These nodes are connected by a network.
Number of Transactions: The number of transactions to complete after the initial warm-up phase. The reason for having a warm-up phase is to make sure that the start-up phase of the simulated system is not taken into account when we start collecting statistics. In our simulations we have used a warm-up phase of 2000 transactions before we start collecting data.
Size of Address Space: The number of elements (pages) in the address space. In our simulations we have set the address space to 20000 elements.
Distribution of Data Elements to Nodes: The percentage of the database elements located at a particular node.
Data Locality: The probability that a transaction accesses a data element located on the same node as the transaction.
Non-Local Data Access: When a transaction accesses a data element not located at its home node, this is the probability that the remote access goes to a particular node.
Hot Spot Probability: The probability that an operation addresses the hot-spot part of the address space. The hot-spot area in our model is the first 10% of the address space at each node.
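The hot-spot rule above could be implemented as a biased draw over page numbers. This is a sketch under our own naming; only the 10% hot-spot area comes from the text, and the probability itself is a simulation parameter.

```python
import random

def draw_element(address_space_size, hot_spot_prob, hot_spot_fraction=0.1):
    """Draw a page number, biased toward the hot-spot area.

    Illustrative implementation: with probability hot_spot_prob the access
    lands in the first 10% of the address space (the hot spot), otherwise
    it is drawn uniformly from the remaining 90%.
    """
    hot_size = int(address_space_size * hot_spot_fraction)
    if random.random() < hot_spot_prob:
        return random.randrange(hot_size)           # hot-spot access
    return random.randrange(hot_size, address_space_size)  # cold access
```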
Multiprogramming Level (MPL): The number of concurrently executing transactions.
Transaction Distribution: The probability that a new transaction is started on a particular node.
Short Transaction Probability: The probability of a transaction being short. The remaining transactions are long transactions.
Abort Probability: The probability that a transaction requests abort before committing. This probability is the same for short and long transactions.
Read Probability: The probability of a data-access operation being a read. In our simulations we have used a value of 80%, which gives a write probability of 20%.
Burst Probability: The probability of a transaction issuing operations in a burst. The time between operations in a burst is shorter than normal. The lengths of short and long transactions, the time between transactions, the time between operation requests from a transaction, and the number of operations in a burst are drawn from uniform distributions with parameters as shown in Table 1.
Restart Delay: When using a timestamp-ordering scheduler, transactions may be aborted because of conflicts. If restarted immediately, the probability that the same conflict occurs again is quite high. To avoid this problem, the restart of the transaction is delayed by a certain amount of time. This delay is a value drawn from a uniform distribution, multiplied by the number of retries.
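The restart-delay rule is simple enough to state directly in code. The distribution bounds here are illustrative placeholders, not the paper's Table 1 values:

```python
import random

def restart_delay(retries, base_min=1.0, base_max=10.0):
    """Delay before restarting an aborted transaction.

    A uniform draw scaled by the retry count, as described in the text.
    base_min/base_max are assumed example bounds; the actual parameters
    are given in Table 1.
    """
    return random.uniform(base_min, base_max) * retries
```

Scaling by the retry count makes repeatedly conflicting transactions back off progressively further, reducing the chance of hitting the same conflict again.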
Network: The kind of network to simulate. It can be one of three types, cluster, LAN, or WAN, as described in an earlier section.
Scheduler Type: The scheduler used for concurrency control. In the current version, this is either two-phase locking or timestamp ordering.
Disk Operation Time: The time taken by one disk access, drawn from a uniform distribution.
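The parameters above could be bundled into a single configuration record. Only the values the text actually states are filled in (address-space size, warm-up length, hot-spot area, read probability); the rest are illustrative placeholders, and the field names are ours:

```python
from dataclasses import dataclass

@dataclass
class SimulationParameters:
    num_nodes: int = 8                # placeholder
    num_transactions: int = 20000     # finished transactions per run
    warmup_transactions: int = 2000   # completed before statistics start
    address_space: int = 20000        # elements (pages)
    data_locality: float = 0.9        # placeholder probability
    hot_spot_prob: float = 0.1        # placeholder probability
    hot_spot_fraction: float = 0.1    # first 10% of each node's address space
    mpl: int = 50                     # placeholder multiprogramming level
    short_tx_prob: float = 0.8        # placeholder; rest are long transactions
    abort_prob: float = 0.01          # placeholder probability
    read_prob: float = 0.8            # gives a write probability of 20%
    scheduler: str = "2PL"            # "2PL" or "TO"
    network: str = "LAN"              # "cluster", "LAN", or "WAN"
```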
Simulations and Results
The results of the simulations should enable us to say something about the schedulers' performance relative to each other. For these results to have any relevance in "real life", the mix of operations given to the schedulers should be as close as possible to a real-life mix.
Different systems have different characteristics, and to reflect this, our simulations are done with a mix of different characteristics. Our simulations are based on a constant load: the MPL during one simulation is constant. We have used a mix of short and long transactions; at all times, 20% of the load on the system comes from long transactions.
When one transaction finishes, a new one is started. The number of finished transactions in each simulation run is 20000, with an additional warm-up phase of 2000 transactions finished before we start collecting statistics. We have tried to observe the effect of distribution on two-phase locking and timestamp-ordering schedulers. We measured throughput with different numbers of nodes and MPLs, with different data placements, and with different types of networks.
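The statistics-collection policy described above (finish 2000 + 20000 transactions, but count only those after the warm-up) can be sketched as a simple driver. This is a sequential sketch with names of our own choosing; run_transaction stands in for the simulator's transaction execution and returns True on commit:

```python
def run_simulation(total_finished, warmup, run_transaction):
    """Complete warmup + total_finished transactions, counting only
    those that finish after the warm-up phase (illustrative sketch)."""
    committed = aborted = 0
    for i in range(warmup + total_finished):
        ok = run_transaction()
        if i >= warmup:          # warm-up results are discarded
            if ok:
                committed += 1
            else:
                aborted += 1
    return committed, aborted
```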