CENTRALIZED TO DISTRIBUTED DBMS
Compared to a centralized database system, a distributed database system is considerably more complex to implement, especially with respect to concurrency control. We therefore had to augment the simulation model used in the centralized simulator so that it can simulate the most important aspects of a distributed database system. We introduced three new concepts in the distributed simulation model:
1. The number of sites over which the database is distributed.
3. The type of network.
TRANSACTION MANAGER
The Transaction Manager (TM) is responsible for simulating the transactions that generate operations to the schedulers. It is implemented as a single module that holds information about all currently active transactions. Every transaction is assigned to one of the nodes in the network when it is created. In this way, although there is only one TM object, we simulate a system in which every node has its own transaction manager and every transaction has one initiating node.
In the simulation model we use the number of concurrent transactions as the main criterion for the load on the system. Every simulation runs with a fixed number of active transactions, called the multiprogramming level (MPL); this is a good approximation of a system under constant load. Every time a transaction finishes, a new one is started after a short random delay.
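As a rough illustration of how the TM could keep the multiprogramming level constant, the sketch below (hypothetical names, assuming a simple event-driven simulation loop) starts a replacement transaction on a randomly chosen initiating node whenever one finishes:

import random

class TransactionManager:
    def __init__(self, num_nodes, mpl, max_restart_delay=5.0):
        self.num_nodes = num_nodes            # number of sites in the network
        self.mpl = mpl                        # fixed multiprogramming level
        self.max_restart_delay = max_restart_delay
        self.active = []                      # currently active transactions
        self.next_id = 0

    def start_transaction(self):
        # Assign the new transaction to a randomly chosen initiating node.
        home_node = random.randrange(self.num_nodes)
        txn = {"id": self.next_id, "home_node": home_node}
        self.next_id += 1
        self.active.append(txn)
        return txn

    def on_transaction_finished(self, txn, schedule_event):
        # Keep the load constant: replace the finished transaction
        # after a short random delay.
        self.active.remove(txn)
        delay = random.uniform(0, self.max_restart_delay)
        schedule_event(delay, self.start_transaction)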
The database contains a fixed number of data elements. Each data element is addressed by a number in the interval [0, max]. The data elements are declustered over all the nodes in the network. How the data elements are assigned to the nodes is specified in the simulator’s configuration file.
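As an example of such an assignment, the small sketch below uses round-robin declustering; in the simulator the actual mapping is read from the configuration file, so this is only an assumption for illustration:

def home_node_of(element_id, num_nodes):
    # Round-robin declustering: element i is stored on node i mod N.
    return element_id % num_nodes

# With 4 nodes, elements 0..7 are stored on nodes 0, 1, 2, 3, 0, 1, 2, 3.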
When a transaction is created it is given a life of its own and starts to generate operations to the schedulers. Each operation is a Read, Write, Commit or Abort operation. If the operation is a Read or Write, a data element has to be accessed. One of the main advantages of a distributed database is the possibility of storing data at the node that accesses it most frequently. The transaction therefore selects, in most cases, a data element stored on the same node where the transaction originated. This locality of the data accesses can be configured individually for each node. Every time an operation finishes successfully, the transaction, after a short delay, either generates a new operation or decides to end by issuing a commit or abort operation.
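A minimal sketch of how a transaction could pick its next operation is shown below; the locality probability, the commit/abort probabilities and the helper for choosing a local element are illustrative assumptions, not the simulator's actual parameters:

import random

def next_operation(txn, num_nodes, num_elements, locality,
                   commit_prob=0.10, abort_prob=0.02):
    r = random.random()
    if r < abort_prob:
        return ("ABORT", None)
    if r < abort_prob + commit_prob:
        return ("COMMIT", None)

    # Otherwise access a data element, preferring elements stored on the
    # transaction's initiating node (locality is configured per node).
    if random.random() < locality[txn["home_node"]]:
        element = random_local_element(txn["home_node"], num_nodes, num_elements)
    else:
        element = random.randrange(num_elements)
    return ("READ" if random.random() < 0.5 else "WRITE", element)

def random_local_element(node, num_nodes, num_elements):
    # Pick an element whose home node is `node`, assuming the round-robin
    # declustering sketched earlier.
    return node + num_nodes * random.randrange(num_elements // num_nodes)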
SCHEDULER
The scheduler module in this system is a black box: it communicates with the other modules only via a well-defined interface. This makes it easy to replace it with schedulers that use other techniques. In the distributed version of the simulator, two-phase locking and timestamp ordering schedulers are implemented.
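Such an interface could look roughly like the abstract class below; the method names are assumptions made for illustration, not the simulator's actual API:

from abc import ABC, abstractmethod

class Scheduler(ABC):
    # Black-box scheduler: concrete subclasses (e.g. strict 2PL or strict
    # timestamp ordering) can be swapped in without changing the
    # transaction manager or the data managers.

    @abstractmethod
    def read(self, txn_id, element): ...

    @abstractmethod
    def write(self, txn_id, element): ...

    @abstractmethod
    def commit(self, txn_id): ...

    @abstractmethod
    def abort(self, txn_id): ...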
1. Strict Two-Phase Locking Scheduler:
A nice feature of non-replicated distributed databases is that each local scheduler can schedule the data accesses as if it were a centralized scheduler. But of course there are also problems that are more difficult to solve in the distributed context than for a centralized scheduler. For the 2PL scheduler the main problem that has to be solved is deadlock handling, since transactions may be waiting for locks on different nodes; a sketch of a global deadlock check follows the two scheduler descriptions.
2. Strict Timestamp Ordering Scheduler:
We have implemented a strict TO scheduler. Although deadlocks are no problem here, we face another "global" problem: assigning monotonically increasing, unique timestamps to transactions. In a real implementation this could be done by concatenating a local timestamp counter with the node number, as sketched after the deadlock example below.
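For the 2PL scheduler, one way to detect global deadlocks is to collect the local waits-for edges from all nodes and look for a cycle; the sketch below is purely illustrative and not necessarily how the simulator does it:

def has_deadlock(waits_for):
    # waits_for maps a transaction id to the set of transaction ids it is
    # waiting for, with edges gathered from every node in the system.
    visiting, done = set(), set()

    def cycle_from(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True                # back edge: a cycle, i.e. a deadlock
        visiting.add(txn)
        if any(cycle_from(other) for other in waits_for.get(txn, ())):
            return True
        visiting.remove(txn)
        done.add(txn)
        return False

    return any(cycle_from(t) for t in waits_for)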
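For the strict TO scheduler, globally unique and monotonically increasing timestamps can be built by concatenating a local counter with the node number; the field widths below are assumptions for illustration:

class TimestampGenerator:
    NODE_BITS = 8                      # assumes at most 256 nodes

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def next_timestamp(self):
        # The local counter occupies the high-order bits, so timestamps from
        # one node increase over time; the node id in the low-order bits
        # makes timestamps unique across nodes.
        self.counter += 1
        return (self.counter << self.NODE_BITS) | self.node_id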
DATA MANAGER
The data manager simulates the physical database storage manager. Each node has a data manager, which operates independently of the other data managers in the system. We have used a very simple and straightforward model, where the data manager consists only of a simulated buffer cache and a disk.
Time delays are used to simulate operations against the data manager. The time needed to execute a disk operation is the sum of the time that operation has to wait for the disk to become idle (the time needed to execute previous operations submitted to the disk) and the time it takes to read or write the page. We have chosen a very simple algorithm for disk scheduling: first-come-first-served (first-in-first-out). A read operation served from the cache takes constant time and can be answered immediately.
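A rough sketch of this timing model, with made-up constants for the cache-hit time and the per-page disk time, might look as follows:

CACHE_HIT_TIME = 0.0001    # constant time for a read served from the cache
PAGE_IO_TIME = 0.010       # assumed time to read or write one page on disk

class DataManager:
    def __init__(self):
        self.disk_free_at = 0.0        # when the disk becomes idle again
        self.cache = set()             # pages currently in the buffer cache

    def access(self, page, now, is_read):
        # Returns the simulated delay experienced by this operation.
        if is_read and page in self.cache:
            return CACHE_HIT_TIME      # cache hit: served immediately
        # Disk operation: wait for the disk (first-come-first-served),
        # then spend the page I/O time.
        start = max(now, self.disk_free_at)
        self.disk_free_at = start + PAGE_IO_TIME
        self.cache.add(page)
        return self.disk_free_at - now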
Logging is not implemented separately; the time and resources needed for logging are considered to be included in the time needed for the write operations. However, when an abort occurs, we write back the before images.
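A minimal sketch of that abort handling, assuming the simulator keeps a per-transaction list of before images (an illustrative structure, not the actual one):

def write_back_before_images(txn_id, before_images, data_manager, now):
    # before_images[txn_id] holds (page, old_value) pairs recorded when the
    # transaction first wrote each page. The data manager only models timing,
    # so restoring a before image simply costs one disk write.
    delay = 0.0
    for page, _old_value in reversed(before_images.pop(txn_id, [])):
        delay += data_manager.access(page, now + delay, is_read=False)
    return delay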