3.5 Online Reconfiguration
In real applications, access frequency varies over time for many reasons. Because the effectiveness of selective replication is sensitive to changes in access frequency, one may need to periodically reconfigure the number of instances of objects and their data placement to maximize system performance. In general, relocating even parts of huge multimedia objects while servicing user requests takes time and extra disk bandwidth. This may result in a significant degradation of system performance during the reconfiguration process. However, while this performance degradation is temporary during reconfiguration, it could be permanent without reconfiguration. Thus, we assume that a system performs a periodic reconfiguration, e.g., every midnight or every week.
3.5.1 Reconfiguration Process
We can represent the current configuration of data placement as a set with n elements, C = {c_1, c_2, ..., c_n}, where element c_i describes the placement (number of instances and block locations) of object i. Then, formally, the reconfiguration process is a function f that maps the current configuration C to a new configuration C': f(C) = C'.
Our proposed reconfiguration process works as shown in Figure 3.14. In step 1, the system gathers the information necessary for reconfiguration, including the object list and updated access frequencies. In step 2, the number of instances of each object is calculated using selective replication, as described in Figure 3.9. In step 3, the location of individual blocks is determined using Eqs. 3.2 and 3.16. At this stage, the system can begin relocating data blocks according to the new data placement.
Figure 3.14 Reconfiguration Process.
For dynamic reconfiguration, the system needs to log access information and update the access frequency of each object in the database. Then, according to a preset schedule, it performs a dynamic reconfiguration by executing steps 4 and 5. There are two different cases in reconfiguration. First, there could be changes only in the access frequency of existing objects: no new object is introduced, and the database itself does not change. In this case, reconfiguration may happen based on a tradeoff analysis between the overhead of the reconfiguration process and the performance enhancement after it (step 5). For example, given only a slight change in access frequency and a minor enhancement in performance, an expensive reconfiguration would not be justified. The overhead of the reconfiguration process can be quantified from the number of data blocks to be moved (say, Sm) during the process, which can be computed using the algorithm in Figure 3.15.b. Then, we can calculate the percentage of wasted disk bandwidth (Woverhead) during a predefined reconfiguration period (Treconf in seconds) as the overhead, Woverhead = (100 · Sm)/(m · Treconf). When E[L] is the expected startup latency with the old configuration C and E[L'] is the one with the new configuration C', the performance gain is defined as the percentage reduction, G = (100 · (E[L] − E[L']))/E[L]. Then, the system commits the reconfiguration based on a cost function only when the gain is greater than the cost, i.e., G > c · Woverhead, where c is a cost coefficient set by the system administrator.
Figure 3.15 Algorithms for reconfiguration process.
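To make the commit test concrete, the following minimal Python sketch evaluates the cost function exactly as defined above. The function name and its inputs are illustrative; the latencies are assumed to come from the analytical model, and Sm from the algorithm of Figure 3.15.b.

```python
def commit_reconfiguration(E_L, E_L_new, S_m, m, T_reconf, c):
    """Step 5 of Figure 3.14: commit only if the gain outweighs the overhead.

    E_L, E_L_new : expected startup latency under the old and new configuration
    S_m          : number of data blocks to be moved (algorithm of Fig. 3.15.b)
    m            : maximum number of simultaneous displays
    T_reconf     : reconfiguration period in seconds
    c            : cost coefficient set by the system administrator
    """
    W_overhead = 100.0 * S_m / (m * T_reconf)   # wasted disk bandwidth (%)
    G = 100.0 * (E_L - E_L_new) / E_L           # latency reduction (%)
    return G > c * W_overhead
```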
The other case happens when there is a change in the database, such as adding, deleting, or replacing a number of objects. This case is a forced reconfiguration committed by the administrator, regardless of the cost. When one introduces a new object, its access probability should be determined (estimated) rather than obtained from the history. Figure 3.15.a shows an algorithm to adjust the access probability of existing objects when the k (0 < k ≤ n) least popular objects are replaced in the database. Step 6 in Figure 3.14 executes this adjustment and produces the new access frequencies. The system then resumes the process to accommodate the change.
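Since Figure 3.15.a is not reproduced here, the following is only a plausible sketch of such an adjustment, under the assumption that the new objects arrive with estimated probabilities and the surviving probabilities are simply renormalized to sum to one; the book's actual algorithm may differ.

```python
def adjust_probabilities(probs, k, new_probs_est):
    """Hypothetical adjustment in the spirit of Figure 3.15.a: replace the k
    least popular objects with new objects whose access probabilities are
    estimated, then renormalize so all probabilities sum to 1.

    probs         : {object_id: access probability} for the current database
    k             : number of least popular objects to replace (0 < k <= n)
    new_probs_est : {object_id: estimated probability} for the k new objects
    """
    assert 0 < k <= len(probs) and len(new_probs_est) == k
    # Keep all but the k least popular objects.
    survivors = dict(sorted(probs.items(), key=lambda kv: kv[1],
                            reverse=True)[:-k])
    merged = {**survivors, **new_probs_est}
    total = sum(merged.values())
    return {obj: p / total for obj, p in merged.items()}
```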
3.5.2 Experiments
In all experiments, we assumed that the entire database, consisting of 50 video objects, was disk resident. The selection of objects and their access frequencies were based on real-world statistics from a WWW page [42], maintained by The Internet Movie Database, which provides the ranks and revenues of the weekly top 50 video rentals in the United States. Without loss of generality, we converted the video rental revenue of an object into its access frequency. We used statistics for ten weeks (weeks 1 to 10) between the third week of February 2002 and the third week of April 2002. For example, the first two columns of the tables in Figure 3.16 show the statistics derived from the rental revenue. We assumed constant-bit-rate compressed video objects with a fixed one-hour length in all our experiments.
Figure 3.16 Statistics of top 50 video rentals in the United States.
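The revenue-to-frequency conversion can be read as a simple proportional normalization; the sketch below makes that assumption explicit (the proportionality itself is our reading of "without loss of generality," not a stated formula).

```python
def revenue_to_frequency(revenues):
    """Convert weekly rental revenues (e.g., the second column of the
    Figure 3.16 tables) into normalized access frequencies, assuming
    access frequency is proportional to revenue."""
    total = sum(revenues.values())
    return {title: r / total for title, r in revenues.items()}
```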
First, we performed a series of simulations to obtain the expected startup latency and compared the results with analytical ones. We assumed a system consisting of 12 disks (d = 12). The bandwidth of each disk could support ten simultaneous displays (N = 10). Hence, the maximum throughput of this configuration was 120 simultaneous displays (m = 120). We assumed two Poisson arrival rates for user requests (λ = 0.0317/sec for a system load of ρ = 0.95, and λ = 0.0267/sec for ρ = 0.8). Upon the arrival of a request, if the scheduler fails to find an idle slot in the system (d failures), the request is rejected (an m-server loss system). Figure 3.17 presents the distribution of the number of failures from both the analytical and simulation results. The results demonstrate that the proposed equations model the server's behavior well, producing very close values. We performed the same experiments for different configurations and obtained similar results. Note that we assumed each object had only a primary copy (no replication) in these experiments.
Figure 3.17 Distribution of number of failures (d = 12, N = 10).
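A simplified Monte Carlo version of this experiment is sketched below. It treats the server as d groups of N slots, lets an arriving request probe one group per cycle starting from a random group, counts each full group as one failure, and rejects the request after d failures. The cycle length and the random starting group are placeholder assumptions; this is an illustrative approximation of the cycle-based model, not a faithful replica of the book's simulator.

```python
import heapq
import random

def simulate_failures(lam, d=12, N=10, video_len=3600.0,
                      cycle=1.0, n_requests=100_000, seed=1):
    """Approximate the m-server loss system described above.

    Returns counts[k] = number of requests admitted after k failures,
    with counts[d] holding the number of rejected requests.
    """
    rng = random.Random(seed)
    active = [[] for _ in range(d)]        # per-group heap of departure times
    counts = [0] * (d + 1)
    t = 0.0
    for _ in range(n_requests):
        t += rng.expovariate(lam)          # Poisson arrivals
        start = rng.randrange(d)
        for k in range(d):                 # probe one group per cycle
            grp = active[(start + k) % d]
            probe = t + k * cycle
            while grp and grp[0] <= probe: # reclaim finished displays
                heapq.heappop(grp)
            if len(grp) < N:               # idle slot found after k failures
                heapq.heappush(grp, probe + video_len)
                counts[k] += 1
                break
        else:
            counts[d] += 1                 # rejected after d failures
    return counts
```

For example, simulate_failures(0.0317) corresponds to the ρ = 0.95 load of Figure 3.17 (an offered load of 0.0317 × 3600 ≈ 114 concurrent displays against m = 120 slots).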
Second, we quantified the startup latency in a system with 12 disks while varying the maximum number of simultaneous displays per disk (N) from 1 to 12. By varying the value of N with a constant disk data transfer rate, we modeled different display rate requirements of objects: a lower N value means the system is configured with objects requiring a higher display rate (more disk bandwidth). In this experiment, we assumed two configurations: one with no replication, and the other with replicas created by selective replication based on the access frequency from week 2. For the selective replication, we assumed a total storage capacity of 100 copies for the 50 equi-sized objects. Based on the quantified access frequency, we determined the number of replicas using the divisor method in Figure 3.9; the third column of the table in Figure 3.16.b shows the exact number of instances of each object. For each configuration, we quantified the startup latency using both the analytical model and simulation. Figure 3.18 shows the results of this experiment. The startup latency with selective replication was always far shorter than that with no replication. Figure 3.18 also demonstrates that the results from the analytical model were always slightly lower than those from the simulation, but the trends matched exactly. This implies that we can safely use the analytical model when deciding, in the algorithm of Figure 3.15.b, whether to commit a reconfiguration. Another observation is that the startup latency depends on the value of N: the larger N is, the shorter the expected startup latency. This is because the probability that all slots in a group are exhausted decreases as a function of N, resulting in a smaller number of failures.
Figure 3.18 Expected startup latency (ρ = 0.95, d = 12).
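The replica apportionment can be sketched as follows. Divisor methods assign each remaining copy to the object with the largest f / d(r) value; the particular divisor sequence d(r) = r + 1 used here (D'Hondt-style) is an assumption, since Figure 3.9's exact divisor is not reproduced in this section.

```python
import heapq

def divisor_replication(freqs, total_copies):
    """Apportion total_copies among objects in proportion to access
    frequency, in the spirit of the divisor method of Figure 3.9.
    Every object keeps a primary copy; each extra copy goes to the
    object maximizing f / (r + 1), where r is its current copy count.
    """
    n = len(freqs)
    assert total_copies >= n               # at least one copy per object
    copies = {obj: 1 for obj in freqs}
    # Max-heap keyed on f / d(r); with one copy, d(1) = 2.
    heap = [(-f / 2.0, obj) for obj, f in freqs.items()]
    heapq.heapify(heap)
    for _ in range(total_copies - n):
        _, obj = heapq.heappop(heap)
        copies[obj] += 1
        heapq.heappush(heap, (-freqs[obj] / (copies[obj] + 1), obj))
    return copies
```

With the 50 week-2 frequencies and total_copies = 100, such an apportionment produces per-object instance counts of the kind listed in the third column of Figure 3.16.b (under the stated divisor assumption).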
Finally, we performed nine forced reconfigurations during a ten-week period based on the statistics from The Internet Movie Database. Every week, we recalculated the number of instances of objects using the selective replication technique with the updated access frequency. Then, we quantified the overhead of moving data blocks to their new locations according to the new data placement, and finally quantified the new expected startup latency. Due to space limitations, we report only one detailed reconfiguration, from the first week to the second week (see Figure 3.16).
Figure 3.19.a shows the percentage reduction in expected startup latency, calculated from the difference between the latencies before and after the reconfiguration. Note that the latency before the reconfiguration is the one quantified using the previous week's configuration with the current week's access frequency, representing the system performance just before the reconfiguration happens. On average, we achieved a 17.0% reduction in expected startup latency when we performed a reconfiguration. We observed that the impact of reconfiguration was more significant when a bigger change in access frequency was introduced. For example, from week 1 to week 2 (see Figure 3.16.b), two very popular videos were introduced and ranked as the top two, resulting in the largest reduction (28%) in our experiments. However, from week 7 to week 8, the newly introduced videos failed to attract much attention and ranked low, resulting in the smallest reduction (5.6%). This observation fits the rationale of selective replication based on popularity.
Figure 3.19 Reconfiguration process (ρ = 0.95, m = 120, d = 12).
Figure 3.19.b shows the percentage of disk bandwidth wasted during the 4-hour reconfiguration process. Our results show that, on average, 18.6% of the disk bandwidth was wasted. In other words, if we reserve 20% of the total disk bandwidth for reconfiguration, the same amount of data movement takes 0.186 × 4 / 0.20 ≈ 3.7 hours on average. Considering that the system load significantly decreases during the night in most applications, this result demonstrates that reconfiguration is feasible without affecting the overall performance of the system.