The experiments in this paper use a block size of 4 KB, since most file systems perform operations at no smaller a scale. The system runs with 25% of the disk unallocated, leaving free space in which the learner can place moved blocks. The experiments use a set of traces and heuristic sets, described in the remainder of this section.

5.1 Traces

The evaluations in this paper use several traces; this section describes each trace set.

92 Traces: These traces come from the Cello92 trace set [22], a trace of a departmental server at HP in 1992 that served news, development, and other workloads. The experiments use trace data from the users partition (disk B) and the news partition (disk D) of this server. We will refer to these traces as the 92 Users and 92 News traces.

99 Traces: These traces come from the Cello99 trace, a trace of the same system at HP in 1999. The experiments use trace data from the source partition (disk 1F002000), a partition containing development code (disk 1F015000), and a partition containing reference data (disk 1F003000). We will refer to these traces as the 99 Source, 99 Development, and 99 Reference traces.

5.2 Heuristics

The experiments in this paper use the following combinations of heuristics.

All: This includes all of the heuristics listed in Section 4.1.2.

All-no-bad: This includes all of the heuristics listed in Section 4.1.2 except for the bad heuristic.

Placement: This includes the three placement heuristics: disk shuffling, front loading, and bad.

Sequential: This includes the two sequential heuristics: threading and run packing.

Thread+Shuffle: This includes the threading heuristic and the disk shuffling heuristic.

Thread+Placement: This includes the threading heuristic and two of the placement heuristics: disk shuffling and front loading.

6 Evaluating Overall Effectiveness

This section presents a representative evaluation of the learner and discusses the weaknesses in our current methods.
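The heuristic sets above can be sketched as a simple mapping; this is an illustrative encoding, not the authors' implementation, and the identifier names are assumptions:

```python
# Hypothetical encoding of the heuristic sets from Section 5.2.
# The grouping mirrors the text; identifiers are illustrative only.
PLACEMENT = ["disk_shuffling", "front_loading", "bad"]
SEQUENTIAL = ["threading", "run_packing"]

HEURISTIC_SETS = {
    "All": PLACEMENT + SEQUENTIAL,
    "All-no-bad": [h for h in PLACEMENT + SEQUENTIAL if h != "bad"],
    "Placement": PLACEMENT,
    "Sequential": SEQUENTIAL,
    "Thread+Shuffle": ["threading", "disk_shuffling"],
    "Thread+Placement": ["threading", "disk_shuffling", "front_loading"],
}
```

Note that every set except All and Placement excludes the deliberately bad heuristic, which is what makes the All configuration a test of the learner's ability to filter out harmful constraints.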
We will discuss the complexities of the problem, and our methods, in more detail in later sections.

We run the learner on a full week from each trace. The learner trains on a given day and then tests performance on the next day. Because we use a test window of three days, the first two days are purely training data.

Figure 4: Learner evaluation. This evaluation shows results on the 92 News trace, using the Apply learner described in more detail in Section 8.3.

The results given are the average (with standard-deviation bars) of the following five days. For each day the learning unit is run for 100 iterations. We compare the results of the learner with all but the bad heuristic (All-no-bad) against the learner with all heuristics (All) and the base heuristics (Shuffle, Front Load, Threading, and Run Packing). The base case (Base) shows the performance of the layout when no reorganization is performed.

Figure 4 shows the results of these evaluations. The values shown in the graphs are average response times normalized to the base response time, so lower bars mean better performance. Note that these results do not include the cost of performing the actual reorganization.

A correctly performing learner should do as well as or better than the best heuristic in all cases. The learner should also be able to ignore the bad heuristic, meaning that the addition of the bad heuristic should not hurt performance. However, the figure shows that the learner without the bad heuristic (All-no-bad) does not do as well as Front Loading, which is the best heuristic. In addition, the learner is unable to completely filter out the bad heuristic: the learner with the bad heuristic (All) is significantly worse than the learner without it (All-no-bad). We have tried a variety of approaches, and experimented with a variety of settings, while tackling this problem.
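The normalization used in Figure 4 can be sketched as follows; this is a minimal illustration of the stated methodology (per-day response times divided by the base layout's, then averaged with a standard deviation for the error bars), with function and variable names that are assumptions rather than the authors' code:

```python
import statistics

def normalized_response_time(learner_ms, base_ms):
    """Return (mean, stdev) of per-day average response times,
    each normalized to the base (no-reorganization) layout's time
    for the same day. Values below 1.0 mean the reorganized layout
    outperformed the base layout."""
    ratios = [l / b for l, b in zip(learner_ms, base_ms)]
    return statistics.mean(ratios), statistics.stdev(ratios)

# Hypothetical five test days of average response times (ms):
mean, sd = normalized_response_time([8.0, 9.0, 8.5, 7.5, 9.5], [10.0] * 5)
```

Because each day is normalized against that day's base measurement, day-to-day workload variation is factored out before averaging, which is why lower bars in Figure 4 directly indicate better layouts.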
We have also evaluated the system on a variety of traces. While this exploration has yielded results that are sometimes better and sometimes worse, the graph shown is typical. The next section describes some of the reasons the learner does not perform as hoped.

7 Challenges

There are a variety of challenges in the adaptive combiner's task of finding a good layout given the constraints from the heuristics. This section discusses the most important of these challenges in detail.

7.1 Constraint Space

7.1.1 Constraint Conflicts

As mentioned earlier, conflicts are an important challenge that the learner must address. The complexity of the conflict space affects how well the greedy approach will work in applying constraints. To explore this complexity, we performed an experiment measuring these conflicts.

The experiment chooses 200 random constraints to monitor. For each of these constraints, it finds all combinations of a certain number of other constraints which, if applied, would keep the monitored constraint from being applied. Each such combination is counted as a single conflict. We will call the number of other constraints involved in a conflict the conflict length. Figure 3 on page 9 shows examples of conflicts. For example, the conflict between constraints A and B in the figure would be a conflict of length 1, because it includes one constraint besides A. The conflict between constraints A, C, and D would be of length 2, since it includes two constraints besides A.

This experiment looks for all conflicts of length 3 or smaller. We did not extend the search further because exhaustively searching for these conflicts is exponential and quickly becomes unmanageable (attempting to find conflicts of length 4 took more than 48 hours for a single day on some traces). Table 2 shows the average numbers of conflicts per constraint.
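The exhaustive search described above can be sketched as follows. The `blocks` predicate is a hypothetical stand-in for the layout engine's actual check of whether applying a set of constraints prevents the monitored constraint from being applied; everything here is illustrative, not the authors' implementation:

```python
from itertools import combinations

def count_conflicts(monitored, others, blocks, max_len=3):
    """Count conflicts per conflict length: the number of combinations
    of up to max_len other constraints that, if applied, would keep
    the monitored constraint from being applied. `blocks(monitored,
    combo)` is a caller-supplied predicate for that check."""
    counts = {}
    for length in range(1, max_len + 1):
        counts[length] = sum(
            1 for combo in combinations(others, length)
            if blocks(monitored, combo)
        )
    return counts
```

The search examines every combination of up to `max_len` other constraints, so its cost grows as the sum of the binomial coefficients C(n, 1) through C(n, max_len); this combinatorial growth is consistent with the observation that extending the search to length 4 became unmanageable.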
It is interesting to note that, as expected, larger numbers of heuristics usually lead to larger numbers of conflicts. Having both sequential and placement constraints also leads to higher numbers of conflicts, since the overlap allows multiple placement constraints to conflict through a single sequential constraint. The Thread+Shuffle combination is notable, as the two heuristics are complementary and have almost no conflicts. Figure 5 shows the average number of conflicts per constraint across all of the trace sets, by conflict length. There are a large number of conflicts of length one and two, but very few of length three. This is probably because, with our heuristics, most conflicts occur between at most two placement constraints and