The experiments in this paper use a block size of 4 KB, since most file systems perform operations at no smaller a scale. The system runs with 25% of the disk unallocated, leaving free space in which the learner can place moved blocks. The experiments use a set of traces and heuristic sets, described in the remainder of this section.

5.1 Traces

The evaluations in this paper use several traces; this section describes each trace set.

92 Traces: These traces come from the Cello92 trace set [22], a trace of a departmental server at HP in 1992 that served news, development, and other workloads. The experiments use trace data from the users partition (disk B) and the news partition (disk D) of this server. We will refer to these traces as the 92 Users and 92 News traces.

99 Traces: These traces come from the Cello99 trace, a trace of the same system at HP in 1999. The experiments use trace data from the source partition (disk 1F002000), a partition containing development code (disk 1F015000), and a partition containing reference data (disk 1F003000). We will refer to these traces as the 99 Source, 99 Development, and 99 Reference traces.

5.2 Heuristics

The experiments in this paper use the following combinations of heuristics.

All: This includes all of the heuristics listed in Section 4.1.2.

All-no-bad: This includes all of the heuristics listed in Section 4.1.2 except for the bad heuristic.

Placement: This includes the three placement heuristics: disk shuffling, front loading, and bad.

Sequential: This includes the two sequential heuristics: threading and run packing.

Thread+Shuffle: This includes the threading heuristic and the disk shuffling heuristic.

Thread+Placement: This includes the threading heuristic and two of the placement heuristics: disk shuffling and front loading.

6 Evaluating Overall Effectiveness

This section presents a representative evaluation of the learner and discusses the weaknesses in our current methods.
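The heuristic sets above can be sketched as a simple mapping; this is an illustrative encoding, not the authors' implementation, and the identifier names are assumptions:

```python
# Hypothetical encoding of the heuristic sets from Section 5.2.
# The grouping mirrors the text; identifiers are illustrative only.
PLACEMENT = ["disk_shuffling", "front_loading", "bad"]
SEQUENTIAL = ["threading", "run_packing"]

HEURISTIC_SETS = {
    "All": PLACEMENT + SEQUENTIAL,
    "All-no-bad": [h for h in PLACEMENT + SEQUENTIAL if h != "bad"],
    "Placement": PLACEMENT,
    "Sequential": SEQUENTIAL,
    "Thread+Shuffle": ["threading", "disk_shuffling"],
    "Thread+Placement": ["threading", "disk_shuffling", "front_loading"],
}
```

Note that every set except All and Placement excludes the deliberately bad heuristic, which is what makes the All configuration a test of the learner's ability to filter out harmful constraints.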
We will discuss the complexities of the problem, and our methods, in more detail in later sections.

We run the learner on a full week from each trace. The learner trains on a given day and then tests performance on the next day. Because we use a test window of three days, the first two days are purely training data.

Figure 4: Learner evaluation. This evaluation shows results on the 92 News trace, using the Apply learner described in more detail in Section 8.3.

The results given are the average (with standard-deviation bars) of the following five days. For each day the learning unit is run for 100 iterations. We compare the results of the learner with all but the bad heuristic (All-no-bad) against the learner with all heuristics (All) and the base heuristics (Shuffle, Front Load, Threading, and Run Packing). The base case (Base) shows the performance of the layout when no reorganization is performed.

Figure 4 shows the results of these evaluations. The values shown in the graphs are average response times normalized to the base response time, so lower bars mean better performance. Note that these results do not include the cost of performing the actual reorganization.

A correctly performing learner should do as well as or better than the best heuristic in all cases. The learner should also be able to ignore the bad heuristic, meaning that the addition of the bad heuristic should not hurt performance. However, the figure shows that the learner without the bad heuristic (All-no-bad) does not do as well as Front Loading, which is the best heuristic. In addition, the learner is unable to completely filter out the bad heuristic: the learner with the bad heuristic (All) is significantly worse than the learner without it (All-no-bad). We have tried a variety of approaches, and experimented with a variety of settings, while tackling this problem.
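The normalization used in Figure 4 can be sketched as follows; this is a minimal illustration of the stated methodology (per-day response times divided by the base layout's, then averaged with a standard deviation for the error bars), with function and variable names that are assumptions rather than the authors' code:

```python
import statistics

def normalized_response_time(learner_ms, base_ms):
    """Return (mean, stdev) of per-day average response times,
    each normalized to the base (no-reorganization) layout's time
    for the same day. Values below 1.0 mean the reorganized layout
    outperformed the base layout."""
    ratios = [l / b for l, b in zip(learner_ms, base_ms)]
    return statistics.mean(ratios), statistics.stdev(ratios)

# Hypothetical five test days of average response times (ms):
mean, sd = normalized_response_time([8.0, 9.0, 8.5, 7.5, 9.5], [10.0] * 5)
```

Because each day is normalized against that day's base measurement, day-to-day workload variation is factored out before averaging, which is why lower bars in Figure 4 directly indicate better layouts.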
We have also evaluated the system on a variety of traces. While this exploration has yielded results that are sometimes better and sometimes worse, the graph shown is typical. The next section describes some of the reasons the learner does not perform as hoped.

7 Challenges

There are a variety of challenges in the adaptive combiner's task of finding a good layout given the constraints from the heuristics. This section discusses the most important of these challenges in detail.

7.1 Constraint Space

7.1.1 Constraint Conflicts

As mentioned earlier, conflicts are an important challenge that the learner must address. The complexity of the conflict space affects how well the greedy approach will work in applying constraints. To explore this complexity, we performed an experiment measuring these conflicts.

The experiment chooses 200 random constraints to monitor. For each of these constraints, it finds all combinations of a certain number of other constraints which, if applied, would keep the monitored constraint from being applied. Each such combination is counted as a single conflict. We will call the number of other constraints involved in a conflict the conflict length. Figure 3 on page 9 shows examples of conflicts. For example, the conflict between constraints A and B in the figure would be a conflict of length 1, because it includes one constraint besides A. The conflict between constraints A, C, and D would be of length 2, since it includes two constraints besides A.

This experiment looks for all conflicts of length 3 or smaller. We did not extend the search further because exhaustively searching for these conflicts is exponential and quickly becomes unmanageable (attempting to find conflicts of length 4 took more than 48 hours for a single day on some traces). Table 2 shows the average numbers of conflicts per constraint.
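The exhaustive search described above can be sketched as follows. The `blocks` predicate is a hypothetical stand-in for the layout engine's actual check of whether applying a set of constraints prevents the monitored constraint from being applied; everything here is illustrative, not the authors' implementation:

```python
from itertools import combinations

def count_conflicts(monitored, others, blocks, max_len=3):
    """Count conflicts per conflict length: the number of combinations
    of up to max_len other constraints that, if applied, would keep
    the monitored constraint from being applied. `blocks(monitored,
    combo)` is a caller-supplied predicate for that check."""
    counts = {}
    for length in range(1, max_len + 1):
        counts[length] = sum(
            1 for combo in combinations(others, length)
            if blocks(monitored, combo)
        )
    return counts
```

The search examines every combination of up to `max_len` other constraints, so its cost grows as the sum of the binomial coefficients C(n, 1) through C(n, max_len); this combinatorial growth is consistent with the observation that extending the search to length 4 became unmanageable.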
It is interesting to note that, as expected, larger numbers of heuristics usually lead to larger numbers of conflicts. Having both sequential and placement constraints also leads to higher numbers of conflicts, since the overlap allows multiple placement constraints to conflict through a single sequential constraint. The Thread+Shuffle combination is notable, as the two heuristics are complementary and have almost no conflicts. Figure 5 shows the average number of conflicts per constraint across all of the trace sets, by conflict length. There are a large number of conflicts of length one and two, but very few of length three. This is probably because, with our heuristics, most conflicts occur between at most two placement constraints and