Appears in Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST’04).
San Francisco, CA. March 2004.
Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks
Jiri Schindler, Steven W. Schlosser, Minglong Shao, Anastassia Ailamaki, Gregory R. Ganger
Carnegie Mellon University
Abstract
The Atropos logical volume manager allows applications
to exploit characteristics of its underlying collection of
disks. It stripes data in track-sized units and explicitly
exposes the boundaries, allowing applications to maximize
efficiency for sequential access patterns even when
they share the array. Further, it supports efficient diagonal
access to blocks on adjacent tracks, allowing applications
to orchestrate the layout and access to two-dimensional
data structures, such as relational database
tables, to maximize performance for both row-based and
column-based accesses.
1 Introduction
Many storage-intensive applications, most notably
database systems and scientific computations, have
some control over their access patterns. Wanting the
best performance possible, they choose the data layout
and access patterns they believe will maximize I/O efficiency.
Currently, however, their decisions are based on
manual tuning knobs and crude rules of thumb. Application
writers know that large I/Os and sequential patterns
are best, but are otherwise disconnected from the underlying
reality. The result is often unnecessary complexity
and inefficiency on both sides of the interface.
Today’s storage interfaces (e.g., SCSI and ATA) hide
almost everything about underlying components, forcing
applications that want top performance to guess and
assume [7, 8]. Of course, arguing to expose more information
highlights a tension between the amount of
information exposed and the added complexity in the
interface and implementations. The current storage interface,
however, has remained relatively unchanged for
15 years, despite the shift from (relatively) simple disk
drives to large disk array systems with logical volume
managers (LVMs). The same information gap exists inside
disk array systems—although their LVMs sit below
a host’s storage interface, most do not exploit device-specific
features of their component disks.
Now with EMC Corporation.
[Figure 1 diagram. Labels: application (host); I/O requests; LVM parameters (explicit hints to applications); Atropos LVM (disk array), layout w/ efficient data access; disk drive parameters.]
Figure 1: Atropos logical volume manager architecture. Atropos
exploits disk characteristics (arrow 1), automatically extracted from
disk drives, to construct a new data organization. It exposes high-level
parameters that allow applications to directly take advantage of this
data organization for efficient access to one- or two-dimensional data
structures (arrow 2).
This paper describes a logical volume manager, called
Atropos (see Figure 1), that exploits information about
its component disks and exposes high-level information
about its data organization. With a new data organization
and minor extensions to today’s storage interface,
it accomplishes two significant ends. First, Atropos exploits
automatically-extracted knowledge of disk track
boundaries, using them as its stripe unit boundaries. By
also exposing these boundaries explicitly, it allows applications
to use previously proposed “track-aligned extents”
(traxtents), which provide substantial benefits for
mid-sized segments of blocks and for streaming patterns
interleaved with other I/O activity [22].
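As a rough illustration of what exposed track boundaries make possible, the sketch below splits a request so that no piece crosses a track boundary. It is not the Atropos interface: the function names and the `track_starts` list (the first LBN of each track, plus one trailing entry marking the end of the last track) are hypothetical, and the boundary values are made up for the example.

```python
from bisect import bisect_right


def traxtent_for(lbn, track_starts):
    """Return (first_lbn, length) of the track-aligned extent holding lbn.

    track_starts is a sorted list holding the first LBN of each track,
    followed by one extra entry marking the end of the last track.
    (Hypothetical representation, assumed for this sketch.)
    """
    if not track_starts[0] <= lbn < track_starts[-1]:
        raise ValueError("LBN outside the described range")
    i = bisect_right(track_starts, lbn) - 1
    return track_starts[i], track_starts[i + 1] - track_starts[i]


def split_on_tracks(start, count, track_starts):
    """Split the request [start, start + count) into (lbn, length) pieces,
    none of which crosses a track boundary."""
    pieces = []
    end = start + count
    while start < end:
        first, length = traxtent_for(start, track_starts)
        piece_end = min(end, first + length)
        pieces.append((start, piece_end - start))
        start = piece_end
    return pieces


# Made-up boundaries: three tracks of 686, 686, and 678 blocks.
boundaries = [0, 686, 1372, 2050]
pieces = split_on_tracks(600, 200, boundaries)
```

With these made-up boundaries, a 200-block request starting at LBN 600 is split into one piece that ends at the first track boundary and a second piece that starts on the next track.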
Second, Atropos uses and exposes a data organization
that lets applications go beyond the “only one dimension
can be efficient” assumption associated with
today’s linear storage address space. In particular, two-dimensional
data structures (e.g., database tables) can be
laid out for almost maximally efficient access in both
row- and column-orders, eliminating a trade-off [15]
currently faced by database storage managers. Atropos
enables this by exploiting automatically-extracted
knowledge of track/head switch delays to support semi-sequential
access: diagonal access to ranges of blocks
(one range per track) across a sequence of tracks.
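The diagonal pattern can be sketched with a deliberately idealized model. The code below assumes a constant number of sectors per track, a known head switch time, and sector positions that line up physically across tracks; none of these parameter names or values come from Atropos, and real disks require the extracted parameters described in this paper. The idea shown is only that the next range starts far enough around the next track for the head switch to complete without waiting for an extra rotation.

```python
def semi_sequential_path(first_sector, blocks_per_track, n_tracks,
                         sectors_per_track, rotation_us, switch_us):
    """Return a list of (track, start_sector, length) ranges forming a
    diagonal across n_tracks adjacent tracks.

    Idealized assumptions (not taken from the paper): every track has
    sectors_per_track sectors, a head switch always takes switch_us
    microseconds, and one rotation takes rotation_us microseconds.
    """
    # Number of sectors that pass under the head during one head switch,
    # rounded up with integer arithmetic to avoid floating-point error.
    skew = -(-switch_us * sectors_per_track // rotation_us)
    path = []
    sector = first_sector
    for track in range(n_tracks):
        path.append((track, sector, blocks_per_track))
        # After reading this range and switching heads, the head arrives
        # at this sector position on the next track.
        sector = (sector + blocks_per_track + skew) % sectors_per_track
    return path


# Made-up disk: 600 sectors per track, 6 ms rotation, 0.8 ms head switch.
path = semi_sequential_path(first_sector=0, blocks_per_track=4, n_tracks=3,
                            sectors_per_track=600, rotation_us=6000,
                            switch_us=800)
```

In this made-up configuration the head switch spans 80 sector times, so each 4-block range starts 84 sectors further around the track than the previous one.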
In this manner, a relational database table can be laid
out such that scanning a single column occurs at streaming
bandwidth (for the full array of disks), and reading
a single row costs only 16%–38% more than it would if
row order had been the optimized order. We have implemented Atropos
as a host-based LVM, and we evaluate it with both
database workload experiments (TPC-H) and analytic
models. Because Atropos exposes its key parameters explicitly,
these performance benefits can be realized with
no manual tuning of storage-related application knobs.