A Wish List For Commercial Grid Systems
By Paul Shread
January 22, 2003
Last week's alpha release of the Globus Toolkit 3.0 was a big step
toward a Grid infrastructure standard, but Grid technology will need to
provide even more advanced features to address the needs of commercial
IT systems, according to a paper by Fujitsu researchers.
The paper, "OGSA Fundamental Services: Requirements for Commercial GRID
Systems," was authored by Hiro Kishimoto and Andreas Savva of Fujitsu
Laboratories and David Snelling of Fujitsu Laboratories of Europe.
Snelling said the paper "addresses quite high level functions, where GT3
[Globus Toolkit 3.0] will be mostly an infrastructure release. You might
find some of the control and information services a good start, but most
of what we are looking to develop for commercial Grids will need to be
built on top."
Widespread use of information technology, especially easy access to
broadband networking technologies, is placing new requirements on the
creation of commercial IT systems, the Fujitsu researchers wrote.
IT systems must provide "innovative features while ensuring high
performance and high availability under unpredictable workloads in an
extremely heterogeneous and distributed environment," they said.
"Systems must be developed at high speed with short development cycles
and provided at low initial cost so as to reach the maximum number of IT
system customers."
Systems Integrators, Internet Data Center Administrators Face Issues
System integrators and Internet Data Center (IDC) administrators face a
number of issues, the paper noted.
For system integrators, constructing heterogeneous systems is very
difficult. Problems include making end-to-end performance predictions
and guarantees, ensuring 24/7 availability, provisioning enough capacity
to absorb sudden spikes in Internet traffic, and responding to frequent
changes in service specifications.
"It is almost impossible to design and construct robust IT systems in a
timely manner," Kishimoto, Savva and Snelling wrote. "Often robustness
suffers; hence we see almost weekly reports of service disruptions due
to malfunctioning IT systems."
IDC administrators, meanwhile, serve IT system customers who want
end-to-end Service Level Agreements (SLAs), such as agreements based on
the number of end-user requests processed per second or the maximum
response time to end-user requests.
"Unfortunately, with current tools it is virtually impossible for IDC
administrators to determine what resources are needed to ensure that a
given SLA is guaranteed," the Fujitsu paper said. "Often the result is
over-provisioning."
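The two kinds of SLA terms mentioned above, throughput and maximum response time, are easy to check after the fact against a window of measured requests. The minimal sketch below assumes a hypothetical `Sla` record and `meets_sla` helper that do not come from the Fujitsu paper. It checks compliance only; the hard part the paper describes, predicting which resources will guarantee an SLA in advance, is not addressed here.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    """Hypothetical end-to-end SLA with the two kinds of terms the paper mentions."""
    min_requests_per_sec: float   # required throughput
    max_response_time_ms: float   # worst acceptable response time


def meets_sla(sla: Sla, response_times_ms: list[float], window_sec: float) -> bool:
    """Check one measurement window of end-user requests against the SLA.

    response_times_ms holds one entry per request completed in the window.
    """
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    throughput = len(response_times_ms) / window_sec
    worst = max(response_times_ms, default=0.0)
    return throughput >= sla.min_requests_per_sec and worst <= sla.max_response_time_ms


if __name__ == "__main__":
    sla = Sla(min_requests_per_sec=2.0, max_response_time_ms=500.0)
    # 10 requests in 4 seconds = 2.5 requests/sec; worst case 420 ms
    window = [120.0, 95.0, 420.0, 310.0, 88.0, 150.0, 200.0, 97.0, 130.0, 260.0]
    print(meets_sla(sla, window, window_sec=4.0))  # True
```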
To minimize total cost of ownership, IDC administrators try to increase
the utilization ratio of their own resources. However, because they lack
effective predictive tools, the actual utilization ratio is often below
one-third or one-fifth, and sometimes even lower, Kishimoto, Savva and
Snelling said. Some resources are reserved for failover and provisioning
and so are not put to productive use. "It should be possible to share
such resources among multiple systems, with physical location not being
the single determining factor whether sharing is possible or not," the
paper said.
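A back-of-envelope calculation shows why dedicated reserves drag utilization down and why sharing them helps. The server counts below are purely hypothetical, not figures from the paper, and the shared case assumes the systems rarely spike or fail at the same moment, so one pool of spares can back all of them.

```python
def utilization(busy: int, provisioned: int) -> float:
    """Fraction of provisioned servers doing productive work."""
    return busy / provisioned


# Hypothetical example: three systems, each with 2 busy servers at typical load.
busy_per_system = [2, 2, 2]
busy_total = sum(busy_per_system)

# Dedicated approach: each system keeps 4 spares of its own for spikes and failover.
spare_per_system = 4
dedicated_total = sum(b + spare_per_system for b in busy_per_system)  # 18 servers

# Shared approach (assumption): one pool of 4 spares backs all three systems.
shared_spares = 4
shared_total = busy_total + shared_spares  # 10 servers

print(f"dedicated: {utilization(busy_total, dedicated_total):.2f}")  # 0.33
print(f"shared:    {utilization(busy_total, shared_total):.2f}")     # 0.60
```

With dedicated reserves, utilization lands at the one-third level the authors describe; pooling the same headroom roughly doubles it, which is the kind of location-independent sharing the paper calls for.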
IT systems provide infrastructure essential to daily life. "Undisrupted
operation must be ensured even in the event of disasters such as
earthquakes, fires, or acts of terrorism," Kishimoto, Savva and Snelling
wrote. Independent but networked IDCs can be used to provide the
necessary physical infrastructure, they said, but the technologies to
provide transparent and seamless operation across such an environment
are not yet available.
OGSA Could Provide A Solution
Several cutting-edge technologies and products already on the market
attempt to solve one or more of these challenges, they said, citing
IBM's Oceano Project, Sun's N1 vision, and Terraspring (acquired by
Sun). But they added that "such attempts take a proprietary approach and
have limited scope."
The Open Grid Services Architecture, which Globus Toolkit 3.0
implements, "is an open, extensible, and comprehensive architecture,
which can be used to address the difficult problems described so far,"
Kishimoto, Savva and Snelling wrote. "In particular, commercial IT
systems, with all their differences from scientific IT systems, should
form one important Service Domain of OGSA."
Part of the solution could be a common hosting environment on top of
various hardware and OS platforms, the paper said. Although OGSA does
not require it, implementations may be based on a Java hosting
environment. Commercial IT systems, however, need more than a common
hosting environment, they said.
Commercial IT systems need support for at least three kinds of job
requests, the paper said.
Java Program: Business process applications written in Java. The
requested job will run as an EJB component in an EJB container hosting
environment or as a Java servlet in a servlet container. Most newly
written business process applications are expected to be of this type.
Job lifetimes vary.
Batch Process: Current business process applications also include
batch-type jobs, among them periodic ones such as monthly summary report
creation or employee payment transfers. In some cases, workflow
management is also required. Requirements may be the same as for job
execution using the GRAM interface of Globus Toolkit 2. Job requests of
this type have relatively short lifetimes.
System Composition: An entire business process application may be
submitted as a single request. Such requests require the composition of
a system suitable for running the application and deployment of the
application on it. IDC administrators may submit this kind of request.
Such requests are infrequent and are expected to have very long
lifetimes.
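The three request types can be pictured as a small data model that a Grid front end might use to route incoming jobs. This is a sketch under stated assumptions: `JobKind`, `JobRequest`, `route`, and the handler strings are hypothetical names invented for illustration; only the three categories and their qualitative lifetimes come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class JobKind(Enum):
    """The three kinds of job request the Fujitsu paper says commercial Grids must support."""
    JAVA_PROGRAM = "java_program"              # runs as an EJB component or servlet
    BATCH_PROCESS = "batch_process"            # e.g. monthly reports, payment transfers
    SYSTEM_COMPOSITION = "system_composition"  # build a system, then deploy an application


# Qualitative lifetimes as described in the paper.
EXPECTED_LIFETIME = {
    JobKind.JAVA_PROGRAM: "varies",
    JobKind.BATCH_PROCESS: "relatively short",
    JobKind.SYSTEM_COMPOSITION: "very long",
}


@dataclass
class JobRequest:
    kind: JobKind
    name: str


def route(request: JobRequest) -> str:
    """Pick a (hypothetical) handler for a request based on its kind."""
    if request.kind is JobKind.JAVA_PROGRAM:
        return "deploy to EJB or servlet container"
    if request.kind is JobKind.BATCH_PROCESS:
        return "submit to batch scheduler (GRAM-style job execution)"
    if request.kind is JobKind.SYSTEM_COMPOSITION:
        return "compose system, then deploy application"
    raise ValueError(f"unknown job kind: {request.kind}")


if __name__ == "__main__":
    req = JobRequest(JobKind.BATCH_PROCESS, "monthly-summary")
    print(route(req), "/", EXPECTED_LIFETIME[req.kind])
```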
In addition to handling the above job requests, the Fujitsu paper said
commercial IT systems require the following OGSA characteristics:
-Heterogeneous environment support, such as a variety of hardware,
operating systems, and applications;
-Scalable and hierarchical organization;
-Open standard interfaces; and
-Ability to overlay existing, multiple, underlying resource
instrumentations (SNMP, CIM).
With respect to functionality, the paper said commercial IT systems need
the following services:
System configuration management: The OGSA Common Resource Model
is used to model the system and its resources. Generated events are
handled according to user-set policy.
Job execution management: Time-, priority-, and space-based
scheduling of jobs. In case of application failure, jobs are retried
based on applicable policy.
Resource management: Dynamic and flexible resource management is
essential. At the same time, resource isolation between different jobs
is crucial, not only for access control but also to ensure that there
are no unexpected performance dependencies.
Autonomic management: Clustering features to handle resource
failures and provisioning features for adaptive resource allocation
should be provided to enable autonomic management. The actual behavior
should be based on client-provided policies.
Infrastructure services: user management, accounting management,
logging and tracing.
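Of the services listed, job execution management is the most algorithmic: jobs are scheduled by priority, and failed jobs are retried according to a policy. The sketch below covers only those two aspects (not time- or space-based scheduling) and assumes a hypothetical `RetryPolicy` that simply caps the number of attempts; the class names and the policy shape are illustrative, not taken from the paper or from any Globus API.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RetryPolicy:
    """Hypothetical policy: how many times a job may be attempted in total."""
    max_attempts: int = 3


@dataclass(order=True)
class _QueueEntry:
    priority: int                # lower number = runs first
    seq: int                     # tie-breaker keeps FIFO order within a priority
    name: str = field(compare=False)
    task: Callable[[], None] = field(compare=False)
    attempts: int = field(default=0, compare=False)


class Scheduler:
    """Priority-based job queue that retries failed jobs according to a policy."""

    def __init__(self, policy: RetryPolicy) -> None:
        self._policy = policy
        self._heap: list[_QueueEntry] = []
        self._seq = itertools.count()

    def submit(self, name: str, task: Callable[[], None], priority: int = 10) -> None:
        heapq.heappush(self._heap, _QueueEntry(priority, next(self._seq), name, task))

    def run(self) -> dict[str, str]:
        """Run jobs until the queue is empty; return each job's final status."""
        status: dict[str, str] = {}
        while self._heap:
            entry = heapq.heappop(self._heap)
            entry.attempts += 1
            try:
                entry.task()
                status[entry.name] = "done"
            except Exception:
                if entry.attempts < self._policy.max_attempts:
                    # Re-queue at the same priority, behind jobs already waiting there.
                    entry.seq = next(self._seq)
                    heapq.heappush(self._heap, entry)
                    status[entry.name] = "retrying"
                else:
                    status[entry.name] = "failed"
        return status


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> None:
        # Fails on the first attempt only, simulating a transient error.
        calls["n"] += 1
        if calls["n"] < 2:
            raise RuntimeError("transient failure")

    sched = Scheduler(RetryPolicy(max_attempts=3))
    sched.submit("payroll", flaky, priority=1)
    sched.submit("report", lambda: None, priority=5)
    print(sched.run())  # {'payroll': 'done', 'report': 'done'}
```

Keeping the retry rule in a separate policy object mirrors the paper's point that behavior should follow client-provided policy rather than being hard-coded into the scheduler.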
The paper addresses these needs in detail. These "fundamental services,"
the authors concluded, "are required for effective Grids in a commercial
environment."