Parallel & Distributed Computing

1. Load Balancing and Task Scheduling in Parallel, Distributed, and Cluster Computing Environments

Scheduling
and load balancing are two important problems in the area of parallel
computing. Efficient solutions to these problems will have profound
theoretical and practical implications that will affect other parallel
computing problems of a similar nature. Little research has attempted a
generalized approach to these problems. The major difficulties stem from
interprocessor communication and the delays caused by interdependencies
between the subtasks of a given application. The
mapping problem arises when the dependency structure of a parallel algorithm
differs from the processor interconnection of the parallel computer, or when
the number of processes generated by the algorithm exceeds the number of
processors available. The problem is further complicated when the parallel
computer system contains heterogeneous components (e.g., different processor
and link speeds, as in cluster and grid architectures). This project intends
to investigate the development of new classes of algorithms for solving a
variety of scheduling and load-balancing problems in both static and dynamic
scenarios.
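As a minimal illustration of the static side of this problem, the classic longest-processing-time (LPT) greedy heuristic assigns each task, longest first, to the currently least-loaded processor. The task run times and processor count below are made-up numbers for the sketch, not data from the project.

```python
import heapq

def lpt_schedule(task_times, num_procs):
    """Greedy LPT heuristic: assign each task (longest first)
    to the currently least-loaded processor."""
    # Min-heap of (current load, processor id)
    loads = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(loads)
    assignment = {p: [] for p in range(num_procs)}
    for t in sorted(task_times, reverse=True):
        load, p = heapq.heappop(loads)
        assignment[p].append(t)
        heapq.heappush(loads, (load + t, p))
    # Makespan: finishing time of the most heavily loaded processor
    makespan = max(sum(ts) for ts in assignment.values())
    return assignment, makespan

tasks = [7, 5, 4, 4, 3, 3, 2]   # hypothetical task run times
assignment, makespan = lpt_schedule(tasks, num_procs=3)
print(makespan)                 # → 10
```

LPT is a useful baseline because it is cheap and has a known worst-case bound relative to the optimal makespan; the dynamic and heterogeneous variants the project targets are where it stops being adequate.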
2. Scheduling Communications in Cluster Computing Systems

Clusters
of commodity computer systems have become the fastest-growing choice for
building cost-effective, high-performance parallel computing platforms. Rapid
advances in computer architectures and high-speed interconnects have
facilitated many successful deployments of such clusters. Previous studies
have reported that the cluster interconnect significantly impacts the
performance of parallel applications. High-speed
interconnects not only unveil the potential performance of the cluster but
also allow clusters to achieve a better performance/cost ratio than clusters
with traditional local area networks. Towards this end, this project aims to
study how computations and communications influence the performance of such
systems. Applications range from the compute-intensive to the
communication-intensive, and an understanding of such applications and how
they map efficiently onto clusters is important.
3. Parallel Machine Learning and Stochastic Optimization Algorithms

Optimization
algorithms can be used to solve a wide range of problems that arise in the
design and operation of parallel computing environments (e.g., data mining,
scheduling, routing). However, many classical optimization techniques
(e.g., linear programming) are not suited to parallel processing problems
because of their restricted nature. This project is investigating the
application of newer, unorthodox optimization techniques such as fuzzy
logic, genetic algorithms, neural networks, simulated annealing, ant
colonies, tabu search, and others. However, these
techniques are computationally intensive and require enormous computing time.
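To make one of these techniques concrete, the sketch below applies simulated annealing to a toy task-to-processor assignment problem, minimizing the makespan. The cost function, cooling schedule, and all parameter values are illustrative assumptions, not part of the project itself.

```python
import math
import random

def makespan(assign, times, procs):
    """Finishing time of the busiest processor under a task->processor map."""
    loads = [0.0] * procs
    for task, p in enumerate(assign):
        loads[p] += times[task]
    return max(loads)

def anneal(times, procs, steps=20000, t0=10.0, alpha=0.9995, seed=1):
    rng = random.Random(seed)
    assign = [rng.randrange(procs) for _ in times]   # random initial mapping
    cost = makespan(assign, times, procs)
    temp = t0
    for _ in range(steps):
        task = rng.randrange(len(times))             # move one task...
        old = assign[task]
        assign[task] = rng.randrange(procs)          # ...to a random processor
        new_cost = makespan(assign, times, procs)
        # Always accept improvements; accept worse moves with Boltzmann probability
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:
            assign[task] = old                       # undo rejected move
        temp *= alpha                                # geometric cooling
    return assign, cost

times = [7, 5, 4, 4, 3, 3, 2]   # hypothetical task run times
assign, cost = anneal(times, procs=3)
print(cost)
```

Even on this toy instance the inner loop re-evaluates the full cost at every step, which hints at why these techniques become computationally heavy at realistic problem sizes, and why parallelizing them is attractive.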
Parallel processing has the potential to reduce this computational load and
enable the efficient use of these techniques on a wide variety of problems.
4. Autonomic Communications in Parallel and Distributed Computing Systems

The rapid advancement of computer
architectures and high-speed interconnects has facilitated many successful
deployments of parallel and distributed systems of many kinds. Previous
studies have reported that the design of the interconnect significantly
impacts the performance of parallel applications. High-speed interconnects
not only unveil the potential performance of the computing system but also
allow such systems to achieve a better performance/cost ratio. Towards this
end, this project aims to study how computations and communications
influence the performance of such parallel and distributed computing systems.
5. Quality of Service in Distributed Computing Systems

There
is a need to develop a comprehensive framework that determines what QoS means
in the context of distributed systems and the services that will be provided
through such infrastructure. What complicates the scenario is the fact that
distributed systems will provide a whole range of services, not only
high-performance computing.
There is a great need for QoS metrics for distributed systems that can
capture all of this complexity and provide meaningful measures for a wide
range of applications. This will likely mean that new classes of algorithms
and simulation models need to be developed. These should be able to
characterize the variety of workloads and applications, and so help us better
understand the behaviour of distributed computing systems under different
operating conditions.
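As one concrete starting point for such metrics, the sketch below summarizes a sample of request latencies into a few candidate QoS measures. The latency values and deadline are made-up numbers, and the chosen metrics (percentiles, jitter, deadline hit rate) are just one plausible set, not the framework the project calls for.

```python
import statistics

def qos_report(latencies_ms, deadline_ms):
    """Summarize a latency sample into a few candidate QoS metrics."""
    ordered = sorted(latencies_ms)
    def pct(p):
        # Simple percentile by index into the sorted sample
        k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
        return ordered[k]
    return {
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": pct(50),
        "p99_ms": pct(99),
        "jitter_ms": statistics.pstdev(ordered),   # spread of latencies
        # Fraction of requests meeting the deadline: a simple SLA-style metric
        "deadline_hit_rate": sum(x <= deadline_ms for x in ordered) / len(ordered),
    }

samples = [12, 15, 11, 90, 14, 13, 16, 12, 200, 15]   # hypothetical latencies
report = qos_report(samples, deadline_ms=50)
print(report["deadline_hit_rate"])                    # → 0.8
```

Note how the mean alone would hide the two outliers that the p99 and the deadline hit rate expose; this is exactly the kind of gap that makes single-number QoS metrics inadequate for a broad service mix.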
6. Healing and Self-Repair in Large-Scale Distributed Computing Systems

As the complexity of distributed systems increases over time, there will be a
need to endow such systems with the capability to operate in disaster
scenarios. What makes this problem very
complex is the heterogeneous nature of today’s distributed computing
environments that could be made up of hundreds or thousands of components
(computers, databases, etc.). In addition, a user in one location might not
have control over other parts of the system. It is therefore logical that
there is a need for “smart” algorithms (protocols) that can achieve an
acceptable level of fault tolerance and account for a variety of
disaster-recovery scenarios.
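As a minimal sketch of one building block for such protocols, the heartbeat-based failure detector below marks a node as suspected when no heartbeat has arrived within a timeout. The node names, timeout value, and simulated clock are all hypothetical, and real detectors must also cope with message delay and clock issues that this sketch ignores.

```python
class HeartbeatDetector:
    """Simple timeout-based failure detector: a node is suspected
    if no heartbeat has been seen within `timeout` seconds."""

    def __init__(self, nodes, timeout, now=0.0):
        self.timeout = timeout
        self.last_seen = {n: now for n in nodes}   # treat startup as a heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now                 # record the latest heartbeat

    def suspected(self, now):
        # Nodes whose last heartbeat is older than the timeout
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}

# Simulated run with three hypothetical nodes and a 2-second timeout
d = HeartbeatDetector(["a", "b", "c"], timeout=2.0)
d.heartbeat("a", now=1.5)
d.heartbeat("b", now=1.7)
# "c" never sends a heartbeat after startup
print(d.suspected(now=3.2))   # → {'c'}
```

A self-repair layer would sit on top of a detector like this, re-replicating data or restarting services for suspected nodes; the detector itself only supplies the (possibly mistaken) suspicion.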
