Paper Details

Reference:

Mark A. Holliday and Michael Stumm,
"Performance evaluation of hierarchical ring-based shared memory multiprocessors",
IEEE Transactions on Computers, 43(1), January 1994, pp. 52–67.

Download:

PDF

Abstract:

This paper investigates the performance of word-packet, slotted unidirectional ring-based hierarchical direct networks in the context of large-scale shared memory multiprocessors. Slotted unidirectional rings are attractive because their electrical characteristics and simple interfaces allow for fast cycle times and large bandwidths. For large-scale systems, it is necessary to use multiple rings for increased aggregate bandwidth. Hierarchies are attractive because the topology ensures unique paths between nodes, simple node interfaces and simple inter-ring connections. To ensure that a realistic region of the design space is examined, the architecture of the network used in the Hector prototype is adopted as the initial design point. A simulator of that architecture has been developed and validated with measurements from the prototype. The system and workload parameterization reflects conditions expected in the near future. The results of our study show the importance of system balance on performance. Large-scale systems inherently have large communication delays for distant accesses, so processor efficiency will be low unless the processors can operate with multiple outstanding transactions using techniques such as prefetching, asynchronous writes and multiple hardware contexts. However, with multiple outstanding transactions and only one memory bank per processing module, memory quickly saturates. Memory saturation can be alleviated by having multiple memory banks per processing module, but this shifts the bottleneck to the ring subsystem. While the topology of the ring hierarchy affects performance, the ring subsystem will inherently limit the throughput of the system. Hence increasing the number of outstanding transactions per processor beyond a certain point only has a limiting effect on performance, since it causes some of the rings to become congested.
An adaptive maximum number of outstanding transactions appears necessary to adjust for the appropriate tradeoff between concurrency and contention as the communication locality changes. We show the relationships between processor, ring and memory speeds, and their effects on performance. Using block transfers for prefetching seems unlikely to be advantageous in that the improvement in the cache hit ratio needed to compensate for the increased network utilization is substantial.

Keywords:

Computer architecture, Networks, Communication locality, Hierarchical ring-based networks, Hot spots, Large scale parallel systems, Memory banks, Performance evaluation, Prefetching, Shared memory multiprocessors, Simulation

Reference Info:

DOI: 10.1109/12.250609
ISSN: 0018-9340
OCLC: 4656780970

BibTeX:

@article(Holliday-IEEETOC94,
    author = {Mark A. Holliday and Michael Stumm},
    title = {Performance evaluation of hierarchical ring-based shared memory multiprocessors},
    journal = {IEEE Transactions on Computers},
    volume = {43},
    number = {1},
    month = {January},
    year = {1994},
    pages = {52--67},
    doi = {10.1109/12.250609},
    issn = {0018-9340},
    keywords = {Computer architecture, Networks, Communication locality, Hierarchical ring-based networks, Hot spots, Large scale parallel systems, Memory banks, Performance evaluation, Prefetching, Shared memory multiprocessors, Simulation}
)