


For the following lecture notes, you can download or view each lecture as an Acrobat PDF file or as a Microsoft PowerPoint 97 file:
- 9-7-2k9: Computer Architecture Review.
- 9-7-2k9: Simultaneous Multithreading (SMT): Performance gain potential. SMT performance evaluation vs. fine-grain multithreading and CMPs. SMT level-one cache configuration. SMT thread instruction fetch and issue policies.
- 9-9-2k9: Compiler optimizations for SMT. SMT support for fine-grain synchronization.
- 9-16-2k9: Operating System Impact on SMT performance. Overview of Intel’s Hyper-Threading Microarchitecture & Performance.
- 9-21-2k9: High Bandwidth Instruction Fetching Techniques for Superscalar Processors: Collapsing Buffer (CB), Branch Address Cache (BAC), Trace Cache.
- 9-28-2k9: Dynamic Branch Prediction: Basic Taxonomy of Two-Level Schemes, Hybrid Predictors, Aliasing/Interference Reduction Global Prediction Schemes.
- 10-5-2k9: Vector Processing: Basics and Architectures. Vector Intelligent RAM (VIRAM) Overview.
- 10-14-2k9: Digital Signal Processing (DSP) Architecture & Processors.
- 10-21-2k9: Introduction to Reconfigurable Computing.
- 10-28-2k9: The Stanford Hydra: An Example CMP with Hardware Data/Thread Level Speculation (TLS) Support.
- 11-2-2k9: High Performance Computing (HPC) Trends, Heterogeneous Computing (HC), Micro-Heterogeneous Computing (MHC).

6:00-7:50 PM, Monday and Wednesday, Room 9-3139.

The goal of this course is to acquire a good understanding of the important current and emerging design
techniques, machine structures, technology factors, and evaluation methods that will determine the form of
high-performance programmable processors in the 21st century. The topics covered include Simultaneous
Multithreading (SMT), Vector Processing, Digital Signal Processing (DSP) & Media Architectures &
Processors, Re-Configurable Computing and Processors, Advanced Branch Prediction Techniques, and
Redundant Arrays of Inexpensive Disks (RAID). The course also provides an introduction to the main concepts of parallelism, including single-chip multiprocessors.
Computer Architecture EECC551 (0306-551) or 0605-720.

Participation and class presence: 10%
Quizzes / Homework assignments: 60%
Special topics project: 30%
Quizzes:
Quizzes are announced at least one class in advance and are given during the first 30-40 minutes of the specified class.
Quizzes are closed-reference (e.g., no books, notes, handouts, etc.). Calculators may be helpful.
Notes on Special Topics paper and presentation:
Each student will select and research an approved topic in the field of Computer Architecture, write a report,
and give a presentation. Each topic must be presented to and approved by Dr. Shaaban.
Duplicate topics are not permitted; topics are accepted on a first-come, first-served basis.
The Paper:
Each student will write a report (~8-15 pages) on their research findings using
the IEEE journal format/guidelines/template.
Take great care in following the guidelines, especially in properly citing the sources of pictures, graphs, and quotations.
The paper is due (hardcopy and electronic) on the last day of the presentations. Plagiarism will result in a zero grade
(see page 19 of the KGCOE 2009-2010 Student Handbook).
The Presentation:
Each student will give a 20-minute PowerPoint presentation of their research to the entire class.
The student should be thoroughly prepared to answer questions. A sign-up sheet for a time slot
will be available towards the end of the quarter. Attendance is mandatory for all presentation sessions.
You must submit your presentation electronically to Dr. Shaaban at least 24 hours prior to your presentation time.
Samples of prior presentations will be available on the course website.

Reference Papers:
- Simultaneous Multithreading (SMT):
- Simultaneous Multithreading: Maximizing On-Chip Parallelism
Abstract,
Postscript,
PDF,
Dean Tullsen,
Susan Eggers, and
Henry Levy,
Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995, pages 392-403.
- Exploiting Choice: Instruction Fetch and Issue on an Implementable
Simultaneous Multithreading Processor
Abstract,
Postscript,
PDF,
Dean Tullsen,
Susan Eggers,
Joel Emer,
Henry Levy,
Jack Lo,
and Rebecca Stamm
Proceedings of the 23rd Annual International Symposium on Computer
Architecture, May 1996, pages 191-202.
- Tuning Compiler Optimizations for Simultaneous Multithreading
Abstract,
Postscript,
PDF,
Jack Lo,
Susan Eggers,
Henry Levy,
Sujay Parekh, and
Dean Tullsen
Proceedings of the 30th Annual International Symposium on
Microarchitecture,
December 1997, pages 114-124.
- Supporting Fine-Grain Synchronization on a Simultaneous
Multithreaded Processor
Abstract,
Postscript,
PDF,
Dean Tullsen,
Jack Lo,
Susan Eggers, and
Henry Levy
Proceedings of the 5th International Symposium on High Performance
Computer Architecture, January 1999, pages 54-58.
- Software-Directed Register Deallocation for Simultaneous
Multithreaded Processors
Abstract,
Postscript,
PDF,
Jack Lo,
Sujay Parekh,
Susan Eggers,
Henry Levy, and
Dean Tullsen
IEEE Transactions on Parallel and Distributed Systems,
September 1999, pages 922-933.
- Instruction Recycling on a Multiple-Path Processor,
Abstract,
Postscript,
PDF,
Steven Wallace, Dean M. Tullsen, Brad Calder
In 5th International Symposium on High Performance
Computer Architecture, January, 1999.
- An Analysis of Operating System Behavior on a Simultaneous
Multithreaded Architecture
Postscript,
PDF,
Josh Redstone,
Susan Eggers, and
Henry Levy.
Proceedings of the 9th International Conference on Architectural Support
for Programming Languages and Operating Systems,
November 2000.
- Hyper-Threading Technology Architecture and Microarchitecture
PDF,
Deborah T. Marr et al.,
Intel Technology Journal, Volume 6, Number 1, February 2002.
- Hyper-Threading Technology: Impact on Compute-Intensive Workloads
PDF,
William Magro et al.,
Intel Technology Journal, Volume 6, Number 1, February 2002.
- High Bandwidth Instruction Fetching Techniques/Trace Cache:
- Increasing Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache,
PDF,
Tse-Yu Yeh, Deborah Marr, and Yale Patt,
Proc. 7th ACM International Conference on Supercomputing, Tokyo, Japan, July, 1993.
- Optimization of instruction fetch mechanisms for high issue rates,
PDF,
T. Conte, K. Menezes, P. Mills, and B. Patel.
Proc. 22nd Intl. Symp. on Computer Architecture, pp. 333-344, June 1995.
- Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching,
PDF,
E. Rotenberg, S. Bennett, and J.E. Smith,
Proc. 29th Annual International Symposium on Microarchitecture. IEEE, December 2-4, 1996.
- Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism,
PDF,
Friendly, Daniel H., Patel, Sanjay J., and Patt, Yale N.,
Proc. 30th ACM/IEEE International Symposium on Microarchitecture, November, 1997.
- Path-Based Next Trace Prediction,
PDF,
Jacobson, Quinn, Rotenberg, Eric, and Smith, James E.,
Proc. 30th Annual International Symposium on Microarchitecture, pp. 14-23, December 1997.
- The Block-based Trace Cache,
PDF,
B. Black, B. Rychlik, and J.P. Shen,
Computer Architecture News, pp. 196-207, Volume 27, Number 2, May 1999.
- Improving Trace Cache Hit Rates Using the Sliding Window Fill Mechanism and Fill Select Table,
PDF,
M. Shaaban and E. Mulrane,
Proc. ACM SIGPLAN Workshop on Memory System Performance (MSP-2004), pp. 36-41, June 2004.
- Dynamic Branch Prediction:
- A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History,
PDF,
Tse-Yu Yeh and Yale N. Patt,
Proc. 20th Annual International Symposium on Computer Architecture, May 1993.
- Combining Branch Predictors,
PDF,
Scott McFarling,
WRL Technical Note TN-36, June 1993.
- Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference,
PDF,
P.-Y. Chang, M. Evers, and Y. Patt,
Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, Oct. 1996.
- Target Prediction for Indirect Jumps,
PDF,
Po-Yung Chang, Eric Hao, and Yale N. Patt,
Proc. 24th International Symposium on Computer Architecture, June 1997.
- The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference,
PDF,
Eric Sprangle, Robert S. Chappell, Mitch Alsup, and Yale N. Patt,
Proc. 24th Annual International Symposium on Computer Architecture, 1997.
- The bi-mode branch predictor,
PDF,
C. Lee, I. Chen, and T. Mudge,
30th Ann. IEEE/ACM Symp. Microarchitecture (MICRO-30), Dec. 1997, pp. 4-13.
- Trading Conflict and Capacity Aliasing in Conditional Branch Predictors,
PDF,
Pierre Michaud, André Seznec, Richard Uhlig,
Proc. 24th Annual International Symposium on Computer Architecture (ISCA 1997), pp. 292-303.
- Control Flow Speculation in Multiscalar Processors,
PDF,
Q. Jacobson, S. Bennett, N. Sharma and J. Smith,
Proceedings of the 3rd International Symposium on High-Performance
Computer Architecture, February 1997.
- Variable Length Path Branch Prediction,
PDF,
Jared Stark, Marius Evers, and Yale N. Patt,
Proc. 8th International Conference on Architectural Support for Programming Languages and Operating Systems,
October 1998.
- The YAGS branch predictor,
PDF,
A. Eden, and T. Mudge,
Proc. 31st Ann. IEEE/ACM Symp. Microarchitecture (MICRO-31), Dec. 1998, pp. 69-77.
- Vector Processing, Vector IRAM:
- Vector Processors,
PDF,
Appendix G,
Computer Architecture: A Quantitative Approach, Third Edition, John Hennessy and David Patterson,
Morgan Kaufmann Publishers,
May 2002.
- A Case for Intelligent DRAM: IRAM,
Postscript,
PDF,
David Patterson, et al.
IEEE Micro, April 1997.
- Scalable Processors in the Billion Transistor Era: IRAM,
PDF,
Christoforos E. Kozyrakis et al.
IEEE Computer Special Issue: Future Microprocessors - How to use a Billion Transistors, September 1997.
- Intelligent RAM (IRAM): the Industrial Setting, Applications, and Architecture,
Postscript,
PDF,
David Patterson, et al.
ICCD '97 International Conference on Computer Design, Austin, Texas, October 10-12, 1997.
- A New Direction in Computer Architecture Research,
PDF,
Christoforos Kozyrakis, and David A. Patterson,
IEEE Computer, November 1998.
- A Media-Enhanced Vector Architecture for Embedded Memory Systems,
PDF,
Christoforos Kozyrakis,
Technical Report UCB//CSD-99-1059, University of California, Berkeley, July 1999.
- Efficient FFTs On VIRAM,
Postscript,
PDF,
Randi Thomas and Katherine Yelick,
Proceedings of the 1st Workshop on Media Processors and DSPs, in Conjunction with the
32nd Annual International Symposium on Microarchitecture, Haifa, Israel, November 15, 1999.
- DSP:
- The Evolution of DSP Processors,
PDF,
A white paper by Berkeley Design Technology, Inc., 2000.
- Choosing a DSP Processor,
PDF,
A white paper by Berkeley Design Technology, Inc., 2000.
- Pocket Guide to DSP Processors and Cores,
PDF,
Berkeley Design Technology, Inc., 2002.
- Evaluating DSP Processor Performance,
PDF,
Berkeley Design Technology, Inc., 2000.
- The BDTImark2000: A Measure of DSP Execution Speed,
PDF,
Berkeley Design Technology, Inc., 2004.
- The Digital Signal Processor Derby,
PDF,
IEEE Spectrum, June 2001.
- Reconfigurable Computing:
- High-Performance Microarchitectures with Hardware-Programmable Functional Units,
PDF,
Rahul Razdan and Michael D. Smith.
Proc. 27th Annual IEEE/ACM Intl. Symp. on Microarchitecture, pp. 172-180, November 1994.
- OneChip: An FPGA Processor With Reconfigurable Logic,
PDF,
Ralph D. Wittig and Paul Chow.
Proc. 4th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM'96), pp. 126-135, March 1996.
- Garp: A MIPS Processor with a Reconfigurable Coprocessor,
PDF,
John R. Hauser and John Wawrzynek.
Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '97), 1997.
- Baring it all to Software: Raw Machines,
PDF,
Elliot Waingold et al.,
IEEE Computer, pp. 86-93, September 1997.
- REMARC: Reconfigurable Multimedia Array Coprocessor,
PDF,
Takashi Miyamori and Kunle Olukotun,
Proc. ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 1998.
- CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit,
PDF,
Zhi Alex Ye, Andreas Moshovos, Scott Hauck and Prithviraj Banerjee,
Proc. 27th Annual International Symposium on Computer Architecture, pp. 225-235, June 2000.
- Configurable Computing: A Survey of Systems and Software,
PDF,
Katherine Compton and Scott Hauck,
Northwestern University, Dept. of ECE, Technical Report, 1999.
- Hydra CMP:
- The Case for a Single-Chip Multiprocessor,
PDF,
Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson and Kun-Yung Chang,
Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, October 1996.
- A Single-Chip Multiprocessor,
PDF,
Lance Hammond, Basem A. Nayfeh and Kunle Olukotun,
IEEE Computer Special Issue on "Billion-Transistor Processors", September 1997.
- Considerations in the Design of Hydra: A Multiprocessor-on-a-Chip Microarchitecture,
PDF,
Lance Hammond, and Kunle Olukotun
Stanford University Computer Systems Lab Technical Report CSL-TR-98-749, February 1998.
- Data Speculation Support for a Chip Multiprocessor,
PDF,
Lance Hammond, Mark Willey, and Kunle Olukotun,
Proceedings of the Eighth ACM Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October 1998.
- Improving the Performance of Speculatively Parallel Applications on the Hydra CMP,
PDF,
Kunle Olukotun, Lance Hammond, and Mark Willey,
Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999.
- The Stanford Hydra CMP,
PDF,
Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Mike Chen, and Kunle Olukotun,
IEEE MICRO Magazine, March-April 2000.
- Virtual Memory:
- Virtual memory: Issues of implementation,
PDF,
B. Jacob, and T. Mudge,
IEEE Computer, vol. 31, no. 6, pp. 33-43, June 1998.
- Virtual memory in contemporary microprocessors,
PDF,
B. Jacob, and T. Mudge,
IEEE Micro, vol. 18, no. 4, pp. 60-75, July/Aug. 1998.
- A look at several memory management units, TLB-refill mechanisms, and page table organizations,
PDF,
B. Jacob and T. Mudge,
8th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose CA, Oct. 1998, pp. 295-306.
- Uniprocessor virtual memory without TLBs,
PDF,
B. Jacob and T. Mudge,
IEEE Trans. Computers, vol. 50, no. 5, May 2001, pp. 482-499.
- Memory Performance Issues:
- A performance comparison of contemporary DRAM architectures,
PDF,
V. Cuppu, B. Jacob, B. Davis, T. Mudge,
Proc. of the 26th Ann. Int. Symp. Computer Architecture, May 1999, pp. 222-233.
- DDR2 and low latency variants,
PDF,
B. Davis, T. Mudge, and B. Jacob,
Proc. Memory Wall Workshop, in conjunction with the 27th Ann. Int. Symp. Computer Architecture, June 2000.
- The new DRAM interfaces: SDRAM, RDRAM and variants,
PDF,
B. Davis, B. Jacob, and T. Mudge,
3rd Int. Symp. High Performance Computing,
Lecture Notes in Computer Science, 1940, Publ: Springer, Tokyo, Japan, Oct. 2000, pp. 26-31.
- Memory Latency: to Tolerate or to Reduce?,
PDF,
A. Bakshi, Jean-Luc Gaudiot, Wen-Yen Lin, M. Makhija, V. K. Prasanna, Wonwoo Ro, Chulho Shin,
The 12th Symposium on Computer Architecture and High Performance Computing,
SBAC-PAD 2000 Oct 24-27, 2000.
- I/O Performance, RAID, Unix I/O Performance:
- Maximizing Performance in a Striped Disk Array,
PDF,
P. Chen and D.A. Patterson,
Proc. 17th Annual IEEE Symposium on Computer Architecture, 1990, pp. 322-331.
- Storage Performance--Metrics and Benchmarks,
PDF,
P. Chen and D. Patterson,
Proceedings of the IEEE 81(8):1151-1165, Aug., 1993.
- RAID: High-Performance, Reliable Secondary Storage,
PDF,
P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz and D. A. Patterson,
ACM Computing Surveys, Vol. 26, No. 2, June 1994, pp. 145-185.
- Unix I/O Performance in Workstations and Mainframes,
PDF,
Peter M. Chen, David A. Patterson,
Dept. of Electrical Engr. and Computer Science, University of Michigan, Technical Report, CSE-TR-200-94, 1994.
- Striping in a RAID Level 5 Disk Array,
PDF,
P. M. Chen and E. Lee,
Proc. 1995 ACM SIGMETRICS Conference on Measurement and Modeling of
Computer Systems, pp. 136-145, May 1995.
- Heterogeneous Computing (HC) & microHeterogeneous Computing (mHC):
- Heterogeneous Computing: Challenges and Opportunities,
PDF,
Ashfaq A. Khokhar, Viktor K. Prasanna, Muhammad E. Shaaban, Cho-Li Wang,
IEEE Computer, June 1993 (Vol. 26, No. 6), pp. 18-27.
- Heterogeneous Distributed Computing,
PDF,
Muthucumaru Maheswaran, Tracy D. Braun, Howard Jay Siegel,
Pre-copy edited version of a chapter appearing in the Encyclopedia of Electrical and
Electronics Engineering, J. G. Webster, editor, John Wiley & Sons, New York, NY, 1999, Vol. 8, pp. 679-690.
- A Comparison Study of Static Mapping Heuristics for a Class of Meta-tasks on Heterogeneous Computing Systems,
PDF,
Tracy D. Braun, Howard Jay Siegel, Noah Beck, Ladislau L. Boloni, Albert I. Reuther, Mitchell D. Theys, Bin Yao, Richard F. Freund,
Proceedings of the 8th Heterogeneous Computing Workshop (HCW 1999), 1999, pp. 15-29.
- Segmented min-min: a static mapping algorithm for meta-tasks on heterogeneous computing systems,
PDF,
Min-You Wu, Wei Shu, H. Zhang,
Proceedings of the 9th Heterogeneous Computing Workshop (HCW 2000), 2000, pp. 375-385.
- Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing,
PDF,
H. Topcuoglu, S. Hariri, M.Y. Wu,
IEEE Transactions on Parallel and Distributed Systems, March 2002 (Vol. 13, No. 3).
- Greedy Heuristics for Resource Allocation in Dynamic Distributed Real-Time Heterogeneous Computing Systems,
PDF,
S. Ali, J. Kim, H. J. Siegel, A. Maciejewski, Y. Yu, S. Gundala, S. Gertphol, V. K. Prasanna,
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications
(PDPTA '02), June 2002, (Volume 2), pp. 519-530.
- Efficient Utilization of Fine-Grained Parallelism using a microHeterogeneous Environment,
PDF,
William L. Scheidel,
MS Thesis, Department of Computer Engineering, September 2002.
- Linux Kernel Support For Micro-Heterogeneous Computing,
PDF,
Kim R. Schuttenberg,
MS Thesis, Department of Computer Engineering, June 2004.
- Implementing Micro-Heterogeneous Computing:
Abstracting Auxiliary Processors in a Multi-process OS,
PDF,
Kim R. Schuttenberg and Muhammad E. Shaaban,
International Conference on Parallel and Distributed Computing Systems (PDCS-2005), September 2005.
Reference Books:

Attending all lecture sessions is expected.

Week 1: EECC551 Computer Architecture Review. Simultaneous Multithreading (SMT).
Week 2: Compiler optimizations for SMT. SMT support for fine-grain synchronization. Operating System Impact on SMT performance. Overview of Intel’s Hyper-Threading Microarchitecture & Performance.
Week 3: High Bandwidth Instruction Fetching Techniques for Superscalar Processors, including Conventional and Block-Based Trace Caches.
Week 4: Advanced Branch Prediction Techniques, emphasizing Aliasing/Interference Reduction Prediction Schemes.
Week 5: Vector Processing: Basics and Architectures. Vector Intelligent RAM (VIRAM) Overview.
Week 6: Digital Signal Processing (DSP), Media Architectures & Processors.
Week 7: Re-Configurable Computing and Processors.
Week 8: The Stanford Hydra: An Example Single-Chip Multiprocessor (CMP) with Hardware Data/Thread Level Speculation (TLS) Support.
Week 9: Virtual Memory: Implementation Issues. Advanced Storage Systems, Bus Design, I/O Performance Measures and Benchmarks. Reliable Storage: Redundant Array of Inexpensive Disks (RAID).
Week 10: High Performance Computing (HPC) Trends, Heterogeneous Computing (HC), Micro-Heterogeneous Computing (MHC).
Week 11: Exam/Project Presentations.
This page is 48 Kbytes long and was last modified on Tuesday, 10-Nov-2009 16:40:13 EST.