[1] Hill M D, Marty M R. Amdahl's law in the multicore era[J]. IEEE Computer, 2008, 41(7): 33-38.
[2] Sherwood T, Perelman E, Hamerly G, et al. Discovering and exploiting program phases[J]. IEEE Micro, 2003, 23(6): 84-93.
[3] Ipek E, Kirman M, Kirman N, et al. Core fusion: accommodating software diversity in chip multiprocessors[C]//Tullsen D (ed). Proceedings of ISCA'07. New York: ACM Press, 2007: 186-197.
[4] Tarjan D, Boyer M, Skadron K. Federation: boosting per-thread performance of throughput-oriented manycore architectures[J]. ACM Transactions on Architecture and Code Optimization, 2010, 7(4): (article 19) 1-38.
[5] Zhong H, Lieberman S A, Mahlke S A. Extending multicore architectures to exploit hybrid parallelism in single-thread applications[C]//Louri A (ed). Proceedings of HPCA'07. Washington: IEEE CS, 2007: 25-36.
[6] Watanabe Y, Davis J D, Wood D A. WiDGET: Wisconsin decoupled grid execution tiles[C]//Seznec A (ed). Proceedings of ISCA'10. New York: ACM Press, 2010: 2-13.
[7] Kim C, Sethumadhavan S, Govindan M, et al. Composable lightweight processors[C]//Bellas N (ed). Proceedings of MICRO'07. Washington: IEEE CS, 2007: 381-394.
[8] Gulati D, Kim C, Sethumadhavan S, et al. Multitasking workload scheduling on flexible-core chip multiprocessors[C]//Moshovos A (ed). Proceedings of PACT'08. New York: ACM Press, 2008: 187-196.
[9] Gebhart M, Maher B A, Coons K E, et al. An evaluation of the TRIPS computer system[C]//Soffa M L (ed). Proceedings of ASPLOS'09. New York: ACM Press, 2009: 1-12.
[10] Burger D, Keckler S, McKinley K, et al. Scaling to the end of silicon with EDGE architectures[J]. IEEE Computer, 2004, 37(7): 44-55.
[11] Mahlke S A, Lin D C, Chen W Y, et al. Effective compiler support for predicated execution using the hyperblock[C]//Hwu W (ed). Proceedings of MICRO'92. Washington: IEEE CS, 1992: 45-54.
[12] Ranganathan N, Burger D, Keckler S W. Analysis of the TRIPS prototype block predictor[C]//Tullsen D (ed). Proceedings of ISPASS'09. Washington: IEEE CS, 2009: 195-206.
[13] Robatmili B, Coons K E, Burger D, et al. Strategies for mapping data flow blocks to distributed hardware[C]//Gonzalez A (ed). Proceedings of MICRO'08. Washington: IEEE CS, 2008: 23-34.
[14] Chaudhry S, Cypher R, Ekman M, et al. ROCK: A high-performance SPARC CMT processor[J]. IEEE Micro, 2009, 29(2): 6-16.
[15] Ren Y, An H, Sun T, et al. Dynamic resource tuning for flexible core chip multiprocessors[C]//Hsu C (ed). Proceedings of ICA3PP'10. Heidelberg: Springer-Verlag Berlin, 2010: 32-41.
[16] University of Massachusetts. Scale Compiler Toolset. http://www.cs.utexas.edu/users/cart/Scale/index.html