The Tensor Contraction Engine
Translation to French,
by Natalie Harmann.
Translation to Russian,
by Alisa Bagrii.
The majority of software for scientific computations is written in the
lowlevel languages FORTRAN and C. The computational structure of
some of this software, however, has sufficient underlying structure
that it could benefit from specialpurpose software engineering tools
or domainspecific programming languages. E.g., electronic structure
calculations in quantum chemistry and in physics involve large
collections of tensor contractions (generalized matrix
multiplications). Currently, chemists spend weeks or months
manipulating formulas containing dozens or hundreds of terms with
Mathematica, handoptimizing the computation, and writing FORTRAN code
by hand. The computation can take on the order of 1 TFLOP week or
more and can require multiple TBs of storage.
We have developed a domainspecific language that allows chemists to
specify the computation in a highlevel Mathematicastyle language.
The compiler for this language, the Tensor Contraction Engine (TCE),
searches for an optimal implementation and generates FORTRAN code.
First, algebraic transformations are used to reduce the number
of operations. We then minimize the storage requirements to fit the
computation within the disk limits by fusing loops. We have
designed an algorithm that finds the optimal evaluation order if
intermediate arrays are allocated dynamically and are
working on combining loop fusion with dynamic memory allocation. If
the computation does not fit within the disk limits, recomputation
must be traded off for a reduction in storage requirements. If
the target machine is a multiprocessor machine, we optimize the
communication cost together with finding a fusion configuration for
minimizing storage. Finally, we minimize the data access times by
minimizing disktomemory and memorytocache traffic and
generate FORTRAN code. We have completed a first prototype of the TCE
and are working on implementing the communication minimization and
data access optimization algorithms. In future research, we will
extend this approach to handle common subexpressions, symmetric
matrices, and sparse matrices.
The Tensor Contraction Engine (TCE) is the application of compiler
optimization and sourcetosource translation technology to craft a
domain specific language for manybody theories in chemistry and
physics. The underlying equations of these theories are all expressed
as contractions of manydimensional arrays or tensors There may be
many thousands of such terms in any one problem but their regularity
means that they can be translated into efficient massively parallel
code that respects the boundedness of each level of the memory
hierarchy and minimizes overall runtime with effective tradeoff of
increased computation for reduced memory consumption. The approach has
been overwhelming successful and now NWChem contains about 1M lines of
humangenerated code and over 2M lines of machine generated code. The
resulting scientific capabilities would have taken many mandecades of
effort and new theories/models can be tested in a morning on
physically relevant systems instead of on small test systems after
months of effort. In combination with the OCE (operator contraction
engine) that turns Feynmanlike diagrams into tensor expressions the
TCE represents perhaps the first endtoend production quality example
of a solution to the semantic gap.
We are currently working on generalizing our optimization approach to
handwritten code by combining it with polyhedral model
transformations. Motivated by the successes of the modeldriven
searchbased optimization approach of the TCE and the polyhedral
modelbased parallelization of Pluto, we are working on developing an
optimization infrastructure in the ROSE Compiler that combines the key
aspects of the TCE and Pluto and provides the flexibility to continue
research on optimizing tensor computations for parallel, distributed,
and/or outofcore computations for any machine architecture,
including multicores and GPUs.
For an overview of the project, see our
Proceedings of the IEEE paper.
For more information about version 1.0 of the TCE (the "Prototype"
TCE), please, see our Getting and Using the
TCE page.
There are several components available as part of the TCE software.
For details, please, see our
TCE Software page.
Collaborators

J. Ramanujam,
ECE Division, School of Electrical Engineering and Computer Science,
Louisiana State University

P. Sadayappan
Dept. of Computer Science and Engineering, Ohio State University

David E. Bernholdt,
Oak Ridge National Laboratories

Robert J. Harrison,
Oak Ridge National Laboratories

So Hirata,
Dept. of Chemistry, University of Illinois at UrbanaChampaign

Marcel Nooijen,
Dept. of Chemistry, University of Waterloo

Russell M. Pitzer,
Dept. of Chemistry, Ohio State University
Senior Personnel

Alexander Auer,
Dept. of Molecular Theory and Spectroscopy, Max Planck Institute

Daniel Cociorva,
FICO

Venkatesh Choppella,
International Institute of Information Technology,
Hyderabad
 ChiChung Lam,
Dept. of Computer Science and Engineering, Ohio State University
Students
 Kevin Hartline
 Vishnu Khaspa
 Arvind Saini
Former Students

Pamela Bhattacharya (MS, August 2008),
Microsoft

Alina Bibireata (MS, March 2004),
GE Healthcare

Xiaoyang Gao, IBM

Albert Hartono, Intel Labs

Kit Hymel (BS, May 2013)

Sandhya Krishnan, Google

Sriram Krishnamoorthy,
Pacific Northwest National Labs.

Qingda Lu, Intel

Ajay R. Panyala (PhD, August 2014)

Srinivas Pola (MS, Decemer 2008)

Brian Poulin (BS, May 2013),
Postlethwaite & Netterville

Swarup Kumar Sahoo,
Dept. of Computer Science, University of Illinois at UrbanaChampaign

Alexander Sibiriakov, MainConcept  DivX

Vaidyanathan Sivaraman,
Dept. of Mathematical Sciences, Binghampton University

Vamshi Kodimala (MS, May 2012), Infosys
 SaiRaj Yalamanchili, LSU
Software
Publications
2013
2012

Empirical PerformanceModel Driven Data Layout Optimization
and Library Call Selection for Tensor Contraction Expressions
Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam,
P. Sadayappan.
Journal of Parallel and Distributed Computing,
Vol. 72, No. 3, March 2012, pp. 338352, doi:
10.1016/j.jpdc.2011.09.006.
2011

MemoryOptimal Evaluation of Expression Trees Involving
Large Objects
C. Lam, T. Rauber, G. Baumgartner, D. Cociorva, P. Sadayappan.
Computer Languages, Systems & Structures,
Vol. 37, No. 2, July 2011, pp. 6375, doi:
10.1016/j.cl.2010.09.003.
2009

Performance Optimization of Tensor Contraction Expressions for
ManyBody Methods in Quantum Chemistry
A. Hartono, Q. Lu, T. Henretty, S. Krishnamoorthy, H. Zhang,
G. Baumgartner, D.E. Bernholdt, M. Nooijen, R.M. Pitzer,
J. Ramanujam, P. Sadayappan.
Journal of Physical Chemistry A,
Vol. 113, No. 45, September 2009, pp. 1271512723.
2007

Efficient SearchSpace Pruning for Integrated Fusion and
Tiling Transformations
X. Gao, S. Krishnamoorthy, S.K. Sahoo, C. Lam, G. Baumgartner,
J. Ramanujam, P. Sadayappan.
Concurrency and Computation: Practice and Experience,
Vol. 19, No. 18, December 2007, pp. 24252443.
2006

Identifying CostEffective Common Subexpressions to Reduce
Operation Count in Tensor Contraction Evaluations
A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen,
G. Baumgartner, D.E. Bernholdt, V. Choppella, R.M. Pitzer,
J. Ramanujam, A. Rountev, P. Sadayappan.
In V.N. Alexandrov, G.D. van Albada, P.M.A. Sloot,
J.J. Dongarra (eds.):
Proceedings of the International Conference
on Computational Science 2006 (ICCS '06), Part I,
Reading, United Kingdom, 2831 May 2006.
Lecture Notes in Computer Science, Vol. 3991, SpringerVerlag,
2006, pp. 267275.

Layout Transformation Support for the Disk Resident Arrays
Framework
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha,
P. Sadayappan.
Journal of Supercomputing,
Vol. 36, No. 2, May 2006, pp. 153170.

Efficient Synthesis of OutofCore Algorithms
Using a Nonlinear Optimization Solver
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam,
J. Ramanujam, P. Sadayappan, V. Choppella.
Journal of Parallel and Distributed Computing,
Vol. 66, No. 5, May 2006, pp. 659673.

Memory Minimization for Tensor Contractions using
Integer Linear Programming
A. Allam, J. Ramanujam, G. Baumgartner, P. Sadayappan.
In Proceedings of the Workshop on Performance
Optimization for HighLevel Languages and Libraries
(POHLL '06), Rhodes Island, Greece, 29 April 2006.
In Proceedings of the International Parallel and Distributed
Processing Symposium (IPDPS '02),
IEEE Computer Society Press, 8 pages.

Efficient Parallel OutofCore Matrix Transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam,
P. Sadayappan.
International Journal on High Performance Computing and
Networking, Vol. 2, No. 2/3/4, 2006, pp. 110119.

Automatic Code Generation for ManyBody Electronic Structure
Methods: The Tensor Contraction Engine
A. Auer, G. Baumgartner, D.E. Bernholdt, A. Bibireata,
V. Choppella, D. Cociorva, X. Gao, R.J. Harrison, A. Hartono,
S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen,
R.M. Pitzer, J. Ramanujam, P. Sadayappan, A. Sibiryakov.
Molecular Physics, Vol. 104, No. 2, January 2006,
pp. 211228.

Efficient SearchSpace Pruning for Integrated Fusion and
Tiling Transformations
X. Gao, S. Krishnamoorthy, S.K. Sahoo, C. Lam, G. Baumgartner,
J. Ramanujam, P. Sadayappan.
In Proceedings of the 12th Workshop
on Compilers for Parallel Computers (CPC '06),
A Coruña, Spain, 911 January 2006, 15 pages.
2005

Efficient SearchSpace Pruning for Integrated Fusion and
Tiling Transformations
X. Gao, S. Krishnamoorthy, S.K. Sahoo, C. Lam, G. Baumgartner,
J. Ramanujam, P. Sadayappan.
In E. Ayguadé, G. Baumgartner, J. Ramanujam,
P. Sadayappan (eds.),
Proceedings of the 18th International Workshop
on Languages and Compilers for Parallel Computing (LCPC '05),
Hawthorne, New York, 2022 October 2005.
Lecture Notes in Computer Science, Vol. 4339, SpringerVerlag,
2006, pp. 215229.

Performance Modeling and Optimization of
Parallel OutofCore Tensor Contractions
X. Gao, S.K. Sahoo, Q. Lu, G. Baumgartner, C. Lam,
J. Ramanujam, P. Sadayappan.
In Proceedings of the ACM SIGPLAN 2005
Symposium on Principles and Practice of Parallel Programming
(PPoPP '05),
Chicago, Illinois, 1517 June 2005, pp. 266276.

Automated Operation Minimization of Tensor Contraction
Expressions in Electronic Structure Calculations
A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner,
D.E. Bernholdt, S. Hirata, C. Lam, R.M. Pitzer, J. Ramanujam,
P. Sadayappan,
In Proceedings of the International Conference on
Computational Science 2005 (ICCS '05),
Atlanta, Georgia, 2225 May 2005, Part I.
Lecture Notes in Computer Science, Vol. 3514, SpringerVerlag,
pp. 155164.

Automated Operation Minimization of Tensor Contraction
Expressions in Electronic Structure Calculations
A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner,
D.E. Bernholdt, S. Hirata, C. Lam, R.M. Pitzer, J. Ramanujam,
P. Sadayappan,
Technical Report No. OSUCISRC2/05TR10,
Dept. of Computer Science and Engineering, The Ohio State University,
February 2005.

Synthesis of HighPerformance Parallel Programs for a Class
of Ab Initio Quantum Chemistry Models
G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella,
D. Cociorva, X. Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy,
S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam,
P. Sadayappan, A. Sibiryakov.
Proceedings of the IEEE,
Vol. 93, No. 2, February 2005, pp. 276292.
2004

Efficient Layout Transformation Support for Diskbased
Multidimensional Arrays
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha,
P. Sadayappan.
In L. Bougé, V.K. Prasanna (eds.),
Proceedings of the 11th Annual International Conference on
HighPerformance Computing (HiPC '04),
Bangalore, India, 1922 December 2004.
In Lecture Notes in Computer Science, Vol. 3296, SpringerVerlag,
pp. 386398.

Compiler Techniques for Efficient Parallelization of
OutofCore Tensor Contractions
X. Gao, S.K. Sahoo, Q. Lu, G. Baumgartner, C. Lam,
J. Ramanujam, P. Sadayappan.
Technical Report No. OSUCISRC12/04TR67,
Dept. of Computer Science and Engineering, The Ohio State University,
December 2004.

Layout Transformation Support for the Disk Resident Arrays
Framework
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha,
P. Sadayappan.
In Proceedings of the
Los Alamos Computer Science Initiative Symposium (LACSI '04).
Santa Fe, New Mexico. 1214 October 2004, 14 pages.

Empirical PerformanceModel Driven Data Layout Optimization
Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam,
P. Sadayappan.
In R. Eigenmann, Z. Li, S. Midkiff (eds.),
Proceedings of the 17th International Workshop on
Languages and Compilers for Parallel Computing (LCPC '04),
West Lafayette, Indiana, 2225 September 2004.
Lecture Notes in Computer Science, Vol. 3602, SpringerVerlag,
2005, pp. 7286.

A HighLevel Approach to Synthesis of HighPerformance Codes
for Quantum Chemistry: The Tensor Contraction Engine
G. Baumgartner, D.E. Bernholdt, V. Choppella, J. Ramanujam,
P. Sadayappan.
In Proceedings of the 11th Workshop
on Compilers for Parallel Computers (CPC '04),
Chiemsee, Germany, 79 July 2004, pp. 281290.

Efficient Synthesis of OutofCore Algorithms Using a
Nonlinear Optimization Solver
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam,
J. Ramanujam, P. Sadayappan, V. Choppella.
In Proceedings of the International Parallel
and Distributed Processing Symposium (IPDPS '04),
Santa Fe, New Mexico, 2630 April 2004.
IEEE Computer Society Press, Abstract p. 34b, 10 pages.
Best paper award.
2003

Data Locality Optimization for Synthesis of Efficient
OutofCore Algorithms
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, D. Cociorva,
C. Lam, P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella.
In Proceedings of the International Conference on
HighPerformance Computing (HiPC '03),
Hyderabad, India, 1720 December 2003.
Lecture Notes in Computer Science, Vol. 2913, SpringerVerlag,
pp. 406417. Best paper award.

Efficient Parallel OutofCore Matrix Transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam,
P. Sadayappan.
In Proceedings of the IEEE International Conference on
Cluster Computing (Cluster '03), Hong Kong, China,
14 December 2003.
IEEE Computer Society Press, pp. 300307.

MemoryConstrained Data Locality Optimization for
Tensor Contractions
A. Bibireata, S. Krishnan, G. Baumgartner, D. Cociorva, C. Lam,
P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella.
In L. Rauchwerger (ed.),
Proceedings of the 16th International Workshop on
Languages and Compilers for Parallel Computing (LCPC '03),
College Station, Texas, 24 October 2003.
Lecture Notes in Computer Science, Vol. 2958, SpringerVerlag,
2004, pp. 93108.

On Efficient OutofCore Matrix Transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam,
P. Sadayappan.
Technical Report OSUCISRC9/03TR52, Dept. of Computer and
Information Science, The Ohio State University, September 2003.

Global Communication Optimization for Tensor Contraction
Expressions under Memory Constraints
D. Cociorva, X. Gao, S. Krishnan, G. Baumgartner, C. Lam,
P. Sadayappan, J. Ramanujam.
In Proceedings of the 2003 International Parallel and
Distributed Processing Symposium (IPDPS '03),
Nice, France, 2226 April 2003.
IEEE Computer Society Press, Abstract p. 37b, 8 pages.

CompileTime Optimizations for Tensor Contraction Expressions
G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam.
In Proceedings of the Workshop on Compilers for Parallel
Computers (CPC '03), Amsterdam, The Netherlands,
810 January, 2003, pp 281290.
2002

A HighLevel Approach to Synthesis of HighPerformance Codes for
Quantum Chemistry
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, S. Hirata,
C. Lam, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan.
In Proceedings of Supercomputing 2002,
Baltimore, Maryland, 1622 November 2002.
IEEE Computer Society Press, Abstract p. 5, 10 pages.

MemoryConstrained Communication Minimization for a Class of
Array Computations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam.
In B. Pugh, C. Tseng (eds.),
Proceedings of the 15th International Workshop
on Languages and Compilers for Parallel Computing (LCPC '02),
College Park, Maryland, 2527 July 2002.
Lecture Notes in Computer Science, Vol. 2481, SpringerVerlag,
2005, pp. 115.

Automatic Synthesis of HighPerformance Codes for Quantum
Chemistry Applications
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison,
C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan.
In Proceedings of the Workshop on Performance
Optimization for HighLevel Languages and Libraries
(POHLL '02), New York, New York, 22 June 2002, 10 pages.

SpaceTime TradeOff Optimization for a Class of Electronic
Structure Calculations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan,
J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison.
In Proceedings of the ACM SIGPLAN 2002 Conference on
Programming Language Design and Implementation (PLDI '02),
Berlin, Germany, 1719 June 2002, pp. 177186.

A Performance Optimization Framework for Compilation of Tensor
Contraction Expressions into Parallel Programs
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison,
C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan.
7th International Workshop on HighLevel Parallel
Programming Models and Supportive Environments (HIPS '02),
In Proceedings of the International Parallel and Distributed
Processing Symposium (IPDPS '02),
Fort Lauderdale, Florida, 15 April 2002,
IEEE Computer Society, pp. 106114.
2001

Towards Automatic Synthesis of HighPerformance Codes for
Electronic Structure Calculations: Data Locality Optimization
D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam,
M. Nooijen, D.E. Bernholdt, R.J. Harrison.
In Proceedings of the International Conference on
HighPerformance Computing (HiPC '01),
Hyderabad, India, 1721 December 2001.
Lecture Nodes in Computer Science, Vol. 2228, SpringerVerlag,
pp. 237248.

SpaceTime TradeOff Optimization for a Class of Electronic
Structure Calculations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan,
J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison.
Technical Report No. OSUCISRC11/01TR24,
Dept. of Computer and Information Science, The Ohio State University,
November 2001.

Loop Optimizations for a Class of MemoryConstrained
Computations
D. Cociorva, J. Wilkins, C. Lam, G. Baumgartner, P. Sadayappan,
J. Ramanujam.
In Proceedings of the 15th ACM International Conference on
Supercomputing (ICS '01),
Sorrento, Italy, 1621 June 2001, pp. 103113.

MemoryOptimal Evaluation of Expression Trees Involving Large
Objects
C. Lam, D. Cociorva, G. Baumgartner, P. Sadayappan.
Technical Report No. OSUCISRC5/99TR13,
Dept. of Computer and Information Science, The Ohio State University,
May 1999, updated March 2001.
1999

MemoryOptimal Evaluation of Expression Trees Involving Large
Objects
C. Lam, D. Cociorva, G. Baumgartner, P. Sadayappan.
In Proceedings of the 1999 International Conference on High
Performance Computing (HiPC '99), Calcutta, India,
1720 December 1999, IEEE Computer Society.
Lecture Notes in Computer Science, Vol. 1745, SpringerVerlag,
pp. 103110.

Optimization of Memory Usage Requirement for
a Class of Loops Implementing MultiDimensional Integrals
C. Lam, D. Cociorva, G. Baumgartner, P. Sadayappan,
In J. Ferrante, L. Carter (eds.),
Proceedings of the 12th International Workshop on
Languages and Compilers for Parallel Computing (LCPC '99),
San Diego, California, 46 August 1999.
Lecture Notes in Computer Science, Vol. 1863, SpringerVerlag,
pp. 350364.
Funding
This material is based upon work supported by the National Science
Foundation under the following grants. Any opinions, findings,
and conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views of
the National Science Foundation.
 NSF
CISE Computing Research Infrastructure (CRI) Program,
Award #1059417,
June 2011  May 2016.

NSF
Experimental Program to Stimulate Competitive Research (EPSCOR),
Award #1003897,
Oct. 2014  Sep. 2015.
 NSF
Foundations of Computing Processes and Artifacts (CPA) Program,
Award #0541409,
May 2006  Apr. 2011.
 NSF
Computer Systems Research (CSR) Program,
Award #0509442,
Sept. 2005  Aug. 2009.
 NSF
Information Technology Research (ITR) Program,
Award #0121676,
Sept. 2001  Sept. 2007.
Gerald Baumgartner
Last modified: Fri Jul 10 17:07:25 CDT 2015