
Technical Talk
Wednesday, October 17
 

10:30am

Coroutine Representations and ABIs in LLVM
Coroutines can serve as the basis for implementing many powerful language features. In this talk, we will discuss coroutines holistically and explore requirements and trade-offs at different stages in their translation. For this purpose, we will introduce several prospective language features in the Swift programming language and discuss how the differences between them affect how they should best be represented and optimized in both Swift's high-level SIL intermediate representation and in LLVM's lower-level IR. We will also contrast Swift's requirements with those imposed by the draft C++ coroutines TS and explain how the differences between languages lead to differences in the LLVM representation. Finally, we will discuss various final ABIs for lowering coroutines and talk about their capabilities and trade-offs.
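The kind of lowering the talk discusses can be pictured with a toy model (Python, with hypothetical names; this is the general shape of coroutine splitting, not Swift's SIL or LLVM's actual representation): the coroutine becomes an explicit frame holding the state that lives across suspensions, plus a resume function that dispatches on the saved suspend point.

```python
# Toy sketch: a coroutine with two suspend points, lowered by hand into an
# explicit frame plus a resume function. All names here are illustrative.

class Frame:
    """Coroutine frame: locals that live across suspensions, plus the
    index of the suspend point to resume from."""
    def __init__(self):
        self.state = 0   # which suspend point we are parked at
        self.total = 0   # a local variable preserved across suspensions

def resume(frame, value):
    """Resume function: dispatch on the saved state and run until the
    next suspend point, saving state back into the frame."""
    if frame.state == 0:        # initial entry up to the first suspend
        frame.total += value
        frame.state = 1
        return frame.total      # value yielded at the first suspend point
    elif frame.state == 1:      # code between the first and final suspend
        frame.total += value
        frame.state = 2         # 2 == final suspend / done
        return frame.total
    raise StopIteration

f = Frame()
first = resume(f, 10)    # runs to the first suspend point
second = resume(f, 5)    # resumes, runs to the final suspend point
```

The ABI questions in the talk are about exactly the pieces made explicit here: who allocates the frame, what its layout is, and how the resume entry points are represented.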

Speakers

Wednesday October 17, 2018 10:30am - 11:00am
1 - General Session (Rm LL20ABC)

11:00am

Build Impact of Explicit and C++ Standard Modules
This is somewhat a continuation of my 2017 LLVM Developers' Meeting talk, "The Further Benefits of Explicit Modularization."

We will examine and discuss the build-infrastructure impact of explicit modules, working from the easiest cases and rolling in further complications to see where we can end up:

* Explicit modules with no modularized dependencies
* Updating a build system (like CMake) to allow the developer to describe modular groupings and use that information to build modules and modular objects and link those modular objects in the final link
* Updating a build system to cope with proposed C++ standardized modules
* How C++ standardized modules (& Clang modules before them) differ from other language modularized systems - non-portable binary format and the challenges that presents for build systems
* Possible solutions
  * implicit modules
  * explicit cache path
  * interaction with the compiler for module dependence graph discovery
    * similar to include path discovery
    * callback from the compiler

There are a lot of unknowns in this space. The goal of this talk is, at the very least, to discuss those uncertainties and why they are there, and to share any conclusions from my own work and the ongoing C++ standardization effort (Richard Smith, Nathan Sidwell, and others).

Speakers

David Blaikie

Software Engineer, Google Inc.


Wednesday October 17, 2018 11:00am - 11:30am
1 - General Session (Rm LL20ABC)

11:00am

Stories from RV: The LLVM vectorization ecosystem
Vectorization in LLVM has long been restricted to explicit vector instructions, SLP vectorization, or the automatic vectorization of innermost loops. As the VPlan infrastructure matures, it becomes apparent that the support API provided by the LLVM ecosystem needs to evolve with it. Apart from short SIMD, new ISAs such as ARM SVE, the RISC-V V extension, and NEC SX-Aurora pose new requirements and challenges for vectorization in LLVM. To this end, the Region Vectorizer is a great experimentation ground for dealing with issues that sooner or later will need to be resolved in the LLVM vectorization infrastructure. These include the design of a flexible replacement for the VECLIB mechanism in TLI, inter-procedural vectorization, and the development of an LLVM-SVE backend for NEC SX-Aurora. The idea of the talk is to provide data points that inform vectorization-related design decisions in LLVM, based on our experience with the Region Vectorizer.

Speakers

Simon Moll

Researcher/PhD Student, Saarland University


Wednesday October 17, 2018 11:00am - 11:30am
2 - Technical Talk (Rm LL21AB)

11:30am

Efficiently Implementing Runtime Metadata with LLVM
Rich runtime metadata can enable powerful language features and tooling support, but also comes with code size, memory usage, and startup time costs. To mitigate these costs, the Swift programming language compiler uses some clever techniques and under-explored corners of LLVM to optimize metadata to minimize size, startup time, and memory costs while making it usable both in-process and offline, avoiding some of the costs traditionally associated with vtables, RTTI, and other data structures in languages like C++. This talk goes into detail of some of these techniques, including using relative references to make metadata position-independent, using mangled type names as a compact and offline-interpretable representation of language concepts, and organizing optional reflection metadata into its own segment of binaries so it can be discovered at load time and optionally stripped from binaries in cases where it is not desired. These techniques could also be applied to other languages, including C++, to reduce the costs of these data structures.
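The relative-reference technique mentioned above can be sketched as follows (a Python simulation with an invented layout, not Swift's actual metadata format): instead of an absolute pointer that would need a relocation, a metadata record stores a 32-bit offset from the field's own position, so the record is valid wherever the image is loaded.

```python
import struct

# Hypothetical sketch of a self-relative reference. Layout (invented):
#   [4-byte signed offset][payload]
# The offset is measured from the offset field's own address to the target,
# so no absolute addresses appear in the image and no relocation is needed.

def make_image(payload: bytes) -> bytes:
    field_addr = 0            # position of the reference field in the image
    target_addr = 4           # position of the payload it points at
    return struct.pack("<i", target_addr - field_addr) + payload

def resolve(image: bytes, field_addr: int) -> int:
    """Resolve a relative reference: target = field's own address + offset."""
    (offset,) = struct.unpack_from("<i", image, field_addr)
    return field_addr + offset

image = make_image(b"TypeName")
target = resolve(image, 0)
name = image[target:target + 8]
```

Because resolution only ever adds an offset to the field's own position, the same bytes work at any load address, which is what makes the metadata usable both in-process and offline.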

Speakers

Doug Gregor

dgregor@apple.com

Joe Groff

jgroff@apple.com


Wednesday October 17, 2018 11:30am - 12:00pm
1 - General Session (Rm LL20ABC)

11:30am

Outer Loop Vectorization in LLVM: Current Status and Future Plans
We recently proposed adding an alternative VPlan native path in Loop Vectorizer (LV) to implement support for outer loop vectorization. In this presentation, we first give a status update and discuss progress made since our initial proposal. We briefly talk about the addition of a VPlan-native code path in LV, initial explicit outer loop vectorization support, cost modelling and vector code generation in the VPlan-native path. We also summarize the current limitations.

Next, we introduce VPlan-to-VPlan transformations, which highlight a major strength of VPlan infrastructure. Different vectorization strategies can be modelled using the VPlan representation which allows reuse of VPlan-based cost modelling and code generation infrastructure. Starting from an initial VPlan, a set of VPlan-to-VPlan transformations can be applied, resulting in a set of plans representing different optimization strategies (e.g. interleaving of memory accesses, using SLP opportunities, predication). These plans can then be evaluated against each other and code generated for the most profitable one. We present VPlan-based SLP and predication as concrete examples of VPlan-to-VPlan transformation.

We end this talk with a discussion of the next steps in the VPlan roadmap. In particular, we discuss plans to achieve convergence of the inner loop and VPlan-native vectorization paths. We present opportunities to get involved with VPlan development and possibilities for collaboration. Furthermore, we discuss how vectorization for scalable vector architectures could fit into VPlan. We also plan to organize a VPlan focused hacker’s table after the talk, to provide a space for more in-depth discussions relating to VPlan.

Speakers

Satish Guggilla

Intel Corporation


Wednesday October 17, 2018 11:30am - 12:00pm
2 - Technical Talk (Rm LL21AB)

12:00pm

Extending the SLP vectorizer to support variable vector widths
The SLP Vectorizer performs auto-vectorization of straight-line code. It works by scanning the code looking for scalar instructions that can be grouped together, and then replacing each group with its vectorized form. In this work we show that the current design of the SLP pass in LLVM cannot efficiently handle code patterns that require switching from one vector width to another. We provide detailed examples of when this happens and we show in detail why the current design is failing. We present a non-intrusive design based on the existing SLP Vectorization pass, that addresses this issue and improves performance.
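The grouping step, and the width problem the talk addresses, can be illustrated with a toy model (Python, with invented instruction tuples; the real pass works on LLVM IR and is far more involved): isomorphic scalar instructions are bundled into groups of a fixed vector width, and a run that does not fill the chosen width stays scalar even though a narrower width would have vectorized it.

```python
# Toy SLP sketch: bundle runs of same-opcode scalar instructions from
# straight-line code into vector groups of a single fixed `width`.
# Instructions are modeled as (opcode, destination) pairs.

def slp_bundle(instrs, width):
    bundles = []
    i = 0
    while i < len(instrs):
        op = instrs[i][0]
        run = [instrs[i]]
        j = i + 1
        while j < len(instrs) and len(run) < width and instrs[j][0] == op:
            run.append(instrs[j])
            j += 1
        if len(run) == width:
            bundles.append(("vector", op, run))        # one wide instruction
        else:
            bundles.extend(("scalar", ins[0], [ins]) for ins in run)
        i = j
    return bundles

# Four adds followed by two muls: at width 4 the adds vectorize but the
# muls are left scalar; at width 2 the muls would vectorize — the kind of
# width switch a single-width design cannot express.
code = [("add", f"a{i}") for i in range(4)] + [("mul", f"m{i}") for i in range(2)]
b4 = slp_bundle(code, 4)
b2 = slp_bundle(code[4:], 2)
```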

Speakers

Vasileios Porpodas

Intel Corporation


Wednesday October 17, 2018 12:00pm - 12:30pm
2 - Technical Talk (Rm LL21AB)

12:00pm

Sound Devirtualization in LLVM
Devirtualization is an optimization transforming virtual calls into direct calls.

The first proposed model for handling devirtualization for C++ in LLVM, enabled by the -fstrict-vtable-pointers flag, had an issue that could potentially cause miscompilation. We took a step back and rebuilt the model in a more structured way, reasoning about the semantics of dynamic pointers rather than about which barriers we need and which transformations we can perform on them to make it work. Our new model fixes this issue and enables more optimizations. In this talk, we will explain how it works and what the next steps are to turn it on by default.

Speakers

Piotr Padlewski

Master's student, University of Warsaw

Krzysztof Pszeniczny

University of Warsaw


Wednesday October 17, 2018 12:00pm - 12:30pm
1 - General Session (Rm LL20ABC)

4:30pm

Faster, Stronger C++ Analysis with the Clang Static Analyzer
Over the last year we’ve made the Clang Static Analyzer faster and improved its C++ support. In this talk, we will describe how we have sped up the analyzer by changing the order in which it explores programs to bias it towards covering code that hasn’t been explored yet. In contrast with the previous exploration strategy, which was based on depth-first search, this coverage-guided approach gives shorter, more understandable bug reports and can find up to 20% more bugs on typical code bases. We will also explain how we’ve reduced C++ false positives by providing infrastructure in Clang’s control-flow graph to help the analyzer understand the myriad of ways in which C++ objects can be constructed, destructed, and have their lifetime extended. This infrastructure will also make it easier for the analyzer to support C++ as the language continues to evolve.

Speakers

Wednesday October 17, 2018 4:30pm - 5:00pm
1 - General Session (Rm LL20ABC)

4:30pm

Porting the Function Merging Pass to ThinLTO
In this talk, I'll discuss the process of porting the function merging pass to the ThinLTO infrastructure. Function merging (FM) is an interprocedural pass useful for code-size optimization. It deduplicates common parts of similar functions and outlines them into a separate function, thus reducing code size. This is particularly useful for code bases making heavy use of templates that get instantiated in multiple translation units.

Porting FM to ThinLTO makes it possible to deduplicate functions across the entire program. I'll discuss the engineering effort required to port FM to ThinLTO: specifically, the functionality to uniquely identify similar functions, augmenting the function summary with a hash code, populating the module summary index, modifying the bitcode reader and writer, and code-size numbers on open-source benchmarks.
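The hash-based candidate discovery might look like this in miniature (a Python sketch with invented summaries and opcodes, not the actual ThinLTO data structures): each function summary carries a structural hash that ignores names and operands, so structurally identical template instantiations from different translation units collide at thin-link time.

```python
import hashlib

# Hypothetical sketch: structural hashes in per-function summaries let the
# thin link find cross-module merge candidates without loading full IR.

def structural_hash(body):
    """Hash opcodes only, ignoring names/operands, so structurally
    identical functions from different TUs produce the same hash."""
    h = hashlib.sha256()
    for opcode, *_ in body:
        h.update(opcode.encode())
    return h.hexdigest()

def thin_link(modules):
    """modules: {module_name: {func_name: body}}.
    Returns groups of functions that are candidates for merging."""
    groups = {}
    for mod, funcs in modules.items():
        for name, body in funcs.items():
            groups.setdefault(structural_hash(body), []).append((mod, name))
    return [g for g in groups.values() if len(g) > 1]

mods = {
    "a.o": {"vec_int_push":   [("load", "p"), ("add", "i"), ("store", "q")]},
    "b.o": {"vec_float_push": [("load", "x"), ("add", "j"), ("store", "y")]},
    "c.o": {"unrelated":      [("mul", "z")]},
}
candidates = thin_link(mods)
```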

Speakers

Aditya Kumar

Senior Compiler Engineer, Facebook


Wednesday October 17, 2018 4:30pm - 5:00pm
2 - Technical Talk (Rm LL21AB)

5:00pm

Improving code reuse in clang tools with clangmetatool
This talk will cover the lessons we learned from the process of writing tools with Clang's LibTooling. We will also introduce clangmetatool, the open source framework we use (and developed) to reuse code when writing Clang tools.

When we first started writing Clang tools, we realized that there was a lot of lifecycle management we had to repeat. In some cases, people advocate using global variables to manage the lifecycle of that data, but this actually makes code reuse across tools even harder.

We also learned that, when writing a tool, it is beneficial if the code is split into two phases -- a data collection phase and, later, a post-processing phase which actually performs the bulk of the logic of the tool.
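A minimal sketch of that two-phase shape (Python, with hypothetical names; clangmetatool itself is a C++ framework over LibTooling): match callbacks only record facts during traversal, and all decisions happen afterwards over the collected data.

```python
# Hypothetical sketch of a two-phase tool: phase 1 collects, phase 2 decides.

class IncludeCollector:
    """Phase 1: a callback that only appends what it sees; no logic here."""
    def __init__(self):
        self.found = []

    def on_match(self, file, header):      # would be called per AST match
        self.found.append((file, header))

def post_process(found):
    """Phase 2: the bulk of the tool's logic, run once over everything.
    Here (as an invented example): report headers used by several files."""
    by_header = {}
    for file, header in found:
        by_header.setdefault(header, set()).add(file)
    return sorted(h for h, files in by_header.items() if len(files) > 1)

c = IncludeCollector()
c.on_match("a.cpp", "util.h")
c.on_match("b.cpp", "util.h")
c.on_match("a.cpp", "rare.h")
shared = post_process(c.found)
```

Keeping the collector dumb is what makes it reusable across tools: only the post-processing phase differs.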

More details at https://bloomberg.github.io/clangmetatool/

Speakers

Daniel Ruoso

Bloomberg


Wednesday October 17, 2018 5:00pm - 5:30pm
2 - Technical Talk (Rm LL21AB)

5:00pm

Memory Tagging, how it improves C++ memory safety, and what it means for compiler optimizations
Memory safety in C++ remains largely unresolved. A technique usually called "memory tagging" may dramatically improve the situation if implemented in hardware with reasonable overhead. In this talk we will describe three existing implementations of memory tagging. One is SPARC ADI, a full hardware implementation. Another is HWASAN, a partially hardware-assisted LLVM-based tool for AArch64. The third is ARM MTE, a recently announced hardware extension for AArch64. We describe the basic idea, evaluate the three implementations, and explain how they improve memory safety. We'll pay extra attention to the compiler optimizations required to support memory tagging efficiently.
If you know what AddressSanitizer (ASAN) is, think of memory tagging as "low-overhead ASAN on steroids, in hardware". This talk is partially based on the paper "Memory Tagging and how it improves C/C++ memory safety" (https://arxiv.org/pdf/1802.09517.pdf).
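A toy model of the basic idea (Python; the allocator and check are invented for illustration, loosely following ARM MTE's parameters of 16-byte granules and a tag in the unused top byte of the pointer): every granule of memory gets a small tag, the same tag is placed in the pointer, and each access checks that they match.

```python
import random

# Toy memory-tagging model. Illustrative only: real implementations do this
# in hardware, with 4-bit tags, and trap asynchronously or synchronously.

GRANULE = 16
TAG_SHIFT = 56   # tag lives in the top byte of a 64-bit pointer

class TaggedHeap:
    def __init__(self, size):
        self.mem = bytearray(size)
        self.tags = [0] * (size // GRANULE)   # one tag per 16-byte granule

    def malloc(self, addr, size):
        tag = random.randrange(1, 16)         # nonzero random tag
        for g in range(addr // GRANULE, (addr + size + GRANULE - 1) // GRANULE):
            self.tags[g] = tag                # tag the allocation's granules
        return (tag << TAG_SHIFT) | addr      # return a tagged pointer

    def load(self, ptr):
        tag, addr = ptr >> TAG_SHIFT, ptr & ((1 << TAG_SHIFT) - 1)
        if self.tags[addr // GRANULE] != tag: # tag mismatch -> trap
            raise MemoryError("tag check failed")
        return self.mem[addr]

heap = TaggedHeap(256)
p = heap.malloc(0, 32)       # granules 0..1 get p's tag
ok = heap.load(p + 31)       # last in-bounds byte: tags match

caught = False
try:
    heap.load(p + 32)        # one past the allocation: untagged granule
except MemoryError:
    caught = True
```

The compiler-optimization angle in the talk is about when such checks can be safely hoisted, combined, or elided.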

Speakers

Kostya Serebryany

Software Engineer, Google
Konstantin (Kostya) Serebryany is a Software Engineer at Google. His team develops and deploys dynamic testing tools, such as AddressSanitizer and ThreadSanitizer. Prior to joining Google in 2007, Konstantin spent 4 years at Elbrus/MCST working for Sun compiler lab and then 3 years...



Wednesday October 17, 2018 5:00pm - 5:30pm
1 - General Session (Rm LL20ABC)
 
Thursday, October 18
 

10:00am

Graph Program Extraction and Device Partitioning in Swift for TensorFlow
Swift for TensorFlow (https://github.com/tensorflow/swift) is an open-source project that provides a new way to develop machine learning models. It combines the usability/debuggability of imperative "define-by-run" programming models (like TensorFlow Eager and PyTorch) with the performance of TensorFlow session/XLA (graph compilation).

In this talk, we describe the design and implementation of deabstraction, Graph Program Extraction (GPE) and device partitioning used by Swift for TensorFlow. These algorithms rely on aggressive mid-level transformations that incorporate techniques including inlining, program slicing, interpretation, and advanced control flow analysis. While the initial application of these algorithms is to TensorFlow and machine learning, these algorithms may be applied to any domain that would benefit from an imperative definition of a computation graph, e.g. for high performance accelerators in other domains.


Thursday October 18, 2018 10:00am - 10:30am
1 - General Session (Rm LL20ABC)

10:00am

Working with Standalone Builds of LLVM sub-projects
There are two ways to build LLVM sub-projects: the first is to place the sub-project source code in the tools or projects directory of the LLVM tree and build everything together; the second is to build the sub-projects standalone against a pre-compiled build of LLVM.

This talk will focus on how to make standalone builds of sub-projects like clang, lld, compiler-rt, lldb, and libcxx work, and how this method can be used to help reduce build times for both developers and CI systems. In addition, we will look at the CMake helpers provided by LLVM, how they are used during standalone builds, and how you can use them to build your own LLVM-based project in a standalone fashion.

Speakers

Thursday October 18, 2018 10:00am - 10:30am
2 - Technical Talk (Rm LL21AB)

11:00am

Art Class for Dragons: Supporting GPU compilation without metadata hacks!
Modern programming languages targeting GPUs include features that are not commonly found in conventional programming languages, such as C and C++, and are, therefore, not natively representable in LLVM IR.

This limits the applicability of LLVM to target GPU hardware for both graphics and massively parallel compute applications. Moreover, the lack of a unified way to represent GPU-related features has led to different and mutually incompatible solutions across different vendors, thereby limiting interoperability of LLVM-based GPU transformation passes and tools.

Many features within the Vulkan graphics API and language [1] highlight the diversity of GPU hardware. For example, Vulkan allows different attributes on structures that specify different memory padding rules. Such semantic information is currently not natively representable in LLVM IR. Graphics programming models also make extensive use of special memory regions that are mapped as address spaces in LLVM. However, no semantic information is attributed to address spaces at the LLVM IR level and the correct behaviour and transformation rules have to be inferred from the address space within the compilation passes.

As some of these features have no direct representation in LLVM, various translators, e.g. the SPIR-V→LLVM translator [2], the Microsoft DXIL compiler [3], and AMD's open-source compiler for Vulkan [4], make use of side features of LLVM IR, such as metadata and intrinsics, to represent the semantic information that cannot be easily captured. This creates an extra burden on compilation passes targeting GPU hardware, as the semantic information has to be recreated from the metadata. Additionally, some translators, such as the Microsoft DXIL compiler, have forked the Clang and LLVM repositories and made proprietary changes to the IR in order to more easily support the required features natively. A more general approach would be to look at how upstream LLVM can be augmented to represent some, if not all, of the semantic information required for massively parallel SIMD, SPMD, and, in general, graphics applications.

This talk will look at the proprietary LLVM IR modifications made in translators such as the Khronos SPIRV-LLVM translator, AMDs open source driver for Vulkan SPIRV, the original Khronos SPIR specification [5], Microsoft's DXIL compiler and Nvidia's NVVM specification [6]. The aim is to extract a common set of features present in modern graphics and compute languages for GPUs, describe how translators are currently representing these features in LLVM and suggest ways of augmenting the LLVM IR to natively represent these features. The intention with this talk is to open up a dialogue among IR developers to look at how we can, if there is agreement, extend LLVM in a way that supports a more diverse set of hardware types.

[1] - https://www.khronos.org/registry/vulkan/
[2] - https://github.com/KhronosGroup/SPIRV-LLVM-Translator
[3] - https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst
[4] - https://github.com/GPUOpen-Drivers/AMDVLK
[5] - https://www.khronos.org/registry/SPIR/specs/spir_spec-2.0.pdf
[6] - https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf

Speakers

Thursday October 18, 2018 11:00am - 11:30am
2 - Technical Talk (Rm LL21AB)

11:00am

Understanding the performance of code using LLVM's Machine Code Analyzer (llvm-mca)
llvm-mca is an LLVM-based tool that uses the information available in LLVM's scheduling models to statically estimate the performance of machine code on a specific CPU. The goal of this tool is not just to predict the performance of the code when run on the target, but also to help diagnose potential performance issues. In this talk, we will discuss how llvm-mca works and walk the audience through example uses of this tool.
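As a rough intuition for the kind of estimate such a tool produces, here is a toy throughput model (Python; the dispatch width and reciprocal throughputs are invented numbers, and the real tool simulates dispatch, scheduler queues, and execution ports in far more detail):

```python
# Toy static-performance model in the spirit of llvm-mca. Hypothetical
# numbers, not real scheduling-model data: estimate the cycles needed to
# run a basic block repeatedly, bounded by dispatch width and by a crude
# aggregate "resource" demand (sum of reciprocal throughputs).

DISPATCH_WIDTH = 4                              # instructions per cycle
RTHROUGHPUT = {"add": 0.25, "mul": 1.0, "load": 0.5}

def estimate_cycles(block, iterations):
    n = len(block) * iterations
    dispatch_bound = n / DISPATCH_WIDTH         # front-end limit
    resource_bound = iterations * sum(RTHROUGHPUT[op] for op in block)
    return max(dispatch_bound, resource_bound)  # steady-state lower bound

block = ["load", "add", "mul"]
cycles = estimate_cycles(block, 100)
```

Here the resource bound (175 cycles for 300 instructions) dominates the dispatch bound (75), so the model predicts the block is execution-limited rather than dispatch-limited, which is exactly the kind of diagnosis the talk describes.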

Speakers

Andrea Di Biagio

Senior Compiler Engineer, Sony Interactive Entertainment

Matt Davis

Compiler Engineer, Sony Interactive Entertainment
(void *)0


Thursday October 18, 2018 11:00am - 11:30am
1 - General Session (Rm LL20ABC)

11:30am

Profile Guided Function Layout in LLVM and LLD
The layout of code in memory can have a large impact on the performance of an application. This talk will cover the reasons for this along with the design, implementation, and performance results of LLVM and LLD's new profile guided function layout pipeline. This pipeline leverages LLVM's profile guided optimization infrastructure and is based on the Call-Chain Clustering heuristic.
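The Call-Chain Clustering heuristic can be sketched as follows (Python; the sizes, page budget, and final ordering rule are simplified assumptions, not the exact LLD implementation): process call edges hottest-first, and append the callee's cluster to the caller's unless the merged cluster would outgrow a page-size budget.

```python
# Hypothetical sketch of Call-Chain Clustering (C3) for function layout.

PAGE = 4096

def c3_layout(funcs, edges):
    """funcs: {name: size in bytes}; edges: [(caller, callee, call_count)].
    Returns a function ordering that keeps hot caller/callee pairs close."""
    cluster = {f: [f] for f in funcs}      # each function starts alone
    size = {f: funcs[f] for f in funcs}    # cluster sizes, keyed by first member
    for caller, callee, _ in sorted(edges, key=lambda e: -e[2]):
        a, b = cluster[caller], cluster[callee]
        if a is b:
            continue                       # already in the same cluster
        rep_a, rep_b = a[0], b[0]
        if size[rep_a] + size[rep_b] > PAGE:
            continue                       # keep clusters within a page
        a.extend(b)                        # place callee right after caller
        size[rep_a] += size.pop(rep_b)
        for f in b:
            cluster[f] = a
    seen, order = set(), []
    for f in funcs:                        # emit clusters in input order
        if id(cluster[f]) not in seen:
            seen.add(id(cluster[f]))
            order.extend(cluster[f])
    return order

funcs = {"main": 100, "hot": 200, "warm": 150, "cold": 300}
edges = [("main", "hot", 1000), ("hot", "warm", 500), ("main", "cold", 10)]
layout = c3_layout(funcs, edges)
```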

Speakers

Michael Spencer

Compiler Engineer, Apple


Thursday October 18, 2018 11:30am - 12:00pm
1 - General Session (Rm LL20ABC)

12:00pm

Optimizing Indirections, using abstractions without remorse.
Indirections, whether through memory, library interfaces, or function pointers, can easily induce major performance penalties, as the current optimization pipeline is not able to look through them. The available inter-procedural optimizations (IPO) are generally not well suited to deal with these issues, as they require all code to be available and analyzable through techniques based on tracking value dependencies. Importantly, the use of class/struct objects and (parallel) runtime libraries commonly introduces indirections that prohibit basically all optimizations. In this talk, we introduce these problems with real-world examples and show how new analyses can mitigate them. We especially focus on:

- A field-sensitive, inter-procedural memory analysis that models simple communication through memory.
- The integration of indirect, potentially hidden, call sites, e.g., in libraries like the OpenMP runtime library, into existing analyses and optimizations (function attribute detection, argument promotion, …).
- Automatic and portable (non-LLVM specific) information transfer from library implementations through their interfaces to the user sites.

While our work originates in the optimization of parallel code, we want to show how the problems we encountered there are similar to existing ones in sequential programs. To this end, we try to augment the available analyses and optimizations rather than introduce new ones that are specific to parallel programs. As a consequence, we not only expect positive results for parallel code regions [1], but also hope to improve generic code that employs indirections or simply exhibits optimization opportunities similar to those that commonly arise for parallel programs.

The goal of this talk is to introduce possible solutions to several problems that commonly prevent optimization of code featuring indirections. As we want to introduce these solutions into the LLVM codebase, we hope to start a discussion on these issues as well as the caveats that we encountered while resolving them.

[1] https://www.youtube.com/watch?v=u2Soj49R-i4

Speakers

Johannes Doerfert

Argonne National Laboratory


Thursday October 18, 2018 12:00pm - 12:30pm
1 - General Session (Rm LL20ABC)

2:00pm

Lessons Learned Implementing Common Lisp with LLVM over Six Years.
I will present the lessons learned while using LLVM to efficiently implement a complex, memory-managed dynamic programming language in which everything can be redefined on the fly. I will present Clasp, a new Common Lisp compiler and programming environment that uses LLVM as its back end and interoperates smoothly with C++/C. Clasp is written in both C++ and Common Lisp. The Clasp compiler is written in Common Lisp and makes extensive use of the LLVM C++ API and the ORC JIT to generate native code, both ahead of time and just in time. Among its unique features, Clasp uses a compacting garbage collector to manage memory, incorporates multithreading, uses C++-compatible exception handling to achieve stack unwinding, and incorporates an advanced compiler written in Common Lisp to achieve performance that approaches that of C++. Clasp is being developed as a high-performance scientific and general-purpose programming language that makes use of available C++ libraries.

Speakers

Christian Schafmeister

Temple University
I'm a professor of Chemistry developing large molecules to solve real-world problems. We have developed a software environment (Cando) for designing these molecules that implements a Common Lisp compiler, incorporates many C++ libraries, interoperates with C++ and uses LLVM as the...


Thursday October 18, 2018 2:00pm - 2:30pm
1 - General Session (Rm LL20ABC)

2:00pm

Loop Transformations in LLVM: The Good, the Bad, and the Ugly
Should loop transformations be done by the compiler, by a library (such as Kokkos, RAJA, or Halide), or be the subject of (domain-specific) programming languages such as CUDA, LIFT, etc.? Such optimizations can take place on more than one level, and the decision for the compiler level has already been made in LLVM: we already support a small zoo of transformations: loop unrolling, unroll-and-jam, distribution, vectorization, interchange, unswitching, idiom recognition, and polyhedral optimization using Polly. Once it is clear that we want loop optimizations in the compiler, why not make them as good as possible?

Today, with the exception of some shared code and analyses related to vectorization, LLVM loop passes don't know about each other. This makes cooperation between them difficult, and that includes difficulty in heuristically determining whether some combination of transformations is likely to be profitable. With user-directed transformations such as #pragma omp parallel for or #pragma clang loop vectorize(enable), the only order in which these transformations can be applied is the order of the passes in the pipeline.

In this talk, we will explore what already works well (e.g. vectorization of inner loops), what does not work as well (e.g. loop passes destroying each other's structures), what becomes ugly with the current design if we want to support more loop passes (e.g. exponential code blowup due to each pass doing its own loop versioning), and possible solutions.

Speakers

Michael Kruse

Argonne National Lab


Thursday October 18, 2018 2:00pm - 2:30pm
2 - Technical Talk (Rm LL21AB)

2:30pm

Implementing an OpenCL compiler for CPU in LLVM
Compiling a heterogeneous language for a CPU in an optimal way is a challenge: OpenCL C/SPIR-V specifics require additions and modifications to the old-fashioned driver approach and compilation flow. Coupled with aggressive just-in-time code optimizations, interfacing with the OpenCL runtime, the standard OpenCL C function library, etc., an implementation of OpenCL for CPU is a complex structure. We'll cover Intel's approach in the hope of revealing common patterns and design solutions, and discover possible opportunities to share and collaborate with other OpenCL CPU vendors under the LLVM umbrella! This talk will describe the compilation of OpenCL C source code down to machine instructions and the interaction with the OpenCL runtime, illustrate the different paths that compilation may take for different modes (the classic online/OpenCL 2.1 SPIR-V path vs. OpenCL 1.2/2.0 with device-side enqueue and generic address space), put particular emphasis on the resolution of CPU-unfriendly OpenCL aspects (barriers, address spaces, images) in the optimization flow, and explain why the OpenCL compiler frontend can easily handle various target devices (GPU/CPU/FPGA/DSP, etc.) and how it all neatly revolves around LLVM/Clang and tools.

Speakers

Thursday October 18, 2018 2:30pm - 3:00pm
1 - General Session (Rm LL20ABC)

2:30pm

Methods for Maintaining OpenMP Semantics without Being Overly Conservative
The SSA-based LLVM IR provides an elegant representation for compiler analyses and transformations. However, it presents challenges for OpenMP code generation in the LLVM backend, especially when the input program is compiled at different optimization levels. This paper presents a practical and effective framework for performing OpenMP code generation based on the LLVM IR. In this presentation, we propose a canonical OpenMP loop representation under different optimization levels to preserve the OpenMP loop structure without it being affected by compiler optimizations. A code-motion guard intrinsic is proposed to prevent code motion across OpenMP regions. In addition, a utility based on the LLVM SSA updater is presented to perform the SSA update during the transformation. Lastly, scoped alias information is used to preserve alias relationships for backend-outlined functions. This framework has been implemented in Intel's LLVM compiler.

Speakers

Jin Lin

Intel


Thursday October 18, 2018 2:30pm - 3:00pm
2 - Technical Talk (Rm LL21AB)

3:00pm

Developer Toolchain for the Nintendo Switch
Nintendo Switch was developed using Clang/LLVM for the developer tools and C++ libraries. We describe how we converted from using almost exclusively proprietary tools and libraries to open tools and libraries. We’ll also describe our process for maintaining our out-of-tree toolchain and what we’d like to improve.

We started with Clang, binutils, and LLVM C++ libraries (libc++, libc++abi) and other open libraries. We will also describe our progress in transitioning to LLD and other LLVM binutils equivalents. Additionally, we will share some of our performance results using LLD and LTO.

Finally, we’ll discuss some of the areas that are important to our developers moving forward.

Speakers

Bob Campbell

Principal Engineer, Nintendo Technology Development


Thursday October 18, 2018 3:00pm - 3:30pm
1 - General Session (Rm LL20ABC)

3:00pm

Revisiting Loop Fusion, and its place in the loop transformation framework.
Despite several efforts [1-3], loop fusion is one of the classical loop optimizations still missing in LLVM. As we are currently working to remedy this situation, we want to share our experience in designing, implementing, and tuning a new loop transformation pass. While we want to explain how loop fusion can be implemented using the set of existing analyses, we also plan to talk about the current loop transformation framework and extensions thereof. We currently plan to include:

- The interplay between different existing loop transformations.
- A comparison to the IBM/XL loop optimization pipeline.
- Source level guidance of loop transformations.
- Shortcomings of the current infrastructure, especially loop centric dependence analyses.
- Interaction with polyhedral-model-backed dependence information.

The (default) loop optimizations performed by LLVM are currently lacking transformations and tuning. One reason is the absence of a dedicated framework that provides the necessary analysis information and heuristics. With the introduction of loop fusion, we want to explore how different transformations could be used together and what a uniform dependence analysis for multiple loops could look like. The latter is explored with regard to a Scalar Evolution (SCEV) based dependence analysis, like the current intra-loop access analysis, and a polyhedral-model-based alternative, e.g., via LLVM/Polly or the Polyhedral Value/Memory Analysis [4].
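For readers unfamiliar with the transformation itself, here is a minimal before/after sketch (Python standing in for the loop nests; the real pass of course works on LLVM IR and must first prove legality via dependence analysis):

```python
# Loop fusion in miniature: two adjacent loops with identical bounds and no
# fusion-preventing dependence become one loop, improving locality on the
# intermediate array (which can then even be scalarized away).

def unfused(a, b):
    n = len(a)
    c = [0] * n
    for i in range(n):          # loop 1: produce c
        c[i] = a[i] + b[i]
    d = [0] * n
    for i in range(n):          # loop 2: iteration i reads only c[i]
        d[i] = c[i] * 2
    return d

def fused(a, b):
    n = len(a)
    d = [0] * n
    for i in range(n):          # fused body: one traversal, c[i] stays hot
        c_i = a[i] + b[i]       # intermediate value scalarized
        d[i] = c_i * 2
    return d

a, b = [1, 2, 3], [10, 20, 30]
```

Fusion is legal here because loop 2's iteration i depends only on loop 1's iteration i; a backward cross-iteration dependence (e.g. reading c[i+1]) would prevent it.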

As our work is still ongoing, we cannot provide evaluation results at this point. However, earlier efforts [3], that did not make it into LLVM, already showed significant improvements which we expect to replicate. We anticipate having preliminary performance results available to present at the conference.

Note that the goal of this talk is not necessarily to provide final answers to the above described problems, but instead we want to start a discussion and bring interested parties together.

[1] https://reviews.llvm.org/D7008
[2] https://reviews.llvm.org/D17386
[3] https://llvm.org/devmtg/2015-04/slides/LLVMEuro2015LoopFusionAmidComplexControlFlow.pdf
[4] https://www.youtube.com/watch?v=xSA0XLYJ-G0

Speakers

Kit Barton

Technical lead for LLVM on Power and XL Compilers, IBM Canada


Thursday October 18, 2018 3:00pm - 3:30pm
2 - Technical Talk (Rm LL21AB)