Wednesday, October 17
 

8:00am PDT

Registration & Breakfast
Registration opens and breakfast in the lobby

Wednesday October 17, 2018 8:00am - 9:00am PDT
0 - Lobby Area

9:00am PDT

Welcome
Opening

Speakers

Tanya Lattner

President, LLVM Foundation


Wednesday October 17, 2018 9:00am - 9:15am PDT
1 - General Session (Rm LL20ABC)

9:15am PDT

The Future Direction of C++ and the Four Horsemen of Heterogeneous Computing
The C++ Direction Group has set a future direction for C++, including recommendations for C++ in the short and medium term. This will have an immediate impact on what enters C++20 and beyond. As a member of the Direction Group, I will devote the first half of this talk to the group's description of where future C++ is heading.

It also includes guidance towards heterogeneous C++.

The introduction of the executors TS means that, for the first time, C++ will have a standard platform for writing applications that can execute across a wide range of architectures, including multi-core and many-core CPUs, GPUs, DSPs, and FPGAs.
The SYCL standard from the Khronos Group is a strong candidate for implementing this upcoming C++ standard, as are several other C++ frameworks from DOE, and HPX for the distributed case. One of the core ideas of the standard is that everything must be standard C++; the only exception is that some features of C++ cannot be used in code that executes on an OpenCL device, often due to hardware limitations.

Implementing heterogeneous C++ is like battling the four Horsemen of the Apocalypse. These are:
  • Data movement
  • Data locality
  • Data layout
  • Data affinity
The rest of this talk presents some of the challenges of implementing a heterogeneous C++ standard in Clang, and our solutions, based on our implementation of the Khronos SYCL language with Codeplay's ComputeCpp compiler, at a time when C++ is growing fast and Clang is a platform of choice for prototyping many new C++ features.
We describe the major ABI issues for a separate-compilation toolchain that arise from the non-standard-layout type of lambdas, as well as the data-addressing issues that arise from non-flat and possibly non-coherent address spaces.
We also describe various papers being proposed to ISO C++ to move towards standardizing heterogeneous and distributed computing in C++: the introduction of a unified interface for execution across a wide range of hardware, extensions to it supporting concurrent exception handling and affinity queries, and an approach to improving the capability of the parallel algorithms through composability. All of this adds up to a future C++ which is much more aware of heterogeneity and capable of taking advantage of it to improve parallelism and performance.



Speakers

Michael Wong

Distinguished Engineer, VP, Codeplay,ISOCPP
Michael Wong is Distinguished Engineer/VP of R&D at Codeplay Software. He is a current Director and VP of ISOCPP, and a senior member of the C++ Standards Committee with more than 15 years of experience. He chairs the WG21 SG5 Transactional Memory and SG14 Games Development/Low Latency/Financials...


Wednesday October 17, 2018 9:15am - 10:00am PDT
1 - General Session (Rm LL20ABC)

10:00am PDT

Break
Coffee Break

Wednesday October 17, 2018 10:00am - 10:30am PDT
0 - Lobby Area

10:30am PDT

Lifecycle of LLVM bug reports
The goal of this BoF is to improve the (currently non-existent) definition and documentation of the lifecycle of LLVM bug tickets. Not having a documented lifecycle results in a number of issues, a few of which have come up recently on the mailing list, including:
  • When bugs get closed, what is the right amount of info that should be required so that the bug report is as meaningful as possible without putting unreasonable demands on the person closing the bug?
  • When bugs get reported, what level of triaging, and on what timeline, should we aim for to keep bug reporters engaged?
  • What should we aim to achieve during triaging?

Speakers

Kristof Beyls

Senior Principal Engineer, Arm
compilers and related tools, profiling, security.

Paul Robinson

Senior Staff Compiler Engineer, Sony Interactive Entertainment


Wednesday October 17, 2018 10:30am - 11:00am PDT
3 - BoF (Rm LL21CD)

10:30am PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):

Wednesday October 17, 2018 10:30am - 11:00am PDT
4 - Round Tables (Rm LL21EF)

10:30am PDT

Coroutine Representations and ABIs in LLVM
Coroutines can serve as the basis for implementing many powerful language features. In this talk, we will discuss coroutines holistically and explore requirements and trade-offs at different stages in their translation. For this purpose, we will introduce several prospective language features in the Swift programming language and discuss how the differences between them affect how they should best be represented and optimized in both Swift's high-level SIL intermediate representation and in LLVM's lower-level IR. We will also contrast Swift's requirements with those imposed by the draft C++ coroutines TS and explain how the differences between languages lead to differences in the LLVM representation. Finally, we will discuss various final ABIs for lowering coroutines and talk about their capabilities and trade-offs.

Speakers

Wednesday October 17, 2018 10:30am - 11:00am PDT
1 - General Session (Rm LL20ABC)

11:00am PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
ORC JIT

Wednesday October 17, 2018 11:00am - 11:30am PDT
4 - Round Tables (Rm LL21EF)

11:00am PDT

Build Impact of Explicit and C++ Standard Modules
Something of a continuation of my 2017 LLVM Developers' Meeting talk, The Further Benefits of Explicit Modularization.

We will examine and discuss the build-infrastructure impact of explicit modules, working from the easiest cases and rolling in further complications to see where we can end up.

  • Explicit modules with no modularized dependencies
  • Updating a build system (like CMake) to allow the developer to describe modular groupings, use that information to build modules and modular objects, and link those modular objects in the final link
  • Updating a build system to cope with proposed C++ standardized modules
  • How C++ standardized modules (and Clang modules before them) differ from other languages' module systems: a non-portable binary format, and the challenges that presents for build systems
  • Possible solutions: implicit modules; an explicit cache path; interaction with the compiler for module dependence-graph discovery (similar to include-path discovery, or via a callback from the compiler)

There are a lot of unknowns in this space; the goal of this talk is, at the very least, to discuss those uncertainties and why they are there, and to discuss any conclusions from myself and the ongoing C++ standardization work (Richard Smith, Nathan Sidwell, and others).

Speakers

David Blaikie

Software Engineer, Google Inc.


Wednesday October 17, 2018 11:00am - 11:30am PDT
1 - General Session (Rm LL20ABC)

11:00am PDT

Stories from RV: The LLVM vectorization ecosystem
Vectorization in LLVM has long been restricted to explicit vector instructions, SLP vectorization, or the automatic vectorization of inner-most loops. As the VPlan infrastructure matures, it becomes apparent that the supporting APIs provided by the LLVM ecosystem need to evolve with it. Apart from short SIMD, new ISAs such as ARM SVE, the RISC-V V extension, and NEC SX-Aurora pose new requirements and challenges for vectorization in LLVM. To this end, the Region Vectorizer is a great experimentation ground for dealing with issues that sooner or later will need to be resolved for the LLVM vectorization infrastructure. These include the design of a flexible replacement for the VECLIB mechanism in TLI, inter-procedural vectorization, and the development of an LLVM SVE backend for NEC SX-Aurora. The idea of the talk is to provide data points to inform vectorization-related design decisions in LLVM, based on our experience with the Region Vectorizer.

Speakers

Simon Moll

Researcher/PhD Student, Saarland University


Wednesday October 17, 2018 11:00am - 11:30am PDT
2 - Technical Talk (Rm LL21AB)

11:30am PDT

Debug Info BoF
There have been significant improvements to LLVM's handling of debug info in optimized code over the past year. We will highlight recent improvements (many of which came from new contributors!) and outline some important challenges ahead.

To get the conversation going, we will present data showing improvements in source variable availability and identify passes that need more work. Potential topics for discussion include eliminating codegen differences in the presence of debug info, improving line table fidelity, features missing in LLVM's current debug info representation, and higher quality backtraces in the presence of tail calls and outlining.

Speakers

Adrian Prantl

Apple
Ask me about debug information in LLVM, Clang and Swift!


Wednesday October 17, 2018 11:30am - 12:00pm PDT
3 - BoF (Rm LL21CD)

11:30am PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):


Wednesday October 17, 2018 11:30am - 12:00pm PDT
4 - Round Tables (Rm LL21EF)

11:30am PDT

Efficiently Implementing Runtime Metadata with LLVM
Rich runtime metadata can enable powerful language features and tooling support, but also comes with code size, memory usage, and startup time costs. To mitigate these costs, the Swift programming language compiler uses some clever techniques and under-explored corners of LLVM to optimize metadata to minimize size, startup time, and memory costs while making it usable both in-process and offline, avoiding some of the costs traditionally associated with vtables, RTTI, and other data structures in languages like C++. This talk goes into detail of some of these techniques, including using relative references to make metadata position-independent, using mangled type names as a compact and offline-interpretable representation of language concepts, and organizing optional reflection metadata into its own segment of binaries so it can be discovered at load time and optionally stripped from binaries in cases where it is not desired. These techniques could also be applied to other languages, including C++, to reduce the costs of these data structures.

Speakers

Doug Gregor

dgregor@apple.com

Joe Groff

jgroff@apple.com


Wednesday October 17, 2018 11:30am - 12:00pm PDT
1 - General Session (Rm LL20ABC)

11:30am PDT

Outer Loop Vectorization in LLVM: Current Status and Future Plans
We recently proposed adding an alternative VPlan native path in Loop Vectorizer (LV) to implement support for outer loop vectorization. In this presentation, we first give a status update and discuss progress made since our initial proposal. We briefly talk about the addition of a VPlan-native code path in LV, initial explicit outer loop vectorization support, cost modelling and vector code generation in the VPlan-native path. We also summarize the current limitations.

Next, we introduce VPlan-to-VPlan transformations, which highlight a major strength of the VPlan infrastructure. Different vectorization strategies can be modelled using the VPlan representation, which allows reuse of VPlan-based cost modelling and code generation infrastructure. Starting from an initial VPlan, a set of VPlan-to-VPlan transformations can be applied, resulting in a set of plans representing different optimization strategies (e.g. interleaving of memory accesses, using SLP opportunities, predication). These plans can then be evaluated against each other and code generated for the most profitable one. We present VPlan-based SLP and predication as concrete examples of VPlan-to-VPlan transformations.

We end this talk with a discussion of the next steps in the VPlan roadmap. In particular, we discuss plans to achieve convergence of the inner loop and VPlan-native vectorization paths. We present opportunities to get involved with VPlan development and possibilities for collaboration. Furthermore, we discuss how vectorization for scalable vector architectures could fit into VPlan. We also plan to organize a VPlan focused hacker’s table after the talk, to provide a space for more in-depth discussions relating to VPlan.

Speakers

Satish Guggilla

Intel Corporation


Wednesday October 17, 2018 11:30am - 12:00pm PDT
2 - Technical Talk (Rm LL21AB)

12:00pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • Improve Buildtimes 
  • (Incremental/Compilathon Link)
  • Compiler as Server

Wednesday October 17, 2018 12:00pm - 12:30pm PDT
4 - Round Tables (Rm LL21EF)

12:00pm PDT

Extending the SLP vectorizer to support variable vector widths
The SLP Vectorizer performs auto-vectorization of straight-line code. It works by scanning the code looking for scalar instructions that can be grouped together, and then replacing each group with its vectorized form. In this work we show that the current design of the SLP pass in LLVM cannot efficiently handle code patterns that require switching from one vector width to another. We provide detailed examples of when this happens and show in detail why the current design fails. We present a non-intrusive design based on the existing SLP Vectorization pass that addresses this issue and improves performance.

Speakers

Vasileios Porpodas

Intel Corporation


Wednesday October 17, 2018 12:00pm - 12:30pm PDT
2 - Technical Talk (Rm LL21AB)

12:00pm PDT

Sound Devirtualization in LLVM
Devirtualization is an optimization transforming virtual calls into direct calls.

The first proposed model for handling devirtualization for C++ in LLVM, enabled by the -fstrict-vtable-pointers flag, had an issue that could potentially cause miscompilation. We took a step back and rebuilt the model in a more structured way, thinking about the semantics of dynamic pointers rather than about what kinds of barriers we need and what transformations we can do on them to make it work. Our new model fixes the issue and enables more optimizations. In this talk we explain how it works and what the next steps are to turn it on by default.

Speakers

Piotr Padlewski

masters student, University of Warsaw

Krzysztof Pszeniczny

University of Warsaw


Wednesday October 17, 2018 12:00pm - 12:30pm PDT
1 - General Session (Rm LL20ABC)

12:30pm PDT

Lunch
Lunch in the lobby area. Seating is available in the lobby area, outside, and in the round-table room.

Wednesday October 17, 2018 12:30pm - 2:00pm PDT
0 - Lobby Area

2:00pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • Loop Optimization

Wednesday October 17, 2018 2:00pm - 2:30pm PDT
4 - Round Tables (Rm LL21EF)

2:00pm PDT

Lightning Talks
  1. Flang Update - Steve Scalpone
  2. Dex: efficient symbol index for Clangd - Kirill Bobyrev
  3. Hardware Interference Size - JF Bastien
  4. Using TAPI to Understand APIs and Speed Up Builds - Steven Wu
  5. DWARF v5 Highlights - Why You Care - Paul Robinson
  6. What’s New In Outlining - Jessica Paquette 
  7. Refuting False Bugs in the Clang Static Analyzer using SMT Solvers - Mikhail R. Gadelha
  8. ThinLTO Summaries in JIT Compilation - Stefan Gränitz
  9. Repurposing GCC Regression for LLVM Based Tool Chains - Jeremy Bennett
  10.  atJIT: an online, feedback-directed optimizer for C++ - Kavon Farvardin
  11. Mutating the clang AST from Plugins - Andrei Homescu

Speakers

Jeremy Bennett

Chief Executive, Embecosm
Bio: Dr Jeremy Bennett is founder and Chief Executive of Embecosm (http://www.embecosm.com), a consultancy implementing open source compilers, chip simulators and AI/ML for major corporations around the world. He is an author of the standard textbook "Introduction to Compiling Techniques...

Kirill Bobyrev

Software Engineering Intern, Google

Kavon Farvardin

University of Chicago

Mikhail R. Gadelha

University of Southampton

Andrei Homescu

Immunant, Inc.

Paul Robinson

Senior Staff Compiler Engineer, Sony Interactive Entertainment

Steve Scalpone

NVIDIA
Flang, F18, and NVIDIA C, C++, and Fortran for high-performance computing.

Steven Wu

Compiler Engineer, Apple


Wednesday October 17, 2018 2:00pm - 3:00pm PDT
1 - General Session (Rm LL20ABC)

2:00pm PDT

LLVM backend development by example (RISC-V)
This tutorial steps through how to develop an LLVM backend for a modern RISC target (RISC-V). It will be of interest to anyone who hopes to implement a new backend, modify an existing backend, or simply better understand this part of the LLVM infrastructure. It provides a high-level introduction to the MC layer and instruction selection, as well as a small selection of representative implementation challenges. No experience with LLVM backends is required, but a basic level of familiarity with LLVM IR would be useful.

Speakers

Alex Bradbury

Co-founder and Director, lowRISC CIC


Wednesday October 17, 2018 2:00pm - 3:00pm PDT
2 - Technical Talk (Rm LL21AB)

2:30pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • LinuxKernel + Clang

Wednesday October 17, 2018 2:30pm - 3:00pm PDT
4 - Round Tables (Rm LL21EF)

3:00pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):

Wednesday October 17, 2018 3:00pm - 3:30pm PDT
4 - Round Tables (Rm LL21EF)

3:00pm PDT

Lightning Talks
  1. Automatic Differentiation in C/C++ Using Clang Plugin Infrastructure - Aleksandr Efremov
  2. More efficient LLVM devs: 1000x faster build file generation, -j1000 builds, and O(1) test execution - Nico Weber
  3. Heap-to-Stack Conversion - Hal Finkel
  4. TWINS - This Workflow is Not Scrum: Adapting Agile for Open Source Interaction - Joshua Magee
  5. clang-doc: an elegant generator for more civilized documentation - Julie Hockett
  6. Code Coverage with CPU Performance Monitoring Unit - Bharathi Seshadri
  7. VecClone Pass: Function Vectorization via LoopVectorizer - Konstantina Mitropoulou
  8. ISL Memory Management Using Clang Static Analyzer - Malhar Thakkar
  9. Error Handling in Libraries: A Case Study - James Henderson
  10. NEC SX-Aurora - A Scalable Vector Architecture - Erich Focht
  11. Eliminating always_inline in libc++: a journey of visibility and linkage - Louis Dionne

Speakers

Hal Finkel

Argonne National Laboratory

James Henderson

Software Engineer, SN Systems (Sony Interactive Entertainment)
I have been working in toolchain software development since graduating from Bristol University six years ago. For the majority of that time, I have been part of the SN Systems binary utilities team, with my main focus on developing the PlayStation® linker. Over the past two years...

Joshua Magee

Sony Interactive Entertainment

Konstantina Mitropoulou

Intel Corporation

Bharathi Seshadri

Cisco Systems

Malhar Thakkar

Columbia University


Wednesday October 17, 2018 3:00pm - 4:00pm PDT
1 - General Session (Rm LL20ABC)

3:00pm PDT

Register Allocation: More than Coloring
This tutorial explains the design and implementation of LLVM's register allocation passes. The focus is on the greedy register allocator and supporting passes like two-address handling, copy coalescing, and live range splitting.

The tutorial will give tips for debugging register allocator problems and understanding the allocator debugging output. It will also explain how to implement the various callbacks to tune for target specifics.

Speakers

Matthias Braun

Apple Inc.
I am an LLVM developer working on the code generation part of the compiler, specifically register allocation and scheduling.


Wednesday October 17, 2018 3:00pm - 4:00pm PDT
2 - Technical Talk (Rm LL21AB)

3:30pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • Flang Discussion

Wednesday October 17, 2018 3:30pm - 4:00pm PDT
4 - Round Tables (Rm LL21EF)

4:00pm PDT

Break
Break with food and drinks

Wednesday October 17, 2018 4:00pm - 4:30pm PDT
0 - Lobby Area

4:30pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • Git Migration Planning
  • SPIR-V Support
  • VECLIB
  • EH/.CFI Discussion
  • Divergence Analysis


Wednesday October 17, 2018 4:30pm - 5:00pm PDT
4 - Round Tables (Rm LL21EF)

4:30pm PDT

Faster, Stronger C++ Analysis with the Clang Static Analyzer
Over the last year we’ve made the Clang Static Analyzer faster and improved its C++ support. In this talk, we will describe how we have sped up the analyzer by changing the order in which it explores programs to bias it towards covering code that hasn’t been explored yet. In contrast with the previous exploration strategy, which was based on depth-first search, this coverage-guided approach gives shorter, more understandable bug reports and can find up to 20% more bugs on typical code bases. We will also explain how we’ve reduced C++ false positives by providing infrastructure in Clang’s control-flow graph to help the analyzer understand the myriad of ways in which C++ objects can be constructed, destructed, and have their lifetime extended. This infrastructure will also make it easier for the analyzer to support C++ as the language continues to evolve.

Speakers

Wednesday October 17, 2018 4:30pm - 5:00pm PDT
1 - General Session (Rm LL20ABC)

4:30pm PDT

Porting the Function Merging Pass to ThinLTO
In this talk I'll discuss the process of porting the function merging pass to the ThinLTO infrastructure. Function merging (FM) is an interprocedural pass useful for code-size optimization. It deduplicates common parts of similar functions and outlines them into a separate function, thus reducing code size. This is particularly useful for code bases that make heavy use of templates which get instantiated in multiple translation units.

Porting FM to ThinLTO lets us leverage its functionality to deduplicate functions across the entire program. I'll discuss the engineering effort required to port FM to ThinLTO: specifically, the functionality to uniquely identify similar functions, augmenting the function summary with a hash code, populating the module summary index, modifying the bitcode reader and writer, and code-size numbers on open source benchmarks.

Speakers

Aditya Kumar

Senior Compiler Engineer, Facebook


Wednesday October 17, 2018 4:30pm - 5:00pm PDT
2 - Technical Talk (Rm LL21AB)

5:00pm PDT

Round Tables
Round table discussions are small, informal get-togethers of groups of people to discuss a topic. You may suggest a topic and sign up outside the room. To see topics, look for a whiteboard outside the room.
  • RESTRICT
  • LLVM-SVE

Wednesday October 17, 2018 5:00pm - 5:30pm PDT
4 - Round Tables (Rm LL21EF)

5:00pm PDT

Improving code reuse in clang tools with clangmetatool
This talk will cover the lessons we learned from the process of writing tools with Clang's LibTooling. We will also introduce clangmetatool, the open source framework we use (and developed) to reuse code when writing Clang tools.

When we first started writing Clang tools, we realized that there was a lot of lifecycle management that we had to repeat. In some cases, people advocate for the use of global variables to manage the lifecycle of that data, but this actually makes code reuse across tools even harder.

We also learned that, when writing a tool, it is beneficial if the code is split into two phases -- a data collection phase and, later, a post-processing phase which actually performs the bulk of the logic of the tool.

More details at https://bloomberg.github.io/clangmetatool/

Speakers

Daniel Ruoso

Senior Software Engineer, Bloomberg
Currently working as the lead for Code Governance at Bloomberg, where we focus on driving large scale Static Analysis and Automated Refactoring. Daniel has been working over the past 20+ years with a persistent lens on how to help engineers be more effective with build, deployment...


Wednesday October 17, 2018 5:00pm - 5:30pm PDT
2 - Technical Talk (Rm LL21AB)

5:00pm PDT

Memory Tagging: how it improves C++ memory safety, and what it means for compiler optimizations
Memory safety in C++ remains largely unresolved. A technique usually called "memory tagging" may dramatically improve the situation if implemented in hardware with reasonable overhead. In this talk we will describe three existing implementations of memory tagging. One is SPARC ADI, a full hardware implementation. Another is HWASAN, a partially hardware-assisted LLVM-based tool for AArch64. Last but not least, ARM MTE, a recently announced hardware extension for AArch64. We describe the basic idea, evaluate the three implementations, and explain how they improve memory safety. We'll pay extra attention to compiler optimizations required to support memory tagging efficiently.
If you know what AddressSanitizer (ASan) is, think of memory tagging as "low-overhead ASan on steroids, in hardware". This talk is partially based on the paper "Memory Tagging and how it improves C/C++ memory safety" (https://arxiv.org/pdf/1802.09517.pdf).

Speakers

Kostya Serebryany

Software Engineer, Google
Konstantin (Kostya) Serebryany is a Software Engineer at Google. His team develops and deploys dynamic testing tools, such as AddressSanitizer and ThreadSanitizer. Prior to joining Google in 2007, Konstantin spent 4 years at Elbrus/MCST working for Sun compiler lab and then 3 years...



Wednesday October 17, 2018 5:00pm - 5:30pm PDT
1 - General Session (Rm LL20ABC)

5:30pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • LLVM.org admin synch


Wednesday October 17, 2018 5:30pm - 6:00pm PDT
4 - Round Tables (Rm LL21EF)

6:30pm PDT

Reception
Reception at the Tech Museum
201 S Market St, San Jose, CA 95113

Wednesday October 17, 2018 6:30pm - 10:00pm PDT
Tech Museum
 
Thursday, October 18
 

8:15am PDT

Registration & Breakfast
Registration opens and breakfast in the lobby

Thursday October 18, 2018 8:15am - 9:15am PDT
0 - Lobby Area

9:15am PDT

Glow: LLVM-based machine learning compiler
Glow is an LLVM-based machine learning compiler for heterogeneous hardware that's developed as part of the PyTorch project. It is a pragmatic approach to compilation that enables the generation of highly optimized code for CPUs, GPUs and accelerators. Glow lowers the traditional neural network data-flow graph into a two-phase strongly-typed intermediate representation (inspired by SIL). Finally, Glow emits LLVM IR and uses the LLVM code generator to generate highly optimized code. In this talk we'll describe the structure of machine learning programs and how Glow is designed to compile these graphs for multiple targets. We'll explain how we use the LLVM infrastructure and go over some of the techniques that we use to generate high-performance code using LLVM.

Speakers

Thursday October 18, 2018 9:15am - 10:00am PDT
1 - General Session (Rm LL20ABC)

10:00am PDT

GlobalISel Design & Development
GlobalISel is the planned successor to the SelectionDAG instruction selection framework, currently used by the AArch64 target by default at the -O0 optimization level, and with partial implementations in several other backends. It also has downstream users in the Apple GPU compiler. The long term goals for the project are to replace the block-based selection strategy of SelectionDAG with a more efficient framework that can also do function-wide analysis and optimization. The design of GlobalISel has evolved over its lifetime and continues to do so. The aim of this BoF is to bring together developers and other parties interested in the state and future progress of GlobalISel, and to discuss some issues that would benefit from community feedback. We will first give a short update on progress within the past year. Possible discussion topics include:

  • The current design of GISel’s pipeline, with particular focus on how well the architecture will scale as the focus on optimisation increases.
  • For new backends, how is the experience of bringing up a GlobalISel-based code generator? For existing backends, are there any impediments to continuing development?
  • Does using GlobalISel mean that double the work is necessary, with more maintenance costs? How can this be mitigated?
  • Are there additional features that developers would like to see in the framework? What SelectionDAG annoyances should we take particular care to avoid or improve upon?

Speakers

Amara Emerson

Apple
Analysis, optimization, loops, code generation, back-end, GlobalISel.


Thursday October 18, 2018 10:00am - 10:30am PDT
3 - BoF (Rm LL21CD)

10:00am PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):

  1. RESTRICT

Thursday October 18, 2018 10:00am - 10:30am PDT
4 - Round Tables (Rm LL21EF)

10:00am PDT

Graph Program Extraction and Device Partitioning in Swift for TensorFlow
Swift for TensorFlow (https://github.com/tensorflow/swift) is an open source project that provides a new way to develop machine learning models. It combines the usability/debuggability of imperative "define-by-run" programming models (like TensorFlow Eager and PyTorch) with the performance of TensorFlow session/XLA (graph compilation).

In this talk, we describe the design and implementation of deabstraction, Graph Program Extraction (GPE) and device partitioning used by Swift for TensorFlow. These algorithms rely on aggressive mid-level transformations that incorporate techniques including inlining, program slicing, interpretation, and advanced control flow analysis. While the initial application of these algorithms is to TensorFlow and machine learning, these algorithms may be applied to any domain that would benefit from an imperative definition of a computation graph, e.g. for high performance accelerators in other domains.


Thursday October 18, 2018 10:00am - 10:30am PDT
1 - General Session (Rm LL20ABC)

10:00am PDT

Working with Standalone Builds of LLVM sub-projects
There are two ways to build LLVM sub-projects: the first is to place the sub-project source code in the tools or projects directory of the LLVM tree and build everything together; the second is to build the sub-projects standalone against a pre-compiled build of LLVM.

This talk will focus on how to make standalone builds of sub-projects like clang, lld, compiler-rt, lldb, and libcxx work, and how this method can be used to help reduce build times for both developers and CI systems. In addition, we will look at the CMake helpers provided by LLVM, how they are used during standalone builds, and how you can use them to build your own LLVM-based project in a standalone fashion.
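As a rough sketch of the workflow the talk describes (directory names and the install prefix here are hypothetical, and exact CMake variables vary between LLVM releases), a standalone build of clang against a pre-installed LLVM might look like:

```shell
# 1. Configure, build, and install LLVM once (hypothetical paths).
mkdir build-llvm && cd build-llvm
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=$HOME/llvm-install
ninja install
cd ..

# 2. Build clang standalone against that install. find_package(LLVM) in
#    clang's CMake locates the helpers under lib/cmake/llvm, so developers
#    and CI systems can reuse the LLVM install instead of rebuilding it.
mkdir build-clang && cd build-clang
cmake -G Ninja ../clang -DLLVM_DIR=$HOME/llvm-install/lib/cmake/llvm
ninja
```

The same pattern applies to other sub-projects and to out-of-tree LLVM-based projects.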

Speakers

Thursday October 18, 2018 10:00am - 10:30am PDT
2 - Technical Talk (Rm LL21AB)

10:30am PDT

Break
Coffee Break

Thursday October 18, 2018 10:30am - 11:00am PDT
0 - Lobby Area

11:00am PDT

LLVM Foundation BoF
Come meet the new members of the LLVM Foundation board, get program updates and your questions answered.

Speakers

Chandler Carruth

Software Engineer, Google
Chandler Carruth is the technical lead for Google's programming languages and software foundations. He has worked extensively on the C++ programming language and the Clang and LLVM compiler infrastructure. Previously, he worked on several pieces of Google's distributed build system...

Mike Edwards

LLVM Foundation

Hal Finkel

Argonne National Laboratory

Anton Korobeynikov

Associate Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 1990034 St Petersburg, Russia

Tanya Lattner

President, LLVM Foundation

John Regehr

University of Utah


Thursday October 18, 2018 11:00am - 11:30am PDT
3 - BoF (Rm LL21CD)

11:00am PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  1. Scalable vectors in IR (ARM, NEC?, RISC-V?)

Thursday October 18, 2018 11:00am - 11:30am PDT
4 - Round Tables (Rm LL21EF)

11:00am PDT

Art Class for Dragons: Supporting GPU compilation without metadata hacks!
Modern programming languages targeting GPUs include features that are not commonly found in conventional programming languages, such as C and C++, and are, therefore, not natively representable in LLVM IR.

This limits the applicability of LLVM to target GPU hardware for both graphics and massively parallel compute applications. Moreover, the lack of a unified way to represent GPU-related features has led to different and mutually incompatible solutions across different vendors, thereby limiting interoperability of LLVM-based GPU transformation passes and tools.

Many features within the Vulkan graphics API and language [1] highlight the diversity of GPU hardware. For example, Vulkan allows different attributes on structures that specify different memory padding rules. Such semantic information is currently not natively representable in LLVM IR. Graphics programming models also make extensive use of special memory regions that are mapped as address spaces in LLVM. However, no semantic information is attributed to address spaces at the LLVM IR level and the correct behaviour and transformation rules have to be inferred from the address space within the compilation passes.

As some of these features have no direct representation in LLVM, various translators, e.g. the SPIR-V->LLVM translator [2], the Microsoft DXIL compiler [3], and AMD's open-source compiler for Vulkan [4], make use of side features of LLVM IR, such as metadata and intrinsics, to represent the semantic information that cannot be easily captured. This creates an extra burden on compilation passes targeting GPU hardware, as the semantic information has to be recreated from the metadata. Additionally, some translators such as the Microsoft DXIL compiler have forked the Clang and LLVM repositories and made proprietary changes to the IR in order to more easily support the required features natively. A more general approach would be to look at how upstream LLVM can be augmented to represent some, if not all, of the semantic information required for massively parallel SIMD, SPMD, and, in general, graphics applications.

This talk will look at the proprietary LLVM IR modifications made in translators such as the Khronos SPIRV-LLVM translator, AMD's open-source SPIR-V driver for Vulkan, the original Khronos SPIR specification [5], Microsoft's DXIL compiler, and Nvidia's NVVM specification [6]. The aim is to extract a common set of features present in modern graphics and compute languages for GPUs, describe how translators are currently representing these features in LLVM, and suggest ways of augmenting the LLVM IR to natively represent these features. The intention with this talk is to open up a dialogue among IR developers to look at how we can, if there is agreement, extend LLVM in a way that supports a more diverse set of hardware types.

[1] - https://www.khronos.org/registry/vulkan/
[2] - https://github.com/KhronosGroup/SPIRV-LLVM-Translator
[3] - https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst
[4] - https://github.com/GPUOpen-Drivers/AMDVLK
[5] - https://www.khronos.org/registry/SPIR/specs/spir_spec-2.0.pdf
[6] - https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf

Speakers

Thursday October 18, 2018 11:00am - 11:30am PDT
2 - Technical Talk (Rm LL21AB)

11:00am PDT

Understanding the performance of code using LLVM's Machine Code Analyzer (llvm-mca)
llvm-mca is an LLVM-based tool that uses the information available in LLVM’s scheduling models to statically measure the performance of machine code on a specific CPU. The goal of this tool is not just to predict the performance of the code when run on the target, but also to help with diagnosing potential performance issues. In this talk, we will discuss how llvm-mca works and walk the audience through example uses of the tool.
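For instance (assuming an LLVM 7+ toolchain with llvm-mca on the PATH; the assembly snippet and file name are illustrative), a basic invocation looks like:

```shell
# Statically analyze a short x86-64 assembly sequence for AMD Jaguar
# (-mcpu=btver2), reporting estimated throughput and resource pressure
# over 100 simulated iterations.
cat > dot-product.s <<'EOF'
vmulps  %xmm0, %xmm1, %xmm2
vhaddps %xmm2, %xmm2, %xmm3
vhaddps %xmm3, %xmm3, %xmm4
EOF
llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
         -iterations=100 dot-product.s
```

The report includes per-instruction latency/throughput estimates and a resource-pressure view derived from the scheduling model.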

Speakers

Andrea Di Biagio

Senior Compiler Engineer, Sony Interactive Entertainment

Matt Davis

Compiler Engineer, Sony Interactive Entertainment
(void *)0


Thursday October 18, 2018 11:00am - 11:30am PDT
1 - General Session (Rm LL20ABC)

11:30am PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  • Loop Optimization


Thursday October 18, 2018 11:30am - 12:00pm PDT
4 - Round Tables (Rm LL21EF)

11:30am PDT

Profile Guided Function Layout in LLVM and LLD
The layout of code in memory can have a large impact on the performance of an application. This talk will cover the reasons for this along with the design, implementation, and performance results of LLVM and LLD's new profile guided function layout pipeline. This pipeline leverages LLVM's profile guided optimization infrastructure and is based on the Call-Chain Clustering heuristic.

Speakers

Michael Spencer

Compiler Engineer, Apple


Thursday October 18, 2018 11:30am - 12:00pm PDT
1 - General Session (Rm LL20ABC)

11:30am PDT

How to use LLVM to optimize your parallel programs
As Moore's law comes to an end, chipmakers are increasingly relying on both heterogeneous and parallel architectures for performance gains. This has led to a diverse set of software tools and paradigms, such as CUDA, OpenMP, Cilk, and many others, to best exploit a program’s parallelism. Yet, while such tools provide ways to express parallelism, they come at a large cost to the programmer: they require in-depth knowledge of what to parallelize, how best to map the parallelism to the hardware, and how to rework the code to match the programming model chosen by the tool.

In this talk, we discuss how to use Tapir, a parallel extension to LLVM, to optimize parallel programs. We will show how one can use Tapir/LLVM to represent programs in attendees’ favorite parallel framework by extending clang, how to perform various optimizations on parallel code, and how to connect attendees’ parallel language to a variety of parallel backends for execution (PTX, OpenMP runtime, Cilk runtime).

Speakers

William Moses

PhD Candidate, MIT


Thursday October 18, 2018 11:30am - 12:30pm PDT
2 - Technical Talk (Rm LL21AB)

12:00pm PDT

Clang Static Analyzer BoF
This BoF will provide an opportunity for developers and users of the Clang Static Analyzer to discuss the present and future of the analyzer. We will start with a brief overview of analyzer features added by the community over the last year, including our Google Summer of Code projects on theorem prover integration and detection of deallocated inner C++ pointers. We will discuss possible focus areas for the next year, including laying the foundations for analysis that crosses the boundaries of translation units. We would also like to brainstorm and gather community feedback on potential dataflow-based checks, ask for community help to improve analyzer C++17 support, and discuss the challenges and opportunities of C++20 support, including contracts.

Speakers

Thursday October 18, 2018 12:00pm - 12:30pm PDT
3 - BoF (Rm LL21CD)

12:00pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  1. gn build

Thursday October 18, 2018 12:00pm - 12:30pm PDT
4 - Round Tables (Rm LL21EF)

12:00pm PDT

Optimizing Indirections, using abstractions without remorse.
Indirections, either through memory, library interfaces, or function pointers, can easily induce major performance penalties as the current optimization pipeline is not able to look through them. The available inter-procedural-optimizations (IPO) are generally not well suited to deal with these issues as they require all code to be available and analyzable through techniques based on tracking value dependencies. Importantly, the use of class/struct objects and (parallel) runtime libraries commonly introduce indirections that prohibit basically all optimizations. In this talk, we introduce these problems with real-world examples and show how new analyses can mitigate them. We especially focus on:

- A field-sensitive, inter-procedural memory analysis that models simple communication through memory.
- The integration of indirect, potentially hidden, call sites, e.g., in libraries like the OpenMP runtime library, into existing analyses and optimizations (function attribute detection, argument promotion, …).
- Automatic and portable (non-LLVM-specific) information transfer from library implementations through their interfaces to the user sites.

While our work originates in the optimization of parallel code, we want to show how the problems we encountered there are similar to existing ones in sequential programs. To this end, we try to augment the available analyses and optimizations rather than introduce new ones that are specific to parallel programs. As a consequence, we not only expect positive results for parallel code regions [1], but also hope to improve generic code that employs indirections or simply exhibits optimization opportunities similar to the ones that commonly arise for parallel programs.

The goal of this talk is to introduce possible solutions to several problems that commonly prevent optimization of code featuring indirections. As we want to introduce these solutions into the LLVM codebase, we hope to start a discussion on these issues as well as the caveats that we encountered while resolving them.

[1] https://www.youtube.com/watch?v=u2Soj49R-i4

Speakers

Johannes Doerfert

Argonne National Laboratory


Thursday October 18, 2018 12:00pm - 12:30pm PDT
1 - General Session (Rm LL20ABC)

2:00pm PDT

Lessons Learned Implementing Common Lisp with LLVM over Six Years.
I will present the lessons learned while using LLVM to efficiently implement a complex, memory-managed, dynamic programming language within which everything can be redefined on the fly. I will present Clasp, a new Common Lisp compiler and programming environment that uses LLVM as its back-end and that interoperates smoothly with C++/C. Clasp is written in both C++ and Common Lisp. The Clasp compiler is written in Common Lisp and makes extensive use of the LLVM C++ API and the ORC JIT to generate native code, both ahead of time and just in time. Among its unique features, Clasp uses a compacting garbage collector to manage memory, incorporates multithreading, uses C++-compatible exception handling to achieve stack unwinding, and incorporates an advanced compiler written in Common Lisp to achieve performance that approaches that of C++. Clasp is being developed as a high-performance scientific and general-purpose programming language that makes use of available C++ libraries.

Speakers

Christian Schafmeister

Temple University
I'm a professor of Chemistry developing large molecules to solve real-world problems. We have developed a software environment (Cando) for designing these molecules that implements a Common Lisp compiler, incorporates many C++ libraries, interoperates with C++ and uses LLVM as the...


Thursday October 18, 2018 2:00pm - 2:30pm PDT
1 - General Session (Rm LL20ABC)

2:00pm PDT

Loop Transformations in LLVM: The Good, the Bad, and the Ugly
Should loop transformations be done by the compiler, by a library (such as Kokkos, RAJA, Halide), or be the subject of (domain-specific) programming languages such as CUDA, LIFT, etc.? Such optimizations can take place on more than one level, and the decision for the compiler level has already been made in LLVM: we already support a small zoo of transformations: loop unrolling, unroll-and-jam, distribution, vectorization, interchange, unswitching, idiom recognition, and polyhedral optimization using Polly. Once it is clear that we want loop optimizations in the compiler, why not make them as good as possible?

Today, with the exception of some shared code and analyses related to vectorization, LLVM loop passes don't know about each other. This makes cooperation between them difficult, and that includes difficulty in heuristically determining whether some combination of transformations is likely to be profitable. With user-directed transformations such as #pragma omp parallel for or #pragma clang loop vectorize(enable), the only order in which these transformations can be applied is the order of the passes in the pipeline.

In this talk, we will explore what already works well (e.g. vectorization of inner loops), things that do not work as well (e.g. loop passes destroying each other's structures), things that become ugly with the current design if we want to support more loop passes (e.g. exponential code blowup due to each pass doing its own loop versioning), and possible solutions.

Speakers

Michael Kruse

Argonne National Lab


Thursday October 18, 2018 2:00pm - 2:30pm PDT
2 - Technical Talk (Rm LL21AB)

2:00pm PDT

Coding Lab for RISC-V Tutorial
Coding lab to complement the RISC-V tutorial.

Speakers

Alex Bradbury

Co-founder and Director, lowRISC CIC


Thursday October 18, 2018 2:00pm - 3:30pm PDT
4 - Round Tables (Rm LL21EF)

2:30pm PDT

Migrating to C++14, and beyond!
C++11 was a huge step forward for C++. C++14 is a much smaller step, yet still brings plenty of great features. C++17 will be equally small but nice. The LLVM community should discuss how we want to handle migration to newer versions of C++: how do we handle compiler requirements, notify developers early enough, manage ABI changes in toolchains, and do the actual migration itself. Let’s get together and hash this out!

Speakers

Thursday October 18, 2018 2:30pm - 3:00pm PDT
3 - BoF (Rm LL21CD)

2:30pm PDT

Implementing an OpenCL compiler for CPU in LLVM
Compiling a heterogeneous language for a CPU in an optimal way is a challenge: OpenCL C/SPIR-V specifics require additions and modifications to the old-fashioned driver approach and compilation flow. Coupled with aggressive just-in-time code optimizations, interfacing with the OpenCL runtime, the standard OpenCL C function library, etc., an implementation of OpenCL for CPU comprises a complex structure. We’ll cover Intel’s approach in the hope of revealing common patterns and design solutions, and discover possible opportunities to share and collaborate with other OpenCL CPU vendors under the LLVM umbrella! This talk will describe the compilation of OpenCL C source code down to machine instructions and the interaction with the OpenCL runtime, illustrate the different paths that compilation may take for different modes (the classic online/OpenCL 2.1 SPIR-V path vs. OpenCL 1.2/2.0 with device-side enqueue and generic address space), put particular emphasis on the resolution of CPU-unfriendly OpenCL aspects (barriers, address spaces, images) in the optimization flow, and explain why the OpenCL compiler frontend can easily handle various target devices (GPU/CPU/FPGA/DSP, etc.) and how it all neatly revolves around LLVM/clang & tools.

Speakers

Thursday October 18, 2018 2:30pm - 3:00pm PDT
1 - General Session (Rm LL20ABC)

2:30pm PDT

Methods for Maintaining OpenMP Semantics without Being Overly Conservative
The SSA-based LLVM IR provides an elegant representation for compiler analyses and transformations. However, it presents challenges for OpenMP code generation in the LLVM backend, especially when the input program is compiled under different optimization levels. This talk presents a practical and effective framework for performing OpenMP code generation based on the LLVM IR. We propose a canonical OpenMP loop representation under different optimization levels to preserve the OpenMP loop structure without it being affected by compiler optimizations. A code-motion guard intrinsic is proposed to prevent code motion across OpenMP regions. In addition, a utility based on the LLVM SSA updater is presented to perform the SSA update during the transformation. Lastly, scoped alias information is used to preserve the alias relationships for backend-outlined functions. This framework has been implemented in Intel’s LLVM compiler.

Speakers

Jin Lin

Intel


Thursday October 18, 2018 2:30pm - 3:00pm PDT
2 - Technical Talk (Rm LL21AB)

3:00pm PDT

Implementing the parallel STL in libc++
LLVM 7.0 has almost complete support for C++17, but libc++ is missing a major component: the parallel algorithms. Let's meet to discuss the options for how to implement support.

Speakers

Thursday October 18, 2018 3:00pm - 3:30pm PDT
3 - BoF (Rm LL21CD)

3:00pm PDT

Developer Toolchain for the Nintendo Switch
Nintendo Switch was developed using Clang/LLVM for the developer tools and C++ libraries. We describe how we converted from using almost exclusively proprietary tools and libraries to open tools and libraries. We’ll also describe our process for maintaining our out-of-tree toolchain and what we’d like to improve.

We started with Clang, binutils, and LLVM C++ libraries (libc++, libc++abi) and other open libraries. We will also describe our progress in transitioning to LLD and other LLVM binutils equivalents. Additionally, we will share some of our performance results using LLD and LTO.

Finally, we’ll discuss some of the areas that are important to our developers moving forward.

Speakers

Bob Campbell

Principal Engineer, Nintendo Technology Development


Thursday October 18, 2018 3:00pm - 3:30pm PDT
1 - General Session (Rm LL20ABC)

3:00pm PDT

Revisiting Loop Fusion, and its place in the loop transformation framework.
Despite several efforts [1-3], loop fusion is one of the classical loop optimizations still missing in LLVM. As we are currently working to remedy this situation, we want to share our experience in designing, implementing, and tuning a new loop transformation pass. While we want to explain how loop fusion can be implemented using the set of existing analyses, we also plan to talk about the current loop transformation framework and extensions thereof. We currently plan to include:

- The interplay between different existing loop transformations.
- A comparison to the IBM/XL loop optimization pipeline.
- Source-level guidance of loop transformations.
- Shortcomings of the current infrastructure, especially loop-centric dependence analyses.
- Interaction with polyhedral-model-backed dependence information.

The (default) loop optimizations performed by LLVM are currently lacking transformations and tuning. One reason is the absence of a dedicated framework that provides the necessary analyses information and heuristics. With the introduction of loop fusion we want to explore how different transformations could be used together and what a uniform dependence analysis for multiple loops could look like. The latter is explored with regards to a Scalar Evolution (or SCEV) based dependence analysis, like the current intra-loop access analysis, and a polyhedral-model-based alternative, e.g., via LLVM/Polly or the Polyhedral Value/Memory Analysis [4].

As our work is still ongoing, we cannot provide evaluation results at this point. However, earlier efforts [3], that did not make it into LLVM, already showed significant improvements which we expect to replicate. We anticipate having preliminary performance results available to present at the conference.

Note that the goal of this talk is not necessarily to provide final answers to the above described problems, but instead we want to start a discussion and bring interested parties together.

[1] https://reviews.llvm.org/D7008
[2] https://reviews.llvm.org/D17386
[3] https://llvm.org/devmtg/2015-04/slides/LLVMEuro2015LoopFusionAmidComplexControlFlow.pdf
[4] https://www.youtube.com/watch?v=xSA0XLYJ-G0

Speakers

Kit Barton

Technical lead for LLVM on Power and XL Compilers, IBM Canada


Thursday October 18, 2018 3:00pm - 3:30pm PDT
2 - Technical Talk (Rm LL21AB)

3:30pm PDT

Poster Session
Gaining fine-grain control over pass management
Serge Guelton, Adrien Guinet, Pierrick Brunet, Juan Manuel Martinez, Béatrice Creusillet

Integration of OpenMP, libcxx and libcxxabi packages into LLVM toolchain
Reshabh Sharma 

Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
Ananthakrishna Sowda, Djordje Todorovic, Nikola Prica, Ivan Baev

Automatic Compression for LLVM RISC-V
Sameer AbuAsal, Ana Pazos

Guaranteeing the Correctness of LLVM RISC-V Machine Code with Fuzzing
Jocelyn Wei, Ana Pazos, Mandeep Singh Grang 

NEC SX-Aurora - A Scalable Vector Architecture
Kazuhisa Ishizaka, Kazushi Marukawa, Erich Focht, Simon Moll, Matthias Kurtenacker, Sebastian Hack 

Extending Clang Static Analyzer to enable Cross Translation Unit Analysis
Varun Subramanian

Refuting False Bugs in the Clang Static Analyzer using SMT Solvers
Mikhail R. Gadelha

libcu++: Porting LLVM's C++ Standard Library to CUDA
Bryce Lelbach

Repurposing GCC Regression for LLVM Based Tool Chains 
Jeremy Bennett, Simon Cook, Ed Jones

Memory Tagging, how it improves C++ memory safety, and what does it mean for compiler optimizations
Kostya Serebryany, Evgenii Stepanov, Vlad Tsyrklevich

goSLP: Globally Optimized Superword Level Parallelism Framework
Charith Mendis


Thursday October 18, 2018 3:30pm - 4:30pm PDT
0 - Lobby Area

4:30pm PDT

Ideal versus Reality: Optimal Parallelism and Offloading Support in LLVM

Explicit parallelism and offloading support is an important and growing part of LLVM’s eco-system for CPUs, GPUs, FPGAs and accelerators. LLVM's optimizer has not traditionally been involved in explicit parallelism and offloading support; specifically, the outlining logic and lowering translation into runtime-library calls resides in Clang. While there are several reasons why the optimizer must be involved in parallelization in order to suitably handle a wide set of applications, the design of an appropriate parallel IR for LLVM remains unsettled. Several groups (ANL, Intel, MIT, UIUC) have been experimenting with implementation techniques that push this transformation process into LLVM's IR-level optimization passes [1, 2, 3, 4, 5]. These efforts all aim to allow the optimizer to leverage language properties and optimize parallel constructs before they're transformed into runtime calls and outlined functions. Over the past couple of years, these groups have implemented out-of-tree extensions to LLVM IR to represent and optimize parallelism, and these designs have been influenced by community RFCs [6] and discussions on this topic. In this BoF, we will discuss the use cases we'd like to address and several of the open design questions, including:

* Is a canonical loop form necessary for parallelization and vectorization?
* What are the SSA update requirements for extensive loop parallelization and transformation?
* What are the required changes to, and impact on, existing LLVM optimization passes and analyses? E.g., inlining and aliasing-information propagation.
* How do we represent and leverage language properties of parallel constructs in LLVM IR?
* Where is the proper place in the pipeline to lower these constructs?

The purpose of this BoF is to bring together all parties interested in optimizing parallelism and offloading support in LLVM, as well as the experts in the parts of the compiler that will need to be modified. Our goal is to discuss the gap between ideal and reality and identify the pieces that are still missing. In the best case, we expect interested parties to agree on the next steps towards better parallelism support in Clang and LLVM.



Speakers

Johannes Doerfert

Argonne National Laboratory

Hal Finkel

Argonne National Laboratory

Xinmin Tian

Intel Corp.


Thursday October 18, 2018 4:30pm - 5:00pm PDT
3 - BoF (Rm LL21CD)

4:30pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  1. llvm-objcopy

Thursday October 18, 2018 4:30pm - 5:00pm PDT
4 - Round Tables (Rm LL21EF)

4:30pm PDT

Updating ORC JIT for Concurrency
LLVM’s ORC JIT APIs have undergone a major redesign over the last year to support compilation of multithreaded code and concurrent compilation within the JIT itself. Internally, ORC’s symbol resolution scheme has been replaced with a system that provides transactional, batch symbol queries. This new scheme both exposes opportunities for parallel compilation within the JIT, and provides a basis for synchronizing interdependent JIT tasks when they reach the JIT linker stage. Alongside this query system, a new “Responsibility” API is introduced to track compilation tasks and enforce graceful termination of the JIT (and JIT’d code) in the event of unrecoverable IPC/RPC failures or other errors. In this talk we will describe the new design, how the API has changed, and the implementation details of the new symbol resolution and responsibility schemes. We will also talk about new developments in the ORC C APIs, and discuss future directions for LLVM’s JIT APIs.

Speakers

Lang Hames

Apple Inc.


Thursday October 18, 2018 4:30pm - 5:30pm PDT
1 - General Session (Rm LL20ABC)

5:00pm PDT

Should we go beyond `#pragma omp declare simd`?

BoF for the people involved in the development of the interface between the compiler and the vector routines provided by a library (including the C99 math functions) or via user code. The discussion is ongoing on the mailing list: [1] http://lists.llvm.org/pipermail/llvm-dev/2018-July/124520.html
Problem statement: "How should the compiler know which vector functions are available in a library or in a module when auto-vectorizing scalar calls, when standards like OpenMP and Vector Function ABIs cannot provide a 1:1 mapping from the scalar functions to the vector ones?"
Practical example of the problem:
1. Library L provides a vector `sin` for target T operating on four lanes, but in two versions: a _slow_ vector `sin` that guarantees high precision, and a _fast_ version with relaxed precision requirements. How should the compiler allow the user to choose between the two?
2. A user can write serial code and rely on the auto-vectorization capabilities of the compiler to generate vector functions using the `#pragma omp declare simd` directive of OpenMP. Sometimes the compiler doesn't do a good job of vectorizing such functions, because not all the micro-architectural capabilities of a vector extension can be exposed to the vectorizer pass. This situation often forces a user to write target-specific vector loops that invoke a target-specific implementation of the vector function, mostly via preprocessor directives that reduce the maintainability and portability of the code. How can we help clang users avoid such situations? Could they rely on the compiler to pick the correct version of the vector function, without modifying the original code other than adding the source of the hand-optimized version of the vector function?
Proposed schedule:
1. Enunciate the problem that we are trying to solve.
2. List the proposed solutions.
3. Discuss pros and cons of each of them.
4. Come up with a common plan that we can implement in clang/LLVM.

Speakers

Thursday October 18, 2018 5:00pm - 5:30pm PDT
3 - BoF (Rm LL21CD)

5:00pm PDT

Round Tables
Round table discussions. These are informal discussions among a group of people on a specific topic.

Topics to be discussed during this time slot (but check the flipchart outside the room for most current list):
  1. LLVM session at ISC-HPC

Thursday October 18, 2018 5:00pm - 5:30pm PDT
4 - Round Tables (Rm LL21EF)

5:30pm PDT

Closing
Closing

Speakers

Tanya Lattner

President, LLVM Foundation


Thursday October 18, 2018 5:30pm - 5:45pm PDT
1 - General Session (Rm LL20ABC)
 