Fabien Gaud
Since November 2017, I have been a software development engineer at Amazon AWS, working on S3.
From May 2014 to August 2017, I was a senior software engineer at Coho Data. My main focus was improving the performance and scalability of a high-performance distributed NFS server.
From October 2011 to May 2014, I was a post-doctoral fellow at Simon Fraser University under the supervision of Dr. Alexandra Fedorova.
Previously, I was a PhD student from 2007 to 2010 at Grenoble University. My advisors were Renaud Lachaize, Vivien Quéma and Jean-Bernard Stefani. I was involved in Sardes, a joint team of INRIA and LIG.
From January 2011 to September 2011, I stayed in the Sardes team as a post-doctoral fellow.
Improving the performance of a high-performance distributed NFS server
Coho Data distributed a high-performance NFS appliance. Two different architectures were supported: a hybrid architecture (SSD + spinning disks) and an all-flash array (NVMe flash + SSD drives).
My main role was to work on the performance of the NFS server. In particular, I worked on data replication and striping, wrote several performance analysis tools, implemented a container for small files that reduced metadata operation latency by 50% and rebalancing duration from hours to minutes, designed a new cluster membership and distributed locking mechanism, redesigned the snapshot mechanism, and implemented data deduplication.
Improving large page support on NUMA multicore architectures
Large pages are commonly used in large workloads to reduce the impact of TLB misses on application performance. However, with modern NUMA multicore systems, some care must be taken since increasing page size can generate undesirable NUMA effects that strongly degrade performance. We identified two different types of problems that can occur when enabling large pages on NUMA systems: hot pages and page-level false sharing. We designed a new large page management algorithm that detects and fixes these NUMA effects with large pages while keeping, whenever possible, the positive impact of large page usage on performance.
Code is available here.
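The two problems named above can be illustrated with a minimal sketch. This is not the published algorithm: the function names, the per-node sample counters and the "fair share" threshold are all assumptions made for illustration. The sketch treats a large page as falsely shared when several nodes touch it but each of its small pages is touched by a single node, and as hot when it alone receives more than one node's fair share of all sampled accesses.

```python
from typing import Dict, List


def is_falsely_shared(subpage_accesses: List[Dict[int, int]]) -> bool:
    """Hypothetical check for page-level false sharing.

    subpage_accesses[i] maps a NUMA node id to the number of sampled
    accesses that node made to the i-th small page of one large page.
    """
    owners = set()
    for per_node in subpage_accesses:
        nodes = {node for node, count in per_node.items() if count > 0}
        if len(nodes) > 1:
            # At least one small page is genuinely shared between nodes.
            return False
        owners |= nodes
    # Several nodes use the large page, but never the same small page.
    return len(owners) > 1


def is_hot(page_accesses: int, total_accesses: int, num_nodes: int) -> bool:
    """Hypothetical rule: hot if the page exceeds one node's fair share."""
    return total_accesses > 0 and page_accesses * num_nodes > total_accesses
```

Under these assumptions, a falsely shared large page is a candidate for being split so that each small page can be placed next to the node that uses it, while a hot page points at a load imbalance that placement alone may not remove.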
Managing resource contention on modern multicore architectures
In modern multicore architectures, cores share several resources such as caches, interconnects and memory controllers. If not taken into account properly, this sharing can significantly hurt application performance.
My main project was to produce a new memory management algorithm to mitigate memory controller and interconnect congestion on NUMA multicore machines. While previous work considered data locality as the most important optimization, we identified that reducing memory controller and interconnect congestion was actually more important. In order to reduce congestion, we leveraged three different techniques (migrating memory, evenly distributing memory on memory nodes, replicating memory) that aim at balancing the load on memory controllers, while keeping data as local as possible. Our experiments showed that this strategy produces large performance improvements.
The code of our novel memory management algorithm, Carrefour, is available here.
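As a rough illustration of how the three techniques mentioned above (migration, even distribution, replication) could be chosen per page, here is a minimal sketch. It is not the Carrefour implementation: the `PageStats` structure, the sampled counters and the decision rules are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PageStats:
    """Hypothetical per-page counters, e.g. built from hardware samples."""
    home_node: int                                         # node holding the page
    reads: Dict[int, int] = field(default_factory=dict)    # node -> sampled reads
    writes: Dict[int, int] = field(default_factory=dict)   # node -> sampled writes


def choose_action(page: PageStats) -> str:
    """Pick one of the three techniques for a page (illustrative rules)."""
    accessing = {n for n, c in page.reads.items() if c > 0}
    accessing |= {n for n, c in page.writes.items() if c > 0}
    if not accessing:
        return "keep"
    if len(accessing) == 1:
        (node,) = accessing
        # A single user: keep the data local to that node.
        return "keep" if node == page.home_node else f"migrate:{node}"
    if sum(page.writes.values()) == 0:
        # Read-only sharing: every node can get its own copy.
        return "replicate"
    # Read-write sharing: spread such pages evenly over the memory nodes.
    return "interleave"
```

For example, a page homed on node 0 and only read from node 1 yields `migrate:1`, while a page read from nodes 0 and 1 and never written yields `replicate`.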
Scaling the Apache Web server on NUMA multicore architectures
Apache is the most widely used Web server. It is thus important that this server scales well on now-mainstream NUMA multicore systems. Using the well-known SPECweb 2005 Support workload, we found that this was not the case on a 4-node NUMA system. We proposed three different optimizations to handle both software-related and hardware-related issues. Using these optimizations, we were able to reach ideal performance scalability on a 4-node (16-core) server.
Improving support for event-driven applications on multicore architectures
On single-core architectures, event-driven programming was a popular way to produce robust servers. Libasync-mp was the only available runtime that allowed event-driven applications to take advantage of multicore architectures with minimal effort from the programmer. Like many runtimes, Libasync-mp relies on a work stealing algorithm to balance load across cores. We identified several points in this algorithm that limit the performance of applications. In particular, we showed that exploiting the specific characteristics of event-driven applications when stealing work, and carefully designing the runtime to reduce the overhead of this mechanism, provide large performance improvements on data servers.
We produced a new runtime (Mely) backward-compatible with Libasync-mp. This runtime is available here.
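To make the idea of work stealing in an event-driven runtime more concrete, here is a minimal sketch. It is not Mely's code. It assumes that each event carries a "color" tag, that events of the same color must not run concurrently, and that an idle core therefore steals all pending events of one color at once. The `Core` class, the `steal_color` function and the victim selection rule are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (color, payload); illustrative representation


class Core:
    def __init__(self, core_id: int) -> None:
        self.core_id = core_id
        self.queue: Deque[Event] = deque()  # pending events, oldest first


def steal_color(thief: Core, cores: List[Core]) -> Optional[int]:
    """Move every event of one color from the most loaded core to the thief.

    Returns the stolen color, or None if nothing could be stolen.
    """
    victims = [c for c in cores if c is not thief and c.queue]
    if not victims:
        return None
    victim = max(victims, key=lambda c: len(c.queue))
    # Leave alone the color the victim is about to process.
    next_color = victim.queue[0][0]
    counts: Dict[int, int] = {}
    for color, _ in victim.queue:
        if color != next_color:
            counts[color] = counts.get(color, 0) + 1
    if not counts:
        return None
    # Steal the color with the most pending events to amortize the steal cost.
    stolen = max(counts, key=lambda color: counts[color])
    kept: Deque[Event] = deque()
    for event in victim.queue:
        (thief.queue if event[0] == stolen else kept).append(event)
    victim.queue = kept
    return stolen
```

Stealing a whole color in one operation keeps same-colored events on a single core and moves several events per steal, which is one plausible way to reduce the overhead of the stealing mechanism. A real runtime would also need per-queue locking, which this single-threaded sketch omits.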
Publications
The Linux Scheduler: A Decade of Wasted Cores.
Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Alexandra Fedorova and Vivien Quéma. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys), London, United Kingdom, April 2016
[pdf]
[slides]
Challenges of memory management on modern NUMA systems.
Fabien Gaud, Baptiste Lepers, Justin Funston, Mohammad Dashti, Alexandra Fedorova, Vivien Quéma, Renaud Lachaize, and Mark Roth. Communications of the ACM 58, 12, pp. 59-66, December 2015.
[pdf]
Large Pages May be Harmful on NUMA Systems.
Fabien Gaud, Baptiste Lepers, Jeremie Decouchant, Justin Funston, Alexandra Fedorova and Vivien Quéma. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), Philadelphia, USA, June 2014.
[pdf]
[slides]
Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems.
Mohammad Dashti, Alexandra Fedorova, Justin Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quéma, and Mark Roth. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, USA, March 2013.
[pdf]
[slides]
A Practical Method for Estimating Performance Degradation on Multicore Processors and its Application to HPC Workloads.
Tyler Dwyer, Alexandra Fedorova, Sergey Blagodurov, Mark Roth, Fabien Gaud and Jian Pei. Supercomputing Conference (SC), 2012
[pdf]
Efficient Workstealing for Multicore Event-Driven Systems.
Fabien Gaud, Sylvain Genevès, Renaud Lachaize, Baptiste Lepers, Fabien Mottet, Gilles Muller, and Vivien Quéma. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), pages 516-525, Genoa, Italy, June 2010.
[pdf]
[slides]
National Conferences (French)
Optimisations applicatives pour multi-cœurs NUMA : un cas d’étude avec le serveur web Apache.
Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Gilles Muller, Vivien Quéma. In Proceedings of Conférence Française sur les Systèmes d’Exploitation (CFSE'8), Saint-Malo, France, May 2011.
[pdf]
[slides]
Vol de tâches efficace pour systèmes événementiels multi-coeurs.
Fabien Gaud, Sylvain Genevès, Fabien Mottet. In Proceedings of Conférence Française en Systèmes d’Exploitation (CFSE’7), Toulouse, September 2009.
[pdf]
[slides]
Technical reports
Application-Level Optimizations on NUMA Multicore Architectures: the Apache Case Study
Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Gilles Muller, and Vivien Quéma. Research report RR-LIG-011, LIG, Grenoble, France, March 2011.
[pdf]
Thesis (French)
Étude et amélioration de la performance des serveurs de données pour les architectures multi-cœurs.
Fabien Gaud. PhD Thesis, Grenoble University, France, 2010.
[pdf]
[slides (English)]
[tel]
Gestion autonome de flots d'exécution événementiels.
Fabien Gaud. Master's thesis, Université Joseph Fourier, Grenoble, France, 2007.
[pdf]
[slides]
From 2007 to 2010, I was a teaching assistant at Université Joseph Fourier. I taught at both UFR-IMA and RICM, a department of the Polytech'Grenoble school.
2009-2010
Instructor
Middlewares and Databases, instructor of the Middleware part
Slides: Introduction, RMI, Servlets, Multi-tier architectures
Teaching assistant
Ecom, a J2EE e-commerce application
Introduction to computer networks
2008-2009
Instructor
Middlewares and Databases, instructor of the Middleware part
Slides: Introduction, Sockets, RMI, Servlets, JSP, Multi-tier architectures
Teaching assistant
Ecom, a J2EE e-commerce application
Introduction to computer networks
2007-2008
Teaching assistant
Ecom, a J2EE e-commerce application
CORBA project, developing a distributed directory using CORBA
BGP, experimentation around the BGP protocol
End-of-year project
Introduction to computer science