Papers

IPDPSW24

ZSMILES: An Approach for Efficient SMILES Storage for Random Access in Virtual Screening

Virtual screening is a technique used in drug discovery to select the most promising molecules to test in a lab. To perform virtual screening, we need a large set of molecules as input, and storing these molecules can become an issue. In fact, extreme-scale high-throughput virtual screening applications require a big dataset of input molecules and produce an even bigger dataset as output. These molecular databases occupy tens of TB of storage space, and domain experts frequently sample a small portion of this data. In this context, SMILES is a popular data format for storing large sets of molecules since it requires significantly less space to represent molecules than other formats (e.g., MOL2, SDF). This paper proposes an efficient dictionary-based approach to compress SMILES-based datasets. This approach takes advantage of domain knowledge to provide a readable output with separable SMILES, enabling random access. We examine the benefits of storing these datasets using ZSMILES to reduce the cold storage footprint in HPC systems. The main contributions concern a custom dictionary-based approach and a data preprocessing step. Experimental results show that ZSMILES leverages domain knowledge to compress 1.13× more than the state of the art in similar scenarios, with up to a 0.29 compression ratio. We tested a CUDA version of ZSMILES targeting NVIDIA GPUs, showing a potential speedup of 7×.
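
The idea of a dictionary-based, line-separable encoding can be illustrated with a minimal sketch. This is not the ZSMILES algorithm: the three-entry dictionary, the replacement symbols, and the greedy longest-match strategy are all assumptions chosen for illustration, and the ratio is assumed to mean compressed size divided by original size.

```python
# Toy dictionary (hypothetical): frequent SMILES substrings mapped to
# printable symbols that plain SMILES strings do not normally use.
TOY_DICT = {
    "c1ccccc1": "{",   # benzene ring
    "C(=O)O": "}",     # carboxylic acid / ester fragment
    "C(=O)": "_",      # carbonyl fragment
}


def compress(smiles, table=TOY_DICT):
    """Greedy longest-match replacement of dictionary patterns."""
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(smiles):
        for pattern in patterns:
            if smiles.startswith(pattern, i):
                out.append(table[pattern])
                i += len(pattern)
                break
        else:
            # No pattern matches here: copy the character unchanged.
            out.append(smiles[i])
            i += 1
    return "".join(out)


def decompress(encoded, table=TOY_DICT):
    """Expand every dictionary symbol back to its pattern."""
    inverse = {symbol: pattern for pattern, symbol in table.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)


def compression_ratio(original, encoded):
    """Compressed size over original size (lower is better)."""
    return len(encoded) / len(original)


# Each molecule is compressed on its own line, so any single line can be
# decompressed without reading the rest of the file (random access).
molecules = ["CC(=O)Oc1ccccc1C(=O)O", "OC(=O)c1ccccc1", "CCO"]
encoded_lines = [compress(m) for m in molecules]
second = decompress(encoded_lines[1])
```

Because every line is encoded independently and stays printable, a reader can seek to one record and decode only that record, which is the property the abstract refers to as separable SMILES.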

CSBJ24

Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities

Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, the identification of the near-native binding pose in docking experiments still represents a challenging task, as the scoring functions currently employed by docking programs are parametrized to predict the binding affinity and, therefore, often fail to correctly identify the ligand's native binding conformation. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performance of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.

FGCS24

Enabling performance portability on the LiGen drug discovery pipeline

In recent years, there has been a growing interest in developing high-performance implementations of drug discovery processing software. To target modern GPU architectures, such applications are mostly written in proprietary languages such as CUDA or HIP. However, with the increasing heterogeneity of modern HPC systems and the availability of accelerators from multiple hardware vendors, it has become critical to be able to efficiently execute drug discovery pipelines on multiple large-scale computing systems, with the ultimate goal of working on urgent computing scenarios. This article presents the challenges of migrating LiGen, an industrial drug discovery software pipeline, from CUDA to the SYCL programming model, an industry standard based on C++ that enables heterogeneous computing. We perform a structured analysis of the performance portability of the SYCL LiGen platform, focusing on different aspects of the approach from different perspectives. First, we analyze the performance portability provided by the high-level semantics of SYCL, including the most recent group algorithms and subgroups of SYCL 2020. Second, we analyze how low-level aspects such as kernel occupancy and register pressure affect the performance portability of the overall application. The experimental evaluation is performed on two different versions of LiGen, implementing two different parallelization patterns, by comparing them with a manually optimized CUDA version, and by evaluating performance portability using both known and ad hoc metrics. The results show that, thanks to the combination of high-level SYCL semantics and some manual tuning, LiGen achieves native-comparable performance on NVIDIA, while also running on AMD GPUs.

IWOCL24

Unlocking performance portability on LUMI-G supercomputer: A virtual screening case study

High-performance computing (HPC) systems are the target platform for virtual screening applications, which aim to suggest which candidates to test in the drug discovery process. The heterogeneity of modern HPC systems raises the challenge of functional and performance portability. LiGen is a well-known virtual screening application that can offload the most demanding computation to GPUs. It has been used to perform extreme-scale virtual screening campaigns on HPC systems equipped with NVIDIA cards using a CUDA implementation. This paper reports the experience of running its SYCL implementation on the LUMI-G HPC system, which leverages AMD GPUs. Based on the experimental results, the LiGen SYCL implementation performs well on AMD GPUs, enabling LiGen to run a virtual screening campaign on the LUMI-G HPC infrastructure.

SC24

Out of kernel tuning and optimizations for portable large-scale docking experiments on GPUs

Virtual screening is an early stage in the drug discovery process that selects the most promising candidates. In the urgent computing scenario, finding a solution in the shortest time frame is critical. Any improvement in the performance of a virtual screening application translates into an increase in the number of candidates evaluated, thereby raising the probability of finding a drug. In this paper, we show how we can improve application throughput using out-of-kernel optimizations. They use input features, kernel requirements, and architectural features to rearrange the kernel inputs, executing them out of order, to improve the computation efficiency. These optimizations are implemented in an extreme-scale virtual screening application, named LiGen, that can rely on CUDA and SYCL kernels to carry out the computation on modern supercomputer nodes. Even if they are tailored to a single application, they might also be of interest for applications that share a similar design pattern. The experimental results show how these optimizations can increase kernel performance by about 2×: up to 2.2× in CUDA and up to 1.9× in SYCL. Moreover, the reported speedup can be achieved with the best-proposed parameterization, as shown by the data we collected and reported in this manuscript.
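
Rearranging kernel inputs before launch can be pictured with a small host-side sketch. The abstract does not say which input features are used, so the choice of atom count as the feature, the bucket width, and the function name `bucketed_batches` are all hypothetical. The sketch groups ligands of similar size so that each batch handed to a kernel is roughly homogeneous, which means batches leave in a different order than the inputs arrived.

```python
from collections import defaultdict


def bucketed_batches(ligands, batch_size, bucket_width):
    """Group (name, num_atoms) pairs into batches of similar-sized ligands.

    A batch is emitted as soon as its size bucket is full, so the output
    order generally differs from the input order.
    """
    buckets = defaultdict(list)
    batches = []
    for name, num_atoms in ligands:
        key = num_atoms // bucket_width  # ligands of similar size share a key
        buckets[key].append((name, num_atoms))
        if len(buckets[key]) == batch_size:
            batches.append(buckets.pop(key))
    # Flush the buckets that never filled up, smallest ligands first.
    for key in sorted(buckets):
        batches.append(buckets[key])
    return batches


# Placeholder inputs (not real data): ligand name and atom count.
ligands = [("a", 10), ("b", 55), ("c", 12), ("d", 58), ("e", 31)]
batches = bucketed_batches(ligands, batch_size=2, bucket_width=20)
```

In this toy run, ligands "a" and "c" are dispatched together before "b", although "b" arrived earlier. A real implementation would also have to account for kernel requirements and architectural features, as the abstract notes.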

JDPC23

GPU-optimized approaches to molecular docking-based virtual screening in drug discovery: A comparative analysis

Finding a novel drug is a very long and complex procedure. Using computer simulations, it is possible to accelerate the preliminary phases by performing a virtual screening that filters a large set of drug candidates to a manageable number. This paper presents the implementations and comparative analysis of two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures. This work focuses on the analysis of parallel computation patterns and their mapping onto the target architecture. The first method adopts a traditional approach that spreads the computation for a single molecule across the entire GPU. The second uses a novel batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel. Experimental results showed a different behavior depending on the size of the database to be screened, either reaching a performance plateau sooner or having a more extended initial transient period to achieve a higher throughput (up to 5x), which is more suitable for extreme-scale virtual screening campaigns.

SC-W23

Domain-Specific Energy Modeling for Drug Discovery and Magnetohydrodynamics Applications

Over the past few years, the adoption of energy efficiency techniques in modern computer systems has become increasingly relevant for sustainable computing. A well-known power management software technique for energy-efficient computing is frequency scaling, which modulates the device frequency to explore the energy-performance trade-off. To achieve energy savings, a frequency tuning phase is required because different applications can have different energy and runtime behaviors depending on the frequency setting. Machine learning models can be used to predict energy and runtime, and therefore optimal frequency configurations, based on static or dynamic features extracted from the target application. While general-purpose energy models can be very accurate for a wide range of applications, their accuracy can be limited by the specific input of the target application. We present an energy characterization that spans the fields of drug discovery and magnetohydrodynamics by using two real-world applications as case studies: LiGen and Cronos. Additionally, to overcome the limitations of general-purpose approaches, we define two domain-specific energy models, which enhance the general-purpose energy models by leveraging the target application’s input parameters to increase the final accuracy. Experimental results show that, for both applications, domain-specific models achieve a ten times lower error than the general-purpose energy models.
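
The energy-performance trade-off behind frequency tuning can be sketched with a textbook toy model. This does not reproduce the paper's machine learning models: the cubic dynamic-power term, the split of runtime into a frequency-dependent and a frequency-independent part, and every constant below are illustrative assumptions, not measurements.

```python
def runtime_s(freq_ghz, compute_gcycles, fixed_s):
    """Runtime: a part that scales with frequency plus a part that does not."""
    return compute_gcycles / freq_ghz + fixed_s


def power_w(freq_ghz, static_w=50.0, dyn_coeff=10.0):
    """Toy power model: static power plus a cubic dynamic term."""
    return static_w + dyn_coeff * freq_ghz ** 3


def energy_j(freq_ghz, compute_gcycles, fixed_s):
    """Energy is power multiplied by runtime."""
    return power_w(freq_ghz) * runtime_s(freq_ghz, compute_gcycles, fixed_s)


def best_frequency(freqs_ghz, compute_gcycles, fixed_s):
    """Pick the candidate frequency with the lowest modeled energy."""
    return min(freqs_ghz, key=lambda f: energy_j(f, compute_gcycles, fixed_s))


# Hypothetical frequency settings in GHz.
FREQS = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8]

# Same compute work, different frequency-independent time (e.g. memory stalls):
compute_bound_choice = best_frequency(FREQS, compute_gcycles=100.0, fixed_s=0.0)
memory_bound_choice = best_frequency(FREQS, compute_gcycles=100.0, fixed_s=200.0)
```

Under these assumptions, the compute-bound case favors a mid-range frequency while the case with a large frequency-independent component favors the lowest setting. This shows in miniature why the best frequency depends on the workload, and hence why a model that sees the application's input can help.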

Expert Opinion23

MEDIATE - Molecular DockIng at homE: Turning collaborative simulations into therapeutic solutions

Collaborative computing has attracted great interest for its potential to join the efforts of researchers worldwide. Its relevance has further increased during the pandemic crisis since it allows for the strengthening of scientific collaborations while avoiding physical interactions. Thus, the E4C consortium presents the MEDIATE initiative, which invited researchers to contribute their virtual screening simulations, which will be combined with AI-based consensus approaches to provide robust and method-independent predictions. The best compounds will be tested, and the biological results will be shared with the scientific community.

Areas covered: In this paper, the MEDIATE initiative is described. It shares compound libraries and protein structures prepared to perform standardized virtual screenings. Preliminary analyses are also reported, which provide encouraging results emphasizing the MEDIATE initiative’s capacity to identify active compounds.

Expert opinion: Structure-based virtual screening is well-suited for collaborative projects provided that the participating researchers work on the same input file. Until now, such a strategy was rarely pursued and most initiatives in the field were organized as challenges. The MEDIATE platform is focused on SARS-CoV-2 targets but can be seen as a prototype which can be utilized to perform collaborative virtual screening campaigns in any therapeutic field by sharing the appropriate input files.

TETC23

EXSCALATE: An Extreme-Scale Virtual Screening Platform for Drug Discovery Targeting Polypharmacology to Fight SARS-CoV-2

The social and economic impact of the COVID-19 pandemic demands a reduction of the time required to find a therapeutic cure. In this paper, we describe the EXSCALATE molecular docking platform, capable of scaling on an entire modern supercomputer to support extreme-scale virtual screening campaigns. Such virtual experiments can provide, in a short time, information on which molecules to consider in the next stages of the drug discovery pipeline, and are a key asset in case of a pandemic. The EXSCALATE platform has been designed to benefit from heterogeneous computation nodes and to reduce scaling issues. In particular, we maximized the accelerators’ usage, minimized the communications between nodes, and aggregated the I/O requests to serve them more efficiently. Moreover, we balanced the computation across the nodes by designing an ad-hoc workflow based on the execution time prediction of each molecule. We deployed the platform on two HPC supercomputers, with a combined computational power of 81 PFLOPS, to evaluate the interaction between 70 billion small molecules and 15 binding sites of 12 viral proteins of SARS-CoV-2. The experiment lasted 60 hours and it performed more than one trillion ligand-pocket evaluations, setting a new record on the