Please feel free to contact me if you need a preprint of a paper that is not available here.
A Polystore Architecture Using Knowledge Graphs to Support Queries on Heterogeneous Data Stores
L. Azevedo, R. Souza, E. Soares, R. Thiago, J. Tesolin, A. Oliveira, and M. Moreno. arXiv preprint Databases (cs.DB), 2023.
[J1]
@article{azevedo2023polystore, author = {Azevedo, Leonardo Guerreiro and Souza, Renan and Soares, Elton F de S and Thiago, Raphael M and Tesolin, Julio Cesar Cardoso and Oliveira, Ann C and Moreno, Marcio Ferreira}, doi = {10.48550/arXiv.2308.03584}, journal = {arXiv preprint Databases (cs.DB)}, link = {https://arxiv.org/abs/2308.03584}, pdf = {https://arxiv.org/pdf/2308.03584}, title = {A Polystore Architecture Using Knowledge Graphs to Support Queries on Heterogeneous Data Stores}, year = {2023} }
|
Workflows Community Summit 2022: A Roadmap Revolution
R. da Silva, R. Badia, V. Bala, D. Bard, P. Bremer, I. Buckley, S. Caino-Lores, K. Chard, C. Goble, S. Jha, ..., R. Souza, et al. arXiv preprint Distributed, Parallel, and Cluster Computing (cs.DC), 2023.
[J2]
@article{da2023workflows, author = {da Silva, Rafael Ferreira and Badia, Rosa M and Bala, Venkat and Bard, Debbie and Bremer, Peer-Timo and Buckley, Ian and Caino-Lores, Silvina and Chard, Kyle and Goble, Carole and Jha, Shantenu and ... and Souza, Renan and {et al.}}, doi = {10.48550/arXiv.2304.00019}, journal = {arXiv preprint Distributed, Parallel, and Cluster Computing (cs.DC)}, link = {https://arxiv.org/abs/2304.00019}, pdf = {https://arxiv.org/pdf/2304.00019}, title = {Workflows Community Summit 2022: A Roadmap Revolution}, year = {2023} }
|
Workflow Provenance in the Lifecycle of Scientific Machine Learning
R. Souza, L. G. Azevedo, V. Lourenço, E. Soares, R. Thiago, R. Brandão, D. Civitarese, E. Vital Brazil, M. Moreno, P. Valduriez, M. Mattoso, R. Cerqueira, and M. A. S. Netto. Concurrency and Computation: Practice and Experience, 2021.
[J3]
Abstract. Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provide for critical requirements, such as reproducibility, model explainability, and experiment data understanding. However, scientific ML is multidisciplinary, heterogeneous, and affected by the physical constraints of the domain, making such analyses even more challenging. In this work, we leverage workflow provenance techniques to build a holistic view to support the lifecycle of scientific ML.
We contribute with (i) characterization of the lifecycle and taxonomy for data analyses; (ii) design principles to build this view, with a W3C PROV compliant data representation and a reference system architecture; and (iii) lessons learned after an evaluation in an Oil & Gas case using an HPC cluster with 393 nodes and 946 GPUs.
The experiments show that the principles enable queries that integrate domain semantics with ML models while keeping low overhead (<1%), high scalability, and an order of magnitude of query acceleration under certain workloads against without our representation.
@article{asouza2020workflow, abstract = {Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provide for critical requirements, such as reproducibility, model explainability, and experiment data understanding. However, scientific ML is multidisciplinary, heterogeneous, and affected by the physical constraints of the domain, making such analyses even more challenging. In this work, we leverage workflow provenance techniques to build a holistic view to support the lifecycle of scientific ML. We contribute with (i) characterization of the lifecycle and taxonomy for data analyses; (ii) design principles to build this view, with a W3C PROV compliant data representation and a reference system architecture; and (iii) lessons learned after an evaluation in an Oil \& Gas case using an HPC cluster with 393 nodes and 946 GPUs. The experiments show that the principles enable queries that integrate domain semantics with ML models while keeping low overhead (<1\%), high scalability, and an order of magnitude of query acceleration under certain workloads against without our representation.}, author = {Souza, Renan and Azevedo, Leonardo G. and Lourenço, Vítor and Soares, Elton and Thiago, Raphael and Brandão, Rafael and Civitarese, Daniel and Vital Brazil, Emilio and Moreno, Marcio and Valduriez, Patrick and Mattoso, Marta and Cerqueira, Renato and Netto, Marco A. S.}, journal = {Concurrency and Computation: Practice and Experience}, link = {https://doi.org/10.1002/cpe.6544}, pages = {1--21}, pdf = {https://arxiv.org/pdf/2010.00330.pdf}, title = {Workflow Provenance in the Lifecycle of Scientific Machine Learning}, volume = {e6544}, year = {2021} }
|
Distributed In-memory Data Management for Workflow Executions
R. Souza, V. Silva, A. Lima, D. Oliveira, P. Valduriez, and M. Mattoso. PeerJ Computer Science, 2021.
[J4]
Abstract. Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the parallel execution control design is to manage workflow data for efficient executions while enabling user steering support. Data access for high scalability is typically transaction-oriented, while for data analysis, it is online analytical-oriented so that managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB’s principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB’s overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
@article{souza_distributed_2021, abstract = {Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the parallel execution control design is to manage workflow data for efficient executions while enabling user steering support. Data access for high scalability is typically transaction-oriented, while for data analysis, it is online analytical-oriented so that managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB's principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB's overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.}, author = {Souza, R. and Silva, V. and Lima, A. A. B. and Oliveira, D. and Valduriez, P. 
and Mattoso, M.}, doi = {10.7717/peerj-cs.527}, journal = {PeerJ Computer Science}, link = {https://peerj.com/articles/cs-527/}, pages = {1--30}, pdf = {https://arxiv.org/ftp/arxiv/papers/2105/2105.04720.pdf}, title = {Distributed In-memory Data Management for Workflow Executions}, volume = {7}, year = {2021} }
|
Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development
R. da Silva, H. Casanova, K. Chard, ..., R. Souza, et al. arXiv preprint Distributed, Parallel, and Cluster Computing (cs.DC), 2021.
[J5]
@article{rafael_2021_wf_summit, author = {da Silva, Rafael Ferreira and Casanova, Henri and Chard, Kyle and ... and Souza, Renan and {et al.}}, journal = {arXiv preprint Distributed, Parallel, and Cluster Computing (cs.DC)}, link = {https://arxiv.org/abs/2106.05177}, pages = {1--24}, pdf = {https://arxiv.org/pdf/2106.05177.pdf}, title = {Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development}, year = {2021} }
|
Adding Hyperknowledge-enabled data lineage to a machine learning workflow management system for oil and gas
L. Azevedo, R. Souza, R. Brandão, V. Lourenço, M. Costalonga, M. de Machado, M. Moreno, and R. Cerqueira. First Break, 2020.
[J6]
@article{azevedo2020adding, author = {Azevedo, Leonardo Guerreiro and Souza, Renan and Brandão, Rafael and Lourenço, Vítor N and Costalonga, Marcelo and de Machado, Marcelo and Moreno, Marcio and Cerqueira, Renato}, doi = {10.3997/1365-2397.fb2020055}, journal = {First Break}, number = {7}, pages = {89--93}, publisher = {European Association of Geoscientists \& Engineers}, title = {Adding Hyperknowledge-enabled data lineage to a machine learning workflow management system for oil and gas}, volume = {38}, year = {2020} }
|
Keeping Track of User Steering Actions in Dynamic Workflows
R. Souza, V. Silva, J. Camata, A. Coutinho, P. Valduriez, and M. Mattoso. Future Generation Computer Systems, 2019.
[J7]
Abstract. In long-lasting scientific workflow executions in HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g., reduce execution time) or improve the overall results. However, in executions that last for weeks, users can lose track of what has been adapted if the tunings are not properly registered. In this work, we build on provenance data management to address the problem of tracking online parameter fine-tuning in dynamic workflows steered by users. We propose a lightweight solution to capture and manage provenance of the steering actions online with negligible overhead. The resulting provenance database relates tuning data with data for domain, dataflow provenance, execution, and performance, and is available for analysis at runtime. We show how users may get a detailed view of the execution, providing insights to determine when and how to tune. We discuss the applicability of our solution in different domains and validate its ability to allow for online capture and analyses of parameter fine-tunings in a real workflow in the Oil and Gas industry. In this experiment, the user could determine which tuned parameters influenced simulation accuracy and performance. The observed overhead for keeping track of user steering actions at runtime is less than 1% of total execution time. Keywords: Dynamic workflows, Computational steering, Provenance data, Parameter tuning
@article{souza_keeping_2019, abstract = {In long-lasting scientific workflow executions in HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g., reduce execution time) or improve the overall results. However, in executions that last for weeks, users can lose track of what has been adapted if the tunings are not properly registered. In this work, we build on provenance data management to address the problem of tracking online parameter fine-tuning in dynamic workflows steered by users. We propose a lightweight solution to capture and manage provenance of the steering actions online with negligible overhead. The resulting provenance database relates tuning data with data for domain, dataflow provenance, execution, and performance, and is available for analysis at runtime. We show how users may get a detailed view of the execution, providing insights to determine when and how to tune. We discuss the applicability of our solution in different domains and validate its ability to allow for online capture and analyses of parameter fine-tunings in a real workflow in the Oil and Gas industry. In this experiment, the user could determine which tuned parameters influenced simulation accuracy and performance. The observed overhead for keeping track of user steering actions at runtime is less than 1\% of total execution time.}, author = {Souza, Renan and Silva, Vítor and Camata, Jose J. and Coutinho, Alvaro L. G. A. 
and Valduriez, Patrick and Mattoso, Marta}, doi = {10.1016/j.future.2019.05.011}, issn = {0167-739X}, journal = {Future Generation Computer Systems}, keyword = {Dynamic workflows, Computational steering, Provenance data, Parameter tuning}, link = {https://doi.org/10.1016/j.future.2019.05.011}, pages = {624--643}, pdf = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-02127456/document}, title = {Keeping Track of User Steering Actions in Dynamic Workflows}, volume = {99}, year = {2019} }
|
Adding Domain Data to Code Profiling Tools to Debug Workflow Parallel Execution
V. Silva, L. Neves, R. Souza, A. Coutinho, D. de Oliveira, and M. Mattoso. Future Generation Computer Systems, 2018.
[J8]
@article{silva_adding_2018, author = {Silva, Vítor and Neves, Leonardo and Souza, Renan and Coutinho, Alvaro L. G. A. and de Oliveira, Daniel and Mattoso, Marta}, doi = {10.1016/j.future.2018.05.078}, issn = {0167-739X}, journal = {Future Generation Computer Systems}, keyword = {Scientific workflow, Debugging, Provenance, Performance analysis}, pages = {624--643}, title = {Adding Domain Data to Code Profiling Tools to Debug Workflow Parallel Execution}, year = {2018} }
|
Data Reduction in Scientific Workflows Using Provenance Monitoring and User Steering
R. Souza, V. Silva, A. Coutinho, P. Valduriez, and M. Mattoso. Future Generation Computer Systems, 2017.
[J9]
Abstract. Scientific workflows need to be iteratively, and often interactively, executed for large input datasets. Reducing data from input datasets is a powerful way to reduce overall execution time in such workflows. When this is accomplished online (i.e., without requiring the user to stop execution to reduce the data, and then resume), it can save much time. However, determining which subsets of the input data should be removed becomes a major problem. A related problem is to guarantee that the workflow system will maintain execution and data consistent with the reduction. Keeping track of how users interact with the workflow is essential for data provenance purposes. In this paper, we adopt the “human-in-the-loop” approach, which enables users to steer the running workflow and reduce subsets from datasets online. We propose an adaptive workflow monitoring approach that combines provenance data monitoring and computational steering to support users in analyzing the evolution of key parameters and determining the subset of data to remove. We extend a provenance data model to keep track of users’ interactions when they reduce data at runtime. In our experimental validation, we develop a test case from the oil and gas domain, using a 936-cores cluster. The results on this test case show that the approach yields reductions of 32% of execution time and 14% of the data processed. Keywords: Scientific Workflows, Human in the Loop, Online Data Reduction, Provenance Data, Dynamic Workflows
@article{Souza2017Data, abstract = {Scientific workflows need to be iteratively, and often interactively, executed for large input datasets. Reducing data from input datasets is a powerful way to reduce overall execution time in such workflows. When this is accomplished online (i.e., without requiring the user to stop execution to reduce the data, and then resume), it can save much time. However, determining which subsets of the input data should be removed becomes a major problem. A related problem is to guarantee that the workflow system will maintain execution and data consistent with the reduction. Keeping track of how users interact with the workflow is essential for data provenance purposes. In this paper, we adopt the “human-in-the-loop” approach, which enables users to steer the running workflow and reduce subsets from datasets online. We propose an adaptive workflow monitoring approach that combines provenance data monitoring and computational steering to support users in analyzing the evolution of key parameters and determining the subset of data to remove. We extend a provenance data model to keep track of users’ interactions when they reduce data at runtime. In our experimental validation, we develop a test case from the oil and gas domain, using a 936-cores cluster. The results on this test case show that the approach yields reductions of 32\% of execution time and 14\% of the data processed.}, author = {Souza, Renan and Silva, Vítor and Coutinho, Alvaro L. G. A. and Valduriez, Patrick and Mattoso, Marta}, doi = {10.1016/j.future.2017.11.028}, issn = {0167-739X}, journal = {Future Generation Computer Systems}, keyword = {Scientific Workflows, Human in the Loop, Online Data Reduction, Provenance Data, Dynamic Workflows}, pages = {481--501}, pdf = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01679967/document}, title = {Data Reduction in Scientific Workflows Using Provenance Monitoring and User Steering}, volume = {110}, year = {2017} }
|
A Hybrid Architecture for Multi-party Conversational Systems
M. de Bayser, P. Cavalin, R. Souza, A. Braz, H. Candello, C. Pinhanez, and J. Briot. arXiv preprint Computation and Language (cs.CL), 2017.
[J10]
@article{de2017hybrid, author = {de Bayser, Maira Gatti and Cavalin, Paulo and Souza, Renan and Braz, Alan and Candello, Heloisa and Pinhanez, Claudio and Briot, Jean-Pierre}, journal = {arXiv preprint Computation and Language (cs.CL)}, link = {https://arxiv.org/abs/1705.01214}, pages = {1--40}, pdf = {https://arxiv.org/pdf/1705.01214.pdf}, title = {A Hybrid Architecture for Multi-party Conversational Systems}, year = {2017} }
|
Workflow Provenance in the Computing Continuum for Responsible, Trustworthy, and Energy-Efficient AI
R. Souza, S. Caino-Lores, M. Coletti, T. Skluzacek, A. Costan, F. Suter, M. Mattoso, and R. da Silva. IEEE International Conference on e-Science, 2024.
[C1]
Abstract. As Artificial Intelligence (AI) becomes more pervasive in our society, it is crucial to develop, deploy, and assess Responsible and Trustworthy AI (RTAI) models, i.e., those that consider not only accuracy but also other aspects, such as explainability, fairness, and energy efficiency. Workflow provenance data have historically enabled critical capabilities towards RTAI. Provenance data derivation paths contribute to responsible workflows through transparency in tracking artifacts and resource consumption. Provenance data are well-known for their trustworthiness, helping explainability, reproducibility, and accountability. However, there are complex challenges to achieving RTAI, which are further complicated by the heterogeneous infrastructure in the computing continuum (Edge-Cloud-HPC) used to develop and deploy models. As a result, a significant research and development gap remains between workflow provenance data management and RTAI. In this paper, we present a vision of the pivotal role of workflow provenance in supporting RTAI and discuss related challenges. We present a schematic view of the relationship between RTAI and provenance, and highlight open research directions. Keywords: Artificial Intelligence, Provenance, Machine Learning, AI workflows, ML workflows, Responsible AI, Trustworthy AI, Reproducibility, AI Lifecycle, Energy-efficient AI
@inproceedings{souza_rtai_2024, abstract = {As Artificial Intelligence (AI) becomes more pervasive in our society, it is crucial to develop, deploy, and assess Responsible and Trustworthy AI (RTAI) models, i.e., those that consider not only accuracy but also other aspects, such as explainability, fairness, and energy efficiency. Workflow provenance data have historically enabled critical capabilities towards RTAI. Provenance data derivation paths contribute to responsible workflows through transparency in tracking artifacts and resource consumption. Provenance data are well-known for their trustworthiness, helping explainability, reproducibility, and accountability. However, there are complex challenges to achieving RTAI, which are further complicated by the heterogeneous infrastructure in the computing continuum (Edge-Cloud-HPC) used to develop and deploy models. As a result, a significant research and development gap remains between workflow provenance data management and RTAI. In this paper, we present a vision of the pivotal role of workflow provenance in supporting RTAI and discuss related challenges. We present a schematic view of the relationship between RTAI and provenance, and highlight open research directions.}, author = {Renan Souza and Silvina Caino-Lores and Mark Coletti and Tyler J. Skluzacek and Alexandru Costan and Frederic Suter and Marta Mattoso and Rafael Ferreira da Silva}, booktitle = {IEEE International Conference on e-Science}, keyword = {Artificial Intelligence, Provenance, Machine Learning, AI workflows, ML workflows, Responsible AI, Trustworthy AI, Reproducibility, AI Lifecycle, Energy-efficient AI}, location = {Osaka, Japan}, pdf = {https://renansouza.org/data/papers/ProvRespAI_preprint.pdf}, publisher = {IEEE}, title = {Workflow Provenance in the Computing Continuum for Responsible, Trustworthy, and Energy-Efficient {AI}}, year = {2024} }
|
Integrating Evolutionary Algorithms with Distributed Deep Learning for Optimizing Hyperparameters on HPC System
M. Coletti, R. Souza, T. Skluzacek, F. Suter, and R. da Silva. Workflows in Support of Large-Scale Science (WORKS) workshop co-located with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2024.
[C2]
@inproceedings{coletti_2024, author = {Mark Coletti and Renan Souza and Tyler J. Skluzacek and Frederic Suter and Rafael Ferreira da Silva}, booktitle = {Workflows in Support of Large-Scale Science ({WORKS}) workshop co-located with the {ACM}/{IEEE} International Conference for High Performance Computing, Networking, Storage, and Analysis ({SC})}, location = {Atlanta, USA}, publisher = {IEEE}, title = {Integrating Evolutionary Algorithms with Distributed Deep Learning for Optimizing Hyperparameters on HPC System}, year = {2024} }
|
Eco-Driven AI-HPC: Optimizing Energy Efficiency in Distributed Scientific Workflows
R. da Silva, W. Shin, F. Suter, A. Gainaru, R. Souza, D. Dietz, and S. Jha. Energy-Efficient Computing for Science Workshop, 2024.
[C3]
@inproceedings{silva_ee_2024, author = {Rafael Ferreira da Silva and Woong Shin and Frederic Suter and Ana Gainaru and Renan Souza and Dan Dietz and Shantenu Jha}, booktitle = {Energy-Efficient Computing for Science Workshop}, location = {Bethesda, MD, USA}, title = {Eco-Driven AI-HPC: Optimizing Energy Efficiency in Distributed Scientific Workflows}, year = {2024} }
|
Towards Cross-Facility Workflows Orchestration through Distributed Automation
T. Skluzacek, R. Souza, M. Coletti, F. Suter, and R. da Silva. Practice and Experience in Advanced Research Computing (PEARC 24), 2024.
[C4]
@inproceedings{skluzacek_pearc_2024, author = {Tyler J. Skluzacek and Renan Souza and Mark Coletti and Frederic Suter and Rafael Ferreira da Silva}, booktitle = {Practice and Experience in Advanced Research Computing (PEARC 24)}, doi = {10.1145/3626203.3670606}, link = {https://doi.org/10.1145/3626203.3670606}, location = {Providence, RI, USA}, publisher = {Association for Computing Machinery}, title = {Towards Cross-Facility Workflows Orchestration through Distributed Automation}, year = {2024} }
|
Advancing Computational Earth Sciences: Innovations and Challenges in Scientific HPC Workflows
R. da Silva, K. Maheshwari, T. Skluzacek, R. Souza, and S. Wilkinson. European Geosciences Union (EGU), 2024.
[C5]
@inproceedings{dasilva_agu_2024, author = {da Silva, Rafael Ferreira and Maheshwari, Ketan and Skluzacek, T and Souza, Renan and Wilkinson, Sean}, booktitle = {European Geosciences Union (EGU)}, title = {Advancing Computational Earth Sciences: Innovations and Challenges in Scientific HPC Workflows}, year = {2024} }
|
HKPoly: A Polystore Architecture to Support Data Linkage and Queries on Distributed and Heterogeneous Data
L. Azevedo, R. Souza, E. Soares, R. Thiago, J. Tesolin, A. Oliveira, and M. Moreno. Proceedings of the 20th Brazilian Symposium on Information Systems (SBSI), 2024.
[C6]
@inproceedings{azevedo_hkpoly2024, address = {New York, NY, USA}, articleno = {50}, author = {Azevedo, Leonardo Guerreiro and Souza, Renan and Soares, Elton and Thiago, Raphael Melo and Tesolin, Julio Cesar Cardoso and Oliveira, Anna Carolina Carvalho Moreira and Moreno, Marcio Ferreira}, booktitle = {Proceedings of the 20th Brazilian Symposium on Information Systems (SBSI)}, doi = {10.1145/3658271.3658322}, isbn = {9798400709968}, keyword = {Business process, Database integration, Distributed databases, Microservices, Provenance, Query processing}, link = {https://doi.org/10.1145/3658271.3658322}, numpages = {10}, publisher = {Association for Computing Machinery}, series = {SBSI '24}, title = {HKPoly: A Polystore Architecture to Support Data Linkage and Queries on Distributed and Heterogeneous Data}, year = {2024} }
|
Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability
R. Souza, T. Skluzacek, S. Wilkinson, M. Ziatdinov, and R. da Silva. IEEE International Conference on e-Science, 2023.
[C7]
Abstract. Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era, by enabling Responsible AI development, FAIR, Reproducibility, and User Steering. However, the heterogeneous nature of science poses challenges such as dealing with multiple supporting tools, cross-facility environments, and efficient HPC execution. Building on data observability, adapter system design, and provenance, we propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis. MIDA defines data observability strategies and adaptability methods for various parallel systems and machine learning tools. With observability, it intercepts the dataflows in the background without requiring instrumentation while integrating domain, provenance, and telemetry data at runtime into a unified database ready for user steering queries. We conduct experiments showing end-to-end multi-workflow analysis integrating data from Dask and MLFlow in a real distributed deep learning use case for materials science that runs on multiple environments with up to 276 GPUs in parallel. We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.
@inproceedings{souza2023towards, abstract = {Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era, by enabling Responsible AI development, FAIR, Reproducibility, and User Steering. However, the heterogeneous nature of science poses challenges such as dealing with multiple supporting tools, cross-facility environments, and efficient HPC execution. Building on data observability, adapter system design, and provenance, we propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis. MIDA defines data observability strategies and adaptability methods for various parallel systems and machine learning tools. With observability, it intercepts the dataflows in the background without requiring instrumentation while integrating domain, provenance, and telemetry data at runtime into a unified database ready for user steering queries. We conduct experiments showing end-to-end multi-workflow analysis integrating data from Dask and MLFlow in a real distributed deep learning use case for materials science that runs on multiple environments with up to 276 GPUs in parallel. We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.}, author = {Souza, Renan and Skluzacek, Tyler J and Wilkinson, Sean R and Ziatdinov, Maxim and da Silva, Rafael Ferreira}, booktitle = {IEEE International Conference on e-Science}, doi = {10.1109/e-Science58273.2023.10254822}, link = {https://doi.org/10.1109/e-Science58273.2023.10254822}, pdf = {https://arxiv.org/pdf/2308.09004.pdf}, title = {Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability}, year = {2023} }
|
ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum
D. Rosendo, M. Mattoso, A. Costan, R. Souza, D. Pina, P. Valduriez, and G. Antoniu. IEEE International Conference on Cluster Computing, 2023.
[C8]
@inproceedings{rosendo2023provlight, author = {Rosendo, Daniel and Mattoso, Marta and Costan, Alexandru and Souza, Renan and Pina, D{\'e}bora and Valduriez, Patrick and Antoniu, Gabriel}, booktitle = {IEEE International Conference on Cluster Computing}, doi = {10.1109/CLUSTER52292.2023.00026}, link = {https://www.computer.org/csdl/proceedings-article/cluster/2023/079200a221/1SfUrCnjgAM}, pdf = {https://arxiv.org/pdf/2307.10658}, title = {{ProvLight}: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum}, year = {2023} }
|
Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds
R. Cunha, L. Real, R. Souza, B. Silva, and M. Netto. IEEE International Conference on e-Science, 2021.
[C9]
@inproceedings{cunha_2021_context, author = {Cunha, Renato LF and Real, Lucas V and Souza, Renan and Silva, Bruno and Netto, Marco AS}, booktitle = {IEEE International Conference on e-Science}, doi = {10.1109/eScience51609.2021.00013}, pdf = {https://arxiv.org/pdf/2107.00187.pdf}, title = {Context-aware Execution Migration Tool for Data Science Jupyter Notebooks on Hybrid Clouds}, year = {2021} }
|
Supporting Polystore Queries using Provenance in a Hyperknowledge Graph
L. Azevedo, R. Souza, E. Soares, R. Thiago, A. Oliveira, and M. Moreno International Semantic Web Conference (ISWC), 2021.
[C10] [pdf]
[bibtex]
@inproceedings{azevedo_supporting_2021, author = {Azevedo, Leonardo and Souza, Renan and Soares, Elton and Thiago, Raphael and Oliveira, Anna and Moreno, Marcio}, booktitle = {International Semantic Web Conference (ISWC)}, pages = {1--4}, pdf = {http://ceur-ws.org/Vol-2980/paper368.pdf}, title = {Supporting Polystore Queries using Provenance in a Hyperknowledge Graph}, year = {2021} }
|
User Steering Support in Large-scale Workflows
R. Souza PhD Thesis Contest: Brazilian Symposium on Databases (SBBD), 2021.
[C11] [pdf]
[bibtex]
@inproceedings{souza_2021_ctd_sbbd, author = {Souza, Renan}, booktitle = {PhD Thesis Contest: Brazilian Symposium on Databases ({SBBD})}, pdf = {https://sol.sbc.org.br/index.php/sbbd_estendido/article/download/18185/18019}, title = {User Steering Support in Large-scale Workflows}, year = {2021} }
|
A Recommender for Choosing Data Systems based on Application Profiling and Benchmarking
E. Soares, R. Souza, R. Thiago, M. Machado, and L. Azevedo Brazilian Symposium on Databases (SBBD), 2021.
[C12]
[bibtex]
@inproceedings{soares_2021_recommender, author = {Soares, Elton and Souza, Renan and Thiago, Raphael and Machado, Marcelo and Azevedo, Leonardo}, booktitle = {Brazilian Symposium on Databases ({SBBD})}, pages = {265--270}, title = {A Recommender for Choosing Data Systems based on Application Profiling and Benchmarking}, year = {2021} }
|
Cycle Orchestrator: A Knowledge-Based Approach for Structuring Cyclic ML Pipelines in the O&G Industry
R. Brandão, V. Lourenço, M. Machado, L. Azevedo, M. Cardoso, R. Souza, G. Lima, R. Cerqueira, and M. Moreno International Semantic Web Conference (ISWC), 2020.
[C13]
[bibtex]
@inproceedings{brandao2020cycle, author = {Brand{\~a}o, Rafael and Louren{\c{c}}o, Vitor and Machado, Marcelo and Azevedo, Leonardo and Cardoso, Marcelo and Souza, Renan and Lima, Guilherme and Cerqueira, Renato and Moreno, Marcio}, booktitle = {International Semantic Web Conference (ISWC)}, title = {Cycle Orchestrator: A Knowledge-Based Approach for Structuring Cyclic ML Pipelines in the O\&G Industry}, year = {2020} }
|
A Knowledge-Based Approach for Structuring Cyclic Workflows
R. Brandão, V. Lourenço, M. Machado, L. Azevedo, M. Cardoso, R. Souza, G. Lima, R. Cerqueira, and M. Moreno International Semantic Web Conference (ISWC), 2020.
[C14]
[bibtex]
@inproceedings{brandao2020knowledge, author = {Brand{\~a}o, Rafael and Louren{\c{c}}o, Vitor and Machado, Marcelo and Azevedo, Leonardo and Cardoso, Marcelo and Souza, Renan and Lima, Guilherme and Cerqueira, Renato and Moreno, Marcio}, booktitle = {International Semantic Web Conference (ISWC)}, title = {A Knowledge-Based Approach for Structuring Cyclic Workflows}, year = {2020} }
|
Runtime Steering of Parallel CFD Simulations
R. Souza, J. Camata, M. Mattoso, and A. Coutinho International Conference on Parallel Computational Fluid Dynamics, 2020.
[C15]
[bibtex]
@inproceedings{souza_runtime_2020, author = {Souza, Renan and Camata, J. and Mattoso, Marta and Coutinho, Alvaro}, booktitle = {International Conference on Parallel Computational Fluid Dynamics}, title = {Runtime Steering of Parallel CFD Simulations}, year = {2020} }
|
Experiencing ProvLake to Manage the Data Lineage of AI Workflows
L. Azevedo, R. Souza, R. Thiago, E. Soares, and M. Moreno Innovation Summit on Information Systems (EISI) in Brazilian Symposium on Information Systems (SBSI), 2020.
[C16]
[bibtex]
@inproceedings{azevedo_experiencing_2020, author = {Azevedo, Leonardo and Souza, Renan and Thiago, Raphael and Soares, Elton and Moreno, Marcio}, booktitle = {Innovation Summit on Information Systems (EISI) in Brazilian Symposium on Information Systems (SBSI)}, title = {Experiencing ProvLake to Manage the Data Lineage of AI Workflows}, year = {2020} }
|
Modern Federated Databases: an Overview
L. Azevedo, R. Souza, E. Soares, and M. Moreno International Conference on Enterprise Information Systems (ICEIS), 2020.
[C17]
[bibtex]
@inproceedings{azevedo_federated_2020, author = {Azevedo, Leonardo and Souza, Renan and Soares, Elton and Moreno, Marcio}, booktitle = {International Conference on Enterprise Information Systems (ICEIS)}, title = {Modern Federated Databases: an Overview}, year = {2020} }
|
Supporting the Training of Physics Informed Neural Networks for Seismic Inversion Using Provenance
R. Souza, A. Codas, J. Nogueira Junior, M. Quinones, L. Azevedo, R. Thiago, E. Soares, M. Cardoso, and L. Martins American Association of Petroleum Geologists Annual Convention and Exhibition (AAPG), 2020.
[C18]
[bibtex]
@inproceedings{souza_aapg_2020, author = {Souza, Renan and Codas, A. and Nogueira Junior, J. Almeida and Quinones, M. P. and Azevedo, L. and Thiago, R. and Soares, E. and Cardoso, M. and Martins, L.}, booktitle = {American Association of Petroleum Geologists Annual Convention and Exhibition ({AAPG})}, title = {Supporting the Training of Physics Informed Neural Networks for Seismic Inversion Using Provenance}, year = {2020} }
|
Managing Data Lineage of O&G Machine Learning Models: The Sweet Spot for Shale Use Case
R. Thiago, R. Souza, L. Azevedo, E. Soares, R. Santos, W. Santos, M. De Bayser, M. Cardoso, M. Moreno, and R. Cerqueira European Association of Geoscientists and Engineers (EAGE) Digitalization Conference and Exhibition, 2020.
[C19] [doi] [pdf]
[bibtex]
@inproceedings{souza_eage_2020, author = {Thiago, Raphael and Souza, Renan and Azevedo, L. and Soares, E. and Santos, Rodrigo and Santos, Wallas and De Bayser, Max and Cardoso, M. and Moreno, M. and Cerqueira, Renato}, booktitle = {European Association of Geoscientists and Engineers (EAGE) Digitalization Conference and Exhibition}, doi = {10.3997/2214-4609.202032075}, pdf = {https://arxiv.org/pdf/2003.04915.pdf}, title = {Managing Data Lineage of {O\&G} Machine Learning Models: The Sweet Spot for Shale Use Case}, year = {2020} }
|
Efficient Runtime Capture of Multiworkflow Data Using Provenance
R. Souza, L. Azevedo, R. Thiago, E. Soares, M. Nery, M. Netto, E. Brazil, R. Cerqueira, P. Valduriez, and M. Mattoso IEEE International Conference on e-Science, 2019.
[C20]
[abstract] [doi] [online] [pdf]
[bibtex]
Abstract. Computational Science and Engineering (CSE) projects are typically developed by multidisciplinary teams. Despite being part of the same project, each team manages its own workflows, using specific execution environments and data processing tools. Analyzing the data processed by all workflows globally is a core task in a CSE project. However, this analysis is hard because the data generated by these workflows are not integrated. In addition, since these workflows may take a long time to execute, data analysis needs to be done at runtime to reduce cost and time of the CSE project. A typical solution in scientific data analysis is to capture and relate the data in a provenance database while the workflows run, thus allowing for data analysis at runtime. However, the main problem is that such data capture competes with the running workflows, adding significant overhead to their execution. To mitigate this problem, we introduce in this paper a system called ProvLake, which adopts design principles for providing efficient distributed data capture from the workflows. While capturing the data, ProvLake logically integrates and ingests them into a provenance database ready for analyses at runtime. We validated ProvLake in a real use case in the O&G industry encompassing four workflows that process 5TB datasets for a deep learning classifier. Compared with Komadu, the closest solution that meets our goals, our approach enables runtime multiworkflow data analysis with much smaller overhead, such as 0.1%. Keywords: Multiworkflow provenance, Multi-Data Lineage, Data Lake Provenance, ProvLake
@inproceedings{souza_efficient_2019, abstract = {Computational Science and Engineering (CSE) projects are typically developed by multidisciplinary teams. Despite being part of the same project, each team manages its own workflows, using specific execution environments and data processing tools. Analyzing the data processed by all workflows globally is a core task in a CSE project. However, this analysis is hard because the data generated by these workflows are not integrated. In addition, since these workflows may take a long time to execute, data analysis needs to be done at runtime to reduce cost and time of the CSE project. A typical solution in scientific data analysis is to capture and relate the data in a provenance database while the workflows run, thus allowing for data analysis at runtime. However, the main problem is that such data capture competes with the running workflows, adding significant overhead to their execution. To mitigate this problem, we introduce in this paper a system called ProvLake, which adopts design principles for providing efficient distributed data capture from the workflows. While capturing the data, ProvLake logically integrates and ingests them into a provenance database ready for analyses at runtime. We validated ProvLake in a real use case in the O\&G industry encompassing four workflows that process 5TB datasets for a deep learning classifier.
Compared with Komadu, the closest solution that meets our goals, our approach enables runtime multiworkflow data analysis with much smaller overhead, such as 0.1\%.}, author = {Souza, Renan and Azevedo, Leonardo and Thiago, Raphael and Soares, Elton and Nery, Marcelo and Netto, Marco and Brazil, Emilio Vital and Cerqueira, Renato and Valduriez, Patrick and Mattoso, Marta}, booktitle = {IEEE International Conference on e-Science}, doi = {10.1109/eScience.2019.00047}, keyword = {Multiworkflow provenance, Multi-Data Lineage, Data Lake Provenance, ProvLake}, link = {https://doi.org/10.1109/eScience.2019.00047}, pages = {1--10}, pdf = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-02265932/document}, title = {Efficient Runtime Capture of Multiworkflow Data Using Provenance}, year = {2019} }
|
Managing Data Traceability in the Data Lifecycle for Deep Learning Applied to Seismic Data
R. Souza, E. Brazil, L. Azevedo, R. Ferreira, E. Soares, R. Thiago, M. Nery, V. Torres, and R. Cerqueira American Association of Petroleum Geologists Annual Convention and Exhibition (AAPG), 2019.
[C21] [online]
[bibtex]
@inproceedings{souza_managing_2019, author = {Souza, Renan and Brazil, Emilio Vital and Azevedo, Leonardo and Ferreira, Rodrigo and Chevitarese, Daniel and Soares, Elton and Thiago, Raphael and Nery, Marcelo and Torres, Viviane and Cerqueira, Renato}, booktitle = {American Association of Petroleum Geologists Annual Convention and Exhibition ({AAPG})}, link = {https://www.searchanddiscovery.com/abstracts/html/2019/ace2019/abstracts/1718.html}, title = {Managing Data Traceability in the Data Lifecycle for Deep Learning Applied to Seismic Data}, year = {2019} }
|
Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering
R. Souza, L. Azevedo, V. Lourenço, E. Soares, R. Thiago, R. Brandão, D. Civitarese, E. Vital Brazil, M. Moreno, P. Valduriez, M. Mattoso, R. Cerqueira, and M. A. S. Netto Workflows in Support of Large-Scale Science (WORKS) co-located with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2019.
[C22]
[abstract] [doi] [pdf]
[bibtex]
Abstract. Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists’ expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes unfeasible to recreate a ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute with a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel. Keywords: Machine Learning Lifecycle, Workflow Provenance, Computational Science and Engineering
@inproceedings{souza_provenancedata_2019, abstract = {Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes unfeasible to recreate a ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute with a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel.}, author = {Souza, Renan and Azevedo, Leonardo and Lourenço, Vítor and Soares, Elton and Thiago, Raphael and Brandão, Rafael and Civitarese, Daniel and Vital Brazil, Emilio and Moreno, Marcio and Valduriez, Patrick and Mattoso, Marta and Cerqueira, Renato and
Netto, Marco A. S.}, booktitle = {Workflows in Support of Large-Scale Science ({WORKS}) co-located with the {ACM}/{IEEE} International Conference for High Performance Computing, Networking, Storage, and Analysis ({SC})}, doi = {10.1109/WORKS49585.2019.00006}, keyword = {Machine Learning Lifecycle, Workflow Provenance, Computational Science and Engineering}, pages = {1--10}, pdf = {https://arxiv.org/pdf/1910.04223}, title = {Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering}, year = {2019} }
|
Towards a human-in-the-loop library for tracking hyperparameter tuning in deep learning development
R. Souza, L. Neves, L. Azeredo, R. Luiz, E. Tady, P. Cavalin, and M. Mattoso Latin American Data Science (LaDaS) workshop co-located with the Very Large Database (VLDB) conference, 2018.
[C23] [pdf]
[bibtex]
@inproceedings{souza_towards_2018, author = {Souza, Renan and Neves, Liliane and Azeredo, Leonardo and Luiz, Ricardo and Tady, Elaine and Cavalin, Paulo and Mattoso, Marta}, booktitle = {Latin American Data Science ({LaDaS}) workshop co-located with the Very Large Database ({VLDB}) conference}, eventtitle = {Latin American Data Science ({LaDaS}) workshop co-located with the Very Large Database ({VLDB}) conference}, location = {Rio de Janeiro, Brazil}, pages = {84--87}, pdf = {http://ceur-ws.org/Vol-2170/paper12.pdf}, title = {Towards a human-in-the-loop library for tracking hyperparameter tuning in deep learning development}, year = {2018} }
|
Capturing Provenance for Runtime Data Analysis in Computational Science and Engineering Applications
V. Silva, R. Souza, J. Camata, D. de Oliveira, P. Valduriez, A. Coutinho, and M. Mattoso Provenance and Annotation of Data and Processes - International Provenance and Annotation Workshop (IPAW), 2018.
[C24] [doi]
[bibtex]
@inproceedings{Silva2018Capturing, author = {Silva, Vítor and Souza, Renan and Camata, Jose and de Oliveira, Daniel and Valduriez, Patrick and Coutinho, Alvaro L. G. A. and Mattoso, Marta}, booktitle = {Provenance and Annotation of Data and Processes - International Provenance and Annotation Workshop (IPAW)}, doi = {10.1007/978-3-319-98379-0_15}, isbn = {978-3-319-98379-0}, pages = {183--187}, publisher = {Springer International Publishing}, series = {Lecture Notes in Computer Science ({LNCS})}, title = {Capturing Provenance for Runtime Data Analysis in Computational Science and Engineering Applications}, year = {2018} }
|
Provenance of Dynamic Adaptations in User-Steered Dataflows
R. Souza and M. Mattoso Provenance and Annotation of Data and Processes - International Provenance and Annotation Workshop (IPAW), 2018.
[C25] [doi] [pdf]
[bibtex]
@inproceedings{Souza2018Provenance, author = {Souza, Renan and Mattoso, Marta}, booktitle = {Provenance and Annotation of Data and Processes - International Provenance and Annotation Workshop (IPAW)}, doi = {10.1007/978-3-319-98379-0_2}, isbn = {978-3-319-98379-0}, pages = {16--29}, pdf = {https://www.researchgate.net/publication/327460259_Provenance_of_Dynamic_Adaptations_in_User-Steered_Dataflows_7th_International_Provenance_and_Annotation_Workshop_IPAW_2018_London_UK_July_9-10_2018_Proceedings}, publisher = {Springer International Publishing}, series = {Lecture Notes in Computer Science ({LNCS})}, title = {Provenance of Dynamic Adaptations in User-Steered Dataflows}, year = {2018} }
|
Ravel: A MAS orchestration platform for Human-Chatbots Conversations
M. de Bayser, C. Pinhanez, H. Candello, M. Affonso, M. Vasconcelos, M. Guerra, P. Cavalin, and R. Souza International Workshop on Engineering Multi-Agent Systems (EMAS@AAMAS 2018), 2018.
[C26] [pdf]
[bibtex]
@inproceedings{de2018ravel, author = {de Bayser, Maira Gatti and Pinhanez, Claudio and Candello, Heloisa and Affonso, Marisa and Vasconcelos, Mauro Pichiliani and Guerra, Melina Alberio and Cavalin, Paulo and Souza, Renan}, booktitle = {International Workshop on Engineering Multi-Agent Systems (EMAS@AAMAS 2018)}, pdf = {http://emas2018.dibris.unige.it/images/papers/EMAS18-19.pdf}, title = {Ravel: A MAS orchestration platform for Human-Chatbots Conversations}, year = {2018} }
|
Scientific Data Analysis Using Data-Intensive Scalable Computing: the SciDISC Project
P. Valduriez, M. Mattoso, R. Akbarinia, H. Borges, J. Camata, A. Coutinho, D. Gaspar, N. Lemus, J. Liu, H. Lustosa, F. Masseglia, F. Nogueira Da Silva, V. Silva, R. Souza, K. Ocaña, E. Ogasawara, D. Oliveira, E. Pacitti, F. Porto, and D. Shasha LADaS: Latin America Data Science Workshop, 2018.
[C27] [online] [pdf]
[bibtex]
@inproceedings{valduriez:lirmm-01867804, address = {Rio de Janeiro, Brazil}, author = {Valduriez, Patrick and Mattoso, Marta and Akbarinia, Reza and Borges, Heraldo and Camata, José and Coutinho, Alvaro L G A and Gaspar, Daniel and Lemus, Noel and Liu, Ji and Lustosa, Hermano and Masseglia, Florent and Nogueira Da Silva, Fabricio and Silva, Vitor and Souza, Renan and Ocaña, Kary and Ogasawara, Eduardo and Oliveira, Daniel and Pacitti, Esther and Porto, F{\'a}bio and Shasha, Dennis}, booktitle = {LADaS: Latin America Data Science Workshop}, keyword = {HPC ; Scalable Data-Intensive Computing ; Big data ; Scientific data}, link = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01867804}, pdf = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01867804/file/ldas%202018%20-%20scidisc.pdf}, publisher = {CEUR-WS.org}, series = {CEUR Workshop Proceedings}, title = {Scientific Data Analysis Using Data-Intensive Scalable Computing: the SciDISC Project}, volume = {2170}, year = {2018} }
|
Tracking of online parameter fine-tuning in scientific workflows
R. Souza, V. Silva, J. Camata, A. Coutinho, P. Valduriez, and M. Mattoso Workflows in Support of Large-Scale Science (WORKS) workshop co-located with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2017.
[C28] [online]
[bibtex]
@inproceedings{Souza2017Tracking, author = {Souza, Renan and Silva, Vítor and Camata, José and Coutinho, Alvaro and Valduriez, Patrick and Mattoso, Marta}, booktitle = {Workflows in Support of Large-Scale Science ({WORKS}) workshop co-located with the {ACM}/{IEEE} International Conference for High Performance Computing, Networking, Storage, and Analysis ({SC})}, link = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01620974}, location = {Denver, {CO}}, title = {Tracking of online parameter fine-tuning in scientific workflows}, year = {2017} }
|
Spark Scalability Analysis in a Scientific Workflow
R. Souza, V. Silva, P. Miranda, A. Lima, P. Valduriez, and M. Mattoso Brazilian Symposium on Databases (SBBD), 2017.
[C29] [pdf]
[bibtex]
@inproceedings{Souza2017Spark, author = {Souza, Renan and Silva, Vítor and Miranda, Pedro and Lima, Alexandre A B and Valduriez, Patrick and Mattoso, Marta}, booktitle = {Brazilian Symposium on Databases ({SBBD})}, pages = {288--293}, pdf = {http://sbbd.org.br/2017/wp-content/uploads/sites/3/2018/02/p288-293.pdf}, title = {Spark Scalability Analysis in a Scientific Workflow}, year = {2017} }
|
Parallel Execution of Workflows driven by Distributed Database Techniques
R. Souza MSc Thesis Contest: Brazilian Symposium on Databases (SBBD), 2017.
[C30] [pdf]
[bibtex]
@inproceedings{souza_2017_msc_contest, author = {Souza, Renan}, booktitle = {MSc Thesis Contest: Brazilian Symposium on Databases ({SBBD})}, pdf = {https://scholar.google.com.br/scholar?oi=bibs&cluster=17975638527136409759&btnI=1&hl=en}, title = {Parallel Execution of Workflows driven by Distributed Database Techniques}, year = {2017} }
|
Online Input Data Reduction in Scientific Workflows
R. Souza, V. Silva, A. Coutinho, P. Valduriez, and M. Mattoso Workflows in Support of Large-Scale Science (WORKS) workshop co-located with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2016.
[C31] [online]
[bibtex]
@inproceedings{Souza2016Online, author = {Souza, Renan and Silva, Vítor and Coutinho, Alvaro and Valduriez, Patrick and Mattoso, Marta}, booktitle = {Workflows in Support of Large-Scale Science ({WORKS}) workshop co-located with the {ACM}/{IEEE} International Conference for High Performance Computing, Networking, Storage, and Analysis ({SC})}, link = {https://hal.archives-ouvertes.fr/lirmm-01400538}, pages = {1--10}, title = {Online Input Data Reduction in Scientific Workflows}, year = {2016} }
|
Integrating Domain-data Steering with Code-profiling Tools to Debug Data-intensive Workflows
V. Silva, L. Neves, R. Souza, A. Coutinho, D. Oliveira, and M. Mattoso Workflows in Support of Large-Scale Science (WORKS) workshop co-located with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2016.
[C32]
[bibtex]
@inproceedings{Silva2016Integrating, author = {Silva, Vítor and Neves, Leonardo and Souza, Renan and Coutinho, Alvaro and Oliveira, Daniel De and Mattoso, Marta}, booktitle = {Workflows in Support of Large-Scale Science ({WORKS}) workshop co-located with the {ACM}/{IEEE} International Conference for High Performance Computing, Networking, Storage, and Analysis ({SC})}, keyword = {Provenance, Performance analysis, Scientific Workflow, Debugging}, location = {Salt Lake City, {USA}}, title = {Integrating Domain-data Steering with Code-profiling Tools to Debug Data-intensive Workflows}, year = {2016} }
|
Applying future Exascale HPC methodologies in the energy sector
J. Camata, J. Cela, D. Costa, A. Coutinho, D. Fernández-Galisteo, R. Souza, C. Jiménez, V. Kourdioumov, M. Mattoso, R. Mayo-García, T. Miras, J. Moríñigo, J. Navarro, D. de Oliveira, M. Rodríguez-Pascual, V. Silva, and P. Valduriez Russian Supercomputing Days, 2016.
[C33] [online] [pdf]
[bibtex]
@inproceedings{camata_applying_2016, author = {Camata, José J. and Cela, José M. and Costa, Danilo and Coutinho, Alvaro L. G. A. and Fernández-Galisteo, Daniel and Souza, Renan and Jiménez, Carmen and Kourdioumov, Vadim and Mattoso, Marta and Mayo-García, Rafael and Miras, Thomas and Moríñigo, José A. and Navarro, Jose and Oliveira, Daniel de and Rodríguez-Pascual, Manuel and Silva, Vítor and Valduriez, Patrick}, booktitle = {Russian Supercomputing Days}, keyword = {Algorithms and architectures for advanced scientific computing, Àrees temàtiques de la {UPC}::Energies, Biomass, Energy sources, Exascale, Hidrocarburs, {HPC}, Hydrocarbon, Hydrocarbon processing, Supercomputadors, Wind energy}, link = {https://upcommons.upc.edu/handle/2117/90905}, pages = {9--19}, pdf = {https://upcommons.upc.edu/bitstream/handle/2117/90905/Applying%20future%20Exascale%20HPC%20methodologies%20in%20the%20energy%20sector.pdf}, title = {Applying future Exascale {HPC} methodologies in the energy sector}, year = {2016} }
|
Building a question-answering corpus using social media and news articles
P. Cavalin, F. Figueiredo, M. de Bayser, L. Moyano, H. Candello, A. Appel, and R. Souza International Conference on Computational Processing of the Portuguese Language, 2016.
[C34]
[bibtex]
@inproceedings{cavalin2016building, author = {Cavalin, Paulo and Figueiredo, Flavio and de Bayser, Maíra and Moyano, Luis and Candello, Heloisa and Appel, Ana and Souza, Renan}, booktitle = {International Conference on Computational Processing of the Portuguese Language}, pages = {353--358}, title = {Building a question-answering corpus using social media and news articles}, year = {2016} }
|
Enhancing Energy Production with Exascale HPC Methods
J. Camata, J. Cela, D. Costa, A. Coutinho, D. Fernández-Galisteo, C. Jimenez, V. Kourdioumov, M. Mattoso, R. Mayo-García, T. Miras, J. Moríñigo, J. Navarro, P. Navaux, D. De Oliveira, M. Rodríguez-Pascual, V. Silva, R. Souza, and P. Valduriez CARLA: Latin American High Performance Computing Conference, 2016.
[C35] [doi] [online] [pdf]
[bibtex]
@inproceedings{camata:lirmm-01654914, address = {Mexico City, Mexico}, author = {Camata, Jos{\'e} and Cela, José M and Costa, Danilo and Coutinho, Alvaro L. G. A. and Fernández-Galisteo, Daniel and Jimenez, Carmen and Kourdioumov, Vadim and Mattoso, Marta and Mayo-García, Rafael and Miras, Thomas and Moríñigo, José A and Navarro, Jorge and Navaux, Philippe O A and De Oliveira, Daniel and Rodríguez-Pascual, Manuel and Silva, Vítor and Souza, Renan and Valduriez, Patrick}, booktitle = {CARLA: Latin American High Performance Computing Conference}, doi = {10.1007/978-3-319-57972-6_17}, link = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01654914}, pages = {233--246}, pdf = {https://hal-lirmm.ccsd.cnrs.fr/lirmm-01654914/file/Enhancing%20Energy%20Production%20with%20Exascale%20HPC.pdf}, publisher = {Springer}, series = {Communications in Computer and Information Science}, title = {Enhancing Energy Production with Exascale HPC Methods}, volume = {697}, year = {2016} }
|
Parallel Execution of Workflows Driven by a Distributed Database Management System
R. Souza, V. Silva, D. Oliveira, P. Valduriez, A. Lima, and M. Mattoso ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2015.
[C36] [online] [pdf]
[bibtex]
@inproceedings{Souza2015Parallel, author = {Souza, Renan and Silva, Vítor and Oliveira, Daniel and Valduriez, Patrick and Lima, Alexandre A. B. and Mattoso, Marta}, booktitle = {{ACM}/{IEEE} International Conference for High Performance Computing, Networking, Storage, and Analysis ({SC})}, link = {http://sc15.supercomputing.org/sites/all/themes/SC15images/tech_poster/tech_poster_pages/post284.html}, location = {Salt Lake City, {USA}}, pages = {1--3}, pdf = {http://sc15.supercomputing.org/sites/all/themes/SC15images/tech_poster/poster_files/post284s2-file3.pdf}, title = {Parallel Execution of Workflows Driven by a Distributed Database Management System}, year = {2015} }
|
Uma Abordagem para Publicação de Dados de Proveniência de Workflows Científicos na Web Semântica
R. Castro, R. Souza, V. Silva, K. Ocaña, D. Oliveira, and M. Mattoso Brazilian Symposium on Databases (SBBD), 2015.
[C37]
[bibtex]
@inproceedings{castro2015abordagem, author = {Castro, Rachel and Souza, Renan and Silva, Vítor and Ocaña, Kary and Oliveira, Daniel and Mattoso, Marta}, booktitle = {Brazilian Symposium on Databases ({SBBD})}, title = {Uma Abordagem para Publicação de Dados de Proveniência de Workflows Científicos na Web Semântica}, year = {2015} }
|
Applying data warehousing and big data techniques to analyze internet performance
T. Barbosa, R. Souza, S. Cruz, M. Campos, and R. Cottrell International Conference on Internet Applications, Protocols, and Services (NETAPPS), 2015.
[C38] [pdf]
[bibtex]
@inproceedings{barbosa2016applying, author = {Barbosa, TMS and Souza, Renan and Cruz, SMS and Campos, ML and Cottrell, R Les}, booktitle = {International Conference on Internet Applications, Protocols, and Services (NETAPPS)}, pdf = {https://www.slac.stanford.edu/pubs/slacpubs/16250/slac-pub-16464.pdf}, title = {Applying data warehousing and big data techniques to analyze internet performance}, year = {2015} }
|
Linked open data publication strategies: Application in networking performance measurement data
R. Souza, L. Cottrell, B. White, M. Campos, and M. Mattoso ASE BigData/SocialCom/CyberSecurity, Stanford, CA, 2014.
[C39] [pdf]
[bibtex]
@inproceedings{souza2014linked, author = {Souza, Renan and Cottrell, Les and White, Bebo and Campos, Maria L and Mattoso, Marta}, booktitle = {ASE BigData/SocialCom/CyberSecurity, Stanford, CA}, pdf = {https://www.slac.stanford.edu/cgi-bin/getdoc/slac-pub-15950.pdf}, title = {Linked open data publication strategies: Application in networking performance measurement data}, year = {2014} }
|
Last updated on 2024-09-16.