Applied Scientist & Sr. Software Engineer | Data & AI Platforms for Edge-Cloud-HPC

Summary

Tech lead, senior software engineer, and research scientist building intelligent data and AI platforms that accelerate scientific discovery. With 15+ years at IBM, ORNL, SLAC National Accelerator Laboratory, and the Federal University of Rio de Janeiro (UFRJ), I foster user-centric system design by keeping experts in the development loop and rapidly translating abstract requirements from domains such as energy, chemistry, biology, and climate into production-grade systems that are easy to operate, maintain, and scale. My research focuses on highly scalable, low-latency, observable, provenance- and metadata-first architectures. These architectures facilitate comprehensive data analysis across heterogeneous infrastructure, bridging edge instruments, cloud clusters, and leadership-class supercomputers, and they support data integration across SQL and NoSQL databases, knowledge graphs, messaging and streaming systems, and parallel file systems. My current focus includes AI and machine learning, LLM-driven workflows, and agentic systems. I have authored 50+ papers, received best thesis and paper awards, and hold 10+ United States Patent and Trademark Office (USPTO) patents. I review for major venues including IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE Big Data, IEEE eScience, Future Generation Computer Systems (FGCS), the Very Large Data Bases (VLDB) Journal, and ACM/IEEE Supercomputing.

Areas of Expertise

  • AI/ML, LLM-driven, and Agentic workflows
  • Edge-Cloud-HPC computing
  • Provenance-driven data analysis, lineage, and observability
  • Scalable data engineering (SQL, NoSQL, KGs, Streaming, Parallel Data Processing)

Education

Federal University of Rio de Janeiro, Brazil

    Ph.D. in Computer Science | Sep 2015 — Dec 2019

        Thesis: Supporting User Steering in Large-scale Workflows with Provenance Data

    M.Sc. in Computer Science | Jan 2013 — Jul 2015

        Thesis: Controlling the Parallel Execution of Workflows Relying on a Distributed Database

    B.Sc. in Computer Science | Jan 2009 — Dec 2012

        Thesis: Linked Open Data Publication Strategies: An Application in Network Performance Data

    International experience:

        Visiting Ph.D. Student - Inria/Univ. Montpellier, France

        Computer Science exchange student - Missouri State University, U.S.

Experience

Oak Ridge National Laboratory Oct 2022 — Present

Staff Scientist & Sr. Software Engineer, HPC Workflows, Data & AI | Knoxville, USA

  • Leading R&D on workflow provenance and observability for AI-driven science, focusing on transparency, reliability, and reproducibility in end-to-end workflows.

  • Designing and developing provenance models and open source systems (e.g., Flowcept) to connect user intent, agent decisions, workflow executions, and downstream results in unified traces.

  • Validated and applied these methods through high-profile projects in additive manufacturing, electron microscopy, and advanced biological analysis across Edge-Cloud-HPC environments.

  • Published and presented results in HPC and eScience venues, and drove community engagement through tutorials and reference architectures.

IBM Research Apr 2015 — Oct 2022

Staff Scientist & Sr. Software Engineer, Cloud, Data & AI | Rio de Janeiro, Brazil

  • Led applied R&D on hybrid cloud and HPC data platforms for AI systems, advancing scalable architectures on Kubernetes and OpenShift for distributed, enterprise-grade workloads.

  • Developed and validated knowledge graph-centric approaches for large-scale data integration, lineage, and governance across heterogeneous and distributed data stores and AI pipelines.

  • Partnered closely with internal global teams and major external clients, particularly in the Energy sector, to translate research into deployable systems adopted in production.

  • Produced sustained research impact through peer-reviewed publications and 10+ USPTO patents spanning provenance, polystores, AI lifecycle management, and hybrid cloud systems.

SLAC National Accelerator Laboratory May 2013 — Dec 2014

Research Software Engineering Intern | Menlo Park, USA

  • Applied semantic web and scalable data management methods to publish structured measurement data for broad community use.

Federal University of Rio de Janeiro Jan 2010 — Sep 2014

Software Engineer (Intern → Engineer) | Rio de Janeiro, Brazil

  • Led applied research on semantic web and linked open data systems, translating ontology-based models into user-facing production platforms for public-sector information access.

  • Developed data warehousing approaches for integrating structured and unstructured data to support big data analytics, reporting, and information discovery across heterogeneous sources.

Petrobras May 2007 — May 2008

IT Intern | Rio de Janeiro, Brazil

  • Early industry experience in software development and user support.

Technical Knowledge

  • Programming Languages: Python, Java, C, C++, C#, Shell, Node.js, Scala, Lua

  • Data Science & ML: PyTorch, MLflow, Airflow, Pandas, Polars, Jupyter, Matplotlib, Plotly

  • Agentic AI: MCP, LangChain, CrewAI, Streamlit, Chainlit, RAG, LLM-based orchestration

  • Big Data, Streaming, and Messaging: Spark, Dask, Parsl, Kafka, Redis, RabbitMQ

  • Databases & Data Lakes: PostgreSQL, MySQL Cluster, MongoDB, Elasticsearch, HBase, Hive, Redis, LMDB; object storage, polystores, data lakes, data warehouses, and data lakehouses

  • Knowledge Graphs: AllegroGraph, Jena, Virtuoso, RDF, SPARQL, OWL

  • Parallel & Distributed Programming: MPI, OpenMP, CUDA, PubSub

  • Cloud, HPC, DevOps: Kubernetes, OpenShift; Slurm, LSF; NVIDIA/AMD GPU Profiling; Prometheus, Grafana

Selected Projects

Orchestrated Platform for Autonomous Laboratories (OPAL)

OPAL (FAMOUS) advances autonomous science across multiple laboratories using AI agents, robotics, and automation, enabling HPC-scale, human-in-the-loop discovery workflows.

American Science Cloud (AmSC)

AmSC is a core pillar of the DOE Genesis Mission, delivering a secure, federated platform for AI-driven science across national laboratories. At ORNL, work within the Intelligent Interface team focuses on shaping agentic AI workflows that integrate data, compute, and facilities for scalable, reusable, mission-aligned discovery.

Advanced Manufacturing into Leadership-class Supercomputers via AI Agents

A core AmSC use case demonstrating agentic AI integration between advanced manufacturing facilities and leadership-class supercomputers. I provide technical leadership in defining the end-to-end architecture and translating the scientific vision into an operational platform, including multi-agent communication, provenance-aware infrastructure, and dynamic steering across facilities.

Flowcept

Flowcept is a provenance platform that captures runtime data with low overhead and links tasks, lineage, telemetry, and AI-agent interactions into end-to-end traces for accountability and reproducibility. I created and lead the platform, which underpins multiple DOE initiatives, such as OPAL (BER/ASCR) and broader Autonomous Science (ASCR-ACT), as well as research on provenance for agentic workflows.

ProvLake

ProvLake is a knowledge-graph-driven data lineage and management platform for hybrid data lakes spanning SQL databases, NoSQL stores, cloud object storage, and HPC file systems. It captures and integrates fine-grained data relationships across distributed workflows and services into a unified, semantically rich provenance knowledge graph, enabling cross-store querying, explainability, and governance of complex AI and scientific pipelines. I created and led ProvLake as a foundational platform adopted across multiple IBM Research programs, supporting large-scale industry and internally funded research initiatives.

Selected Programmatic Leadership, Contributions, and Artifacts

OPAL - Orchestrated Platform for Autonomous Laboratories (FAMOUS) 2025 — Present

  • Role: Technical leadership for agentic AI and cross-facility workflow architecture

  • Scope and contributions: Designing, implementing, and deploying the Agentic AI platform, which translates expert intent into interactive HPC execution and multimodal analytics. The platform uses Flowcept for agentic provenance and integrates ORNL’s APPL experimental workflows with OLCF Frontier, in collaboration with biologists, AI researchers, and software engineers.

  • Validation and impact: Demonstrated and reviewed with ORNL and ANL senior leadership in biological systems; endorsed by DOE leadership; publicly highlighted by the DOE Undersecretary; used within OPAL and informing Genesis-aligned autonomous laboratory efforts.

American Science Cloud (AmSC) 2025 — Present

  • Role: Technical contributor shaping agentic AI workflow frameworks

  • Scope and contributions: Contributing to the early design of agentic AI workflow patterns to integrate data, compute, facilities, and AI agents across national laboratories, working with multidisciplinary teams of domain scientists, computational scientists, and engineers as the platform evolves.

  • Validation and impact: Contributions inform ongoing architectural discussions within AmSC and related Genesis-aligned efforts (e.g., ModCon, FAMOUS).

Additive Manufacturing Agentic Workflow - MDF / AmSC Use Case 2025 — Present

  • Role: Technical leadership for agentic AI and cross-facility workflow architecture

  • Scope and contributions: Acted as technical leader driving architectural convergence and resolving system design deadlocks within a multidisciplinary team of 10+ contributors, delivering an agentic cross-facility workflow connecting advanced manufacturing at MDF with the OLCF ACE Testbed. Established Flowcept-enabled provenance for agentic safety, transparency, and end-to-end workflow traceability, enabling agent-to-agent and agent-to-compute coordination with dynamic steering.

  • Validation and impact: Established as a reusable architectural reference within AmSC and adopted as a core use case, positioning ORNL as a leader across DOE labs in cross-facility agentic workflows. Recognized by senior DOE leadership as a breakthrough example of AI agents leveraging leadership-class computing, with early publications at Supercomputing workshops and eScience. Supported AmSC infrastructure deployment and DOE booth demonstrations at Supercomputing’25, ensuring successful presentation to external audiences.

ModCon - Transformational AI Models Consortium 2025 — Present

  • Role: Technical collaborator supporting Genesis-aligned platform direction

  • Scope and contributions: Collaborating with assigned ModCon team members by sharing architectural patterns, lessons learned, and implementation guidance from agentic, cross-facility workflow and provenance work, contributing to early design discussions within multidisciplinary teams.

  • Validation and impact: Contributions inform ongoing technical discussions with senior ANL researchers and early prototypes.

INTERSECT LDRD: Multi-workflow Orchestration and Integrated Data Analysis Across Facilities 2024 — 2025

  • Role: Principal Investigator

  • Scope and contributions: Led a program on multi-workflow orchestration and data analysis across facilities, coordinating teams of 10+ scientists, computer scientists, and engineers across the NCCS and CSM divisions to orchestrate AI workflows from ORNL MDF to the OLCF ACE Testbed and from ORNL CNMS to OLCF Summit, and established Flowcept as a provenance foundation through partnerships across ORNL and ANL.

  • Validation and impact: Managed $1M+ in funding; delivered cross-facility workflow demonstrations that were presented to senior DOE laboratory personnel and published in peer-reviewed venues; produced reusable orchestration and provenance artifacts (open source software and system architectures) that informed subsequent DOE efforts.

IBM Research - Knowledge-Centric Systems (Internal Strategic Programs) 2020 — 2022

  • Role: Technical leadership for platform architecture, integration, and reuse

  • Scope and contributions: Led platform integration for the AI Workbench, enabling knowledge-centric scientific discovery across global IBM Research teams. Led provenance and knowledge management for context-aware platform reconfiguration, supporting hybrid cloud-HPC execution in interactive notebooks, and established ProvLake as a core shared platform adopted across IBM Research labs.

  • Validation and impact: Peer-reviewed publications, executive-level demonstrations, cross-lab adoption, and a portfolio of patents supporting hybrid cloud-HPC platforms.

IBM Research - Oil & Gas AI Client Programs (Galp, Shell, ExxonMobil) 2018 — 2020

  • Role: Technical leadership for data lakes and provenance platforms

  • Scope and contributions: Led the design and delivery of reusable data lake and provenance platforms supporting multiple concurrent AI programs for subsurface exploration, coordinating with 20+ researchers and engineers across IBM Research and industry-funded engagements. This work initiated the ProvLake research and development effort, which later evolved into a shared provenance platform adopted across programs and IBM Research labs worldwide.

  • Validation and impact: Multi-year client adoption, executive-level demonstrations, media coverage, productization of platform components, peer-reviewed publications in scientific and industrial venues, and granted patents.

IBM Research - Conversational AI Platform (Pre-LLM) 2016 — 2018

  • Role: Technical leadership for scalability, deployment, and operations

  • Scope and contributions: Led architectural design, DevOps, and scalability for a conversational AI platform that coordinates user interaction with multiple bots in the same chat and serves many concurrent users, coordinating engineers and researchers to achieve high availability and efficiency on cloud infrastructure.

  • Validation and impact: Industry client demonstrations, peer-reviewed publications in AI venues, a highly cited (100+) patent, and successful scaling to thousands of concurrent users.

Selected Institutional Recognition

  • ORNL performance award for top performers (2025).

  • IBM Patent Plateaus for high-impact software innovations (8+ USPTO patents) (2020, 2021).

Selected Academic Recognition

  • 2025, Distinguished Paper Award, WORKS @ IEEE/ACM Supercomputing — The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science

  • 2021, Runner-up (2nd Place) - Best Ph.D. Thesis — User Steering Support in Large-scale Workflows

  • 2017, Honorable Mention - Best Paper — Spark Scalability Analysis in a Scientific Workflow

  • 2015, Best M.Sc. Thesis Award — Parallel Execution of Workflows driven by Distributed Database Techniques

Media Highlights

Galp-IBM AI Platform for Seismic Interpretation

Led workflow management for AI and knowledge engineering on an AI-assisted seismic interpretation platform developed at IBM Research and deployed in production with Galp. The system integrated machine learning, visual analytics, and domain knowledge to accelerate geological decision-making on large-scale seismic data.

AI4Seismic: AI Platform for Geological Discovery

Highlighted by SIAM News, this presentation introduced AI4Seismic, an end-to-end AI platform for seismic analysis that captures expert geological knowledge and integrates ML, provenance, and reproducible workflows to accelerate energy-critical geological discovery.

OPAL: AI-Assisted Biological Discovery with Frontier

ORNL Communications highlighted the OPAL project’s Agentic AI platform as an early, high-impact result of integrating laboratory instruments with the Frontier exascale supercomputer to enable autonomous, human-in-the-loop biological discovery under the DOE Genesis Mission.

DOE Genesis Mission Platform Demonstration

Work referenced in U.S. Congressional testimony by the DOE Undersecretary for Science, highlighting early Genesis Mission milestones, including AI-enabled workflows that autonomously coordinate experiments, HPC execution, and analysis across national laboratories.

Grants and Fellowships

CAPES International Science Grant (2012-2013)

Competitive national fellowship supporting international research exchange, enabling an academic placement at Missouri State University and a research internship at SLAC National Accelerator Laboratory.

CAPES Master’s Scholarship (2013-2015)

Nationally funded graduate research scholarship.

Scientific Community Service

Chair and Editor

  • Frontiers in High Performance Computing - Editorial Board

  • IEEE International Conference on e-Science (eScience’23) - Session Chair

  • Brazilian Symposium on Databases (SBBD’20) - Session Chair

Technical Program Committee

  • International Workshop on AI Principles in Science Communication (AISC’25)

  • Int. Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’25, 26)

  • Workflows in Distributed Environments (WiDE’24)

  • IEEE/ACM Supercomputing (SC’24)

  • IEEE International Conference on e-Science (eScience’23)

  • Workflows in Support of Large-Scale Science (WORKS at IEEE/ACM Supercomputing’20, 21, 23, 24, 25)

  • Brazilian Workshop on Database and Artificial Intelligence Integration

  • Brazilian Symposium on Databases (SBBD’20, 23, 24, 25, 26)

  • Brazilian e-Science (BreSci’26)

  • Innovation Summit on Information Systems (at SBSI’19, 20)

Journal Reviewer

  • IEEE Transactions on Parallel and Distributed Systems

  • Future Generation Computer Systems

  • Concurrency Computation Practice and Experience

  • Journal of Parallel and Distributed Computing

  • The Very Large Data Bases (VLDB) Journal

  • IEEE Transactions on Big Data

  • Journal of Cloud Computing

  • Computer Physics Communications

  • Discover Data

  • Frontiers in High Performance Computing

Teaching and Supervisions

Courses

  • Databases Laboratory (UFRJ), 2017. Teaching assistant to Prof. Marta Mattoso

  • Semantic Web (UFRJ), 2013. Teaching assistant to Prof. Maria Luiza Machado Campos

  • Logics for Computer Science (UFRJ), 2012-2013. Teaching assistant to Prof. Mario Benevides

  • Metadata Management (UFRJ), 2011. Teaching assistant to Prof. Adriana Vivacqua

Supervisions

  • Pedro Paiva Miranda: A Mechanism for Fault Tolerance in Parallel Executions of Workflows supported by a Database. Undergraduate (UFRJ), 2015.

  • Rachel de Castro: Publication of Workflow Provenance Data in the Semantic Web. Undergraduate (UFRJ), 2015.

Languages

  • English: Full professional proficiency

  • Portuguese: Native

  • Spanish: Fluent reading; intermediate speaking and listening

Publications and Events

For all publications and patents, visit renansouza.org/publications.

For talks and presentations, visit renansouza.org/events.