Summary
Tech lead, senior software engineer, and researcher building intelligent data and AI platforms that accelerate discovery. With 15+ years of experience at IBM, ORNL, SLAC, and UFRJ, I translate domain expertise into scalable, production-grade systems spanning edge, cloud, and leadership-class supercomputers. My work centers on highly scalable, low-latency, observable, provenance- and metadata-first architectures that integrate heterogeneous data systems to enable reliable, reproducible, and explainable large-scale agentic and AI workflows.
Areas of Expertise
- AI/ML, LLM-driven, and agentic workflows
- Edge-Cloud-HPC Computing
- Provenance-driven data analysis, lineage, and observability
- Scalable data engineering (SQL, NoSQL, KGs, Streaming, Parallel File Systems)
Education
Federal University of Rio de Janeiro, Brazil
Ph.D. in Computer Science | Sep 2015 — Dec 2019
M.Sc. in Computer Science | Jan 2013 — Jul 2015
B.Sc. in Computer Science | Jan 2009 — Dec 2012
Experience
Oak Ridge National Laboratory Oct 2022 — Present
Staff Scientist & Sr. Software Engineer, HPC Workflows, Data & AI | Knoxville, USA
- Led R&D on workflow provenance and observability for AI-driven science, focusing on transparency, reliability, and reproducibility in end-to-end workflows.
- Developed provenance models and system mechanisms to connect user intent, agent decisions, workflow executions, and downstream results in unified traces.
- Validated methods through real deployments spanning Edge-Cloud-HPC environments and interactive workflows that require low-latency, auditable responses.
- Published and presented results in workflow and eScience venues, and drove community engagement through tutorials and reference architectures.
IBM Research Apr 2015 — Oct 2022
Staff Scientist & Sr. Software Engineer, Cloud, Data & AI | Rio de Janeiro, Brazil
- Conducted applied R&D in data management and AI systems, producing peer-reviewed publications and patents.
- Explored scalable data services and governance-aware approaches that support AI lifecycle requirements in enterprise contexts.
- Collaborated internationally across research and engineering teams to validate ideas in real deployments and user scenarios.
SLAC National Accelerator Laboratory, Stanford University May 2013 — Dec 2014
Research Software Engineering Intern | Menlo Park, USA
- Applied semantic web and scalable data management methods to publish structured measurement data for broad community use.
Federal University of Rio de Janeiro Jan 2010 — Sep 2014
Software Engineer (Intern → Engineer) | Rio de Janeiro, Brazil
- Built applied semantic web and linked data solutions, grounding research ideas in real systems and user needs.
- Integrated heterogeneous data sources for analytics and reporting in early applied work.
Petrobras May 2007 — May 2008
IT Intern | Rio de Janeiro, Brazil
- Gained early industry experience in software delivery and operations.
Selected Ongoing Projects
American Science Cloud (AmSC)
A secure, federated cloud environment integrating DOE facilities and data to enable AI-ready datasets and scalable model services. I develop multi-agent methods and their implementations for ORNL use cases involving cross-facility science workflows.
Orchestrated Platform for Autonomous Laboratories (OPAL)
A DOE multi-lab initiative to make biological discovery self-driving using AI, robotics, and automated experimentation. I lead the development of agentic AI systems that connect interactive user intent to Frontier-scale execution and multimodal analysis.
Advanced Manufacturing into Leadership-class Supercomputers via AI Agents
A modular architecture for autonomous cross-facility experimentation using AI agents, programmable facility APIs, and provenance-aware workflows. I led the multi-agent communication and workflow steering components.
Flowcept
A provenance system for traceable, auditable, and reproducible agentic workflows that unifies runtime signals, lineage, and agent interactions. I created and lead Flowcept, and I am currently advancing its agentic provenance capabilities.
Technical Knowledge
- Programming Languages: Python, Java, C, C++, C#, Shell, Node.js, Scala, Lua
- Data Science/ML: PyTorch, MLflow, Airflow, Pandas, Polars, Jupyter, Matplotlib, Plotly
- Agentic AI: MCP, CrewAI, LangChain, Streamlit, Chainlit, RAG, LLM-based orchestration
- Big Data & Streaming Platforms: Apache Spark, Dask, Kafka, Redis, RabbitMQ
- Parallel & Distributed Programming: MPI, OpenMP, CUDA, Pub/Sub
- Cloud, HPC, DevOps: Kubernetes, OpenShift, Docker, Slurm, LSF, PBS; CI/CD: GitHub Actions, Jenkins, Travis CI; Grafana
- Databases & Knowledge Graphs: PostgreSQL/PostGIS, MySQL, MongoDB, Elasticsearch, HBase, Hive, Redis, LMDB, Polystores; AllegroGraph, Jena, Virtuoso, RDF, SPARQL, OWL
Publications and Events
For all publications and patents, visit renansouza.org/publications.
For all event participation, visit renansouza.org/events.