eggduzao/README.md


    Machine Learning & Data Systems at the intersection of Biology, Medicine, and Software Engineering

    Focused on end-to-end ML systems: from messy data -> reliable models -> long-lived production

    Strong emphasis on reproducibility, auditability, and systems that age well

    Experience spanning ML pipelines, data engineering, cloud/HPC, and bioinformatics


Hello, and welcome to my profile. My name is Eduardo. Grab a cup of coffee and allow me to introduce myself.

I build Machine Learning & Data Systems where Biology, Medicine, and Software Engineering meet
(and occasionally argue, but that’s fine: I’m a trained diplomat).

My work focuses on end-to-end ML systems. In practice, that means taking messy real-world data, ingesting it (a fancy name for ETL), and serving it hot and fresh for:

  • Large-scale processing: things need to flow smoothly.
  • Model training: using math to reward or punish learning machines
    (I once broke a gradient in an RNN’s head, but it was already vanishing).
  • Deployment: when ideas meet reality and still have to behave.

I care deeply about reliability, clarity of design, and systems that age well, like a good Madeira wine.

I have experience designing modular data pipelines, scalable data engineering architectures, cloud-orchestrated systems, and ML workflows using Python-centric stacks and modern deep learning frameworks. This usually translates to:

  • Production ML and data pipelines with strong reproducibility and auditability
  • Scalable processing for high-volume analytical workloads
  • Feature engineering layers serving both training and inference
  • Bioinformatics workflows integrated with HPC and cloud environments

I often work at the boundary between scientific complexity and engineering constraints, translating domains such as
Chromatin Biology, Cancer Immunology, Gene Regulation, Microscopy, Spatial Transcriptomics, and Precision Medicine into clear, testable, and auditable computational systems.

I value clean design, explicit trade-offs, and systems that are understandable by humans, not just machines.
Ethics, reproducibility, and long-term sustainability are not optional; they are part of the job.

Currently open to on-site or hybrid roles and long-term projects.
Relocation and onboarding take planning: good systems (and good moves) benefit from doing things properly.

Cheers.


    2024 | Senior | Full-time Senior Machine Learning & Bioinformatics Researcher | Germany

    2022 | Industry | Career shift to industry | Turku Biosciences | Finland

    2020 | Patent | LAG3-Targeting Cancer Therapy | Current owner: Bristol Myers Squibb

    2018 | PhD | Precision Medicine & Machine Learning | Harvard University | Summa Cum Laude

    2015 | PhD | Deep Learning & Bioinformatics | RWTH Aachen University | Summa Cum Laude


KEY MILESTONES

  • Machine Learning Engineer

  • Bioinformatics Researcher

  • Diplomat between medicine and AI

    • From personalized training to Platform Courses
    • Mentored 25+ MLOps Engineers
    • Mentored 15+ bioinformatics researchers
  • Currently: Cloud & MLOps

    • Developing efficient cloud-based ecosystems
    • Managing 8 bioinformatics and 12 MLOps personnel
    • Filed 2 patents and improved operating margin by ~18%

WRITER AND EDUCATOR


Name: Eduardo G Gusmao
Role: Senior Machine Learning Researcher | Applied Scientist
Contact: eduardo@gusmaolab.org | English, Portuguese
Education: 2 x PhD | Machine Learning & Precision Medicine
Research_Profile: 5+ years of experience in Translational & Production-Aware Method Development

Development_Environment:
  Infrastructure: AWS | HPC | GCP
  Languages: Python | SQL | C/C++ | R | Bash/Shell
  MLStack: PyTorch | TensorFlow | JAX | Spark | Grafana
  DataStack: PostgreSQL | MongoDB | Pinecone | REST/GraphQL | Pandas
  SysOps: (Micro)Mamba | Docker | Kubernetes | GH Actions | Prometheus
Name: Eduardo G Gusmao
Role: Senior Machine Learning Researcher | Applied Scientist
Contact: eduardo@gusmaolab.org | English, Portuguese, German, Spanish

Education:
  - PhD: 2017 | Harvard University & RWTH Aachen University | Machine Learning & Precision Medicine | Summa Cum Laude
  - BSc_MSc: 2013 | Federal University of PE | Computer Science & Machine Learning | GPA 3.96/4.00
Core_Expertise:
  - "Machine Learning"
  - "Deep Learning"
  - "Statistical Modeling"
  - "Bioinformatics"
Research_Profile:
  - "Method Development"
  - "Translational Modeling"
  - "Production-Aware Research"

Development_Environment:
  Hardware: ["AMD", "ARM", "NVIDIA", "Intel"]
  OS: ["macOS", "Ubuntu", "Debian", "Fedora", "Windows"]
  Infrastructure:
    - "Bare-Metal Servers"
    - "VMs"
    - Cloud_Computing: ["AWS", "GCP", "Azure"]
    - HPC_Paradigm: ["Slurm", "OpenPBS", "MPI"]
    - Infra_as_Code: ["Terraform", "CloudFormation", "Pulumi"]

  Languages:
    - Multi_Paradigm: ["Python", "C/C++", "R", "Bash/Shell", "Julia", "Go", "Rust", "Kotlin"]
    - Web_OO: ["TypeScript", "Java", "JavaScript", "C#", "Ruby", "PHP"]
    - Markup: ["YAML", "Quarto", "LaTeX", "HTML/CSS/Markdown"]
    - Declarative: ["SQL", "HCL"]
  Runtimes: ["CPython", "JVM", "Node.js"]

  ML_Stack:
    - Frameworks: ["PyTorch", "JAX", "TensorFlow", "Keras", "Hugging Face", "NLTK", "Scikit-Learn"]
    - Engines: ["Spark", "Ray", "TensorRT"]
    - Models: ["Generative Models", "Variational Inference", "Graph Neural Networks", "Attention Hypergraph"]

  Data_Stack:
    - Databases:
      - SQL: ["PostgreSQL", "MySQL"]
      - NoSQL: ["MongoDB", "ArangoDB"]
      - Vector: ["Pinecone", "FAISS"]
    - APIs: ["REST", "GraphQL", "gRPC"]
    - Data_Software: ["Power BI", "Microsoft Suite", "HDF5/Parquet/Zarr"]
    - Data_Tools:
      - Basic: ["Pandas", "NumPy", "SciPy", "Bioconductor", "PySAM"]
      - Big_Data: ["Dask", "Polars"]
      - Specialized: ["PyCaret", "OpenCV"]

  Web_Stack:
    - Frameworks: ["Django", "React", "Next.js", "Express.js"]
    - Dashboards: ["Dash", "Streamlit", "Gradio"]

  Systems:
    - Version_Control: ["Git", "GitHub"]
    - Packaging: ["pip", "(micro)mamba", "(mini)conda", "poetry", "npm"]
    - Containers: ["Docker", "Singularity", "Podman"]
    - Orchestration: ["Kubernetes", "Helm"]
    - CI_CD: ["GitHub Actions", "Jenkins", "GitLab CI"]
    - Observability: ["Prometheus", "Grafana"]

    Email | eduardo@gusmaolab.org

    LinkedIn | https://www.linkedin.com/eduardogade

    Location | Recife, Brazil | Freising, Germany | Remote-friendly

    Status | Open to senior ML / Bioinformatics roles


Professional Profiles

    Website & Blog: https://www.gusmaolab.org

    One-Page Resume: https://www.gusmaolab.org/Gusmao-EG-CV.pdf

    Stack Overflow: https://stackoverflow.com/users/32223943/eduardo-gusmao

    Medium: https://medium.com/@eduardogade

    Dev.to: https://dev.to/eduardogade

    ORCID: https://orcid.org/my-orcid?orcid=0000-0001-7461-1443

    ResearchGate: https://www.researchgate.net/home

    Google Scholar: https://scholar.google.com/citations?user=erHz7L8AAAAJ&hl=en

Practical notes

    Preferred contact: Email

    Response time: 1–2 business days

    Open to remote, hybrid, or relocation

Details

    See [availability & engagement details](#availability)

    See [writing & communication details](#communication)

    See [education](#education) & [leadership details](#career)



Apple | Python | PyTorch | GATK | Git | Snakemake | Gradio | Docker | AWS | Jira | Linux | R | TensorFlow | Bioconductor | GitHub | Nextflow | FastAPI | Kubernetes | Terraform | Grafana | VS Code | Bash | JAX | Ruff | GitHub Actions | Mamba/Conda | PostgreSQL | Redis | Databricks | Prometheus

Flagship: 🏳️‍⚧️ | 🏳️‍🌈 | 🇺🇳




🚀 "If you ever change your mind about leaving it all behind, remember. Remember. No Geography." 🚀

Designed & Built • Eduardo Gusmao • 2025

Popular repositories

  1. eggduzao.github.io Public

    Host Code of Eduardo G Gusmao's Lab Website Portfolio and Blogging

    CSS 1

  2. gusmaolab Public

    Host Code of Eduardo G Gusmao's Lab Website Portfolio and Blogging

    HTML 1 1

  3. Apollo Public

    A unified suite of post-hoc statistical procedures with bias-aware corrections designed for metrics common in computational and ML/DL pipelines

    Python

  4. Uqbar Public

    Uqbar (Ubiquitously Broad Automation and Architecture) is a collection of useful tools for small task automation.

    Python

  5. eggduzao Public

    Personal customization repository and personal cookiecutter

    Python