Learning Resources

Legend:

- Programming Language

- Biostatistics

- Computational Modeling

- Bioinformatics

- Machine Learning

Einstein Courses

  1. Introduction to Systems Biology: An introduction to research areas in systems and computational biology, covering the basic concepts and detailed methodologies in these areas and how to use these methods to design experiments in systems and computational biology. Teaches basic Python and R programming skills with a focus on computational modeling algorithms and applications.
  2. Computational Biology of Proteins: A systematic introduction to protein bioinformatics: the major techniques, algorithms, and tools used for sequence alignment, classification, secondary and tertiary structure prediction, modeling, conformational sampling, energy functions, prediction of various functional and structural features of proteins, docking, and more. The course also includes an introduction to Python programming with practical applications in bioinformatics.
  3. Quantitative Skills for the Biomedical Researcher (QSBR I, II, III, IV): Basic concepts of statistical inference, regression analysis, big data analysis, and statistical genetics. R is used throughout the courses for statistical analysis.
    • QSBR I-II: Fundamental concepts of biostatistics, applications of basic methods, and statistical inference; managing the R workspace; accessing R packages; R functions for statistical data analysis, with examples; writing R programs for simulations; basic R graphics.
    • QSBR III-IV: Basic computational skills and knowledge of R are prerequisites for both courses. Several Bioconductor packages are used to analyze genomics data and for genetic epidemiology studies. Students will also learn how to manipulate large data files efficiently.
  4. Modern Artificial Intelligence for Biomedical Research: Exposure to modern methods in Artificial Intelligence (AI), particularly focusing on AI for biomedical research, through critical reading and discussion of selected papers, and associated coding tutorials.
  5. Introduction to the Mathematics of Theoretical Systems Biology: An introduction to the mathematical topics necessary to conduct theoretical and computational systems biology. Topics covered are stochastic processes, dynamical systems, modeling using ODEs and PDEs, and the basics of machine learning.
  6. Data Analysis (in Neurosciences): Targeted to the needs of Neuroscience graduate students, this course complements and expands on existing mathematically based instruction. It offers practical, “plain English” explanations that provide the skills needed to apply and interpret statistical concepts appropriately.
  7. Systems Biology Seminar: It has long been recognized that scientific breakthroughs and groundbreaking research in the coming century require multidisciplinary approaches across many areas of research. Through critical reading of classical and contemporary articles, the course covers a broad range of relevant techniques from the mathematical, statistical, and computational sciences and relates them to the specific scientific questions in each article discussed. The course covers 26 articles on biological questions that have been addressed both theoretically and experimentally. These articles span a broad range of biological topics, including molecular biology, evolutionary biology, genomics, and neuroscience.

A more detailed description of Einstein courses and schedules can be found here.

Reading Groups

  1. Recent Advances in Machine Learning Reading Group: This monthly reading group discusses recent publications and techniques in machine learning. It offers the opportunity to discover new applications of machine learning and to learn the techniques that make such advances possible. Some mathematical and computational details are covered, in addition to higher-level conceptual issues. Meetings typically last 1 to 1.5 hours and include a slide presentation, questions, and discussion. Everyone is welcome to attend and join the interactive discussion. Since the group started in 2017, students, postdocs, and faculty from 11 departments across Einstein have attended. Check the webpage for the schedule and for information about signing up to present at a meeting.
  2. Recent Advances in Microbiome Science Reading Group: This monthly reading group covers topics related to the microbiome. These include RNA-sequencing methods, data visualization techniques, translational applications of basic microbiome research, and novel approaches to microbiome analysis and study design. Discussions cover mathematical and computational details, but the major goals of this reading group are to 1) discuss higher-level conceptual issues and 2) bring together Einstein researchers who are interested in or currently working on microbiome-related projects. Meetings typically last 1 to 1.5 hours and include a slide presentation, questions, and discussion. Everyone is welcome to attend and join the interactive discussion. Check the webpage for the schedule and for information about signing up to present at a meeting.

Online Resources

Beginner R Online Resources:

  1. DataCamp; Module: “R Intro to Basics”: Teaches the basics of the R programming language through the DataCamp website interface. Best for the most novice coders (the Intro to Basics module is free; other modules are $25/month) (link to course here).
  2. Swirl; Modules: “R Programming” and “R Programming Environment”: Teaches basic to advanced R programming skills through interactive learning with integrated Coursera materials (free). This course is perhaps the best one-stop shop for learning R, and it shows you how to install R and RStudio on your own computer. The two beginner modules are “R Programming” and “The R Programming Environment”. The more advanced modules are listed below under “Specialized R Online Resources” (link to course here).
  3. Harvard EdX; “Data Science: R Basics”: Teaches basic R programming skills on your own computer. This is the most complete and extensive introductory R course listed here, and it mostly uses a learn-by-watching method (free audit; $219 for verified track as of Jan 2024). Because the course is so thorough, completing all of it requires a significant time investment. It consists of non-interactive videos, readings, and tutorials, with periodic assessments to track your progress. Auditing the course is free, and you can download the free PDF textbook. The textbook is a good reference if you get stuck in any other programming course. The course covers four sections: 1) R Basics, Functions, and Data Types; 2) Vectors, Sorting; 3) Indexing, Data Wrangling, Plots; 4) Programming Basics (link to course here).

Beginner Python Online Resources:

  1. DataCamp; Module: “Introduction to Python: Python Basics”: Teaches basic Python programming skills through the DataCamp website interface. Best for the most novice coders (the Basics module is free; the remaining modules are $25/month as of Jan 2024). The course teaches Python syntax and basic data types in DataCamp’s interactive Python terminals, so you do not need to install Python on your computer. The interface is interactive and easy to follow, which makes this a great course for people completely new to programming (link to course here).
  2. Udemy; “Learn Python 3 From Scratch | Python for Absolute Beginners”: Teaches basic Python 3 programming skills to absolute beginners on your own computer. It mostly uses a learn-by-watching method with some exercises (free video content). The course covers installing Python and PyCharm (link to course here).
  3. Coursera; “Python for Genomic Data Science” (part of Coursera Plus, $59 per month as of Jan 2024): Teaches basic to intermediate Python programming skills, with a specific focus on examples from genomics. This course is best suited for people who already have some basic computer knowledge, such as being able to open and use a terminal window. It does not give step-by-step instructions for installing Python 3 on your personal computer. The course is lecture-based and not interactive. It is a good starting point if you already have some basic programming skills and want to learn Python (link to course here).
  4. The Hitchhiker’s Guide to Python! (https://docs.python-guide.org) This online guide is a best-practice handbook for installing, configuring, and using Python on a daily basis.
  5. Python Computing for Data Science (https://github.com/profjsb/python-seminar/tree/master) An interactive Python course taught through Jupyter notebooks, developed as part of a graduate seminar course at UC Berkeley.
  6. Getting started with Python (https://github.com/microsoft/c9-python-getting-started) A GitHub repository from Microsoft that brings together 3 courses, including videos and exercises, to help you get up to speed on Python. These courses are ideal for complete beginners and are intended to teach the foundations you need to work through a tutorial or book.
  7. Python Data Science Handbook (https://jakevdp.github.io/PythonDataScienceHandbook/) An online book by Jake VanderPlas, Director of Open Software at the University of Washington’s eScience Institute. It is a great introduction to popular Python packages for data analysis.

Specialized R Online Resources:

  1. HarvardX Data Analysis for Life Sciences Professional Certificate; Modules: “Statistics and R”, “Intro to Linear Models & Matrix Algebra”, “Statistical Inference & Modeling for High-throughput Experiments”, and “High Dimensional Data Analysis”: Teaches advanced R programming skills and R tools for research applications (~$800 for the whole program as of Jan 2024) (link to course here).
  2. HarvardX Data Analysis for Genomics Professional Certificate; Modules: “Introduction to Bioconductor”, “Case Studies in Functional Genomics”, and “Advanced Bioconductor”: Teaches advanced R programming skills, R tools for genomics, and how to apply these tools in practical settings (~$600 for the whole program as of Jan 2024) (link to course here).
  3. Swirl; Modules: “Getting and Cleaning Data” and “Statistical Inference”: Course teaches basic to advanced R programming skills; interactive learning with integrated Coursera materials (free). (link to course here)
  4. Orchestrating Single-Cell Analysis with Bioconductor (https://bioconductor.org/books/3.18/OSCA/). This book teaches common R workflows for analyzing single-cell RNA-seq (scRNA-seq) data. It shows how to use cutting-edge Bioconductor tools to process, analyze, visualize, and explore scRNA-seq data.
  5. R for Mass Spectrometry (https://rformassspectrometry.github.io/book/). The aim of this book is to provide efficient, thoroughly documented, tested, and flexible R software for analyzing and interpreting high-throughput mass spectrometry assays, including proteomics and metabolomics experiments.

Specialized Python Online Resources:

  1. Harvard EdX; “Using Python for Research”: Reviews Python basics and teaches advanced Python programming skills, Python tools for research applications, and how to apply these tools in practical settings (free audit; $249 for verified track as of Jan 2024) (link to course here).
  2. Coursera; “Biology Meets Programming: Bioinformatics for Beginners” (part of Coursera Plus, $59/month): Teaches basic bioinformatics concepts and algorithms using Python (free; $49 for certificate). The course introduces key algorithms used in biomedical research in the context of real-life experiments and situations. An online, interactive textbook with discussion boards introduces each problem, its relevant background, and the programming steps involved. If you need help with the code for an exercise, the course also integrates activities from Codecademy’s Learn Python module (link to course here).

Additional Resources

  1. YouTube Tutorials for Specific Tasks in R: many short tutorials on specific tasks in R (link to page here).
  2. Stack Overflow: general programming help for R and Python (link to page here).
  3. Cross Validated: biostatistics help
  4. Bioconductor: package help for Bioconductor
  5. BERD House: online biostatistics resources and tools developed by the Einstein Division of Biostatistics.
  6. MATLAB programming (MATLAB Academy): self-paced online MATLAB courses.
  7. Prism statistical software (Prism Academy - GraphPad): an online training center designed to help you master the fundamentals of Prism and key statistical concepts such as t tests, ANOVA, and linear regression.
  8. Kaggle (https://www.kaggle.com) An online community of data scientists and machine learning engineers to share and stay up-to-date on the latest ML techniques and technologies. Kaggle allows users to find datasets they want to use in building AI models, publish datasets, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
  9. Modern Statistics for Modern Biology (https://www.huber.embl.de/msmb/) The aim of this book is to enable scientists working in biological research to quickly learn many of the important ideas and methods in R that they need to make the best of their experiments and of other available data.
  10. Nature Methods: publishes “Points of Significance”, a 1-2 page column that appears almost every month and gives biologists a high-level overview of a statistical method.