Functions: Theory and Application (in R)

Athit Kao, PhD
UCI Bioinformatics Support Group - April 25, 2018

whoami

  • R enthusiast

  • PhD, Biomedical Sciences, UCI

  • IGB Biomedical Informatics Training program alumnus

  • Focus on proteomics mass spectrometry and bioinformatics

Prerequisites

  • Understand content from previous deck: http://learn.athitkao.com/presentation_functions1.html

  • Basic programming experience necessary

  • R and RStudio installed

  • Consider a simple use case for programming in your field, e.g.:

    • Filter specific data from a file
    • Munge a proprietary data format into CSV
    • Calculate RPKM for sequencing data
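
As a concrete version of the last use case, here is a minimal sketch of an RPKM calculation (reads per kilobase of transcript per million mapped reads). The function name rpkm and all input values are made up for illustration, and the total number of mapped reads is assumed to be the sum of the counts supplied.

```r
# Hypothetical helper: RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
rpkm = function(counts, gene_length_bp, total_mapped_reads = sum(counts)) {
  counts * 1e9 / (gene_length_bp * total_mapped_reads)
}

counts  = c(geneA = 500, geneB = 1500)    # made-up read counts
lengths = c(geneA = 1000, geneB = 3000)   # made-up gene lengths in bp

rpkm(counts, lengths)   # both genes come out at 250000
```

geneB has three times the reads but is also three times as long, so the normalized values match.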

Readability, part II

  • Encapsulation via functions is an aspect of code legibility
  • Others include:
    • Documentation (comments and READMEs)
    • Formatting (indentation and line wrapping)
    • Variable naming (obvious and consistent names)
  • There is a balance appropriate for you/your audience: Figure 10
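
To make the legibility points concrete, here is a small sketch contrasting two versions of the same calculation. Both function names are hypothetical and the numbers are made up; only the naming, commenting, and formatting differ.

```r
# Harder to read: cryptic names, no comments, cramped formatting
f = function(x,y){x/(y/1000)}

# Easier to read: descriptive names, a comment, consistent indentation
# Hypothetical helper: read count normalized per kilobase of gene length
reads_per_kilobase = function(read_count, gene_length_bp) {
  gene_length_kb = gene_length_bp / 1000
  read_count / gene_length_kb
}

f(500, 2000)                    # 250
reads_per_kilobase(500, 2000)   # 250
```

How much of this is worth the extra typing depends on who will read the code later.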

Generalized Concept of Functions

Figure 2 Figure 4

Function Argument Order
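
The original slide content is not reproduced here, so the following is only a sketch of how R matches arguments by position and by name. The function power is hypothetical.

```r
# Hypothetical function with one required argument and one default
power = function(base, exponent = 2) {
  base ^ exponent
}

power(3)                        # exponent falls back to its default: 9
power(3, 3)                     # matched by position: 27
power(exponent = 3, base = 2)   # matched by name, so order does not matter: 8
```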

Anonymous Functions (in lapply)
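
A minimal sketch of the idea, with made-up data: an anonymous function is defined inline, without a name, at the point where lapply needs it.

```r
numbers = list(a = 1, b = 2, c = 3)

# Named function: defined first, then handed to lapply
square = function(x) x^2
named_result = lapply(numbers, square)

# Anonymous function: same body, defined inline and never given a name
anon_result = lapply(numbers, function(x) x^2)

identical(named_result, anon_result)   # TRUE
```

The anonymous form is handy when the function is short and used only once.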

Multiple Arguments (in lapply)

  • The first argument is always the iterated element
  • Additional arguments go after the function; name them explicitly to avoid positional mix-ups

Figure 19
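
A minimal sketch of passing extra arguments through lapply. The function scale_and_shift and its inputs are hypothetical.

```r
# Hypothetical function: x is the iterated element, the rest are extras
scale_and_shift = function(x, factor, offset) {
  x * factor + offset
}

values = list(1, 2, 3)

# Each element of values becomes x; factor and offset are passed along by name
result = lapply(values, scale_and_shift, factor = 10, offset = 1)

unlist(result)   # 11 21 31
```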

The Assignment Operator

  • Use “<-” or “=”?
  • I use “=” for consistency (with other programming languages) with no issues
  • Consistency even within R
  • Figure 30
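
The original figure is not reproduced here. As a sketch of the distinction, assuming the usual top-level behavior of both operators:

```r
# At the top level, both operators assign
a <- 5
b = 5
identical(a, b)   # TRUE

# Inside a function call the two differ:
mean(x = c(1, 2, 3))    # "=" names the argument x of mean()
mean(v <- c(1, 2, 3))   # "<-" also creates v as a side effect
exists("v")             # TRUE
```

Function arguments are already written with “=”, which is one reading of consistency within R.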

Environments and Variable Scoping in R

  • Scoping refers to the visibility of variables in different environments
  • Global: Can be referenced from anywhere
  • Local: Accessible only within its environment
  • http://adv-r.had.co.nz/Environments.html
  • R uses “lexical” (a.k.a. static) scoping rules
  • Figure 31
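
A minimal sketch of global versus local variables, using a hypothetical function show_scope:

```r
x = "global"

show_scope = function() {
  x = "local"   # creates a new x inside the function's own environment
  x
}

show_scope()   # "local"
x              # "global": the assignment inside the function did not touch it
```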

Microsoft Excel

Pros

  • Quick editing and prototyping
  • WYSIWYG plots
  • Widely known and used program
  • Many similar alternatives (e.g. Sheets, Calc, Numbers, etc.)
  • Programming/automation with Visual Basic

Cons

  • Limit of 1,048,576 rows by 16,384 columns
  • 32-bit version is limited to 2 GB of memory
  • Manual editing error-prone
  • Complex visualizations not possible
  • Third-party packages close to non-existent (e.g. advanced statistics, machine learning, etc.)
  • Closed source

R Programming Language

Pros

  • Quick prototyping towards minimum viable product
  • Can generate complex interactive visualizations (plots, reports, etc.)
  • No practical limit on data size
  • Easily reproduce results and adapt code to changes
  • You can do almost anything purely in R

Cons

  • Steep learning curve
  • Third-party package developers not forced to follow a set standard
  • Open source

Split-Apply-Combine Paradigm

Figure 0
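
The figure is not reproduced here. As a rough sketch of the three steps in base R, using a made-up data frame:

```r
measurements = data.frame(
  group = c("a", "a", "b", "b", "c"),
  value = c(1, 3, 10, 20, 5)
)

pieces   = split(measurements$value, measurements$group)   # split by group
means    = lapply(pieces, mean)                             # apply mean to each piece
combined = unlist(means)                                    # combine into one vector

combined   # a = 2, b = 15, c = 5
```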

Multithreading Split-Apply-Combine

Figure 0
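
One possible way to run the apply step in parallel is the parallel package that ships with R. This is only a sketch: the choice of two workers is an assumption, and makeCluster starts separate worker processes rather than threads in the strict sense.

```r
library(parallel)

pieces = split(1:12, rep(1:3, each = 4))   # split: three chunks of four numbers

cluster = makeCluster(2)                   # assumption: two worker processes
sums = parLapply(cluster, pieces, sum)     # apply: chunks are handed out to the workers
stopCluster(cluster)                       # always release the workers when done

unlist(sums)                               # combine: 10 26 42
```

For work this small the overhead of starting workers outweighs any gain; the pattern pays off when each piece takes real time.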

Pre-loaded Data Sets in R
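
The slide content is not reproduced here. As a sketch, the datasets package that loads with R can be explored like this; iris is used only as a familiar example.

```r
# Names of the first few data sets in the datasets package
available = data(package = "datasets")$results[, "Item"]
head(available)

# Pre-loaded data sets can be used directly, no file import needed
head(iris)   # first six rows
dim(iris)    # 150 rows, 5 columns
```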

Toy Example: Split...

Figure 0
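
The figure is not reproduced here, and the data set used on the original slide is unknown. As a stand-in, this sketch splits the built-in iris data by species.

```r
# Split: one data frame per species
by_species = split(iris, iris$Species)

names(by_species)          # "setosa" "versicolor" "virginica"
sapply(by_species, nrow)   # 50 rows in each piece
```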

Toy Example: ...Apply and Combine

Figure 0
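
The figure is not reproduced here. Continuing with iris as an assumed stand-in data set, a sketch of the apply and combine steps might look like this:

```r
# Split: one data frame per species
by_species = split(iris, iris$Species)

# Apply: column means of the four numeric columns, per species
species_means = lapply(by_species, function(piece) {
  colMeans(piece[, 1:4])
})

# Combine: stack the per-species results into one matrix
summary_table = do.call(rbind, species_means)
summary_table   # 3 rows (species) by 4 columns (measurements)
```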

Summarize for me...

  1. Good and bad of spreadsheet programs?
  2. What is scoping?
  3. What is an anonymous function?
  4. T/F: You must comment all code.
  5. T/F: You can't return a function from a function.
  6. T/F: Split-Apply-Combine is exclusive to R.
  Answers: #4, FALSE; #5, FALSE; #6, FALSE

Questions?

  • Just try new code and see what happens; this isn't wet lab
  • Google: Don't just search with “R”, use “R language”
  • Stack Overflow: Use tag “[r]”

Figure s2

Keep going, you got this!

Figure 0

App: Scoping (cont.)

  • f() was defined in the global environment
  • f() uses global apple and beets variables
  • f() does not use the local apple and beets defined in g()
  • http://adv-r.had.co.nz/Environments.html
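
The code from the original slide is not reproduced here. This is a reconstruction that matches the bullets above; the values assigned to apple and beets are made up.

```r
apple = 1
beets = 2

f = function() {
  apple + beets   # looked up where f was defined: the global environment
}

g = function() {
  apple = 100     # local to g
  beets = 200     # local to g
  f()             # f does not see g's local variables
}

g()   # 3, not 300
```

Under dynamic scoping the answer would be 300; R's lexical scoping gives 3.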

App: do.call(rbind, list)

  • lapply() applies given function to each list element, iteratively
  • do.call() applies given function to the list as a whole, once

Figure 34
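
A small sketch of the difference, using a made-up list of one-row data frames:

```r
rows = list(
  data.frame(id = 1, label = "a"),
  data.frame(id = 2, label = "b"),
  data.frame(id = 3, label = "c")
)

# lapply: nrow is called three times, once per list element
row_counts = lapply(rows, nrow)

# do.call: rbind is called once, as rbind(rows[[1]], rows[[2]], rows[[3]])
combined = do.call(rbind, rows)

unlist(row_counts)   # 1 1 1
nrow(combined)       # 3
```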

App: NIH Proficiency Scale

  • 1. Fundamental Awareness (basic knowledge): Common knowledge/understanding of basic techniques/concepts
  • 2. Novice (limited experience): Expected to need help when performing this skill
  • 3. Intermediate (practical application): Able to successfully complete tasks; expert required occasionally
  • 4. Advanced (applied theory): Able to successfully complete tasks without assistance
  • 5. Expert (recognized authority): Can provide guidance, troubleshooting, and answers related to this skill
  • Source: https://hr.nih.gov/working-nih/competencies/competencies-proficiency-scale