Data Visualization: Theory and Application (in R)

Athit Kao, PhD
UCI Bioinformatics Support Group - May 23, 2018

Distracted Boyfriend

Understand content from previous deck: http://learn.athitkao.com/presentation_datavis1.html
Basic programming experience necessary
R and RStudio installed (for 2nd half)
Consider difficulties encountered using spreadsheet programs, e.g.
- Generating a scatter plot with tens of thousands of datapoints
- Replicating a specific figure formatting style for new data
- Documenting changes made to the source data of a figure

Theory

Figure 20a

Figure 20c

Figure 18a

Figure 18b

Fixed visualization types
Plotting hundreds to thousands of megabytes of data is impractical (Excel charts are limited to 32K datapoints per series)
E.g. Scatterplot with ~100K datapoints (3 series x 32K = 96K datapoints, 3.7 MB)

E.g. Scatterplot with 1 million datapoints (3 series x 333K, 40 MB; not attempted in Excel)

Figure 23a
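
A plot of this size is easy to try in R. The following is a minimal sketch, not the code behind the figure: the data are synthetic (random normal values), and the series names, colors, and output file name are assumptions chosen for illustration.

```r
# Synthetic stand-in data (assumption): 3 series x 333K points, ~1 million total
set.seed(1)
n_per_series <- 333000
series <- rep(c("A", "B", "C"), each = n_per_series)
x <- rnorm(3 * n_per_series)
y <- x + rnorm(3 * n_per_series)

# One semi-transparent color per series so dense regions stay readable
series_cols <- adjustcolor(c("red", "blue", "darkgreen"), alpha.f = 0.3)
point_cols <- series_cols[as.integer(factor(series))]

# Write to a PNG file; a raster device is much faster than drawing on screen
out_file <- file.path(tempdir(), "scatter_1M.png")
png(out_file, width = 800, height = 800)
plot(x, y, pch = ".", col = point_cols,
     xlab = "x", ylab = "y", main = "Scatterplot with ~1 million datapoints")
legend("topleft", legend = c("A", "B", "C"),
       col = c("red", "blue", "darkgreen"), pch = 19)
dev.off()
```

Using pch = "." draws each point as a single pixel, which keeps rendering time and overplotting manageable at this scale.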

Three main plotting systems in R, with new ones always being developed
base: Original system for R
- Build up from a blank canvas
- Used in the previous presentation
lattice:
- Visualizations made with a single function call
- Not covered here
ggplot2: install.packages("ggplot2")
- Hybrid between base and lattice, highly customizable
- Covered in this presentation
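
To make the difference between base and ggplot2 concrete, here is a small sketch that draws the same scatter plot both ways. It uses the built-in mtcars dataset, which is an assumption for illustration and not data from this deck. The ggplot2 part runs only if the package is installed.

```r
# Built-in example data (assumption; not the deck's dataset)
data(mtcars)

# Send both plots to one PDF file, one plot per page
out_file <- file.path(tempdir(), "systems_demo.pdf")
pdf(out_file)

# base: start from an empty canvas, then add elements one call at a time
plot(mtcars$wt, mtcars$mpg, type = "n",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
points(mtcars$wt, mtcars$mpg, pch = 19, col = "steelblue")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)
title("base graphics")

# ggplot2: describe the plot as data + aesthetic mappings + layers joined with +
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point(colour = "steelblue") +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
                colour = "black", linetype = "dashed") +
    labs(title = "ggplot2", x = "Weight (1000 lbs)", y = "Miles per gallon")
  print(p)  # a ggplot object is only drawn when printed
}

dev.off()
```

In base, every call draws immediately onto the current device. In ggplot2, the plot is an object that can be stored, modified, and reused with new data, which helps when replicating a figure style.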

Figure F3

Figure S6

1. Fundamental Awareness (basic knowledge): Common knowledge/understanding of basic techniques/concepts
2. Novice (limited experience): Expected to need help when performing this skill
3. Intermediate (practical application): Able to successfully complete tasks; expert required occasionally
4. Advanced (applied theory): Able to successfully complete tasks without assistance
5. Expert (recognized authority): Can provide guidance, troubleshooting, and answers related to this skill
Source: https://hr.nih.gov/working-nih/competencies/competencies-proficiency-scale