What you’ll get: a complete, easy-to-scan reference that maps data types to chart choices—with rules of thumb, pitfalls, and quick checklists you can apply immediately.
Executive Summary
- Data type → chart choice: continuous vs discrete drives everything.
- Univariate vs bivariate vs multivariate: distributions → relationships → profiles/panels.
- Standardize scales & parameters (bins, bandwidth, axes) when comparing groups.
- Overplotting? Aggregate (hexbin/contours), smooth (KDE/LOESS), or facet.
- Bars/dots for categories; pies/radar only for quick feel with few categories.
- Trendlines: explore with non-parametric, present with linear/quadratic if clear.
- Color & accessibility: colorblind-safe palettes, direct labels, readable ticks.
- Workflow: Identify → Decide encodings → Execute minimal → Audit truth & access.
Quick Chart Picker
- 1 continuous: Histogram / Density / Box / Strip
- 2 continuous: Scatter (+ trend) / Hexbin / Contour (with Z)
- 1 categorical: Bar / Dot / Pie (≤6)
- Cat × continuous: Box/Violin / Dot/Strip
- Cat × categorical: Mosaic / Table plot
- Many groups: Violin / Box multiples / Ridgeline / Trellis
- Profiles across many categories: Radar (with caution)
At-a-Glance Comparison
| Chart | Data type | Vars | Shines at | Avoid when | Notes |
|---|---|---|---|---|---|
| Histogram | Continuous | 1 | First-look shape | Tiny N | Bin width matters; use density for shape |
| Density (KDE) | Continuous | 1 | Overlay 3–6 groups | Very small N | Fix bandwidth across groups |
| Strip | Continuous | 1 | Show every point | Huge N | Use jitter + small dots |
| Box | Continuous | 1 (per group) | Many groups | Need shape detail | Show N; 1.5×IQR rule |
| Violin | Continuous | 1 (per group) | Shape + summary | Bandwidth varies | Keep bandwidth constant |
| Bar | Categorical | 1 | Counts/% | Too many cats | Horizontal + sort |
| Dot | Categorical | 1 | Long lists | None | Less ink, clearer ranks |
| Pie/Donut | Categorical | 1 | ≤6 slices | Precision/ranking | Angle hard; prefer bars for detail |
| Radar | Categorical (many) | 1+ | Profiles | Few categories | Read spoke length, not area |
| Scatter | Num×Num | 2 | Form/outliers | Overplotting | Alpha, hexbin, contours |
| Bubble | Num×Num×Num | 3–4 | Size + color | Blob of points | Scale by area; size legend |
| Contour | 3 continuous | 3 | Surface shape | Sparse data | Levels + uniform colormap |
Table of Contents
- Basics of Analysis
- Distributional Analysis with Continuous Data
- Distributional Analysis with Discrete Data
- Visualizing Multiple Distributions
- Visualizing Relationships
- Visualizing Multi-Dimensional Relationships
- Conclusion
1. Basics of Analysis
a) Types of data
- When to use: choose encodings based on whether each variable is continuous or discrete.
- What it shows: structure (rows × columns) and types (string; numeric: continuous/discrete).
- Checklist: units, missingness, outliers, cardinality.
b) Univariate, bivariate, and multivariate analysis
- Univariate: hist/density/box/strip.
- Bivariate: scatter (+ trend), box/violin, mosaic/heatmap.
- Multivariate: encodings (color/size/shape), facets/trellis, SPLOM.
Univariate: hist/density/box/strip
Univariate analysis examines a single variable to understand its distribution—center, spread, skew, and outliers. Use histograms to see overall shape (bin width matters), density (KDE) plots for a smooth profile or when comparing shapes across groups (use the same bandwidth), box plots to compare medians and IQRs across many categories, and strip plots to show every observation for small–medium samples.
Bivariate: scatter (+ trend), box/violin, mosaic/heatmap
Bivariate analysis explores relationships between two variables. For numeric–numeric pairs, start with a scatter plot and add a trendline (non-parametric to explore curvature; linear/quadratic to summarize). For numeric–categorical comparisons, use box or violin plots to contrast groups. For categorical–categorical pairs, mosaic or heatmap views reveal composition and hotspots at a glance.
Multivariate: encodings (color/size/shape), facets/trellis, SPLOM
Multivariate analysis (3+ variables) layers information using visual encodings (color, size, and shape) or splits the view into coordinated panels (facets/trellis) so scales stay comparable. For many numeric variables, a SPLOM (scatter-plot matrix) replaces correlation tables with mini-scatters, making nonlinearity, clusters, and outliers visible.
Rule of thumb: pick the simplest view that answers the question, standardize scales/smoothing across groups, and annotate one clear takeaway.
2. Distributional Analysis with Continuous Data
a) Histograms
- When to use: first-look shape for a continuous variable.
- Design choices: bin width, density vs count, shared bin edges for groups.
- Pitfalls: misleading binning; tiny N.
- Pro tips: test ½×/2× bin width; report N + bin rule; log-X for heavy tails.
- Checklist: show mode(s), tails, outliers; annotate key ranges.
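To make the bin-width advice concrete, here is a minimal sketch in plain Python (standard library only). It assumes the Freedman-Diaconis rule as the starting bin rule, which is one common choice rather than something this guide prescribes. The sample values and the helper names `fd_bin_width`, `shared_edges`, and `bin_counts` are made up for illustration.

```python
import math
import statistics

def fd_bin_width(values):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(values) ** (1 / 3)

def shared_edges(groups, width):
    """Equal-width bin edges that span every group, so all groups share bins."""
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    n_bins = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(n_bins + 1)]

def bin_counts(values, edges):
    """Count values per bin; values at or past the top edge go in the last bin."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(int((v - edges[0]) / width), len(counts) - 1)
        counts[i] += 1
    return counts

# Illustrative data (assumed, not from a real study).
group_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
group_b = [2, 3, 4, 4, 5, 5, 6, 7, 8, 10]

base = fd_bin_width(group_a + group_b)
for factor in (0.5, 1.0, 2.0):  # half / base / double sensitivity check
    edges = shared_edges([group_a, group_b], base * factor)
    print(f"width={base * factor:.2f}",
          bin_counts(group_a, edges), bin_counts(group_b, edges))
```

If the story (number of modes, tail length) changes between the three widths, the shape is probably an artifact of binning and worth reporting with that caveat.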
b) Density plots
- When to use: smooth shape; overlay 3–6 groups.
- Design choices: bandwidth, kernel (keep default), shared bandwidth across groups.
- Pitfalls: over-smoothing/under-smoothing; boundary bias.
- Pro tips: verify with hist/ECDF; use same axes + transparency.
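The shared-bandwidth advice can be sketched without any plotting library. The example below hand-rolls a Gaussian KDE and, as an assumption, picks one bandwidth from the pooled data with Silverman's rule of thumb. Other selectors are equally valid, and pooling can over-smooth when group centers are far apart. The data and function names are illustrative.

```python
import math
import statistics

def silverman_bandwidth(values):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(statistics.stdev(values), (q3 - q1) / 1.34)
    return 0.9 * spread * len(values) ** (-1 / 5)

def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]

def area(density, grid):
    """Trapezoid-rule integral; a valid density should come out near 1."""
    return sum(
        0.5 * (density[i] + density[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

# Illustrative data (assumed).
group_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
group_b = [2, 3, 4, 4, 5, 5, 6, 7, 8, 10]

# One bandwidth from the pooled data, reused for every group.
h = silverman_bandwidth(group_a + group_b)

# One shared grid, padded by 4 bandwidths so the tails are not cut off.
lo = min(group_a + group_b) - 4 * h
hi = max(group_a + group_b) + 4 * h
grid = [lo + i * (hi - lo) / 200 for i in range(201)]

density_a = gaussian_kde(group_a, grid, h)
density_b = gaussian_kde(group_b, grid, h)
print(f"bandwidth={h:.2f}",
      f"area_a={area(density_a, grid):.3f}", f"area_b={area(density_b, grid):.3f}")
```

Because both curves use the same `h` and the same `grid`, differences between them reflect the data rather than different smoothing.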
c) Strip plots
- When to use: show every observation (small/medium N).
- Design choices: banding, jitter, dot size/alpha.
- Pitfalls: clutter at large N.
- Pro tips: overlay median/IQR; facet for groups.
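Jitter is easy to get subtly wrong, so here is a small sketch of one way to do it: offset only the position along the group axis and leave the measured values untouched. The group names, values, and the `jitter` helper are hypothetical. The half-width of 0.15 is an arbitrary starting point, not a recommendation from this guide.

```python
import random

def jitter(center, n_points, half_width=0.15, seed=0):
    """Random offsets around a group's axis position (values are not altered)."""
    rng = random.Random(seed)  # seeded so the plot is reproducible
    return [center + rng.uniform(-half_width, half_width) for _ in range(n_points)]

# Illustrative data (assumed).
groups = {
    "control": [4.1, 4.3, 4.3, 4.8, 5.0],
    "treated": [5.2, 5.2, 5.9, 6.1],
}

# (x position, y value, group) triples, ready for any plotting library.
points = []
for index, (name, values) in enumerate(groups.items()):
    xs = jitter(index, len(values), seed=index)
    points.extend((x, y, name) for x, y in zip(xs, values))

for x, y, name in points:
    print(f"{name:8s} x={x:+.3f} y={y}")
```

Keeping the half-width well below 0.5 stops neighboring groups from bleeding into each other.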
d) Box plots
- When to use: compare many groups fast.
- Design choices: whisker rule (1.5×IQR), notches, order by median.
- Pitfalls: hides multimodality.
- Pro tips: show N; overlay light jitter or add violins when shape matters.
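The 1.5×IQR whisker rule is worth seeing as arithmetic. The sketch below computes the numbers a box plot would draw. It assumes the "inclusive" quartile method from Python's `statistics` module, and quartile conventions differ between tools, so the whiskers here may not match another package exactly. The data and the `box_stats` name are illustrative.

```python
import statistics

def box_stats(values):
    """Summary a box plot draws: quartiles, whisker ends, and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "n": len(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Illustrative data (assumed).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
stats = box_stats(sample)
for key, value in stats.items():
    print(f"{key}: {value}")
```

Note that the whiskers end at actual observations, not at the fences themselves; here the value 9 lies beyond the upper fence and is drawn as an outlier.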
3. Distributional Analysis with Discrete Data
a) Bar graphs and dot plots
- When to use: categorical counts/%.
- Design choices: vertical vs horizontal, sorting, labels.
- Pitfalls: too many bars; missing zero baseline.
- Pro tips: dot plots for long lists; 100% stacked for shares.
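Sorting and converting to shares are the two data-prep steps behind most readable bar charts. The sketch below shows one possible approach, including collapsing the tail into an "Other" bucket when there are too many bars. The category counts and the helper names `top_n_with_other` and `shares` are made up for illustration.

```python
def top_n_with_other(counts, n):
    """Keep the n largest categories (sorted) and pool the rest as 'Other'."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    other_total = sum(count for _, count in ranked[n:])
    if other_total:
        top.append(("Other", other_total))
    return top

def shares(rows):
    """Convert (label, count) rows to (label, percent of total)."""
    total = sum(count for _, count in rows)
    return [(label, 100 * count / total) for label, count in rows]

# Illustrative counts (assumed).
counts = {"D": 10, "A": 40, "F": 4, "B": 25, "E": 6, "C": 15}

rows = top_n_with_other(counts, 3)
for label, percent in shares(rows):
    print(f"{label:6s} {percent:5.1f}%")
```

Whether "Other" is acceptable depends on the question: it hides detail, so say how many categories were pooled.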
b) Pie charts
- When to use: ≤6 slices; quick feel.
- Design choices: start angle, sorting, direct labels.
- Pitfalls: angle/area perception; multiple pies hard to compare.
- Pro tips: provide a bar/dot companion if precision is needed; use “explode” sparingly.
c) Radar plots
- When to use: profiles across many categories on a common scale.
- Design choices: spoke order, normalization, ≤5 overlays.
- Pitfalls: reading area; too few categories.
- Pro tips: add small-multiple bars for precision.
4. Visualizing Multiple Distributions
a) Multiple histogram and density plots
- Mirror histograms: for exactly two groups; shared bins.
- Overlaid densities: 3–6 groups; same bandwidth.
- Ridgelines: ~8–15 groups; sort meaningfully.
b) Multiple box and violin plots
- Boxes: rank & spread across many groups.
- Violins: add shape (bimodality/skew).
- Pro tips: show N; state whisker rule/bandwidth; facet if crowded.
c) Multiple bar graphs and dot plots
- Prefer one grouped chart over many panels; switch to % for composition.
- Stacked/100% stacked for shares; dots for very long category lists.
d) Multiple pie and radar plots
- Multiple pies: hard to compare; consider nested donuts.
- Radar: clearer profile comparison across many categories.
5. Visualizing Relationships
a) Scatter plots
- When to use: numeric×numeric relationships.
- Design choices: alpha, marker size/shape, color for groups.
- Pitfalls: overplotting; discrete stacking.
- Pro tips: hexbin/contours for density; jitter (disclose) for discrete.
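The idea behind hexbin and density contours is to plot counts per cell instead of individual points. As a rough sketch, the example below uses square cells, a simpler stand-in for true hexagonal binning, on synthetic data. The cell size of 0.5 and the `grid_bin_counts` helper are assumptions for illustration.

```python
import math
import random
from collections import Counter

def grid_bin_counts(xs, ys, cell):
    """Count points per square cell; keys are (column, row) cell indices."""
    return Counter(
        (math.floor(x / cell), math.floor(y / cell)) for x, y in zip(xs, ys)
    )

# Synthetic, heavily overplotted data (assumed).
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [x * 0.6 + rng.gauss(0, 0.8) for x in xs]

cells = grid_bin_counts(xs, ys, cell=0.5)

print("points:", len(xs), "-> cells to draw:", len(cells))
for (col, row), count in cells.most_common(3):
    print(f"cell x=[{col * 0.5:+.1f}, {(col + 1) * 0.5:+.1f}) "
          f"y=[{row * 0.5:+.1f}, {(row + 1) * 0.5:+.1f}) count={count}")
```

Mapping `count` to color then gives a density view in which the dense core stays readable and sparse outlying cells remain visible.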
b) Lines of best fit
- Parametric: linear/quadratic (communicable slope/equation).
- Non-parametric: LOESS/splines (explore shape).
- Workflow: start non-parametric → present linear/quadratic if appropriate; show CIs; check residuals.
c) Line plots
- When to use: ordered X (time).
- Design choices: markers for sparse; line-only for dense; event annotations.
- Pro tips: shade the area between two lines to show gaps; prefer small multiples over 10+ overlapping lines.
d) Table plots
- When to use: two categorical variables; each cell shows a tiny bar.
- Pro tips: choose counts/row%/col% to match the question; consider mosaic when group size matters.
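Counts, row percentages, and column percentages answer different questions from the same table. The sketch below builds all three from raw category pairs. The region and plan labels and the helper names are hypothetical.

```python
from collections import Counter

def crosstab(pairs):
    """Return row labels, column labels, and a table of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

def row_percent(table):
    """Each row sums to 100: 'within this row, how do columns split?'"""
    return [[100 * v / sum(row) for v in row] for row in table]

def col_percent(table):
    """Each column sums to 100: 'within this column, how do rows split?'"""
    totals = [sum(col) for col in zip(*table)]
    return [[100 * v / t for v, t in zip(row, totals)] for row in table]

# Illustrative records (assumed): (region, plan) per customer.
pairs = (
    [("North", "Basic")] * 30 + [("North", "Pro")] * 10
    + [("South", "Basic")] * 20 + [("South", "Pro")] * 40
)

rows, cols, table = crosstab(pairs)
print("columns:", cols)
for label, counts_row, rp, cp in zip(rows, table, row_percent(table), col_percent(table)):
    print(label, "counts", counts_row, "row%", rp, "col%", cp)
```

In this made-up data, row percentages answer "what share of North customers are on Pro?", while column percentages answer "what share of Pro customers are in the North?".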
6. Visualizing Multi-Dimensional Relationships
a) Matrix scatter (SPLOM) and trellis plots
- SPLOM: mini-scatters for each variable pair; names or 1-D plots on the diagonal; fixed axes.
- Trellis: same x/y & identical limits across panels for honest subgroup comparisons.
b) Bubble plots
- When to use: add a 3rd variable via size (area), 4th via color.
- Pro tips: area-based scaling; size legend; weighted trendline if size implies importance.
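Area-based scaling means the radius grows with the square root of the value, not with the value itself. A minimal sketch, with made-up values and a hypothetical `bubble_radius` helper:

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius chosen so that bubble AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)

# Illustrative values for the third variable (assumed).
values = [100, 400, 900, 1600]
largest = max(values)

radii = [bubble_radius(v, largest) for v in values]
for v, r in zip(values, radii):
    print(f"value={v:5d} radius={r:5.1f} area={math.pi * r * r:8.1f}")

# Values for a size legend: a few round reference points, scaled the same way.
legend = {v: bubble_radius(v, largest) for v in (100, 400, 1600)}
```

A value 4× larger gets a radius only 2× larger. Scaling the radius linearly instead would make it look 16× bigger by area. If you plot with matplotlib, note that the `s` argument of `scatter` is already an area (points squared), so pass something proportional to the value rather than to the radius.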
c) Contour plots
- When to use: Z=f(X,Y) with smooth, continuous variables.
- Pro tips: ensure adequate sample coverage, or fit a model and predict onto a regular grid; use ~10 levels and a perceptually uniform colormap; show the colorbar and sample coverage.
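Contour plotting functions generally expect Z values on a regular X-Y grid plus a list of levels. The sketch below prepares both in plain Python. The `surface` function is a hypothetical stand-in for a fitted model or a known formula, and evenly spaced levels are an assumption (quantile-based levels are another option).

```python
import math

def surface(x, y):
    """Hypothetical smooth surface, used only for illustration."""
    return math.exp(-(x * x + y * y))

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

# Regular grid covering the region of interest.
xs = linspace(-2.0, 2.0, 41)
ys = linspace(-2.0, 2.0, 41)
z = [[surface(x, y) for x in xs] for y in ys]  # z[row][col] = f(xs[col], ys[row])

z_min = min(min(row) for row in z)
z_max = max(max(row) for row in z)

# Ten evenly spaced levels strictly inside the observed Z range.
n_levels = 10
levels = [
    z_min + k * (z_max - z_min) / (n_levels + 1) for k in range(1, n_levels + 1)
]

print(f"grid: {len(ys)} x {len(xs)}, z range: [{z_min:.4f}, {z_max:.4f}]")
print("levels:", [round(level, 3) for level in levels])
```

With real observations, only evaluate the surface where you actually have samples nearby; contours drawn over empty regions are extrapolation.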
Design & Accessibility Essentials
- Titles/Subtitles: say what + where + when; 1 main insight.
- Axes: include units; use “nice” ticks; keep zero baseline for bars.
- Color: colorblind-safe palettes; limit hues; consistent semantics across charts.
- Labels: prefer direct labels; minimal legend hopping; readable font sizes.
- Scales: keep common scales in comparisons; disclose bins/bandwidth.
FAQ
Counts or density on histograms? Density for shape across different N; counts when absolutes matter.
How many categories are too many for bars? If labels collide (>12–15), switch to dot plots, facets, or Top-N + “Other”.
Are donuts better than pies? Same encoding; donut frees center for annotations but doesn’t add precision—use bars for ranking.
Dual y-axis? Generally avoid; facet or normalize instead.
Glossary
- IQR: Interquartile range (Q3 − Q1).
- Bandwidth: KDE smoothing parameter.
- LOESS: Local regression smoother.
- SPLOM: Scatter-plot matrix.
7. Conclusion — Time to Visualize
Tool ladder: Excel → R/Stata/SPSS → Python. Deepen statistics and design psychology (color theory, Gestalt). Keep iterating with a simple workflow: Identify → Decide → Execute → Audit.
