Data Visualization
Python has very versatile plotting libraries. The primary libraries are matplotlib and seaborn, but there are others - for example, plotly and other libraries create interactive html plots. All of the standard plot types are available. For ideas, see here for matplotlib examples and here for seaborn. We can ask Julius to adjust the figure size, font sizes, tick locations, tick labels, axis labels, title, legend positioning, and more. Figures can be saved and downloaded as jpegs, pngs, pdfs, or other types. Different themes can also be applied – gridded or not gridded, light or dark background, legend or no legend, etc. We can put multiple plots in the same figure, and they can share a legend if it is appropriate. We can add annotations within plots. I recommend encouraging students to ask Julius what is possible.
Three particular seaborn features are worth mentioning. The regplot function creates scatter plots with regression lines. The default is to include a confidence interval around the line, but we can ask for it to be removed. The “hue” argument in seaborn provides a good means of plotting three-dimensional data. In addition to the x and y dimensions, a third dimension is represented by color. This works for scatter plots, bar plots, box plots, density plots, and other types. We can include separate regression lines for different colors in a scatter plot. We can also use the type of marker as a third dimension instead of color, or we can use color together with marker type to show four dimensions, though that becomes hard to interpret. I recommend indicating a fourth dimension by including multiple subplots in the same figure. The third feature that should be shown to students is the pairplot function. It creates a matrix of scatter plots (with density plots or histograms on the diagonal), so the correlations among multiple variables can be seen simultaneously.
Also on substack at kerryback.substack.com