The Craft of Scientific Illustration

The Good, the Bad, and the Ugly


Scientific illustration is a rich and wide field for creative activity, ranging from artistic, qualitative mixed-media drawings that visualize concepts of the directly accessible or inaccessible natural world to the quantitative, highly technical plotting of (numerical) data. This site is restricted to the latter aspect, the presentation and plotting of scientific data, and it is, almost necessarily, strongly biased by personal taste.

The body of literature on scientific graphing is considerable; for me, it was mostly of limited usefulness at best. Most books treat the graphical presentation of scientific data in a side chapter within the framework of scientific publishing, and most of the time spent on these chapters is wasted. One source I found pleasantly out of the ordinary was E.R. Tufte's The Visual Display of Quantitative Information (Graphics Press, Connecticut 2001).

The times of hand-drawing figures containing scientific data are gone for good. Unfortunately, the quality of the illustrations has not changed for the better, despite the very powerful and versatile computer plotting programs available, frequently even at no cost. Surprisingly enough, there are still some books on the market containing high-quality hand-drawn figures. It is questionable, though, whether they will really serve as models of what to strive for. I mention some of them for those who have no idea what I am up to here. One of them is Building Scientific Apparatus (2nd Ed.) by Moore, Davis, and Coplan, published by Addison-Wesley; it contains a large number of engineering-type ink drawings. The classic Gravitation by Misner, Thorne, and Wheeler, published by Freeman, contains a large number of tasteful drawings, from sketches setting the stage to detailed quantitative science plots. Admittedly more culinary in nature are the pen-and-ink drawings in the 1997 edition of The Joy of Cooking by Rombauer et al., published by Scribner; the figures in this book can easily compete with top-notch scientific illustrations in biology. Eric Sloane's Book of Storms and other meteorologically oriented books by the same author contain tasteful pen-and-ink illustrations, which make his books (reissued by Dover) worthwhile investments. I think there is no chance to capture the flair and the beauty of the illustrations and plots in the books mentioned above with any computerized illustration package. As with any craft, merely having the tools of the trade at hand does not inevitably lead to high-quality work. Unfortunately, few people nowadays seem to be bothered by the deplorable state of scientific illustration in the literature, and, worse still, neither do most publishers.

Illustrating scientific ideas and producing figures with scientific data used to be a respected profession in its own right. The recent streamlining of workflows has driven these craftspeople almost to extinction. Even if scientific illustration (the kind we discuss on this page) is not really an art in itself, it is (or at least was) a highly developed craft. Crafts will never be replaced by technical tools and gadgets; they take human skills (this is where the art lies) that need to be constantly developed and exercised. The technical tools can make life easier and speed up the preparation process by allowing for extensive experimentation.
We as casual illustrators (most often not even by choice) cannot be expected to reach the level of craftsmanship of the professionals; following a few simple guidelines, though, can easily and efficiently reduce the number of bad and ugly figures that make it into scientific papers.

Examples are the obvious way to quickly illustrate virtues and aesthetic/conceptual crimes in scientific illustration. As it is much easier to point one's finger at the bad examples of others than to do better oneself, this is the avenue we choose:

The Good: The illustrations in Icko Iben Jr.'s astrophysical articles were of high class over many years. Seemingly, he had a very good scientific illustrator at his disposal. The semi-qualitative figure shown here has a well-balanced placement of labels and lines. The height of the numbers and the weight of the fonts are just right. Another example of semi-qualitative illustration in the Iben flavor is the Hertzsprung-Russell (HR) diagram with an extreme range in effective temperature and luminosity. Another HR diagram that appeared in various Iben publications proves that even extensive labeling, and actually folding two plots into one, can be done in a comprehensible and even attractive way.
To criticize Iben's style just a little bit, one might point to the thick-pencil line style he mostly used, both in plots containing direct simulation data and in the (semi-)qualitative illustrative figures. As a consequence, it is not always clear whether the thick lines are already a fit to somewhat noisy numerical data or whether the computed results really were of the quality presented. As with observational data, in plots of numerical data I prefer to be informed about the uncertainties, even if they do not contribute to the aesthetic value. After all, scientific illustration should inform in an objective way.

This is an example of what I call the OldSpringer style of scientific illustration. I confess that I do not know whether it really can be attributed to Springer. However, mainly in the first part of the 20th century, most of the important Springer Verlag publications had the same style (such as the physics journals, the Zeitschrift für Astrophysik, the Handbuch der Physik with its dozens of volumes...). Even today many of the published engineering drawings by BOSCH are quite close to this OldSpringer style. The illustration style was not confined to Springer publications, however; it can also be found in the old Soviet technical literature. So, if anybody knows more than I do about this particular scientific drawing style, I would appreciate a note to complete or correct my fragmentary knowledge. The color-magnitude diagram of the Hyades on the left is clearly hand-drawn. The weight of the lines relative to the symbols is well balanced; the same is true for the filled and open symbols. The coordinate grid is slightly lighter than the coordinate frame; neither dominates nor disturbs the general picture of the data points, which carry the important information, and both support the quantitative evaluation of the data by a reader. The sizes of the labels and of the axis titles lead to a balanced overall impression. The slanted font adds some personality to the figure that an upright font would not really convey. I was told by a scientific illustrator that choosing a slanted font was the way to go when hand-labeling plots, as it hides small deviations from a common slant between neighboring letters and/or digits much better. With upright fonts, such small deviations from the vertical easily catch the eye.
Another nice example of a multi-line plot in OldSpringer style is the diagram of wavelength vs. a Balmer-jump parameter. Like the last plot, this one is taken from one of the astronomical volumes of the Handbuch der Physik, the 1958 edition. As pointed out for the last plot, the relative sizes of labels and texts, as well as the relative weights of the lines, are just right and result in a harmonious appearance of the admittedly dense information. Granted, the underlying grid could have been drawn with a finer line weight or, as one would do today, in light grey.

A decently planned and executed Hertzsprung-Russell diagram, mainly with references to pulsating variable stars and with the evolution tracks of selected stars underlaid. Despite the many lines and many data points (representing observed pulsating variables), the figure does not look overly crowded. The choice of only a few sans-serif fonts helps to label important features discreetly without drowning the plot.

The Bad: The metallicity-age relationship for stars plotted in the figure to the left shows an example of an awkward choice of plot symbols for the data. One of the main distractions in this plot is the size of the symbols: does the size mean anything? Is it a measure of the accuracy of the data? The text did not mention anything. Hence, why on earth are the symbols that large? Due to their size and the large number of data points there is considerable overlap between the symbols, which intensifies the impression of an overcrowded plot. Smaller, filled symbols (and preferably not pentagons) would have served much better. The labeling of the plot is about as oversized as the symbols themselves. The labels are so large that the angular outlines of the Hershey fonts are visible. Nowadays, Hershey fonts compare to PostScript fonts the way Hershey chocolate compares to Swiss chocolate; ...well, it's all a matter of taste, in the end; and to be fair, the Hershey fonts served scientific plotting well in the 80s.
The pale green background of the plot is not the fault of the authors; it is a bad habit imposed by editors on writers, mostly of semi-popular articles. Seemingly, the prejudice that figures with colored backgrounds make articles more interesting has gained a foothold; most of the time, however, such backgrounds are a nuisance, and boring articles remain boring.

A SPIE reference book on optomechanical analyses (published in 2002) featured numerous very thick-pencil illustrations of the kind shown on the left. The plan of the plot is a good one: show thin grid lines so that the reader can get numbers from the curves. But why on earth are the curves that heavy? Furthermore, the serif font used for the labeling does not make the plot any lighter. A slim and taller font would have improved the overall appearance of the figure. Unfortunately, the whole book is full of figures of that kind. A pity: after all, we are living in a desktop publishing age, when drawings can be modified with a few mouse clicks rather than with hours of laborious and tricky hand (re)drawing.

This is a typical example of a thick-pencil plot (as published in a conference proceedings volume on reactor safety). Although the lines connecting the data are thicker than those of the frame, as advocated, the plot is not what I have in mind. The lines of the simulation results are so thick that they obscure the situation in regions where they overlap. Arrows with associated numbers referring to the computation parameters are added to the figure, but it remains mostly unclear which curves these arrows point to. Hence, the whole exercise is useless. The major tick marks are just fat minor ones, giving the whole presentation a rather clumsy aftertaste. The encircled numbers in the plot are clearly set in a sans-serif font, the rest in a Times font. Quite apart from the (to me) inappropriate "Times" labeling, this change of typeface within the plot does not help to beautify it.

Is it conceivable that the author of this figure intended to demonstrate good agreement between two sets of measurements, or the smoothness of the data, or what? In any case, the ranges chosen for the ordinate in particular, and for the abscissa, bear no relation to the range of the measured data. If the key had to be placed inside the figure, this could have been done in a more space-conserving fashion, in particular if the ordinate range had been chosen more prudently. Last but not least, the choice of symbols does not seem very clever in this case: the white squares hide the black ones over most of the measured abscissa range. By the way... this example was not made up; it really was published in peer-reviewed conference proceedings on experimental fluid dynamics.


The Ugly: I think there is not much to be said about this plot; just DO NOT do it this way: your mother will not be proud of you. Granted, the scanning degraded the plot slightly, but only slightly. The letters in the "all bubbles detached" comment were bleeding into each other already in the publication itself (an AIP Conference Proceedings volume on hydrodynamics, by the way). Why on earth is the frame of the coordinate box so heavy? Is it that important, or has anybody died? The arrows are much too heavy. The symbols are also too heavy and too fat. The coordinate grid is fine in itself and in its weight, if the author really intended to provide quantitative information.

The scan on the left shows two contour plots with labels on the contours. The figure is from the same proceedings volume as mentioned above. The grey scale is pretty useless; most probably it was in color on the computer screen. The white boxes around the contour labels disturb the picture. The choice of values and the density of labels both make the whole thing incomprehensible. There is no sign of either thought or care having gone into this figure. The final verdict on this one: `just don't spoil your reputation with anything like that'!

This is no fake meant to suggest new dimensions of ugliness; no, this plot was really published! It appeared in `Laser Techniques for Fluid Mechanics', Springer Verlag (2002). Ugliness going that deep does not require a detailed analysis of the weak points anymore. The authors must either have been on drugs or have had at least 2 per mille of alcohol in their blood when making the figure. Even Springer is apparently no guarantee of high-quality publications anymore. Hence, this proves that the universe is inflationary.

Some DOs and DO NOTs

What follows are some DOs and DO NOTs for appealing science plots. All points are personally biased and far from complete. The following statements, taken from E.R. Tufte's The Visual Display of Quantitative Information, capture the spirit of efficient and elegant technical drawing:


Above all, show the data

Emphasize the data and not the design of the figure.

 

Maximize the data to ink ratio

Avoid graphical features that distract readers from the data you want them to learn about.

 

No chartjunk

Refrain from shadows under boxes and texts, from crosshatched, hatched, and weird-patterned fillings of areas, colored backgrounds, and other senseless wastes of ink.

 

Revise and edit!

Only looking at the result proves you right or wrong. Computer-assisted drawing makes it easy to iterate through the preparation process.
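For reference, the ratio behind the second statement is defined in Tufte's book; paraphrased (see the book for its exact wording), it reads:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \left(\text{proportion of the graphic that can be erased without loss of data-information}\right)
```

A heavy coordinate frame, a dense grid, or a shadowed key adds ink without adding data, and therefore lowers the ratio.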


·  The coordinate frame should never constitute the heaviest lines in a plot. The most important lines come from the science data. A full coordinate frame (two x-axes and two y-axes forming a closed box) looks, most of the time, better than a single x-axis and a single y-axis. A coordinate frame produces a desirable closed presentation and helps in the quantitative evaluation of a plot, if that is intended.


·  The tick marks are usually of the same line weight as the coordinate axes. The ticks should be short enough not to interfere with the data, yet long enough for the unaided eye to count the units easily. Minor tick marks should be about 1/2 to 3/4 of the length of the major ones. The length of the major tick marks should be of the order of 2-4 % of the plot size.


·  Grid lines are appropriate if the author wants the reader to be able to extract quantitative information from the plot easily. The grid lines should be the finest lines in the plot, just barely visible to guide the eye or the ruler.


·  Select symbols that remain legible when the plot is shrunk in print or reduced in size when photocopied. Clustered data destroy any distinction between symbols, or even make single symbols unrecognizable, when the plotting program does not hide the covered parts of partly overlapping symbols. The symbols should not be so large that they give a wrong impression of the accuracy of the data, but they should be big enough for good legibility. Different symbols to distinguish between different data sets should be chosen carefully: circles, squares, triangles, and crosses can be easily distinguished over a broad range of sizes. However, stars (four-armed ones) and asterisks (with 5 arms) are already difficult to distinguish, in particular if the data points are densely sprinkled, not to mention polygons with more than 5 corners.


·  Even nowadays, colors should be used only if all other means of presenting the data fail. First, not all journals (and especially not proceedings publishers) support color printing. Second, most readers still use black/white printers and copiers to make their personal copies. Hence, most of the time, the benefit of color is lost at the level of the end user. Most information can be presented quite well in grey-scale figures. It is claimed that a trained eye can distinguish as many as 128 shades of grey (more than of any other color). Admittedly, colored figures usually look richer in information and are definitely more seductive than grey ones, even when both impressions are unjustified. In any case, grey scales allow a more objective judgment of the information than color wedges, which include colors of different brightness that easily mislead our brains. The textbook Physics of the Solar Corona by M. Aschwanden (published by Springer in 2004) is a magnificent example of how mostly black/white graphs and cleverly planned grey-scale figures can communicate all the pertinent scientific information very elegantly.
Color is mainly useful for three-dimensional data projected onto two-dimensional sheets. If grey scale is, for whatever reason, out of the question from the outset, make sure the chosen color scheme translates into something useful when mapped onto grey scale (for a bad example see the graph under `The Ugly' further up).
Colored line graphs are most of the time superfluous. Even if the colors for the different lines are chosen carefully, different line types and/or different line weights are usually at least as useful to distinguish between different curves.


·  Keys to the symbols used in a plot should never be placed in boxes with shadows. Shadows in general might be appropriate (if at all) in advertisements at your local grocery store, but they are definitely out of place in any scientific illustration.


·  The most prominent lines of a graph should be the lines with scientific information. Different data families, i.e. different lines, can be discriminated either by different line types (not too many of them, to maintain clarity) or by different line weights.


·  Use as few different fonts as possible to label and annotate a plot. Express the importance of the various textual parts with the font size, not with different font families.


·  Plots look lighter when using sans-serif fonts to label and annotate a figure. Serif fonts are fine for long texts where the eye needs to be guided by the serifs. I find that serifs make most plots clumsy.


·  Never ever use Gothic majuscules only. First, such text is essentially illegible and second, it proves that the author at best knew how to select the font on the computer but otherwise has not even a basic appreciation of lettering, not to mention calligraphy or taste.
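Some of the rules in the list above are quantitative enough to be checked by a short script. The following is a minimal sketch in plain Python (no plotting library needed), not a standard tool. It assumes that line weights and plot sizes are given in one common unit (points or millimetres), and it approximates "mapped onto grey scale" by the standard sRGB relative luminance, which is only a rough stand-in for what a given black/white printer or copier actually does. The function names tick_lengths, weights_ok, luminance and survives_greyscale are hypothetical.

```python
# Rough checks for a few of the quantitative rules of thumb listed above.
# Hypothetical helper names. The thresholds come from the list; the luminance
# formula is the standard sRGB / Rec. 709 one, used only as an ASSUMED
# stand-in for black/white printing or copying.


def tick_lengths(plot_size: float) -> dict[str, tuple[float, float]]:
    """Suggested (shortest, longest) tick lengths, in the unit of plot_size.

    Major ticks: 2-4 % of the plot size.
    Minor ticks: 1/2 to 3/4 of the major tick length.
    """
    major = (0.02 * plot_size, 0.04 * plot_size)
    minor = (0.5 * major[0], 0.75 * major[1])
    return {"major": major, "minor": minor}


def weights_ok(data: float, frame: float, grid: float) -> bool:
    """Line weights: grid finest, frame lighter than the data lines."""
    return grid < frame < data


def _linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(rgb: tuple[float, float, float]) -> float:
    """Relative luminance of an sRGB color: 0 is black, 1 is white."""
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def survives_greyscale(colors: list[tuple[float, float, float]]) -> bool:
    """True if luminance changes strictly monotonically along the color scale,
    i.e. the ordering of the scale is still readable as a grey wedge."""
    ys = [luminance(c) for c in colors]
    steps = [b - a for a, b in zip(ys, ys[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


if __name__ == "__main__":
    print(tick_lengths(100.0))  # e.g. a plot 100 mm wide
    rainbow = [(0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
    greys = [(v, v, v) for v in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(survives_greyscale(rainbow))  # False: brightness goes up and down
    print(survives_greyscale(greys))    # True
```

The rainbow-like scale fails because its brightness rises and falls along the wedge: in a black/white copy, blue and red both end up dark while cyan, green and yellow all turn light, so the order of the values is lost. The plain grey ramp passes trivially.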


Software packages

Software for technical drawing on the computer is abundant, but many packages are deficient in one or another aspect important to scientific illustration. The software packages mentioned below are open-source tools that also provide capabilities to label plots with decent-looking mathematical expressions (i.e. with something that tastes TeX-like). They all run under Linux and many of them also on other platforms.


The stand-alone plotting package GLE probably appears prehistoric by present-day GUI standards. Since version 4.1.0, GLE has offered a GUI called QGLE, which can be used either as an interactive graph generator or as a previewer of results from scripts. The GLE scripting language is powerful; it even allows direct PostScript coding within GLE scripts. The output from GLE can be written in various formats, ranging from eps to png, jpg, and pdf.

Looking at my own working habits lately, it seems as if Yorick is slipping in with increasing frequency as a tool for high-end plots, not only for pre- and post-processing of data. Yorick is a powerful scripting environment for scientific computing that comes with a wonderful interactive plotting facility. Its labeling is not highly developed, but I must defend Yorick immediately: it was never intended for that glamour fiddling stuff. Plots can be stored in, among others, PostScript/eps format. Therefore, the final publication-ready brush-up can be done with any illustration software that can deal with PostScript input files (e.g. GLE, Inkscape, Skencil, Xfig, gimp, or whatever your heart beats for).

Scribus, for example, turns out to be a very interesting alternative to expensive page layout programs. With Scribus, eps or pdf figures can be imported and supplemented with text for labeling or annotations, or composite figures can be created. Finally, the result can be exported again as a pdf or eps file. But Scribus can do more: it is designed for DTP on a professional level.

matplotlib is a plotting library for the Python scripting language. Its syntax resembles that of Matlab. The entry page of the matplotlib homepage alone convinces us of the high-quality plots that can be produced with this package. First, since matplotlib is embedded in a scripting language, it is easy for the user to go as complex as necessary. Second, the scripting nature ensures that plots can be easily reproduced and/or placed in a batch processing environment. For all those who prefer a printed, tutorial-like guide over plain online help, Tosi's book Matplotlib for Python Developers might prove helpful.
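Because matplotlib is driven from scripts, several of the DOs and DO NOTs listed above can be written down once as a style and reused for every figure. The following is a minimal sketch, not a recommended standard: it assumes matplotlib 2.0 or later, and the name STYLE, the helper make_figure, the demo data, and all numeric weights and sizes (in points) are hypothetical choices for illustration. It renders into memory with the non-interactive Agg backend, so it can run unattended in a batch job.

```python
# Minimal sketch: encode a few of the style rules above as matplotlib rcParams
# and render a figure non-interactively, as one would in a batch script.
# STYLE, make_figure, the demo data and all numbers are HYPOTHETICAL choices.
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window, fine for batch jobs
import matplotlib.pyplot as plt

STYLE = {
    "font.family": "sans-serif",  # one sans-serif family for all labels
    "lines.linewidth": 1.5,       # data lines are the heaviest lines
    "lines.markersize": 4,        # small symbols: no false impression of accuracy
    "axes.linewidth": 0.8,        # frame lighter than the data lines
    "xtick.top": True,            # closed coordinate box: ticks on all four sides
    "ytick.right": True,
    "xtick.direction": "in",
    "ytick.direction": "in",
    "xtick.major.width": 0.8,     # ticks as heavy as the frame
    "ytick.major.width": 0.8,
    "xtick.minor.width": 0.8,
    "ytick.minor.width": 0.8,
    "xtick.major.size": 5,        # about 2 % of a 3.5 inch (252 pt) wide figure
    "ytick.major.size": 5,
    "xtick.minor.size": 3,        # minor ticks about 3/5 of the major ones
    "ytick.minor.size": 3,
    "xtick.minor.visible": True,
    "ytick.minor.visible": True,
    "grid.linewidth": 0.3,        # grid: the finest lines in the plot
    "grid.color": "0.8",          # light grey
}


def make_figure(fmt: str = "pdf") -> bytes:
    """Draw a small demo plot with STYLE applied and return the file bytes."""
    x = [0, 1, 2, 3, 4, 5]
    y = [0.0, 0.8, 1.9, 3.1, 3.9, 5.2]  # made-up demo values
    with plt.rc_context(STYLE):
        fig, ax = plt.subplots(figsize=(3.5, 2.6))  # size in inches
        ax.plot(x, y, color="black", marker="o")  # filled black symbols
        ax.grid(True)  # major grid, drawn with the grid.* settings above
        ax.set_xlabel("x")
        ax.set_ylabel("y")
        buf = io.BytesIO()
        fig.savefig(buf, format=fmt, bbox_inches="tight")
        plt.close(fig)
    return buf.getvalue()
```

Passing a file name instead of a buffer, e.g. fig.savefig("figure.eps"), writes the file directly; an eps or pdf produced this way can still be touched up in one of the illustration programs mentioned above. Keeping the style in one dictionary means that revising a line weight is a one-line change that propagates to every figure, which makes the "Revise and edit!" advice cheap to follow.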


1.XI.11