Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology. More About:  RJR | OUR TEAM | OUR SERVICES | THIS WEBSITE



Adding Typeset Sidenotes to a PDF

posted: 22 APR 2015

It’s not every day that you get to mark something as done that has been on your TO DO list for more than a decade.

For many years I have wanted to produce annotated versions of papers, where the annotations would be present as typeset side notes in the original PDF file.

I have recently finished a working system for doing that. Here is a quick technical explanation.

My goal was to make it easy to start with a PDF document that looks more or less like this:

image-01

and quickly convert it into something like this:

image-02

I can now do this easily. Here is how it works: I start by producing a version of the original with ruler marks on it, so that it is easy to specify exactly where the side notes should be located, like this:

image-03

Here we see that we want to locate the note on absolute page number 4 (the numbers along the edge), next to lines 16-26. The simplest way to specify that would be something like

4:16-26::Mendel notes that no previous work has ever attempted to do a quantitative analysis of the types of progeny produced in controlled breeding experiments.

So, I have developed software that can (1) produce the ruled versions of the original, necessary for specifying note locations, and (2) take a simple text file containing statements like the one above and produce typeset side notes that are then merged with the original PDF to give the final, typeset version.
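To make the statement format concrete, here is a minimal Python sketch of how such a line might be parsed. It is not the actual software: the grammar (page, colon, line range, double colon, text) is inferred from the example above, and the names NoteSpec, parse_note, and parse_notes are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed grammar, inferred from the example: page:first-last::text
# (a single line number without a range is also accepted here).
NOTE_RE = re.compile(r"^(\d+):(\d+)(?:-(\d+))?::(.*)$")


@dataclass
class NoteSpec:
    page: int        # absolute page number, as shown on the ruled copy
    first_line: int  # first ruler line the note sits beside
    last_line: int   # last ruler line the note sits beside
    text: str        # note body, possibly containing HTML-like tags


def parse_note(line: str) -> NoteSpec:
    """Parse one statement such as '4:16-26::Some text'."""
    match = NOTE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a note statement: {line!r}")
    page, first, last, text = match.groups()
    first_line = int(first)
    last_line = int(last) if last is not None else first_line
    if last_line < first_line:
        raise ValueError(f"line range runs backwards: {line!r}")
    return NoteSpec(int(page), first_line, last_line, text.strip())


def parse_notes(source: str) -> list[NoteSpec]:
    """Parse a whole notes file, skipping blank lines."""
    return [parse_note(ln) for ln in source.splitlines() if ln.strip()]
```

For example, parse_note("4:16-26::Mendel notes that ...") would yield a NoteSpec with page 4 and lines 16 through 26.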

From an appropriately written text file containing the text for side notes, I can automatically produce a typeset, annotated version of Mendel’s paper in less than a minute.

The system fully supports proportional fonts (my first efforts, using monospaced fonts, produced workable but unattractive results). I decided that as long as I was going to figure out how to automate the typesetting of proportional fonts, I might as well go ahead and include support for UTF-8 encoded text, which would give me the ability to produce side notes in almost any European language. And, since comments on genetics material may call for italics or for superscripts and subscripts, I added support for those as well.

Thus, a text file with comments can be written in any language (so long as the file is UTF-8 encoded, which is now standard on most computer systems) and a few HTML-like tags can be used to indicate where italics should go, etc.

For example, the following comments in the source file

4:15-26::Embedding HTML-style tags in the source text makes it possible to produce typeset side notes in <i>italics</i> or in <b>bold</b> or in <b><i>bold italics</i></b>. It is also possible to include superscripts like N<sup>2</sup> or subscripts like F<sub>1</sub>.

4:15-35::The typesetting system supports generic UTF-8 encoding for nearly all European languages, so that comments may be written in the English language, ή στην ελληνική γλώσσα, nebo v českém jazyce, или на русском языке, ou dans la langue française, vagy a magyar nyelv, lub w języku polskim, eller i det norske språket, or …

will produce this typeset result:

image8

Note: the finished file is fully typeset. The image above looks flaky in the foreign languages, but that’s because I have taken a screenshot of the page and converted it into an image file to include in this document.

Zooming in further in the Acrobat reader shows the actual quality of the typesetting:

image5
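As an illustration of how the HTML-like tags might be handled, the following Python sketch splits a note into styled runs that a typesetting step could then render one at a time. This is only a guess at one workable approach, not a description of the real system. The Run class and to_runs function are hypothetical names, and only the four tags shown in the example (i, b, sup, sub) are recognized.

```python
import re
from dataclasses import dataclass

# Only the four tags used in the example source text are recognized.
TAG_RE = re.compile(r"</?(i|b|sup|sub)>")


@dataclass(frozen=True)
class Run:
    text: str
    bold: bool = False
    italic: bool = False
    shift: str = ""  # "" for the baseline, "sup" or "sub" otherwise


def to_runs(markup: str) -> list[Run]:
    """Split tagged text into runs that each carry a single style."""
    runs: list[Run] = []
    depth = {"i": 0, "b": 0, "sup": 0, "sub": 0}  # open-tag counters

    def emit(text: str) -> None:
        if text:
            shift = "sup" if depth["sup"] else "sub" if depth["sub"] else ""
            runs.append(Run(text, depth["b"] > 0, depth["i"] > 0, shift))

    pos = 0
    for tag in TAG_RE.finditer(markup):
        emit(markup[pos:tag.start()])      # text before this tag
        name = tag.group(1)
        if tag.group(0).startswith("</"):  # closing tag
            depth[name] = max(0, depth[name] - 1)
        else:                              # opening tag
            depth[name] += 1
        pos = tag.end()
    emit(markup[pos:])                     # trailing text
    return runs
```

Because Python strings are Unicode, the same function handles Greek, Czech, or Russian text without any special treatment, provided the source file is decoded as UTF-8 when it is read.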

To support proofreading of the final version, I have designed the system so that it is trivially easy to produce a document that contains the typeset side notes AND the ruler marks. This allows for a quick verification that the notes have in fact gone where they are supposed to go.

image6

The system tries to set the side notes so that they are vertically centered immediately next to the specified text, but if there is a collision on the placement of the notes, the system will automatically adjust the placement of the notes (shifting them up or down) to make them fit (provided that they can be made to fit, given the length of the notes and the size of the specified font).
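The placement behavior described above can be sketched with a simple two-pass greedy procedure. This is an assumption about one way to get that behavior, not the algorithm the system actually uses. In this sketch, positions are measured in points downward from the top of the page, and the function name place_notes and the gap parameter are hypothetical.

```python
def place_notes(wanted, top, bottom, gap=2.0):
    """Place notes in one margin column without overlaps.

    wanted: list of (center, height) pairs, one per note, where center is
            the vertical midpoint of the text the note refers to.
    top, bottom: limits of the usable margin area (top < bottom).
    Returns the top edge chosen for each note, in the original order.
    Raises ValueError if the notes cannot all fit.
    """
    total = sum(height for _, height in wanted) + gap * (len(wanted) - 1)
    if total > bottom - top:
        raise ValueError("notes are too long to fit on this page")

    order = sorted(range(len(wanted)), key=lambda i: wanted[i][0])
    tops = [0.0] * len(wanted)

    # Forward pass: center each note, pushing it down past its predecessor.
    cursor = top
    for i in order:
        center, height = wanted[i]
        tops[i] = max(center - height / 2, cursor)
        cursor = tops[i] + height + gap

    # Backward pass: pull notes up if they ran past the bottom limit.
    limit = bottom
    for i in reversed(order):
        height = wanted[i][1]
        tops[i] = min(tops[i], limit - height)
        limit = tops[i] - gap

    return tops
```

A greedy pass like this keeps the notes in reading order and guarantees a fit whenever one exists, but it does not minimize the total distance that notes are moved. A more careful version could spread the displacement evenly among colliding notes.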

Bottom line: the system was designed to do what I believe computers should do, if they are to be maximally useful. That is, it allows the creation of a final product for just the marginal effort necessary to create that specific product. In this case, the actual writing of the side notes is the only inescapable human work to be done. The rest is automated.

The system is pretty flexible, so that the notes may be set in any color and in any size, using either serif or sans-serif fonts, or even a font that looks like it is hand drawn. Using a parameter file I can adjust the tracking (spacing between letters) or leading (spacing between the lines), as well as the relative size and relative position of super- and sub-scripted characters. If super- or subscripts are going to be used, the leading needs to be adjusted to make sure that there is room for them. I have found through experimentation that the readability of some fonts (especially when set small, like 8 point, or smaller) is improved by a bit of increased inter-letter tracking.
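To show how tracking and leading enter the arithmetic, here is a small Python sketch of measuring and wrapping text set in a proportional font. It is illustrative only. The NoteStyle fields stand in for whatever the real parameter file contains, and the ADVANCES table holds made-up glyph widths (in thousandths of an em), where a real implementation would read them from the font's metrics.

```python
from dataclasses import dataclass


@dataclass
class NoteStyle:
    font_size: float = 8.0       # points
    tracking: float = 0.2        # extra points added after every glyph
    leading: float = 10.0        # baseline-to-baseline distance, points
    column_width: float = 120.0  # width of the side-note column, points


# Hypothetical advance widths in 1/1000 em; real values come from the font.
ADVANCES = {" ": 250, "i": 280, "l": 280, "m": 780, "w": 720}
DEFAULT_ADVANCE = 500


def text_width(text: str, style: NoteStyle) -> float:
    """Width of a string in points, including tracking."""
    glyphs = sum(ADVANCES.get(ch, DEFAULT_ADVANCE) for ch in text)
    return glyphs * style.font_size / 1000 + style.tracking * len(text)


def wrap(text: str, style: NoteStyle) -> list[str]:
    """Greedy word wrap: fill each line until the next word will not fit."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and text_width(candidate, style) > style.column_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def block_height(lines: list[str], style: NoteStyle) -> float:
    """Vertical space a wrapped note occupies."""
    return len(lines) * style.leading
```

Increasing the tracking widens every line, so the same note wraps onto more lines and needs more vertical room. Increasing the leading to make space for superscripts has the same effect on the block height.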

Finally, there is no reason this is restricted to Mendel’s paper (as in the examples shown). The system can be used to produce annotated versions of any PDF file. And, the PDF file does not have to be typeset — it can just be scanned pages of some paper original. Nor does the original have to have room in the margins. It is relatively easy with some free tools to modify any PDF to add more space around the edges.

I did not build this system as a user-friendly product that could easily be used by anyone. However, I would be happy to consider working as a collaborator with anyone who might find this a useful tool for some project.

If you are interested in a possible collaboration, please CONTACT ME

 

RJR Experience and Expertise

Researcher

Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.

Educator

Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.

Administrator

Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences, and at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.

Technologist

Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.

Publisher

While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.

Speaker

Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.

Facilitator

Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.

Designer

Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

963 Red Tail Lane
Bellingham, WA 98226

206-300-3443

E-mail: RJR8222@gmail.com
