Typography as information

Like everyone who reads and thinks, I have been amused by the puffery surrounding the new AI. Journalism about the old AI often reflected on history and the future of reading. In contrast, journalism on the new AI seems more surprised and frustrated that the past is unruly.

Early computer scientists had seen complicated typesetting in their mathematical textbooks. They had not experienced the limited possibilities of the Hypertext Markup Language (HTML). So of course they knew that proper layout and typography were part of communication. Many of them probably knew specialists who could explain how layout and typography contributed to the meaning of texts.

What has changed for the new generation of computer workers?

Consider a typical example from Ars Technica: “Why extracting data from PDFs is still a nightmare for data experts”.

The article says that meaning inherent in typography is “locked away” or “unstructured”, because automated systems cannot identify the “elements” of a document. Typography and layout are described as elements, a technical term in the HTML document model. HTML itself is a 32-year-old simplification of Standard Generalized Markup Language (SGML), which was established in 1986 to simplify exchanging office documents between mainframes.

So this article is concerned that,

  1. Firstly, lots of documents are not well represented by an out-of-date simplification of an even more out-of-date model for office documents, itself already overly simplified so it can be automatically processed by mainframes.

  2. Secondly, other people have not made computer programs that have already solved this problem.

  3. Thirdly, technologies designed to help with office work do not necessarily work well when applied to something else.

This isn’t a problem to be solved by computers, but a problem caused by computers. I think of this in terms of the history of the office.

People have long communicated in writing in complicated and sophisticated ways. Very recently, computer manufacturers agreed on deliberate simplifications of writing. Employers then required office workers to use these simplifications as a condition of being paid. These simplifications turn out to be insufficient, at least in part because people adopted them only under compulsion.