Wednesday, August 18, 2021

Documents as Code

This post assumes a basic knowledge of Git and GitHub.

I love the workflow of using Git and GitHub to develop code, and I have been thinking about how useful it would be to apply the same tools and processes to other disciplines, such as the legal field or any job where you create and edit documents, which is to say almost every field. Part of my motivation is that I teach programming to Business Informatics students at a local university. Most of them will not become software developers when they graduate, but how valuable would it be for them to understand the common workflow of the software developers they will work alongside? My other motivation is that the Git and GitHub workflows help me better understand the cause and effect of my work and explore alternatives in counterfactual scenarios.

What is a Git workflow? Per Atlassian, “A Git workflow is a recipe or recommendation for how to use Git to accomplish work in a consistent and productive manner.” Git workflows are built around branches. Using a branch means you diverge from the main line of development and keep working without interfering with it (see “Git - Branches in a Nutshell” on git-scm.com). Branches let team members work independently and then combine their work when it is ready. For more, see https://about.gitlab.com/topics/version-control/what-is-git-workflow/. A commonly used Git workflow is the Gitflow Workflow, which was first published and popularized by Vincent Driessen at nvie.
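To make the idea of a branch more concrete, here is a toy model in Python. It borrows two real ideas from Git (a branch is just a name that points at a commit, and a merge commit has two parents), but everything else is simplified and assumed for illustration: each commit stores its document as a tuple of lines, and the merge is naive, appending the other branch's new lines and never reporting conflicts. The names ToyRepo, create_branch, switch, commit, merge, and log are hypothetical, not Git commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One saved revision: a message, its parent commits, and the document lines."""
    message: str
    parents: tuple = ()
    lines: tuple = ()


class ToyRepo:
    """Toy model of branching (a simplification, not Git's real implementation)."""

    def __init__(self):
        # Every repository starts with one branch pointing at an empty commit.
        self.branches = {"main": Commit("initial commit")}
        self.current = "main"

    def create_branch(self, name):
        # Branching copies a pointer; no document content is duplicated.
        self.branches[name] = self.branches[self.current]

    def switch(self, name):
        self.current = name

    def commit(self, message, new_line):
        # A new commit points back at its parent; only the current branch moves.
        parent = self.branches[self.current]
        self.branches[self.current] = Commit(
            message, (parent,), parent.lines + (new_line,)
        )

    def merge(self, other):
        ours = self.branches[self.current]
        theirs = self.branches[other]
        # Naive merge: keep our lines, then append any of their lines we lack.
        # Real Git compares both sides against a common ancestor and can report conflicts.
        merged = ours.lines + tuple(line for line in theirs.lines if line not in ours.lines)
        self.branches[self.current] = Commit(
            f"merge {other} into {self.current}", (ours, theirs), merged
        )

    def log(self):
        """Commit messages reachable from the current branch, following first parents."""
        messages, commit = [], self.branches[self.current]
        while commit is not None:
            messages.append(commit.message)
            commit = commit.parents[0] if commit.parents else None
        return messages


if __name__ == "__main__":
    repo = ToyRepo()
    repo.create_branch("contract-edits")
    repo.switch("contract-edits")
    repo.commit("add payment clause", "Payment is due in 30 days.")
    repo.switch("main")
    repo.commit("add title", "Service Agreement")
    repo.merge("contract-edits")
    print(repo.branches["main"].lines)  # ('Service Agreement', 'Payment is due in 30 days.')
    print(repo.log())  # ['merge contract-edits into main', 'add title', 'initial commit']
```

The work on the "contract-edits" branch never touches "main" until the merge, which is the independence that branches give a team.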

When writing documents, I like to use either Microsoft Word or OpenOffice’s Writer. Both provide spell check, grammar help, and a thesaurus. Here is where the problem emerges: both produce binary files (.docx and .odt files are compressed ZIP archives), and Git and GitHub do not play well with binary files. Git was designed for text-based source code, so out of the box it cannot show what changed between two revisions of a binary document. Most enterprises use an office suite such as Microsoft Word that produces these binary files. While tools such as MS Word or OpenOffice’s Writer, which I am using now, work great for producing and reading documents, you can’t use Git or GitHub to review what changed over a document’s history.
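Even though Git sees a .docx file as binary, the text inside is recoverable: the file is a ZIP archive whose main body is XML stored in word/document.xml. The following is a minimal sketch using only Python's standard library, assuming a typical Word document. It extracts the text of each paragraph (the w:p elements) and diffs two revisions with difflib. It ignores headers, footers, and footnotes (which live in separate XML parts) and flattens tables into plain paragraphs. The file name docx_text.py and the function names are hypothetical. An .odt file would need a different parser, since its text lives in content.xml.

```python
import difflib
import sys
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in a .docx file."""
    # A .docx file is a ZIP archive; the document body is word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join every text run (<w:t>) in the paragraph into one line.
        paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
    return paragraphs


def diff_docx(old_path, new_path):
    """Unified diff of the paragraph text of two .docx revisions."""
    return list(
        difflib.unified_diff(
            docx_paragraphs(old_path),
            docx_paragraphs(new_path),
            fromfile=old_path,
            tofile=new_path,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # One argument: print the document's text. Two arguments: print a diff.
    if len(sys.argv) == 2:
        print("\n".join(docx_paragraphs(sys.argv[1])))
    elif len(sys.argv) == 3:
        print("\n".join(diff_docx(sys.argv[1], sys.argv[2])))
    else:
        sys.exit("usage: docx_text.py FILE.docx [OTHER.docx]")
```

Because the one-argument form prints plain text for a single file, it can serve as a Git "textconv" filter, so that git diff shows changed paragraphs instead of a "Binary files ... differ" message. For example, add the line "*.docx diff=docx" to .gitattributes and run: git config diff.docx.textconv "python3 /path/to/docx_text.py" (the path is a placeholder for wherever you saved the script).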

My point is that, to get buy-in from non-coders, great tools such as Git and GitHub need to work well with binary document formats such as *.docx and *.odt.

While searching for resources on treating an enterprise’s document knowledge base as artifacts in a Git workflow, including converting docx/odt files to text, I found the book Docs Like Code by Anne Gentle. This is from Docs Like Code:

When we say docs, we mean streamlined, tightly phrased, and fast-moving information that helps developers understand complex application interfaces. Docs can be anything from a single web page for a startup to an entire developer reference site. Modern docs, with their web and mobile interfaces and supportive user experience, are purposeful, instructive, and even beautiful. When we say treat docs like code, we mean that you: Store the doc source files in a version control system. Build the doc artifacts automatically. Ensure that a trusted set of reviewers meticulously reviews the docs. Publish the artifacts without much human intervention.

The next question is: which text format works best with Git and GitHub?

See https://blog.front-matter.io/mfenner/using-microsoft-word-with-git for configuring Git to work with Microsoft Word documents.
See “Generate PDF invoices from Markdown using Pandoc” on DEV Community for converting Markdown to PDF.

1 comment:

TeX'er. said...

(La)TeX. Ultimate control over formatting.