Wednesday, August 25, 2021

What Text Format is Best for Git and GitHub?

For me, this question was answered initially by looking at what GitHub itself uses. Given that the README file format in GitHub is Markdown, this is the path that I am on.

What are the other formats? One is LaTeX. Per Introduction to LaTeX (latex-project.org), LaTeX “is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents, but it can be used for almost any form of publishing and provides a powerful platform for layout and format.”

My goal is to write documentation for the software projects/repositories in which I am engaged. I don't need high-quality typesetting to present scientific formulas; I need to explain the business function and construction of the code. Markdown is easy to learn and well supported.

Moreover, I have started using Hugo, one of the most popular open-source static site generators, both to build my personal website and as a potential tool for documentation. After years of wrestling with HTML/CSS and JavaScript, I am happy to be able to stand up a static site in minutes with Hugo. Hugo also has excellent Markdown support out of the box. In fact, you write your posts in Markdown.

As I stated in my previous post, Documents as Code, when writing documents I like to use either Microsoft Word or OpenOffice’s Writer. Both provide spell check along with grammar help and a thesaurus. Here is where the problem emerges: Microsoft Word (*.docx) and OpenOffice’s Writer (*.odt) produce binary files, and Git and GitHub do not play well with binary files because they cannot show a readable line-by-line diff of them.

So, what did I do to ease the conversion from *.docx/*.odt to Markdown? Enter Pandoc. As Pandoc's site states, “If you need to convert files from one markup format into another, pandoc is your swiss-army knife.”

Pandoc is a CLI tool; there is no graphical user interface, so you have to open a terminal in your operating system of choice. For example, to convert this *.odt doc to Markdown, I did the following from the CLI:

$ pandoc 'What Text Format is best for Git and GitHub.odt' \
    -o BestTextFormatForGit.md
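
If a repository holds several documents, the single command above can be wrapped in a small loop. The following is only a minimal sketch: the function names md_name and convert_all are hypothetical, and it assumes that pandoc is on the PATH, that the documents sit in the current directory, and that each Markdown file should keep the base name of its source.

```shell
# Derive the Markdown output name: strip the last extension and add .md.
md_name() {
  printf '%s.md\n' "${1%.*}"
}

# Convert every .odt and .docx file in the current directory to Markdown.
# pandoc infers the input format from the file extension.
convert_all() {
  for doc in *.odt *.docx; do
    [ -e "$doc" ] || continue   # skip a pattern that matched no files
    pandoc "$doc" -o "$(md_name "$doc")"
  done
}
```

Calling convert_all from the directory that holds the documents writes one .md file next to each source file. Quoting "$doc" matters here because file names such as the one above contain spaces.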

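For *.docx/*.odt files that have tables, I use the following, which adds options for table conversion into Markdown. Disabling the simple_tables, multiline_tables, and grid_tables extensions leaves pandoc with pipe tables, the table syntax that GitHub renders: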

$ pandoc 'What Text Format is best for Git and GitHub.odt' \
    -f odt -t markdown-simple_tables-multiline_tables-grid_tables \
    -o BestTextFormatForGit.md

In fact, this post was generated with the initial pandoc command above from an *.odt binary file. I only added the fenced code block sections to the Markdown to set off the pandoc commands.

The next question to answer: “How do you do a git diff with binary *.docx/*.odt files?”
