Thursday, August 26, 2021

Git Diff using Pandoc for Binary Documents

As I stated in my Documents as Code post, text formats such as Markdown work well with Git. Git was written for source code in a text-based format and therefore doesn’t understand what has changed between two revisions of a binary document.

So, if others are writing most of their documentation in either Microsoft Word or OpenOffice’s Writer applications, how can you examine the evolving content between the various commits via a git diff in a Git repository?

First, create a git repository:

$ git init binary_diff
$ cd binary_diff/

Then, create a *.odt document and add a simple line of text such as “hello.” Stage the file and commit the doc to the repo:

$ git add file.odt
$ git commit -m "Create file.odt with hello"

Now, change the text in the doc to “Hello Solar System.” Add and commit the updated doc:

$ git commit -am "Update the file.odt file"

Let’s see the git log output:

$ git log --oneline

f14e810 (HEAD -> main) Update the file.odt file
a2f8e6a Create file.odt with hello

Next, issue a git diff on the first and last commit to show that binary files do not show the differences:

$ git diff a2f8e6a..f14e810

diff --git a/file.odt b/file.odt
index e08debd..02d4dce 100644
Binary files a/file.odt and b/file.odt differ

Not very helpful, huh?

In order to enable diffs on binary files, do the following. First, create a .gitattributes file and add the following:

*.docx diff=docx
*.odt diff=odt

Then, add this to the .git/config file:

[diff "docx"]
    textconv = pandoc --to=plain
[diff "odt"]
    textconv = pandoc --to=plain
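The same two settings can also be applied from the command line instead of editing the files by hand. The following is a minimal sketch, assuming that pandoc is installed and on your PATH. It works in a throwaway repository created with mktemp (an assumption made only so the sketch is safe to run anywhere); in practice you would run the middle commands from the root of your own repository.

```shell
# Work in a throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Tell Git which diff driver to use for each file extension.
printf '%s\n' '*.docx diff=docx' '*.odt diff=odt' > .gitattributes

# Define the drivers in .git/config; pandoc converts each revision to plain text.
git config diff.docx.textconv 'pandoc --to=plain'
git config diff.odt.textconv 'pandoc --to=plain'

# Confirm that Git picks up the attribute and the driver.
git check-attr diff -- file.odt
git config --get diff.odt.textconv
```

Because the drivers live in .git/config, they are local to your clone. The .gitattributes file can be committed, but each collaborator still has to define the drivers on their own machine.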

Now, run git diff on the first and last commits again to see that the differences in the binary files are shown:

$ git diff a2f8e6a..f14e810
diff --git a/file.odt b/file.odt
index 02d4dce..e08debd 100644
--- a/file.odt
+++ b/file.odt
@@ -1 +1 @@
-hello
+Hello Solar System

You will find that you can get the same result with *.docx file diffs.

This fix enables you to view how the .docx/.odt files have changed between the various commits.

Wednesday, August 25, 2021

What Text Format is Best for Git and GitHub?

For me this question was answered initially by considering what works best in Git and GitHub. Given that the readme file format in GitHub is Markdown, this is the path that I am on.

What are the other formats? One is LaTeX. Per Introduction to LaTeX (latex-project.org) , LaTeX “is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents, but it can be used for almost any form of publishing and provides a powerful platform for layout and format.”

My goal is to write documentation for the software projects/repositories in which I am engaged. I don't need high-quality typesetting to present scientific formulas; rather, I need to explain a codebase's business function and construction. Markdown is easy to learn and well supported.

Moreover, I have started using Hugo, one of the most popular open-source static site generators, to build my own personal website as well as a potential tool for documentation. After years of wrestling HTML/CSS and JavaScript, I am happy to be able to stand up a static site in minutes with Hugo. Hugo also has excellent Markdown support out of the box. In fact, you write your posts in Markdown.

As I stated in my previous post, Documents as Code, when writing documents, I like to use either Microsoft Word or OpenOffice’s Writer. Both provide spell check along with grammar help and a thesaurus. Here is where the problem emerges. Microsoft Word (*.docx) and OpenOffice’s Writer (*.odt) produce binary files. Git and GitHub do not play well with binary files.

So, what did I do to better accommodate the process of converting *.docx/*.odt files to Markdown? Enter Pandoc. As Pandoc's site states, “If you need to convert files from one markup format into another, pandoc is your swiss-army knife.”

In addition to this, Pandoc is a CLI tool. There is no graphical user interface. Therefore, you have to open a terminal in your operating system of choice. For example, to convert this *.odt doc to Markdown, I did the following from the CLI:

$ pandoc 'What Text Format is best for Git and GitHub.odt' \
    -o BestTextFormatForGit.md

For *.docx/*.odt files that have tables, I use the following, which provides options for table conversion into Markdown:

$ pandoc 'What Text Format is best for Git and GitHub.odt' -f odt \
    -t markdown-simple_tables-multiline_tables-grid_tables \
    -o BestTextFormatForGit.md
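If you have a whole folder of documents, the same command can be wrapped in a small loop. This is only a sketch, assuming pandoc is on your PATH; the function names md_name and convert_all are hypothetical helpers introduced for illustration, not part of pandoc.

```shell
# Derive the Markdown file name for a document, e.g. "notes.odt" -> "notes.md".
md_name() {
  printf '%s\n' "${1%.*}.md"
}

# Convert every .odt and .docx file in the current directory to Markdown,
# using the same table options as the command above.
convert_all() {
  for doc in *.odt *.docx; do
    [ -e "$doc" ] || continue   # skip a pattern that matched no files
    pandoc "$doc" -t markdown-simple_tables-multiline_tables-grid_tables \
      -o "$(md_name "$doc")"
  done
}
```

Calling convert_all from inside a folder of documents writes one *.md file next to each source file. Quoting "$doc" keeps file names with spaces intact.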

In fact, this post was generated with the initial pandoc command above from an *.odt binary file. I only added the fenced code block sections to the Markdown to highlight the pandoc command output.

Next question to answer, “How do you do a git diff with binary *.docx/*.odt files?”

Wednesday, August 18, 2021

Documents as Code

This post assumes a basic knowledge of Git and GitHub.

I love the workflow of using Git and GitHub in developing code. I have been thinking how cool it would be to use the same tools and processes that I use with Git and GitHub for other disciplines such as the legal field or any job where you create and edit documents. In short, almost all fields, if not all. Part of my motivation is the fact that I teach programming to Business Informatics students at a local university. Most of them will not be Software Developers when they graduate. However, how great would it be for them to understand the common workflow of their Software Developer coworkers? Secondly, I love how the Git and GitHub workflows help me better understand the cause and effect of my work as well as other possibilities within counterfactual scenarios.

What is the Git workflow? Per Atlassian, “A Git workflow is a recipe or recommendation for how to use Git to accomplish work in a consistent and productive manner.” Essentially, Git workflows are governed by branches. Using a branch means you deviate from the main stream of development and continue to do work without interfering with the main stream of work (see Git - Branches in a Nutshell (git-scm.com)). Branches allow different team members to work independently and then combine their work when ready (for more, see https://about.gitlab.com/topics/version-control/what-is-git-workflow/). A commonly utilized Git workflow is the Gitflow Workflow. This workflow was first published and made popular by Vincent Driessen at nvie.

When writing documents, I like to use either Microsoft Word or OpenOffice’s Writer. Both provide spell check along with grammar help and a thesaurus. Here is where the problem emerges. Microsoft Word or OpenOffice’s Writer produce binary files. Git and GitHub do not play well with binary files. Git was written for source code that is in a text based format and therefore doesn’t understand what has changed between two revisions of a binary document. Most enterprises use some office suite such as Microsoft Word which produces binary files. While tools such as MS Word, or OpenOffice’s Writer, which I am using now, work great to produce and read docs, you can’t use Git or GitHub to review the document’s history.

Again, my point is that to get buy-in from non-coder types, great tools such as Git and GitHub need to function with binary files such as *.docx and *.odt.

When searching for a resource that discusses treating an enterprise’s document knowledge base as artifacts within the Git workflow, and that could assist in the docx/odt conversion to text, I found the book Docs Like Code by Anne Gentle. This is from Docs Like Code:

When we say docs, we mean streamlined, tightly phrased, and fast-moving information that helps developers understand complex application interfaces. Docs can be anything from a single web page for a startup to an entire developer reference site. Modern docs, with their web and mobile interfaces and supportive user experience, are purposeful, instructive, and even beautiful. When we say treat docs like code, we mean that you: Store the doc source files in a version control system. Build the doc artifacts automatically. Ensure that a trusted set of reviewers meticulously reviews the docs. Publish the artifacts without much human intervention.

The next question is, what text format is best for Git and GitHub?

See https://blog.front-matter.io/mfenner/using-microsoft-word-with-git as a resource for your Git config, etc.
See Generate PDF invoices from Markdown using Pandoc - DEV Community for markdown to PDF conversion.

Monday, August 02, 2021

The Basic Git Rebase

I have a bash shell script that creates three commits in the master branch. Then, the script creates and checks out a new branch called feature. In the feature branch two commits are created. Finally, two more commits are created in the master branch.
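The original script is not reproduced here, so the following is only a minimal sketch of what such a script might look like. The commit names M1 through M5 and F1 and F2 follow the labels used later in this post; the scratch directory, the placeholder identity, and the commit helper function are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a setup script: builds the history described above in a scratch repo.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

# Create one new file per commit so that no two commits touch the same file.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

for m in M1 M2 M3; do commit "$m"; done   # three commits on master
git checkout -q -b feature
for f in F1 F2; do commit "$f"; done      # two commits on feature
git checkout -q master
for m in M4 M5; do commit "$m"; done      # two more commits on master

# Show both branches and the point where they diverge.
git log --oneline --graph --all
```

Giving each commit its own file keeps the later rebase free of conflicts, which makes the history easier to follow.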

Here we run the script:

We will take a look at the master branch:

Now, a look at the feature branch:

Note that the feature branch history includes the initial commits from the master branch. In addition, you should note that we diverged our work when making commits on two different branches.

Let’s now take the changes that were introduced in F1 and F2 and reapply them on top of M5. In Git, this is called rebasing. With the rebase command, you can take all the changes that were committed on one branch and replay them on a different branch.

For this example, we are on the feature branch. From here we rebase it onto the master branch as follows:

As per https://git-scm.com/book/en/v2/Git-Branching-Rebasing, “This operation works by going to the common ancestor of the two branches (the one you’re on and the one you’re rebasing onto), getting the diff introduced by each commit of the branch you’re on, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally applying each change in turn.”

Now that we have rebased the feature branch commits onto the master branch, here is the resulting commit history:

Next, you can go back to the master branch, view its history, do a fast-forward merge, and then view its history again.
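The rebase and the fast-forward merge can be sketched end to end as follows. To keep it short, this sketch uses a smaller history than the one in this post (one shared commit, one commit on feature, one on master). The scratch directory, placeholder identity, and commit helper are assumptions for illustration.

```shell
#!/bin/sh
set -e

# Scratch repository with a placeholder identity.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Diverged history: M1 is shared, F1 is only on feature, M2 is only on master.
commit M1
git checkout -q -b feature
commit F1
git checkout -q master
commit M2

# Replay the feature commits on top of master.
git checkout -q feature
git rebase -q master

# master is now an ancestor of feature, so the merge is a fast-forward.
git checkout -q master
git merge -q --ff-only feature

# The history is now linear, with the feature commit on top.
git log --oneline
```

The --ff-only flag makes the merge fail rather than create a merge commit if master cannot simply be moved forward, which is a quick way to confirm that the rebase did what you expected.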

Here is the result:

Enjoy!

Monday, April 05, 2021

Dotnetrocks, a Comment, Pull Requests, and Music to Code By

On September 24th of 2015, I got my first comment read on the Dotnetrocks podcast show #1201 that resulted in me receiving the coveted Dotnetrocks' mug. Since then, they stopped giving away coffee mugs and started giving away the podcast creator Carl Franklin's set of Music to Code By songs.

Well, in addition to getting the mug I recently got the Music to Code By set from having a comment read from show #1639 on show #1732 just a few weeks ago. Here is my comment:

Another great show guys! When considering home automation or any new technology in one’s busy life the challenge is to not make the customer alter their habits and lifestyles to make the technology work but rather to have the technology work for them in their current life habits. Carl was correct when he stated, "now you're imposing rules on your lifestyle to appease the technology."
 
I think this is the crux that most mortals face when considering a new technology. Of course, they are thinking, “Why should I have to change just to take so-called advantage of something new?” This is the primary challenge for home automation specifically and technology in general.
 
When implementing our ideas, how do we accommodate our users’ existing lifestyle patterns and habits? There is also the social aspect of home automation and tech. Carl's mention of using the song Freebird as a weapon is a prime example! Moreover, the only way that we can make home automation and tech with less friction is trial and error. In short, this topic, like others that involve considering the customer, is complex and not easy to navigate.

What was funny was that I was up early on March 25th ready for my run. I got out on my home street in Covington and rounded the corner heading east on Martin Luther King Drive and I hear my name mentioned as providing that show's comment!

At first, I was thinking, there cannot be too many other Mark McFaddens out there, but you never know. As I listened to the comment I realized that it sounded very familiar. When the cohost who normally reads listener comments, Richard Campbell, stated: "...the song Freebird as a weapon...." I knew that was me. Sure enough, when I got home and checked the messages on the show's site, I found a message from Richard that he used that comment on the show.

In any event, the show featured Microsoft's Mads Kristensen, program manager on the Visual Studio team who is an avid extension writer, with over a hundred published extensions to the Visual Studio Marketplace. During the show, Mads and Carl mentioned how nice it would be to have a Music to Code By extension for Visual Studio where you could press a button and start playing the music. Since listening to the show, I have received and downloaded the Music to Code By MP3 collection. 

A week later I ran across the Microsoft Visual Studio YouTube channel's Writing Visual Studio Extensions with Mads - Music to code to by video! I love it when a plan falls together! After a quick search, I found the GitHub repo from Mads as well as the extension.

Next, I forked Mads' repo, created a topic branch entitled "add-get-next-track," pushed the local branch to my forked GitHub repo, and then submitted one Pull Request (PR) to add the ability to skip to the next track. After that, I created a PR to skip to the previous track as well as PRs to update the README file and associated extension screenshot. The PRs were accepted by Mads no later than the next day and merged into the master branch.


Finally, above is the latest history from my local master branch after syncing with the extension's GitHub repo. 



Friday, March 26, 2021

Tech CEO's Imagined Congressional Hearing Statement



I found this both informative and humorous. 

Mark Levy of Wired imagines what a statement would look like if it reflected what the CEOs were really thinking during yesterday's congressional hearing: 😉

Greetings, chairpeople, ranking members, and just plain rank members waiting to trap me with yes-or-no questions. Thank you for the opportunity to come to the land of Move Slow and Don’t Fix Things to appear before your committee....

Read more at https://link.wired.com/view/5dadee60954fcf02e54e5d28dw0h8.6lez/101d53e5


Tuesday, March 23, 2021

Digidog or Analogdog (a.k.a. Lady)

 



Wired Magazine's website had an interesting article entitled, A New York Lawmaker Wants to Ban Police Use of Armed Robots. In essence, the article discussed the concern that using unarmed robots, such as Boston Dynamics’ Digidog, could escalate to the use of weaponized robots within law enforcement. While it is a necessary discussion, I thought about my analog dog, Lady, and how she compares to Boston Dynamics’ Digidog.

Concerning protection for our family, Lady would sooner lick a stranger than bark at or attack them. So, if my home is broken into and I need a dog to protect us, at least by intimidating the intruder, Digidog is the go-to.

However, when we went to the local dog rescue to get a new canine companion, it was not for protection. When it comes to hanging around in the living room, watching TV, or even working from home, the Analogdog is the choice. Moreover, both of our sons love her and she is great with the grandkids! 

Perhaps if Digidog would snore like our Analogdog and then let out a big groan as it wakes up (if in fact the Digidog ever goes to sleep) it might be OK. Yet, I doubt that its design, being built for function and efficiency, would be as snuggly as our Lady.