Markdown and GitHub

First Steps Toward Learning Modern Digital Practices for Sustainable and Shareable Research

Tuesday January 26, 2018, 12:30-3:30pm PST
UCSB South Hall 2509

Workshop Plan

  1. Discuss principles for sustainable and shareable research.
  2. Introduce the use of Markdown and GitHub for following these principles.
  3. Set up and practice tools for working with Markdown and GitHub.
  4. Establish a common set of tools for the WE1S team.

Advance Preparations

Basic Principles

Documents should be created to conform to the following ideals:

  • Simplicity
  • Clarity
  • Standards

Simplicity

Simplicity is about ease of production—and reproduction.

  • Simple layout and styling
  • Semantic markup where possible

Simple Layout

  • One column and minimal horizontal spacing
  • Minimal graphic elements

Simple Styling

  • Limited number of styling options for text emphasis
  • Avoid applying multiple styles (e.g. bold and italic) to single text elements

Semantic Markup

Markup symbols should describe content or style (but not both).

Clarity

Clarity is about ease of reading.

Clear Markup

  • Minimal tagging
  • Easily recognized markup symbols

Standards

Standards are about best practices for simplicity and clarity, as well as for easy conversion between formats. Document format, structure, and styling should

  • follow commonly used, rather than project-specific, patterns wherever possible;
  • allow people not connected with your project to read and modify your content;
  • allow digital tools made for general use to process your content.

Reminder

These are principles, not rules.

In some contexts, there are good reasons to set them aside.

Common Types of Markup

  • HTML: Uses tags in angle brackets to (ideally) describe the semantic structure of web pages.
  • CSS: Uses property-value pairs to describe the styling of elements on web pages.
  • XML: Uses tags in angle brackets to describe the semantic structure and styling of any document. For rendering, XML documents are normally transformed into other formats.

These and similar types of markup are intended to produce “rich” documents, so they contain large vocabularies that risk violating the principles of simplicity and clarity.

Markdown

What Is Markdown?

Markdown is a plain text format for writing structured documents. Instead of tags, it uses symbols that were conventional in the early days of email before we had rich text editors.

  • Markdown can be produced in a simple text editor.
  • Markdown has a small number of formatting elements.
  • Markdown is typically converted to HTML for display on the web but is easily converted to other formats.
  • Sample Markdown Document
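To make the idea of conversion concrete, here is a toy sketch in Python that turns a tiny subset of Markdown (ATX headings, **bold**, *italic*, and paragraphs separated by blank lines) into HTML. The function name toy_markdown_to_html is hypothetical, and real converters such as Pandoc implement far more of the specification, so treat this only as an illustration of the principle.

```python
import html
import re


def toy_markdown_to_html(text):
    """Convert a tiny subset of Markdown to HTML (a teaching sketch only)."""
    html_blocks = []
    # Blank lines separate blocks (paragraphs or headings)
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape &, < and > so they display literally in the HTML
        block = html.escape(block.strip(), quote=False)
        # Handle **bold** before *italic* so double asterisks are not split
        block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        # A one-line block starting with 1-6 hashes and a space is a heading
        heading = re.fullmatch(r"(#{1,6}) (.+)", block)
        if heading:
            level = len(heading.group(1))
            html_blocks.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            html_blocks.append(f"<p>{block}</p>")
    return "\n".join(html_blocks)


print(toy_markdown_to_html("# Title\n\nSome **bold** and *italic* text."))
# <h1>Title</h1>
# <p>Some <strong>bold</strong> and <em>italic</em> text.</p>
```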

Common Uses of Markdown

Increasingly, developers are producing tools that support Markdown (including reveal.js, which was used to produce this slideshow).

People are also using Markdown for general writing because of its ease of use, because it enforces principles of simplicity and clarity, and because it is easy to convert to multiple formats.

Why do I have to learn Markdown?

Can’t I just use Word? I know how to use that.


Even if you know how to use Word well, do you?

Do you really?

Problems with Word

Word hides its (proprietary) markup, which encourages users to be sloppy. As long as it looks all right, who cares?

Word’s powerful features tempt users to violate the principles and best practices enforced by Markdown.

Word would be better if there were a tool to check documents against a schema that describes what a good document should look like. This process is known as “validation”.

There are good reasons for using Word in some contexts, but give Markdown a try, and you'll find yourself using Word less and less.

How to Write Markdown

  • Save your file as plain text. It is conventional to use the extension .md.
  • Any editor will do, but, as we’ll see, a code editor has some advantages.
  • When learning Markdown, it is helpful to use online editors like StackEdit.io.

The Concept of Linting

A linter is any tool that detects and flags errors in programming languages, including stylistic errors...For example, modern lint checkers are often used to find code that does not correspond to certain style guidelines.

—Wikipedia

Markdown Linting

Linting our Markdown (or any other code) is one way that we can ensure that it is well-formed according to the standards of the markup language and valid according to a schema that describes our style practices.

Most online Markdown editors will not correctly render Markdown as HTML if the document is not well-formed, but they don’t always tell you what is wrong. A good linting tool will.
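A linter is simply a program that reads a file and reports rule violations along with their locations. The Python sketch below checks three rules similar in spirit to some of markdownlint’s: a missing space after the # of a heading, a single stray trailing space, and runs of blank lines. The function toy_lint and its messages are hypothetical; markdownlint itself is a separate tool with its own, much larger rule set.

```python
import re
from typing import List, Tuple


def toy_lint(markdown: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for a few simple style problems."""
    problems = []
    previous_blank = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        # "#Heading" is not a heading in CommonMark; it needs "# Heading"
        if re.match(r"#{1,6}[^#\s]", line):
            problems.append((number, "missing space after '#' in heading"))
        # One trailing space is usually a typo (two spaces mean a hard break)
        if line.endswith(" ") and not line.endswith("  "):
            problems.append((number, "single trailing space"))
        blank = line.strip() == ""
        if blank and previous_blank:
            problems.append((number, "multiple consecutive blank lines"))
        previous_blank = blank
    return problems


for number, message in toy_lint("#Title\n\nSome text \n\n\nMore text"):
    print(f"line {number}: {message}")
```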

Quick Aside

A linter for bibliographical citation would be the Holy Grail.

#sigh

Problems with Markdown

  • Markdown does not support some really useful functions, like making links open in a new tab or controlling image size. For these, you need to fall back on HTML and CSS (that is, you need to know some HTML and CSS).
  • Because Markdown was not initially published as a standard, multiple dialects were developed. This leads to some inconsistency both in the format of Markdown documents and in how parsers render Markdown.

Emerging Standards

A standard known as CommonMark is nearing completion. It is the basis for GitHub-Flavored Markdown, which has some additional extensions used for rendering Markdown on GitHub.

WE1S uses GitHub-Flavored Markdown for all project documents.

Quick Markdown Practice Exercises

Using StackEdit

  1. Go to https://stackedit.io/app.
  2. Click the folder icon at the top left; then the file icon with the plus sign. Enter a name for your Markdown document and hit Enter.
  3. Erase the boilerplate and type your own content. Use the formatting toolbar to introduce formatting and observe the Markdown markup added, as well as how it is rendered in the HTML preview on the right.
  4. Use the Markdown Cheat Sheet (https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) to try adding other types of formatting not included in the toolbar.
  5. We won’t be using StackEdit’s file saving and export features in this workshop.

Using Pandoc

  1. Leave the StackEdit tab open.
  2. In a new tab, go to the Pandoc demo (https://pandoc.org/try/).
  3. Copy some Markdown from your StackEdit document into the Pandoc field and convert it from GitHub-Flavored Markdown to HTML. You can copy the rendered HTML and convert it back to Markdown.
  4. We won’t use other online tools in this workshop, but they work in a similar fashion.

Useful Advice

Online tools are for quick jobs. In a realistic working scenario, it is better to author Markdown documents in a code editor.

Markdown in Visual Studio Code

VS Code provides an HTML preview for Markdown documents, and the markdownlint extension adds linting.

Quick Exercise

  • Launch VS Code and open a new file if necessary.
  • Click Plain Text at the bottom right corner of the screen. Type “Markdown” in the search field and select it. This will tell VS Code which language you are using. Saving the file as myfile.md will also switch the language to Markdown.
  • Copy your StackEdit code into your file.
  • Press Ctrl+Shift+V to open a preview in a new tab. You can view the preview side-by-side (Ctrl+K V) with the file you are editing and see changes reflected in real time as you edit.

Using the Linter

  • Do you see squiggly green lines on the screen? Run your mouse over them to see a pop-up describing the error.
  • If you do not see a light bulb, click a squiggly green line to make one appear. Click the light bulb to see options. Click the options for more information on the error and how to fix it.

Important: some errors indicate that your code is not well-formed according to the Markdown specification. Others flag violations of stylistic rules established by the author of the linter extension, and these are subjective. If you want to turn off individual rules, there are instructions in the markdownlint GitHub repo.

Advanced Markdown

Inline HTML

You can use HTML and CSS in Markdown documents to achieve effects not possible in pure Markdown. Although this defeats the purpose of Markdown, there are cases when it is worth it.

Try typing the following into VS Code:


    <b style="color: red;">Some text</b>

How does it display in the preview? Is it acceptable to the linter?

Hard line breaks

A single line break is treated as a space. To force a line break, place two spaces or a “\” at the end of the line.

Try this in VS Code and see what the linting rule is.
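The rule can be sketched in Python as a function that renders the lines of one paragraph. This is a simplified reading of the CommonMark rule, not a full parser: a line ending in two or more spaces or in a backslash gets an HTML <br /> line break, and any other line break is kept as a plain newline, which browsers display as a space. The name render_paragraph is hypothetical.

```python
def render_paragraph(lines):
    """Render one paragraph's lines to HTML, honoring hard line breaks."""
    parts = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        if is_last:
            # Nothing follows the last line, so trailing spaces are just dropped
            parts.append(line.rstrip(" "))
        elif line.endswith("  "):
            # Two or more trailing spaces: hard line break
            parts.append(line.rstrip(" ") + "<br />\n")
        elif line.endswith("\\"):
            # Trailing backslash: hard line break (the backslash is dropped)
            parts.append(line[:-1] + "<br />\n")
        else:
            # Soft line break: the browser displays it like a space
            parts.append(line.rstrip(" ") + "\n")
    return "<p>" + "".join(parts) + "</p>"


print(render_paragraph(["Roses are red,  ", "violets are blue\\", "and so on"]))
```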

Backslash Escapes

Try typing **some text** in VS Code. Notice that it renders as bold text, without the asterisks, in the preview?

But what if you want the asterisks?

For this you will need to “escape” them with backslashes. Type the following in VS Code:


    \*\*some text\*\*

Notice the difference? This works with any Markdown formatting character.
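If you generate Markdown from a script, the same escaping can be applied automatically. The Python sketch below, escape_markdown (a hypothetical name), puts a backslash before common Markdown punctuation. CommonMark allows any ASCII punctuation character to be backslash-escaped, so escaping a few characters that did not strictly need it is harmless.

```python
# Characters with special meaning in (GitHub-Flavored) Markdown
MARKDOWN_SPECIAL = set("\\`*_{}[]()#+-.!|~<>")


def escape_markdown(text):
    """Backslash-escape Markdown formatting characters so they print literally."""
    return "".join("\\" + char if char in MARKDOWN_SPECIAL else char for char in text)


print(escape_markdown("**some text**"))  # prints \*\*some text\*\*
```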

Special Characters

In HTML, entity representations like &thorn; and &#0254; can be used for special characters (both these examples produce þ).

All valid HTML entities are also valid in Markdown.
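Python’s standard html module resolves these entities, which is a quick way to check what a given entity will produce. Going the other way, to_numeric_entities is a hypothetical convenience function that writes any non-ASCII character as a decimal entity.

```python
import html

# The named and the numeric entity resolve to the same character
print(html.unescape("&thorn;"))  # þ
print(html.unescape("&#0254;"))  # þ


def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal HTML entity."""
    return "".join(char if ord(char) < 128 else f"&#{ord(char)};" for char in text)


print(to_numeric_entities("þorn"))  # &#254;orn
```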

Tables

It is possible to format complex tables in Markdown, but they are challenging to get right. You can use the Markdown Table Generator to help you out. Table-Magic is also useful for converting to and from CSV format.

Large data tables are not recommended for Markdown documents.
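For small tables, the kind of conversion Table-Magic performs is easy to sketch. The Python function below (csv_to_markdown_table, a hypothetical name) reads CSV text with the standard csv module and writes a GitHub-Flavored Markdown pipe table, escaping any | characters inside cells. It assumes the first CSV row is the header.

```python
import csv
import io


def csv_to_markdown_table(csv_text):
    """Convert CSV text (first row = header) to a GFM pipe table."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, body = rows[0], rows[1:]

    def format_row(cells):
        # Escape pipes so they are not read as column separators
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"

    lines = [format_row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(format_row(row) for row in body)
    return "\n".join(lines)


print(csv_to_markdown_table("name,role\nOctocat,mascot\n"))
```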

Code and Code Blocks

Inline code is normally represented by back ticks (`). For instance,


  `let x = 3;`

will display as code, typically in a monospaced font.

If your code includes back ticks, you can enclose it in double back ticks instead.

Code blocks are represented by three back ticks on the line before and after the code snippet. The name of the coding language should follow the first set of back ticks so that the code can be syntax highlighted, e.g.


  ```javascript
  let x = 3;
  ```
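These back-tick rules can be automated. In CommonMark, a code span closes at a back-tick run of the same length as its opening run, and a fenced block closes at a fence at least as long as its opening fence. A safe delimiter is therefore one back tick longer than the longest run inside the code (and at least three for a fenced block). The helpers inline_code and fenced_block are hypothetical Python sketches of that idea.

```python
import re


def inline_code(code):
    """Wrap code in a back-tick run longer than any run it contains."""
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * (longest + 1)
    # A space of padding keeps a leading or trailing back tick from merging
    # with the delimiter; CommonMark strips one such space from each side
    if code.startswith("`") or code.endswith("`"):
        code = f" {code} "
    return f"{fence}{code}{fence}"


def fenced_block(code, language=""):
    """Wrap code in a fence (3+ back ticks) that nothing inside can close."""
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{language}\n{code}\n{fence}"


print(inline_code("let x = 3;"))  # `let x = 3;`
print(inline_code("a `b` c"))     # ``a `b` c``
print(fenced_block("let x = 3;", "javascript"))
```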

Extensions

Some implementations extend the standard. Two useful extensions used on GitHub are “strikethrough” and “task lists”.

Strikethrough is formatted with two tildes on each side: ~~delete this~~ will render as struck-through text.

Task Lists

Task lists are check boxes:


    - [ ] Uncompleted task
    - [x] Completed task

produces

  • ☐ Uncompleted task
  • ☑ Completed task

Task lists are only guaranteed to render on GitHub.
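Because task lists are plain text, they are easy to process with a script, for example to count the open tasks in a README.md. The Python sketch below (parse_task_list is a hypothetical name) uses a regular expression that assumes the common - [ ] / - [x] form shown above; it is not a complete implementation of GitHub’s rules.

```python
import re

# A list marker (-, * or +), a checkbox ([ ], [x] or [X]), then the task text
TASK_ITEM = re.compile(r"^\s*[-*+] \[([ xX])\] (.*)$")


def parse_task_list(markdown):
    """Return (task text, done?) pairs for each task-list item."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_ITEM.match(line)
        if match:
            tasks.append((match.group(2), match.group(1).lower() == "x"))
    return tasks


tasks = parse_task_list("- [ ] Uncompleted task\n- [x] Completed task")
print(sum(1 for _, done in tasks if not done), "open task(s)")
```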

Gotchas and Strategies

Images

It is not possible to control the size or alignment of images with Markdown. Use inline HTML if this is necessary.
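For example, a small Python helper can write the inline HTML for a resized image. The function sized_image and the file name images/octocat.png are hypothetical; the width attribute is standard HTML, and html.escape guards against quotes or ampersands in the path or alt text.

```python
import html


def sized_image(src, alt, width=None):
    """Return an HTML <img> tag, optionally with a width in pixels."""
    attributes = [f'src="{html.escape(src)}"', f'alt="{html.escape(alt)}"']
    if width is not None:
        attributes.append(f'width="{int(width)}"')
    return "<img " + " ".join(attributes) + ">"


# Paste the result into a Markdown file where the image should appear
print(sized_image("images/octocat.png", "Octocat", width=200))
```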

GitHub

What is GitHub?

git + cloud storage + social media = Octocat

What is git?

Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files.

From the git source code:

  
  The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your way):
  
    - random three-letter combination that is pronounceable, and not
      actually used by any common UNIX command. The fact that it is a
      mispronunciation of "get" may or may not be relevant.
    - stupid. contemptible and despicable. simple. Take your pick from
      the dictionary of slang.
    - "global information tracker": you're in a good mood, and it
      actually works for you. Angels sing, and a light suddenly fills the room.
    - "goddamn idiotic truckload of shit": when it breaks

The Basic Idea of git

  • Your local workstation has a folder that contains a clone of a repository (“repo”) on the server.
  • You and your collaborators push new versions of your content from your local repository to the remote one after pulling the latest versions from the remote repository.
  • git checks for conflicts in merging content pushed to the repository.
  • git keeps a complete history of the repo, allowing you to roll it back to an earlier state.

GitHub’s 15-minute tryGit tutorial teaches you git’s command-line language, and it is also a very good introduction to the concepts of git.

GitHub’s Implementation of git

  • The remote repository is stored using GitHub’s cloud-based hosting service. Public repositories (which are free) can be cloned by anyone.
  • GitHub’s website and desktop client allow you to perform functions using git without entering commands in git's arcane language.
  • GitHub’s website allows you to associate social-media-like features with a repository, such as a discussion forum (called “Issues”) and a wiki.

Uses of GitHub

  • In less than ten years, GitHub has become the dominant platform for hosting open source code in many languages.
  • Although primarily used for programming languages, GitHub is increasingly used for document storage, where the writer wishes to take advantage of git's version control features. The Programming Historian has a tutorial on using GitHub for academic projects.
  • GitHub has also implemented a service called GitHub Pages, which allows you to host a website from a GitHub repository. This slideshow is hosted using GitHub Pages.

A Quick Tour of GitHub

A repo is like a file system.


Markdown (and some other) files are rendered as HTML. Click the Raw button to see the actual code.


The raw code can be copied and pasted or saved with your browser’s Save As function.


You can clone or download repos directly from the GitHub website.


Getting Started with GitHub

Concepts

A repository (repo) is stored on both the local machine and the remote GitHub server.

Initially, users clone a repo from GitHub. Thereafter, they pull the latest changes to keep up to date.

Users update the repo through a three-step process:

  1. They add or modify files in their local folder.
  2. They commit their changes, recording a snapshot in the local repo that is ready to send to the remote repo.
  3. They push their commits to GitHub.

The README.md File

A repository typically has a Markdown file called README.md in its root folder. This file describes the content of the repository.

On the GitHub website, the README.md file is automatically rendered on the repo’s web page.

Ways to Interact with GitHub

  • Run git commands on the command line. You can do anything, but the git language is relatively unintuitive.
  • Use the GitHub website. You can perform many, although not all, git functions that modify the remote repo. You cannot push local changes to the server.
  • The GitHub Desktop Client. You can perform many, although not all, git functions, including pulling from and pushing to the remote repo.
  • VS Code. Functions like the GitHub Desktop Client, but you can push and pull from the same environment where you are editing.

Which method do I choose?

Some combination is the most likely scenario. You can make small commits with GitHub’s web editor, but you have to pull the changes to your local repo.

The GitHub Desktop Client is better than VS Code if you are creating or moving around folders, images, PDFs, and so on. VS Code is convenient if you are editing Markdown, HTML, or text files.

Occasionally, you will encounter arcane conflicts that can only be resolved by running git from the command line. It is generally necessary to Google solutions to find the appropriate commands. Installing the GitHub client will automatically install git on your computer.

Setting up the GitHub Desktop Client

  1. Launch the GitHub Desktop Client.
  2. Select File > Options > Accounts and login. On the Mac, this is under GitHub Desktop > Preferences.
  3. Click the Git tab and enter the username and email you used for your GitHub account.
  4. Click the Advanced tab and select VS Code as your external editor.
  5. Create a folder called GitHub inside your Documents folder. In Windows, this will have the local path C:\Users\YourName\Documents\GitHub. On the Mac it will be ~/Documents/GitHub/.

Cloning a Repo

There are two easy ways to clone a repo.

  • With the GitHub Desktop Client open, click the repo’s Clone or download button on the GitHub website and select Open in Desktop.
  • Alternatively, copy the URL shown there. In the GitHub Desktop client, select File > Clone repository > URL. Enter the URL and the local path to your GitHub folder.

Try this with the workshop sandbox repo: https://github.com/whatevery1says/workshop-sandbox.

Committing and Pushing Changes

  • Open the file my_name.md in VS Code.
  • Make the changes indicated in the file and save it as your_name.md, using your own name.
  • Open the GitHub Desktop Client. It should show that you have a new file in the Changes tab. In the Summary section, type a message like “Added your_name.md.” Click Commit to Master.
  • Click the Push Origin tab at the top of the screen. Your file will be pushed to the remote repo.
  • Refresh the webpage on GitHub, and your file should appear.

Pulling Changes

  • In the GitHub Desktop Client, click the Pull Origin tab at the top of the screen to update your local repository folder with the latest changes.
  • Always pull the latest state of the repo before committing your own changes.
  • In VS Code, open the file belonging to the person next to you and add the line “Modified by Your Name”. Save the file.
  • Commit and push your change.

Source Control in VS Code

  • You can work with GitHub directly from VS Code using its Source Control Management features cryptically hidden in View > SCM. You can also click the Source Control icon in VS Code's Activity Bar.

  • VS Code will show a file hierarchy for each file you have open in one of your local repos.

  • Changed files are marked with an “M” (modified). Mouse over the file name and click the Plus icon to stage them for a commit. When all files are staged, type a commit message and click the Check icon.

  • Click the Three Dots icon to see all options, one of which will be Push.

  • You can also pull the latest code from this menu.

Merge Conflicts

Try the exercise in the file resolving-merge-conflicts.md in the workshop sandbox.
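When git cannot merge two versions automatically, it writes both into the file between conflict markers: a line starting with <<<<<<<, a ======= separator, and a line starting with >>>>>>>. A leftover marker is easy to miss in a long document, so a quick check like the hypothetical Python function below can help before you commit. Note that a line of exactly seven = signs is also a valid Markdown heading underline, so treat a hit as a prompt to look, not proof of a conflict.

```python
import re

# Lines git writes around a conflict: <<<<<<< ours, ||||||| base (diff3
# style only), ======= separator, >>>>>>> theirs
CONFLICT_MARKER = re.compile(r"^(<{7}|\|{7}|={7}|>{7})(?: .*)?$")


def find_conflict_markers(text):
    """Return the 1-based line numbers that look like git conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


sample = "Intro\n<<<<<<< HEAD\nMy line\n=======\nTheir line\n>>>>>>> main\nEnd"
print(find_conflict_markers(sample))  # [2, 4, 6]
```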

Branching and Forking

  • A repo can have multiple branches. A branch is a copy of the repo at a particular point in time, which can be developed separately from the repo’s master and later merged with it.
  • Branches are useful when different collaborators are making extensive changes to a repo, but they are also the most common source of merge conflicts.
  • Users can also fork a repo, which clones the repo in the original GitHub account to their own personal GitHub account.
  • Both branches and forks may be merged with the (original repo’s) master branch with a pull request. This asks the repo’s owner to approve the merged content. Pull requests may be created on the GitHub website or in the GitHub Desktop Client.

Useful Resources

We have only scratched the surface. Here are some useful resources:

Pros and Cons of Using GitHub

Pros:

  • Free public repositories (and private ones with educational accounts)
  • Beautiful rendering of Markdown
  • Powerful version control and social media-like features aid collaboration
  • Wide adoption in the Digital Humanities

Cons:

  • Requires an account from a corporate entity
  • More complex than Google Docs, Dropbox, or other collaboration tools

Considerations for the WE1S Project

WE1S produces content in the following locations:

  • Pages on the WE1S website
  • Blog posts on the WE1S website
  • Archived materials in the WE1S GitHub repo

Markdown First Principle

Wherever possible, content intended only for website pages and blog posts should be authored offline in Markdown and pasted into the WordPress text editor (click on the Text tab).

The WE1S website uses the WordPress Jetpack plugin, which automatically converts Markdown to HTML but saves the original Markdown.

Advantages:

  • Encourages simple and consistent formatting across the website.
  • Makes it easy to extract content from WordPress if it is ever needed for presentation on another platform.

Duplicating WordPress Content on GitHub

Some content should be archived on GitHub. The best workflow is to push the content to GitHub, copy the Markdown from GitHub, and then paste it into the WordPress text editor. Updates should be performed on GitHub first and the updated content pasted from there into WordPress.

Non-textual assets (e.g. images) should be archived on GitHub. The URLs used to embed these assets in web pages should point to the files archived on GitHub, not to the WordPress media library, because GitHub is a more stable repository than the WordPress media library.
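Files in a public GitHub repo can be embedded through their raw URLs, which follow the pattern https://raw.githubusercontent.com/owner/repo/branch/path. The Python sketch below builds such a URL; the function name, the default branch name master, and the example path images/logo.png are assumptions to adjust for your own repo.

```python
from urllib.parse import quote


def raw_github_url(owner, repo, path, branch="master"):
    """Build the raw.githubusercontent.com URL for a file in a public repo."""
    # quote() percent-encodes spaces and similar characters but keeps "/"
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{quote(path)}"


# Hypothetical image path in the workshop sandbox repo
print(raw_github_url("whatevery1says", "workshop-sandbox", "images/logo.png"))
```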

Archiving Content on GitHub

In general, documents should be authored in Markdown, if possible, and then converted to other formats (e.g. Word or PDF).

Documents meant to be readable on the GitHub website should be available in Markdown format. Duplicates in other formats can be stored in the same repo.

More complex resources can be stored as a data package with a manifest.

Manifests

A manifest is a document describing the contents of a repository.

The GitHub README.md file is a type of manifest.

Storing Manifests

Manifests are commonly stored in JSON format. JSON stands for “JavaScript Object Notation” because it is based on the way objects are written in that programming language. The basic format is a set of key-value pairs, separated by commas and enclosed in curly brackets:


  {
    "name": "Octocat",
    "role": "mascot",
    "company": "GitHub"
  }

VS Code stores its settings in JSON-formatted manifest files, and you need to edit them to configure VS Code.

JSON is an easy format in which to make mistakes. If errors occur, try pasting your code into the online JSON Linter.
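Python’s standard json module can also act as a simple JSON linter: json.loads raises a JSONDecodeError that records the line and column where parsing failed. The function check_json is a hypothetical wrapper, and the failing example uses a trailing comma, one of the most common JSON mistakes.

```python
import json


def check_json(text):
    """Return None if text is valid JSON, else a message with its location."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None


print(check_json('{"name": "Octocat", "role": "mascot"}'))  # None
print(check_json('{"name": "Octocat", "role": "mascot",}'))
```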

Other Uses of JSON Manifests

  • WE1S uses a special manifest schema for storing information about its project and workflow in its database.
  • WE1S uses Frictionless Data data packages, which have JSON manifests, for storing content on GitHub.

Structure of a Data Package

  • The GitHub repo may contain a standard README.md file.
  • The GitHub repo MUST contain a JSON file called datapackage.json.
  • The datapackage.json file MUST contain name, title, and resources properties.

The name Property

The value of the name property should be a short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters. It will function as a unique identifier and MUST be the same as the name of its containing folder.

The title Property

A title or one-sentence description of the package.

The resources Property

A list of paths to all files associated with the data package. The list is enclosed in square brackets, and each resource is a "path"-value pair enclosed in curly brackets.

Sample Datapackage Manifest


  {
    "name": "how-to-work-with-markdown",
    "title": "How to Work with Markdown",
    "resources": [{
      "path": "how-to-work-with-markdown.md"
    }]
  }
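The rules above are simple enough to check with a short script before pushing a data package. The Python sketch below is not the official Frictionless Data validator; check_datapackage is a hypothetical function that tests only the three required properties and the naming rule described in this workshop. It takes the manifest text and the name of the folder that contains it.

```python
import json
import re


def check_datapackage(manifest_text, folder_name):
    """Return a list of problems with a datapackage.json (empty if none)."""
    try:
        manifest = json.loads(manifest_text)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error.msg}"]
    if not isinstance(manifest, dict):
        return ["the manifest must be a JSON object"]
    problems = [
        f"missing required property: {key}"
        for key in ("name", "title", "resources")
        if key not in manifest
    ]
    name = manifest.get("name")
    if name is not None:
        # Lower-case letters, digits, ".", "_" and "-" only
        if not isinstance(name, str) or not re.fullmatch(r"[a-z0-9._-]+", name):
            problems.append("name must use only lower-case letters, digits, '.', '_' or '-'")
        elif name != folder_name:
            problems.append(f"name {name!r} does not match folder {folder_name!r}")
    resources = manifest.get("resources")
    if resources is not None and not (
        isinstance(resources, list)
        and all(isinstance(item, dict) and "path" in item for item in resources)
    ):
        problems.append("resources must be a list of objects with a path")
    return problems


# Usage, assuming a local folder containing the manifest:
# from pathlib import Path
# text = Path("how-to-work-with-markdown/datapackage.json").read_text()
# print(check_datapackage(text, "how-to-work-with-markdown"))
```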

What to Store as a Data Package

See the WE1S Guidelines For Handling Resources.


The End

Slideshow produced by Scott Kleinman
for the WhatEvery1Says Project.