The Universe of Discourse

Mark Dominus (陶敏修)
mjd@pobox.com

Mon, 01 Jan 2018

Converting Google Docs to Markdown

I was on vacation last week and I didn't bring my computer, which has been a good choice in the past. But I did bring my phone, and I spent some quiet time writing various parts of around 20 blog posts on the phone. I composed these in my phone's Google Docs app, which seemed at the time like a reasonable choice.

But when I got back I found that it wasn't as easy as I had expected to get the documents back out. What I really wanted was Markdown. HTML would have been acceptable, since Blosxom accepts that also. I could download a single document in one of several formats, including HTML and ODF, but I had twenty and didn't want to do them one at a time. Google has a bulk download feature, to download a zip file of an entire folder, but upon unzipping I found that all twenty documents had been converted to Microsoft's docx format and I didn't know a good way to handle these. I could not find an option for a bulk download in any other format.

Several tools will compose in Markdown and then export to Google Docs, but the only option I found for translating from Google Docs to Markdown was Renato Mangini's Google Apps script. I would have had to add the script to each of the 20 files and then run it, and the output arrives by email, so for this task it was even less like what I wanted.

The right answer turned out to be: Accept Google's bulk download of docx files and then use Pandoc to convert the docx to Markdown:

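for i in *.docx; do
    # Prompt for a shorter, Unix-friendly name (typed without the .docx suffix)
    printf '%s ? ' "$i"
    read -r j
    mv -i "$i" "$j.docx"
    # Convert to Markdown, extracting any embedded media under the current directory
    pandoc --extract-media . -t markdown -o "$(suf "$j" mkdn)" "$j.docx"
done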

The read is there because I had given the files Unix-unfriendly names like "Polyominoes as orthogonal polygons.docx" and I wanted to give them shorter names like orthogonal-polyominoes.docx.
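The renaming step could be partly automated. As a minimal sketch, a hypothetical slugify helper (not part of the original loop) could propose a default name by lowercasing the old one and replacing each run of other characters with a hyphen. It cannot invent a hand-picked short name like orthogonal-polyominoes, so the prompt would still earn its keep:

```shell
# Hypothetical helper, for illustration only: turn a title-like name
# into a lowercase, hyphen-separated one.
slugify() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        # Collapse each run of characters other than a-z and 0-9 into one
        # hyphen, then trim a hyphen left at either end.
        sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}
```

With this definition, slugify 'Polyominoes as orthogonal polygons' prints polyominoes-as-orthogonal-polygons.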

The suf command is a little utility that performs the very common task of removing or changing the suffix of a filename. The suf "$j" mkdn command means that if $j is something like foo.docx it should turn into foo.mkdn. Here's the tiny source code:

    #!/usr/bin/perl
    #
    # Usage: suf FILENAME [suffix]
    #
    # If filename ends with a suffix, the suffix is replaced with the given suffix;
    # otherwise, the given suffix is appended
    #
    # For example:
    #   suf foo.bar baz    => foo.baz
    #   suf foo     baz    => foo.baz
    #   suf foo.bar        => foo
    #   suf foo            => foo

    @ARGV == 2 or @ARGV == 1 or usage();
    my ($file, $suf) = @ARGV;
    $file =~ s/\.[^.]*$//;
    if (defined $suf) {
      print "$file.$suf\n";
    } else {
      print "$file\n";
    }

    sub usage {
      print STDERR "Usage: suf filename [newsuffix]\n";
      exit 1;
    }

Often, I feel that I have written too much code, but not this time. Some people might be tempted to add bells and whistles to this: what if the suffix is not delimited by a dot character? What if I only want to change certain suffixes? What if my foot swells up? What if the moon falls out of the sky? Blah blah blah. No, for that we can break out sed.
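For instance, changing only one particular suffix and leaving every other filename alone is a one-line sed substitution. A sketch, where docx_to_mkdn is a made-up name used only for illustration:

```shell
# Hypothetical helper: rewrite the suffix only when it is exactly .docx;
# any other filename passes through unchanged.
docx_to_mkdn() {
    printf '%s\n' "$1" | sed 's/\.docx$/.mkdn/'
}
```

Here docx_to_mkdn notes.docx prints notes.mkdn, while docx_to_mkdn notes.txt prints notes.txt unchanged.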

Next time I go on vacation I will know better and I will not use Google Docs. I don't know yet what I will use instead. StackEdit maybe.

[ Addendum 20180108: Eric Roode pointed out that the program above has a genuine bug: if given a filename like a.b/c.d it truncates the entire b/c.d instead of just the d. The current version fixes this. ]
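One way to keep dots in directory names out of the picture is to split off the last path component and look for a suffix only there. The following is a sketch of that idea as a shell function. It is an assumption about how such a fix might look, not the current version of the Perl program:

```shell
# Sketch of suf as a shell function; illustrative, not the author's code.
# The body runs in a subshell so its variables do not leak into the caller.
suf() (
    file=$1
    base=${file##*/}        # last path component, e.g. c.d
    dir=${file%"$base"}     # everything before it, e.g. a.b/ (may be empty)
    case $base in
        *.*) base=${base%.*} ;;   # drop the final suffix of the basename only
    esac
    if [ "$#" -ge 2 ]; then
        printf '%s\n' "$dir$base.$2"
    else
        printf '%s\n' "$dir$base"
    fi
)
```

With this version suf a.b/c.d prints a.b/c, and suf a.b/c x prints a.b/c.x, because the dot in the directory name is never considered.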
