The Universe of Disco


Mon, 01 Jan 2018

Converting Google Docs to Markdown

I was on vacation last week and I didn't bring my computer, which has been a good choice in the past. But I did bring my phone, and I spent some quiet time writing various parts of around 20 blog posts on the phone. I composed these in my phone's Google Docs app, which seemed at the time like a reasonable choice.

But when I got back I found that it wasn't as easy as I had expected to get the documents back out. What I really wanted was Markdown. HTML would have been acceptable, since Blosxom accepts that also. I could download a single document in one of several formats, including HTML and ODF, but I had twenty and didn't want to do them one at a time. Google has a bulk download feature, to download a zip file of an entire folder, but upon unzipping I found that all twenty documents had been converted to Microsoft's docx format and I didn't know a good way to handle these. I could not find an option for a bulk download in any other format.

Several tools will compose in Markdown and then export to Google docs, but the only option I found for translating from Google docs to Markdown was Renato Mangini's Google Apps script. I would have had to add the script to each of the 20 files, then run it, and the output appears in email, so for this task, it was even less like what I wanted.

The right answer turned out to be: Accept Google's bulk download of docx files and then use Pandoc to convert the docx to Markdown:

for i in *.docx; do
    echo -n "$i ? ";
    read j; mv -i "$i" $j.docx;
    pandoc --extract-media . -t markdown -o "$(suf "$j" mkdn)" "$j.docx";
done

The read is because I had given the files Unix-unfriendly names like Polyominoes as orthogonal polygons.docx and I wanted to give them shorter names like orthogonal-polyominoes.docx.

The suf command is a little utility that performs the very common task of removing or changing the suffix of a filename. The suf "$j" mkdn command means that if $j is something like foo.docx it should turn into foo.mkdn. Here's the tiny source code:

    #!/usr/bin/perl
    #
    # Usage: suf FILENAME [suffix]
    #
    # If filename ends with a suffix, the suffix is replaced with the given suffix
    # otheriswe, the given suffix is appended
    #
    # For example:
    #   suf foo.bar baz    => foo.baz
    #   suf foo     baz    => foo.baz
    #   suf foo.bar        => foo
    #   suf foo            => foo

    @ARGV == 2 or @ARGV == 1 or usage();
    my ($file, $suf) = @ARGV;
    $file =~ s/\.[^.]*$//;
    if (defined $suf) {
      print "$file.$suf\n";
    } else {
      print "$file\n";
    }

    sub usage {
      print STDERR "Usage: suf filename [newsuffix]\n";
      exit 1;
    }

Often, I feel that I have written too much code, but not this time. Some people might be tempted to add bells and whistles to this: what if the suffix is not delimited by a dot character? What if I only want to change certain suffixes? What if my foot swells up? What if the moon falls out of the sky? Blah blah blah. No, for that we can break out sed.

Next time I go on vacation I will know better and I will not use Google Docs. I don't know yet what instead. StackEdit maybe.

[ Addendum 20180108: Eric Roode pointed out that the program above has a genuine bug: if given a filename like a.b/c.d it truncates the entire b/c.d instead of just the d. The current version fixes this. ]


[Other articles in category /Unix] permanent link