The Universe of Discourse

Sat, 28 Jul 2007

Lightweight Database Strategies for Perl
Several years ago I got what I thought was a great idea for a three-hour conference tutorial: lightweight data storage techniques. When you don't have enough data to be bothered using a high-performance database, or when your data is simple enough that you don't want to bother with a relational database, you stick it in a flat file and hack up some file code to read it. This is the sort of thing that people do all the time in Perl, and I thought it would be a big seller. I was wrong.

I don't know why. I tried giving the class a snappier title, but that didn't help. I'm really bad at titles. Maybe people are embarrassed to think about all the lightweight data storage hackery they do in Perl, and feel that they "should" be using a relational database, and don't want to commit more resources to lightweight database techniques. Or maybe they just don't think there is very much to know about it.

But there is a lot to know; with a little bit of technique you can postpone the day when you need to go to an RDB, often for quite a long time, and often forever. Many of the techniques fall into the why-didn't-I-think-of-that category, stuff that isn't too weird to write or maintain, but that you might not have thought to try.

I think it's a good class, but since it never sold well, I've decided it would do more good (for me and for everyone else) if I just gave away the materials for free.

Table of Contents

The class is in three sections. The first section is about using plain text files and talks about a bunch of useful techniques, such as how to do binary search on sorted text files (this is nontrivial) and how to replace records in place, even when the new record might not fit in the space the old one occupied.
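To give a flavor of why binary search on a text file is nontrivial: seeking to the middle of the file usually lands in the middle of a line, so the search has to skip ahead to the next line boundary before it can compare anything. Here is a minimal sketch of one way to handle that. It is not taken from the class materials; the names first_line_ge and line_at_or_after are made up for illustration, and it assumes the file is sorted in plain string order with newline-terminated records.

```perl
use strict;
use warnings;

# Return the first complete line that starts at byte offset >= $pos,
# chomped, or undef if there is no such line.
sub line_at_or_after {
    my ($fh, $pos) = @_;
    if ($pos > 0) {
        # Back up one byte and throw away everything through the next
        # newline. If byte $pos-1 is itself a newline, the line that
        # starts exactly at $pos is kept.
        seek $fh, $pos - 1, 0 or die "seek failed: $!";
        my $discard = <$fh>;
    } else {
        seek $fh, 0, 0 or die "seek failed: $!";
    }
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Binary search over byte offsets. Returns the first line that is
# stringwise >= $key, or undef if every line sorts before $key.
sub first_line_ge {
    my ($fh, $key) = @_;
    my $lo = 0;
    my $hi = (-s $fh) || 0;      # file size in bytes
    while ($lo < $hi) {
        my $mid  = int(($lo + $hi) / 2);
        my $line = line_at_or_after($fh, $mid);
        if (!defined $line or $line ge $key) {
            $hi = $mid;          # answer starts at or before this line
        } else {
            $lo = $mid + 1;      # answer is strictly later in the file
        }
    }
    return line_at_or_after($fh, $lo);
}
```

Called as first_line_ge($fh, "cherry") on an open, seekable filehandle, this returns the first line at or after "cherry" in sort order; the caller still has to check whether that line actually matches the key it wanted.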

The second section is about the Tie::File module, which associates a flat text file with a Perl array.
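As a small illustration of what that association buys you, here is a sketch of deleting a user's record from a passwd-style file by treating the file as an array. The sub name delete_user echoes the outline below, but the body is a guess at the general idea rather than the code from the class, and the "name:rest" record layout is an assumption.

```perl
use strict;
use warnings;
use Tie::File;

# Remove every line of $file whose first colon-separated field is $user.
sub delete_user {
    my ($file, $user) = @_;
    tie my @lines, 'Tie::File', $file
        or die "Cannot tie $file: $!";
    # Walk backwards so that splicing out a record does not shift
    # the indices of records not yet examined.
    for my $i (reverse 0 .. $#lines) {
        splice @lines, $i, 1 if $lines[$i] =~ /^\Q$user\E:/;
    }
    untie @lines;    # flush changes and release the file
    return;
}
```

Each element of the tied array is one line of the file with its newline removed, and changes to the array are written back to the file.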

The third section is about DBM files, with a comparison of the five major implementations. It finishes up with a discussion of some of Berkeley DB's lesser-known useful features, such as its DB_BTREE file type, which offers fast access like a hash but keeps the records in sorted order.
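For example, a hash tied to a DB_BTREE file through the DB_File module hands its keys back in sorted order no matter what order they were stored in. The sketch below assumes DB_File is installed (it needs Berkeley DB) and uses the default lexical key ordering; the sub name btree_keys is made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # exports $DB_BTREE

# Store the given key/value pairs in a B-tree file, then return the
# keys in the order the database iterates over them.
sub btree_keys {
    my ($file, %records) = @_;
    tie my %db, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Cannot open $file: $!";
    $db{$_} = $records{$_} for keys %records;
    my @keys = keys %db;    # B-tree order, not insertion or hash order
    untie %db;
    return @keys;
}
```

With an ordinary Perl hash, or a DB_HASH file, you would have to sort the keys yourself each time.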

  • Text Files
    • Rotating log file; deleting a user
    • Copy the File
      • -i.bak
      • Using -i inside a program
      • Problems with -i
      • Atomicity issues
    • Essential problem with files; fundamental operations; seeking
    • Sorted files
    • In-place modification of records
      • Overwriting records
      • Bytes vs. positions
      • Gappy Files
      • Fixed-length records
      • Numeric indices
      • Case study: lastlog
    • Indexing
      • Void fields
      • Generic text indices
      • Packed offsets
  • Tie::File
    • Tie::File Examples
    • delete_user revisited
    • uppercase_username revisited
    • Rotating log file revisited
    • Most important thing to know about Tie::File
    • Indexing with Tie::File
    • Tie::File Internals
      • Caching
      • Record modification
      • Immediate vs. Deferred Writing
      • Autodeferring
    • Miscellaneous Features
  • DBM
    • Common DBM Implementations
    • What DBM Does
    • Small DBMs: ODBM, NDBM, and SDBM
    • GDBM
    • DB_File
      • Indexing revisited
      • Ordered hashes
      • Partial matching
      • Sequential access
      • Multiple values
      • Filters
      • BerkeleyDB

Online materials
