The Universe of Discourse

Mark Dominus (陶敏修)
mjd@pobox.com

Sat, 28 Jul 2007

Lightweight Database Strategies for Perl
Several years ago I got what I thought was a great idea for a three-hour conference tutorial: lightweight data storage techniques. When you don't have enough data to be bothered using a high-performance database, or when your data is simple enough that you don't want to bother with a relational database, you stick it in a flat file and hack up some file code to read it. This is the sort of thing that people do all the time in Perl, and I thought it would be a big seller. I was wrong.

I don't know why. I tried giving the class a snappier title, but that didn't help. I'm really bad at titles. Maybe people are embarrassed to think about all the lightweight data storage hackery they do in Perl, and feel that they "should" be using a relational database, and don't want to commit more resources to lightweight database techniques. Or maybe they just don't think there is very much to know about it.

But there is a lot to know; with a little bit of technique you can postpone the day when you need to go to an RDB, often for quite a long time, and often forever. Many of the techniques fall into the why-didn't-I-think-of-that category, stuff that isn't too weird to write or maintain, but that you might not have thought to try.

I think it's a good class, but since it never sold well, I've decided it would do more good (for me and for everyone else) if I just gave away the materials for free.

The class is in three sections. The first section is about using plain text files and covers a number of useful techniques, such as how to do binary search on sorted text files (this is nontrivial) and how to replace records in place, even when the new record might not fit in the space occupied by the old one.
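To give a flavor of why binary search on a text file is nontrivial, here is a minimal sketch. It is not taken from the class materials, and the function names `first_line_ge` and `line_at_or_after` are made up for illustration. The difficulty is that seeking to the middle of the file usually lands in the middle of a line, so the partial line has to be discarded before anything can be compared. The sketch assumes the lines are sorted in plain bytewise string order and that the whole line is the key.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first complete line that starts at or
# after byte offset $pos, chomped, or undef if there is no such line.
sub line_at_or_after {
    my ($fh, $pos) = @_;
    if ($pos > 0) {
        # Back up one byte and throw away everything through the next
        # newline.  If $pos was already the start of a line, this only
        # discards the newline that ends the previous line.
        seek $fh, $pos - 1, 0 or die "seek failed: $!";
        my $partial = <$fh>;
    }
    else {
        seek $fh, 0, 0 or die "seek failed: $!";
    }
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Hypothetical helper: return the first line that is ge $key, or undef
# if every line in the file sorts before $key.
sub first_line_ge {
    my ($fh, $key) = @_;
    my ($lo, $hi) = (0, -s $fh);    # search over byte offsets, not line numbers
    while ($lo < $hi) {
        my $mid  = int(($lo + $hi) / 2);
        my $line = line_at_or_after($fh, $mid);
        if (defined $line && $line lt $key) {
            $lo = $mid + 1;         # the answer starts after offset $mid
        }
        else {
            $hi = $mid;             # the answer starts at or before this line
        }
    }
    return line_at_or_after($fh, $lo);
}
```

Because the search runs over byte offsets, it reads only about log2(file size) lines, no matter how many lines the file has. For records with several fields, the comparison would need to extract the key field from each line first.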

The second section is about the Tie::File module, which associates a flat text file with a Perl array.
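As a small illustration of what that association buys you, here is a sketch of deleting a user from a colon-separated, passwd-style file. The file layout and this particular implementation of `delete_user` are assumptions for the example, not the version from the slides. Each array element is one line of the file, without its trailing newline, and changes to the array are written through to the file.

```perl
use strict;
use warnings;
use Tie::File;

# Illustrative only: remove every record whose first colon-separated
# field equals $user.  Returns the number of records removed.
sub delete_user {
    my ($file, $user) = @_;
    tie my @lines, 'Tie::File', $file
        or die "Cannot tie $file: $!";

    my $deleted = 0;
    # Walk backward so that removing a line does not shift the indices
    # of the lines still to be examined.
    for my $i (reverse 0 .. $#lines) {
        if ($lines[$i] =~ /^\Q$user\E:/) {
            splice @lines, $i, 1;    # removes the line from the file
            $deleted++;
        }
    }

    untie @lines;
    return $deleted;
}
```

There is no explicit open, read, rewrite, or rename step here; the ordinary array operations do the file manipulation.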

The third section is about DBM files, with a comparison of the five major implementations. It finishes up with a discussion of some of Berkeley DB's lesser-known useful features, such as its DB_BTREE file type, which offers fast access like a hash but keeps the records in sorted order.
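The sorted-order property is easy to see with the DB_File module. The following is a minimal sketch, assuming DB_File is installed and using the default B-tree comparison, which orders keys as byte strings; the function name `store_and_list_keys` is made up for the example.

```perl
use strict;
use warnings;
use Fcntl;      # for O_RDWR and O_CREAT
use DB_File;    # exports $DB_BTREE

# Illustrative only: store some records in a B-tree file and return
# the keys in the order the database hands them back.
sub store_and_list_keys {
    my ($file, %records) = @_;
    tie my %db, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_BTREE
        or die "Cannot open $file: $!";

    $db{$_} = $records{$_} for keys %records;

    # With $DB_BTREE, keys() walks the tree in sorted order.  With the
    # default $DB_HASH type the order would be arbitrary, as it is for
    # an ordinary Perl hash.
    my @keys = keys %db;

    untie %db;
    return @keys;
}
```

Lookups by key still work exactly as they do for a hash, so the sorted traversal comes on top of the usual keyed access.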

Text Files
- Rotating log file; deleting a user
- Copy the File
  - -i.bak
  - Using -i inside a program
  - Problems with -i
  - Atomicity issues
- Essential problem with files; fundamental operations; seeking
- Sorted files
- In-place modification of records
  - Overwriting records
  - Bytes vs. positions
  - Gappy Files
  - Fixed-length records
  - Numeric indices
  - Case study: lastlog
- Indexing
  - Void fields
  - Generic text indices
  - Packed offsets
Tie::File
- Tie::File Examples
- delete_user revisited
- uppercase_username revisited
- Rotating log file revisited
- Most important thing to know about Tie::File
- Indexing with Tie::File
- Tie::File Internals
  - Caching
  - Record modification
  - Immediate vs. Deferred Writing
  - Autodeferring
- Miscellaneous Features
DBM
- Common DBM Implementations
- What DBM Does
- Small DBMs: ODBM, NDBM, and SDBM
- GDBM
- DB_File
  - Indexing revisited
  - Ordered hashes
  - Partial matching
  - Sequential access
  - Multiple values
  - Filters
  - BerkeleyDB

Online materials

Class slides:

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
- PDF files:
- Browse slides online
Sample source code referred to in the class:

Example source code from the Lightweight Databases class is licensed under a Creative Commons Public Domain License.
- Browse the directory
- TGZ file
People sometimes ask what use Tie::File is when Berkeley DB has a DB_RECNO option that appears to be the same thing. This document explains why.
