txt2pml.py
I found a heap of Palm TEXt REAd documents, but I don’t like the way they appear as (Doc) in PalmReader. So I wrote a script to convert them to PML, the format used by DropBook to make PNRd PPrs documents.
Basically, I found that on OS X the translation of txt2pdbdoc -d (decode back into text) wasn’t so good; a heap of characters needed to be changed.
I did have them all listed here, but if I edit them, ecto fucks up the encoding!
I also changed the === lines to an 80% horizontal line.
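A minimal sketch of that separator step, not the script's actual code. It assumes a separator is a line made up only of three or more = characters, and that PML's horizontal-rule tag is \w="80%"; the function name convert_separators is made up for illustration.

```python
import re

# Assumption: a separator line contains nothing but three or more '=' signs.
SEPARATOR = re.compile(r"^\s*={3,}\s*$")


def convert_separators(lines):
    """Replace '===' separator lines with an 80%-wide PML horizontal rule."""
    # Assumption: \w="80%" is the PML tag for an 80% horizontal rule.
    return ['\\w="80%"' if SEPARATOR.match(line) else line for line in lines]
```

Lines that merely contain === somewhere in the middle are left alone, since the pattern is anchored at both ends.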
The [[[ - ]]] blocks are indented, and a footer line [* <Text>] is indented as well.
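Roughly, the indenting could look like the sketch below. It assumes [[[ opens and ]]] closes a block, that a footer line has the form [* some text] on a line of its own, and that \t is the PML tag pair for indented text; indent_blocks is a hypothetical name.

```python
import re

# Assumption: a footer is a whole line of the form "[* some text]".
FOOTER = re.compile(r"^\[\*\s*(.*?)\][ \t]*$", re.MULTILINE)


def indent_blocks(text):
    """Wrap [[[ ... ]]] blocks and [* ... ] footer lines in PML indent tags."""
    # Assumption: \t ... \t marks an indented span in PML.
    text = FOOTER.sub(lambda m: "\\t" + m.group(1) + "\\t", text)
    return text.replace("[[[", "\\t").replace("]]]", "\\t")
```

The footer pattern runs first so that the plain string replacement for [[[ and ]]] cannot interfere with it.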
I assumed the only use of a / was for italics, and _ for underlining.
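A sketch of the italics and underline conversion under those assumptions. It further assumes the markers come in pairs on a single line, and that \i and \u are the PML italic and underline tags; convert_emphasis is a made-up name.

```python
import re


def convert_emphasis(text):
    """Turn /italic/ and _underlined_ spans into PML \\i and \\u pairs."""
    # Assumption: markers are paired and never span a line break.
    text = re.sub(r"/([^/\n]+)/", lambda m: "\\i" + m.group(1) + "\\i", text)
    text = re.sub(r"_([^_\n]+)_", lambda m: "\\u" + m.group(1) + "\\u", text)
    return text
```

A lone / or _ with no partner on the same line is left untouched, which is one way to limit the damage when the assumption is wrong.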
I also assume the first non-empty, non-=== line is the Title, and the Author line starts with By. I use this info to create a ‘Title Page’.
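The title and author detection might be sketched like this. It assumes the text is already split into lines, that the author line starts with "By " followed by the name, and that the author comes after the title; find_title_and_author is a hypothetical name.

```python
import re

# Assumption: a separator line contains nothing but three or more '=' signs.
SEPARATOR = re.compile(r"^\s*={3,}\s*$")


def find_title_and_author(lines):
    """Return (title, author); either may be None if it is not found."""
    title = None
    author = None
    for line in lines:
        stripped = line.strip()
        if title is None:
            # The first non-empty, non-separator line is taken as the title.
            if stripped and not SEPARATOR.match(stripped):
                title = stripped
            continue
        # Assumption: the author line looks like "By Some Name".
        if stripped.startswith("By "):
            author = stripped[3:].strip()
            break
    return title, author
```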
The tricky bit was getting the Chapter Headings sorted: I needed to break the text into a list of strings and scan through it. This slows the script down a lot, but it still works okay. I might profile it a bit and see where the slowdown is.
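For illustration only, a line-by-line scan for chapter headings could look like the sketch below. How the real script recognises a heading isn't described here, so this assumes a heading is any line starting with the word "Chapter", and that \x is the PML chapter tag pair; mark_chapters is a made-up name.

```python
import re

# Assumption: a chapter heading is a line that starts with "Chapter <something>".
CHAPTER = re.compile(r"^\s*(Chapter\s+\S+.*)$", re.IGNORECASE)


def mark_chapters(text):
    """Wrap chapter heading lines in PML \\x tags, scanning one line at a time."""
    out = []
    for line in text.split("\n"):
        m = CHAPTER.match(line)
        if m:
            # Assumption: \x ... \x marks a chapter heading in PML.
            out.append("\\x" + m.group(1).strip() + "\\x")
        else:
            out.append(line)
    # Collect into a list and join once, rather than growing a string per line.
    return "\n".join(out)
```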
Anyway, here’s the latest version I’ve uploaded: txt2pml.py
I plan to make a version to process Project Gutenberg texts, but that’s on the back burner.