PageToScreen

Text Processing (removing unwanted paragraph breaks)

Note: You will need the QuickTime plugin for your web browser to watch the screencast. It is available for free from the Apple web site.

Thursday, February 14, 2008

If we source a public domain text on the web, such as those from Project Gutenberg, we may well find that each line of text has a fixed length. This is because the text was probably scanned in and then converted to real text through OCR software, so it is laid out exactly as it was in the original source, with a line break at the end of every printed line. This facsimile layout is not appropriate for our needs: we want each paragraph to be a single run of text, without the unwanted breaks inside it.
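For readers who prefer to script the cleanup, here is a minimal sketch in Python. It is not the method shown in the screencast. It assumes that paragraphs in the source text are separated by at least one blank line and that every other line break is unwanted. The function name `unwrap_paragraphs` and the sample text are made up for illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph sits on one line.

    Assumption: a blank (or whitespace-only) line marks a paragraph break;
    any other line break is treated as an unwanted wrap.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on runs of one or more blank lines to get the paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace
    # (including the unwanted line breaks) into a single space.
    unwrapped = [" ".join(paragraph.split()) for paragraph in paragraphs]
    # Put one blank line back between paragraphs.
    return "\n\n".join(unwrapped)


# Hypothetical sample: two hard-wrapped paragraphs.
sample = (
    "This is the first line of a paragraph\n"
    "that was wrapped at a fixed width\n"
    "by the original page layout.\n"
    "\n"
    "This is a second paragraph,\n"
    "also wrapped.\n"
)
print(unwrap_paragraphs(sample))
```

Under these assumptions, text whose line breaks are meaningful, such as verse, tables, or a title page, would be flattened too, so those parts need to be checked by hand or excluded before running the function.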

Watch the screencast | Size: 11 MB

This work is licensed under a Creative Commons Attribution License.
