The TextReader class

class ferenda.TextReader(filename=None, encoding=None, string=None, linesep=None)[source]

A file-like class for reading (not writing) text files by line, paragraph, page or any other user-defined unit of text, with support for peeking ahead and looking backwards. It can read byte streams in different encodings, but converts everything to real strings (unicode in Python 2). Alternatively, it can be initialized from an existing string.

Parameters:
  • filename (str) – The file to read
  • encoding (str) – The encoding used by the file (default ascii)
  • string (str) – Alternatively, a string used for initialization
  • linesep (str) – The line separators used in the file/string
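
To illustrate why the encoding parameter matters, here is a minimal sketch using only the standard library. It does not import ferenda; the sample text and variable names are made up for illustration.

```python
# Bytes as they might be stored on disk in a legacy single-byte encoding.
raw = "Räksmörgås\n".encode("iso-8859-1")

# Decoding with the matching encoding yields a real (unicode) string.
text = raw.decode("iso-8859-1")

# The same text takes more bytes in UTF-8, because å, ä and ö need two each.
utf8_length = len(text.encode("utf-8"))

# Decoding with the wrong codec fails here (other codecs may garble silently).
try:
    raw.decode("ascii")
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
```

A file written in anything other than plain ASCII therefore needs its encoding passed explicitly.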

UNIX = '\n'

Unix line endings, for use with the linesep parameter.

DOS = '\r\n'

Dos/Windows line endings, for use with the linesep parameter.

MAC = '\r'

Old-style Mac line endings, for use with the linesep parameter.
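
To make the three conventions concrete, the sketch below redefines the constants locally (it does not import ferenda) and uses a hypothetical helper, split_lines, to show what happens when the separator does or does not match the text.

```python
# Stand-alone sketch; the values mirror TextReader.UNIX, .DOS and .MAC.
UNIX = "\n"
DOS = "\r\n"
MAC = "\r"

def split_lines(text, linesep):
    """Hypothetical helper: split text on one explicit line separator."""
    return text.split(linesep)

lines = ["first", "second", "third"]

# Splitting with the matching separator recovers the original lines.
recovered = [split_lines(sep.join(lines), sep) for sep in (UNIX, DOS, MAC)]

# Splitting DOS text with the UNIX separator leaves stray "\r" characters.
mismatched = split_lines(DOS.join(lines), UNIX)
```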

eof()[source]

Returns True iff current seek position is at end of file.

bof()[source]

Returns True iff current seek position is at beginning of file.

cue(string)[source]

Set seek position at the beginning of string, starting at current seek position. Raises IOError if string not found.

cuepast(string)[source]

Set seek position just past the end of string, starting the search at current seek position. Raises IOError if string not found.
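
The difference between cue() and cuepast() comes down to where the position lands relative to the match. The following is a minimal sketch on a plain string, using stand-in functions rather than ferenda's own code, and assuming (from the method name) that cuepast() stops just after the match.

```python
def cue(data, pos, string):
    """Stand-in: return the index where string starts, searching from pos."""
    idx = data.find(string, pos)
    if idx == -1:
        raise IOError("Could not find %r" % string)
    return idx

def cuepast(data, pos, string):
    """Stand-in: return the index just after string, searching from pos."""
    return cue(data, pos, string) + len(string)

data = "Header: value\nBody text"
at_match = cue(data, 0, ": ")         # index of the ":" itself
after_match = cuepast(data, 0, ": ")  # index of the "v" in "value"

rest_from_cue = data[at_match:at_match + 7]
rest_from_cuepast = data[after_match:after_match + 5]
```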

readto(string)[source]

Read and return all text between current seek position and string. Sets new seek position at the start of string. Raises IOError if string not found.
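
As a rough illustration of that contract, this stand-in function (not ferenda's implementation) returns both the text read and the new position, so the effect on the position is visible.

```python
def readto(data, pos, string):
    """Stand-in: return (text before string, new position at start of string)."""
    idx = data.find(string, pos)
    if idx == -1:
        raise IOError("Could not find %r" % string)
    return data[pos:idx], idx

data = "key=value;next=item"
text, pos = readto(data, 0, "=")
# text is what lies before the match; pos points at the match itself.
char_at_pos = data[pos]
```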

readparagraph()[source]

Reads and returns the next paragraph (all text up to two or more consecutive line separators).
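
The paragraph rule can be sketched with a regular expression. This is a stand-in under two assumptions that the text above does not spell out: the separators are consumed rather than returned, and the last paragraph runs to the end of the text.

```python
import re

def read_paragraph(data, pos, linesep="\n"):
    """Stand-in: return (paragraph, new position), starting at pos."""
    # Two or more consecutive line separators end a paragraph.
    boundary = re.compile("(?:%s){2,}" % re.escape(linesep))
    match = boundary.search(data, pos)
    if match is None:
        return data[pos:], len(data)  # last paragraph: read to end of text
    return data[pos:match.start()], match.end()

data = "First paragraph,\nsecond line.\n\n\nSecond paragraph."
para1, pos = read_paragraph(data, 0)
para2, pos = read_paragraph(data, pos)
```

Note that the single line break inside the first paragraph does not end it; only the run of three separators does.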

readpage()[source]

Reads and returns the next page (all text up to next form feed, "\f").

readchunk(delimiter)[source]

Reads and returns the next chunk of text up to delimiter.
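
Reading by page is a special case of reading by chunk, with the form feed as delimiter. The sketch below shows that relationship with stand-in functions; it assumes the delimiter is skipped rather than returned, and is not ferenda's own code.

```python
def read_chunk(data, pos, delimiter):
    """Stand-in: return (chunk, new position just past the delimiter)."""
    idx = data.find(delimiter, pos)
    if idx == -1:
        return data[pos:], len(data)  # no delimiter left: read to the end
    return data[pos:idx], idx + len(delimiter)

def read_page(data, pos):
    """Stand-in: a page is a chunk delimited by a form feed."""
    return read_chunk(data, pos, "\f")

data = "page one\fpage two\fpage three"
page1, pos = read_page(data, 0)
page2, pos = read_page(data, pos)
page3, pos = read_page(data, pos)
```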

lastread()[source]

Returns the last chunk of data that was actually read (i.e. the peek* and prev* methods do not affect this)

peek(size=0)[source]

Works like read(), but does not affect current seek position.

peekline(times=1)[source]

Works like readline(), but does not affect current seek position. If times is specified, peeks that many lines ahead.

peekparagraph(times=1)[source]

Works like readparagraph(), but does not affect current seek position. If times is specified, peeks that many paragraphs ahead.

peekchunk(delimiter, times=1)[source]

Works like readchunk(), but does not affect current seek position. If times is specified, peeks that many chunks ahead.
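
What the peek* methods have in common is that they work on a copy of the position. A minimal sketch of the times parameter, using a stand-in function on a plain string (the name peek_chunk and its signature are illustrative only):

```python
def peek_chunk(data, pos, delimiter, times=1):
    """Stand-in: return the chunk `times` chunks ahead; pos is left untouched."""
    cursor = pos  # local copy, so the caller's position never changes
    chunk = ""
    for _ in range(times):
        idx = data.find(delimiter, cursor)
        if idx == -1:
            chunk, cursor = data[cursor:], len(data)
        else:
            chunk, cursor = data[cursor:idx], idx + len(delimiter)
    return chunk

data = "alpha|beta|gamma"
pos = 0
first = peek_chunk(data, pos, "|")            # the next chunk
second = peek_chunk(data, pos, "|", times=2)  # the chunk after that
third = peek_chunk(data, pos, "|", times=3)
```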

prev(size=0)[source]

Works like read(), but reads backwards from current seek position, and does not affect it.

prevline(times=1)[source]

Works like readline(), but reads backwards from current seek position, and does not affect it. If times is specified, reads the line that many times back.

prevparagraph(times=1)[source]

Works like readparagraph(), but reads backwards from current seek position, and does not affect it. If times is specified, reads the paragraph that many times back.

prevchunk(delimiter, times=1)[source]

Works like readchunk(), but reads backwards from current seek position, and does not affect it. If times is specified, reads the chunk that many times back.
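
Looking backwards can be sketched the same way with a reverse search. This stand-in rests on an assumption not stated above: when the position sits directly after a delimiter, that delimiter is skipped before searching for the previous one.

```python
def prev_chunk(data, pos, delimiter, times=1):
    """Stand-in: return the chunk `times` chunks behind pos; pos is untouched."""
    cursor = pos
    chunk = ""
    for _ in range(times):
        # Skip the delimiter that sits directly before the cursor, if any.
        if data.endswith(delimiter, 0, cursor):
            cursor -= len(delimiter)
        idx = data.rfind(delimiter, 0, cursor)
        start = 0 if idx == -1 else idx + len(delimiter)
        chunk, cursor = data[start:cursor], start
    return chunk

data = "alpha|beta|gamma"
pos = data.index("gamma")  # pretend two chunks have already been read
one_back = prev_chunk(data, pos, "|")
two_back = prev_chunk(data, pos, "|", times=2)
```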

getreader(callableObj, *args, **kwargs)[source]

Enables you to treat the result of any single read*, peek* or prev* methods as a new TextReader. Particularly useful to process individual pages in page-oriented documents:

filereader = TextReader("rfc822.txt")
firstpagereader = filereader.getreader(filereader.readpage)
# firstpagereader is now a standalone TextReader that only
# contains the first page of text from rfc822.txt
filereader.seek(0) # reset current seek position
page5reader = filereader.getreader(filereader.peekchunk, "\f", times=5)
# page5reader now contains the 5th page of text from rfc822.txt
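
The idea behind getreader() is simply to call one read function and wrap its result in a fresh reader. A rough sketch, using io.StringIO as a stand-in for the new reader and a hypothetical helper, nth_page, as the read function:

```python
import io

def nth_page(text, n):
    """Hypothetical helper: return page n (1-based) of form-feed-separated text."""
    return text.split("\f")[n - 1]

def getreader(callable_obj, *args, **kwargs):
    """Stand-in: call the read function once and wrap its result in a reader."""
    return io.StringIO(callable_obj(*args, **kwargs))

document = "intro\nline\fsecond page\fthird page"
page_reader = getreader(nth_page, document, 2)
page_text = page_reader.read()  # the new reader only sees page 2
```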

getiterator(callableObj, *args, **kwargs)[source]

Returns an iterator that calls callableObj with the given arguments once per iteration and yields each result:

filereader = TextReader("dashed.txt")
# dashed.txt contains paragraphs separated by "----"
for para in filereader.getiterator(filereader.readchunk, "----"):
    print(para)
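
The iteration pattern can be tried without ferenda installed. The class below, MiniReader, is a hypothetical stand-in with just enough state for the example; it assumes iteration stops once the reader reports end of file.

```python
class MiniReader:
    """Hypothetical stand-in; not ferenda's TextReader."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def eof(self):
        return self.pos >= len(self.data)

    def readchunk(self, delimiter):
        idx = self.data.find(delimiter, self.pos)
        end = len(self.data) if idx == -1 else idx
        chunk = self.data[self.pos:end]
        # Move past the delimiter, or to the end if none is left.
        self.pos = len(self.data) if idx == -1 else idx + len(delimiter)
        return chunk

    def getiterator(self, callable_obj, *args, **kwargs):
        # Keep calling the read method until the reader reports end of file.
        while not self.eof():
            yield callable_obj(*args, **kwargs)

reader = MiniReader("one\n----\ntwo\n----\nthree")
paragraphs = [p.strip() for p in reader.getiterator(reader.readchunk, "----")]
```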

flush()[source]

See io.IOBase.flush(). This is a no-op.

read(size=0)[source]

See io.TextIOBase.read().

readline(size=None)[source]

See io.TextIOBase.readline().

Note

The size parameter is not supported.

seek(offset, whence=0)[source]

See io.TextIOBase.seek().

Note

The whence parameter is not supported.

tell()[source]

See io.TextIOBase.tell().
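
Since seek() and tell() defer to io.TextIOBase, the standard library's io.StringIO can illustrate the expected behaviour. This sketch uses StringIO as a stand-in and does not exercise TextReader itself.

```python
import io

buf = io.StringIO("abcdef")
first = buf.read(3)    # advances the position by three characters
position = buf.tell()  # current position within the text
buf.seek(0)            # whence defaults to 0: offset from the start
again = buf.read(2)
```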

write()[source]

See io.TextIOBase.write().

Note

Always raises IOError, as TextReader is a read-only object.

writelines()[source]

See io.IOBase.writelines().

Note

Always raises IOError, as TextReader is a read-only object.

next()

Backwards-compatibility alias for iterating over a file in Python 2. Use getiterator() to iterate over anything other than lines (e.g. paragraphs or pages).