The TextReader class

class ferenda.TextReader(filename=None, encoding=None, string=None, linesep=None)

A file-like class for reading (not writing) text files by line, paragraph, page or any other user-defined unit of text, with support for peeking ahead and looking backwards. It can read files stored as byte streams in different encodings, but converts everything to real strings (unicode in Python 2). Alternatively, it can be initialized from an existing string.

Parameters:
  • filename (str) – The file to read
  • encoding (str) – The encoding used by the file (default ascii)
  • string (str) – Alternatively, a string used for initialization
  • linesep (str) – The line separators used in the file/string

UNIX = '\n'

Unix line endings, for use with the linesep parameter.

DOS = '\r\n'

DOS/Windows line endings, for use with the linesep parameter.

MAC = '\r'

Old-style Mac line endings, for use with the linesep parameter.

eof()

Returns True iff current seek position is at end of file.

bof()

Returns True iff current seek position is at the beginning of file.

cue(string)

Set seek position at the beginning of string, starting at current seek position. Raises IOError if string not found.

cuepast(string)

Set seek position just past the end of string, starting the search at current seek position. Raises IOError if string not found.

readto(string)

Read and return all text between current seek position and string. Sets new seek position at the start of string. Raises IOError if string not found.
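
The interplay between cue(), cuepast() and readto() can be sketched on a plain string. This is a minimal illustration only, not ferenda's implementation: the class name CueSketch and its attributes are hypothetical, and it assumes, as the method name suggests, that cuepast() leaves the seek position just after the matched string.

```python
class CueSketch:
    """Hypothetical stand-in that models cue(), cuepast() and readto() on a string."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current seek position

    def cue(self, string):
        # Search forward from the current seek position only.
        idx = self.data.find(string, self.pos)
        if idx == -1:
            raise IOError("Could not find %r" % string)
        self.pos = idx

    def cuepast(self, string):
        # Assumption: same search as cue(), then step over the match.
        self.cue(string)
        self.pos += len(string)

    def readto(self, string):
        # Return the text before the match and stop at the start of the match.
        idx = self.data.find(string, self.pos)
        if idx == -1:
            raise IOError("Could not find %r" % string)
        text = self.data[self.pos:idx]
        self.pos = idx
        return text


reader = CueSketch("Header: value\nBody starts here")
reader.cuepast("Header: ")   # position is now just after "Header: "
value = reader.readto("\n")  # reads up to, but not including, the newline
```

Combining cuepast() with readto() in this way is a common pattern for pulling a single field out of loosely structured text.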

readparagraph()

Reads and returns the next paragraph (all text up to two or more consecutive line separators).

readpage()

Reads and returns the next page (all text up to the next form feed, "\f").

readchunk(delimiter)

Reads and returns the next chunk of text up to delimiter.
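
Paragraph, page and chunk reads all follow the same idea: return the text up to a delimiter and continue after it. The sketch below shows that idea with two hypothetical helper functions (read_chunk and read_paragraph are not part of ferenda). How the real class treats the delimiter and the final chunk at end of file is an assumption here.

```python
import re


def read_chunk(text, pos, delimiter):
    """Return (chunk, new_pos): the text from pos up to delimiter, skipping the delimiter."""
    idx = text.find(delimiter, pos)
    if idx == -1:
        # Assumption: with no delimiter left, the rest of the text is the final chunk.
        return text[pos:], len(text)
    return text[pos:idx], idx + len(delimiter)


def read_paragraph(text, pos, linesep="\n"):
    """Return (paragraph, new_pos); two or more line separators end a paragraph."""
    pattern = re.compile("(?:%s){2,}" % re.escape(linesep))
    match = pattern.search(text, pos)
    if match is None:
        return text[pos:], len(text)
    return text[pos:match.start()], match.end()


doc = "First para\nstill first\n\n\nSecond para\fPage two"
para, pos = read_paragraph(doc, 0)       # stops at the run of three newlines
rest_of_page, _ = read_chunk(doc, pos, "\f")  # a page is a chunk delimited by form feed
```

A page read is then just a chunk read with "\f" as the delimiter, and a paragraph read differs only in accepting a run of separators rather than a single fixed string.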

lastread()

Returns the last chunk of data that was actually read (i.e. the peek* and prev* methods do not affect this).

peek(size=0)

Works like read(), but does not affect current seek position.

peekline(times=1)

Works like readline(), but does not affect current seek position. If times is specified, peeks that many lines ahead.

peekparagraph(times=1)

Works like readparagraph(), but does not affect current seek position. If times is specified, peeks that many paragraphs ahead.

peekchunk(delimiter, times=1)

Works like readchunk(), but does not affect current seek position. If times is specified, peeks that many chunks ahead.

prev(size=0)

Works like read(), but reads backwards from current seek position, and does not affect it.

prevline(times=1)

Works like readline(), but reads backwards from current seek position, and does not affect it. If times is specified, reads the line that many times back.

prevparagraph(times=1)

Works like readparagraph(), but reads backwards from current seek position, and does not affect it. If times is specified, reads the paragraph that many times back.

prevchunk(delimiter, times=1)

Works like readchunk(), but reads backwards from current seek position, and does not affect it. If times is specified, reads the chunk that many times back.
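
The difference between the read*, peek* and prev* families is easiest to see side by side. The following is a rough line-based model, not ferenda's code: PeekSketch is a hypothetical class, and it assumes that times=1 means the nearest line in the given direction.

```python
class PeekSketch:
    """Hypothetical model of a reader whose peek/prev calls leave the position untouched."""

    def __init__(self, data, linesep="\n"):
        self.lines = data.split(linesep)
        self.index = 0  # index of the next line to be read

    def readline(self):
        # Reading is the only operation that advances the position.
        line = self.lines[self.index]
        self.index += 1
        return line

    def peekline(self, times=1):
        # Look `times` lines ahead without advancing.
        target = self.index + times - 1
        if target >= len(self.lines):
            raise IOError("Not enough lines after the current position")
        return self.lines[target]

    def prevline(self, times=1):
        # Look `times` lines back without moving.
        if times > self.index:
            raise IOError("Not enough lines before the current position")
        return self.lines[self.index - times]


reader = PeekSketch("one\ntwo\nthree\nfour")
reader.readline()              # consumes "one"
ahead = reader.peekline(2)     # "three"
behind = reader.prevline()     # "one"
following = reader.readline()  # still "two": peeking did not move the position
```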

getreader(callableObj, *args, **kwargs)

Enables you to treat the result of any single read*, peek* or prev* methods as a new TextReader. Particularly useful to process individual pages in page-oriented documents:

from ferenda import TextReader

filereader = TextReader("rfc822.txt")
firstpagereader = filereader.getreader(filereader.readpage)
# firstpagereader is now a standalone TextReader that only
# contains the first page of text from rfc822.txt
filereader.seek(0) # reset current seek position
page5reader = filereader.getreader(filereader.peekpage, times=5)
# page5reader now contains the 5th page of text from rfc822.txt

getiterator(callableObj, *args, **kwargs)

Returns an iterator that repeatedly calls callableObj with the given arguments and yields each result:

filereader = TextReader("dashed.txt")
# dashed.txt contains paragraphs separated by "----"
for para in filereader.getiterator(filereader.readchunk, "----"):
    print(para)

flush()

See io.IOBase.flush(). This is a no-op.

read(size=0)

See io.TextIOBase.read().

readline(size=None)

See io.TextIOBase.readline().

Note

The size parameter is not supported.

seek(offset, whence=0)

See io.TextIOBase.seek().

Note

The whence parameter is not supported.

tell()

See io.TextIOBase.tell().

write(str)

See io.TextIOBase.write().

Note

Always raises IOError, as TextReader is a read-only object.

writelines(sequence)

See io.IOBase.writelines().

Note

Always raises IOError, as TextReader is a read-only object.
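
The read-only contract described for write() and writelines() can be illustrated with a small wrapper around io.StringIO. This is only a sketch of the behaviour, with the hypothetical class name ReadOnlyText; it says nothing about how TextReader is built internally.

```python
import io


class ReadOnlyText:
    """Hypothetical read-only text stream: reading works, writing raises IOError."""

    def __init__(self, text):
        self._buffer = io.StringIO(text)

    def read(self, size=-1):
        return self._buffer.read(size)

    def write(self, s):
        raise IOError("ReadOnlyText is read-only")

    def writelines(self, lines):
        raise IOError("ReadOnlyText is read-only")


stream = ReadOnlyText("some text")
try:
    stream.write("more")
    wrote = True
except IOError:
    # Callers that might be handed a read-only object can catch IOError here.
    wrote = False
```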

next()

Backwards-compatibility alias for iterating over a file in Python 2. Use getiterator() to make iteration work over anything other than lines (e.g. paragraphs, pages, etc).
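
Iterating over units other than lines amounts to calling the same read method repeatedly until the input is exhausted. The generator below is a minimal sketch of that idea, under the assumption that the read method signals end of input by raising IOError. The names iterate and ChunkSource are hypothetical and not part of ferenda.

```python
def iterate(read_func, *args, **kwargs):
    """Yield read_func(*args, **kwargs) repeatedly until it raises IOError."""
    while True:
        try:
            yield read_func(*args, **kwargs)
        except IOError:
            return


class ChunkSource:
    """Hypothetical reader that hands out delimiter-separated chunks of a string."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def readchunk(self, delimiter):
        if self.pos >= len(self.data):
            raise IOError("end of data")
        idx = self.data.find(delimiter, self.pos)
        if idx == -1:
            idx = len(self.data)  # last chunk runs to the end of the data
        chunk = self.data[self.pos:idx]
        self.pos = idx + len(delimiter)
        return chunk


source = ChunkSource("alpha----beta----gamma")
chunks = list(iterate(source.readchunk, "----"))
```

Because the read method and its arguments are passed in separately, the same loop works for paragraphs, pages or any custom delimiter.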