The TextReader class
- class ferenda.TextReader(filename=None, encoding=None, string=None, linesep=None)
Fancy file-like class for reading (not writing) text files by line, paragraph, page or any other user-defined unit of text, with support for peeking ahead and looking backwards. It can read files stored as byte streams in different encodings, but decodes and handles everything as real strings (unicode in Python 2). Alternatively, it can be initialized from an existing string.
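Parameters:
  - filename: the file to read.
  - encoding: the encoding used by the file.
  - string: alternatively, a string to initialize the reader from instead of a file.
  - linesep: the line separator used in the file or string (see UNIX, DOS and MAC below).
- UNIX = '\n'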
Unix line endings, for use with the linesep parameter.
- DOS = '\r\n'
DOS/Windows line endings, for use with the linesep parameter.
- MAC = '\r'
Old-style Mac line endings, for use with the linesep parameter.
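Why the linesep parameter matters can be shown with plain string operations: splitting DOS-style text on the Unix separator leaves a stray "\r" at the end of each line. This is a minimal sketch; split_lines is a hypothetical helper, not part of ferenda, and the constants are local copies of the values listed above.

```python
# Local copies of the three separator values documented above
UNIX = "\n"
DOS = "\r\n"
MAC = "\r"


def split_lines(text, linesep):
    """Split text on an explicit line separator (hypothetical helper)."""
    return text.split(linesep)


dos_text = "first line\r\nsecond line"
print(split_lines(dos_text, DOS))   # ['first line', 'second line']
print(split_lines(dos_text, UNIX))  # ['first line\r', 'second line']
```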
- eof()
Returns True iff the current seek position is at the end of the file.
- bof()
Returns True iff the current seek position is at the beginning of the file.
- cue(string)
Searches forward from the current seek position for string and sets the seek position to the beginning of it. Raises IOError if string is not found.
- cuepast(string)
Searches forward from the current seek position for string and sets the seek position just past the end of it. Raises IOError if string is not found.
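The difference between cue() and cuepast() can be sketched on a plain string with an explicit position. The functions below are hypothetical stand-ins, not ferenda's implementation; the sketch assumes that cuepast() leaves the position just past the end of the matched string, as its name suggests.

```python
def cue(text, pos, string):
    """Return the index where `string` starts, searching forward from `pos`."""
    idx = text.find(string, pos)
    if idx == -1:
        raise IOError("%r not found" % string)
    return idx


def cuepast(text, pos, string):
    """Return the index just past the end of `string`, searching forward from `pos`."""
    return cue(text, pos, string) + len(string)


text = "Header\nSection 1\nBody text"
print(cue(text, 0, "Section"))      # 7
print(cuepast(text, 0, "Section"))  # 14
```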
- readto(string)
Reads and returns all text between the current seek position and string, then sets the seek position to the beginning of string. Raises IOError if string is not found.
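A sketch of the readto() contract on a plain string: it returns the text before the search string together with the new position, which points at the start of that string. The readto function here is a hypothetical stand-in that takes the text and position explicitly instead of using a TextReader.

```python
def readto(text, pos, string):
    """Return (chunk, newpos): the text from pos up to `string`, and where `string` starts."""
    idx = text.find(string, pos)
    if idx == -1:
        raise IOError("%r not found" % string)
    return text[pos:idx], idx


text = "Title: Foo\nAuthor: Bar\n"
chunk, pos = readto(text, 0, "Author:")
print(repr(chunk))  # 'Title: Foo\n'
print(pos)          # 11
```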
- readparagraph()
Reads and returns the next paragraph (all text up to two or more consecutive line separators).
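Splitting on "two or more consecutive line separators" maps naturally onto a regular expression. This sketch is a hypothetical stand-in built on Python's re module, not ferenda's code; the real method may differ, for example in how it treats trailing whitespace.

```python
import re


def readparagraph(text, pos, linesep="\n"):
    """Return (paragraph, newpos); a paragraph ends at two or more consecutive linesep."""
    boundary = re.compile("(?:%s){2,}" % re.escape(linesep))
    match = boundary.search(text, pos)
    if match is None:  # no boundary left: the rest of the text is the last paragraph
        return text[pos:], len(text)
    return text[pos:match.start()], match.end()


text = "First para\nstill first.\n\n\nSecond para."
para, pos = readparagraph(text, 0)
print(repr(para))  # 'First para\nstill first.'
para, pos = readparagraph(text, pos)
print(repr(para))  # 'Second para.'
```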
- readpage()
Reads and returns the next page (all text up to the next form feed, "\f").
- readchunk(delimiter)
Reads and returns the next chunk of text up to delimiter.
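readpage() is a special case of chunk reading with a form feed as the delimiter. The sketch below expresses that relationship with two hypothetical helpers operating on a string and an explicit position; it assumes that reading a chunk consumes the delimiter, so the next read starts right after it.

```python
def readchunk(text, pos, delimiter):
    """Return (chunk, newpos); newpos is placed after the delimiter (assumption)."""
    idx = text.find(delimiter, pos)
    if idx == -1:  # no more delimiters: the rest of the text is the last chunk
        return text[pos:], len(text)
    return text[pos:idx], idx + len(delimiter)


def readpage(text, pos):
    """A page is simply a chunk delimited by a form feed."""
    return readchunk(text, pos, "\f")


doc = "page one\fpage two\fpage three"
pos = 0
pages = []
while pos < len(doc):
    page, pos = readpage(doc, pos)
    pages.append(page)
print(pages)  # ['page one', 'page two', 'page three']
```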
- lastread()
Returns the last chunk of data that was actually read (the peek* and prev* methods do not affect this).
- peekline(times=1)
Works like readline(), but does not affect the current seek position. If times is specified, peeks that many lines ahead.
- peekparagraph(times=1)
Works like readparagraph(), but does not affect the current seek position. If times is specified, peeks that many paragraphs ahead.
- peekchunk(delimiter, times=1)
Works like readchunk(), but does not affect the current seek position. If times is specified, peeks that many chunks ahead.
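Peeking with times=n can be thought of as reading n chunks on a private copy of the position and returning only the last one. A minimal sketch with a hypothetical peekchunk function that takes the text and position explicitly; because the position it advances is a local variable, the caller's position never changes.

```python
def peekchunk(text, pos, delimiter, times=1):
    """Return the chunk `times` chunks ahead of pos; the caller's pos is not modified."""
    chunk = ""
    for _ in range(times):
        idx = text.find(delimiter, pos)
        if idx == -1:  # no delimiter left: the rest of the text is the last chunk
            chunk, pos = text[pos:], len(text)
        else:  # advance the local copy of the position past the delimiter
            chunk, pos = text[pos:idx], idx + len(delimiter)
    return chunk


text = "a;b;c;d"
print(peekchunk(text, 0, ";"))           # a
print(peekchunk(text, 0, ";", times=3))  # c
```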
- prev(size=0)
Works like read(), but reads backwards from the current seek position without changing it.
- prevline(times=1)
Works like readline(), but reads backwards from the current seek position without changing it. If times is specified, returns the line that many lines back.
- prevparagraph(times=1)
Works like readparagraph(), but reads backwards from the current seek position without changing it. If times is specified, returns the paragraph that many paragraphs back.
- prevchunk(delimiter, times=1)
Works like readchunk(), but reads backwards from the current seek position without changing it. If times is specified, returns the chunk that many chunks back.
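Reading backwards mirrors peeking: walk back from the current position one chunk at a time using str.rfind. This is a hypothetical sketch, not ferenda's code; it assumes that a delimiter immediately before the position terminates the previous chunk and is skipped.

```python
def prevchunk(text, pos, delimiter, times=1):
    """Return the chunk `times` chunks before pos, reading backwards; pos is unchanged."""
    end = pos
    chunk = ""
    for _ in range(times):
        # Step back over a delimiter that ends the previous chunk, if there is one
        if text.endswith(delimiter, 0, end):
            end -= len(delimiter)
        start = text.rfind(delimiter, 0, end)
        start = 0 if start == -1 else start + len(delimiter)
        chunk = text[start:end]
        end = start  # the next iteration continues further back
    return chunk


text = "a;b;c;d"
pos = 6  # just before "d", as if "a;b;c;" had already been read
print(prevchunk(text, pos, ";"))           # c
print(prevchunk(text, pos, ";", times=2))  # b
```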
- getreader(callableObj, *args, **kwargs)
Lets you treat the result of any single call to a read*, peek* or prev* method as a new TextReader. This is particularly useful for processing individual pages in page-oriented documents:
    from ferenda import TextReader

    filereader = TextReader("rfc822.txt")
    firstpagereader = filereader.getreader(filereader.readpage)
    # firstpagereader is now a standalone TextReader that only
    # contains the first page of text from rfc822.txt
    filereader.seek(0)  # reset current seek position
    # a page is a chunk delimited by a form feed, so peekchunk can peek pages
    page5reader = filereader.getreader(filereader.peekchunk, "\f", times=5)
    # page5reader now contains the 5th page of text from rfc822.txt
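The idea behind getreader() is to take the text produced by one read*, peek* or prev* call and wrap it in a fresh file-like object. As a sketch, the standard library's io.StringIO can play the role of the new reader; the getreader function and first_page helper below are hypothetical and only illustrate the pattern.

```python
import io


def getreader(callable_obj, *args, **kwargs):
    """Call a reading function and wrap its text result in a new file-like object."""
    return io.StringIO(callable_obj(*args, **kwargs))


FULL_TEXT = "line 1\nline 2\fline 3\nline 4"


def first_page(text):
    """Stand-in for readpage(): everything up to the first form feed."""
    return text.split("\f", 1)[0]


page_reader = getreader(first_page, FULL_TEXT)
print(page_reader.readlines())  # ['line 1\n', 'line 2']
```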
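- getiterator(callableObj, *args, **kwargs)
Returns an iterator that repeatedly calls callableObj with the given arguments and yields each result, which makes it possible to loop over paragraphs, pages or any other user-defined chunks: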
    from ferenda import TextReader

    filereader = TextReader("dashed.txt")
    # dashed.txt contains paragraphs separated by "----"
    for para in filereader.getiterator(filereader.readchunk, "----"):
        print(para)
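getiterator() turns any repeated read into a loop. A generator that keeps calling the given function until it returns an empty string captures the idea; this is a hedged sketch, and the stopping condition is an assumption (with it, an empty chunk in the middle of the text would also end the iteration). The example drives the hypothetical getiterator with io.StringIO methods instead of a TextReader.

```python
import io


def getiterator(callable_obj, *args, **kwargs):
    """Yield successive results of callable_obj(*args, **kwargs) until it returns ''."""
    result = callable_obj(*args, **kwargs)
    while result != "":
        yield result
        result = callable_obj(*args, **kwargs)


lines = io.StringIO("alpha\nbeta\ngamma\n")
print(list(getiterator(lines.readline)))  # ['alpha\n', 'beta\n', 'gamma\n']

chars = io.StringIO("abcdefghij")
print(list(getiterator(chars.read, 4)))   # ['abcd', 'efgh', 'ij']
```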
- flush()
See io.IOBase.flush(). This is a no-op.
- read(size=0)
See io.TextIOBase.read().
- readline(size=None)
Note
The size parameter is not supported.
- seek(offset, whence=0)
See io.TextIOBase.seek().
Note
The whence parameter is not supported.
- tell()
See io.TextIOBase.tell().
- write(str)
Note
Always raises IOError, as TextReader is a read-only object.
- writelines(sequence)
Note
Always raises IOError, as TextReader is a read-only object.
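A read-only file-like object only needs its write methods to fail loudly. The ReadOnlyText class below is a hypothetical sketch of that behaviour, built on io.StringIO by composition rather than taken from ferenda; in Python 3, IOError is an alias of OSError, so callers can catch either name.

```python
import io


class ReadOnlyText:
    """Hypothetical read-only wrapper: reads delegate to StringIO, writes always fail."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def read(self, size=-1):
        return self._buf.read(size)

    def write(self, s):
        raise IOError("ReadOnlyText is a read-only object")

    def writelines(self, sequence):
        raise IOError("ReadOnlyText is a read-only object")


reader = ReadOnlyText("hello")
try:
    reader.write("more")
except IOError as exc:
    print("write failed:", exc)
print(reader.read())  # hello
```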
- next()
Backwards-compatibility alias for iterating over a file in Python 2. Use getiterator() to iterate over anything other than lines (e.g. paragraphs, pages, etc).
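Supporting both Python 2's next() and Python 3's __next__() usually comes down to a single alias. The LineIterator class below is a hypothetical sketch of that idiom, iterating over the lines of a string; it is not ferenda's code.

```python
class LineIterator:
    """Iterate over the lines of a string, keeping the line endings."""

    def __init__(self, text):
        self._lines = iter(text.splitlines(True))

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 iterator protocol
        return next(self._lines)  # StopIteration propagates at the end

    next = __next__  # Python 2 name for the same method


it = LineIterator("one\ntwo\n")
print(repr(it.next()))  # 'one\n'
print(list(it))         # ['two\n']
```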