The WordReader class
class ferenda.WordReader
Reads .docx and .doc files (the latter with support from antiword) and converts them to an XML form that is slightly easier to deal with.
log = <logging.Logger object>
read(wordfile, intermediatefp, simplify=True)
Converts the Word file to a more easily parsed format.
Parameters:
- wordfile – Path to the original Word file
- intermediatefp – An open filehandle to write the more parseable file to
Returns: filetype (either “doc” or “docx”)
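As a rough illustration of how read() might be called, the sketch below wraps it in a small helper that opens the intermediate file and returns the reported filetype. Several details are assumptions rather than documented behaviour: that WordReader can be constructed without arguments, that the filehandle should be opened in binary mode, and the hypothetical file names report.docx and report.xml. The helper name convert_to_intermediate is likewise made up for this example.

```python
def convert_to_intermediate(reader, wordfile, intermediate_path):
    """Run reader.read() on wordfile, writing the result to intermediate_path.

    `reader` is any object exposing read(wordfile, intermediatefp), such as
    a ferenda.WordReader instance. Returns the filetype reported by read().
    """
    # Assumption: binary mode. The documentation only asks for
    # "an open filehandle" and does not say which mode to use.
    with open(intermediate_path, "wb") as intermediatefp:
        filetype = reader.read(wordfile, intermediatefp)
    return filetype


def main():
    # Imported here so the helper above can be exercised without ferenda.
    from ferenda import WordReader

    # Assumption: WordReader takes no constructor arguments.
    reader = WordReader()
    # Hypothetical input and output paths.
    filetype = convert_to_intermediate(reader, "report.docx", "report.xml")
    print(filetype)  # per the docs, either "doc" or "docx"
```

Calling main() requires ferenda to be installed (and antiword for .doc input). Keeping the file handling in a separate helper means the caller decides where the intermediate file lives, which matches read() taking a filehandle rather than a path.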