The WordReader class
class ferenda.WordReader
Reads .docx and .doc files (the latter with support from antiword) and converts them to an XML form that is slightly easier to deal with.
log = <logging.Logger object>
read(wordfile, intermediatefile)
Converts the Word file to a more easily parsed format.
Parameters:
- wordfile – Path to the original Word file
- intermediatefile – Where to store the more parseable file
Returns: name of the parseable file, filetype (either “doc” or “docx”)
Return type:
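The following is a minimal sketch of how a caller might consume the return value of read(), assuming it comes back as a two-item sequence (file name, file type) as the description above suggests. The helper convert_word_file and the StubReader class are hypothetical names introduced only for illustration; StubReader stands in for ferenda.WordReader so the example runs without ferenda or antiword installed, and its extension-based type detection is an assumption, not the library's documented behaviour.

```python
def convert_word_file(reader, wordfile, intermediatefile):
    """Call reader.read() and unpack its documented two-part return value."""
    # read() is documented to return the name of the parseable file and
    # the file type, either "doc" or "docx". Unpacking as a pair is an
    # assumption based on that description.
    parseable_file, filetype = reader.read(wordfile, intermediatefile)
    if filetype not in ("doc", "docx"):
        raise ValueError("unexpected file type: %r" % (filetype,))
    return parseable_file, filetype


class StubReader:
    """Hypothetical stand-in for ferenda.WordReader, for illustration only."""

    def read(self, wordfile, intermediatefile):
        # Guess the type from the file extension; the real class may
        # determine it differently.
        filetype = "docx" if wordfile.lower().endswith(".docx") else "doc"
        return intermediatefile, filetype


if __name__ == "__main__":
    name, kind = convert_word_file(StubReader(), "report.docx", "report.xml")
    print(name, kind)
```

With ferenda installed, the same helper could presumably be called with a real reader, for example convert_word_file(ferenda.WordReader(), wordfile, intermediatefile); that the constructor takes no arguments is an assumption here, since the class signature above lists none.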