The WordReader class
- class ferenda.WordReader
Reads .docx and .doc files (the latter with support from antiword) and presents a slightly easier API for dealing with them.
- read(wordfile, intermediatefile)
Converts the Word file to a more easily parsed format.
Parameters:
- wordfile – Path to the original Word file (.doc or .docx)
- intermediatefile – Where to store the more parseable file
Returns: name of parseable file, filetype (either “doc” or “docx”)
Return type: tuple
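The returned filetype tells the caller which of the two formats it is dealing with. The documentation does not say how read() tells them apart; one common approach, sketched below as an assumption rather than as ferenda's implementation, is to look at the first bytes of the file. The helper name sniff_word_filetype is hypothetical.

```python
# Well-known file signatures (general facts, not taken from ferenda):
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                          # .docx (ZIP container)


def sniff_word_filetype(path):
    """Hypothetical helper: guess "doc" or "docx" from the file's first bytes."""
    with open(path, "rb") as fp:
        head = fp.read(8)
    if head.startswith(OLE2_MAGIC):
        return "doc"
    if head.startswith(ZIP_MAGIC):
        return "docx"
    raise ValueError("%s does not look like a Word file" % path)
```

A caller of read() would typically branch on the second element of the returned tuple in the same way, choosing a DocBook-oriented parser for "doc" and an OOXML-oriented parser for "docx".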
- word_to_docbook(indoc, outdoc)
Converts an old Word document (.doc) to a pseudo-DocBook file using antiword.
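As a rough illustration of what such a conversion involves, the sketch below calls the antiword command-line tool with its `-x db` option, which writes DocBook XML to standard output. The function names are hypothetical, and the exact options ferenda passes to antiword are an assumption here.

```python
import shutil
import subprocess


def antiword_docbook_command(indoc):
    """Build an antiword command line that emits DocBook XML on stdout."""
    return ["antiword", "-x", "db", indoc]


def doc_to_docbook(indoc, outdoc):
    """Hypothetical stand-in for word_to_docbook; needs antiword on PATH."""
    if shutil.which("antiword") is None:
        raise RuntimeError("antiword is not installed or not on PATH")
    # Redirect antiword's stdout into the output file.
    with open(outdoc, "wb") as out:
        subprocess.run(antiword_docbook_command(indoc), stdout=out, check=True)
    return outdoc
```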
- word_to_ooxml(indoc, outdoc)
Extracts the raw OOXML file from a modern Word document (.docx).
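A .docx file is a ZIP archive whose main body is stored in the member word/document.xml. The sketch below shows that extraction with the standard library only; it assumes that this member is the "raw OOXML file" meant above, and the function name extract_document_xml is hypothetical rather than part of ferenda.

```python
import zipfile


def extract_document_xml(indoc, outdoc):
    """Hypothetical stand-in: copy word/document.xml out of a .docx archive."""
    with zipfile.ZipFile(indoc) as archive:
        xml_bytes = archive.read("word/document.xml")
    with open(outdoc, "wb") as out:
        out.write(xml_bytes)
    return outdoc
```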