Reads .docx and .doc files (the latter with support from antiword) and converts them to an XML form that is slightly easier to deal with.
log = <logging.Logger object>
read(wordfile, intermediatefp, simplify=True)
Converts the Word file to a more easily parsed format.
- wordfile – Path to the original Word file (.doc or .docx)
- intermediatefp – An open filehandle to write the more easily parsed file to
Returns: the filetype (either “doc” or “docx”)
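The returned filetype distinguishes the two container formats. As a minimal sketch of how such a distinction can be made (an assumption for illustration, not necessarily how read decides), the leading bytes of the file can be inspected: .docx files are ZIP archives, while legacy .doc files are OLE2 compound files. The helper name sniff_word_filetype is hypothetical and not part of this class.

```python
# Magic numbers: ZIP local file header and OLE2 compound file header.
ZIP_MAGIC = b"PK\x03\x04"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_word_filetype(path):
    """Hypothetical helper: guess "docx" or "doc" from the file's magic bytes.

    Returns None if neither signature matches. A ZIP signature only shows
    that the file is a ZIP archive, so "docx" is a guess, not a guarantee.
    """
    with open(path, "rb") as fp:
        header = fp.read(8)
    if header.startswith(ZIP_MAGIC):
        return "docx"
    if header.startswith(OLE2_MAGIC):
        return "doc"
    return None
```

A caller could use the guess to choose between the antiword route and the OOXML route before converting.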
Converts an old Word document (.doc) to a pseudo-DocBook file through antiword.
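For orientation, the antiword command-line tool can emit DocBook XML when run with the -x db option. The following is a rough sketch of such a call through subprocess, assuming antiword is installed and on the PATH; the function names antiword_command and doc_to_docbook are hypothetical, and the actual method may pass other options or post-process the output.

```python
import shutil
import subprocess


def antiword_command(doc_path):
    """Hypothetical helper: build the antiword call for DocBook output."""
    # "-x db" asks antiword for XML output using its DocBook DTD.
    return ["antiword", "-x", "db", doc_path]


def doc_to_docbook(doc_path, outfp):
    """Hypothetical helper: run antiword and write its output to outfp.

    outfp must be a filehandle opened in binary mode.
    """
    if shutil.which("antiword") is None:
        raise RuntimeError("antiword is not installed or not on PATH")
    result = subprocess.run(
        antiword_command(doc_path), stdout=subprocess.PIPE, check=True
    )
    outfp.write(result.stdout)
```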
word_to_ooxml(indoc, outfp, simplify)
Extracts the raw OOXML file from a modern Word document (.docx).
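As a point of reference, a .docx file is a ZIP archive whose main body is stored in the member word/document.xml. The sketch below shows one way to pull that member out with the standard library; the helper name extract_document_xml is hypothetical, and the sketch does not attempt the simplification step that the simplify parameter suggests.

```python
import zipfile


def extract_document_xml(docx, outfp):
    """Hypothetical helper: copy word/document.xml from a .docx to outfp.

    docx may be a path or a binary file-like object; outfp must be a
    filehandle opened in binary mode.
    """
    with zipfile.ZipFile(docx) as archive:
        # Raises KeyError if the archive has no main document part.
        outfp.write(archive.read("word/document.xml"))
```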