The CitationParser class
class ferenda.CitationParser(*grammars)

Finds citations to documents and other resources in text strings. Each type of citation is specified by a pyparsing grammar, and for each found citation a URI can be constructed using a URIFormatter object.

Parameters: grammars (list of pyparsing.ParserElement objects) – The grammar(s) for the citations that this parser should find, in order of priority.

Usage:
>>> from pyparsing import Word, nums
>>> from ferenda import CitationParser
>>> rfc_grammar = ("RFC " + Word(nums).setResultsName("rfcnumber")).setResultsName("rfccite")
>>> pep_grammar = ("PEP" + Word(nums).setResultsName("pepnumber")).setResultsName("pepcite")
>>> citparser = CitationParser(rfc_grammar, pep_grammar)
>>> res = citparser.parse_string("The WSGI spec (PEP 333) references RFC 2616 (The HTTP spec)")
>>> # res is a list of strings and/or pyparsing.ParseResults objects
>>> from ferenda import URIFormatter
>>> from ferenda.elements import Link
>>> f = URIFormatter(('rfccite',
...                   lambda p: "http://www.rfc-editor.org/rfc/rfc%(rfcnumber)s" % p),
...                  ('pepcite',
...                   lambda p: "http://www.python.org/dev/peps/pep-0%(pepnumber)s/" % p))
>>> citparser.set_formatter(f)
>>> res = citparser.parse_recursive(["The WSGI spec (PEP 333) references RFC 2616 (The HTTP spec)"])
>>> res == ['The WSGI spec (',
...         Link('PEP 333', uri='http://www.python.org/dev/peps/pep-0333/'),
...         ') references ',
...         Link('RFC 2616', uri='http://www.rfc-editor.org/rfc/rfc2616'),
...         ' (The HTTP spec)']
True

set_formatter(formatter)

Specify how found citations are to be formatted when using parse_recursive().

Parameters: formatter (URIFormatter) – The formatter object to use for all citations.
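
To illustrate the idea behind a formatter, here is a minimal stdlib-only sketch that maps a citation type name to a function building a URI from the parsed parts. It does not use ferenda: the `formatters` dict and the `format_uri` helper are hypothetical names, and only the two URI templates are taken from the usage example.

```python
# Hypothetical stand-in for a URIFormatter: one URI-building callable per
# citation type, keyed by the results name used in the grammar.
formatters = {
    "rfccite": lambda p: "http://www.rfc-editor.org/rfc/rfc%(rfcnumber)s" % p,
    "pepcite": lambda p: "http://www.python.org/dev/peps/pep-0%(pepnumber)s/" % p,
}


def format_uri(cite_type, parts):
    """Build a URI for one found citation (illustrative helper, not ferenda API)."""
    return formatters[cite_type](parts)


pep_uri = format_uri("pepcite", {"pepnumber": "333"})
rfc_uri = format_uri("rfccite", {"rfcnumber": "2616"})
```

Keying the callables by citation type keeps each URI scheme independent, so supporting a new kind of citation only requires adding one entry.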

add_grammar(grammar)

Add another grammar.

Parameters: grammar (pyparsing.ParserElement) – The grammar to add.

parse_string(string, predicate='dcterms:references')

Find any citations in a text string, using the configured grammars.

Parameters: string (str) – Text to parse for citations.
Returns: strings (for parts of the input text that do not contain any citation) and/or tuples (for found citations) consisting of (string, pyparsing.ParseResults).
Return type: list
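
The return shape described above (plain strings interleaved with tuples for each found citation) can be sketched without ferenda or pyparsing. This is only an approximation under stated assumptions: it substitutes the standard `re` module for a pyparsing grammar, so the second tuple element is an `re.Match` rather than a parse result, and `split_citations` is a hypothetical helper name.

```python
import re


def split_citations(text, pattern):
    """Split text into plain strings and (matched_text, match) tuples.

    Illustrative only: uses a regular expression where the real parser
    uses pyparsing grammars.
    """
    nodes = []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() > pos:
            nodes.append(text[pos:m.start()])  # text before the citation
        nodes.append((m.group(0), m))          # the citation itself
        pos = m.end()
    if pos < len(text):
        nodes.append(text[pos:])               # trailing text
    return nodes


nodes = split_citations(
    "The WSGI spec (PEP 333) references RFC 2616 (The HTTP spec)",
    r"(?:RFC|PEP) \d+",
)
# Keep only the text of each node to inspect the split.
texts = [n if isinstance(n, str) else n[0] for n in nodes]
```

Callers can tell the two node kinds apart with an `isinstance(node, str)` check, as the last line does.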

parse_recursive(part, predicate='dcterms:references')

Traverse a nested tree of elements, finding citations in any strings contained in the tree. Found citations are marked up as Link elements with the URI constructed by the URIFormatter set by set_formatter().

Parameters: part (list) – The root element of the structure to parse.
Returns: a correspondingly nested structure.
Return type: list
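
The traversal described above can be sketched as a recursive walk over nested lists. The code below is a simplified stand-in, not ferenda's implementation: it assumes a regular expression instead of pyparsing grammars, represents a marked-up citation as a `("link", text)` tuple rather than a Link element, and `mark_up` and `CITE` are hypothetical names.

```python
import re

# Assumed citation pattern for illustration only.
CITE = re.compile(r"(?:RFC|PEP) \d+")


def mark_up(part):
    """Return a new nested list in which citations found in strings are
    replaced by ("link", text) tuples; other nodes are kept as they are."""
    result = []
    for child in part:
        if isinstance(child, str):
            pos = 0
            for m in CITE.finditer(child):
                if m.start() > pos:
                    result.append(child[pos:m.start()])
                result.append(("link", m.group(0)))
                pos = m.end()
            if pos < len(child):
                result.append(child[pos:])
        elif isinstance(child, list):
            result.append(mark_up(child))  # recurse, preserving the nesting
        else:
            result.append(child)           # leave non-string leaves untouched
    return result


tree = ["See RFC 2616.", ["Nested text citing PEP 333", 42]]
marked = mark_up(tree)
```

A string containing a citation is replaced by several sibling nodes in its parent, while sublists keep their position, so the output mirrors the nesting of the input.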