The CitationParser class

class ferenda.CitationParser(*grammars)

Finds citations to documents and other resources in text strings. Each type of citation is specified by a pyparsing grammar, and for each found citation a URI can be constructed using a URIFormatter object.

Parameters: grammars (one or more pyparsing.ParserElement objects, passed as separate arguments) – The grammar(s) for the citations that this parser should find, in order of priority.
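The order of the grammars matters when two of them can match the same text. The sketch below assumes the grammars are combined with pyparsing's `|` operator, which builds a `MatchFirst` that tries alternatives left to right and keeps the first one that matches. That is an assumption about how the priority is applied, not something this page states. `section_cite` and `first_match` are hypothetical names used only for illustration.

```python
from pyparsing import Word, nums

# Hypothetical overlapping grammars: section_cite extends rfc_cite with a section number.
rfc_cite = ("RFC " + Word(nums).setResultsName("rfcnumber")).setResultsName("rfccite")
section_cite = ("RFC " + Word(nums).setResultsName("rfcnumber") +
                "section" + Word(nums).setResultsName("section")).setResultsName("sectioncite")


def first_match(grammars, text):
    """Return the text of the first citation found, trying grammars in list order."""
    # Chain the grammars with "|", which builds a pyparsing MatchFirst:
    # at each position the alternatives are tried left to right and the
    # first one that matches wins.
    combined = grammars[0]
    for grammar in grammars[1:]:
        combined = combined | grammar
    tokens, start, end = next(combined.scanString(text))
    return text[start:end]


text = "See RFC 2616 section 14 for details"
print(first_match([section_cite, rfc_cite], text))  # RFC 2616 section 14
print(first_match([rfc_cite, section_cite], text))  # RFC 2616
```

Listing the more specific grammar first lets it capture the longer citation; listed second, it is never reached, because the shorter grammar already matches at the same position.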

Usage:

>>> from pyparsing import Word,nums
>>> rfc_grammar = ("RFC " + Word(nums).setResultsName("rfcnumber")).setResultsName("rfccite")
>>> pep_grammar = ("PEP" + Word(nums).setResultsName("pepnumber")).setResultsName("pepcite")
>>> citparser = CitationParser(rfc_grammar, pep_grammar)
>>> res = citparser.parse_string("The WSGI spec (PEP 333) references RFC 2616 (The HTTP spec)")
>>> # res is a list of strings and/or (string, pyparsing.ParseResults) tuples
>>> from ferenda import URIFormatter
>>> from ferenda.elements import Link
>>> f = URIFormatter(('rfccite',
...                   lambda p: "http://www.rfc-editor.org/rfc/rfc%(rfcnumber)s" % p),
...                  ('pepcite',
...                   lambda p: "http://www.python.org/dev/peps/pep-0%(pepnumber)s/" % p))
>>> citparser.set_formatter(f)
>>> res = citparser.parse_recursive(["The WSGI spec (PEP 333) references RFC 2616 (The HTTP spec)"])
>>> res == ['The WSGI spec (', Link('PEP 333',uri='http://www.python.org/dev/peps/pep-0333/'), ') references ', Link('RFC 2616',uri='http://www.rfc-editor.org/rfc/rfc2616'), ' (The HTTP spec)']
True

set_formatter(formatter)

Specify how found citations are formatted when parse_recursive() is used.

Parameters: formatter (URIFormatter) – The formatter object to use for all citations.
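Conceptually, the formatter maps the results name of each matched grammar (`rfccite` and `pepcite` in the usage example) to a function that builds a URI from the named parts of the match. The class below is a hypothetical stand-in for URIFormatter, not its actual implementation, and its `format` method is an assumed name. Plain dicts stand in for pyparsing.ParseResults, which likewise support lookup by results name, so the same `%(name)s` lambdas work on both.

```python
class SimpleURIFormatter:
    """Hypothetical stand-in for URIFormatter: maps citation names to URI builders."""

    def __init__(self, *formatters):
        # Each argument is a (name, function) pair, as in the usage example.
        self._formatters = dict(formatters)

    def format(self, name, parts):
        # Look up the builder registered for this citation type and apply it
        # to the named parts of the match. Unknown types yield None.
        builder = self._formatters.get(name)
        return builder(parts) if builder else None


f = SimpleURIFormatter(
    ("rfccite", lambda p: "http://www.rfc-editor.org/rfc/rfc%(rfcnumber)s" % p),
    ("pepcite", lambda p: "http://www.python.org/dev/peps/pep-0%(pepnumber)s/" % p),
)
print(f.format("rfccite", {"rfcnumber": "2616"}))  # http://www.rfc-editor.org/rfc/rfc2616
print(f.format("pepcite", {"pepnumber": "333"}))   # http://www.python.org/dev/peps/pep-0333/
```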

add_grammar(grammar)

Add another grammar to the set of citation grammars this parser searches for.

Parameters: grammar (pyparsing.ParserElement) – The grammar to add.

parse_string(string, predicate='dcterms:references')

Find any citations in a text string, using the configured grammars.

Parameters: string (str) – Text to parse for citations
Returns: strings (for parts of the input text that do not contain any citation) and/or tuples (for found citations) consisting of (string, pyparsing.ParseResults)
Return type: list
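The return shape described above can be produced with pyparsing's `scanString`, which yields each match together with its start and end offsets; the uncited text between matches is kept as plain strings. This is a minimal sketch of that technique, not ferenda's own code; `split_citations` is a hypothetical helper, and the grammars are the ones from the usage example.

```python
from pyparsing import Word, nums

# The two grammars from the usage example.
rfc_grammar = ("RFC " + Word(nums).setResultsName("rfcnumber")).setResultsName("rfccite")
pep_grammar = ("PEP" + Word(nums).setResultsName("pepnumber")).setResultsName("pepcite")


def split_citations(grammar, text):
    """Hypothetical helper: split text into plain strings and (matched_text, tokens) tuples."""
    result = []
    pos = 0  # end offset of the previous citation
    for tokens, start, end in grammar.scanString(text):
        if start > pos:
            result.append(text[pos:start])  # uncited text before this citation
        result.append((text[start:end], tokens))
        pos = end
    if pos < len(text):
        result.append(text[pos:])  # uncited text after the last citation
    return result


parts = split_citations(rfc_grammar | pep_grammar,
                        "The WSGI spec (PEP 333) references RFC 2616 (The HTTP spec)")
# Citations are shown here by their matched text only.
print([p if isinstance(p, str) else p[0] for p in parts])
# ['The WSGI spec (', 'PEP 333', ') references ', 'RFC 2616', ' (The HTTP spec)']
print(parts[1][1]["pepnumber"])  # 333
```

The named results inside each tuple's tokens (`pepnumber`, `rfcnumber`) are what a formatter later uses to build the URI.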

parse_recursive(part, predicate='dcterms:references')

Traverse a nested tree of elements, finding citations in any strings contained in the tree. Found citations are marked up as Link elements whose URI is constructed by the URIFormatter set with set_formatter().

Parameters: part (list) – The root element of the structure to parse
Returns: a correspondingly nested structure, in which each string containing citations is split into plain strings and Link elements.
Return type: list
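The traversal can be pictured as a recursive walk that expands every string into its uncited text and Link elements while rebuilding each nested list. The sketch below is hypothetical throughout: `Link` is a dataclass stand-in for ferenda.elements.Link, plain lists stand in for ferenda's element classes, and a regular expression replaces the pyparsing grammars so the example needs only the standard library.

```python
import re
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical stand-in for ferenda.elements.Link."""
    text: str
    uri: str


# Regex stand-in for a pyparsing grammar; the named group plays the role of a results name.
RFC_RE = re.compile(r"RFC (?P<rfcnumber>\d+)")


def format_uri(parts):
    # Hypothetical formatter: builds a URI from the named parts of a match.
    return "http://www.rfc-editor.org/rfc/rfc%(rfcnumber)s" % parts


def parse_string(text):
    """Split text into plain strings and (matched_text, named_parts) tuples."""
    result, pos = [], 0
    for m in RFC_RE.finditer(text):
        if m.start() > pos:
            result.append(text[pos:m.start()])
        result.append((m.group(0), m.groupdict()))
        pos = m.end()
    if pos < len(text):
        result.append(text[pos:])
    return result


def parse_recursive(part):
    """Return a copy of a nested list with citations in strings replaced by Links."""
    result = []
    for child in part:
        if isinstance(child, str):
            # Expand one string into its uncited text and Link elements.
            for piece in parse_string(child):
                result.append(piece if isinstance(piece, str)
                              else Link(piece[0], format_uri(piece[1])))
        elif isinstance(child, list):
            # Recurse so the nesting of the input is preserved.
            result.append(parse_recursive(child))
        else:
            result.append(child)  # leave other element types untouched
    return result


print(parse_recursive(["See RFC 2616.", ["Also RFC 7230", 42]]))
```

Because each string may expand into several items, the sketch builds new lists rather than replacing children in place.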