The DocumentRepository class

class ferenda.DocumentRepository(**kwargs)

Base class for downloading, parsing and generating HTML versions of a repository of documents.

Start building your application by subclassing this class, and then override methods in order to customize the downloading, parsing and generation behaviour.

Parameters:**kwargs – Any named argument overrides any similarly-named configuration file parameter.

Example:

>>> import os
>>> class MyRepo(DocumentRepository):
...     alias="myrepo"
...
>>> d = MyRepo(datadir="/tmp/ferenda")
>>> d.store.downloaded_path("mybasefile").replace(os.sep,'/') == '/tmp/ferenda/myrepo/downloaded/mybasefile.html'
True

Note

This class has a ridiculous number of methods that you can override to control most of ferenda's behaviour in all stages. For basic usage, you need only a fraction of them. Please don’t be intimidated/horrified.

downloaded_suffix = u'.html'

File suffix for the main document format. Determines the suffix of downloaded files.

storage_policy = u'file'

Some repositories have documents in several formats, documents split amongst several files, or embedded resources. If storage_policy is set to "dir", then each document gets its own directory (the default filename being "index" plus the suffix); otherwise each document gets stored as a file in a directory with other files. Affects ferenda.DocumentStore.path() (and therefore all other *_path methods).
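
For example, a repository that sets storage_policy to "dir" gets per-document directories. A minimal sketch (assuming, as in the default setup, that the storage policy is passed through to the document store):

>>> import os
>>> class DirRepo(DocumentRepository):
...     alias = "dirrepo"
...     storage_policy = "dir"
...
>>> d = DirRepo(datadir="/tmp/ferenda")
>>> d.store.downloaded_path("mybasefile").replace(os.sep,'/') == '/tmp/ferenda/dirrepo/downloaded/mybasefile/index.html'
True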

alias = u'base'

A short name for the class, used by the command line ferenda-build.py tool. Also determines where to store downloaded, parsed and generated files. When you subclass DocumentRepository you must override this.

namespaces = [u'rdf', u'rdfs', u'xsd', u'xsi', u'dct', u'skos', u'foaf', u'xhv', u'owl', u'prov', u'bibo']

The namespaces that are included in the XHTML and RDF files generated by parse(). This can be a list of strings, in which case the strings are assumed to be well-known prefixes to established namespaces, or a list of (prefix, namespace) tuples. All well-known prefixes are available in ferenda.util.ns.
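
For example, a subclass using explicit (prefix, namespace) tuples might look like this (the ex prefix and URI are hypothetical):

>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     namespaces = [('dct', 'http://purl.org/dc/terms/'),
...                   ('ex', 'http://example.org/ns#')]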

required_predicates = [rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type')]

A list of RDF predicates that should be present in the output data. If any of these are missing, a warning is logged.

start_url = u'http://example.org/'

The main entry page for the remote web store of documents. May be a list of documents, a search form or whatever. If it’s something more complicated than a simple list of documents, you need to override download() in order to tell which documents are to be downloaded.

document_url_template = u'http://example.org/docs/%(basefile)s.html'

A string template for creating URLs for individual documents on the remote web server. Directly used by remote_url() and indirectly by download_single().

document_url_regex = u'http://example.org/docs/(?P<basefile>\\w+).html'

A regex that matches URLs for individual documents – the reverse of what document_url_template is used for. Used by download() to find suitable links if basefile_regex doesn’t match. Must define the named group basefile using the (?P<basefile>...) syntax.

basefile_regex = u'^ID: ?(?P<basefile>[\\w\\d\\:\\/]+)$'

A regex for matching document names in link text, as used by download(). Must define a named group basefile, just like document_url_regex.
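
Both regexes can be exercised directly with Python’s re module, using the default class attributes:

>>> import re
>>> m = re.match(DocumentRepository.document_url_regex, "http://example.org/docs/123.html")
>>> m.group("basefile") == '123'
True
>>> m = re.match(DocumentRepository.basefile_regex, "ID: 123/a")
>>> m.group("basefile") == '123/a'
True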

rdf_type = rdflib.term.URIRef(u'http://xmlns.com/foaf/0.1/Document')

The RDF type of the documents you are handling (expressed as a rdflib.term.URIRef object).

source_encoding = u'utf-8'

The character set that the source HTML documents use (if applicable).

lang = u'en'

The language which the source documents are assumed to be written in (unless otherwise specified), and the language which the output documents should use.

parse_content_selector = u'body'

CSS selector used to select the main part of the document content by the default parse() implementation.

parse_filter_selectors = [u'script']

CSS selectors used to filter/remove certain parts of the document content by the default parse() implementation.

xslt_template = u'res/xsl/generic.xsl'

A template used by generate() to transform the XML file into browser-ready HTML. If your document type is complex, you might want to override this (and write your own XSLT transform). You should include base.xslt in that template, though.

sparql_annotations = u'res/sparql/annotations.rq'

A template SPARQL CONSTRUCT query for document annotations.

documentstore_class

alias of DocumentStore

config

The current configuration for this docrepo instance, as an object whose attributes correspond to configuration parameters (cf. get_default_options()).
get_default_options()

Returns the class’ default configuration properties. These can be overridden by a configuration file, or by named arguments to __init__(). See Configuration for a list of standard configuration properties (your subclass is free to define and use additional configuration properties).

Returns:default configuration properties
Return type:dict
classmethod setup(action, config)

Runs before any of the *_all methods starts executing. It just calls the appropriate setup method, i.e. if action is parse, then this method calls parse_all_setup (if defined) with the config object as the single parameter.

classmethod teardown(action, config)

Runs after any of the *_all methods has finished executing. It just calls the appropriate teardown method, i.e. if action is parse, then this method calls parse_all_teardown (if defined) with the config object as the single parameter.

get_archive_version(basefile)

Get a version identifier for the current version of the document identified by basefile.

The default implementation simply increments the most recent archived version identifier, starting at “1”. If versions in your docrepo are normally identified in some other way (such as SCM revision numbers, dates or similar) you should override this method to return those identifiers.

Parameters:basefile (str) – The basefile of the document to archive
Returns:The version identifier for the current version of the document.
Return type:str
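
A sketch of a date-based scheme (hypothetical; assumes at most one archived version per day):

>>> import datetime
>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     def get_archive_version(self, basefile):
...         # identify versions by download date instead of a counter
...         return str(datetime.date.today())
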
qualified_class_name()

The qualified class name of this class

Returns:class name (e.g. ferenda.DocumentRepository)
Return type:str
canonical_uri(basefile)

The canonical URI for the document identified by basefile.

Returns:The canonical URI
Return type:str
dataset_uri(param=None, value=None)

Returns the URI that identifies the dataset that this docrepository provides. The default implementation is based on the url config parameter and the alias attribute of the class, e.g. http://localhost:8000/dataset/base.

Parameters:
  • param – An optional parameter name representing a way of creating a subset of the dataset (e.g. all documents whose title starts with a particular letter)
  • value – A value for param (e.g. “a”)
>>> d = DocumentRepository()
>>> d.alias == 'base'
True
>>> d.config.url = "http://example.org/"
>>> d.dataset_uri() == 'http://example.org/dataset/base'
True
>>> d.dataset_uri("title","a") == 'http://example.org/dataset/base?title=a'
True
basefile_from_uri(uri)

The reverse of canonical_uri(). Returns None if the uri doesn’t map to a basefile in this repo.

>>> d = DocumentRepository()
>>> d.alias == "base"
True
>>> d.config.url = "http://example.org/"
>>> d.basefile_from_uri("http://example.org/res/base/123/a") == "123/a"
True
>>> d.basefile_from_uri("http://example.org/res/base/123/a#S1") == "123/a"
True
>>> d.basefile_from_uri("http://example.org/res/other/123/a") # None
dataset_params_from_uri(uri)

Given a parametrized dataset URI, return the parameter and value used (or an empty tuple, if it is a dataset URI handled by this repo, but without any parameters).

>>> d = DocumentRepository()
>>> d.alias == 'base'
True
>>> d.config.url = "http://example.org/"
>>> d.dataset_params_from_uri("http://example.org/dataset/base?title=a") == ('title', 'a')
True
>>> d.dataset_params_from_uri("http://example.org/dataset/base") == ()
True
download(*args, **kwargs)

Downloads all documents from a remote web service.

The default generic implementation assumes that all documents are linked from a single page (which has the url of start_url), that they all have URLs matching the document_url_regex or that the link text is always equal to basefile (as determined by basefile_regex). If these assumptions don’t hold, you need to override this method.

If you do override it, your download method should read and set the lastdownload parameter to either the datetime of the last download or any other module-specific string (id number or similar).

You should also read the refresh parameter. If it is True (the default), then you should call download_single() for every basefile you encounter, even though they may already exist in some form on disk. download_single() will normally be using conditional GET to see if there is a newer version available.

See Writing your own download implementation for more details.
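
As an illustration, a minimal custom download() might look like the following sketch. It makes the same assumptions as the default implementation (every document is linked from start_url with a URL matching document_url_regex); the requests usage and the URL are illustrative only:

>>> import re
>>> import requests
>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     start_url = "http://example.org/doclist.html"
...     def download(self, basefile=None):
...         # fetch the single page that links to all documents
...         resp = requests.get(self.start_url)
...         updated = False
...         for m in re.finditer(self.document_url_regex, resp.text):
...             # download_single() uses conditional GET, so unchanged
...             # documents are skipped cheaply
...             if self.download_single(m.group("basefile"), m.group(0)):
...                 updated = True
...         return updated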

Returns:True if any document was downloaded, False otherwise.
Return type:bool
download_get_basefiles(params)

Given source (an iterator that provides (element, attribute, link, pos) tuples, like lxml.etree.iterlinks()), generate tuples (basefile, link) for all document links found in source.

download_single(basefile, url=None)

Downloads the document from the web (unless explicitly specified, the URL to download is determined by document_url_template combined with basefile, the location on disk is determined by the function downloaded_path()).

If the document exists on disk, but the version on the web is unchanged (determined using a conditional GET), the file on disk is left unchanged (i.e. the timestamp is not modified).

Parameters:
  • basefile (str) – The basefile of the document to download
  • url (str) – The URL to download (optional)
Returns:

True if the document was downloaded and stored on disk, False if the file on disk was not updated.

download_if_needed(url, basefile, archive=True, filename=None, sleep=1)

Downloads a remote resource to a local file. If a different version is already in place, archive that old version.

Parameters:
  • url (str) – The url to download
  • basefile (str) – The basefile of the document to download
  • archive (bool) – Whether to archive existing older versions of the document, or just delete the previously downloaded file.
  • filename (str) – The filename to download to. If not provided, the filename is derived from the supplied basefile
Returns:

True if the local file was updated (and archived), False otherwise.

Return type:

bool
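
One common use is fetching extra resources from an overridden download_single(). A sketch (the companion-PDF URL scheme is hypothetical, and the attachment keyword to downloaded_path() assumes per-document directories via storage_policy = "dir"):

>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     storage_policy = "dir"
...     def download_single(self, basefile, url=None):
...         updated = super(MyRepo, self).download_single(basefile, url)
...         # also fetch a companion PDF next to the main HTML document
...         pdfurl = "http://example.org/docs/%s.pdf" % basefile
...         pdfpath = self.store.downloaded_path(basefile, attachment="doc.pdf")
...         return self.download_if_needed(pdfurl, basefile, filename=pdfpath) or updated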

download_is_different(existing, new)

Returns True if the new file is semantically different from the existing file.

remote_url(basefile)

Get the URL of the source document at its remote location, unless the source document is fetched by other means or if it cannot be computed from basefile only. The default implementation uses document_url_template to calculate the url.

Example:

>>> d = DocumentRepository()
>>> d.remote_url("123/a") == 'http://example.org/docs/123/a.html'
True
>>> d.document_url_template = "http://mysite.org/archive/%(basefile)s/"
>>> d.remote_url("123/a") == 'http://mysite.org/archive/123/a/'
True
Parameters:basefile (str) – The basefile of the source document
Returns:The remote url where the document can be fetched, or None.
Return type:str
generic_url(basefile, maindir, suffix)

Analogous to ferenda.DocumentStore.path(), calculate the full local url for the given basefile and stage of processing.

Parameters:
  • basefile (str) – The basefile for which to calculate the local url
  • maindir (str) – The processing stage directory (normally downloaded, parsed, or generated)
  • suffix (str) – The file extension including period (i.e. .txt, not txt)
Returns:

The local url

Return type:

str

downloaded_url(basefile)

Get the full local url for the downloaded file for the given basefile.

Parameters:basefile (str) – The basefile for which to calculate the local url
Returns:The local url
Return type:str
>>> d = DocumentRepository()
>>> d.downloaded_url("123/a") == "http://localhost:8000/base/downloaded/123/a.html"
True
classmethod parse_all_setup(config)

Runs any action needed prior to parsing all documents in a docrepo. The default implementation does nothing.

Note

This is a classmethod for now (and that’s why a config object is passed as an argument), but might change to an instance method.

classmethod parse_all_teardown(config)

Runs any cleanup action needed after parsing all documents in a docrepo. The default implementation does nothing.

Note

Like parse_all_setup() this might change to an instance method.

parseneeded(basefile)

Returns True iff there is a need to parse the given basefile. If the resulting parsed file exists and is newer than the downloaded file, there is typically no reason to parse the file.

parse(basefile)

Parse downloaded documents into structured XML and RDF.

It will also save the same RDF statements in a separate RDF/XML file.

You will need to provide your own parsing logic, but often it’s easier to just override parse_metadata_from_soup() and parse_document_from_soup() (assuming your input data is in an HTML format parseable by BeautifulSoup) and let the base class read and write the files.

If your data is not in an HTML format, or BeautifulSoup is not an appropriate parser to use, override this method.

Parameters:doc (ferenda.Document) – The document object to fill in.
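
For instance, a subclass might extend the default metadata extraction like this sketch (the <h1> markup is a hypothetical example):

>>> from rdflib import Namespace, Literal, URIRef
>>> DCT = Namespace("http://purl.org/dc/terms/")
>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     def parse_metadata_from_soup(self, soup, doc):
...         # let the base class set rdf:type, dct:identifier etc.
...         super(MyRepo, self).parse_metadata_from_soup(soup, doc)
...         # then take the document title from the first <h1>
...         heading = soup.find("h1")
...         if heading:
...             doc.meta.add((URIRef(doc.uri), DCT.title,
...                           Literal(heading.get_text(), lang=doc.lang)))
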
soup_from_basefile(basefile, encoding=u'utf-8', parser=u'lxml')

Load the downloaded document for basefile into a BeautifulSoup object

Parameters:
  • basefile (str) – The basefile for the downloaded document to parse
  • encoding (str) – The encoding of the downloaded document
  • parser (str) – Which BeautifulSoup parser to use (e.g. “lxml”)
Returns:

The parsed document as a BeautifulSoup object

Note

Helper function. You probably don’t need to override it.

parse_metadata_from_soup(soup, doc)

Given a BeautifulSoup document, retrieve all document-level metadata from it and put it into the given doc object’s meta property.

Note

The default implementation sets rdf:type, dct:title, dct:identifier and prov:wasGeneratedBy properties in doc.meta, as well as setting the language of the document in doc.lang.

Parameters:
  • soup – A parsed document, as BeautifulSoup object
  • doc (ferenda.Document) – Our document
Returns:

None

parse_document_from_soup(soup, doc)

Given a BeautifulSoup document, convert it into suitable ferenda.elements objects and store them in the given doc object’s body property.

Note

The default implementation respects parse_content_selector and parse_filter_selectors.

Parameters:
  • soup – A parsed document as a BeautifulSoup object
  • doc (ferenda.Document) – Our document
Returns:

None

patch_if_needed(basefile, text)

Given basefile and the entire text of the downloaded or intermediate document, find whether there exists a patch file under self.config.patchdir, and if so, apply it. Returns (patchedtext, patchdescription) if a patch was applied, (text, None) otherwise.

make_document(basefile=None)

Create a Document object with basic fields initialized.

Note

Helper method used by the makedocument() decorator.

Parameters:basefile (str) – The basefile for the document
Return type:ferenda.Document
make_graph()

Initialize an rdflib Graph object with proper namespace prefix bindings (as determined by namespaces)

Return type:rdflib.Graph
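
For instance (assuming the default namespaces list, which includes the well-known dct prefix):

>>> import rdflib
>>> d = DocumentRepository()
>>> g = d.make_graph()
>>> dict(g.namespaces())['dct'] == rdflib.URIRef('http://purl.org/dc/terms/')
True
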
create_external_resources(doc)

Optionally create external files that go together with the parsed file (stylesheets, images, etc).

The default implementation does nothing.

Parameters:doc (ferenda.Document) – The document
render_xhtml(doc, outfile=None)

Renders the parsed object structure as an XHTML file with RDFa attributes (and also returns the same XHTML as a string).

Parameters:
  • doc (ferenda.Document) – The document to render
  • outfile (str) – The file name for the XHTML document
Returns:

The XHTML document

Return type:

str

render_xhtml_tree(doc)
parsed_url(basefile)

Get the full local url for the parsed file for the given basefile.

Parameters:basefile (str) – The basefile for which to calculate the local url
Returns:The local url
Return type:str
distilled_url(basefile)

Get the full local url for the distilled RDF/XML file for the given basefile.

Parameters:basefile (str) – The basefile for which to calculate the local url
Returns:The local url
Return type:str
classmethod relate_all_setup(config)

Runs any cleanup action needed prior to relating all documents in a docrepo. The default implementation clears the corresponding context (see dataset_uri()) in the triple store.

Note

Like parse_all_setup() this might change to an instance method.

Returns False if no relating needs to be done (as determined by the timestamp of the dumped NTriples file)

classmethod relate_all_teardown(config)

Runs any cleanup action needed after relating all documents in a docrepo. The default implementation does nothing.

Note

Like parse_all_setup() this might change to an instance method.

relate(basefile, otherrepos=[])

Runs various indexing operations for the document represented by basefile: insert RDF statements into a triple store, add this document to the dependency list of all documents that it refers to, and put the text of the document into a fulltext index.

relate_triples(basefile)

Insert the (previously distilled) RDF statements into the triple store.

Parameters:basefile (str) – The basefile for the document containing the RDF statements.
Returns:None
relate_dependencies(basefile, repos=[])

For each document that the basefile document refers to, attempt to find this document in the current or any other docrepo, and add the parsed document path to that document’s dependency file.

add_dependency(basefile, dependencyfile)

Add the dependencyfile to basefile’s dependency file. Returns True if anything new was added, False otherwise.

relate_fulltext(basefile)

Index the text of the document into the fulltext index.

Parameters:basefile (str) – The basefile for the document to be indexed.
Returns:None
classmethod generate_all_setup(config)

Runs any action needed prior to generating all documents in a docrepo. The default implementation does nothing.

Note

Like parse_all_setup() this might change to an instance method.

classmethod generate_all_teardown(config)

Runs any cleanup action needed after generating all documents in a docrepo. The default implementation does nothing.

Note

Like parse_all_setup() this might change to an instance method.

generate(basefile, otherrepos=[])

Generate a browser-ready HTML file from structured XML and RDF.

Uses the XML and RDF files constructed by ferenda.DocumentRepository.parse().

The generation is done by XSLT, and normally you won’t need to override this, but you might want to provide your own xslt file and set ferenda.DocumentRepository.xslt_template to the name of that file.

If you want to generate your browser-ready HTML by any other means than XSLT, you should override this method.

Parameters:basefile (str) – The basefile for which to generate HTML
Returns:None
get_url_transform_func(repos, basedir)
prep_annotation_file(basefile)

Helper function used by generate() – prepares an RDF/XML file containing statements that in some way annotate the information found in the document that generate() handles, like the URI/title of other documents that refer to this one.

Parameters:basefile (str) – The basefile for which to collect annotating statements.
Returns:The full path to the prepared RDF/XML file
Return type:str
construct_annotations(uri)

Construct an RDF graph containing metadata relating to uri in some way, using the query template specified by sparql_annotations.

construct_sparql_query(uri)
graph_to_annotation_file(graph)

Converts an RDFLib graph into an XML file with the same statements, ordered using the Grit format (https://code.google.com/p/oort/wiki/Grit) for easier XSLT inclusion.

Parameters:graph (rdflib.graph.Graph) – The graph to convert
Returns:A serialized XML document with the RDF statements
Return type:str
annotation_file_to_graph(annotation_file)

Converts an annotation file (using the Grit format) back into an RDFLib graph.

Parameters:annotation_file (str) – The filename of a serialized XML document with RDF statements
Returns:The RDF statements as a regular graph
Return type:rdflib.Graph
generated_url(basefile)

Get the full local url for the generated file for the given basefile.

Parameters:basefile (str) – The basefile for which to calculate the local url
Returns:The local url
Return type:str
toc(otherrepos=[])

Creates a set of pages that together act as a table of contents for all documents in the repository. For smaller repositories a single page might be enough, but for repositories with a few hundred documents or more, there will usually be one page for all documents starting with A, one for those starting with B, and so on. There might be different ways of browsing/drilling down, e.g. by title, publication year, keyword and so on.

The default implementation calls toc_select() to get all data from the triple store, toc_criteria() to find out the criteria for ordering, toc_pagesets() to calculate the total set of TOC html files, toc_select_for_pages() to create a list of documents for each TOC html file, and finally toc_generate_pages() to create the HTML files. The default implementation assumes that documents have a title (in the form of a dct:title property) and a publication date (in the form of a dct:issued property).

You can override any of these methods to customize any part of the toc generation process. Often overriding toc_criteria() to specify other document properties will be sufficient.

toc_select(context=None)

Select all data from the triple store needed to make up all TOC pages.

Parameters:context (str) – The context (named graph) to restrict the query to. If None, search entire triplestore.
Returns:The results of the query, as python objects
Return type:set of dicts
toc_query(context=None)

Constructs a SPARQL SELECT query that fetches all information needed to construct the complete set of TOC pages in the form of a single list of result rows.

Override this method if you need to customize the query.

Parameters:context (str) – The context (named graph) to which to limit the query. If None, query the entire triplestore.
Returns:The SPARQL query
Return type:str

Example:

>>> d = DocumentRepository()
>>> expected = 'PREFIX bibo: <http://purl.org/ontology/bibo/> PREFIX dct: <http://purl.org/dc/terms/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX prov: <http://www.w3.org/ns/prov-o/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX xhv: <http://www.w3.org/1999/xhtml/vocab#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX xsi: <http://www.w3.org/2001/XMLSchema-instance> SELECT DISTINCT ?uri ?title ?issued FROM <http://example.org/ctx/base> WHERE {?uri rdf:type foaf:Document ; dct:title ?title . OPTIONAL { ?uri dct:issued ?issued . }  }'
>>> d.toc_query("http://example.org/ctx/base") == expected
True
toc_criteria(predicates=None)

Create the criteria used to organize the documents in the repository into different pagesets.

Parameters:predicates (list) – The URIRef terms to use as base for criteria
Returns:TocCriteria objects, each representing a particular way of organizing the documents, and each corresponding to a TocPageset object (constructed by toc_pagesets())
Return type:list
toc_predicates()

Return a list of predicates (as URIRef objects), each of which should be used to organize a table of contents of documents in this docrepo.

Used by toc_criteria(); the predicates must match the results from the SPARQL query constructed by toc_query().
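
A minimal override organizing the TOC by title only might look like this sketch:

>>> from rdflib import Namespace
>>> DCT = Namespace("http://purl.org/dc/terms/")
>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     def toc_predicates(self):
...         # only offer browsing by title
...         return [DCT.title]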

toc_pagesets(data, criteria)

Calculate the set of needed TOC pages based on the result rows

Parameters:
  • data – list of dicts, each dict containing metadata about a single document
  • criteria – list of TocCriteria objects
Returns:

The link text, page title and base file for each needed TOC page, structured by selection criteria.

Return type:

3-dimensional named tuple

Example:

>>> d = DocumentRepository()
>>> rows = [{'uri':'http://ex.org/1','title':'Abc','issued':'2009-04-02'},
...         {'uri':'http://ex.org/2','title':'Abcd','issued':'2010-06-30'},
...         {'uri':'http://ex.org/3','title':'Dfg','issued':'2010-08-01'}]
>>> from operator import itemgetter
>>> criteria = (TocCriteria(binding='title',
...                         label='By title',
...                         pagetitle='Documents starting with "%(select)s"',
...                         selector=lambda x: x['title'][0].lower(),
...                         key=itemgetter('title')),
...             TocCriteria(binding='issued',
...                         label='By publication year',
...                         pagetitle='Documents published in %(select)s',
...                         selector=lambda x: x['issued'][:4],
...                         key=itemgetter('issued')))
>>> # Note: you can get a suitable tuple of TocCriteria
>>> # objects by calling toc_criteria() as well
>>> pagesets=d.toc_pagesets(rows,criteria)
>>> pagesets[0].label == 'By title'
True
>>> pagesets[0].pages[0] == TocPage(linktext='a', title='Documents starting with "a"', binding='title', value='a')
True
>>> pagesets[0].pages[0].linktext == 'a'
True
>>> pagesets[0].pages[0].title == 'Documents starting with "a"'
True
>>> pagesets[0].pages[0].binding == 'title'
True
>>> pagesets[0].pages[0].value == 'a'
True
>>> pagesets[1].label == 'By publication year'
True
>>> pagesets[1].pages[0] == TocPage(linktext='2009', title='Documents published in 2009', binding='issued', value='2009')
True
toc_select_for_pages(data, pagesets, criteria)

Go through all data rows (each row representing a document) and, for each toc page, select those documents that are to appear in a particular page.

Example:

>>> d = DocumentRepository()
>>> rows = [{'uri':'http://ex.org/1','title':'Abc','issued':'2009-04-02'},
...         {'uri':'http://ex.org/2','title':'Abcd','issued':'2010-06-30'},
...         {'uri':'http://ex.org/3','title':'Dfg','issued':'2010-08-01'}]
>>> from rdflib import Namespace
>>> dct = Namespace("http://purl.org/dc/terms/")
>>> criteria = d.toc_criteria([dct.title,dct.issued])
>>> pagesets=d.toc_pagesets(rows,criteria)
>>> expected={('title','a'):[[Link('Abc',uri='http://ex.org/1')],
...                          [Link('Abcd',uri='http://ex.org/2')]],
...           ('title','d'):[[Link('Dfg',uri='http://ex.org/3')]],
...           ('issued','2009'):[[Link('Abc',uri='http://ex.org/1')]],
...           ('issued','2010'):[[Link('Abcd',uri='http://ex.org/2')],
...                              [Link('Dfg',uri='http://ex.org/3')]]}
>>> d.toc_select_for_pages(rows, pagesets, criteria) == expected
True
Parameters:
  • data – List of dicts as returned by toc_select()
  • pagesets – Result from toc_pagesets()
  • criteria – Result from toc_criteria()
Returns:

mapping between toc basefile and documentlist for that basefile

Return type:

dict

toc_item(binding, row)

Returns a formatted version of row, using Element objects

toc_generate_pages(pagecontent, pagesets, otherrepos=[])

Creates a set of TOC pages by calling toc_generate_page().

Parameters:
  • pagecontent – Result from toc_select_for_pages()
  • pagesets – Result from toc_pagesets()
  • otherrepos – A list of document repository instances
toc_generate_first_page(pagecontent, pagesets, otherrepos=[])

Generate the main page of TOC pages.

toc_generate_page(binding, value, documentlist, pagesets, effective_basefile=None, otherrepos=[])

Generate a single TOC page.

Parameters:
  • binding – The binding used (e.g. ‘title’ or ‘issued’)
  • value – The value for the used binding (e.g. ‘a’ or ‘2013’)
  • documentlist – Result from toc_select_for_pages()
  • pagesets – Result from toc_pagesets()
  • effective_basefile – Place the resulting page somewhere else than toc/*binding*/*value*.html
  • otherrepos – A list of document repository instances
news(otherrepos=[])

Create a set of Atom feeds and corresponding HTML pages for new/updated documents in different categories in the repository.

news_criteria()

Returns a list of NewsCriteria objects.

news_entries()

Return a generator of all available entries, represented as tuples of (DocumentEntry, rdflib.Graph) objects. The Graph contains all distilled metadata about the document.

news_write_atom(entries, title, basefile, archivesize=1000)

Given a list of Atom entry-like objects, including links to RDF and PDF files (if applicable), create a rinfo-compatible Atom feed, optionally splitting into archives.

frontpage_content(primary=False)

If the module wants to provide any particular content on the frontpage, it can do so by returning an XHTML fragment (in text form) here. If primary is true, the caller wants the module to take primary responsibility for the frontpage content. If primary is false, the caller only expects a smaller amount of content (like a short presentation of the repository and the documents it contains).
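
A sketch of a minimal implementation (the markup is illustrative only):

>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     def frontpage_content(self, primary=False):
...         return ("<p>This repository contains documents. "
...                 "<a href='%s'>Browse all of them</a>.</p>"
...                 % self.dataset_uri())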

status(basefile=None, samplesize=3)

Prints out some basic status information about this repository.

get_status()

Returns basic data about the state of this repository, used by status().

tabs()

Get the navigation menu segment(s) provided by this docrepo

Returns a list of tuples, where each tuple will be rendered as a tab in the main UI. First element of the tuple is the link text, and the second is the link destination. Normally, a module will only return a single tab.

Returns:List of tuples
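
For example, a single tab linking to the repository’s dataset page (a sketch using dataset_uri()):

>>> class MyRepo(DocumentRepository):
...     alias = "myrepo"
...     def tabs(self):
...         return [("My documents", self.dataset_uri())]
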
footer()

Get a list of resources provided by this repo for publication in the site footer.

Works like tabs(), but normally returns an empty list. The repo ferenda.sources.general.Static is an exception.

http_handle(environ)

Used by the WSGI support to indicate if this repo can provide a response to a particular request. If so, returns a tuple (fp, length, memtype), where fp is an open file object containing the document to be returned.