The CompositeRepository class

class ferenda.CompositeRepository(config=None, **kwargs)

Acts as a proxy for a list of sub-repositories.

Calls the download() method of each included subrepo. Parse calls each subrepo's parse() method in order until one succeeds, unless config.failfast is True; in that case, any error from the first subrepo is re-raised.
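
The fallback behaviour described above can be sketched without ferenda itself. This is a minimal illustration and not the library's implementation: FailingRepo, WorkingRepo and composite_parse are hypothetical names, and the handling of failfast is one reading of the paragraph above (the first error is re-raised instead of moving on to the next subrepo).

```python
class FailingRepo:
    """Hypothetical subrepo whose parse() always raises."""

    def parse(self, basefile):
        raise ValueError("cannot parse %s" % basefile)


class WorkingRepo:
    """Hypothetical subrepo whose parse() always succeeds."""

    def parse(self, basefile):
        return True


def composite_parse(subrepos, basefile, failfast=False):
    """Try each subrepo in order; return the one that succeeded, or None."""
    for repo in subrepos:
        try:
            if repo.parse(basefile):
                return repo
        except Exception:
            if failfast:
                # Surface the error instead of trying the next subrepo.
                raise
    return None


repos = [FailingRepo(), WorkingRepo()]
winner = composite_parse(repos, "123/a")
```

With failfast left at False, the error from the first subrepo is swallowed and the second one gets its turn; with failfast=True the ValueError propagates to the caller.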


alias of CompositeStore

extrabases = ()

List of mixin classes to add to each subrepo class.

supress_subrepo_logging = True
subrepos = ()

List of repository classes to use.

classmethod get_default_options()

Returns the class's default configuration properties. These can be overridden by a configuration file, or by named arguments to __init__(). See Configuration for a list of standard configuration properties (your subclass is free to define and use additional configuration properties).

Returns: default configuration properties
Return type: dict
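
The idea of defaults that a configuration file or keyword arguments can override may be sketched with the standard library alone. This is not how ferenda's LayeredConfig is implemented; effective_options and the option names are hypothetical, and the precedence between the file and the keyword arguments (keyword arguments win) is an assumption, since the text above does not state it.

```python
from collections import ChainMap


def effective_options(defaults, config_file=None, **kwargs):
    """Merge option layers; earlier layers in the ChainMap win lookups."""
    # Assumed precedence: keyword arguments, then file values, then defaults.
    return dict(ChainMap(kwargs, config_file or {}, defaults))


defaults = {"loglevel": "INFO", "force": False}  # hypothetical defaults
from_file = {"loglevel": "DEBUG"}                # hypothetical file contents
options = effective_options(defaults, from_file, force=True)
```

Here loglevel comes from the file layer and force from the keyword argument, while any key set in neither place would fall back to the defaults.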

The LayeredConfig object that contains the current configuration for this docrepo instance. You can read or write individual properties of this object, or replace it with a new LayeredConfig object entirely.


Downloads all documents from a remote web service.

The default generic implementation assumes that all documents are linked from a single page (which has the url of start_url), that they all have URLs matching the document_url_regex or that the link text is always equal to basefile (as determined by basefile_regex). If these assumptions don’t hold, you need to override this method.
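
The two matching rules in the paragraph above (a link qualifies if its URL matches one pattern, or if its text matches another) can be illustrated as follows. The patterns, the example.org URLs and the helper basefiles_from_links are hypothetical; the sketch also assumes that each pattern exposes the basefile as a named group.

```python
import re

# Hypothetical patterns; each exposes the basefile as a named group.
document_url_regex = re.compile(
    r"https://example\.org/docs/(?P<basefile>\d+)\.html$")
basefile_regex = re.compile(r"^(?P<basefile>\d{4}:\d+)$")


def basefiles_from_links(links):
    """Yield (basefile, url) for each (url, link_text) pair that matches."""
    for url, text in links:
        # Try the URL pattern first, then fall back to the link text.
        match = document_url_regex.match(url) or basefile_regex.match(text)
        if match:
            yield match.group("basefile"), url


links = [
    ("https://example.org/docs/42.html", "Some title"),
    ("https://example.org/view?id=7", "2012:17"),
    ("https://example.org/about", "About us"),
]
found = list(basefiles_from_links(links))
```

The first link is picked up through its URL, the second through its link text, and the third is ignored because neither pattern matches.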

If you do override it, your download method should read and set the lastdownload parameter to either the datetime of the last download or any other module-specific string (id number or similar).

You should also read the refresh parameter. If it is True (the default), then you should call download_single() for every basefile you encounter, even though they may already exist in some form on disk. download_single() will normally be using conditional GET to see if there is a newer version available.
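
A rough sketch of how an overriding download method might honour refresh is given below. The names download_all, on_disk and fake_download_single are hypothetical, and two things are assumptions rather than documented behaviour: that download_single() returns a true value when it stored new content, and that basefiles already on disk are skipped when refresh is False.

```python
def download_all(basefiles, on_disk, download_single, refresh=True):
    """Return True if download_single() reported an update for any basefile."""
    updated = False
    for basefile in basefiles:
        # With refresh=True every basefile is attempted, even known ones.
        if refresh or basefile not in on_disk:
            updated = bool(download_single(basefile)) or updated
    return updated


calls = []


def fake_download_single(basefile):
    """Stand-in for a real download; pretends only 'b' has new content."""
    calls.append(basefile)
    return basefile == "b"


result = download_all(["a", "b"], on_disk={"a"},
                      download_single=fake_download_single, refresh=False)
```

With refresh=False only "b" is fetched, because "a" is already on disk; with refresh=True both would be attempted and a conditional GET inside download_single() would decide whether anything actually changed.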

See Writing your own download implementation for more details.

Returns: True if any document was downloaded, False otherwise.
Return type: bool

Parse downloaded documents into structured XML and RDF.

It will also save the same RDF statements in a separate RDF/XML file.

You will need to provide your own parsing logic, but often it’s easier to just override parse_{metadata, document}_from_soup (assuming your input data is in an HTML format parseable by BeautifulSoup) and let the base class read and write the files.

If your data is not in an HTML format, or BeautifulSoup is not an appropriate parser to use, override this method.

Parameters: doc (ferenda.Document) – The document object to fill in.

copy_parsed(basefile, instance)

class ferenda.CompositeStore(datadir, storage_policy='file', compression=None, docrepo_instances=None)

Custom store for CompositeRepository objects.

list_basefiles_for(action, basedir=None, force=True)

Get all available basefiles that can be used for the specified action.

Parameters:
  • action (str) – The action for which to get available basefiles (parse, relate, generate or news)
  • basedir (str) – The base directory in which to search for available files. If not provided, defaults to self.datadir.

Returns: All available basefiles

Return type:
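
One plausible way for a store that fronts several sub-repositories to list basefiles is to merge the listings of each underlying store and drop duplicates. The sketch below shows that idea only; it is not taken from ferenda's source, and FakeStore and merged_basefiles are hypothetical names.

```python
class FakeStore:
    """Hypothetical stand-in for a sub-repository's document store."""

    def __init__(self, basefiles):
        self.basefiles = basefiles

    def list_basefiles_for(self, action, basedir=None, force=True):
        return iter(self.basefiles)


def merged_basefiles(stores, action):
    """Yield each basefile once, in the order the stores report them."""
    seen = set()
    for store in stores:
        for basefile in store.list_basefiles_for(action):
            if basefile not in seen:
                seen.add(basefile)
                yield basefile


stores = [FakeStore(["1", "2"]), FakeStore(["2", "3"])]
available = list(merged_basefiles(stores, "parse"))
```

Basefile "2" is known to both stores but is reported only once.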