The CompositeRepository class
-
class ferenda.CompositeRepository(config=None, **kwargs)
Acts as a proxy for a list of sub-repositories.
Calls the download() method of each included subrepo. parse() calls each subrepo's parse() method in order until one succeeds, unless config.failfast is True; in that case, any error from the first subrepo is re-raised.
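The fallback behaviour described above can be sketched as follows. This is a minimal, self-contained illustration and not ferenda's actual implementation: the classes `FirstRepo` and `SecondRepo`, the `ParseError` exception, and the function `composite_parse` are hypothetical names, and the sketch simplifies failfast to "re-raise the first error encountered".

```python
import logging

log = logging.getLogger("composite-sketch")


class ParseError(Exception):
    """Hypothetical error raised by a subrepo that cannot parse a basefile."""


class FirstRepo:
    """Hypothetical subrepo whose parse() always fails."""

    def parse(self, basefile):
        raise ParseError("FirstRepo cannot handle %s" % basefile)


class SecondRepo:
    """Hypothetical subrepo whose parse() always succeeds."""

    def parse(self, basefile):
        return True


def composite_parse(subrepos, basefile, failfast=False):
    """Try each subrepo's parse() in order and stop at the first success.

    Returns the class name of the subrepo that succeeded, or None if
    none did. With failfast=True, the first error is re-raised instead
    of falling through to the next subrepo (an assumed simplification).
    """
    for repo in subrepos:
        try:
            if repo.parse(basefile):
                return type(repo).__name__
        except ParseError as exc:
            if failfast:
                raise
            log.debug("%s failed: %s", type(repo).__name__, exc)
    return None


winner = composite_parse([FirstRepo(), SecondRepo()], "123")
```

Ordering the subrepos from most to least preferred source means the composite only falls back to a lower-quality source when the better one fails.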
-
documentstore_class
alias of CompositeStore
-
extrabases = ()
List of mixin classes to add to each subrepo class.
-
supress_subrepo_logging = True
-
subrepos = ()
List of repository classes to use.
-
classmethod get_default_options()
Returns the class's default configuration properties. These can be overridden by a configuration file, or by named arguments to __init__(). See Configuration for a list of standard configuration properties (your subclass is free to define and use additional configuration properties).
Returns: default configuration properties
Return type: dict
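The override order described above (named arguments over configuration file over class defaults) can be illustrated with the standard library's `ChainMap`. This is only a sketch of the layering idea and does not use LayeredConfig; the function `effective_config`, the `datadir` option, and the default values shown are assumptions made for the example.

```python
from collections import ChainMap


def get_default_options():
    """Return hypothetical class-level defaults as a plain dict."""
    return {"datadir": "data", "refresh": True, "failfast": False}


def effective_config(file_options, **kwargs):
    """Layer the option sources so that the leftmost mapping wins.

    Lookup order: named arguments, then options read from a
    configuration file, then the class defaults.
    """
    return ChainMap(kwargs, file_options, get_default_options())


# The file enables failfast; the caller disables refresh by keyword.
cfg = effective_config({"failfast": True}, refresh=False)
```

Because the layers are kept separate rather than merged, a value that is not set in a higher layer still resolves to the class default.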
-
config
The LayeredConfig object that contains the current configuration for this docrepo instance. You can read or write individual properties of this object, or replace it with a new LayeredConfig object entirely.
-
download(basefile=None)
Downloads all documents from a remote web service.
The default generic implementation assumes that all documents are linked from a single page (which has the URL of start_url), and that they all have URLs matching the document_url_regex or that the link text is always equal to basefile (as determined by basefile_regex). If these assumptions don't hold, you need to override this method.
If you do override it, your download method should read and set the lastdownload parameter to either the datetime of the last download or any other module-specific string (id number or similar).
You should also read the refresh parameter. If it is True (the default), then you should call download_single() for every basefile you encounter, even though they may already exist in some form on disk. download_single() will normally be using conditional GET to see if there is a newer version available.
See Writing your own download implementation for more details.
Returns: True if any document was downloaded, False otherwise.
Return type: bool
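The link-matching and refresh logic described above might look roughly like the sketch below. It is a standalone illustration, not the library's code: the two regular expressions, the example.org URLs, and the functions `find_basefiles` and `download` (with its `links`, `existing` and `download_single` arguments) are all hypothetical.

```python
import re

# Hypothetical patterns; a real repository would define its own.
DOCUMENT_URL_REGEX = re.compile(
    r"https://example\.org/docs/(?P<basefile>\w+)\.html")
BASEFILE_REGEX = re.compile(r"^(?P<basefile>\d{4}-\d+)$")


def find_basefiles(links):
    """Yield (basefile, url) for each (url, link_text) pair that matches.

    A link qualifies if its URL matches DOCUMENT_URL_REGEX or, failing
    that, if its link text matches BASEFILE_REGEX.
    """
    for url, text in links:
        match = DOCUMENT_URL_REGEX.match(url) or BASEFILE_REGEX.match(text)
        if match:
            yield match.group("basefile"), url


def download(links, existing, download_single, refresh=True):
    """Call download_single() per basefile; return True if any was updated.

    With refresh=True every basefile is visited, even those already on
    disk. With refresh=False, basefiles in `existing` are skipped.
    """
    updated = False
    for basefile, url in find_basefiles(links):
        if not refresh and basefile in existing:
            continue
        if download_single(basefile, url):
            updated = True
    return updated
```

In this sketch `download_single` is expected to return True only when it actually fetched a new or changed document, which is what lets `download` report whether anything was downloaded.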
-
parse(basefile)
Parse downloaded documents into structured XML and RDF.
It will also save the same RDF statements in a separate RDF/XML file.
You will need to provide your own parsing logic, but often it's easier to just override parse_{metadata, document}_from_soup (assuming your input data is in an HTML format parseable by BeautifulSoup) and let the base class read and write the files.
If your data is not in an HTML format, or BeautifulSoup is not an appropriate parser to use, override this method.
Parameters: doc (ferenda.Document) – The document object to fill in.
-
-
class ferenda.CompositeStore(datadir, storage_policy='file', compression=None, docrepo_instances=None)
Custom store for CompositeRepository objects.