The Facet class

class ferenda.Facet(rdftype=rdflib.term.URIRef('http://purl.org/dc/terms/title'), label=None, pagetitle=None, indexingtype=None, selector=None, key=None, identificator=None, toplevel_only=None, use_for_toc=None, use_for_feed=None, selector_descending=None, key_descending=None, multiple_values=None, dimension_type=None, dimension_label=None)

Create a facet from the given rdftype and some optional parameters.

Parameters:
  • rdftype (rdflib.term.URIRef) – The type of facet being created
  • label (str) – A template for the label property of TocPageset objects created from this facet
  • pagetitle (str) – A template for the title property of TocPage objects created from this facet
  • indexingtype (ferenda.fulltext.IndexedType) – Object specifying how to store the data selected by this facet in the fulltext index
  • selector (callable) – A function that takes (row, binding, resource_graph) and returns a string acting as a category of some kind
  • key (callable) – A function that takes (row, binding, resource_graph) and returns a string usable for sorting
  • toplevel_only (bool) – Whether this facet should be applied to documents only, or to any named (i.e. given a URI) fragment of a document.
  • use_for_toc (bool) – Whether this facet should be used for TOC generation
  • use_for_feed (bool) – Whether this facet should be used for newsfeed generation
  • selector_descending (bool) – Whether the values returned by selector should be presented in lexical descending order
  • key_descending (bool) – Whether documents, when sorted through the key function, should be presented in reverse order.
  • multiple_values (bool) – Whether more than one instance of the rdftype value should be processed (such as multiple keywords each specified by one dcterms:subject triple).
  • dimension_type (str) – The general type of this facet – can be "type" (values are rdf:type), "ref" (values are URIs), "year" (values are xsd:dateTime or similar), or "value" (values are string literals).
  • dimension_label (str) – An alternate label for this facet, to be used if the selector logic is more transformative than selectional (i.e. if it transforms dates to True or False values depending on whether they’re April 1st, you might set this to “aprilfirst”)
  • identificator (callable) – A function that takes (row, binding, resource_graph) and returns an identifier-like string usable as an id string or URL segment.
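
The selector and key callables described above share one signature, (row, binding, resource_graph). As a minimal sketch, assuming rows are plain dicts holding ISO 8601 date strings, the following shows how a custom pair might group and order documents. The names aprilfirst and datekey and the sample rows are hypothetical, and the grouping loop only illustrates the idea; it is not the code ferenda itself uses to build a TOC.

```python
from collections import defaultdict


def aprilfirst(row, binding, resource_graph=None):
    """Hypothetical selector: categorize by whether the date is April 1st."""
    # Assumes row[binding] is an ISO 8601 date string such as "2014-04-01".
    # The result is converted to a string, since selectors return strings.
    return str(row[binding][5:10] == "04-01")


def datekey(row, binding, resource_graph=None):
    """Hypothetical key: ISO 8601 dates sort chronologically as strings."""
    return row[binding]


# Made-up sample rows, keyed the same way as the examples in this section.
rows = [
    {"dcterms_title": "B", "dcterms_issued": "2014-04-01"},
    {"dcterms_title": "A", "dcterms_issued": "2013-12-24"},
    {"dcterms_title": "C", "dcterms_issued": "2012-04-01"},
]

# Group rows by selector value, then order each group by key value.
groups = defaultdict(list)
for row in rows:
    groups[aprilfirst(row, "dcterms_issued")].append(row)
for members in groups.values():
    members.sort(key=lambda r: datekey(r, "dcterms_issued"))

titles = {name: [r["dcterms_title"] for r in members]
          for name, members in groups.items()}
print(titles)
```

Given the constructor signature above, such functions would presumably be passed as Facet(rdftype, selector=aprilfirst, key=datekey, dimension_label="aprilfirst").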

If optional parameters aren’t provided, appropriate values are selected if rdftype is one of these common RDF properties:

  • rdf:type – Grouped by qname() of the rdf:type of the document, e.g. foaf:Document. Not used for TOC.
  • dcterms:title – Grouped by first “sortable” letter, e.g. a document titled “The Little Prince” is grouped under “l”. Used as a facet for the API, although its usefulness there is debatable.
  • dcterms:identifier – Also grouped by first sortable letter. When indexing, the resulting fulltext index field has a high boost value, which increases the chances of this document ranking high when one searches for its identifier.
  • dcterms:abstract – Not used for TOC.
  • dc:creator – Should be a free-text (string literal) value.
  • dcterms:publisher – Should be a URIRef.
  • dcterms:references
  • dcterms:issued – Used for grouping documents published/issued in the same year.
  • dc:subject – A document can have multiple dc:subjects, and all are indexed/processed.
  • dcterms:subject – Works like dc:subject, but the value should be a URIRef.
  • schema:free – A boolean value.

This class provides a number of classmethods that can be used as arguments to selector and key, e.g.:

>>> from rdflib import Namespace
>>> from ferenda import Facet
>>> MYVOCAB = Namespace("http://example.org/vocab/")
>>> f = Facet(MYVOCAB.enactmentDate, selector=Facet.year)
>>> f.selector({'myvocab_enactmentDate': '2014-07-06'},
...            'myvocab_enactmentDate')
'2014'

classmethod defaultselector(row, binding, resource_graph=None)

This returns row[binding] without any transformation.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.defaultselector(row, "dcterms_title")
'A Tale of Two Cities'

classmethod defaultidentificator(row, binding, resource_graph=None)

This returns row[binding] run through a simple slug-like transformation.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.defaultidentificator(row, "dcterms_title")
'a-tale-of-two-cities'
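
The “slug-like transformation” is not specified further here. As a rough sketch of what such a transformation could look like, assuming ASCII input, the hypothetical function below lowercases the value and replaces runs of other characters with hyphens. It reproduces the example above but is not ferenda’s actual implementation.

```python
import re


def sketch_slug(row, binding, resource_graph=None):
    """Hypothetical stand-in for a slug-like identificator."""
    # Lowercase, then collapse each run of characters other than a-z and
    # 0-9 into a single hyphen, and trim hyphens from both ends.
    # Assumption: non-ASCII letters are simply replaced as well.
    return re.sub(r"[^a-z0-9]+", "-", row[binding].lower()).strip("-")


row = {"dcterms_title": "A Tale of Two Cities"}
print(sketch_slug(row, "dcterms_title"))
```
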

classmethod year(row, binding='dcterms_issued', resource_graph=None)

This returns the year part of row[binding].

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.year(row, "dcterms_issued")
'1859'

classmethod booleanvalue(row, binding='schema_free', resource_graph=None)

Returns True iff row[binding] == “true”, False otherwise.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.booleanvalue(row, "schema_free")
True

classmethod titlesortkey(row, binding='dcterms_title', resource_graph=None)

Returns a version of row[binding] suitable for sorting. The function title_sortkey() is used for string transformation.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.titlesortkey(row, "dcterms_title")
'ataleoftwocities'

classmethod firstletter(row, binding='dcterms_title', resource_graph=None)

Returns the first letter of row[binding], transformed into a sortable string.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.firstletter(row, "dcterms_title")
'a'
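
The sort-key behaviour can be approximated from the documented examples alone: “A Tale of Two Cities” becomes 'ataleoftwocities', and “The Little Prince” is grouped under “l”. The sketch below is an assumption built only from those examples; the names sketch_sortkey and sketch_firstletter are hypothetical, and the real title_sortkey() may differ in detail.

```python
import re


def sketch_sortkey(title):
    """Hypothetical approximation of a title sort key."""
    s = title.lower()
    # Drop a leading English definite article, as in "The Little Prince".
    if s.startswith("the "):
        s = s[4:]
    # Remove whitespace, punctuation and underscores, keeping letters
    # and digits only.
    return re.sub(r"[\W_]+", "", s)


def sketch_firstletter(row, binding="dcterms_title", resource_graph=None):
    """Hypothetical: first character of the sort key, or '' if it is empty."""
    return sketch_sortkey(row[binding])[:1]


print(sketch_sortkey("A Tale of Two Cities"))
print(sketch_firstletter({"dcterms_title": "The Little Prince"}))
```
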

classmethod resourcelabel(row, binding='dcterms_publisher', resource_graph=None)

Looks up a suitable text label for row[binding] in resource_graph.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> import rdflib
>>> resources = rdflib.Graph().parse(format="turtle", data="""
... @prefix foaf: <http://xmlns.com/foaf/0.1/> .
...
... <http://example.org/chapman_hall> a foaf:Organization;
...     foaf:name "Chapman & Hall" .
...
... """)
>>> Facet.resourcelabel(row, "dcterms_publisher", resources)
'Chapman & Hall'

classmethod sortresource(row, binding='dcterms_publisher', resource_graph=None)

Returns a sortable version of the resource label for row[binding].

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> import rdflib
>>> resources = rdflib.Graph().parse(format="turtle", data="""
... @prefix foaf: <http://xmlns.com/foaf/0.1/> .
...
... <http://example.org/chapman_hall> a foaf:Organization;
...     foaf:name "Chapman & Hall" .
...
... """)
>>> Facet.sortresource(row, "dcterms_publisher", resources)
'chapmanhall'

classmethod term(row, binding='dcterms_publisher', resource_graph=None)

Returns the leaf part of the URI found in row[binding].

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> Facet.term(row, "dcterms_publisher")
'chapman_hall'
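
One way to read “leaf part” is everything after the last '/' or '#' in the URI. The sketch below makes that assumption explicit; sketch_term is a hypothetical name, and the library’s own rule for splitting URIs may differ.

```python
def sketch_term(row, binding="dcterms_publisher", resource_graph=None):
    """Hypothetical: the part of the URI after its last '/' or '#'."""
    uri = row[binding]
    # rfind returns -1 when the separator is absent, so a URI without
    # either separator is returned unchanged.
    return uri[max(uri.rfind("/"), uri.rfind("#")) + 1:]


row = {"dcterms_publisher": "http://example.org/chapman_hall"}
print(sketch_term(row))
```
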

classmethod qname(row, binding='rdf_type', resource_graph=None)

Returns the qname of the URIRef contained in row[binding], as determined by the namespace prefixes registered in resource_graph.

>>> row = {"rdf_type": "http://purl.org/ontology/bibo/Book",
...        "dcterms_title": "A Tale of Two Cities",
...        "dcterms_issued": "1859-04-30",
...        "dcterms_publisher": "http://example.org/chapman_hall",
...        "schema_free": "true"}
>>> import rdflib
>>> resources = rdflib.Graph()
>>> resources.bind("bibo", "http://purl.org/ontology/bibo/")
>>> Facet.qname(row, "rdf_type", resources)
'bibo:Book'

classmethod resourcelabel_or_qname(row, binding='rdf_type', resource_graph=None)