The FulltextIndex class
Abstracts access to full text indexes. Right now only Whoosh and ElasticSearch are supported, but Solr, Xapian and/or Sphinx may be supported later.
class ferenda.FulltextIndex(location, repos)
This is the abstract base class for a fulltext index. You use it by calling the class method FulltextIndex.connect, passing a string that names the underlying fulltext engine you wish to use. It returns an instance of a subclass, on which you then call further methods.
indextypes = {'ELASTICSEARCH': <class 'ferenda.fulltextindex.ElasticSearchIndex'>, 'ELASTICSEARCH2': <class 'ferenda.fulltextindex.ElasticSearch2x'>, 'WHOOSH': <class 'ferenda.fulltextindex.WhooshIndex'>}
classmethod connect(indextype, location, repos)
Open a fulltext index (creating it if it doesn’t already exist).
Parameters:
- indextype (str) – Type of fulltext index (“WHOOSH” or “ELASTICSEARCH”)
- location – The file path of the fulltext index.
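The lookup that connect performs can be pictured as a registry-based factory. The following is a minimal sketch, not ferenda's actual implementation: SketchIndex, SketchWhoosh and SketchElastic are hypothetical stand-ins, and only the idea of an indextypes mapping and the connect(indextype, location, repos) signature are taken from the documentation above.

```python
class SketchIndex:
    """Hypothetical stand-in for an abstract fulltext index base class."""

    # Registry mapping engine names to concrete classes; filled in below.
    indextypes = {}

    def __init__(self, location, repos):
        self.location = location
        self.repos = repos

    @classmethod
    def connect(cls, indextype, location, repos):
        # Look up the concrete class by name, then instantiate it.
        try:
            concrete = cls.indextypes[indextype]
        except KeyError:
            raise ValueError("Unknown index type: %r" % indextype) from None
        return concrete(location, repos)


class SketchWhoosh(SketchIndex):
    """Hypothetical file-based backend."""


class SketchElastic(SketchIndex):
    """Hypothetical server-based backend."""


SketchIndex.indextypes = {
    "WHOOSH": SketchWhoosh,
    "ELASTICSEARCH": SketchElastic,
}

# The path and the empty repos list are illustrative values only.
index = SketchIndex.connect("WHOOSH", "data/whooshindex", repos=[])
print(type(index).__name__)
```

The caller never names a concrete class, so changing engines in this sketch only means changing the string passed to connect.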
schema()
Returns the schema that is actually in use. A schema is a dict where the keys are field names and the values are any subclass of ferenda.fulltextindex.IndexedType.
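To make the shape of such a schema concrete, here is a small sketch. It assumes the dict values are instances of IndexedType subclasses, and it uses locally defined stand-in classes rather than the real ferenda ones. The field names are borrowed from the update() parameters purely for illustration, and validate_schema is a hypothetical helper.

```python
class IndexedType:
    """Stand-in for an engine-independent field type (not the ferenda class)."""

    def __init__(self, **kwargs):
        self.options = kwargs


class Identifier(IndexedType):
    pass


class Keyword(IndexedType):
    pass


def validate_schema(schema):
    """Check that every key is a field name and every value an IndexedType."""
    for name, fieldtype in schema.items():
        if not isinstance(name, str):
            raise TypeError("field name must be a string: %r" % (name,))
        if not isinstance(fieldtype, IndexedType):
            raise TypeError("field %r has a non-IndexedType value" % name)
    return True


# Illustrative schema; the choice of type per field is an assumption.
schema = {
    "uri": Identifier(),
    "repo": Keyword(),
    "basefile": Keyword(),
}

print(validate_schema(schema))
```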
update(uri, repo, basefile, text, **kwargs)
Insert (or update) a resource in the fulltext index. A resource may be an entire document, but it can also be any part of a document that is referenceable (i.e. a document node that has @typeof and @about attributes). A document with 100 sections can be stored as 100 independent resources, as long as each section has a unique key in the form of a URI.
Parameters:
- uri (str) – URI for the resource
- repo (str) – The alias for the document repository that the resource is part of
- basefile (str) – The basefile which contains the resource
- title (str) – User-displayable title of resource (if applicable). Should not contain the same information as identifier.
- identifier (str) – User-displayable short identifier for resource (if applicable)
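The insert-or-update behaviour keyed on URI can be sketched with an in-memory dict. InMemoryIndex is a hypothetical stand-in that borrows only the update() signature from above. The example.org URIs, the "#S1" fragment scheme, and the way extra keyword arguments are stored are all assumptions made for illustration.

```python
class InMemoryIndex:
    """Hypothetical in-memory stand-in; not a ferenda class."""

    def __init__(self):
        self._resources = {}

    def update(self, uri, repo, basefile, text, **kwargs):
        # The URI is the unique key: a new URI inserts a record,
        # an existing URI overwrites the old one.
        record = {"uri": uri, "repo": repo, "basefile": basefile, "text": text}
        record.update(kwargs)
        self._resources[uri] = record

    def get(self, uri):
        return self._resources[uri]

    def __len__(self):
        return len(self._resources)


index = InMemoryIndex()
base = "http://example.org/doc/123"

# The document as a whole is one resource...
index.update(base, "example", "123", "Whole document text", title="Example act")

# ...and each section is an independent resource with its own URI.
for n in (1, 2, 3):
    index.update("%s#S%d" % (base, n), "example", "123",
                 "Text of section %d" % n, identifier="Section %d" % n)

# Re-indexing an existing URI replaces the record instead of adding one.
index.update(base + "#S1", "example", "123", "Revised text of section 1")

print(len(index))  # 4: one document plus three sections
```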
query(q=None, pagenum=1, pagelen=10, ac_query=False, exclude_types=None, **kwargs)
Perform a free text query against the full text index, optionally restricted with parameters for individual fields.
Returns: matching documents, each document as a dict of fields
Note: The kwargs parameters do not yet do anything – only simple full text queries are possible.
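As an illustration of how pagenum and pagelen might select a slice of the matches, here is a sketch over a plain list of dicts. It assumes pages are numbered from 1 (suggested by the default pagenum=1) and uses a naive case-insensitive substring match in place of real fulltext search. The function, the sample documents, and the flat-list return shape are hypothetical, and the real method may return more than this.

```python
def query(documents, q=None, pagenum=1, pagelen=10):
    """Hypothetical sketch: substring match followed by paging."""
    if q is None:
        hits = list(documents)
    else:
        needle = q.lower()
        hits = [doc for doc in documents if needle in doc["text"].lower()]
    # Page 1 covers hits[0:pagelen], page 2 covers hits[pagelen:2*pagelen], ...
    start = (pagenum - 1) * pagelen
    return hits[start:start + pagelen]


# Twenty-five made-up documents, each represented as a dict of fields.
documents = [
    {"uri": "http://example.org/doc/%d" % n,
     "text": "Section %d on fishing rights" % n}
    for n in range(1, 26)
]

page3 = query(documents, q="fishing", pagenum=3, pagelen=10)
print(len(page3))  # 5: the last, partially filled page
```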
fieldmapping = ()
A tuple of (abstractfield, nativefield) tuples. Each abstractfield should be an instance of an IndexedType-derived class. Each nativefield should be whatever kind of object is used with the native fulltext index API.
The methods to_native_field() and from_native_field() use this tuple of tuples to convert fields.
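A two-way conversion driven by such a tuple of tuples could look roughly like the sketch below. The classes are local stand-ins, and the native fields are placeholder dicts rather than real Whoosh or ElasticSearch objects. Comparing abstract fields by type and options is also an assumption, as are the function bodies. Only the names fieldmapping, to_native_field and from_native_field come from the text above.

```python
class IndexedType:
    """Stand-in for an engine-independent field type (not the ferenda class)."""

    def __init__(self, **kwargs):
        self.options = kwargs

    def __eq__(self, other):
        # Two abstract fields match if they have the same type and options.
        return type(self) is type(other) and self.options == other.options


class Identifier(IndexedType):
    pass


class Keyword(IndexedType):
    pass


# (abstractfield, nativefield) pairs; the dicts are placeholder native fields.
fieldmapping = (
    (Identifier(), {"native": "id-field"}),
    (Keyword(), {"native": "keyword-field"}),
)


def to_native_field(abstractfield):
    for abstract, native in fieldmapping:
        if abstract == abstractfield:
            return native
    raise ValueError("No native field for %r" % (abstractfield,))


def from_native_field(nativefield):
    for abstract, native in fieldmapping:
        if native == nativefield:
            return abstract
    raise ValueError("No abstract field for %r" % (nativefield,))


print(to_native_field(Keyword()))
print(type(from_native_field({"native": "id-field"})).__name__)
```

A linear scan is enough here because a mapping typically holds only a handful of field types. The scan also avoids requiring native field objects to be hashable.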
Datatype field classes
class ferenda.fulltextindex.IndexedType(**kwargs)
Base class for a fulltext search engine-independent representation of indexed data. By using IndexedType-derived classes to represent the schema, it becomes possible to switch out search engines without affecting the rest of the code.
class ferenda.fulltextindex.Identifier(**kwargs)
An identifier is a string, normally in the form of a URI, which uniquely identifies an indexed document.
class ferenda.fulltextindex.Keyword(**kwargs)
A keyword is a single string from a controlled vocabulary.