The FulltextIndex class
Abstracts access to full text indexes (right now only Whoosh and ElasticSearch are supported, but Solr, Xapian and/or Sphinx may be supported later).
- class ferenda.FulltextIndex(location, repos)
This is the abstract base class for a fulltext index. You use it by calling the static method FulltextIndex.connect, passing a string representing the underlying fulltext engine you wish to use. It returns an instance of the corresponding subclass, on which you then call further methods.
- static connect(indextype, location, repos)
Open a fulltext index (creating it if it doesn’t already exist).
Parameters: - indextype (str) – Type of fulltext index (“WHOOSH” or “ELASTICSEARCH”)
- location – The file path of the fulltext index.
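The dispatch that connect performs can be sketched as a small factory method. This is a minimal, self-contained toy, not ferenda's implementation: the registry dict, the `_ToyIndex` helper and the in-memory `exists()`/`create()` behaviour are all hypothetical, and the two subclass names are stand-ins for the real engine-specific classes.

```python
from abc import ABC, abstractmethod


class FulltextIndex(ABC):
    """Toy stand-in for the abstract base class (not ferenda's code)."""

    def __init__(self, location, repos):
        self.location = location
        self.repos = repos

    @staticmethod
    def connect(indextype, location, repos):
        # Hypothetical registry mapping engine names to concrete subclasses.
        registry = {
            "WHOOSH": WhooshIndex,
            "ELASTICSEARCH": ElasticSearchIndex,
        }
        if indextype not in registry:
            raise ValueError("Unknown fulltext index type: %r" % indextype)
        index = registry[indextype](location, repos)
        if not index.exists():
            index.create(repos)  # create the index on first connect
        return index

    @abstractmethod
    def exists(self):
        """Whether the underlying index exists."""

    @abstractmethod
    def create(self, repos):
        """Create the underlying index."""


class _ToyIndex(FulltextIndex):
    """Shared in-memory behaviour for the two toy backends."""

    def __init__(self, location, repos):
        super().__init__(location, repos)
        self._created = False

    def exists(self):
        return self._created

    def create(self, repos):
        self._created = True


class WhooshIndex(_ToyIndex):
    pass


class ElasticSearchIndex(_ToyIndex):
    pass


index = FulltextIndex.connect("WHOOSH", "data/whooshindex", repos=[])
print(type(index).__name__, index.exists())  # WhooshIndex True
```

Callers only ever name the engine as a string, so switching engines is a configuration change rather than a code change.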
- make_schema(repos)
- get_default_schema()
- exists()
Whether the fulltext index exists.
- create(repos)
Creates a fulltext index using the provided schema.
- destroy()
Destroys the index, if created.
- open()
Opens the index so that it can be queried.
- schema()
Returns the schema that is actually in use. A schema is a dict where the keys are field names and the values are instances of any subclass of ferenda.fulltextindex.IndexedType.
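A schema of that shape might look like the following. This is an illustrative sketch only: the `IndexedType`, `Identifier` and `Keyword` classes are minimal stand-ins for the ferenda classes, `Text` is a hypothetical free-text type, and the field-to-type assignments are assumptions rather than ferenda's default schema.

```python
class IndexedType:
    """Toy stand-in for ferenda.fulltextindex.IndexedType."""

    def __init__(self, **kwargs):
        self.options = kwargs  # engine-independent field options


class Identifier(IndexedType):
    pass


class Keyword(IndexedType):
    pass


class Text(IndexedType):
    """Hypothetical free-text field type, for illustration only."""


# Keys are field names; values are instances, not classes.
schema = {
    "uri": Identifier(),
    "repo": Keyword(),
    "basefile": Keyword(),
    "title": Text(),
    "identifier": Keyword(),
    "text": Text(),
}


def is_valid_schema(candidate):
    """True if every key is a str and every value an IndexedType instance."""
    return all(isinstance(name, str) and isinstance(field, IndexedType)
               for name, field in candidate.items())


print(is_valid_schema(schema))               # True
print(is_valid_schema({"uri": Identifier}))  # False: a class, not an instance
```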
- update(uri, repo, basefile, text, **kwargs)
Insert (or update) a resource in the fulltext index. A resource may be an entire document, but it can also be any part of a document that is referenceable (i.e. a document node that has @typeof and @about attributes). A document with 100 sections can be stored as 100 independent resources, as long as each section has a unique key in the form of a URI.
Parameters: - uri (str) – URI for the resource
- repo (str) – The alias for the document repository that the resource is part of
- basefile (str) – The basefile that contains the resource
- title (str) – User-displayable title of the resource (if applicable). Should not contain the same information as identifier.
- identifier (str) – User-displayable short identifier for the resource (if applicable)
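Storing each section of a document as its own resource comes down to giving every section a unique URI and calling update once per section. The sketch below is hypothetical throughout: the "#S<n>" fragment naming scheme, the `section_resources` helper and the `RecordingIndex` stand-in (which only records what update receives) are assumptions made for illustration.

```python
def section_resources(doc_uri, sections):
    """Yield (uri, title, text) for each (title, text) section.

    The "#S<n>" fragment is a hypothetical naming scheme that gives
    every section its own unique URI.
    """
    for number, (title, text) in enumerate(sections, start=1):
        yield "%s#S%d" % (doc_uri, number), title, text


class RecordingIndex:
    """Hypothetical stand-in that stores whatever update() receives."""

    def __init__(self):
        self.resources = {}

    def update(self, uri, repo, basefile, text, **kwargs):
        # Keyed on URI: updating an existing URI replaces the old entry.
        self.resources[uri] = dict(repo=repo, basefile=basefile,
                                   text=text, **kwargs)


index = RecordingIndex()
sections = [("Scope", "Scope text."),
            ("Definitions", "Definitions text."),
            ("Penalties", "Penalties text.")]
# Three independent resources, one per section.
for uri, title, text in section_resources("http://example.org/doc/1", sections):
    index.update(uri, "myrepo", "1", text, title=title)
```

Because the URI is the key, re-indexing a changed section overwrites its old entry instead of adding a duplicate.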
- commit()
Commit all pending updates to the fulltext index.
- close()
Commits all pending updates and closes the index.
- doccount()
Returns the number of currently indexed (non-deleted) documents.
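The relationship between pending updates, commit(), close() and doccount() can be illustrated with an in-memory toy. It assumes, for the sake of the example, that uncommitted updates are not counted by doccount(); whether a given engine behaves exactly this way is an assumption, and `BufferedIndex` is a hypothetical class, not part of ferenda.

```python
class BufferedIndex:
    """Hypothetical in-memory index illustrating commit semantics."""

    def __init__(self):
        self._pending = {}    # uri -> fields, not yet visible
        self._committed = {}  # uri -> fields, counted by doccount()
        self.closed = False

    def update(self, uri, **fields):
        self._pending[uri] = fields

    def commit(self):
        self._committed.update(self._pending)
        self._pending.clear()

    def close(self):
        self.commit()  # close() commits pending updates first
        self.closed = True

    def doccount(self):
        return len(self._committed)


index = BufferedIndex()
index.update("http://example.org/doc/1", text="first")
index.update("http://example.org/doc/2", text="second")
print(index.doccount())  # 0: nothing committed yet
index.commit()
print(index.doccount())  # 2
index.update("http://example.org/doc/3", text="third")
index.close()
print(index.doccount())  # 3
```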
- query(q=None, pagenum=1, pagelen=10, **kwargs)
Perform a free text query against the full text index, optionally restricted with parameters for individual fields.
Returns: matching documents, each document as a dict of fields
Return type: list
Note
The kwargs parameters do not yet do anything – only simple full text queries are possible.
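The pagenum/pagelen parameters suggest page-based slicing of the hit list. The function below is a toy sketch under stated assumptions: pages are 1-based (inferred from the default pagenum=1), matching is a case-insensitive substring test standing in for real full-text scoring, and q=None is assumed to match every document. None of this is taken from ferenda's code.

```python
def query(documents, q=None, pagenum=1, pagelen=10):
    """Toy free-text query over a list of dicts with a "text" field.

    Assumptions: pagenum is 1-based, q=None matches everything, and
    substring matching stands in for real relevance ranking.
    """
    if pagenum < 1 or pagelen < 1:
        raise ValueError("pagenum and pagelen must be positive")
    if q is None:
        hits = list(documents)
    else:
        needle = q.lower()
        hits = [doc for doc in documents if needle in doc["text"].lower()]
    start = (pagenum - 1) * pagelen
    return hits[start:start + pagelen]


docs = [{"uri": "http://example.org/doc/%d" % i,
         "text": "fulltext document %d" % i} for i in range(1, 26)]
page3 = query(docs, "document", pagenum=3, pagelen=10)
print(len(page3))  # 5: hits 21-25 of 25
```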
- fieldmapping = ()
A tuple of (abstractfield, nativefield) tuples. Each abstractfield should be an instance of an IndexedType-derived class. Each nativefield should be whatever kind of object is used with the native fulltext index API.
The methods to_native_field() and from_native_field() use this tuple of tuples to convert fields.
- to_native_field(fieldobject)
Given an abstract field (an instance of an IndexedType-derived class), convert it to the corresponding native type for the fulltext index in use.
- from_native_field(fieldobject)
Given a native field object of the fulltext index in use, convert it to the corresponding IndexedType object.
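A lookup over a tuple of (abstractfield, nativefield) pairs can work in both directions with a linear scan. This sketch assumes that two abstract fields are equal when they have the same type and the same keyword options; that equality rule, the `ToyIndex` class and the string "native" fields are all hypothetical stand-ins for a real backend.

```python
class IndexedType:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __eq__(self, other):
        # Same field type with the same options counts as the same field.
        return type(self) is type(other) and self.kwargs == other.kwargs


class Identifier(IndexedType):
    pass


class Keyword(IndexedType):
    pass


class ToyIndex:
    # Native fields are plain strings here; a real backend would use
    # its own field objects instead (hypothetical values).
    fieldmapping = ((Identifier(), "native-id"),
                    (Keyword(), "native-keyword"))

    def to_native_field(self, fieldobject):
        for abstract, native in self.fieldmapping:
            if abstract == fieldobject:
                return native
        raise KeyError("no native field for %r" % (fieldobject,))

    def from_native_field(self, fieldobject):
        for abstract, native in self.fieldmapping:
            if native == fieldobject:
                return abstract
        raise KeyError("no abstract field for %r" % (fieldobject,))


index = ToyIndex()
print(index.to_native_field(Keyword()))                      # native-keyword
print(index.from_native_field("native-id") == Identifier())  # True
```

A linear scan over tuples, rather than a dict, avoids requiring either side to be hashable, which native field objects may not be.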
Datatype field classes
- class ferenda.fulltextindex.IndexedType(**kwargs)
Base class for a fulltext, search-engine-independent representation of indexed data. By using IndexedType-derived classes to represent the schema, it becomes possible to switch out search engines without affecting the rest of the code.
- class ferenda.fulltextindex.Identifier(**kwargs)
An identifier is a string, normally in the form of a URI, which uniquely identifies an indexed document.
- class ferenda.fulltextindex.Keyword(**kwargs)
A keyword is a single string from a controlled vocabulary.
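The benefit claimed for IndexedType (switching search engines without touching the rest of the code) can be shown by translating one abstract schema through two different mappings. Everything here is a hypothetical sketch: the minimal classes, the equality rule, `native_schema` and the two invented backends ENGINE_A and ENGINE_B, whose native values do not correspond to any real engine's field types.

```python
class IndexedType:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __eq__(self, other):
        return type(self) is type(other) and self.kwargs == other.kwargs


class Identifier(IndexedType):
    """A unique key, normally a URI."""


class Keyword(IndexedType):
    """A single string from a controlled vocabulary."""


# Engine-independent schema: the only thing the rest of the code sees.
SCHEMA = {"uri": Identifier(), "repo": Keyword()}

# Two invented backends, each with its own native representation.
ENGINE_A = ((Identifier(), "engine-a:id"),
            (Keyword(), "engine-a:keyword"))
ENGINE_B = ((Identifier(), {"kind": "id", "stored": True}),
            (Keyword(), {"kind": "keyword"}))


def native_schema(schema, fieldmapping):
    """Translate an abstract schema using one backend's fieldmapping."""
    native = {}
    for name, field in schema.items():
        for abstract, nativefield in fieldmapping:
            if abstract == field:
                native[name] = nativefield
                break
        else:
            raise KeyError("unmapped field type for %r" % name)
    return native


print(native_schema(SCHEMA, ENGINE_A))
print(native_schema(SCHEMA, ENGINE_B))
```

Only the mapping changes between backends; SCHEMA itself stays the same.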