The FulltextIndex class

Abstracts access to full text indexes. Right now only Whoosh and ElasticSearch are supported; Solr, Xapian and/or Sphinx may be supported later.

class ferenda.FulltextIndex(location, repos)[source]

This is the abstract base class for a fulltext index. You use it by calling the static method FulltextIndex.connect, passing a string representing the underlying fulltext engine you wish to use. It returns an instance of a subclass, on which you then call further methods.

static connect(indextype, location, repos)[source]

Open a fulltext index (creating it if it doesn’t already exist).

Parameters:
  • indextype (str) – Type of fulltext index (“WHOOSH” or “ELASTICSEARCH”)
  • location – The file path of the fulltext index.
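
The dispatch described above (a static connect method on the base class that hands back an instance of an engine-specific subclass) can be sketched roughly as follows. This is not ferenda's implementation: SketchIndex, SketchWhooshIndex and SketchElasticIndex are hypothetical stand-ins, and the error raised for an unknown engine name is an assumption.

```python
class SketchIndex:
    """Hypothetical stand-in for an abstract fulltext index base class."""

    def __init__(self, location, repos):
        self.location = location
        self.repos = repos

    @staticmethod
    def connect(indextype, location, repos):
        # Map the engine name to a concrete subclass (names are illustrative).
        backends = {
            "WHOOSH": SketchWhooshIndex,
            "ELASTICSEARCH": SketchElasticIndex,
        }
        try:
            cls = backends[indextype]
        except KeyError:
            # Assumption: unknown engine names are rejected with ValueError.
            raise ValueError("unknown index type: %r" % indextype)
        return cls(location, repos)


class SketchWhooshIndex(SketchIndex):
    """Stand-in for a file-based backend."""


class SketchElasticIndex(SketchIndex):
    """Stand-in for a server-based backend."""


# The caller only names the engine; the concrete class is chosen by connect().
index = SketchIndex.connect("WHOOSH", "data/whooshindex", [])
```

Calling code written against the base class interface does not need to change when the engine name passed to connect changes.
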

make_schema(repos)[source]

get_default_schema()[source]

exists()[source]

Whether the fulltext index exists.

create(repos)[source]

Creates a fulltext index using the provided schema.

destroy()[source]

Destroys the index, if created.

open()[source]

Opens the index so that it can be queried.

schema()[source]

Returns the schema that is actually in use. A schema is a dict where the keys are field names and the values are instances of any subclass of ferenda.fulltextindex.IndexedType.
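
As an illustration of that shape, here is a minimal sketch of a schema dict. The classes are local stand-ins that only mirror the (**kwargs) constructors listed under "Datatype field classes"; the field names and the boost option are assumptions chosen for the example, not ferenda's default schema.

```python
class IndexedType:
    """Hypothetical stand-in; only mirrors the (**kwargs) constructor."""

    def __init__(self, **kwargs):
        self.options = kwargs


class Identifier(IndexedType):
    pass


class Label(IndexedType):
    pass


class Text(IndexedType):
    pass


# Keys are field names, values are IndexedType-derived instances.
schema = {
    "uri": Identifier(),
    "repo": Label(),
    "basefile": Label(),
    "title": Text(boost=4),  # "boost" is an assumed option name
    "text": Text(),
}

# A quick sanity check that every value really is an IndexedType instance.
bad = [name for name, field in schema.items()
       if not isinstance(field, IndexedType)]
```
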

update(uri, repo, basefile, text, **kwargs)[source]

Insert (or update) a resource in the fulltext index. A resource may be an entire document, but it can also be any part of a document that is referenceable (i.e. a document node that has @typeof and @about attributes). A document with 100 sections can be stored as 100 independent resources, as long as each section has a unique key in the form of a URI.

Parameters:
  • uri (str) – URI for the resource
  • repo (str) – The alias for the document repository that the resource is part of
  • basefile (str) – The basefile that contains the resource
  • title (str) – User-displayable title of resource (if applicable). Should not contain the same information as identifier.
  • identifier (str) – User-displayable short identifier for resource (if applicable)

Note

Calling this method may not directly update the fulltext index – you need to call commit() or close() for that.
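
The deferred-write behaviour in the note can be pictured with a small in-memory model. BufferedIndex below is a hypothetical stand-in, not ferenda code; it only assumes that updates are keyed on the resource URI and become visible once commit() is called.

```python
class BufferedIndex:
    """Hypothetical in-memory index with deferred writes."""

    def __init__(self):
        self._pending = {}    # updates not yet visible to readers
        self._committed = {}  # resources visible to readers

    def update(self, uri, repo, basefile, text, **kwargs):
        # Keyed on URI, so a second update for the same URI replaces the first.
        doc = dict(kwargs, uri=uri, repo=repo, basefile=basefile, text=text)
        self._pending[uri] = doc

    def commit(self):
        # Make all pending updates visible, then empty the buffer.
        self._committed.update(self._pending)
        self._pending.clear()

    def doccount(self):
        return len(self._committed)


idx = BufferedIndex()
# Two sections of one document, stored as two independent resources.
for n in (1, 2):
    idx.update("http://example.org/doc/1#S%d" % n, "example", "1",
               "Text of section %d" % n, title="Section %d" % n)

before = idx.doccount()  # nothing committed yet
idx.commit()
after = idx.doccount()   # both sections are now counted
```
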

commit()[source]

Commit all pending updates to the fulltext index.

close()[source]

Commits all pending updates and closes the index.

doccount()[source]

Returns the number of currently indexed (non-deleted) documents.

query(q=None, pagenum=1, pagelen=10, **kwargs)[source]

Perform a free text query against the full text index, optionally restricted with parameters for individual fields.

Parameters:
  • q (str) – Free text query, using the selected full text index’s preferred query syntax
  • **kwargs (dict) – any parameter will be used to match a similarly-named field

Returns: matching documents, each document as a dict of fields

Return type: list

Note

The kwargs parameters do not yet do anything – only simple full text queries are possible.

fieldmapping = ()

A tuple of (abstractfield, nativefield) tuples. Each abstractfield should be an instance of an IndexedType-derived class. Each nativefield should be whatever kind of object is used with the native fulltextindex API.

The methods to_native_field() and from_native_field() use this tuple of tuples to convert fields.

to_native_field(fieldobject)[source]

Given an abstract field (an instance of an IndexedType-derived class), convert to the corresponding native type for the fulltextindex in use.

from_native_field(fieldobject)[source]

Given a fulltextindex native type, convert to the corresponding IndexedType object.
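
One way such a two-way lookup over a tuple of (abstractfield, nativefield) pairs could work is sketched below. It is an illustration under assumptions, not the library's code: the stand-in IndexedType compares by class and constructor arguments, plain dicts play the role of native field objects, and a failed lookup raises ValueError.

```python
class IndexedType:
    """Hypothetical stand-in that compares by class and constructor kwargs."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __eq__(self, other):
        return type(self) is type(other) and self.__dict__ == other.__dict__


class Identifier(IndexedType):
    pass


class Text(IndexedType):
    pass


# Plain dicts stand in for whatever objects the native engine API uses.
fieldmapping = (
    (Identifier(), {"type": "keyword"}),
    (Text(), {"type": "text"}),
)


def to_native_field(fieldobject):
    # Linear scan: find the pair whose abstract side equals the argument.
    for abstractfield, nativefield in fieldmapping:
        if fieldobject == abstractfield:
            return nativefield
    raise ValueError("no native field for %r" % (fieldobject,))


def from_native_field(fieldobject):
    # Same scan in the other direction.
    for abstractfield, nativefield in fieldmapping:
        if fieldobject == nativefield:
            return abstractfield
    raise ValueError("no abstract field for %r" % (fieldobject,))
```

A tuple of pairs, rather than a dict, works here because neither side needs to be hashable; only equality comparison is required.
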

Datatype field classes

class ferenda.fulltextindex.IndexedType(**kwargs)[source]

Base class for a search engine-independent representation of indexed data. By using IndexedType-derived classes to represent the schema, it becomes possible to switch out search engines without affecting the rest of the code.

class ferenda.fulltextindex.Identifier(**kwargs)[source]

An identifier is a string, normally in the form of a URI, which uniquely identifies an indexed document.

class ferenda.fulltextindex.Datetime(**kwargs)[source]

class ferenda.fulltextindex.Text(**kwargs)[source]

class ferenda.fulltextindex.Label(**kwargs)[source]

class ferenda.fulltextindex.Keyword(**kwargs)[source]

A keyword is a single string from a controlled vocabulary.

class ferenda.fulltextindex.Boolean(**kwargs)[source]

class ferenda.fulltextindex.URI(**kwargs)[source]

Any URI (except the URI that identifies an indexed document – use Identifier for that).

class ferenda.fulltextindex.Resource(**kwargs)[source]

A fulltextindex.Resource is a URI that also has a human-readable label.

Search field classes

class ferenda.fulltextindex.SearchModifier(*values)[source]

class ferenda.fulltextindex.Less(max)[source]

class ferenda.fulltextindex.More(min)[source]

class ferenda.fulltextindex.Between(min, max)[source]