The FulltextIndex class

Abstracts access to full text indexes. Currently only Whoosh and ElasticSearch are supported; Solr, Xapian and/or Sphinx may be supported later.

class ferenda.FulltextIndex(location, repos)

This is the abstract base class for a fulltext index. You use it by calling the static method FulltextIndex.connect, passing a string representing the underlying fulltext engine you wish to use. It returns an instance of a subclass, on which you then call further methods.
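
The dispatch described above can be illustrated with a minimal sketch. It does not import ferenda; the class names FulltextIndexSketch, WhooshIndexSketch and ElasticSearchIndexSketch are hypothetical stand-ins, and the way the engine string is mapped to a subclass is an assumption about how such a factory might work, not a description of ferenda's implementation.

```python
class FulltextIndexSketch:
    """Hypothetical stand-in for the abstract base class (not ferenda code)."""

    def __init__(self, location, repos):
        self.location = location
        self.repos = repos

    @staticmethod
    def connect(indextype, location, repos):
        # Assumed dispatch: map the engine name to a concrete subclass.
        backends = {
            "WHOOSH": WhooshIndexSketch,
            "ELASTICSEARCH": ElasticSearchIndexSketch,
        }
        try:
            cls = backends[indextype]
        except KeyError:
            raise ValueError("unknown index type: %r" % (indextype,))
        return cls(location, repos)


class WhooshIndexSketch(FulltextIndexSketch):
    """Hypothetical file-based backend."""


class ElasticSearchIndexSketch(FulltextIndexSketch):
    """Hypothetical server-based backend."""


# The caller only ever names the engine; it never instantiates a subclass directly.
index = FulltextIndexSketch.connect("WHOOSH", "data/whooshindex", [])
print(type(index).__name__)
```

Keeping the engine name as the only engine-specific detail at the call site is what would let the rest of the code stay unchanged when the backend is swapped.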

static connect(indextype, location, repos)

Open a fulltext index (creating it if it doesn’t already exist).

Parameters:
  • indextype (str) – Type of fulltext index (“WHOOSH” or “ELASTICSEARCH”)
  • location – The file path of the fulltext index.

make_schema(repos)

get_default_schema()

exists()

Whether the fulltext index exists.

create(repos)

Creates a fulltext index using the provided schema.

destroy()

Destroys the index, if created.

open()

Opens the index so that it can be queried.

schema()

Returns the schema that is actually in use. A schema is a dict where the keys are field names and the values are instances of any subclass of ferenda.fulltextindex.IndexedType.
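
As an illustration of that dict shape, here is a minimal sketch. The classes below are hypothetical stand-ins that only mirror the names of the IndexedType family, and the pairing of field names with types is an assumption chosen for the example, not the schema ferenda actually produces.

```python
class IndexedType:
    """Hypothetical stand-in for ferenda.fulltextindex.IndexedType."""

    def __init__(self, **kwargs):
        # Keep any options (e.g. boost) as plain attributes.
        self.__dict__.update(kwargs)


class Identifier(IndexedType):
    pass


class Label(IndexedType):
    pass


class Text(IndexedType):
    pass


# Keys are field names, values are IndexedType instances (assumed pairing).
schema = {
    "uri": Identifier(),
    "repo": Label(),
    "basefile": Label(),
    "text": Text(),
}


def is_valid_schema(candidate):
    """Check that every value is an instance of an IndexedType subclass."""
    return all(isinstance(value, IndexedType) for value in candidate.values())


print(is_valid_schema(schema))
```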

update(uri, repo, basefile, text, **kwargs)

Insert (or update) a resource in the fulltext index. A resource may be an entire document, but it can also be any part of a document that is referenceable (i.e. a document node that has @typeof and @about attributes). A document with 100 sections can be stored as 100 independent resources, as long as each section has a unique key in the form of a URI.

Parameters:
  • uri (str) – URI for the resource
  • repo (str) – The alias for the document repository that the resource is part of
  • basefile (str) – The basefile that contains the resource
  • title (str) – User-displayable title of resource (if applicable). Should not contain the same information as identifier.
  • identifier (str) – User-displayable short identifier for resource (if applicable)

Note

Calling this method may not directly update the fulltext index – you need to call commit() or close() for that.
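
The relationship between update(), commit() and close() can be sketched with an in-memory stand-in. BufferedIndexSketch is a hypothetical class written for this example; the buffering strategy is an assumption that matches the note above, and a real backend may behave differently.

```python
class BufferedIndexSketch:
    """Hypothetical in-memory index that buffers updates until commit()."""

    def __init__(self):
        self._pending = {}    # updates not yet visible to doccount()
        self._committed = {}  # resources keyed by URI
        self.is_open = True

    def update(self, uri, repo, basefile, text, **kwargs):
        # Extra fields such as title or identifier travel in kwargs.
        doc = dict(uri=uri, repo=repo, basefile=basefile, text=text, **kwargs)
        # Keying on the URI means a second update replaces the first.
        self._pending[uri] = doc

    def commit(self):
        self._committed.update(self._pending)
        self._pending.clear()

    def close(self):
        # Closing implies committing whatever is still pending.
        self.commit()
        self.is_open = False

    def doccount(self):
        return len(self._committed)


index = BufferedIndexSketch()
# Two sections of the same basefile, stored as independent resources.
index.update("http://example.org/doc/1#S1", "base", "1", "Section one text",
             title="Section 1")
index.update("http://example.org/doc/1#S2", "base", "1", "Section two text",
             title="Section 2")
before = index.doccount()
index.commit()
after = index.doccount()
print(before, after)
```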

commit()

Commit all pending updates to the fulltext index.

close()

Commits all pending updates and closes the index.

doccount()

Returns the number of currently indexed (non-deleted) documents.

query(q=None, pagenum=1, pagelen=10, **kwargs)

Perform a free text query against the full text index, optionally restricted with parameters for individual fields.

Parameters:
  • q (str) – Free text query, using the selected full text index’s preferred query syntax
  • **kwargs (dict) – any parameter will be used to match a similarly-named field

Returns:

matching documents, each document as a dict of fields

Return type:

list

Note

The kwargs parameters do not yet do anything – only simple full text queries are possible.
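
The pagenum and pagelen parameters are not described above. The following sketch shows one plausible reading, assuming pagenum is 1-based (consistent with its default of 1) and pagelen is the maximum number of hits per page. The helper page_slice is hypothetical and is not part of ferenda.

```python
def page_slice(hits, pagenum=1, pagelen=10):
    """Return the hits belonging to one page (assumes 1-based page numbers)."""
    if pagenum < 1 or pagelen < 1:
        raise ValueError("pagenum and pagelen must be positive")
    start = (pagenum - 1) * pagelen
    return hits[start:start + pagelen]


# Each hit is a dict of fields, matching the documented return type.
hits = [{"uri": "http://example.org/doc/%d" % i} for i in range(25)]
first = page_slice(hits)
third = page_slice(hits, pagenum=3)
print(len(first), len(third))
```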

fieldmapping = ()

A tuple of (abstractfield, nativefield) tuples. Each abstractfield should be an instance of an IndexedType-derived class. Each nativefield should be whatever kind of object is used with the native fulltextindex API.

The methods to_native_field() and from_native_field() use this tuple of tuples to convert fields.

to_native_field(fieldobject)

Given an abstract field (an instance of an IndexedType-derived class), convert to the corresponding native type for the fulltextindex in use.

from_native_field(fieldobject)

Given a fulltextindex native type, convert to the corresponding IndexedType object.
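
A two-way lookup over such a tuple of tuples might look like the sketch below. The IndexedType classes are hypothetical stand-ins with a simple equality rule, and the native side is represented by placeholder strings; in a real backend those would be the engine's own field objects.

```python
class IndexedType:
    """Hypothetical stand-in for ferenda.fulltextindex.IndexedType."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __eq__(self, other):
        # Assumed rule: same class and same options means same field type.
        return type(self) is type(other) and self.__dict__ == other.__dict__


class Identifier(IndexedType):
    pass


class Text(IndexedType):
    pass


class Boolean(IndexedType):
    pass


# (abstractfield, nativefield) pairs; the strings are placeholders.
fieldmapping = (
    (Identifier(), "native-id"),
    (Text(), "native-text"),
    (Boolean(), "native-bool"),
)


def to_native_field(fieldobject):
    for abstractfield, nativefield in fieldmapping:
        if abstractfield == fieldobject:
            return nativefield
    raise ValueError("no native field for %r" % (fieldobject,))


def from_native_field(fieldobject):
    for abstractfield, nativefield in fieldmapping:
        if nativefield == fieldobject:
            return abstractfield
    raise ValueError("no abstract field for %r" % (fieldobject,))


print(to_native_field(Text()))
```

A linear scan is used because the stand-in objects define equality but not hashing; with only a handful of field types the cost is negligible.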

Datatype field classes

class ferenda.fulltextindex.IndexedType(**kwargs)

Base class for a fulltext searchengine-independent representation of indexed data. By using IndexedType-derived classes to represent the schema, it becomes possible to switch out search engines without affecting the rest of the code.

class ferenda.fulltextindex.Identifier(**kwargs)

An identifier is a string, normally in the form of a URI, which uniquely identifies an indexed document.

class ferenda.fulltextindex.Datetime(**kwargs)
class ferenda.fulltextindex.Text(**kwargs)
class ferenda.fulltextindex.Label(**kwargs)
class ferenda.fulltextindex.Keyword(**kwargs)

A keyword is a single string from a controlled vocabulary.

class ferenda.fulltextindex.Boolean(**kwargs)
class ferenda.fulltextindex.URI(**kwargs)

Any URI (except the URI that identifies an indexed document – use Identifier for that).

class ferenda.fulltextindex.Resource(**kwargs)

A fulltextindex.Resource is a URI that also has a human-readable label.

Search field classes

class ferenda.fulltextindex.SearchModifier(*values)
class ferenda.fulltextindex.Less(max)
class ferenda.fulltextindex.More(min)
class ferenda.fulltextindex.Between(min, max)