The `FulltextIndex` class¶

Abstracts access to full text indexes. Right now only Whoosh and ElasticSearch are supported; Solr, Xapian and/or Sphinx may be supported later.

class ferenda.FulltextIndex(location, repos)[source]¶

This is the abstract base class for a fulltext index. You use it by calling the class method FulltextIndex.connect, passing a string that names the underlying fulltext engine you wish to use. It returns an instance of a subclass, on which you then call further methods.

indextypes = {'ELASTICSEARCH': <class 'ferenda.fulltextindex.ElasticSearchIndex'>, 'ELASTICSEARCH2': <class 'ferenda.fulltextindex.ElasticSearch2x'>, 'WHOOSH': <class 'ferenda.fulltextindex.WhooshIndex'>}¶

classmethod connect(indextype, location, repos)[source]¶

Open a fulltext index (creating it if it doesn’t already exist).

Parameters:	indextype (str) – Type of fulltext index (“WHOOSH” or “ELASTICSEARCH”)
location (str) – The file path of the fulltext index.
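The lookup-by-name pattern that connect and the indextypes mapping suggest can be sketched as follows. This is a minimal, self-contained illustration and not ferenda's own code: BaseIndex, FileBackedIndex, ServerBackedIndex and the keys "FILE" and "SERVER" are hypothetical stand-ins.

```python
class BaseIndex:
    """Stand-in for an abstract index base class (hypothetical, not ferenda's)."""

    # Maps an engine name to the concrete class; filled in further down.
    indextypes = {}

    def __init__(self, location, repos):
        self.location = location
        self.repos = repos

    @classmethod
    def connect(cls, indextype, location, repos):
        # Look up the concrete class by name and return an instance of it.
        if indextype not in cls.indextypes:
            raise ValueError("Unknown index type: %r" % indextype)
        return cls.indextypes[indextype](location, repos)


class FileBackedIndex(BaseIndex):
    """Hypothetical engine that keeps its index in a directory on disk."""


class ServerBackedIndex(BaseIndex):
    """Hypothetical engine that talks to a search server."""


BaseIndex.indextypes = {
    "FILE": FileBackedIndex,
    "SERVER": ServerBackedIndex,
}

index = BaseIndex.connect("FILE", "data/index", repos=[])
print(type(index).__name__)
```

The caller only ever names the engine as a string, so swapping engines is a one-word change at the call site.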

make_schema(repos)[source]¶

get_default_schema()[source]¶

exists()[source]¶: Whether the fulltext index exists.

create(repos)[source]¶: Creates a fulltext index using the provided schema.

destroy()[source]¶: Destroys the index, if created.

open()[source]¶: Opens the index so that it can be queried.

schema()[source]¶: Returns the schema that is actually in use. A schema is a dict where the keys are field names and the values are any subclass of ferenda.fulltextindex.IndexedType.
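As an illustration of the shape such a schema might take, here is a minimal sketch using stand-in classes instead of the real ones from ferenda.fulltextindex. The field names ("uri", "title", "text") are made up for the example, and it assumes the dict values are instances of IndexedType-derived classes, as the description of fieldmapping suggests.

```python
class IndexedType:
    """Stand-in for an engine-independent field type (hypothetical)."""

    def __init__(self, **kwargs):
        self.options = kwargs


class Identifier(IndexedType):
    pass


class Label(IndexedType):
    pass


class Text(IndexedType):
    pass


# Keys are field names, values describe how each field is indexed.
schema = {
    "uri": Identifier(),
    "title": Label(),
    "text": Text(),
}


def check_schema(schema):
    """Return True if every value is an IndexedType instance, else raise."""
    for name, fieldtype in schema.items():
        if not isinstance(fieldtype, IndexedType):
            raise TypeError("Field %r has unsupported type %r" % (name, fieldtype))
    return True


print(check_schema(schema))
```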

update(uri, repo, basefile, text, **kwargs)[source]¶

Insert (or update) a resource in the fulltext index. A resource may be an entire document, but it can also be any part of a document that is referenceable (i.e. a document node that has @typeof and @about attributes). A document with 100 sections can be stored as 100 independent resources, as long as each section has a unique key in the form of a URI.

Parameters:

uri (str) – URI for the resource
repo (str) – The alias for the document repository that the resource is part of
basefile (str) – The basefile which contains the resource
title (str) – User-displayable title of resource (if applicable). Should not contain the same information as identifier.
identifier (str) – User-displayable short identifier for resource (if applicable)

Note

Calling this method may not directly update the fulltext index – you need to call commit() or close() for that.

commit()[source]¶: Commit all pending updates to the fulltext index.

close()[source]¶: Commits all pending updates and closes the index.
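The relationship between update(), commit() and close() can be illustrated with a toy in-memory index. This is only a sketch of the buffering behaviour described above, under the assumption that pending updates become visible on commit. BufferedIndex is a hypothetical class, not part of ferenda, and the URIs are examples.

```python
class BufferedIndex:
    """Toy index that buffers updates until commit() or close() is called."""

    def __init__(self):
        self._pending = {}    # updates not yet visible
        self._committed = {}  # resources visible to doccount()
        self.closed = False

    def update(self, uri, repo, basefile, text, **kwargs):
        # Each resource is keyed by its URI, so a repeated URI overwrites.
        self._pending[uri] = dict(
            uri=uri, repo=repo, basefile=basefile, text=text, **kwargs
        )

    def commit(self):
        self._committed.update(self._pending)
        self._pending.clear()

    def close(self):
        # Closing implies committing whatever is still pending.
        self.commit()
        self.closed = True

    def doccount(self):
        return len(self._committed)


index = BufferedIndex()
index.update("http://example.org/doc/1#S1", "example", "1",
             "Text of the first section", title="Section 1")
index.update("http://example.org/doc/1#S2", "example", "1",
             "Text of the second section", title="Section 2")
before = index.doccount()  # 0: nothing has been committed yet
index.commit()
after = index.doccount()   # 2: both sections are now indexed
print(before, after)
```

Two sections of the same document are stored as two resources here because each has its own URI.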

doccount()[source]¶: Returns the number of currently indexed (non-deleted) documents.

query(q=None, pagenum=1, pagelen=10, ac_query=False, exclude_types=None, **kwargs)[source]¶

Perform a free text query against the full text index, optionally restricted with parameters for individual fields.

Parameters:	q (str) – Free text query, using the selected full text index’s preferred query syntax
**kwargs (dict) – any parameter will be used to match a similarly-named field
Returns:	matching documents, each document as a dict of fields
Return type:	list

Note

The kwargs parameters do not yet do anything – only simple full text queries are possible.
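The pagenum and pagelen parameters are not described above. A plausible reading, assuming 1-based page numbers, is that they select a slice of the hit list. The sketch below shows that arithmetic with a hypothetical helper, paginate, operating on a plain list of dicts; it does not call the real query method.

```python
def paginate(hits, pagenum=1, pagelen=10):
    """Return the hits on page `pagenum`, counting pages from 1."""
    if pagenum < 1 or pagelen < 1:
        raise ValueError("pagenum and pagelen must be positive")
    start = (pagenum - 1) * pagelen
    return hits[start:start + pagelen]


# 25 made-up hits, each a dict of fields as described for query().
hits = [{"uri": "http://example.org/doc/%d" % i} for i in range(1, 26)]

page3 = paginate(hits, pagenum=3, pagelen=10)
print(len(page3))        # 5, the last partial page
print(page3[0]["uri"])   # http://example.org/doc/21
```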

fieldmapping = ()¶

A tuple of (abstractfield, nativefield) tuples. Each abstractfield should be an instance of an IndexedType-derived class. Each nativefield should be whatever kind of object is used with the native fulltextindex API.

The methods to_native_field() and from_native_field() use this tuple of tuples to convert fields.

to_native_field(fieldobject)[source]¶: Given an abstract field (an instance of an IndexedType-derived class), convert to the corresponding native type for the fulltextindex in use.

from_native_field(fieldobject)[source]¶: Given a fulltextindex native type, convert to the corresponding IndexedType object.
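A two-way lookup over such a tuple of tuples can be sketched as below. The stand-in IndexedType classes and the dict-shaped native fields are assumptions made for the example. A real engine would use its own field objects, and ferenda's actual conversion logic may differ.

```python
class IndexedType:
    """Stand-in abstract field type; equal when class and options match."""

    def __init__(self, **kwargs):
        self.options = kwargs

    def __eq__(self, other):
        return type(self) is type(other) and self.options == other.options


class Identifier(IndexedType):
    pass


class Text(IndexedType):
    pass


# (abstractfield, nativefield) pairs; the native side is a made-up dict format.
fieldmapping = (
    (Identifier(), {"type": "string", "analyzed": False}),
    (Text(), {"type": "string", "analyzed": True}),
)


def to_native_field(fieldobject):
    """Find the native field paired with the given abstract field."""
    for abstractfield, nativefield in fieldmapping:
        if abstractfield == fieldobject:
            return nativefield
    raise ValueError("No native field for %r" % fieldobject)


def from_native_field(fieldobject):
    """Find the abstract field paired with the given native field."""
    for abstractfield, nativefield in fieldmapping:
        if nativefield == fieldobject:
            return abstractfield
    raise ValueError("No abstract field for %r" % fieldobject)


native = to_native_field(Text())
abstract = from_native_field({"type": "string", "analyzed": False})
print(native, type(abstract).__name__)
```

A linear scan is used because the pairs are few and the native objects may not be hashable.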

Datatype field classes¶

class ferenda.fulltextindex.IndexedType(**kwargs)[source]¶: Base class for a fulltext searchengine-independent representation of indexed data. By using IndexedType-derived classes to represent the schema, it becomes possible to switch out search engines without affecting the rest of the code.

class ferenda.fulltextindex.Identifier(**kwargs)[source]¶: An identifier is a string, normally in the form of a URI, which uniquely identifies an indexed document.

class ferenda.fulltextindex.Datetime(**kwargs)[source]¶

class ferenda.fulltextindex.Text(**kwargs)[source]¶

class ferenda.fulltextindex.Label(**kwargs)[source]¶

class ferenda.fulltextindex.Keyword(**kwargs)[source]¶: A keyword is a single string from a controlled vocabulary.

class ferenda.fulltextindex.Boolean(**kwargs)[source]¶

class ferenda.fulltextindex.URI(**kwargs)[source]¶: Any URI (except the URI that identifies an indexed document – use Identifier for that).

class ferenda.fulltextindex.Resource(**kwargs)[source]¶: A fulltextindex.Resource is a URI that also has a human-readable label.

Search field classes¶

class ferenda.fulltextindex.SearchModifier(*values)[source]¶

class ferenda.fulltextindex.Less(max)[source]¶

class ferenda.fulltextindex.More(min)[source]¶

class ferenda.fulltextindex.Between(min, max)[source]¶
