Jekyll Algolia Plugin


Jekyll plugin to automatically index your content into Algolia.

⚠ Unreleased beta version

This plugin is not yet released on RubyGems. To try it, clone the repository and update your Gemfile to point to the path on disk like this:

group :jekyll_plugins do
  gem "jekyll-algolia", :path => "/path/to/the/cloned/repo"
end

Feedback very welcome!

Usage

$ jekyll algolia

This will push the content of your Jekyll website to your Algolia index.

Installation

The plugin requires Jekyll 3.6.2 or later and Ruby 2.2.8 or later (the versions deployed on GitHub Pages at the time of writing).

First, add the jekyll-algolia gem to your Gemfile, in the :jekyll_plugins section.

If you do not yet have a Gemfile, here is the minimal content to get you started. You will also need Bundler to be able to use the Gemfile.

source 'https://rubygems.org'

gem 'jekyll', '~> 3.6'

group :jekyll_plugins do
  gem 'jekyll-algolia'
end

Once this is done, download all dependencies with bundle install.

If everything went well, you should be able to run jekyll help and see the algolia subcommand listed.

Basic configuration

Add your Algolia credentials under the algolia section of your _config.yml file like this:

algolia:
  application_id: 'your_application_id'
  index_name:     'your_index_name'

If you don't yet have an Algolia account, you can sign up for a free Community plan. If you already have an account, you can get your credentials from your dashboard.

Your API key will be read from the ALGOLIA_API_KEY environment variable. You can define it on the same line as your command, allowing you to type ALGOLIA_API_KEY='your_api_key' jekyll algolia.

⚠ Other, insecure, method ⚠

You can also store your API key in a file named _algolia_api_key, in your source directory. If you do this we very, very, very strongly encourage you to make sure the file is not tracked in your versioning system.
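
The two ways of supplying the API key can be sketched as a small lookup. This is an illustration, not the plugin's actual code: the helper name `resolve_api_key` is hypothetical, and the precedence (environment variable first, then the file) is an assumption.

```ruby
# Hypothetical helper: look for the key in the environment first, then
# fall back to the _algolia_api_key file in the Jekyll source directory.
# The precedence order is an assumption, not confirmed plugin behaviour.
def resolve_api_key(env, source_dir)
  key = env['ALGOLIA_API_KEY']
  return key unless key.nil? || key.empty?

  key_file = File.join(source_dir, '_algolia_api_key')
  return File.read(key_file).strip if File.exist?(key_file)

  nil
end
```

Calling `resolve_api_key(ENV, '.')` from the root of your site would return the key, or `nil` when neither source is set.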

How it works

The plugin will work like a jekyll build run, but instead of writing .html files to disk, it will push content to Algolia.

It will split each page of your website into small chunks (by default, one per <p> paragraph) and then push each chunk as a new record to Algolia. Splitting records that way yields a better relevance of results even on long pages.

The placement of each paragraph in the page heading hierarchy (title, subtitles through <h1> to <h6>) is also taken into account to further improve relevance of results.

Each record will also contain metadata about the page it was extracted from (including slug, url, tags, categories, collection and any custom field added to the front-matter).
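
To make the splitting more concrete, here is a deliberately simplified sketch of how a page could be turned into one record per paragraph, each carrying its position in the heading hierarchy. It uses regular expressions instead of a real HTML parser, and the method name and record fields (`content`, `headings`) are hypothetical; the plugin's real records may look different.

```ruby
# Simplified illustration only: a real HTML parser is more robust than
# these regular expressions, and the record fields are hypothetical.
def extract_records(html)
  hierarchy = {} # current heading text per level, e.g. { 1 => "Guide" }
  records = []

  html.scan(%r{<(h[1-6]|p)\b[^>]*>(.*?)</\1>}m) do |tag, inner|
    text = inner.gsub(/<[^>]+>/, '').strip # drop nested tags
    if tag == 'p'
      # One record per paragraph, with a snapshot of the headings above it.
      records << { content: text, headings: hierarchy.sort.map { |_, h| h } }
    else
      level = tag[1].to_i
      # A new heading resets itself and every deeper level.
      hierarchy.delete_if { |l, _| l >= level }
      hierarchy[level] = text
    end
  end

  records
end
```

With this sketch, a paragraph under an `<h2>` that itself sits under an `<h1>` gets both heading texts attached, which is the kind of context that can be used for ranking.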

Every time you run jekyll algolia, a full build of the website is run locally, but only records that were changed since your last build will be updated in your index.

Advanced configuration

The plugin should work out of the box for most websites, but there are options you can tweak if needed. All the options should be added under the algolia section of your _config.yml file.
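
As a rough mental model, user options can be thought of as being merged over a set of defaults. The sketch below assumes the default values described in this README; the constant and method names are hypothetical, and the exact default string for `extensions_to_index` is a guess based on "HTML and Markdown".

```ruby
require 'yaml'

# Defaults as described in this README. The names below are hypothetical
# and the 'html,md' value is an assumption.
ALGOLIA_DEFAULTS = {
  'nodes_to_index' => 'p',
  'extensions_to_index' => 'html,md',
  'files_to_exclude' => ['index.html', 'index.md'],
  'indexing_batch_size' => 1000,
  'indexing_mode' => 'diff'
}.freeze

# Reads the algolia section of a _config.yml string and fills in
# any option the user did not set.
def algolia_config(config_yaml)
  site_config = YAML.safe_load(config_yaml) || {}
  ALGOLIA_DEFAULTS.merge(site_config['algolia'] || {})
end
```

For example, `algolia_config(File.read('_config.yml'))` would return your own values where you set them and the defaults everywhere else.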

nodes_to_index

By default, each page of your website will be split into chunks based on this CSS selector. The default value of p means that one record will be created for each <p> in your generated content.

If you would like to index other elements, like <blockquote>, <li> or a custom <div class="paragraph">, you should edit the value like this:

algolia:
  # Also index quotes, list items and custom paragraphs
  nodes_to_index: 'p,blockquote,li,div.paragraph'

extensions_to_index

By default, HTML and Markdown files will be indexed. If you are using another markup language (such as AsciiDoc or Textile), then you should override this option.

algolia:
  # Also index AsciiDoc and Textile files
  extensions_to_index: 'html,md,adoc,textile'

files_to_exclude

The plugin will try to be smart about which pages it should not index. Some files will always be excluded from the indexing (static assets, custom 404 and pagination pages). Others are handled by the files_to_exclude option.

By default it will exclude all the index.html and index.md files. Those files usually contain little text (landing pages) or redundant text (lists of latest blog articles), so we decided to exclude them by default.

If you actually want to index those files, you should set the value to an empty array.

algolia:
  # Actually index the index.html/index.md pages
  files_to_exclude: []

If you want to exclude more files, you should add them to the array:

algolia:
  # Exclude more files from indexing
  files_to_exclude:
    - index.html
    - index.md
    - excluded-file.html
    - /_posts/2017-01-20-date-to-forget.md

settings

By default the plugin will configure your Algolia index with settings tailored to the format of the extracted records. You are of course free to overwrite them or configure them as best suits your needs. Every option passed to the settings entry will be passed to a call to set_settings.

For example if you want to change the HTML tag used for the highlighting, you can overwrite it like this:

algolia:
  settings:
    highlightPreTag: '<em class="custom_highlight">'
    highlightPostTag: '</em>'

indexing_batch_size

The Algolia API allows you to send batches of changes to add or update several records at once, instead of doing one HTTP call per record. By default, the plugin will batch updates in groups of 1000 records.

If you are on an unstable internet connection, you might want to decrease the value. You will send more batches, but each will be smaller in size.

algolia:
  # Send fewer records per batch
  indexing_batch_size: 500
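
The effect of the batch size can be sketched in a few lines of Ruby. The helper name `batches` is hypothetical and this is not the plugin's code; it only shows how a list of records is cut into groups before being sent.

```ruby
# Hypothetical helper: split records into groups of at most batch_size.
# Each group would then be sent to Algolia in a single HTTP call.
def batches(records, batch_size = 1000)
  records.each_slice(batch_size).to_a
end
```

With 2500 records, the default of 1000 gives three batches (1000, 1000 and 500 records), while a value of 500 gives five smaller ones.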

indexing_mode

Synchronizing your local data with your Algolia index can be done in different ways. By default, the plugin will use the diff indexing mode but you might also be interested in the atomic mode.

diff (default)

By default, the plugin will try to be smart when pushing content to your index: it will only push new records and delete old ones instead of overwriting everything.

To do so, we first grab the list of all records residing in your index, then compare them with the ones generated locally. We then delete the old records that no longer exist, and add the newly created records.

The main advantage is that it will consume very few operations in your Algolia quota. The drawback is that it will put your index into an inconsistent state for a few seconds (old records were deleted, but new ones were not yet added). Users doing a search on your website at that time might get incomplete results.
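
The comparison step of the diff mode can be sketched as a set difference on record identifiers. The sketch assumes that each record is identified by a hash of its content, so that an edited paragraph counts as one deletion plus one addition; the plugin's real identifier scheme may differ, and both method names are hypothetical.

```ruby
require 'digest'

# Assumption: a record's ID is a hash of its content, so any change to a
# record produces a new ID. The plugin's real scheme may differ.
def object_id_for(record)
  Digest::MD5.hexdigest(record.sort_by { |key, _| key.to_s }.inspect)
end

# Compares the IDs currently in the index with the locally generated
# records and returns what to delete and what to add.
def diff_plan(remote_ids, local_records)
  local_by_id = local_records.map { |r| [object_id_for(r), r] }.to_h
  {
    delete: remote_ids - local_by_id.keys,
    add: local_by_id.reject { |id, _| remote_ids.include?(id) }.values
  }
end
```

Records that exist on both sides appear in neither list, which is why this mode uses so few operations when only a handful of pages changed.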

atomic

Using the atomic indexing mode, your users will never search an inconsistent index. They will either be searching the index containing the old data, or the one containing the new data, but never an intermediate state.

To do so, the plugin will actually push all data to a temporary index first. Once everything is copied and configured, it will then overwrite the old index with the temporary one.

The main advantage is that it will be completely transparent for your users. The drawback is that it will consume many more operations, as you will have to push all your records to a new index each time.
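
The idea behind the atomic mode can be modelled with a plain Ruby hash standing in for your Algolia indices. Nothing here calls the Algolia API; the method name and the `_tmp` suffix are hypothetical and only illustrate the "build aside, then swap" pattern.

```ruby
# In-memory stand-in for a set of indices; this is not the Algolia client.
def atomic_update(indices, index_name, records, settings = {})
  tmp_name = "#{index_name}_tmp" # hypothetical naming scheme

  # Step 1: build the complete new index next to the live one.
  indices[tmp_name] = { settings: settings, records: records.dup }

  # Until this line runs, searches still see the old data in full.
  # Step 2: replace the live index with the temporary one in one move.
  indices[index_name] = indices.delete(tmp_name)
  indices
end
```

Every record is written to the temporary index on every run, which is where the extra operations come from.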

Thanks

Thanks to Anatoliy Yastreb for a great tutorial on creating Jekyll plugins.
