Jekyll Algolia Plugin
Jekyll plugin to automatically index your content into Algolia.
⚠ Unreleased beta version
This plugin is not yet released on Rubygems. If you want to try it, clone the
repository and then update your Gemfile to point to the path on disk, like this:
group :jekyll_plugins do
  gem "jekyll-algolia", :path => "/path/to/the/cloned/repo"
end
Feedback very welcome!
Usage
$ jekyll algolia
This will push the content of your Jekyll website to your Algolia index.
Installation
The plugin requires at least Jekyll 3.6.2 and Ruby 2.2.8 (the versions deployed on GitHub Pages at the time of writing).
First, add the jekyll-algolia gem to your Gemfile, in the :jekyll_plugins section.
If you do not yet have a Gemfile, here is the minimal content to get you started. You will also need Bundler to be able to use the Gemfile.
source 'https://rubygems.org'
gem 'jekyll', '~> 3.6'
group :jekyll_plugins do
  gem 'jekyll-algolia'
end
Once this is done, download all dependencies with bundle install.
If everything went well, you should be able to run jekyll help and see the algolia subcommand listed.
Basic configuration
Add your Algolia credentials under the algolia section of your _config.yml file like this:
algolia:
  application_id: 'your_application_id'
  index_name: 'your_index_name'
If you don't yet have an Algolia account, you can open a free Community plan here. If you already have an account, you can get your credentials from your dashboard.
Your API key will be read from the ALGOLIA_API_KEY environment variable. You can define it on the same line as your command, allowing you to type:
ALGOLIA_API_KEY='your_api_key' jekyll algolia
⚠ Other, insecure, method ⚠
You can also store your API key in a file named _algolia_api_key, in your source directory. If you do this, we very strongly encourage you to make sure the file is not tracked in your versioning system.
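The two ways of providing the API key can be sketched in Ruby as follows. This is a minimal sketch, not the plugin's actual code: resolve_api_key is a hypothetical helper, and giving the environment variable precedence over the file is an assumption.

```ruby
# Hypothetical helper (not part of the plugin): resolve the API key from
# the ALGOLIA_API_KEY environment variable, falling back to the
# _algolia_api_key file in the Jekyll source directory.
# Assumption: the environment variable takes precedence over the file.
def resolve_api_key(env, source_dir)
  key = env['ALGOLIA_API_KEY']
  return key unless key.nil? || key.empty?

  key_file = File.join(source_dir, '_algolia_api_key')
  return nil unless File.file?(key_file)

  # Strip the trailing newline most editors add to the file
  content = File.read(key_file).strip
  content.empty? ? nil : content
end

# Usage: pass the process environment and the site source directory
api_key = resolve_api_key(ENV, Dir.pwd)
```

Passing the environment as an argument keeps the helper easy to exercise with a plain Hash instead of the real ENV.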
How it works
The plugin will work like a jekyll build run, but instead of writing .html files to disk, it will push content to Algolia.
It will split each page of your website into small chunks (by default, one per <p> paragraph) and then push each chunk as a new record to Algolia. Splitting records this way yields more relevant results, even on long pages.
The placement of each paragraph in the page heading hierarchy (title and subtitles, from <h1> to <h6>) is also taken into account to further improve the relevance of results.
Each record will also contain metadata about the page it was extracted from (including slug, url, tags, categories, collection and any custom field added to the front-matter).
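To make the record structure more concrete, here is a simplified Ruby sketch of the splitting described above. It is an illustration under stated assumptions, not the plugin's implementation: extract_records is a hypothetical helper, the record keys (:text, :hierarchy) are made up for the example, and a regular expression stands in for a real HTML parser, so it only handles flat, well-formed markup.

```ruby
# Simplified sketch: create one record per <p>, remembering the headings
# each paragraph sits under and attaching the page metadata.
# Assumption: a regex scan is enough for this flat example; real pages
# need a proper HTML parser.
def extract_records(html, metadata = {})
  hierarchy = {} # heading level (1..6) => heading text
  records = []

  html.scan(%r{<(h[1-6]|p)\b[^>]*>(.*?)</\1>}m) do |tag, inner|
    text = inner.gsub(/<[^>]+>/, '').strip # drop nested inline tags
    if tag == 'p'
      records << metadata.merge(text: text, hierarchy: hierarchy.dup)
    else
      level = tag[1].to_i
      # A new heading resets itself and every deeper heading level
      hierarchy.delete_if { |existing, _| existing >= level }
      hierarchy[level] = text
    end
  end
  records
end

html = '<h1>Guide</h1><p>Intro</p><h2>Setup</h2><p>Install the gem</p>'
records = extract_records(html, { url: '/guide.html' })
# records.last is:
# { url: '/guide.html', text: 'Install the gem',
#   hierarchy: { 1 => 'Guide', 2 => 'Setup' } }
```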
Every time you run jekyll algolia, a full build of the website is run locally, but only records that were changed since your last build will be updated in your index.
Advanced configuration
The plugin should work out of the box for most websites, but there are options you can tweak if needed. All the options should be added under the algolia section of your _config.yml file.
nodes_to_index
By default, each page of your website will be split into chunks based on this CSS selector. The default value of p means that one record will be created for each <p> in your generated content.
If you would like to index other elements, like <blockquote>, <li> or a custom <div class="paragraph">, you should edit the value like this:
algolia:
  # Also index quotes, list items and custom paragraphs
  nodes_to_index: 'p,blockquote,li,div.paragraph'
extensions_to_index
By default, HTML and Markdown files will be indexed. If you are using another markup language (such as AsciiDoc or Textile), then you should overwrite this option.
algolia:
  # Also index AsciiDoc and Textile files
  extensions_to_index: 'html,md,adoc,textile'
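As an illustration of how such a comma-separated value could be applied, here is a small Ruby sketch. indexable_extension? is a hypothetical helper, and 'html,md' is only a stand-in for the default, since the exact default string is not given here.

```ruby
# Hypothetical helper: decide whether a file should be indexed based on
# its extension and the comma-separated extensions_to_index value.
# Assumption: 'html,md' stands in for the plugin's default value.
def indexable_extension?(path, extensions_to_index = 'html,md')
  allowed = extensions_to_index.split(',').map(&:strip)
  extension = File.extname(path).sub(/\A\./, '') # '.md' becomes 'md'
  allowed.include?(extension)
end

indexable_extension?('about.md')                           # => true
indexable_extension?('guide.adoc')                         # => false
indexable_extension?('guide.adoc', 'html,md,adoc,textile') # => true
```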
files_to_exclude
The plugin will try to be smart about which pages it should not index. Some files will always be excluded from indexing (static assets, custom 404 and pagination pages). Others are handled by the files_to_exclude option.
By default it will exclude all the index.html and index.md files. Those files usually contain little text (landing pages) or redundant text (latest blog articles), so we decided to exclude them by default.
If you actually want to index those files, you should set the value to an empty array.
algolia:
  # Actually index the index.html/index.md pages
  files_to_exclude: []
If you want to exclude more files, you should add them to the array:
algolia:
  # Exclude more files from indexing
  files_to_exclude:
    - index.html
    - index.md
    - excluded-file.html
    - /_posts/2017-01-20-date-to-forget.md
settings
By default the plugin will configure your Algolia index with settings tailored to the format of the extracted records. You are of course free to overwrite them or configure them as best suits your needs. Every option passed to the settings entry will be passed to a call to set_settings.
For example if you want to change the HTML tag used for the highlighting, you can overwrite it like this:
algolia:
  settings:
    highlightPreTag: '<em class="custom_highlight">'
    highlightPostTag: '</em>'
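The overwrite behaviour can be pictured as a hash merge, where your values win over the defaults and untouched defaults are kept. The Ruby sketch below is only an illustration: final_settings is a hypothetical helper, and the DEFAULT_SETTINGS values are placeholders, not the plugin's real defaults.

```ruby
# Placeholder defaults for illustration only; the plugin's real defaults
# are not listed in this document.
DEFAULT_SETTINGS = {
  'highlightPreTag' => '<em>',
  'highlightPostTag' => '</em>',
  'hitsPerPage' => 20
}.freeze

# Hypothetical helper: user-provided settings take priority over defaults
def final_settings(user_settings)
  DEFAULT_SETTINGS.merge(user_settings || {})
end

settings = final_settings({ 'highlightPreTag' => '<em class="custom_highlight">' })
# settings['highlightPreTag']  is '<em class="custom_highlight">'
# settings['highlightPostTag'] is still '</em>'
```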
indexing_batch_size
The Algolia API allows you to send batches of changes to add or update several records at once, instead of doing one HTTP call per record. The plugin will batch updates by groups of 1000 records.
If you are on an unstable internet connection, you might want to decrease the value. You will send more batches, but each will be smaller in size.
algolia:
  # Send fewer records per batch
  indexing_batch_size: 500
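To see the effect of the batch size, here is a short Ruby sketch that groups records the way a batching step might. batches is a hypothetical helper and the records are dummy data; each resulting group would correspond to one HTTP call.

```ruby
# Hypothetical helper: group records into batches of at most batch_size
def batches(records, batch_size = 1000)
  records.each_slice(batch_size).to_a
end

# Dummy records, just to count batch sizes
records = (1..2300).map { |i| { objectID: i } }

batches(records).map(&:length)      # => [1000, 1000, 300]
batches(records, 500).map(&:length) # => [500, 500, 500, 500, 300]
```

With 2300 records, lowering the value from 1000 to 500 turns three larger calls into five smaller ones.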
indexing_mode
Synchronizing your local data with your Algolia index can be done in different ways. By default, the plugin will use the diff indexing mode but you might also be interested in the atomic mode.
diff (default)
By default, the plugin will try to be smart when pushing content to your index: it will only push new records and delete old ones instead of overwriting everything.
To do so, we first grab the list of all records residing in your index, then compare them with the ones generated locally. We then delete the old records that no longer exist, and add the newly created records.
The main advantage is that it will consume very few operations in your Algolia quota. The drawback is that it will put your index into an inconsistent state for a few seconds (old records were deleted, but new ones were not yet added). Users doing a search on your website at that time might get incomplete results.
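The comparison step can be sketched in Ruby as follows. This is a minimal sketch under assumptions, not the plugin's code: content_id and diff_records are hypothetical helpers, and identifying each record by a hash of its content (so that an edited paragraph counts as one deletion plus one addition) is an assumption made for the example.

```ruby
require 'digest'
require 'set'

# Assumption: a record is identified by a hash of its content, so any
# change to a record produces a different id.
def content_id(record)
  Digest::MD5.hexdigest(record.sort_by { |key, _| key.to_s }.inspect)
end

# Compare the ids found in the remote index with the local records and
# return what must be deleted remotely and what must be added.
def diff_records(remote_ids, local_records)
  local_by_id = local_records.map { |record| [content_id(record), record] }.to_h
  remote = remote_ids.to_set
  {
    to_delete: remote_ids.reject { |id| local_by_id.key?(id) },
    to_add: local_by_id.reject { |id, _| remote.include?(id) }.values
  }
end

unchanged = { text: 'Same paragraph' }
before    = { text: 'Old wording' }
after     = { text: 'New wording' }

remote_ids = [content_id(unchanged), content_id(before)]
changes = diff_records(remote_ids, [unchanged, after])
# changes[:to_delete] holds the id of the old wording,
# changes[:to_add] holds only the reworded record.
```

Only the records in to_delete and to_add would need an API operation, which is why this mode is cheap on quota.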
atomic
Using the atomic indexing mode, your users will never search into an inconsistent index. They will either be searching into the index containing the old data, or the one containing the new data, but never in an intermediate state.
To do so, the plugin will actually push all data to a temporary index first. Once everything is copied and configured, it will then overwrite the old index with the temporary one.
The main advantage is that it will be completely transparent for your users. The drawback is that it will consume many more operations, as you will have to push all your records to a new index each time.
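The idea can be illustrated with an in-memory Ruby sketch. Nothing here talks to Algolia: the indices Hash is a stand-in for the remote service, atomic_update is a hypothetical helper, and the "_tmp" suffix is an assumed naming scheme.

```ruby
# In-memory stand-in for illustration only: `indices` maps an index name
# to its array of records.
def atomic_update(indices, index_name, records)
  tmp_name = "#{index_name}_tmp" # assumed naming scheme

  # 1. Push every record to the temporary index. Searches still read
  #    indices[index_name], which holds the old data.
  indices[tmp_name] = records.dup

  # 2. Replace the live index with the temporary one in a single step.
  indices[index_name] = indices.delete(tmp_name)
  indices
end

indices = { 'my_site' => [{ text: 'old content' }] }
atomic_update(indices, 'my_site', [{ text: 'new content' }])
# indices is now { 'my_site' => [{ text: 'new content' }] }
```

Every record is written on each run, which is where the extra operations come from.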
Thanks
Thanks to Anatoliy Yastreb for a great tutorial on creating Jekyll plugins.