Jekyll Algolia Plugin
Jekyll plugin to automatically index your content on Algolia.
⚠ Unreleased beta version
This plugin has not yet been released on Rubygems. If you wish to try it, simply point your Gemfile to the develop branch of this repo:
group :jekyll_plugins do
  gem "jekyll-algolia", git: "https://github.com/algolia/jekyll-algolia", branch: "develop"
end
Alternatively, clone the repository first and then update your site's Gemfile
to point to the path on disk like this:
group :jekyll_plugins do
  gem "jekyll-algolia", :path => "/path/to/the/cloned/repo"
end
Feedback very welcome!
Usage
$ bundle exec jekyll algolia
This will push the content of your Jekyll website to your Algolia index.
Installation
The plugin requires a minimum version of Jekyll 3.6.0 and a minimum Ruby version of 2.3.0.
First, add the jekyll-algolia gem to your Gemfile, in the :jekyll_plugins section.
If you do not have a Gemfile already, here is the minimal content to get you started. You will also need Bundler to be able to use the Gemfile.
source 'https://rubygems.org'

gem 'jekyll', '~> 3.6'

group :jekyll_plugins do
  gem 'jekyll-algolia'
end
Once this is done, download all dependencies with bundle install.
If everything went well, you should be able to run jekyll help and see the algolia subcommand listed.
Basic configuration
You need to provide certain Algolia credentials for this plugin to successfully index your site.
If you don't yet have an Algolia account, you can open a free Community plan here. Once signed in, you can get your credentials from your dashboard.
The plugin will first try to read the credentials from your environment variables, and will fall back to your Jekyll configuration if they are not found there.
To pass the credentials as ENV variables, set them on the same line as the jekyll algolia command:
# for example
ALGOLIA_APPLICATION_ID='your_application_id' jekyll algolia
The valid ENV variables are:
key | value |
---|---|
ALGOLIA_APPLICATION_ID | your_application_id |
ALGOLIA_API_KEY | your_api_key |
ALGOLIA_INDEX_NAME | your_index_name |
As a fallback measure, the plugin also checks if application_id and index_name are available under the algolia key in your _config.yml file, like this:
# _config.yml
algolia:
  application_id: 'your_application_id'
  index_name: 'your_index_name'
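The lookup order described above (environment first, then _config.yml) can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the helper name algolia_credential is hypothetical and is not taken from the plugin's source, and the config argument stands for the hash Jekyll would load from _config.yml.

```ruby
# Hypothetical helper, NOT the plugin's actual code: it only illustrates
# the "ENV first, then _config.yml" lookup order described in this README.
def algolia_credential(name, config, env = ENV)
  # e.g. 'application_id' is looked up as ALGOLIA_APPLICATION_ID
  value = env["ALGOLIA_#{name.upcase}"]
  return value unless value.nil? || value.empty?

  # Fall back to the `algolia` section of the site configuration
  (config['algolia'] || {})[name]
end

config = { 'algolia' => { 'index_name' => 'your_index_name' } }
algolia_credential('index_name', config, {}) # => "your_index_name"
```

Passing the environment as an argument keeps the sketch easy to try out with a plain hash instead of the real ENV.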
⚠ Other, insecure, method ⚠
You can also store your confidential API key in a file named _algolia_api_key, in your source directory. If you do this, we very, very, very strongly encourage you to make sure the file is not tracked in your versioning system.
How it works
The plugin will work like a jekyll build run, but instead of writing .html files to disk, it will push content to Algolia.
It will split each page of your website into small chunks (by default, one per <p> paragraph) and then push each chunk as a new record to Algolia. Splitting pages that way yields more relevant results, even on long pages.
The placement of each paragraph in the page heading hierarchy (title and subtitles, from <h1> to <h6>) is also taken into account to further improve the relevance of results.
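To make the idea of "one record per paragraph, tagged with its heading hierarchy" concrete, here is a toy sketch in plain Ruby. It is not the plugin's extraction code: the function name extract_records and the record shape are assumptions made for illustration, and the regular expression only handles simple, attribute-free tags.

```ruby
# Toy illustration only: a regex is NOT a real HTML parser, and the
# record shape below is an assumption, not the plugin's actual format.
def extract_records(html)
  hierarchy = {} # heading level (1..6) => heading text currently in scope
  records = []

  html.scan(%r{<(h[1-6]|p)>(.*?)</\1>}m) do |tag, text|
    if tag == 'p'
      # Each paragraph becomes one record, tagged with the headings above it
      records << { content: text.strip, hierarchy: hierarchy.dup }
    else
      level = tag[1].to_i
      # A new heading closes every heading of the same or deeper level
      hierarchy = hierarchy.select { |existing_level, _| existing_level < level }
      hierarchy[level] = text.strip
    end
  end

  records
end

html = '<h1>Guide</h1><p>Intro</p><h2>Setup</h2><p>Install it</p>'
extract_records(html).last
# => { content: "Install it", hierarchy: { 1 => "Guide", 2 => "Setup" } }
```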
Each record will also contain metadata about the page it was extracted from (including slug, url, tags, categories, collection and any custom field added to the front-matter).
Every time you run jekyll algolia, a full build of the website is run locally, but only records that were changed since your last build will be updated in your index.
Advanced configuration
The plugin should work out of the box for most websites, but there are options you can tweak if needed. All the options should be added under the algolia section of your _config.yml file.
nodes_to_index
By default, each page of your website will be split into chunks based on this CSS selector. The default value of p means that one record will be created for each <p> in your generated content.
If you would like to index other elements, like <blockquote>, <li> or a custom <div class="paragraph">, you should edit the value like this:
algolia:
  # Also index quotes, list items and custom paragraphs
  nodes_to_index: 'p,blockquote,li,div.paragraph'
extensions_to_index
By default, pages whose source is an HTML or Markdown file will be indexed. If you are using another markup language (such as AsciiDoc or Textile), then you should overwrite this option.
For example, the md extension here means that *.md source files will be converted to their HTML version, which will then be indexed.
algolia:
  # Also index AsciiDoc and Textile files
  extensions_to_index: 'html,md,adoc,textile'
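As a rough illustration of how a comma-separated extension list like the one above could be checked against a source file, consider the following sketch. The helper name indexable_extension? is hypothetical; the plugin's own check may differ.

```ruby
# Hypothetical helper for illustration; the plugin's own check may differ.
def indexable_extension?(path, extensions_to_index)
  allowed = extensions_to_index.split(',').map(&:strip)
  # File.extname returns ".md" for "post.md", so drop the leading dot
  extension = File.extname(path).sub(/\A\./, '')
  allowed.include?(extension)
end

indexable_extension?('docs/guide.adoc', 'html,md,adoc,textile') # => true
indexable_extension?('assets/logo.png', 'html,md,adoc,textile') # => false
```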
files_to_exclude
The plugin will try to be smart about the pages it should not index. Some files will always be excluded from the indexing (static assets, custom 404 and pagination pages). Others are handled by the files_to_exclude option.
By default it will exclude all the index.html and index.md files. Those files usually contain either little text (landing pages) or redundant text (latest blog articles), so we decided to exclude them by default.
If you actually want to index those files, you should set the value to an empty array.
algolia:
  # Actually index the index.html/index.md pages
  files_to_exclude: []
If you want to exclude more files, you should add them to the array. Note that you can use glob patterns to exclude several files at once.
algolia:
  # Exclude more files from indexing
  files_to_exclude:
    - index.html
    - index.md
    - excluded-file.html
    - _posts/2017-01-20-date-to-forget.md
    - subdirectory/*.html
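Glob matching of this kind can be tried out with Ruby's standard File.fnmatch. The sketch below is an assumption about how such patterns might be applied to paths relative to the source directory; the helper name excluded? is hypothetical, and the plugin's actual matching rules are not documented here.

```ruby
# Hypothetical helper: one possible way to apply glob patterns with Ruby's
# standard library. The plugin's actual matching rules may differ.
def excluded?(relative_path, patterns)
  patterns.any? do |pattern|
    # FNM_PATHNAME keeps "*" from matching across directory separators
    File.fnmatch(pattern, relative_path, File::FNM_PATHNAME)
  end
end

patterns = ['index.html', 'index.md', 'subdirectory/*.html']
excluded?('subdirectory/page.html', patterns) # => true
excluded?('about.md', patterns)               # => false
```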
settings
By default the plugin will configure your Algolia index with settings tailored to the format of the extracted records. You are of course free to overwrite them or configure them as best suits your needs. Every option passed to the settings entry will be passed to a call to set_settings.
For example, if you want to change the HTML tag used for the highlighting, you can overwrite it like this:
algolia:
  settings:
    highlightPreTag: '<em class="custom_highlight">'
    highlightPostTag: '</em>'
indexing_batch_size
The Algolia API allows you to send batches of changes to add or update several records at once, instead of doing one HTTP call per record. By default, the plugin will batch updates in groups of 1000 records.
If you are on an unstable internet connection, you might want to decrease the value. You will send more batches, but each will be smaller in size.
algolia:
  # Send fewer records per batch
  indexing_batch_size: 500
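The effect of the batch size is easy to see with Ruby's each_slice. This is a minimal sketch, assuming the records are held in a plain array; the helper name batches is hypothetical and no HTTP call is made.

```ruby
# Hypothetical helper: splits records into groups of at most batch_size.
# Each group would correspond to one batched API call.
def batches(records, batch_size = 1000)
  records.each_slice(batch_size).to_a
end

records = (1..2500).to_a
batches(records).map(&:size)      # => [1000, 1000, 500]
batches(records, 500).map(&:size) # => [500, 500, 500, 500, 500]
```

Halving the batch size doubles the number of calls here, which is the trade-off described above.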
indexing_mode
Synchronizing your local data with your Algolia index can be done in different ways. By default, the plugin will use the diff indexing mode but you might also be interested in the atomic mode.
diff (default)
By default, the plugin will try to be smart when pushing content to your index: it will only push new records and delete old ones instead of overwriting everything.
To do so, we first need to grab the list of all records residing in your index, then compare them with the ones generated locally. We then delete the old records that no longer exist, and add the newly created records.
The main advantage is that it will consume very few operations in your Algolia quota. The drawback is that it will put your index into an inconsistent state for a few seconds (records were deleted, but new ones were not yet added). Users doing a search on your website at that time might get incomplete results.
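The comparison step can be sketched as a set difference on record identifiers. The sketch below assumes each record is identified by a hash of its content, which is an assumption made for illustration and not a documented detail of the plugin; record_id and diff_records are hypothetical names.

```ruby
require 'digest'
require 'json'

# ASSUMPTION: a record is identified by a hash of its content, so any
# change to a record produces a new identifier.
def record_id(record)
  Digest::MD5.hexdigest(JSON.generate(record))
end

# Hypothetical helper: given the ids already in the index and the records
# generated locally, work out what to delete and what to add.
def diff_records(remote_ids, local_records)
  local_by_id = local_records.map { |record| [record_id(record), record] }.to_h
  {
    delete: remote_ids - local_by_id.keys,
    add: local_by_id.reject { |id, _| remote_ids.include?(id) }.values
  }
end

unchanged = { content: 'Same paragraph' }
added = { content: 'New paragraph' }
diff = diff_records([record_id(unchanged), 'stale-id'], [unchanged, added])
diff[:delete] # => ["stale-id"]
diff[:add]    # => [{ content: "New paragraph" }]
```

Unchanged records appear in neither list, which is why this mode consumes few operations.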
atomic
Using the atomic indexing mode, your users will never search an inconsistent index. They will either be searching the index containing the old data, or the one containing the new data, but never an index in an intermediate state.
To do so, the plugin will actually push all data to a temporary index first. Once everything is copied and configured, it will then overwrite the old index with the temporary one.
The main advantage is that it will be completely transparent for your users. The drawback is that it will consume many more operations, as you will have to push all your records to a new index each time.
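The "write to a temporary index, then swap" idea can be illustrated with an in-memory stand-in. Everything in this sketch is hypothetical: FakeIndices is not the Algolia client, and the _tmp suffix is an assumed naming scheme, not necessarily the one the plugin uses.

```ruby
# In-memory stand-in for a set of indices. This is NOT the Algolia client;
# it only models "write" and "move" to show the order of operations.
class FakeIndices
  def initialize
    @indices = {}
  end

  def write(name, records)
    @indices[name] = records.dup
  end

  # Replaces the destination with the source in a single step
  def move(from, to)
    @indices[to] = @indices.delete(from)
  end

  def read(name)
    @indices.fetch(name, [])
  end

  def names
    @indices.keys
  end
end

# Hypothetical helper: readers of index_name see either the old records or
# the new ones, never a half-updated mix.
def atomic_update(indices, index_name, records)
  tmp_name = "#{index_name}_tmp" # assumed naming scheme
  indices.write(tmp_name, records)
  indices.move(tmp_name, index_name)
end

indices = FakeIndices.new
indices.write('site', ['old record'])
atomic_update(indices, 'site', ['new record 1', 'new record 2'])
indices.read('site') # => ["new record 1", "new record 2"]
```

Every record is written again on each run, which matches the higher operation cost mentioned above.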
Thanks
Thanks to Anatoliy Yastreb for a great tutorial on creating Jekyll plugins.