Splitting README into doc pages
parent
c8272bc341
commit
71ac2c9398
335
README.md
@ -8,28 +8,6 @@ Status][3]](https://coveralls.io/github/algolia/jekyll-algolia?branch=master)
|
||||
|
||||
Jekyll plugin to automatically index your content on Algolia.
|
||||
|
||||
## ⚠ Unreleased beta version
|
||||
|
||||
This plugin has not **yet** been released on Rubygems. If you wish to try it,
|
||||
simply point your `Gemfile` to the `develop` branch of this repo:
|
||||
|
||||
```ruby
|
||||
group :jekyll_plugins do
|
||||
  gem "jekyll-algolia", git: "https://github.com/algolia/jekyll-algolia", branch: "develop"
|
||||
end
|
||||
```
|
||||
|
||||
Alternatively, clone the repository first and then update your site's `Gemfile`
|
||||
to point to the path on disk like this:
|
||||
|
||||
```ruby
|
||||
group :jekyll_plugins do
|
||||
  gem "jekyll-algolia", :path => "/path/to/the/cloned/repo"
|
||||
end
|
||||
```
|
||||
|
||||
Feedback very welcome!
|
||||
|
||||
## Usage
|
||||
|
||||
```shell
|
||||
@ -38,21 +16,20 @@ $ bundle exec jekyll algolia
|
||||
|
||||
This will push the content of your Jekyll website to your Algolia index.
|
||||
|
||||
## Documentation
|
||||
|
||||
Official documentation can be found on
|
||||
[https://community.algolia.com/jekyll-algolia/](https://community.algolia.com/jekyll-algolia/)
|
||||
|
||||
## Installation
|
||||
|
||||
The plugin requires a minimum version of Jekyll 3.6.0 and a minimum Ruby
|
||||
version of 2.3.0.
|
||||
The plugin requires at least Jekyll 3.6.0 and Ruby 2.3.0.
|
||||
|
||||
First, add the `jekyll-algolia` gem to your `Gemfile`, in the `:jekyll_plugins`
|
||||
section.
|
||||
|
||||
If you do not have a `Gemfile` already, here is the minimal content to get
|
||||
you started. You will also need [Bundler][8] to be able to use the `Gemfile`.
|
||||
|
||||
```ruby
|
||||
source 'https://rubygems.org'
|
||||
|
||||
gem 'jekyll', '~> 3.6'
|
||||
# Gemfile
|
||||
|
||||
group :jekyll_plugins do
|
||||
  gem 'jekyll-algolia'
|
||||
@ -61,40 +38,17 @@ end
|
||||
|
||||
Once this is done, download all dependencies with `bundle install`.
|
||||
|
||||
If everything went well, you should be able to run `jekyll help` and see the
|
||||
`algolia` subcommand listed.
|
||||
|
||||
## Basic configuration
|
||||
|
||||
You need to provide certain Algolia credentials for this plugin to successfully
|
||||
*index* your site.
|
||||
You need to provide certain Algolia credentials for this plugin to *index* your
|
||||
site.
|
||||
|
||||
*If you don't yet have an Algolia account, you can open a free [Community plan
|
||||
here][9]. Once signed in, you can get your credentials from
|
||||
[your dashboard][10].*
|
||||
|
||||
The plugin will try to fetch the credentials from your environment-variables
|
||||
hash and fall back to your Jekyll configuration if they are not found.
|
||||
|
||||
To pass the credentials as ENV variables, set them on the same command line
that runs `jekyll algolia`:
|
||||
|
||||
```sh
|
||||
# for example
|
||||
|
||||
ALGOLIA_APPLICATION_ID='your_application_id' jekyll algolia
|
||||
```
|
||||
The valid ENV variables are:
|
||||
|
||||
key | value
|
||||
---------------------- | ----------------------
|
||||
ALGOLIA_APPLICATION_ID | `your_application_id`
|
||||
ALGOLIA_API_KEY | `your_api_key`
|
||||
ALGOLIA_INDEX_NAME | `your_index_name`
|
||||
|
||||
As a fallback measure, the plugin also checks if `application_id` and
|
||||
`index_name` are available under the `algolia` key in your `_config.yml`
|
||||
file like this:
|
||||
Once you have your credentials, you should define your `application_id` and
|
||||
`index_name` inside your `_config.yml` file like this:
|
||||
|
||||
```yaml
|
||||
# _config.yml
|
||||
@ -104,271 +58,16 @@ algolia:
|
||||
  index_name: 'your_index_name'
|
||||
```
|
||||
|
||||
### ⚠ Other, insecure, method ⚠
|
||||
## Run it
|
||||
|
||||
You can also store your confidential API key in a file named `_algolia_api_key`,
|
||||
in your source directory. If you do this we __very, very, very strongly__
|
||||
encourage you to make sure the file is not tracked in your versioning system.
|
||||
Once your credentials are set up, you can run the indexing by running the
|
||||
following command:
|
||||
|
||||
## How it works
|
||||
|
||||
The plugin will work like a `jekyll build` run, but instead of writing `.html`
|
||||
files to disk, it will push content to Algolia.
|
||||
|
||||
It will split each page of your website into small chunks (by default, one per
|
||||
`<p>` paragraph) and then push each chunk as a new record to Algolia. Splitting
|
||||
records that way yields a better relevance of results even on long pages.
|
||||
|
||||
The placement of each paragraph in the page heading hierarchy (title, subtitles
|
||||
through `<h1>` to `<h6>`) is also taken into account to further improve
|
||||
relevance of results.
|
||||
|
||||
Each record will also contain metadata about the page it was extracted from
|
||||
(including `slug`, `url`, `tags`, `categories`, `collection` and any custom
|
||||
field added to the front-matter).
|
||||
|
||||
Every time you run `jekyll algolia`, a full build of the website is run locally,
|
||||
but only records that were changed since your last build will be updated in your
|
||||
index.
|
||||
|
||||
## Advanced configuration
|
||||
|
||||
The plugin should work out of the box for most websites, but there are options
|
||||
you can tweak if needed. All the options should be added under the `algolia`
|
||||
section of your `_config.yml` file.
|
||||
|
||||
### `nodes_to_index`
|
||||
|
||||
By default, each page of your website will be split into chunks based on this
|
||||
CSS selector. The default value of `p` means that one record will be created for
|
||||
each `<p>` in your generated content.
|
||||
|
||||
If you would like to index other elements, like `<blockquote>`,
|
||||
`<li>` or a custom `<div class="paragraph">`, you should edit the value
|
||||
like this:
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Also index quotes, list items and custom paragraphs
  nodes_to_index: 'p,blockquote,li,div.paragraph'
|
||||
```shell
|
||||
ALGOLIA_API_KEY='{your_admin_api_key}' bundle exec jekyll algolia
|
||||
```
|
||||
|
||||
### `extensions_to_index`
|
||||
|
||||
By default, pages whose source is an HTML or Markdown file will be indexed.
|
||||
If you are using another markup language (such as [AsciiDoc][11]
|
||||
or [Textile][12]), then you should overwrite this option.
|
||||
|
||||
For example, the `md` extension here means that `*.md` source files will be processed
|
||||
to generate their HTML version, which will be indexed.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Also index AsciiDoc and Textile files
  extensions_to_index: 'html,md,adoc,textile'
|
||||
```
|
||||
|
||||
### `files_to_exclude`
|
||||
|
||||
The plugin will try to be smart about which pages it should __not__ index. Some files
|
||||
will always be excluded from the indexing (static assets, custom 404 and
|
||||
pagination pages). Others are handled by the `files_to_exclude` option.
|
||||
|
||||
By default it will exclude all the `index.html` and `index.md` files. Those
|
||||
files usually contain little text (landing pages) or contain
|
||||
redundant text (latest blog articles) so we decided to exclude them by default.
|
||||
|
||||
If you actually want to index those files, you should set the value to an empty
|
||||
array.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Actually index the index.html/index.md pages
  files_to_exclude: []
|
||||
```
|
||||
|
||||
If you want to exclude more files, you should add them to the array. Note that
|
||||
you can use glob patterns to exclude several files at once.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Exclude more files from indexing
  files_to_exclude:
    - index.html
    - index.md
    - excluded-file.html
    - _posts/2017-01-20-date-to-forget.md
    - subdirectory/*.html
|
||||
```
|
||||
|
||||
### `settings`
|
||||
|
||||
By default the plugin will configure your Algolia index with settings tailored
|
||||
to the format of the extracted records. You are of course free to overwrite
|
||||
them or configure them as best suits your needs. Every option passed to the
|
||||
`settings` entry will be passed to a call to [set_settings][13].
|
||||
|
||||
For example if you want to change the HTML tag used for the highlighting, you
|
||||
can overwrite it like this:
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  settings:
    highlightPreTag: '<em class="custom_highlight">'
    highlightPostTag: '</em>'
|
||||
```
|
||||
|
||||
### `indexing_batch_size`
|
||||
|
||||
The Algolia API allows you to send batches of changes to add or update several
|
||||
records at once, instead of doing one HTTP call per record. The plugin will
|
||||
batch updates by groups of 1000 records.
|
||||
|
||||
If you are on an unstable internet connection, you might want to decrease the
|
||||
value. You will send more batches, but each will be smaller in size.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Send fewer records per batch
  indexing_batch_size: 500
|
||||
```
|
||||
|
||||
### `indexing_mode`
|
||||
|
||||
Synchronizing your local data with your Algolia index can be done in different
|
||||
ways. By default, the plugin will use the `diff` indexing mode but you might
|
||||
also be interested in the `atomic` mode.
|
||||
|
||||
#### `diff` (default)
|
||||
|
||||
By default, the plugin will try to be smart when pushing content to your index:
|
||||
it will only push new records and delete old ones instead of overwriting
|
||||
everything.
|
||||
|
||||
To do so, we first need to grab the list of all records residing in
|
||||
your index, then compare them with the ones generated locally. We then delete
|
||||
the old records that no longer exist, and then add the newly created records.
|
||||
|
||||
The main advantage is that it will consume very few operations in your Algolia
|
||||
quota. The drawback is that it will put your index into an inconsistent state
|
||||
for a few seconds (records were deleted, but new ones were not yet added). Users
|
||||
doing a search on your website at that time might have incomplete results.
|
||||
|
||||
#### `atomic`
|
||||
|
||||
Using the `atomic` indexing mode, your users will never search into an
|
||||
inconsistent index. They will either be searching into the index containing the
|
||||
old data, or the one containing the new data, but never in an intermediate
|
||||
state.
|
||||
|
||||
To do so, the plugin will actually push all data to a temporary index first.
|
||||
Once everything is copied and configured, it will then overwrite the old index
|
||||
with the temporary one.
|
||||
|
||||
The main advantage is that it will be completely transparent for your users. The
|
||||
drawback is that it will consume many more operations as you will have to push
|
||||
all your records to a new index each time.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- ## Custom hooks -->
|
||||
<!-- -->
|
||||
<!-- -->
|
||||
<!-- def self.hook_should_be_excluded?(_filepath) -->
|
||||
<!-- def self.hook_before_indexing_each(record, _node) -->
|
||||
<!-- def self.hook_before_indexing_all(records) -->
|
||||
|
||||
<!-- ## Command line -->
|
||||
<!-- -->
|
||||
<!-- Here is the list of command line options you can pass to the `jekyll algolia -->
|
||||
<!-- push` command: -->
|
||||
<!-- -->
|
||||
<!-- | Flag | Description | -->
|
||||
<!-- | ---- | ----- | -->
|
||||
<!-- | `--config ./_config.yml` | You can here specify the config file to use. Default is `_config.yml` | -->
|
||||
<!-- | `--future` | With this flag, the command will also index posts with a future date | -->
|
||||
<!-- | `--limit_posts 10` | Limits the number of posts to parse and index | -->
|
||||
<!-- | `--drafts` | Index drafts in the `_drafts` folder as well | -->
|
||||
<!-- | `--dry-run` or `-n` | Do a dry run, do not actually push anything to your index | -->
|
||||
<!-- | `--verbose` | Display more information about what is going to be indexed | -->
|
||||
|
||||
|
||||
<!-- ## Searching -->
|
||||
<!-- -->
|
||||
<!-- This plugin will index your data in your Algolia index. Building the front-end -->
|
||||
<!-- search is out of the scope of this plugin, but you can follow [our tutorials][14] or -->
|
||||
<!-- use our forked version of the popular [Hyde theme][15]. -->
|
||||
<!-- -->
|
||||
<!-- ## GitHub Pages -->
|
||||
<!-- -->
|
||||
<!-- The initial goal of the plugin was to allow anyone to have access to great -->
|
||||
<!-- search, even on a static website hosted on GitHub pages. -->
|
||||
<!-- -->
|
||||
<!-- But GitHub does not allow custom plugins to be run on GitHub Pages. -->
|
||||
<!-- This means that you'll either have to run `bundle exec jekyll algolia push` -->
|
||||
<!-- manually, or configure a CI environment (like [Travis][16]) to do it for you. -->
|
||||
<!-- -->
|
||||
<!-- [Travis CI][17] is a hosted continuous integration -->
|
||||
<!-- service, and it's free for open-source projects. Properly configured, it can -->
|
||||
<!-- automatically reindex your data whenever you push to `gh-pages`. -->
|
||||
<!-- -->
|
||||
<!-- For it to work, you'll have 3 steps to perform. -->
|
||||
<!-- -->
|
||||
<!-- ### 1. Create a `.travis.yml` file -->
|
||||
<!-- -->
|
||||
<!-- Create a file named `.travis.yml` at the root of your project, with the -->
|
||||
<!-- following content: -->
|
||||
<!-- -->
|
||||
<!-- ```yml -->
|
||||
<!-- language: ruby -->
|
||||
<!-- cache: bundler -->
|
||||
<!-- branches: -->
|
||||
<!-- only: -->
|
||||
<!-- - gh-pages -->
|
||||
<!-- script: -->
|
||||
<!-- - bundle exec jekyll algolia push -->
|
||||
<!-- rvm: -->
|
||||
<!-- - 2.2 -->
|
||||
<!-- ``` -->
|
||||
<!-- -->
|
||||
<!-- This file will be read by Travis and instruct it to fetch all dependencies -->
|
||||
<!-- defined in the `Gemfile`, then run `jekyll algolia push`. This will be -->
|
||||
<!-- triggered when data is pushed to the `gh-pages` branch. -->
|
||||
<!-- -->
|
||||
<!-- ### 2. Update your `_config.yml` file to exclude `vendor` -->
|
||||
<!-- -->
|
||||
<!-- Travis will download all your `Gemfile` dependencies into a directory named -->
|
||||
<!-- `vendor`. You have to tell Jekyll to ignore this directory, otherwise Jekyll -->
|
||||
<!-- will try to parse it (and fail). -->
|
||||
<!-- -->
|
||||
<!-- Doing so is easy, add the following line to your `_config.yml` file: -->
|
||||
<!-- -->
|
||||
<!-- ```yml -->
|
||||
<!-- exclude: [vendor] -->
|
||||
<!-- ``` -->
|
||||
<!-- -->
|
||||
<!-- ### 3. Configure Travis -->
|
||||
<!-- -->
|
||||
<!-- In order for Travis to be able to push data to your index on your behalf, you -->
|
||||
<!-- have to give it your write API Key. This is achieved by defining an -->
|
||||
<!-- `ALGOLIA_API_KEY` [environment variable][18] in Travis settings. -->
|
||||
<!-- -->
|
||||
<!-- You should also uncheck the "Build pull requests" option, otherwise any pull -->
|
||||
<!-- request targeting `gh-pages` will trigger the reindexing. -->
|
||||
<!-- -->
|
||||
<!-- ![Travis Configuration][19] -->
|
||||
<!-- -->
|
||||
<!-- ### Done -->
|
||||
<!-- -->
|
||||
<!-- Commit all the changes to the files, and then push to `gh-pages`. Travis will -->
|
||||
<!-- catch the event and trigger your indexing for you. You can follow the Travis job -->
|
||||
<!-- execution directly on [their website][20]. -->
|
||||
<!-- -->
|
||||
<!-- ## FAQS -->
|
||||
Note that `ALGOLIA_API_KEY` should be set to your admin API key.
|
||||
|
||||
# Thanks
|
||||
|
||||
|
@ -10,3 +10,54 @@ they do. Includes ENV variables as well.
|
||||
|
||||
|
||||
|
||||
## Command line
|
||||
|
||||
Here is the list of command line options you can pass to the `jekyll algolia
|
||||
push` command:
|
||||
|
||||
| Flag | Description |
|
||||
| ---- | ----- |
|
||||
| `--config ./_config.yml` | Specifies the config file to use. Default is `_config.yml` |
|
||||
| `--future` | With this flag, the command will also index posts with a future date |
|
||||
| `--limit_posts 10` | Limits the number of posts to parse and index |
|
||||
| `--drafts` | Index drafts in the `_drafts` folder as well |
|
||||
| `--dry-run` or `-n` | Do a dry run, do not actually push anything to your index |
|
||||
| `--verbose` | Display more information about what is going to be indexed |
|
||||
|
||||
|
||||
## Environment variables
|
||||
|
||||
The recommended place to store your Algolia application ID and index name is in
|
||||
the `_config.yml` file, but there are a few environment variables you can define
|
||||
to override those values.
|
||||
|
||||
key | value
|
||||
---------------------- | ----------------------
|
||||
ALGOLIA_APPLICATION_ID | `your_application_id`
|
||||
ALGOLIA_API_KEY | `your_api_key`
|
||||
ALGOLIA_INDEX_NAME | `your_index_name`
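
The precedence described above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not the plugin's actual code: `resolve_setting` is a hypothetical helper, and plain hashes stand in for `ENV` and for the parsed `_config.yml`.

```ruby
# Hypothetical helper, not the plugin's actual code: the environment variable
# wins, otherwise the value under the `algolia` key of _config.yml is used.
def resolve_setting(env, config, env_key, config_key)
  value = env[env_key]
  return value unless value.nil? || value.empty?

  config.fetch('algolia', {})[config_key]
end

# Stand-ins for ENV and for the parsed _config.yml
env = { 'ALGOLIA_INDEX_NAME' => 'staging_index' }
config = {
  'algolia' => {
    'application_id' => 'your_application_id',
    'index_name' => 'your_index_name'
  }
}

puts resolve_setting(env, config, 'ALGOLIA_APPLICATION_ID', 'application_id')
puts resolve_setting(env, config, 'ALGOLIA_INDEX_NAME', 'index_name')
```

Here the application ID comes from the configuration hash, while the index name is taken from the environment because a variable is defined for it.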
|
||||
|
||||
|
||||
## `_algolia_api_key` file
|
||||
|
||||
The recommended way to define your Algolia admin API key is to use the
|
||||
`ALGOLIA_API_KEY` environment variable. Because this key should be kept secret,
|
||||
it's better if it's loaded directly from the environment.
|
||||
|
||||
But the plugin can also load the key from another source. **Note that this
|
||||
method trades off security for convenience, so be very careful when using it.**
|
||||
|
||||
You can create a file named `_algolia_api_key` in your source directory that
|
||||
contains your admin API key. If no `ALGOLIA_API_KEY` environment variable is
|
||||
defined, the plugin will fall back to the value set in the `_algolia_api_key`
|
||||
file.
|
||||
|
||||
**Do not commit this file to your version control system**. This API key has write
|
||||
access to your index, so you have to keep it secret. For example, you should
|
||||
add `_algolia_api_key` to your `.gitignore` file. It contains your private API
|
||||
key.
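
As a rough sketch of that lookup order, assuming a hypothetical `load_api_key` helper (the plugin's own implementation may differ), the environment variable is checked first and the `_algolia_api_key` file is only read when the variable is missing:

```ruby
require 'tmpdir'

# Hypothetical lookup, for illustration only: prefer ALGOLIA_API_KEY, then the
# `_algolia_api_key` file in the source directory, and return nil otherwise.
def load_api_key(env, source_dir)
  key = env['ALGOLIA_API_KEY']
  return key unless key.nil? || key.empty?

  key_file = File.join(source_dir, '_algolia_api_key')
  return nil unless File.exist?(key_file)

  # Strip the trailing newline most editors add to the file
  File.read(key_file).strip
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, '_algolia_api_key'), "key_from_file\n")
  puts load_api_key({}, dir)                                      # file is used
  puts load_api_key({ 'ALGOLIA_API_KEY' => 'key_from_env' }, dir) # env wins
end
```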
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
@ -61,7 +61,7 @@
|
||||
},
|
||||
{
|
||||
"name": "Documentation",
|
||||
"url": "configuration.html"
|
||||
"url": "options.html"
|
||||
},
|
||||
{
|
||||
"name": "Examples",
|
||||
|
9
docs-src/src/examples.md
Normal file
@ -0,0 +1,9 @@
|
||||
---
|
||||
title: Examples
|
||||
layout: content-with-menu.pug
|
||||
---
|
||||
|
||||
- Dropdown
|
||||
- Blog (Hyde)
|
||||
- Jekyll plugins
|
||||
|
@ -41,18 +41,40 @@ If everything went well, you should be able to run `jekyll help` and see the
|
||||
|
||||
## Configuration
|
||||
|
||||
You need to provide certain Algolia credentials for this plugin to *index* your
|
||||
You need to provide your Algolia credentials for this plugin to *index* your
|
||||
site.
|
||||
|
||||
*If you don't yet have an Algolia account, you can open a free [Community plan
|
||||
here][4]. Once signed in, you can get your credentials from
|
||||
[your dashboard][5].*
|
||||
|
||||
The plugin will try to fetch the credentials from your environment-variables
|
||||
hash and fall back to your Jekyll configuration if they are not found.
|
||||
Once you have your credentials, you should define your `application_id` and
|
||||
`index_name` inside your `_config.yml` file like this:
|
||||
|
||||
To pass the credentials as ENV variables, you can do so at the same time when
|
||||
you run the `jekyll algolia` command
|
||||
```yaml
|
||||
# _config.yml
|
||||
|
||||
algolia:
|
||||
  application_id: 'your_application_id'
  index_name: 'your_index_name'
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
Once your credentials are set up, you can run the indexing by running the
|
||||
following command:
|
||||
|
||||
```shell
|
||||
ALGOLIA_API_KEY='{your_admin_api_key}' bundle exec jekyll algolia
|
||||
```
|
||||
|
||||
Note that `ALGOLIA_API_KEY` should be set to your admin API key. This key has
|
||||
write access to your index, so it will be able to push new data. This is also why
|
||||
you have to set it on the command line and not in the `_config.yml` file: you
|
||||
want to keep this key secret and not commit it to your versioning system.
|
||||
|
||||
_Note that the method can be simplified to `jekyll algolia` by using an
|
||||
[alternative way][6] of loading the API key and using [rubygems-bundler][7]._
|
||||
|
||||
|
||||
[1]: https://jekyllrb.com/
|
||||
@ -60,3 +82,5 @@ you run the `jekyll algolia` command
|
||||
[3]: http://bundler.io/
|
||||
[4]: https://www.algolia.com/users/sign_up/hacker
|
||||
[5]: https://www.algolia.com/licensing
|
||||
[6]: ./commandline.html#algolia-api-key-file
|
||||
[7]: https://github.com/rvm/rubygems-bundler
|
||||
|
@ -13,3 +13,68 @@ and put all the info so it builds automatically each time.
|
||||
|
||||
|
||||
|
||||
## GitHub Pages
|
||||
|
||||
The initial goal of the plugin was to allow anyone to have access to great
|
||||
search, even on a static website hosted on GitHub pages.
|
||||
|
||||
But GitHub does not allow custom plugins to be run on GitHub Pages.
|
||||
This means that you'll either have to run `bundle exec jekyll algolia push`
|
||||
manually, or configure a CI environment (like [Travis][16]) to do it for you.
|
||||
|
||||
[Travis CI][17] is a hosted continuous integration
|
||||
service, and it's free for open-source projects. Properly configured, it can
|
||||
automatically reindex your data whenever you push to `gh-pages`.
|
||||
|
||||
For it to work, you'll have 3 steps to perform.
|
||||
|
||||
### 1. Create a `.travis.yml` file
|
||||
|
||||
Create a file named `.travis.yml` at the root of your project, with the
|
||||
following content:
|
||||
|
||||
```yml
|
||||
language: ruby
|
||||
cache: bundler
|
||||
branches:
|
||||
  only:
    - gh-pages
|
||||
script:
|
||||
  - bundle exec jekyll algolia push
|
||||
rvm:
|
||||
  - 2.3.0
|
||||
```
|
||||
|
||||
This file will be read by Travis and instruct it to fetch all dependencies
|
||||
defined in the `Gemfile`, then run `jekyll algolia push`. This will be
|
||||
triggered when data is pushed to the `gh-pages` branch.
|
||||
|
||||
### 2. Update your `_config.yml` file to exclude `vendor`
|
||||
|
||||
Travis will download all your `Gemfile` dependencies into a directory named
|
||||
`vendor`. You have to tell Jekyll to ignore this directory, otherwise Jekyll
|
||||
will try to parse it (and fail).
|
||||
|
||||
To do so, add the following line to your `_config.yml` file:
|
||||
|
||||
```yml
|
||||
exclude: [vendor]
|
||||
```
|
||||
|
||||
### 3. Configure Travis
|
||||
|
||||
In order for Travis to be able to push data to your index on your behalf, you
|
||||
have to give it your write API Key. This is achieved by defining an
|
||||
`ALGOLIA_API_KEY` [environment variable][18] in Travis settings.
|
||||
|
||||
You should also uncheck the "Build pull requests" option, otherwise any pull
|
||||
request targeting `gh-pages` will trigger the reindexing.
|
||||
|
||||
![Travis Configuration][19]
|
||||
|
||||
### Done
|
||||
|
||||
Commit all the changes to the files, and then push to `gh-pages`. Travis will
|
||||
catch the event and trigger your indexing for you. You can follow the Travis job
|
||||
execution directly on [their website][20].
|
||||
|
||||
|
@ -11,3 +11,11 @@ a list of what can be done, then explain each hook and how it works
|
||||
|
||||
|
||||
|
||||
## Custom hooks
|
||||
|
||||
|
||||
```ruby
|
||||
def self.hook_should_be_excluded?(_filepath)
|
||||
def self.hook_before_indexing_each(record, _node)
|
||||
def self.hook_before_indexing_all(records)
|
||||
```
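
To give an idea of how these three hooks might be filled in, here is a hedged sketch. Only the method signatures come from the listing above; the `CustomHooks` module name, the return-value conventions and the `indexed_by` field are assumptions made for illustration.

```ruby
# Hypothetical container module; only the three method signatures come from
# the listing above. The return-value conventions are assumptions.
module CustomHooks
  # Assumed convention: return true to skip a file entirely.
  def self.hook_should_be_excluded?(filepath)
    filepath.start_with?('private/')
  end

  # Assumed convention: return the (possibly modified) record.
  def self.hook_before_indexing_each(record, _node)
    record.merge(indexed_by: 'jekyll-algolia')
  end

  # Assumed convention: return the final list of records to push.
  def self.hook_before_indexing_all(records)
    records.reject { |record| record[:text].to_s.empty? }
  end
end

records = [{ text: 'Hello' }, { text: '' }]
records = records.map { |record| CustomHooks.hook_before_indexing_each(record, nil) }
p CustomHooks.hook_before_indexing_all(records)
```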
|
||||
|
@ -3,12 +3,29 @@ title: How does this work?
|
||||
layout: content-with-menu.pug
|
||||
---
|
||||
|
||||
# How does this work?
|
||||
## How does this work?
|
||||
|
||||
More technical explanation, to explain the underlying principles of the plugin.
|
||||
That it acts as an actual build, but instead of creating HTML files, it will
|
||||
push records to Algolia.
|
||||
The plugin will work like a `jekyll build` run, but instead of writing `.html`
|
||||
files to disk, it will push content to Algolia. It will go through each file
|
||||
Jekyll would have processed in a regular build: pages, posts and collections.
|
||||
|
||||
What is a record? We will take each page and split it into several JSON objects
|
||||
and push those.
|
||||
It will not push the whole content of each page to Algolia as one record.
|
||||
Instead, it will split each page into small chunks (by default, one per
|
||||
`<p>` paragraph) and then push each chunk as a new record to Algolia. Splitting
|
||||
records that way allows for a more fine-tuned relevance even on long pages.
|
||||
|
||||
Each record created that way will contain a mix of specific data and shared
|
||||
data. Specific data will be the paragraph content, and information about its
|
||||
position in the page (where it's situated in the hierarchy of headings in the
|
||||
page). Shared data is the metadata of the page it was extracted from (`slug`,
|
||||
`url`, `tags`, etc., as well as any custom field added to the front-matter).
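
The splitting described above can be illustrated with a small Ruby sketch. It is deliberately naive: a regular expression stands in for a real HTML parser, and the `split_into_records` helper and the `text` and `hierarchy` field names are assumptions, not the plugin's actual record format.

```ruby
# Naive sketch: walk headings and paragraphs in document order, track the
# current heading hierarchy, and emit one record per <p>.
def split_into_records(html, page_metadata)
  hierarchy = {}
  records = []
  html.scan(%r{<(h[1-6]|p)(?:\s[^>]*)?>(.*?)</\1>}m) do |tag, content|
    text = content.gsub(/<[^>]+>/, '').strip
    if tag == 'p'
      # Specific data (text, hierarchy) merged with the shared page metadata
      records << page_metadata.merge(text: text, hierarchy: hierarchy.dup)
    else
      level = tag[1].to_i
      # A new heading resets its own level and every deeper one
      hierarchy.delete_if { |existing_level, _| existing_level >= level }
      hierarchy[level] = text
    end
  end
  records
end

html = <<~HTML
  <h1>Guide</h1>
  <p>Intro paragraph.</p>
  <h2>Install</h2>
  <p>Run <code>bundle install</code>.</p>
HTML

split_into_records(html, { url: '/guide.html', slug: 'guide' }).each do |record|
  p record
end
```

Each record carries the shared `url` and `slug` of the page, plus its own text and the headings it sits under.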
|
||||
|
||||
When results are displayed, they are grouped so only the best matching paragraph of each
|
||||
page is returned for a specific query. This greatly improves the perceived
|
||||
relevance of the search results.
|
||||
|
||||
Because the plugin is splitting each page into smaller chunks, it can be hard to get
|
||||
an estimate of how many records will actually be pushed. The plugin tries to be
|
||||
smart and consume as few operations as possible, but you can always run it in
|
||||
`--dry-run` mode to better understand what it would do.
|
||||
|
||||
|
@ -17,6 +17,42 @@ Show pros and cons of each
|
||||
|
||||
|
||||
|
||||
Synchronizing your local data with your Algolia index can be done in different
|
||||
ways. By default, the plugin will use the `diff` indexing mode but you might
|
||||
also be interested in the `atomic` mode.
|
||||
|
||||
#### `diff` (default)
|
||||
|
||||
By default, the plugin will try to be smart when pushing content to your index:
|
||||
it will only push new records and delete old ones instead of overwriting
|
||||
everything.
|
||||
|
||||
To do so, we first need to grab the list of all records residing in
|
||||
your index, then compare them with the ones generated locally. We then delete
|
||||
the old records that no longer exist, and then add the newly created records.
|
||||
|
||||
The main advantage is that it will consume very few operations in your Algolia
|
||||
quota. The drawback is that it will put your index into an inconsistent state
|
||||
for a few seconds (records were deleted, but new ones were not yet added). Users
|
||||
doing a search on your website at that time might have incomplete results.
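
A minimal sketch of the comparison step might look like the following. It assumes each record carries an `objectID` that changes whenever its content changes; that identification scheme and the `compute_diff` helper are assumptions for illustration, not a description of the plugin's internals.

```ruby
require 'set'

# Sketch: compare the ids already in the index with the locally generated
# records, and work out what to delete and what to add.
def compute_diff(remote_ids, local_records)
  remote = remote_ids.to_set
  local = local_records.map { |record| record[:objectID] }.to_set
  {
    # Records in the index that no longer exist locally
    to_delete: remote_ids.reject { |id| local.include?(id) },
    # Local records that the index does not know about yet
    to_add: local_records.reject { |record| remote.include?(record[:objectID]) }
  }
end

remote_ids = %w[aaa bbb ccc]
local_records = [
  { objectID: 'bbb', text: 'unchanged paragraph' },
  { objectID: 'ddd', text: 'edited paragraph' }
]

diff = compute_diff(remote_ids, local_records)
p diff[:to_delete]
p diff[:to_add].map { |record| record[:objectID] }
```

Only the differences are sent, which is why this mode uses few operations. Between the deletions and the additions the index is briefly incomplete.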
|
||||
|
||||
#### `atomic`
|
||||
|
||||
Using the `atomic` indexing mode, your users will never search into an
|
||||
inconsistent index. They will either be searching into the index containing the
|
||||
old data, or the one containing the new data, but never in an intermediate
|
||||
state.
|
||||
|
||||
To do so, the plugin will actually push all data to a temporary index first.
|
||||
Once everything is copied and configured, it will then overwrite the old index
|
||||
with the temporary one.
|
||||
|
||||
The main advantage is that it will be completely transparent for your users. The
|
||||
drawback is that it will consume many more operations as you will have to push
|
||||
all your records to a new index each time.
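
The two-step principle can be simulated in memory as follows. This is only a sketch: a plain hash stands in for the Algolia application, and the `atomic_update` helper and the `_tmp` suffix are made up for the example.

```ruby
# In-memory stand-in for an Algolia application: index name => records.
# The `_tmp` suffix is a made-up naming convention.
def atomic_update(indices, index_name, new_records)
  tmp_name = "#{index_name}_tmp"
  # 1. Push every record to a temporary index; the live index is untouched.
  indices[tmp_name] = new_records.dup
  # 2. Replace the live index with the temporary one in a single step.
  indices[index_name] = indices.delete(tmp_name)
  indices
end

indices = { 'my_site' => [{ text: 'old paragraph' }] }
atomic_update(indices, 'my_site', [{ text: 'new paragraph' }, { text: 'another one' }])
p indices.keys
p indices['my_site'].size
```

Searches against `my_site` see either the old records or the new ones, never a mix. The cost is that every record is pushed on each run.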
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
@ -5,10 +5,123 @@ layout: content-with-menu.pug
|
||||
|
||||
# Options
|
||||
|
||||
List of all options that can be passed in the \_config.yml file. A subheading
|
||||
for each of them, explaining what it does. Some of those options might require
|
||||
a more precise page to explain.
|
||||
The plugin should work out of the box for most websites, but there are options
|
||||
you can tweak if needed. All the options should be added under the `algolia`
|
||||
section of your `_config.yml` file.
|
||||
|
||||
This will be content heavy
|
||||
You should be familiar with [how this plugin works][1] under
|
||||
the hood to better understand what some options are doing.
|
||||
|
||||
### `extensions_to_index`
|
||||
|
||||
This option defines which source files should be indexed, based on their file
|
||||
extension. If an extension is not in the list, then the file will not be
|
||||
indexed.
|
||||
|
||||
By default, all HTML and markdown source files will be indexed.
|
||||
|
||||
If you are using another markup language (such as [AsciiDoc][2] or [Textile][3]),
|
||||
you might want to update the value like this:
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Also index AsciiDoc and Textile files
  extensions_to_index: 'html,md,adoc,textile'
|
||||
```
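
As an illustration of how such a comma-separated list could be applied, here is a short Ruby sketch. The `indexable?` helper is hypothetical and the plugin may normalize extensions differently.

```ruby
# Hypothetical check: a file is indexed only if its extension appears in the
# comma-separated `extensions_to_index` value.
def indexable?(path, extensions_to_index)
  allowed = extensions_to_index.split(',').map(&:strip)
  allowed.include?(File.extname(path).sub(/\A\./, ''))
end

extensions = 'html,md,adoc,textile'
%w[about.md guide.adoc notes.txt].each do |path|
  puts "#{path}: #{indexable?(path, extensions)}"
end
```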
|
||||
|
||||
### `files_to_exclude`
|
||||
|
||||
This option lets you define a blacklist of source files you don't want to index.
|
||||
|
||||
By default it will exclude all the `index.html` and `index.md` files. Those
|
||||
files usually contain little text (landing pages) or contain
|
||||
redundant text (latest blog articles) so they are not included by default.
|
||||
|
||||
If you want to index those files, you should set the value to an empty array.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Actually index the index.html/index.md pages
  files_to_exclude: []
|
||||
```
|
||||
|
||||
If you want to exclude more files, you should add them to the array. Note that
|
||||
you can use glob patterns to exclude several files at once.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Exclude more files from indexing
  files_to_exclude:
    - index.html
    - index.md
    - excluded-file.html
    - _posts/2017-01-20-date-to-forget.md
    - subdirectory/*.html
|
||||
```
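
To get a feel for how glob patterns select files, the following sketch uses Ruby's standard `File.fnmatch`. It assumes patterns are compared against paths relative to the source directory; how the plugin actually resolves them is not specified here, and `excluded?` is a hypothetical helper.

```ruby
# Hypothetical check: a file is excluded when any pattern matches its path.
# FNM_PATHNAME keeps `*` from matching across directory separators.
def excluded?(path, files_to_exclude)
  files_to_exclude.any? do |pattern|
    File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
end

patterns = ['index.html', 'excluded-file.html', 'subdirectory/*.html']
%w[index.html about.html subdirectory/page.html subdirectory/deep/page.html].each do |path|
  puts "#{path}: #{excluded?(path, patterns)}"
end
```

In this sketch `subdirectory/*.html` matches files directly inside `subdirectory` but not those in nested folders.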
|
||||
|
||||
_Note that some files (pagination pages, static assets, etc) will **always** be
|
||||
excluded and you don't have to specify them._
|
||||
|
||||
### `nodes_to_index`
|
||||
|
||||
This option defines how each page is split into chunks. It expects
|
||||
a CSS selector that will be applied on the HTML content generated by Jekyll.
|
||||
Each matching node will be indexed as a different record.
|
||||
|
||||
The default value is `p`, meaning that one record will be created for each `<p>`
|
||||
paragraph of content.
|
||||
|
||||
If you would like to index other elements, like `<blockquote>`, `<li>` or
|
||||
a custom `<div class="paragraph">`, you should edit the value like this:
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Also index quotes, list items and custom paragraphs
  nodes_to_index: 'p,blockquote,li,div.paragraph'
|
||||
```
|
||||
|
||||
### `indexing_mode`
|
||||
|
||||
This option will let you choose the strategy used to sync your data with your
|
||||
Algolia index. The default value should work for most cases, but feel free to
|
||||
[read the pros and cons][4] of each and pick the one best suited for your needs.
|
||||
|
||||
### `settings`
|
||||
|
||||
This option lets you pass specific settings to your Algolia index.
|
||||
|
||||
By default the plugin will configure your Algolia index with settings tailored
|
||||
to the format of the extracted records. You are of course free to overwrite
|
||||
them or configure them as best suits your needs. Every option passed to the
|
||||
`settings` entry will be set as [settings to your index][5].
|
||||
|
||||
For example, if you want to change the HTML tag used for the highlighting, you
|
||||
can overwrite it like this:
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  settings:
    highlightPreTag: '<em class="custom_highlight">'
    highlightPostTag: '</em>'
|
||||
```
|
||||
|
||||
### `indexing_batch_size`
|
||||
|
||||
The Algolia API allows you to send batches of changes to add or update several
|
||||
records at once, instead of doing one HTTP call per record. The plugin will
|
||||
batch updates by groups of 1000 records by default.
|
||||
|
||||
If you are on an unstable internet connection, you might want to decrease the
|
||||
value. You will send more batches, but each will be smaller in size.
|
||||
|
||||
```yml
|
||||
algolia:
|
||||
  # Send fewer records per batch
  indexing_batch_size: 500
|
||||
```
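
The effect of the batch size can be sketched with Ruby's `each_slice`. The `batches` helper is hypothetical, and the actual HTTP calls are left out.

```ruby
# Sketch: group records into batches of at most `batch_size` items.
# Each batch would correspond to one HTTP call.
def batches(records, batch_size = 1000)
  records.each_slice(batch_size).to_a
end

records = Array.new(2500) { |i| { objectID: i.to_s } }
puts batches(records).map(&:size).inspect      # default of 1000 per batch
puts batches(records, 500).map(&:size).inspect # smaller batches, more calls
```

With 2500 records, the default gives three calls, while a batch size of 500 gives five smaller ones.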
|
||||
|
||||
[1]: ./how-it-works.html
|
||||
[2]: http://www.methods.co.nz/asciidoc/
|
||||
[3]: https://github.com/textile
|
||||
[4]: ./indexing-modes.html
|
||||
[5]: https://www.algolia.com/doc/api-reference/api-methods/set-settings/?language=ruby#set-settings
|
||||
|