Splitting README into doc pages

Pixelastic 2017-12-14 15:12:44 +01:00
parent c8272bc341
commit 71ac2c9398
10 changed files with 356 additions and 334 deletions

335
README.md

@ -8,28 +8,6 @@ Status][3]](https://coveralls.io/github/algolia/jekyll-algolia?branch=master)
Jekyll plugin to automatically index your content on Algolia.
## ⚠ Unreleased beta version
This plugin has not **yet** been released on Rubygems. If you wish to try it,
simply point your `Gemfile` to the `develop` branch of this repo:
```ruby
group :jekyll_plugins do
gem "jekyll-algolia", git: "https://github.com/algolia/jekyll-algolia", branch: "develop"
end
```
Alternatively, clone the repository first and then update your site's `Gemfile`
to point to the path on disk like this:
```ruby
group :jekyll_plugins do
gem "jekyll-algolia", :path => "/path/to/the/cloned/repo"
end
```
Feedback very welcome!
## Usage
```shell
@ -38,21 +16,20 @@ $ bundle exec jekyll algolia
This will push the content of your Jekyll website to your Algolia index.
## Documentation
Official documentation can be found on
[https://community.algolia.com/jekyll-algolia/](https://community.algolia.com/jekyll-algolia/)
## Installation
The plugin requires a minimum version of Jekyll 3.6.0 and a minimum Ruby
version of 2.3.0.
The plugin requires at least Jekyll 3.6.0 and Ruby 2.3.0.
First, add the `jekyll-algolia` gem to your `Gemfile`, in the `:jekyll_plugins`
section.
If you do not have a `Gemfile` already, here is the minimal content to get
you started. You will also need [Bundler][8] to be able to use the `Gemfile`.
```ruby
source 'https://rubygems.org'
gem 'jekyll', '~> 3.6'
# Gemfile
group :jekyll_plugins do
gem 'jekyll-algolia'
@ -61,40 +38,17 @@ end
Once this is done, download all dependencies with `bundle install`.
If everything went well, you should be able to run `jekyll help` and see the
`algolia` subcommand listed.
## Basic configuration
You need to provide certain Algolia credentials for this plugin to successfully
*index* your site.
You need to provide certain Algolia credentials for this plugin to *index* your
site.
*If you don't yet have an Algolia account, you can open a free [Community plan
here][9]. Once signed in, you can get your credentials from
[your dashboard][10].*
The plugin will try to fetch the credentials from your environment variables
and fall back to your Jekyll configuration if they are not found.
To pass the credentials as ENV variables, you can do so at the same time when
you run the `jekyll algolia` command
```sh
# for example
ALGOLIA_APPLICATION_ID='your_application_id' jekyll algolia
```
The valid ENV variables are:
key | value
---------------------- | ----------------------
ALGOLIA_APPLICATION_ID | `your_application_id`
ALGOLIA_API_KEY | `your_api_key`
ALGOLIA_INDEX_NAME | `your_index_name`
As a fallback measure, the plugin also checks if `application_id` and
`index_name` are available under `algolia` key in your `_config.yml`
file like this:
Once you have your credentials, you should define your `application_id` and
`index_name` inside your `_config.yml` file like this:
```yaml
# _config.yml
@ -104,271 +58,16 @@ algolia:
index_name: 'your_index_name'
```
### ⚠ Other, unsecure, method ⚠
## Run it
You can also store your confidential API key in a file named `_algolia_api_key`,
in your source directory. If you do this we __very, very, very strongly__
encourage you to make sure the file is not tracked in your versioning system.
Once your credentials are set up, you can run the indexing with the
following command:
## How it works
The plugin will work like a `jekyll build` run, but instead of writing `.html`
files to disk, it will push content to Algolia.
It will split each page of your website into small chunks (by default, one per
`<p>` paragraph) and then push each chunk as a new record to Algolia. Splitting
records that way yields a better relevance of results even on long pages.
The placement of each paragraph in the page heading hierarchy (title, subtitles
through `<h1>` to `<h6>`) is also taken into account to further improve
relevance of results.
Each record will also contain metadata about the page it was extracted from
(including `slug`, `url`, `tags`, `categories`, `collection` and any custom
field added to the front-matter).
Every time you run `jekyll algolia`, a full build of the website is run locally,
but only records that were changed since your last build will be updated in your
index.
## Advanced configuration
The plugin should work out of the box for most websites, but there are options
you can tweak if needed. All the options should be added under the `algolia`
section of your `_config.yml` file.
### `nodes_to_index`
By default, each page of your website will be split into chunks based on this
CSS selector. The default value of `p` means that one record will be created for
each `<p>` in your generated content.
If you would like to index other elements, like `<blockquote>`, `<li>` or
a custom `<div class="paragraph">`, you should edit the value like this:
```yml
algolia:
# Also index quotes, list items and custom paragraphs
nodes_to_index: 'p,blockquote,li,div.paragraph'
```shell
ALGOLIA_API_KEY='{your_admin_api_key}' bundle exec jekyll algolia
```
### `extensions_to_index`
By default, pages for which the source is an HTML or Markdown file will be indexed.
If you are using another markup language (such as [AsciiDoc][11]
or [Textile][12]), then you should overwrite this option.
For example, the `md` extension here means that `*.md` source files will be converted
to HTML, and that HTML version will be indexed.
```yml
algolia:
# Also index AsciiDoc and Textile files
extensions_to_index: 'html,md,adoc,textile'
```
### `files_to_exclude`
The plugin will try to be smart in the pages it should __not__ index. Some files
will always be excluded from the indexing (static assets, custom 404 and
pagination pages). Others are handled by the `files_to_exclude` option.
By default it will exclude all the `index.html` and `index.md` files. Those
files usually contain little text (landing pages) or redundant text (latest
blog articles), so we decided to exclude them by default.
If you actually want to index those files, you should set the value to an empty
array.
```yml
algolia:
# Actually index the index.html/index.md pages
files_to_exclude: []
```
If you want to exclude more files, you should add them to the array. Note that
you can use glob patterns to exclude several files at once.
```yml
algolia:
# Exclude more files from indexing
files_to_exclude:
- index.html
- index.md
- excluded-file.html
- _posts/2017-01-20-date-to-forget.md
- subdirectory/*.html
```
### `settings`
By default the plugin will configure your Algolia index with settings tailored
to the format of the extracted records. You are of course free to overwrite
them or configure them as best suits your needs. Every option passed to the
`settings` entry will be passed to a call to [set_settings][13].
For example if you want to change the HTML tag used for the highlighting, you
can overwrite it like this:
```yml
algolia:
settings:
highlightPreTag: '<em class="custom_highlight">'
highlightPostTag: '</em>'
```
### `indexing_batch_size`
The Algolia API allows you to send batches of changes to add or update several
records at once, instead of doing one HTTP call per record. The plugin will
batch updates by groups of 1000 records.
If you are on an unstable internet connection, you might want to decrease the
value. You will send more batches, but each will be smaller in size.
```yml
algolia:
# Send fewer records per batch
indexing_batch_size: 500
```
### `indexing_mode`
Synchronizing your local data with your Algolia index can be done in different
ways. By default, the plugin will use the `diff` indexing mode but you might
also be interested in the `atomic` mode.
#### `diff` (default)
By default, the plugin will try to be smart when pushing content to your index:
it will only push new records and delete old ones instead of overwriting
everything.
To do so, it first grabs the list of all records residing in your index, then
compares them with the ones generated locally. It then deletes the old records
that no longer exist, and adds the newly created records.
The main advantage is that it will consume very few operations in your Algolia
quota. The drawback is that it will put your index into an inconsistent state
for a few seconds (records were deleted, but new ones were not yet added). Users
doing a search on your website at that time might get incomplete results.
#### `atomic`
Using the `atomic` indexing mode, your users will never search into an
inconsistent index. They will either be searching into the index containing the
old data, or the one containing the new data, but never in an intermediate
state.
To do so, the plugin will actually push all data to a temporary index first.
Once everything is copied and configured, it will then overwrite the old index
with the temporary one.
The main advantage is that it will be completely transparent for your users. The
drawback is that it will consume many more operations, as you will have to push
all your records to a new index each time.
<!-- ## Custom hooks -->
<!-- -->
<!-- -->
<!-- def self.hook_should_be_excluded?(_filepath) -->
<!-- def self.hook_before_indexing_each(record, _node) -->
<!-- def self.hook_before_indexing_all(records) -->
<!-- ## Command line -->
<!-- -->
<!-- Here is the list of command line options you can pass to the `jekyll algolia -->
<!-- push` command: -->
<!-- -->
<!-- | Flag | Description | -->
<!-- | ---- | ----- | -->
<!-- | `--config ./_config.yml` | Config file to use. Default is `_config.yml` | -->
<!-- | `--future` | With this flag, the command will also index posts with a future date | -->
<!-- | `--limit_posts 10` | Limits the number of posts to parse and index | -->
<!-- | `--drafts` | Index drafts in the `_drafts` folder as well | -->
<!-- | `--dry-run` or `-n` | Do a dry run, do not actually push anything to your index | -->
<!-- | `--verbose` | Display more information about what is going to be indexed | -->
<!-- ## Searching -->
<!-- -->
<!-- This plugin will index your data in your Algolia index. Building the front-end -->
<!-- search is outside the scope of this plugin, but you can follow [our tutorials][14] or -->
<!-- use our forked version of the popular [Hyde theme][15]. -->
<!-- -->
<!-- ## GitHub Pages -->
<!-- -->
<!-- The initial goal of the plugin was to allow anyone to have access to great -->
<!-- search, even on a static website hosted on GitHub pages. -->
<!-- -->
<!-- But GitHub does not allow custom plugins to be run on GitHub Pages. -->
<!-- This means that you'll either have to run `bundle exec jekyll algolia push` -->
<!-- manually, or configure a CI environment (like [Travis][16]) to do it for you. -->
<!-- -->
<!-- [Travis CI][17] is a hosted continuous integration -->
<!-- service, and it's free for open-source projects. Properly configured, it can -->
<!-- automatically reindex your data whenever you push to `gh-pages`. -->
<!-- -->
<!-- For it to work, you'll have 3 steps to perform. -->
<!-- -->
<!-- ### 1. Create a `.travis.yml` file -->
<!-- -->
<!-- Create a file named `.travis.yml` at the root of your project, with the -->
<!-- following content: -->
<!-- -->
<!-- ```yml -->
<!-- language: ruby -->
<!-- cache: bundler -->
<!-- branches: -->
<!-- only: -->
<!-- - gh-pages -->
<!-- script: -->
<!-- - bundle exec jekyll algolia push -->
<!-- rvm: -->
<!-- - 2.2 -->
<!-- ``` -->
<!-- -->
<!-- This file will be read by Travis and instruct it to fetch all dependencies -->
<!-- defined in the `Gemfile`, then run `jekyll algolia push`. This will be -->
<!-- triggered when data is pushed to the `gh-pages` branch. -->
<!-- -->
<!-- ### 2. Update your `_config.yml` file to exclude `vendor` -->
<!-- -->
<!-- Travis will download all your `Gemfile` dependencies into a directory named -->
<!-- `vendor`. You have to tell Jekyll to ignore this directory, otherwise Jekyll -->
<!-- will try to parse it (and fail). -->
<!-- -->
<!-- Doing so is easy: add the following line to your `_config.yml` file: -->
<!-- -->
<!-- ```yml -->
<!-- exclude: [vendor] -->
<!-- ``` -->
<!-- -->
<!-- ### 3. Configure Travis -->
<!-- -->
<!-- In order for Travis to be able to push data to your index on your behalf, you -->
<!-- have to give it your write API Key. This is achieved by defining an -->
<!-- `ALGOLIA_API_KEY` [environment variable][18] in Travis settings. -->
<!-- -->
<!-- You should also uncheck the "Build pull requests" option, otherwise any pull -->
<!-- request targeting `gh-pages` will trigger the reindexing. -->
<!-- -->
<!-- ![Travis Configuration][19] -->
<!-- -->
<!-- ### Done -->
<!-- -->
<!-- Commit all the changes to the files, and then push to `gh-pages`. Travis will -->
<!-- catch the event and trigger your indexing for you. You can follow the Travis job -->
<!-- execution directly on [their website][20]. -->
<!-- -->
<!-- ## FAQS -->
Note that `ALGOLIA_API_KEY` should be set to your admin API key.
# Thanks


@ -10,3 +10,54 @@ they do. Includes ENV variables as well.
## Command line
Here is the list of command line options you can pass to the `jekyll algolia
push` command:
| Flag | Description |
| ---- | ----- |
| `--config ./_config.yml` | Config file to use. Default is `_config.yml` |
| `--future` | With this flag, the command will also index posts with a future date |
| `--limit_posts 10` | Limits the number of posts to parse and index |
| `--drafts` | Index drafts in the `_drafts` folder as well |
| `--dry-run` or `-n` | Do a dry run, do not actually push anything to your index |
| `--verbose` | Display more information about what is going to be indexed |
## Environment variables
The recommended place to store your Algolia application ID and index name is
the `_config.yml` file, but there are a few environment variables you can define
to overwrite those values.
key | value
---------------------- | ----------------------
ALGOLIA_APPLICATION_ID | `your_application_id`
ALGOLIA_API_KEY | `your_api_key`
ALGOLIA_INDEX_NAME | `your_index_name`
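The precedence described above can be sketched in a few lines of Ruby. This is only an illustration, not the plugin's actual code: the `algolia_setting` helper is a hypothetical name, and the environment and configuration are passed in as plain hashes so the snippet stays self-contained.

```ruby
# Hypothetical helper (not part of the plugin): an environment variable, when
# set, takes precedence over the value stored in _config.yml.
def algolia_setting(env, config, name)
  env_value = env["ALGOLIA_#{name.upcase}"]
  return env_value unless env_value.nil? || env_value.empty?

  config.fetch('algolia', {})[name]
end

# Stand-ins for the parsed _config.yml and for ENV
config = {
  'algolia' => {
    'application_id' => 'your_application_id',
    'index_name' => 'your_index_name'
  }
}
env = { 'ALGOLIA_INDEX_NAME' => 'staging_index_name' }

puts algolia_setting(env, config, 'application_id')
puts algolia_setting(env, config, 'index_name')
```

Here `application_id` comes from the configuration, while `index_name` is taken from the environment because `ALGOLIA_INDEX_NAME` is defined.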
## `_algolia_api_key` file
The recommended way to define your Algolia admin API key is to use the
`ALGOLIA_API_KEY` environment variable. Because this key should be kept secret,
it's better if it's loaded directly from the environment.
But the plugin can also load the key from another source. **Note that this
method trades off security for convenience, so be very careful when using it.**
You can create a file named `_algolia_api_key` in your source directory that
contains your admin API key. If no `ALGOLIA_API_KEY` environment variable is
defined, the plugin will fall back to the value set in the `_algolia_api_key`
file.
**Do not commit this file to your versioning system**. It contains your private
API key, which has write access to your index, so you have to keep it secret.
For example, you should add `_algolia_api_key` to your `.gitignore` file.
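As a rough illustration of the lookup order described in this section, the sketch below checks the environment first and only then reads the `_algolia_api_key` file. The `resolve_api_key` helper is hypothetical and is not taken from the plugin; stripping whitespace from the file content is also an assumption.

```ruby
require 'tmpdir'

# Hypothetical helper (not the plugin's code): look up the admin API key in the
# environment first, then in the `_algolia_api_key` file of the source directory.
def resolve_api_key(env, source_dir)
  from_env = env['ALGOLIA_API_KEY']
  return from_env unless from_env.nil? || from_env.empty?

  key_file = File.join(source_dir, '_algolia_api_key')
  return nil unless File.exist?(key_file)

  # Assumption: ignore the trailing newline most editors add to the file
  File.read(key_file).strip
end

# Demonstration in a throwaway directory
Dir.mktmpdir do |dir|
  File.write(File.join(dir, '_algolia_api_key'), "key_from_file\n")

  puts resolve_api_key({ 'ALGOLIA_API_KEY' => 'key_from_env' }, dir)
  puts resolve_api_key({}, dir)
end
```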


@ -61,7 +61,7 @@
},
{
"name": "Documentation",
"url": "configuration.html"
"url": "options.html"
},
{
"name": "Examples",

9
docs-src/src/examples.md Normal file

@ -0,0 +1,9 @@
---
title: Examples
layout: content-with-menu.pug
---
- Dropdown
- Blog (Hyde)
- Jekyll plugins


@ -41,18 +41,40 @@ If everything went well, you should be able to run `jekyll help` and see the
## Configuration
You need to provide certain Algolia credentials for this plugin to *index* your
You need to provide your Algolia credentials for this plugin to *index* your
site.
*If you don't yet have an Algolia account, you can open a free [Community plan
here][4]. Once signed in, you can get your credentials from
[your dashboard][5].*
The plugin will try to fetch the credentials from your environment variables
and fall back to your Jekyll configuration if they are not found.
Once you have your credentials, you should define your `application_id` and
`index_name` inside your `_config.yml` file like this:
To pass the credentials as ENV variables, you can do so at the same time when
you run the `jekyll algolia` command
```yaml
# _config.yml
algolia:
application_id: 'your_application_id'
index_name: 'your_index_name'
```
## Usage
Once your credentials are set up, you can run the indexing with the
following command:
```shell
ALGOLIA_API_KEY='{your_admin_api_key}' bundle exec jekyll algolia
```
Note that `ALGOLIA_API_KEY` should be set to your admin API key. This key has
write access to your index, so it will be able to push new data. This is also why
you have to set it on the command line and not in the `_config.yml` file: you
want to keep this key secret and not commit it to your versioning system.
_Note that the method can be simplified to `jekyll algolia` by using an
[alternative way][6] of loading the API key and using [rubygems-bundler][7]._
[1]: https://jekyllrb.com/
@ -60,3 +82,5 @@ you run the `jekyll algolia` command
[3]: http://bundler.io/
[4]: https://www.algolia.com/users/sign_up/hacker
[5]: https://www.algolia.com/licensing
[6]: ./commandline.html#algolia-api-key-file
[7]: https://github.com/rvm/rubygems-bundler


@ -13,3 +13,68 @@ and put all the info so it builds automatically each time.
## GitHub Pages
The initial goal of the plugin was to allow anyone to have access to great
search, even on a static website hosted on GitHub pages.
But GitHub does not allow custom plugins to be run on GitHub Pages.
This means that you'll either have to run `bundle exec jekyll algolia push`
manually, or configure a CI environment (like [Travis][16]) to do it for you.
[Travis CI][17] is a hosted continuous integration
service, and it's free for open-source projects. Properly configured, it can
automatically reindex your data whenever you push to `gh-pages`.
For it to work, you'll have 3 steps to perform.
### 1. Create a `.travis.yml` file
Create a file named `.travis.yml` at the root of your project, with the
following content:
```yml
language: ruby
cache: bundler
branches:
only:
- gh-pages
script:
- bundle exec jekyll algolia push
rvm:
- 2.2
```
This file will be read by Travis and instruct it to fetch all dependencies
defined in the `Gemfile`, then run `jekyll algolia push`. This will be
triggered when data is pushed to the `gh-pages` branch.
### 2. Update your `_config.yml` file to exclude `vendor`
Travis will download all your `Gemfile` dependencies into a directory named
`vendor`. You have to tell Jekyll to ignore this directory, otherwise Jekyll
will try to parse it (and fail).
Doing so is easy: add the following line to your `_config.yml` file:
```yml
exclude: [vendor]
```
### 3. Configure Travis
In order for Travis to be able to push data to your index on your behalf, you
have to give it your write API Key. This is achieved by defining an
`ALGOLIA_API_KEY` [environment variable][18] in Travis settings.
You should also uncheck the "Build pull requests" option, otherwise any pull
request targeting `gh-pages` will trigger the reindexing.
![Travis Configuration][19]
### Done
Commit all the changes to the files, and then push to `gh-pages`. Travis will
catch the event and trigger your indexing for you. You can follow the Travis job
execution directly on [their website][20].


@ -11,3 +11,11 @@ a list of what can be done, then explain each hook and how it works
## Custom hooks
```ruby
def self.hook_should_be_excluded?(_filepath)
def self.hook_before_indexing_each(record, _node)
def self.hook_before_indexing_all(records)
```


@ -3,12 +3,29 @@ title: How does this work?
layout: content-with-menu.pug
---
# How does this work?
## How does this work?
More technical explanation, to explain the underlying principles of the plugin.
That it acts as an actual build, but instead of creating HTML files, it will
push records to Algolia.
The plugin will work like a `jekyll build` run, but instead of writing `.html`
files to disk, it will push content to Algolia. It will go through each file
Jekyll would have processed in a regular build: pages, posts and collections.
What is a record? We will take each page and split it into several JSON objects
and push those.
It will not push the whole content of each page to Algolia as one record.
Instead, it will split each page into small chunks (by default, one per
`<p>` paragraph) and then push each chunk as a new record to Algolia. Splitting
records that way allows for a more fine-tuned relevance even on long pages.
Each record created that way will contain a mix of specific data and shared
data. Specific data will be the paragraph content, and information about its
position in the page (where it is situated in the hierarchy of headings in the
page). Shared data is the metadata of the page it was extracted from (`slug`,
`url`, `tags`, etc, as well as any custom field added to the front-matter).
Once displayed, results are grouped so only the best matching paragraph of each
page is returned for a specific query. This greatly improves the perceived
relevance of the search results.
Because the plugin is splitting each page into smaller chunks, it can be hard to get
an estimate of how many records will actually be pushed. The plugin tries to be
smart and consume as few operations as possible, but you can always run it in
`--dry-run` mode to better understand what it would do.
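To make the idea of a record more concrete, here is a minimal sketch of the splitting described above. It is an illustration under stated assumptions, not the plugin's implementation: the `extract_records` name, the `content` and `hierarchy` keys, and the input format (a list of `[tag, text]` pairs standing in for the generated HTML) are all hypothetical.

```ruby
# Hypothetical sketch: nodes are given as [tag, text] pairs in document order,
# standing in for the HTML that Jekyll would generate for one page.
def extract_records(nodes, page_metadata)
  hierarchy = {}
  records = []

  nodes.each do |tag, text|
    if (match = tag.match(/\Ah([1-6])\z/))
      level = match[1].to_i
      # A new heading replaces its own level and resets every deeper level
      hierarchy.delete_if { |existing_level, _| existing_level >= level }
      hierarchy[level] = text
    elsif tag == 'p'
      # Specific data (content, position) merged with the shared page metadata
      records << page_metadata.merge(
        content: text,
        hierarchy: hierarchy.sort.map { |_, heading| heading }
      )
    end
  end

  records
end

page = { url: '/about.html', tags: ['team'] }
nodes = [
  ['h1', 'About'],
  ['p', 'We build search.'],
  ['h2', 'History'],
  ['p', 'Founded a while ago.']
]

extract_records(nodes, page).each { |record| p record }
```

Each paragraph becomes one record that carries its own text and heading path, plus the metadata shared by every record of the page.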


@ -17,6 +17,42 @@ Show pros and cons of each
Synchronizing your local data with your Algolia index can be done in different
ways. By default, the plugin will use the `diff` indexing mode but you might
also be interested in the `atomic` mode.
#### `diff` (default)
By default, the plugin will try to be smart when pushing content to your index:
it will only push new records and delete old ones instead of overwriting
everything.
To do so, it first grabs the list of all records residing in your index, then
compares them with the ones generated locally. It then deletes the old records
that no longer exist, and adds the newly created records.
The main advantage is that it will consume very few operations in your Algolia
quota. The drawback is that it will put your index into an inconsistent state
for a few seconds (records were deleted, but new ones were not yet added). Users
doing a search on your website at that time might get incomplete results.
#### `atomic`
Using the `atomic` indexing mode, your users will never search into an
inconsistent index. They will either be searching into the index containing the
old data, or the one containing the new data, but never in an intermediate
state.
To do so, the plugin will actually push all data to a temporary index first.
Once everything is copied and configured, it will then overwrite the old index
with the temporary one.
The main advantage is that it will be completely transparent for your users. The
drawback is that it will consume many more operations, as you will have to push
all your records to a new index each time.
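The difference in cost between the two modes can be illustrated with a small sketch. It assumes each record carries an `objectID` that changes whenever its content changes; that assumption, like the `diff_operations` name, is for illustration only and does not describe the plugin's internals.

```ruby
# Hypothetical sketch of the `diff` strategy: compare the IDs already in the
# index with the records generated locally.
def diff_operations(remote_ids, local_records)
  local_ids = local_records.map { |record| record[:objectID] }

  {
    # Records still in the index but no longer generated locally
    delete: remote_ids - local_ids,
    # Records generated locally but not yet in the index
    add: local_records.reject { |record| remote_ids.include?(record[:objectID]) }
  }
end

remote_ids = %w[a b c]
local_records = [
  { objectID: 'b', content: 'unchanged paragraph' },
  { objectID: 'c', content: 'unchanged paragraph' },
  { objectID: 'd', content: 'new paragraph' }
]

operations = diff_operations(remote_ids, local_records)
puts "diff: delete #{operations[:delete].length}, add #{operations[:add].length}"
puts "atomic: push all #{local_records.length} records to a temporary index"
```

In this example the `diff` mode touches two records, while the `atomic` mode pushes all three again.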


@ -5,10 +5,123 @@ layout: content-with-menu.pug
# Options
List of all options that can be passed in the \_config.yml file. A subheading
for each of them, explaining what it does. Some of those options might require
a more precise page to explain.
The plugin should work out of the box for most websites, but there are options
you can tweak if needed. All the options should be added under the `algolia`
section of your `_config.yml` file.
This will be content heavy
You should be familiar with [how this plugin works][1] under
the hood to better understand what some options are doing.
### `extensions_to_index`
This option defines which source files should be indexed, based on their file
extension. If an extension is not in the list, then the file will not be
indexed.
By default, all HTML and markdown source files will be indexed.
If you are using another markup language (such as [AsciiDoc][2] or [Textile][3]),
you might want to update the value like this:
```yml
algolia:
# Also index AsciiDoc and Textile files
extensions_to_index: 'html,md,adoc,textile'
```
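The extension check can be pictured with the following sketch. The `indexable_extension?` helper is a hypothetical name, and splitting the option on commas simply mirrors the format of the value shown above; the plugin may parse it differently.

```ruby
# Hypothetical helper: check a source file against a comma-separated list of
# extensions such as 'html,md,adoc,textile'.
def indexable_extension?(path, extensions_to_index)
  allowed = extensions_to_index.split(',').map(&:strip)
  # File.extname returns the extension with its leading dot ('.md')
  allowed.include?(File.extname(path).sub(/\A\./, ''))
end

extensions = 'html,md,adoc,textile'
%w[about.md guide.adoc logo.png].each do |path|
  puts "#{path}: #{indexable_extension?(path, extensions) ? 'indexed' : 'skipped'}"
end
```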
### `files_to_exclude`
This option lets you define a blacklist of source files you don't want to index.
By default it will exclude all the `index.html` and `index.md` files. Those
files usually contain little text (landing pages) or redundant text (latest
blog articles), so they are not included by default.
If you want to index those files, you should set the value to an empty array.
```yml
algolia:
# Actually index the index.html/index.md pages
files_to_exclude: []
```
If you want to exclude more files, you should add them to the array. Note that
you can use glob patterns to exclude several files at once.
```yml
algolia:
# Exclude more files from indexing
files_to_exclude:
- index.html
- index.md
- excluded-file.html
- _posts/2017-01-20-date-to-forget.md
- subdirectory/*.html
```
_Note that some files (pagination pages, static assets, etc) will **always** be
excluded and you don't have to specify them._
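To get a feel for how glob patterns select files, the sketch below uses Ruby's `File.fnmatch`. This is an assumption made for illustration: the plugin may match patterns differently (for example, for `index.html` files in nested directories), and the `excluded?` helper is a hypothetical name.

```ruby
# Hypothetical helper: a path (relative to the source directory) is excluded
# when it matches at least one pattern. With FNM_PATHNAME, `*` does not cross
# directory separators.
def excluded?(path, files_to_exclude)
  files_to_exclude.any? do |pattern|
    File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
end

patterns = [
  'excluded-file.html',
  '_posts/2017-01-20-date-to-forget.md',
  'subdirectory/*.html'
]

%w[excluded-file.html subdirectory/page.html about.md].each do |path|
  puts "#{path}: #{excluded?(path, patterns) ? 'excluded' : 'indexed'}"
end
```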
### `nodes_to_index`
This option defines how each page is split into chunks. It expects
a CSS selector that will be applied on the HTML content generated by Jekyll.
Each matching node will be indexed as a different record.
The default value is `p`, meaning that one record will be created for each `<p>`
paragraph of content.
If you would like to index other elements, like `<blockquote>`, `<li>` or
a custom `<div class="paragraph">`, you should edit the value like this:
```yml
algolia:
# Also index quotes, list items and custom paragraphs
nodes_to_index: 'p,blockquote,li,div.paragraph'
```
### `indexing_mode`
This option will let you choose the strategy used to sync your data with your
Algolia index. The default value should work for most cases, but feel free to
[read the pros and cons][4] of each and pick the one best suited for your needs.
### `settings`
This option lets you pass specific settings to your Algolia index.
By default the plugin will configure your Algolia index with settings tailored
to the format of the extracted records. You are of course free to overwrite
them or configure them as best suits your needs. Every option passed to the
`settings` entry will be set as [settings to your index][5].
For example if you want to change the HTML tag used for the highlighting, you
can overwrite it like this:
```yml
algolia:
settings:
highlightPreTag: '<em class="custom_highlight">'
highlightPostTag: '</em>'
```
### `indexing_batch_size`
The Algolia API allows you to send batches of changes to add or update several
records at once, instead of doing one HTTP call per record. The plugin will
batch updates by groups of 1000 records by default.
If you are on an unstable internet connection, you might want to decrease the
value. You will send more batches, but each will be smaller in size.
```yml
algolia:
# Send fewer records per batch
indexing_batch_size: 500
```
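The effect of the batch size on the number of calls can be checked with plain Ruby. The sketch below does not call the Algolia API; the `batch_sizes` helper is a hypothetical name and only shows how a list of records would be grouped.

```ruby
# Hypothetical helper: group records into batches and report each batch's size
def batch_sizes(records, batch_size)
  records.each_slice(batch_size).map(&:length)
end

# 2500 placeholder records
records = Array.new(2500) { |i| { objectID: i.to_s } }

p batch_sizes(records, 1000) # => [1000, 1000, 500]
p batch_sizes(records, 500)  # => [500, 500, 500, 500, 500]
```

With the default of 1000, these 2500 records are sent in three batches; lowering the value to 500 sends five smaller ones.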
[1]: ./how-it-works.html
[2]: http://www.methods.co.nz/asciidoc/
[3]: https://github.com/textile
[4]: ./indexing-modes.html
[5]: https://www.algolia.com/doc/api-reference/api-methods/set-settings/?language=ruby#set-settings