Cleanup the README

This commit is contained in:
Pixelastic 2017-11-17 13:37:28 +01:00
parent ee2e508c5e
commit 7208c12fb4

495
README.md
View File

@ -12,7 +12,7 @@ Jekyll plugin to automatically index your content into Algolia.
## Usage ## Usage
```shell ```shell
$ bundle exec jekyll algolia push $ jekyll algolia
``` ```
This will push the content of your Jekyll website to your Algolia index. This will push the content of your Jekyll website to your Algolia index.
@ -22,37 +22,33 @@ You can specify any option you would pass to `jekyll build`, like
## Installation ## Installation
First, add the `jekyll-algolia` gem to your `Gemfile`, in the The plugin requires a minimum version of Jekyll of 3.6.2 and a Ruby version of
`:jekyll_plugins` section. If you do not yet have a `Gemfile`, here is the 2.2.8 (which are the current versions [deployed on GitHub Pages][7] at the time of
minimal content to get your started. writing).
First, add the `jekyll-algolia` gem to your `Gemfile`, in the `:jekyll_plugins`
section. If you do not yet have a `Gemfile`, here is the minimal content to get
your started. You will also need [Bundler][8] to be able to use the `Gemfile`.
```ruby ```ruby
source 'https://rubygems.org' source 'https://rubygems.org'
gem 'jekyll', '~> 2.5.3' gem 'jekyll', '~> 3.6'
group :jekyll_plugins do group :jekyll_plugins do
gem 'jekyll-algolia', '~> 0.7.0' gem 'jekyll-algolia'
end end
``` ```
Once this is done, download all dependencies with `bundle install`. Once this is done, download all dependencies with `bundle install`.
Then, add `jekyll-algolia` to your `_config.yml` file, under the `plugins`
section, like this:
```yaml
plugins:
- jekyll-algolia
```
If everything went well, you should be able to run `jekyll help` and see the If everything went well, you should be able to run `jekyll help` and see the
`algolia` subcommand listed. `algolia` subcommand listed.
## Configuration ## Basic configuration
Add information about your Algolia configuration into the `_config.yml` file, Add your Algolia credentials under the `algolia` section of your
under the `algolia` section, like this: `_config.yml` file like this:
```yaml ```yaml
algolia: algolia:
@ -60,266 +56,280 @@ algolia:
index_name: 'your_index_name' index_name: 'your_index_name'
``` ```
Your write api key will be read from the `ALGOLIA_API_KEY` environment variable. _If you don't yet have an Algolia account, you can open a free [Community plan
You can define it on the same line as your command, allowing you to type here][9]. If you already have an account, you can get your credentials from
`ALGOLIA_API_KEY='your_write_api_key' bundle exec jekyll algolia push`. [your dashboard][10]._
Note that your API key should have write access to both the `index_name` and Your API key will be read from the `ALGOLIA_API_KEY` environment variable.
`_tmp` suffixed version of it (eg. `your_index_name` and `your_index_name_tmp`) You can define it on the same line as your command, allowing you to type
in the previous example). This is due to the way we do atomic pushes by pushing `ALGOLIA_API_KEY='your_api_key' jekyll algolia`.
to a temporary index and then renaming it.
### ⚠ Other, unsecure, method ⚠ ### ⚠ Other, unsecure, method ⚠
You can also store your write api key in a file named `_algolia_api_key`, in You can also store your API key in a file named `_algolia_api_key`, in
your source directory. If you do this we __very, very, very strongly__ encourage your source directory. If you do this we __very, very, very strongly__ encourage
you to make sure the file is not tracked in your versioning system. you to make sure the file is not tracked in your versioning system.
### Options ## How it works
The plugin uses sensible defaults, but you may want to override some of its For the most part, the plugin will work exactly like a `jekyll build` run, but
configuration. Here are the options you can add to your `_config.yml` instead of writing `.html` files to disk, it will push content to Algolia.
file, under the `algolia` section:
#### `excluded_files` It will split each page of your website into small chunks (by default, one per
`<p>` paragraph) and then push each chunk as a new record to Algolia. Splitting
records that way yields a better relevance of results even on very long pages.
Defines which files should not be indexed for search. The placement of each paragraph in regard to the overall page heading hierarchy
(title, subtitles through `<h1>` to `<h6>`) is also taken into account to
further improve relevance of results.
Each record will also contain metadata about the page it was extracted from
(including `slug`, `url`, `tags`, `categories`, `collection` and any custom
field added to the front-matter).
Every time you run `jekyll algolia`, a full build of the website is run locally,
but only records that were changed since your last build will be updated in your
index.
## Advanced configuration
The plugin should work out of the box for most websites, but there are a few
options you can tweak if needed. All the options should be added under the
`algolia` section of your `_config.yml` file.
### `nodes_to_index`
By default, each page of your website will be split into chunks based on this
CSS selector. The default value of `p` means that one record will be created for
each `<p>` in your generated content.
But maybe you would also like to index other elements, like `<blockquote>`,
`<li>` or a custom `<div class="paragraph">`. If so, you should edit the value
like this:
```yml ```yml
algolia: algolia:
excluded_files: # Also index quotes, list items and custom paragraphs
nodes_to_index: 'p,blockquote,li,div.paragraph'
```
### `extensions_to_index`
By default, only HTML and Markdown files will be indexed. If you are using
another markup language (such as [AsciiDoc][11]
or [Textile][12], then you should overwrite this
option.
```yml
algolia:
# Also index AsciiDoc and Textile files
extensions_to_index: 'html,md,adoc,textile'
```
### `files_to_exclude`
The plugin will try to be smart in the pages it should __not__ index. Some files
will always be excluded from the indexing (static assets, custom 404 and
pagination pages). Others are handled by the `files_to_exclude` option.
By default it will exclude all the `index.html` and `index.md` files. Those
files are usually not containing much text (landing pages) or containing
redundant text (latest blog articles) so we decided to exclude them by default.
If you actually want to index those files, you should set the value to an empty
array.
```
algolia:
# Actually index the index.html/index.md pages
files_to_exclude: []
```
Additionally, if there are more files you would like to exclude from the
indexing, you should add them to the array:
```
algolia:
# Exclude more files from indexing
files_to_exclude:
- index.html - index.html
- 2015-01-01-post.md - index.md
- excluded-file.html
- /_posts/2017-01-20-date-to-forget.md
``` ```
#### `nodes_to_index` ### `settings`
All HTML nodes matching this CSS Selector will be indexed. Default value is `p`, By default the plugin will configure your Algolia index with settings taylored
meaning that all `<p>` paragraphs will be indexed. to the the format of the extracted records. You are of course free to overwrite
them or configure them as best suits your needs. Every option passed to the
`settings` entry will passed to a call to [set_settings][13].
If you would like to also index lists, you could set it like this: For example if you want to change the HTML tag used for the highlighting, you
can overwrite it like this:
```yml
algolia:
nodes_to_index: 'p,ul'
``` ```
#### `lazy_update`
Enabling this option can greatly reduce the number of operations consumed by the
plugin but comes with some drawbacks mentioned above:
`false`: The plugin will push all the records to a temporary index and once
everything is pushed will overwrite the current index with this new one. This is
the most straightforward way to update records and will ensure that all the
changes happen in one move. This is the default value.
`true`: With `lazy_update` enabled, the plugin will try to reduce the number of
calls done to the API. It will get a list of all the records in your index and
all the records ready to be pushed. It will compare both and push the new while
deleting the old. In most cases it should consume less operations, but the
changes won't be atomic (ie. you might have your index in an hybrid state, with
old records not yet removed and/or new records not yet added for a couple of
minutes).
#### `settings`
Here you can pass any specific [index settings][7] to your Algolia index. All
the settings supported by the API can be passed here.
##### Examples
If you want to activate `distinct` and some snippets for example, you would do:
```yml
algolia: algolia:
settings: settings:
attributeForDistinct: 'hierarchy' highlightPreTag: '<em class="custom_highlight">
distinct: true highlightPostTag: '</em>'
attributesToSnippet: ['text:20']
``` ```
If you want to search in other fields than the default ones, you'll have to edit ### `indexing_batch_size`
the `attributesToIndex` (default is `%w(title h1 h2 h3 h4 h5 h6 unordered(text)
unordered(tags))`
```yml The Algolia API allows you to send batches of changes to add or update several
records at once, instead of doing one HTTP call per record. The plugin will
batch updates by groups of 1000 records.
If you are on an unstable internet connection, you might want to decrease the
value. You will send more batches, but each will be smaller in size.
```
algolia: algolia:
settings: # Send fewer records per batch
attributesToIndex: indexing_batch_size: 500
- title
- h1
- h2
- h3
- h4
- h5
- h6
- unordered(text)
- unordered(tags)
- your_custom_attribute_1
- your_custom_attribute_2
- ...
``` ```
### Hooks ### `indexing_mode`
The `AlgoliaSearchRecordExtractor` contains two methods (`custom_hook_each` and Synchronizing your local data with your Algolia index can be done in different
`custom_hook_all`) that are here so you can overwrite them to add your custom ways. By default, the plugin will use the `diff` indexing mode but you might
logic. By default, they do nothing except returning the argument they take as also be interested in the `atomic` mode.
input, and are placeholder for you to override.
The best way to override them is to create a `./_plugins/search.rb` file, with ### `diff` (default)
the following content:
```ruby By default, the plugin will try to be smart when pushing content to your index:
class AlgoliaSearchRecordExtractor it will only push new records and delete old ones insted of overwriting
# Hook to modify a record after extracting everything.
def custom_hook_each(item, node)
# `node` is a Nokogiri HTML node, so you can access its type through `node.name`
# or its classname through `node.attr('class')` for example
# Just return `nil` instead of `item` if you want to discard this record To do so, we first need to grab the list of all records currently residing in
item your index, then comparing them with the one generated locally. We then delete
end the old records that no longer exists, and then add the newly created record.
# Hook to modify all records after extracting The main advantage is that it will consume very few operations in your Algolia
def custom_hook_all(items) quota. The drawback is that it will put your index into an inconsistent state
items for a few seconds (records were deleted, but new one were not yet added). Users
end doing a search on your website at that time might have incomplete results.
end
```
The `AlgoliaSearchJekyllPush` class also lets user define the ### `atomic`
`custom_hook_excluded_file?` method. This method is called on every file that
the plugin thinks it should parse and index. If it returns `true`, the file is
not indexed. You can add here your custom logic to exclude some files.
```ruby Using the `atomic` indexing mode, your users will never search into an
class AlgoliaSearchJekyllPush < Jekyll::Command inconsistent index. They will either be searching into the index containing the
class << self old data, or the one containing the new data, but never in an intermediate
# Hook to exclude some files from indexing state.
def custom_hook_excluded_file?(file)
return true if filepath =~ %r{^/excluded_dir/}
false
end
end
end
```
## Command line To do so, the plugin will actually push all data to a temporary index first.
Once everything is copied and configured, it will then overwrite the old index
with the temporary one.
Here is the list of command line options you can pass to the `jekyll algolia The main advantage is that it will be completly transparent for your users. The
push` command: drawback is that it will consume much more operations as you will have to push
all your records to a new index each time.
| Flag | Description |
| ---- | ----- |
| `--config ./_config.yml` | You can here specify the config file to use. Default is `_config.yml` |
| `--future` | With this flag, the command will also index posts with a future date |
| `--limit_posts 10` | Limits the number of posts to parse and index |
| `--drafts` | Index drafts in the `_drafts` folder as well |
| `--dry-run` or `-n` | Do a dry run, do not actually push anything to your index |
| `--verbose` | Display more information about what is going to be indexed |
## Dependencies
This plugin is compatible with version of Jekyll >= 3.6.2 and version of Ruby >=
2.4.0. Those are the versions [deployed on GitHub Pages][8] at the time of
writing.
You will also need [Bundler][9] to install the gem in your project.
## Searching
This plugin will index your data in your Algolia index. Building the front-end
search is of the scope of this plugin, but you can follow [our tutorials][10] or
use our forked version of the popular [Hyde theme][11].
## GitHub Pages <!-- ## Custom hooks -->
<!-- -->
<!-- -->
<!-- def self.hook_should_be_excluded?(_filepath) -->
<!-- def self.hook_before_indexing_each(record, _node) -->
<!-- def self.hook_before_indexing_all(records) -->
The initial goal of the plugin was to allow anyone to have access to great <!-- ## Command line -->
search, even on a static website hosted on GitHub pages. <!-- -->
<!-- Here is the list of command line options you can pass to the `jekyll algolia -->
<!-- push` command: -->
<!-- -->
<!-- | Flag | Description | -->
<!-- | ---- | ----- | -->
<!-- | `--config ./_config.yml` | You can here specify the config file to use. Default is `_config.yml` | -->
<!-- | `--future` | With this flag, the command will also index posts with a future date | -->
<!-- | `--limit_posts 10` | Limits the number of posts to parse and index | -->
<!-- | `--drafts` | Index drafts in the `_drafts` folder as well | -->
<!-- | `--dry-run` or `-n` | Do a dry run, do not actually push anything to your index | -->
<!-- | `--verbose` | Display more information about what is going to be indexed | -->
But GitHub does not allow custom plugins to be run on GitHub Pages.
This means that you'll either have to run `bundle exec jekyll algolia push`
manually, or configure a CI environment (like [Travis][12] to do it for you.
[Travis CI][13] is an hosted continuous integration <!-- ## Searching -->
service, and it's free for open-source projects. Properly configured, it can <!-- -->
automatically reindex your data whenever you push to `gh-pages`. <!-- This plugin will index your data in your Algolia index. Building the front-end -->
<!-- search is of the scope of this plugin, but you can follow [our tutorials][14] or -->
For it to work, you'll have 3 steps to perform. <!-- use our forked version of the popular [Hyde theme][15]. -->
<!-- -->
### 1. Create a `.travis.yml` file <!-- ## GitHub Pages -->
<!-- -->
Create a file named `.travis.yml` at the root of your project, with the <!-- The initial goal of the plugin was to allow anyone to have access to great -->
following content: <!-- search, even on a static website hosted on GitHub pages. -->
<!-- -->
```yml <!-- But GitHub does not allow custom plugins to be run on GitHub Pages. -->
language: ruby <!-- This means that you'll either have to run `bundle exec jekyll algolia push` -->
cache: bundler <!-- manually, or configure a CI environment (like [Travis][16] to do it for you. -->
branches: <!-- -->
only: <!-- [Travis CI][17] is an hosted continuous integration -->
- gh-pages <!-- service, and it's free for open-source projects. Properly configured, it can -->
script: <!-- automatically reindex your data whenever you push to `gh-pages`. -->
- bundle exec jekyll algolia push <!-- -->
rvm: <!-- For it to work, you'll have 3 steps to perform. -->
- 2.2 <!-- -->
``` <!-- ### 1. Create a `.travis.yml` file -->
<!-- -->
This file will be read by Travis and instruct it to fetch all dependencies <!-- Create a file named `.travis.yml` at the root of your project, with the -->
defined in the `Gemfile`, then run `jekyll algolia push`. This will be <!-- following content: -->
triggered when data is pushed to the `gh-pages` branch. <!-- -->
<!-- ```yml -->
### 2. Update your `_config.yml` file to exclude `vendor` <!-- language: ruby -->
<!-- cache: bundler -->
Travis will download all you `Gemfile` dependencies into a directory named <!-- branches: -->
`vendor`. You have to tell Jekyll to ignore this directory, otherwise Jekyll <!-- only: -->
will try to parse it (and fail). <!-- - gh-pages -->
<!-- script: -->
Doing so is easy, add the following line to your `_config.yml` file: <!-- - bundle exec jekyll algolia push -->
<!-- rvm: -->
```yml <!-- - 2.2 -->
exclude: [vendor] <!-- ``` -->
``` <!-- -->
<!-- This file will be read by Travis and instruct it to fetch all dependencies -->
### 3. Configure Travis <!-- defined in the `Gemfile`, then run `jekyll algolia push`. This will be -->
<!-- triggered when data is pushed to the `gh-pages` branch. -->
In order for Travis to be able to push data to your index on your behalf, you <!-- -->
have to give it your write API Key. This is achieved by defining an <!-- ### 2. Update your `_config.yml` file to exclude `vendor` -->
`ALGOLIA_API_KEY` [environment variable][14] in Travis settings. <!-- -->
<!-- Travis will download all you `Gemfile` dependencies into a directory named -->
You should also uncheck the "Build pull requests" option, otherwise any pull <!-- `vendor`. You have to tell Jekyll to ignore this directory, otherwise Jekyll -->
request targeting `gh-pages` will trigger the reindexing. <!-- will try to parse it (and fail). -->
<!-- -->
![Travis Configuration][15] <!-- Doing so is easy, add the following line to your `_config.yml` file: -->
<!-- -->
### Done <!-- ```yml -->
<!-- exclude: [vendor] -->
Commit all the changes to the files, and then push to `gh-pages`. Travis will <!-- ``` -->
catch the event and trigger your indexing for you. You can follow the Travis job <!-- -->
execution directly on [their website][16]. <!-- ### 3. Configure Travis -->
<!-- -->
## FAQS <!-- In order for Travis to be able to push data to your index on your behalf, you -->
<!-- have to give it your write API Key. This is achieved by defining an -->
### How can I exclude some HTML nodes from the indexing <!-- `ALGOLIA_API_KEY` [environment variable][18] in Travis settings. -->
<!-- -->
By default, the plugin will index every HTML node that matches the <!-- You should also uncheck the "Build pull requests" option, otherwise any pull -->
`nodes_to_index` CSS selector option. The default value is `p`, meaning <!-- request targeting `gh-pages` will trigger the reindexing. -->
that it will index all the paragraphs. <!-- -->
<!-- ![Travis Configuration][19] -->
You can use a [negation <!-- -->
selector][17] to be even more <!-- ### Done -->
explicit. For example the value `p:not(.do-not-index)` will index all `<p>` <!-- -->
paragraphs, *except* those that have the class `do-not-index`. <!-- Commit all the changes to the files, and then push to `gh-pages`. Travis will -->
<!-- catch the event and trigger your indexing for you. You can follow the Travis job -->
If you need a finer granularity on your indexing that cannot be expressed <!-- execution directly on [their website][20]. -->
through CSS selectors, you'll have to use the [hook mechanism][18]. The <!-- -->
`custom_hook_each` method takes a [Nokogiri][19] HTML node <!-- ## FAQS -->
as a second argument and should let you write more complex filters.
# Thanks # Thanks
Thanks to [Anatoliy Yastreb][20] for a [great tutorial][21] on creating Jekyll Thanks to [Anatoliy Yastreb][21] for a [great tutorial][22] on creating Jekyll
plugins. plugins.
@ -329,18 +339,19 @@ plugins.
[4]: https://codeclimate.com/github/algolia/jekyll-algolia/badges/gpa.svg [4]: https://codeclimate.com/github/algolia/jekyll-algolia/badges/gpa.svg
[5]: https://img.shields.io/badge/jekyll-%3E%3D%203.6.2-green.svg [5]: https://img.shields.io/badge/jekyll-%3E%3D%203.6.2-green.svg
[6]: https://img.shields.io/badge/ruby-%3E%3D%202.4.0-green.svg [6]: https://img.shields.io/badge/ruby-%3E%3D%202.4.0-green.svg
[7]: https://www.algolia.com/doc/ruby#indexing-parameters [7]: https://pages.github.com/versions.json
[8]: https://pages.github.com/versions.json [8]: http://bundler.io/
[9]: http://bundler.io/ [9]: https://www.algolia.com/users/sign_up/hacker
[10]: https://www.algolia.com/doc/javascript [10]: https://www.algolia.com/licensing
[11]: https://github.com/algolia/hyde [11]: http://www.methods.co.nz/asciidoc/
[12]: https://travis-ci.org/) [12]: https://github.com/textile)
[13]: https://travis-ci.org/ [13]: https://www.algolia.com/doc/api-reference/api-methods/set-settings/?language=ruby#set-settings
[14]: http://docs.travis-ci.com/user/environment-variables/ [14]: https://www.algolia.com/doc/javascript
[15]: /docs/travis-settings.png [15]: https://github.com/algolia/hyde
[16]: https://travis-ci.org [16]: https://travis-ci.org/)
[17]: https://developer.mozilla.org/en/docs/Web/CSS/:not [17]: https://travis-ci.org/
[18]: #hooks [18]: http://docs.travis-ci.com/user/environment-variables/
[19]: http://www.nokogiri.org/ [19]: /docs/travis-settings.png
[20]: https://github.com/ayastreb/ [20]: https://travis-ci.org
[21]: https://ayastreb.me/writing-a-jekyll-plugin/ [21]: https://github.com/ayastreb/
[22]: https://ayastreb.me/writing-a-jekyll-plugin/