docs(indexing): Removing mentions of indexing_mode in the doc

Pixelastic 2018-01-30 15:42:31 +01:00
parent 531c90777b
commit bfc23df571
6 changed files with 59 additions and 126 deletions

@ -35,7 +35,6 @@ const sidebarMenu = [
{ title: 'Options', url: 'options.html' },
{ title: 'Commandline', url: 'commandline.html' },
{ title: 'Hooks', url: 'hooks.html' },
{ title: 'Indexing modes', url: 'indexing-modes.html' },
],
},
{
@ -48,11 +47,7 @@ const sidebarMenu = [
},
{
title: 'Tutorials',
items: [
{ title: 'Blog', url: 'blog.html' },
// { title: 'Dropdown menu', url: 'autocomplete.html' },
// { title: 'Collection search', url: 'collections.html' },
],
items: [{ title: 'Blog', url: 'blog.html' }],
},
];

@ -75,7 +75,7 @@ want to keep this key secret and not commit it to your versioning system.
![jekyll algolia command example][6]
_Note that in the animation I simplified the method call to `jekyll algolia` by using an
_Note that in the animation we simplified the method call to `jekyll algolia` by using an
[alternative way][7] of loading the API key and using [rubygems-bundler][8] to
remove the need to add `bundle exec`._

@ -5,6 +5,12 @@ layout: content-with-menu.pug
# How does this work?
This page gives you more insight into how the plugin works internally. This
should give you the context you need to better understand the various options
you can configure.
## Extracting data
The plugin will work like a `jekyll build` run, but instead of writing `.html`
files to disk, it will push content to Algolia. It will go through each file
Jekyll would have processed in a regular build: pages, posts and collections.
@ -53,16 +59,26 @@ front-matter). Specific data is the paragraph content, and information
about its position in the page (where it is situated in the hierarchy of headings
in the page).
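To make "position in the hierarchy of headings" concrete, here is a minimal sketch of how paragraphs can be turned into records that remember the headings above them. It is not the plugin's code: rather than parsing real pages, it takes a flat list of hypothetical `[tag, text]` pairs, and the `extract_records` helper and the `url`/`content`/`headings` keys are assumptions chosen for illustration.

```ruby
# Heading tags and their depth in the page hierarchy.
HEADING_LEVELS = { 'h1' => 1, 'h2' => 2, 'h3' => 3, 'h4' => 4, 'h5' => 5, 'h6' => 6 }.freeze

# Hypothetical helper (not the plugin's API): turns a flat list of [tag, text]
# pairs into one record per paragraph, each carrying the headings above it.
def extract_records(nodes, page_url)
  hierarchy = {} # heading level => current heading text at that level
  records = []

  nodes.each do |tag, text|
    level = HEADING_LEVELS[tag]
    if level
      # A new heading replaces the current one at its level and closes deeper ones
      hierarchy.delete_if { |lvl, _| lvl >= level }
      hierarchy[level] = text
    elsif tag == 'p'
      records << {
        url: page_url,
        content: text,
        headings: hierarchy.sort.map { |_, heading| heading } # ordered h1 -> h6
      }
    end
  end

  records
end

nodes = [
  ['h1', 'Installation'],
  ['p', 'First paragraph.'],
  ['h2', 'Configuration'],
  ['p', 'Second paragraph.']
]
extract_records(nodes, '/getting-started.html').each { |record| p record[:headings] }
# ["Installation"]
# ["Installation", "Configuration"]
```

Keeping the headings on every record is what lets a search result show where in the page a matching paragraph lives.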
Once displayed, results are grouped so only the best matching paragraph of each
page is returned for a specific query. This greatly improves the perceived
relevance of the search results.
Using the [distinct setting][1] of the Algolia API, only the best-matching
paragraph of each page is returned for a specific query. This greatly improves
the perceived relevance of the search results, because you can highlight exactly
the part that matched.
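The effect of distinct can be simulated on a list of hits that is already sorted by relevance. In Algolia itself this grouping happens server-side and is driven by the `distinct` and `attributeForDistinct` index settings; which attribute the plugin groups on is not stated here, so the `url` key below is an assumption, and `apply_distinct` is a hypothetical name.

```ruby
# Hypothetical illustration of distinct: given hits already sorted by relevance,
# keep only the best-ranked record for each page (grouped on an assumed :url key).
def apply_distinct(ranked_hits, key: :url)
  # Array#uniq keeps the first occurrence of each key, i.e. the best-ranked one
  ranked_hits.uniq { |hit| hit[key] }
end

hits = [
  { url: '/options.html', content: 'best paragraph of options' },
  { url: '/hooks.html', content: 'best paragraph of hooks' },
  { url: '/options.html', content: 'weaker paragraph of options' }
]
apply_distinct(hits).each { |hit| puts hit[:url] }
# /options.html
# /hooks.html
```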
Because the plugin splits each page into smaller chunks, it can be hard to
estimate how many records will actually be pushed. The plugin tries to consume
as few operations as possible, but you can always run it in `--dry-run` mode to
see what it would do.
## Pushing data
![jekyll algolia dry run example][1]
The plugin tries to be smart by using as few operations as possible, to be
mindful of your Algolia quota. Whenever you run `jekyll algolia`, only records
that changed since your last push will be updated.
[1]: ./assets/images/dry-run.gif
This is made possible because each record is assigned a unique `objectID`,
computed as a hash of its content. Whenever the content of a record changes,
its `objectID` changes as well. This allows us to compare what is currently
available in your index with what is about to be pushed, and to only update
what actually changed.
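The content-hash idea can be sketched in a few lines of Ruby. This is a minimal sketch, not the plugin's implementation: the choice of MD5 over a key-sorted JSON dump is an assumption, and `object_id_for` and `diff_records` are hypothetical names. An unchanged record hashes to the same `objectID` and is left alone; an edited record gets a new `objectID`, so its old version shows up as a deletion and its new version as an addition.

```ruby
require 'digest'
require 'json'
require 'set'

# Hypothetical helper: derive a record's objectID from its content.
# Keys are sorted so the same content always yields the same hash
# (MD5 is an assumption; the plugin may hash records differently).
def object_id_for(record)
  canonical = record.sort_by { |key, _| key.to_s }.to_h
  Digest::MD5.hexdigest(JSON.generate(canonical))
end

# Compare the objectIDs already in the index with freshly built local records.
# Returns [IDs to delete, records to add (with their objectID set)].
def diff_records(remote_ids, local_records)
  local_by_id = local_records.map { |record| [object_id_for(record), record] }.to_h
  remote_set = remote_ids.to_set

  ids_to_delete = remote_ids.reject { |id| local_by_id.key?(id) }
  records_to_add = local_by_id
                   .reject { |id, _| remote_set.include?(id) }
                   .map { |id, record| record.merge(objectID: id) }

  [ids_to_delete, records_to_add]
end

old_record = { url: '/blog.html', content: 'Hello' }
new_record = { url: '/blog.html', content: 'Hello, world' }
to_delete, to_add = diff_records([object_id_for(old_record)], [new_record])
puts to_delete.size # 1: the old version of the paragraph
puts to_add.size    # 1: the edited paragraph, under a new objectID
```

There is no "update" step: a changed record is simply a deletion plus an addition.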
Outdated records are deleted and the updated records are added in their place.
All those operations are grouped into a batch call, making sure that the
changes are applied atomically: your index is never left in an inconsistent
state where records are only partially updated.
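Grouping deletions and additions into one batch can be sketched as building a single payload. The `{ requests: [{ action:, body: }] }` shape and the `deleteObject`/`addObject` action names follow the Algolia REST batch endpoint; `build_batch` is a hypothetical helper, and actually sending the payload is left to the API client, since the plugin's exact call is not shown here.

```ruby
require 'json'

# Hypothetical helper: group deletions and additions into one batch payload,
# shaped like the body of the Algolia REST batch endpoint.
def build_batch(ids_to_delete, records_to_add)
  deletions = ids_to_delete.map { |id| { action: 'deleteObject', body: { objectID: id } } }
  additions = records_to_add.map { |record| { action: 'addObject', body: record } }
  { requests: deletions + additions }
end

payload = build_batch(['a1b2'], [{ objectID: 'c3d4', content: 'Hello, world' }])
puts JSON.pretty_generate(payload)
```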
[1]: https://www.algolia.com/doc/guides/ranking/distinct/?language=ruby#distinct-to-index-large-records

@ -1,70 +0,0 @@
---
title: Indexing modes
layout: content-with-menu.pug
---
# Indexing modes
Algolia's pricing model is based on the number of records you have in your index
as well as the number of add/edit/delete operations you perform on your index
per month.
By default, the plugin tries to be mindful of your quota and acts in a smart way:
it only updates records that changed between two runs.
It does so by attributing a unique `objectID` to each record, generated from the
actual content of this record. If the content changes, then the `objectID` will
change as well.
Because of this mechanism, the plugin can know which records changed between two
runs and will delete the records that are no longer needed and push the new ones
instead. Doing so only consumes a small number of operations (instead of pushing
everything each time).
When using the default `indexing_mode` value (`diff`), all those changes are
batched into one call to the API. They will be executed atomically (the index
will be updated with all the changes in one go, instead of one record at
a time). This allows users of the website to always search the most
up-to-date version of the data.
This should work for 99% of the use-cases and you shouldn't need to change the
value of the `indexing_mode`.
## `diff` (default)
Using the default `diff` mode, the plugin will try to be smart when pushing
content to your index: it will only add/edit/delete what changed. All
records that didn't change will stay untouched.
To do so, it first grabs the list of all records in your index, then compares
them with the records generated locally. It then deletes the old records that no
longer exist, and adds the newly created ones.
There is no notion of "updating" a record here because as soon as the content of
a record changes, it will be considered as a new record (thus, the old version
will be deleted and the new one will be added).
### Cons
All operations will be done on the same index, sequentially. Old records will
first be discarded, then new ones will be added. Users doing a search on your
website during the update will have inconsistent or incomplete results.
## `atomic`
The `atomic` mode solves the inconsistency issue of the `diff` mode. Instead of
doing all changes in sequence on the same index, the updates will be done on
a temporary index in the background.
The plugin will start by making a copy of the existing data, and will then apply
the `diff` method to it: it will remove old records and add new ones to this
index. While those changes are applied, your current index is still serving
search queries by your users. Once all changes are applied, the plugin will
replace the current public index with the temporary one, all in one atomic move.
### Cons
As this method will need to create a copy of your current index during indexing,
it means you will need an Algolia plan that can hold at least **twice** the
number of records.

@ -46,17 +46,16 @@ been changed:
[extensions_to_index][3]. Note that for the last one, it now expects
a comma-separated list of extensions.
The `lazy_update` option has been renamed to [indexing_mode][4]. The default indexing
mode ([diff][5]) is equivalent to `lazy_update: true`. This means that by
default the plugin is now smart enough to only update records that actually
changed since the last run. You can still get the old behavior of re-pushing
everything every time by using the [atomic][6] indexing mode.
The `lazy_update` option does not exist anymore. The new indexing behavior is
equivalent to `lazy_update: true`: only records that changed between the current
build and the previous one are updated, and the update is done atomically (all
in one go).
## Hooks
All three hooks (`custom_hook_excluded_file?`, `custom_hook_each` and
`custom_hook_all`) are still here, but they have been renamed to
[should_be_excluded?][7], [before_indexing_each][8] and [before_indexing_all][9].
[should_be_excluded?][4], [before_indexing_each][5] and [before_indexing_all][6].
They all have the same behavior and expect the same arguments as before, but
should now extend the `Jekyll::Algolia::Hooks` module. It means that the file
@ -72,7 +71,7 @@ module Jekyll
end
```
You can find the complete documentation on the [dedicated page][10].
You can find the complete documentation on the [dedicated page][7].
## Records
@ -114,17 +113,14 @@ Here is an example of a record extracted by the plugin:
## Need more help?
If you need more help migrating from the previous plugin to this new version,
you can [file an issue][11] on the GitHub repo and we'll do our best to help you.
you can [file an issue][8] on the GitHub repo and we'll do our best to help you.
[1]: ./options.html#files-to-exclude
[2]: ./options.html#nodes-to-index
[3]: ./options.html#extensions-to-index
[4]: ./options.html#indexing-mode
[5]: ./indexing-modes.html#diff-default
[6]: ./indexing-modes.html#atomic
[7]: ./hooks.html#should-be-excluded
[8]: ./hooks.html#before-indexing-each
[9]: ./hooks.html#before-indexing-all
[10]: ./hooks.html
[11]: https://github.com/algolia/jekyll-algolia/issues
[4]: ./hooks.html#should-be-excluded
[5]: ./hooks.html#before-indexing-each
[6]: ./hooks.html#before-indexing-all
[7]: ./hooks.html
[8]: https://github.com/algolia/jekyll-algolia/issues

@ -62,27 +62,6 @@ algolia:
_Note that some files (pagination pages, static assets, etc) will **always** be
excluded and you don't have to specify them._
## `indexing_batch_size`
The Algolia API allows you to send batches of changes to add or update several
records at once, instead of doing one HTTP call per record. The plugin will
batch updates by groups of 1000 records by default.
If you are on an unstable internet connection, you might want to decrease the
value. You will send more batches, but each will be smaller in size.
```yml
algolia:
# Send fewer records per batch
indexing_batch_size: 500
```
## `indexing_mode`
This option will let you choose the strategy used to sync your data with your
Algolia index. The default value should work for most cases, but feel free to
[read the pros and cons][4] of each and pick the one best suited for your needs.
## `nodes_to_index`
This option defines how each page is split into chunks. It expects
@ -108,7 +87,7 @@ This option let you pass specific settings to your Algolia index.
By default the plugin will configure your Algolia index with settings tailored
to the format of the extracted records. You are of course free to override
them or configure them to best suit your needs. Every option passed to the
`settings` entry will be set as [settings to your index][5].
`settings` entry will be set as [settings to your index][4].
For example, if you want to change the HTML tag used for highlighting, you
can override it like this:
@ -120,9 +99,26 @@ algolia:
highlightPostTag: '</em>'
```
## `indexing_batch_size`
This option defines the number of operations that will be grouped as part of one
updating batch. All operations of one batch are applied atomically. The default
value is `1000`.
You might want to increase this value if you make a lot of updates on each
run and still want your changes applied atomically.
You might want to decrease this value if you're using an unstable internet
connection. Smaller batches are easier to send than large ones.
```yml
algolia:
# Send fewer records per batch
indexing_batch_size: 500
```
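How `indexing_batch_size` splits the work can be sketched with `Enumerable#each_slice`. This is a sketch under assumptions: `batch_size_from` and `in_batches` are hypothetical names, and the plugin's actual way of reading its configuration is not shown in this page. Because atomicity applies per batch, a run with more operations than the batch size is applied as several atomic steps rather than one.

```ruby
# Hypothetical helper: read the option from a Jekyll-style config hash,
# falling back to the documented default of 1000.
def batch_size_from(config)
  config.fetch('algolia', {}).fetch('indexing_batch_size', 1000)
end

# Split a list of operations into consecutive batches of at most batch_size.
def in_batches(operations, batch_size)
  raise ArgumentError, 'indexing_batch_size must be positive' unless batch_size.positive?

  operations.each_slice(batch_size).to_a
end

config = { 'algolia' => { 'indexing_batch_size' => 500 } }
operations = (1..1200).map { |i| { action: 'addObject', body: { objectID: i.to_s } } }
in_batches(operations, batch_size_from(config)).each_with_index do |batch, i|
  puts "batch #{i + 1}: #{batch.size} operations"
end
# batch 1: 500 operations
# batch 2: 500 operations
# batch 3: 200 operations
```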
[1]: ./how-it-works.html
[2]: http://www.methods.co.nz/asciidoc/
[3]: https://github.com/textile
[4]: ./indexing-modes.html
[5]: https://www.algolia.com/doc/api-reference/api-methods/set-settings/?language=ruby#set-settings
[4]: https://www.algolia.com/doc/api-reference/api-methods/set-settings/?language=ruby#set-settings