
Triggering SOLR to Remove Records Already Deleted from the Database

When you’re using Solr, the search index can sometimes still contain records that have already been removed from your database, which causes some frustration. Here’s how to remove these hanging records from Solr so you have a clean search again.

Solr is an open source enterprise search platform. However, like any piece of software, there can be certain conditions or situations where it may not perform as you wish or expect.

This article looks at one of these cases: sometimes your Solr index contains records that are no longer present in the database. These kinds of records are very hard to detect during a regular search, because they have no corresponding row in the database.

In most cases you would run a search similar to this:

    Model.search do
      # search criteria
    end.results

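Even if Solr holds an indexed document that matches your search criteria, it will not be returned once its database record is gone, because `results` gives us a collection of model instances loaded from the database and skips any hit that has no record behind it.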

To bypass this, we have to ask Solr about raw records instead:

    results = Model.search do
      # search criteria
    end.raw_results

This time, we receive an array of `Sunspot::Search::Hit` objects from which we can easily get each record’s primary key, even if the record is no longer present in the database:

    # Collect the primary key of every hit, whether or not its record still exists.
    ids = results.map(&:primary_key)
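As a possible shortcut, the hits themselves can tell us which records are missing. The sketch below assumes that Sunspot's `Hit#result` returns `nil` when the underlying record cannot be loaded; the helper name `orphaned_primary_keys` is hypothetical and only used here for illustration.

```ruby
# Hypothetical helper: returns the primary keys of hits whose database
# record could not be loaded. Assumes each hit responds to #result
# (nil when the record is missing) and to #primary_key.
def orphaned_primary_keys(hits)
  hits.select { |hit| hit.result.nil? }.map(&:primary_key)
end
```

Called as `orphaned_primary_keys(results)`, this would return only the keys that need to be removed from the index.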

Because we now have an array of primary keys, we can easily clear the index of records that are no longer present in our database. We check whether a record with each primary key still exists, and remove from Solr any that do not:

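    ids.each do |id|
      # Drop the Solr document when no database row has this primary key.
      unless Model.exists?(id)
        Sunspot.remove_by_id(Model, id)
      end
    end
    # Commit so the removals become visible to searches.
    Sunspot.commit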
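Checking each key with `Model.exists?` issues one query per hit. As a possible refinement, the sketch below checks a whole batch of hits at once. The helper name `orphaned_ids` and its `existing_ids` callable are hypothetical, introduced here for illustration. Keys are compared as strings, because Sunspot hits expose their primary keys as strings while the database usually returns integers.

```ruby
require "set"

# Hypothetical helper: returns the primary keys from +hits+ that have no
# matching database row.
#
# hits         - objects responding to #primary_key (such as Sunspot hits)
# existing_ids - callable that takes an array of keys and returns the
#                subset that still exists in the database
def orphaned_ids(hits, existing_ids)
  keys = hits.map { |hit| hit.primary_key.to_s }
  return [] if keys.empty?

  # Normalise to strings so "42" from Solr matches 42 from the database.
  present = existing_ids.call(keys).map(&:to_s).to_set
  keys.reject { |key| present.include?(key) }
end
```

With ActiveRecord, the callable could be `->(ids) { Model.where(id: ids).pluck(:id) }`, which runs a single query for the whole batch. The returned keys can then be passed to `Sunspot.remove_by_id`.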

It is a good idea to create a rake task and run it regularly to remove these hanging records. Below you can find a function that you can use for clearing records from any of your models:

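    def clean_up_model(model)
      stale_ids = []
      page = 1
      loop do
        # Sunspot returns 30 hits per page by default, so walk every page.
        hits = model.search { paginate page: page, per_page: 500 }.raw_results
        break if hits.empty?
        # Keep the keys that no longer have a database row.
        stale_ids.concat(hits.map(&:primary_key).reject { |id| model.exists?(id) })
        page += 1
      end
      # Remove the orphaned documents, then commit so searches stop seeing them.
      Sunspot.remove_by_id(model, *stale_ids) unless stale_ids.empty?
      Sunspot.commit
    end

    clean_up_model(Post)
    clean_up_model(User)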
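A minimal sketch of such a rake task, assuming the `clean_up_model` function above is defined in the same file or otherwise loaded, and that the task lives in a file such as `lib/tasks/solr.rake`. The `solr:clean_up` task name and the model-name arguments are illustrative choices, not something Sunspot provides.

```ruby
require "rake" # already loaded when Rails runs the task; harmless otherwise

namespace :solr do
  desc "Remove Solr documents whose database records no longer exist"
  # :environment is the standard Rails task that loads the application.
  task :clean_up, [:first_model] => :environment do |_task, args|
    # Collect the named argument plus any extra comma-separated ones.
    model_names = [args[:first_model], *args.extras].compact
    model_names.each do |name|
      # Look up the model class by name and clean its index entries.
      clean_up_model(Object.const_get(name))
    end
  end
end
```

It could then be run as `rake "solr:clean_up[Post,User]"`, for example from a nightly cron job.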

We like to share these types of fixes so that others can get the solution quickly and easily if they find themselves in the same situation. If you have some strange behaviors in your enterprise software or infrastructure and would like an expert eye over the issue, then please get in touch. We love to help our clients find better, more efficient ways of running their systems!
