Using the 'ruby-readability' gem on Rails

Published 4.7.2024

In the last post we looked at parsing the text content of a webpage, such as a news article, using the mozilla/readability.js Node package on a Rails backend.

This time I’ll introduce the ‘ruby-readability’ gem, which does the same task in a more Ruby way.

A caveat up front: I couldn’t get the gem to parse many websites properly, so I ended up going with the solution presented in the previous post.

Install the gem

Run the following in your Rails project folder:

bundle add ruby-readability
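
One thing to watch for, stated here as an assumption rather than something tested in this post: the gem is named ruby-readability, but the file it provides is readability, so Bundler's automatic require may not load it. If you run into an "uninitialized constant Readability" error, a require option in the Gemfile is one possible fix:

```ruby
# Gemfile
# Assumption: tells Bundler which file to require for this gem.
gem "ruby-readability", require: "readability"
```

Alternatively, an explicit require "readability" in the file that uses the gem should have the same effect.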

I’ll be using Faraday for my HTTP requests, but you could use another HTTP client gem. To install Faraday, run:

bundle add faraday

Using ‘ruby-readability’

Get the web page with Faraday:

response = Faraday.get('https://example.com/article')

Parse the text content you want from the body of the page:

content = Readability::Document.new(response.body).content
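
The two steps above can be combined into a single method. This is a minimal sketch, not part of either gem: extract_article_html and its http: and parser: keyword arguments are hypothetical names, added so the method can be tried out with stand-in objects instead of real network calls. It assumes both gems are already loaded.

```ruby
# Hypothetical helper; the name and keyword arguments are illustrative.
# Assumes the gems are already loaded (outside a Bundler-managed app,
# call require "faraday" and require "readability" first).
def extract_article_html(url, http: Faraday, parser: Readability::Document)
  response = http.get(url)

  # Faraday::Response#success? is true for 2xx status codes.
  return nil unless response.success?

  # Readability::Document#content returns the extracted article markup.
  parser.new(response.body).content
end
```

Calling extract_article_html('https://example.com/article') would then return the extracted content, or nil when the server does not answer with a 2xx status. Connection errors raised by Faraday are not handled in this sketch.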

Readability::Document instance attributes

A Readability::Document instance gives you the following methods:

.images
.author
.title
.content

In my experience, however, the parser doesn’t always manage to identify these fields in the page content.
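
Since any of these fields can come back empty, it can help to normalize them before use. The following is a minimal sketch in plain Ruby: article_fields and missing_fields are hypothetical helpers, not part of the gem, and they only assume an object that responds to title, author, content and images.

```ruby
# Hypothetical helper: collects the fields from any object that responds to
# title, author, content and images (such as a Readability::Document) and
# turns blank values into nil so callers can see what the parser missed.
def article_fields(doc)
  blank_to_nil = ->(value) { value.to_s.strip.empty? ? nil : value }

  {
    title:   blank_to_nil.call(doc.title),
    author:  blank_to_nil.call(doc.author),
    content: blank_to_nil.call(doc.content),
    images:  Array(doc.images) # nil becomes an empty array
  }
end

# Hypothetical helper: names of the fields that came back nil or empty.
def missing_fields(fields)
  fields.select { |_name, value| value.nil? || value == [] }.keys
end
```

With the response from earlier, article_fields(Readability::Document.new(response.body)) gives a hash you can check with missing_fields before deciding whether to fall back to another parser.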