Hoc blog est pessimus

I'm a Web and Data Science developer. In this blog I post mostly technical writings.

Published 4.7.2024

In the last post we looked at a solution for parsing the text content of a webpage, such as a news article, using the mozilla/readability.js node package on a Rails backend.

This time I’ll introduce the ‘ruby-readability’ gem to do the same task – in a more Ruby way.

I didn’t manage to get the gem to properly parse many websites, and I ended up going with the solution presented in a previous post.

Install the gem

Run the following in your Rails project folder:

bundle add ruby-readability

I’ll be using Faraday for my http requests, but you could use another gem. To install Faraday, run:

bundle add faraday

Using ‘ruby-readability’

Get the web page with Faraday

response = Faraday.get('https://example.com/article')
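Links collected from feeds or typed by hand sometimes lack a scheme, and an HTTP client generally needs an absolute URL to know where to connect. The following is a minimal sketch of a helper that adds one before the request is made. The name with_scheme and the https default are my own assumptions, not part of Faraday.

```ruby
# Hypothetical helper: make sure a URL has a scheme before it is handed
# to Faraday.get. Defaulting to https is an assumption.
def with_scheme(url, default_scheme: 'https')
  stripped = url.strip
  # Leave the URL alone when it already starts with something like "http://"
  return stripped if stripped.match?(%r{\A[a-z][a-z0-9+.-]*://}i)

  "#{default_scheme}://#{stripped}"
end

with_scheme('example.com/article')        # => "https://example.com/article"
with_scheme('http://example.com/article') # => "http://example.com/article"
```

The result can then be passed straight to Faraday.get.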

Parse the text content you want from the body of the page:

content = Readability::Document.new(response.body).content

Readability::Document instance attributes

You have the following methods available:

.images
.author
.title
.content

In my experience, however, the parser doesn’t always manage to identify all of these parts of the page.
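Since some of these fields can come back empty, it may help to normalise them before saving anything. Below is a rough sketch, not the gem's own API: FakeDocument is a stand-in for Readability::Document so the example runs without the gem, and extract_fields and blank_to_nil are hypothetical helper names. I have not verified exactly what the gem returns for a field it cannot find, so the sketch simply treats nil and whitespace-only values as missing.

```ruby
# Stand-in for Readability::Document (assumption: same four readers).
FakeDocument = Struct.new(:title, :author, :content, :images)

# Turn nil or whitespace-only values into nil, keep everything else.
def blank_to_nil(value)
  text = value.to_s.strip
  text.empty? ? nil : text
end

# Hypothetical helper: collect the fields so that "not found" is always nil
# (or an empty array for images).
def extract_fields(document)
  {
    title: blank_to_nil(document.title),
    author: blank_to_nil(document.author),
    content: blank_to_nil(document.content),
    images: Array(document.images)
  }
end

doc = FakeDocument.new('A headline', nil, '<div><p>Body</p></div>', nil)
fields = extract_fields(doc)
missing = fields.select { |_, value| value.nil? }.keys # => [:author]
```

With the real gem you would pass Readability::Document.new(response.body) instead of the stand-in.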

Published 30.9.2024

This time we’ll do a speedrun of installing Dokku on a remote server, deploying a Rails project with PostgreSQL and configuring a domain name. You should already have a remote server with Ubuntu 20.04/22.04/24.04 or Debian 11+ x64 for this.

Let’s go!

Installing Dokku

SSH into your remote server and install Dokku using the Dokku install script:

ssh root@YOUR-SERVER-IP
wget -NP . https://dokku.com/install/v0.35.4/bootstrap.sh
sudo DOKKU_TAG=v0.35.4 bash bootstrap.sh

Once the installation is complete, you should configure an ssh key and set your global domain:

cat ~/.ssh/authorized_keys | dokku ssh-keys:add admin
dokku domains:set-global YOUR-SERVER-IP

Deploying your Rails project

Still on your Dokku host, create a new Dokku app:

dokku apps:create YOUR-PROJECT-NAME

Install the Dokku PostgreSQL plugin, create a database and link it to your project:

sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git
dokku postgres:create railsdatabase
dokku postgres:link railsdatabase YOUR-PROJECT-NAME
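Linking the database sets a DATABASE_URL environment variable on the app, which Rails picks up for its production connection. If you want to see what Rails will connect with, the sketch below splits such a URL into its parts. The URL in the example only follows the shape the plugin documents; the password is made up, and database_settings is a hypothetical helper name.

```ruby
require 'uri'

# Made-up example in the shape of the plugin's DATABASE_URL.
EXAMPLE_URL = 'postgres://postgres:s3cret@dokku-postgres-railsdatabase:5432/railsdatabase'

# Break a DATABASE_URL into its connection settings.
def database_settings(url)
  uri = URI.parse(url)
  {
    adapter: uri.scheme,
    host: uri.host,
    port: uri.port,
    database: uri.path.delete_prefix('/'),
    username: uri.user
  }
end

settings = database_settings(EXAMPLE_URL)
settings[:host]     # => "dokku-postgres-railsdatabase"
settings[:database] # => "railsdatabase"

# Inside the deployed app you would read the real value instead:
# database_settings(ENV.fetch('DATABASE_URL'))
```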

Then, on your local development machine, go to your Rails project folder, set up the Git remote and push your code to your new Dokku instance:

git remote add dokku dokku@YOUR-SERVER-IP:YOUR-PROJECT-NAME
git push dokku main

Configuring your DNS settings for your domain

If you’ve just bought a domain name, you should delete the existing DNS records for the domain. What you’ll need is a record with the following configuration:

  • Type: A
  • Host: The domain name you’ve bought, or a subdomain you might want to use, i.e. YOUR-DOMAIN-OR-SUBDOMAIN-NAME
  • Answer: YOUR-SERVER-IP
  • TTL: Doesn’t really matter, can be 5 min / 300 seconds
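Once the record is saved, one way to check that it resolves to your server is a small Ruby script using the standard library's Resolv. This is only a sketch: points_to? is a hypothetical helper, and the answer depends on the resolver your machine uses, so a freshly created record may take a while to show up.

```ruby
require 'resolv'

# Hypothetical helper: true when `hostname` resolves to `expected_ip`.
# `resolver` only needs a getaddresses(name) method, so Resolv works as-is
# and a stub can stand in for it when there is no network.
def points_to?(hostname, expected_ip, resolver: Resolv)
  resolver.getaddresses(hostname).include?(expected_ip)
end

# Example call with the placeholders from this post (not real values):
# points_to?('YOUR-DOMAIN-OR-SUBDOMAIN-NAME', 'YOUR-SERVER-IP')
```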

Then go back to your server’s ssh session and run:

dokku domains:add YOUR-PROJECT-NAME YOUR-DOMAIN-OR-SUBDOMAIN-NAME

And the last thing is to remember to set up your Rails database and its migrations. Note that dokku run needs the app name as its first argument:

dokku run YOUR-PROJECT-NAME rails db:setup

That is all! Hope it worked!

Published 8.8.2024

Here are the steps I followed to set up a self-hosting solution for Rocket.chat:

Digital Ocean droplet

I ended up needing an instance with 4GB RAM (2 vCPUs 4GB / 50GB Disk). For Yunohost, the image has to be Debian 11.

Yunohost

ssh to your new droplet:

ssh root@<your droplet ip>

Change to /tmp:

cd /tmp

Get the Yunohost install script:

wget -O yunohost https://install.yunohost.org/

Run the script:

sudo /bin/bash yunohost

Let Yunohost overwrite and set any configurations it prompts for.

Log in to your new Yunohost instance with a browser by going to the droplet IP address and finish the setup there.

Set up your domain (optional)

If you have already purchased a domain name, you can configure your Yunohost and Rocket.chat instances to use it by editing the ‘domains’ section in your Yunohost admin.

For my case, I set up two subdomains: yunohost.mydomain.com and chat.mydomain.com.

I then configured two A Records at my domain registrar, which in this case was porkbun.com:

Host: yunohost.<my domain>
Answer: <my droplet ip>
TTL: 600

Host: chat.<my domain>
Answer: <my droplet ip>
TTL: 600

Propagation took maybe a couple of minutes, after which I could access both my Yunohost and my Rocket.chat instances with the given addresses.

After you’ve set up Yunohost, you can ssh to your server if needed with:

ssh <your Yunohost admin user name>@<your droplet ip>
password:<your yunohost admin user password>

Rocket.chat setup

When logging in to Rocket.chat for the first time, you’ll be run through a basic setup.

After the setup, the one thing I still needed to do was set up an email account for outgoing emails. I used a Gmail account for this.

For this use case, I needed an “App Password” from my Google Account settings first. Make sure you’ve set up 2-Factor Authentication (2FA), then go to https://myaccount.google.com/apppasswords and generate and save a password for your Rocket.chat instance.

While logged in as your Rocket.chat admin user, first check that the admin email address is set correctly by navigating to Users in the menu.

When the admin user email is set, go to Settings > Email > SMTP and set the following:

Protocol: smtp
Host: smtp.gmail.com
Port: 587
IgnoreTLS: false
Pool: true
Username: <my gmail address>
Password: <your new app password>
From Email: <my gmail address>

Save your changes and verify that it works by sending a test mail to your admin user.
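If the test mail never arrives, it can help to try the same credentials outside Rocket.chat. Here is a minimal sketch using Ruby's Net::SMTP with the same host, port and STARTTLS as above. The method names are my own, the addresses are whatever you pass in, and net/smtp is a bundled gem on newer Ruby versions, so it may need to be installed separately in some setups.

```ruby
# Build a minimal plain-text message with CRLF line endings.
def build_test_message(from:, to:, subject: 'SMTP test')
  [
    "From: #{from}",
    "To: #{to}",
    "Subject: #{subject}",
    '',
    'If you can read this, the SMTP settings work.'
  ].join("\r\n")
end

# Send the message through Gmail with STARTTLS on port 587.
# Nothing calls this here; run it by hand with your own address and app password.
def send_test_mail(address:, app_password:)
  require 'net/smtp' # loaded lazily so the sketch parses without the gem

  smtp = Net::SMTP.new('smtp.gmail.com', 587)
  smtp.enable_starttls
  smtp.start('localhost', address, app_password, :plain) do |session|
    message = build_test_message(from: address, to: address)
    session.send_message(message, address, address)
  end
end

# send_test_mail(address: '<my gmail address>', app_password: '<your new app password>')
```

An authentication error from this script would point to the app password rather than to the Rocket.chat settings.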

Let’s Encrypt Certificates

In your Yunohost admin area you can set up Let’s Encrypt SSL Certificates for your subdomains. Go to Domains > your domain > Certificate > Install Let’s Encrypt Certificate.

In my case, there were some warnings, so I opened up the diagnosis page. The diagnosis indicated a few warnings and one issue, but since everything seemed to be working in practice, I decided to switch on “Ignore diagnosis checks” and proceed with installing the certificates. There seemed to be no further problems and my certificates appear to function like they should!

Wrap-up

I’m very impressed with Yunohost so far. I had tried a couple of alternatives already, including CapRover, but the experience with Yunohost was extremely smooth up to this point!

Published 1.7.2024

I’ve been working on the second iteration of my RSS client and wanted to include the possibility to read articles within the reader app, in the style of Firefox’s reader view.

I also intend to use full-text search on the feed entries in the future, so I want to save the content data in my database.

To these ends, I’ll use a Node.js package server-side to remove website clutter from the source page. To run node server-side, I’ll be using the ‘node-runner’ gem.

In the next blog post we’ll take a look at another option for parsing content for a reader view – using the ‘ruby-readability’ gem.

Readability.js and building a DOM document object

Readability.js wants to consume a DOM document object. To create one, we’ll be using the jsdom node package.

Required packages

Make sure you’ve installed Node.js first. On Ubuntu you can install it by running:

sudo apt update && sudo apt install nodejs npm

On the Rails side, we’re going to install the ‘node-runner’ gem, so, within your Rails project folder run:

bundle add node-runner

I’ll be using Faraday for my http requests, but you could use another gem. To install Faraday, run:

bundle add faraday

And we’ll also be installing the node packages we want to use:

npm install jsdom
npm install @mozilla/readability

Basic usage

Get the web page with Faraday

response = Faraday.get('https://example.com/article')

Instantiate a NodeRunner object, require the node packages, add a JavaScript arrow function for parsing the html into a DOM document object and output the readability.js parsed content:

runner = NodeRunner.new(
  <<~JAVASCRIPT
    const { Readability } = require('@mozilla/readability');
    const { JSDOM } = require('jsdom');

    // Build a DOM document from the raw HTML and let Readability extract the article
    const parse = (html) => {
      const dom = new JSDOM(html);
      return new Readability(dom.window.document).parse();
    };
  JAVASCRIPT
)

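After which we can pass the GET response body to our NodeRunner instance and receive the parsed result. Note that Readability’s parse() returns an object with fields such as title, content and textContent, rather than a bare string: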

readability_output = runner.parse response.body
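Before saving the output, it is worth guarding against pages Readability could not make sense of, since parse() can return null. The sketch below assumes the runner hands the result back to Ruby as a Hash with string keys, which I believe is how the JSON round trip works but have not confirmed against the gem's documentation. article_from is a hypothetical helper, and the sample hash is a stand-in for real output.

```ruby
# Hypothetical helper. Assumes `result` is a Hash with string keys
# ("title", "content", "textContent"), or nil when nothing was extracted.
def article_from(result)
  return nil unless result.is_a?(Hash)

  {
    title: result['title'],
    html: result['content'],    # cleaned-up markup for the reader view
    text: result['textContent'] # plain text, handy for full-text search
  }
end

# Stand-in for `runner.parse response.body`
sample = { 'title' => 'Headline', 'content' => '<p>Body</p>', 'textContent' => 'Body' }
article = article_from(sample)
article[:text]    # => "Body"
article_from(nil) # => nil
```

Keeping the HTML and the plain text separate means the reader view and a later search index can each use the form that suits them.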

And that’s it!