<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Hoc blog est pessimus</title>
    <link>https://paper.wf/pessi/</link>
    <description>I&#39;m a Web and Data Science developer. In this blog I post mostly technical writings</description>
    <pubDate>Wed, 17 Jun 2026 19:24:45 +0000</pubDate>
    <item>
      <title>Using the &#39;ruby-readability&#39; gem on Rails</title>
      <link>https://paper.wf/pessi/using-the-ruby-readability-gem-on-rails</link>
      <description>&lt;![CDATA[Published 4.7.2024&#xA;&#xA;In the last post we looked at a solution for parsing the text content of a webpage, such as a news article, using the mozilla/readability.js node package on Rails backend.&#xA;&#xA;This time I’ll introduce the ‘readability’ gem to do the same task – in a more Ruby way.&#xA;&#xA;I didn’t manage to get the gem to properly parse many websites, and I ended up going with the solution presented in a previous post.&#xA;&#xA;Install the gem&#xA;&#xA;Run the following in you Rails project folder:&#xA;&#xA;bundle add ruby-readability&#xA;&#xA;I’ll be using Faraday for my http requests, but you could use another gem. To install Faraday, run:&#xA;&#xA;bundle add faraday&#xA;&#xA;Using ‘ruby-readability’&#xA;&#xA;Get the web page with Faraday&#xA;&#xA;response = Faraday.get(‘example.com/article‘)&#xA;&#xA;Parse the text content you want from the body of the page:&#xA;&#xA;content = Readability::Document.new(response.body).content&#xA;&#xA;Readability::Document instance attributes&#xA;&#xA;You have the following methods available:&#xA;&#xA;.images&#xA;.author&#xA;.title&#xA;.content&#xA;&#xA;In my experience the parsing doesn’t always manage to identify the previous sections from the content however.&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><em>Published 4.7.2024</em></p>

<p>In the last post we looked at a solution for parsing the text content of a webpage, such as a news article, using the mozilla/readability.js node package on a Rails backend.</p>

<p>This time I’ll introduce the ‘readability’ gem to do the same task – in a more Ruby way.</p>

<p><strong><em>I didn’t manage to get the gem to properly parse many websites, and I ended up going with the solution presented in a <a href="https://paper.wf/pessi/ive-been-working-on-the-second-iteration-of-my-rss-client-and-wanted-to-ltkb" rel="nofollow">previous post.</a></em></strong></p>

<h3 id="install-the-gem" id="install-the-gem">Install the gem</h3>

<p>Run the following in your Rails project folder:</p>

<pre><code>bundle add ruby-readability
</code></pre>

<p>I’ll be using Faraday for my HTTP requests, but you could use another gem. To install Faraday, run:</p>

<pre><code>bundle add faraday
</code></pre>

<h3 id="using-ruby-readability" id="using-ruby-readability">Using ‘ruby-readability’</h3>

<p>Get the web page with Faraday (note that the URL needs to include the scheme):</p>

<pre><code>response = Faraday.get(&#39;https://example.com/article&#39;)
</code></pre>

<p>Parse the text content you want from the body of the page:</p>

<pre><code>content = Readability::Document.new(response.body).content
</code></pre>
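
<p>To tie the two steps together, here is a minimal sketch of a helper that fetches a page and returns the extracted content, or nil when nothing usable comes back. The extract_article name and its fetcher and parser arguments are my own assumptions for illustration, not part of Faraday or ruby-readability. The defaults simply wrap the two calls shown above.</p>

<pre><code class="language-ruby"># Sketch only: extract_article and its keyword arguments are hypothetical,
# not part of Faraday or ruby-readability.
def extract_article(url, fetcher: nil, parser: nil)
  # Default fetcher: Faraday, loaded lazily. Needs a full URL including the scheme.
  fetcher ||= lambda do |target|
    require 'faraday'
    Faraday.get(target).body
  end

  # Default parser: ruby-readability, loaded lazily.
  parser ||= lambda do |html|
    require 'readability'
    Readability::Document.new(html).content
  end

  html = fetcher.call(url)
  return nil if html.nil? || html.strip.empty?

  content = parser.call(html)
  # Treat an empty extraction as "nothing found"
  content.to_s.strip.empty? ? nil : content
end
</code></pre>

<p>Passing the fetcher and parser in as arguments is just a design choice that makes the helper easy to try out with canned HTML before pointing it at real sites.</p>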

<h3 id="readability-document-instance-attributes" id="readability-document-instance-attributes">Readability::Document instance attributes</h3>

<p>You have the following methods available:</p>

<pre><code class="language-ruby">.images
.author
.title
.content
</code></pre>

<p>In my experience, however, the parser doesn’t always manage to identify all of these attributes from the page.</p>
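
<p>Since some of these attributes can come back empty, one option is to collect them defensively. The following is only a sketch: article_fields and its fallback_title argument are hypothetical names of mine, and the doc argument is assumed to respond to the four methods listed above.</p>

<pre><code class="language-ruby"># Sketch only: article_fields is a hypothetical helper. `doc` is expected to
# respond to title, author, content and images, like the document object above.
def article_fields(doc, fallback_title: 'Untitled')
  title = doc.title.to_s.strip
  author = doc.author.to_s.strip

  {
    # Fall back to a placeholder when no title was detected
    title: title.empty? ? fallback_title : title,
    # Leave the author as nil when it could not be identified
    author: author.empty? ? nil : author,
    content: doc.content.to_s,
    # Array() turns a missing image list into an empty array
    images: Array(doc.images)
  }
end
</code></pre>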
]]></content:encoded>
      <guid>https://paper.wf/pessi/using-the-ruby-readability-gem-on-rails</guid>
      <pubDate>Mon, 30 Sep 2024 21:35:07 +0000</pubDate>
    </item>
    <item>
      <title>Install Dokku and deploy your Rails project with git</title>
      <link>https://paper.wf/pessi/install-dokku-and-deploy-your-rails-project-with-git</link>
      <description>&lt;![CDATA[Published 30.9.2024&#xA;&#xA;This time we’ll do a speedrun of installing Dokku on a remote server, deploying a Rails project with Postgresql and configuring a domain name. You should already have a remote server with Ubuntu 20.04/22.04/24.04 or Debian 11+ x64 for this.&#xA;&#xA;Let’s go!&#xA;&#xA;Installing Dokku&#xA;&#xA;Ssh to your remote server and install Dokku using the Dokku install script:&#xA;&#xA;ssh root@YOUR-SERVER-IP&#xA;wget -NP . https://dokku.com/install/v0.35.4/bootstrap.sh&#xA;sudo DOKKUTAG=v0.35.4 bash bootstrap.sh&#xA;&#xA;Once the installation is complete, you should configure an ssh key and set your global domain:&#xA;&#xA;cat ~/.ssh/authorizedkeys | dokku ssh-keys:add admin&#xA;dokku domains:set-global YOUR-SERVER-IP&#xA;&#xA;Deploying your Rails project&#xA;&#xA;Still on your Dokku host, create a new Dokku app:&#xA;&#xA;dokku apps:create YOUR-PROJECT-NAME&#xA;&#xA;Install the Dokku Postgresql plugin, create a database and link it to your project:&#xA;&#xA;sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git&#xA;dokku postgres:create railsdatabase&#xA;dokku postgres:link railsdatabase YOUR-PROJECT-NAME&#xA;&#xA;Then, on your local development machine, go to your Rails project folder, set up the Git remote and push your code to your new Dokku instance:&#xA;&#xA;git remote add dokku dokku@YOUR-SERVER-IP:YOUR-PROJECT-NAME&#xA;git push dokku main&#xA;&#xA;Configuring your DNS settings for your domain&#xA;&#xA;If you’ve just bought a domain name, you should delete the existing DNS records for the domain. What you’ll need, is a record with the following configuration:&#xA;&#xA;Type: A&#xA;Host: The domain name you’ve bought, or a subdomain you might want to use, i.e. 
YOUR-DOMAIN-OR-SUBDOMAIN-NAME&#xA;Answer: YOUR-SERVER-IP&#xA;TTL: Doesn’t really matter, can be 5 min / 300 seconds&#xA;&#xA;Then go back to your server’s ssh session and run:&#xA;&#xA;dokku domains:add YOUR-PROJECT-NAME YOUR-DOMAIN-OR-SUBDOMAIN-NAME&#xA;&#xA;And the last thing is to remember to setup your Rails database and its migrations:&#xA;&#xA;dokku run rails db:setup&#xA;&#xA;That is all! Hope it worked!]]&gt;</description>
      <content:encoded><![CDATA[<p><em>Published 30.9.2024</em></p>

<p>This time we’ll do a speedrun of installing Dokku on a remote server, deploying a Rails project with PostgreSQL and configuring a domain name. You should already have a remote server running Ubuntu 20.04/22.04/24.04 or Debian 11+ x64 for this.</p>

<p>Let’s go!</p>

<h3 id="installing-dokku" id="installing-dokku">Installing Dokku</h3>

<p>SSH into your remote server and install Dokku using the Dokku install script:</p>

<pre><code>ssh root@YOUR-SERVER-IP
wget -NP . https://dokku.com/install/v0.35.4/bootstrap.sh
sudo DOKKU_TAG=v0.35.4 bash bootstrap.sh
</code></pre>

<p>Once the installation is complete, you should configure an SSH key and set your global domain:</p>

<pre><code>cat ~/.ssh/authorized_keys | dokku ssh-keys:add admin
dokku domains:set-global YOUR-SERVER-IP
</code></pre>

<h3 id="deploying-your-rails-project" id="deploying-your-rails-project">Deploying your Rails project</h3>

<p>Still on your Dokku host, create a new Dokku app:</p>

<pre><code>dokku apps:create YOUR-PROJECT-NAME
</code></pre>

<p>Install the Dokku PostgreSQL plugin, create a database and link it to your project:</p>

<pre><code>sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git
dokku postgres:create railsdatabase
dokku postgres:link railsdatabase YOUR-PROJECT-NAME
</code></pre>
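
<p>As far as I understand it, linking the database makes the connection string available to the app as a DATABASE_URL environment variable, which Rails picks up in production. If you want to see what is inside such a URL, here is a small sketch. The database_settings helper is a hypothetical name of mine and the example URL is made up.</p>

<pre><code class="language-ruby">require 'uri'

# Sketch only: database_settings is a hypothetical helper for inspecting
# a database connection URL.
def database_settings(url)
  uri = URI.parse(url)
  {
    adapter: uri.scheme,
    host: uri.host,
    port: uri.port,
    database: uri.path.delete_prefix('/'),
    username: uri.user
  }
end

# Made-up example URL; on the server the real value would be in ENV['DATABASE_URL'].
p database_settings('postgres://postgres:secret@dokku-postgres-railsdatabase:5432/railsdatabase')
</code></pre>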

<p>Then, on your local development machine, go to your Rails project folder, set up the Git remote and push your code to your new Dokku instance:</p>

<pre><code>git remote add dokku dokku@YOUR-SERVER-IP:YOUR-PROJECT-NAME
git push dokku main
</code></pre>

<h3 id="configuring-your-dns-settings-for-your-domain" id="configuring-your-dns-settings-for-your-domain">Configuring your DNS settings for your domain</h3>

<p>If you’ve just bought a domain name, you should delete the existing DNS records for the domain. What you’ll need is a single record with the following configuration:</p>
<ul><li>Type: A</li>
<li>Host: The domain name you’ve bought, or a subdomain you might want to use, i.e. YOUR-DOMAIN-OR-SUBDOMAIN-NAME</li>
<li>Answer: YOUR-SERVER-IP</li>
<li>TTL: Doesn’t really matter, can be 5 min / 300 seconds</li></ul>
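
<p>Before moving on, you may want to check that the new record actually resolves to your server. Here is a minimal sketch using Ruby’s standard resolv library. The points_to? helper and its resolver argument are hypothetical names of mine.</p>

<pre><code class="language-ruby">require 'resolv'

# Sketch only: points_to? is a hypothetical helper. The resolver argument
# exists so the check can be exercised without touching the network.
def points_to?(domain, server_ip, resolver: Resolv)
  # getaddresses returns every address the name currently resolves to
  resolver.getaddresses(domain).include?(server_ip)
end

# Usage (replace the placeholders with your own values):
# points_to?('YOUR-DOMAIN-OR-SUBDOMAIN-NAME', 'YOUR-SERVER-IP')
</code></pre>

<p>If this returns false right after you have saved the record, the change may simply not have propagated yet.</p>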

<p>Then go back to your server’s ssh session and run:</p>

<pre><code>dokku domains:add YOUR-PROJECT-NAME YOUR-DOMAIN-OR-SUBDOMAIN-NAME
</code></pre>

<p>The last thing to remember is to set up your Rails database and load its schema. Note that dokku run needs the app name:</p>

<pre><code>dokku run YOUR-PROJECT-NAME rails db:setup
</code></pre>

<p>That is all! Hope it worked!</p>
]]></content:encoded>
      <guid>https://paper.wf/pessi/install-dokku-and-deploy-your-rails-project-with-git</guid>
      <pubDate>Mon, 30 Sep 2024 21:05:43 +0000</pubDate>
    </item>
    <item>
      <title>Digital Ocean + Yunohost + Rocket.chat simple setup</title>
      <link>https://paper.wf/pessi/digital-ocean-yunohost-rocket-chat-simple-setup</link>
      <description>&lt;![CDATA[Published 8.8.2024&#xA;&#xA;Here’s the steps I followed to setup a self-hosting solution for Rocket.chat:&#xA;&#xA;Digital Ocean droplet&#xA;&#xA;I ended up needing an instance with 4GB RAM (2 vCPUs&#xA;4GB / 50GB Disk). For Yunohost the image has to be Debian 11&#xA;&#xA;Yunohost&#xA;&#xA;ssh to your new droplet:&#xA;&#xA;ssh root@your droplet ip&#xA;&#xA;Change to /tmp:&#xA;&#xA;cd /tmp&#xA;&#xA;Get the Yunohost install script:&#xA;&#xA;wget -O yunohost https://install.yunohost.org/&#xA;&#xA;Run the script:&#xA;&#xA;sudo /bin/bash yunohost&#xA;&#xA;Let Yunohost overwrite and set any configurations it prompts for.&#xA;&#xA;Login to your new Yunohost instance with a browser by going to the droplet IP address and finish the setup there.&#xA;&#xA;Setup your domain (optional)&#xA;&#xA;If you already have purchased a domain name, you can configure your Yunohost  and Rocket.chat instances to use them by editing the ‘domains’ section in your Yunohost admin.&#xA;&#xA;For my case, I set up two subdomains: yunohost.mydomain.com and chat.mydomain.com.&#xA;&#xA;I then configured two A Records at my domain registrar, which in this case was porkbun.com:&#xA;&#xA;Host: yunohost.my domain&#xA;Answer: my droplet ip&#xA;TTL: 600&#xA;&#xA;Host: chat.my domain&#xA;Answer: my droplet ip&#xA;TTL: 600&#xA;&#xA;Propagation took maybe a couple of minutes, after which I could access both my Yunohost and my Rocket.chat instances with the given addresses.&#xA;&#xA;After you’ve setup Yunohost, you can ssh to your server if needed with:&#xA;&#xA;ssh your Yunohost admin user name@your droplet ip&#xA;password:your yunohost admin user password&#xA;&#xA;Rocket.chat setup&#xA;&#xA;When logging in to Rocket.chat for the first time, you’ll be run through a basic setup.&#xA;&#xA;After the setup, the one thing I still needed to do was setup an email account for outgoing emails. 
I used a Gmail account for this.&#xA;&#xA;For this use case, I needed an “App Password” from my Google Account settings first. Make sure you’ve set up 2-Factor Authentication (2FA), then go to https://myaccount.google.com/apppasswords and generate and save a password for your Rocket.chat instance.&#xA;&#xA;Logged in with your Rocket.chat admin user, first check your admin email address is correctly set by navigating to Users on the menu.&#xA;&#xA;When the admin user email is set, go to Settings   Email   SMTP and set the following:&#xA;&#xA;Protocol: smtp&#xA;Host: smtp.gmail.com&#xA;Port: 587&#xA;IgnoreTLS: false&#xA;Pool: true&#xA;Username: my gmail address&#xA;Password: your new app password&#xA;From Email: my gmail address&#xA;&#xA;Save your changes and verify that it works by sending a test mail to your admin user.&#xA;&#xA;Let’s Encrypt Certificates&#xA;&#xA;In your Yunohost admin area you can set up Let’s Encrypt SSL Certificates for your subdomains. Go to Domains   your domain   Certificate   Install Let’s Encrypt Certificate.&#xA;&#xA;In my case, there were some warnings, so I opened up the diagnosis page. The diagnosis indicated a few warnings and one issue, but since everything seemed to be working in practice, I decided to switch on “Ignore diagnosis checks” and proceed with installing the certificates. There seemed to be no further problems a and my certificates appear to function like they should!&#xA;&#xA;Wrap-up&#xA;&#xA;I’m very impressed with Yunohost so far. I had tried a couple of alternatives already, including CapRover, but the experience with Yunohost was extremely smooth up to this point!]]&gt;</description>
      <content:encoded><![CDATA[<p><em>Published 8.8.2024</em></p>

<p>Here are the steps I followed to set up a self-hosting solution for Rocket.chat:</p>

<h3 id="digital-ocean-droplet" id="digital-ocean-droplet">Digital Ocean droplet</h3>

<p>I ended up needing an instance with 4GB RAM (2 vCPUs / 4GB / 50GB Disk). For Yunohost the image has to be Debian 11.</p>

<h3 id="yunohost" id="yunohost">Yunohost</h3>

<p>SSH into your new droplet:</p>

<pre><code>ssh root@&lt;your droplet ip&gt;
</code></pre>

<p>Change to /tmp:</p>

<pre><code>cd /tmp
</code></pre>

<p>Get the Yunohost install script:</p>

<pre><code>wget -O yunohost https://install.yunohost.org/
</code></pre>

<p>Run the script:</p>

<pre><code>sudo /bin/bash yunohost
</code></pre>

<p>Let Yunohost overwrite and set any configurations it prompts for.</p>

<p>Log in to your new Yunohost instance with a browser by going to the droplet IP address and finish the setup there.</p>

<h3 id="setup-your-domain-optional" id="setup-your-domain-optional">Set up your domain (optional)</h3>

<p>If you have already purchased a domain name, you can configure your Yunohost and Rocket.chat instances to use it by editing the ‘domains’ section in your Yunohost admin.</p>

<p>In my case, I set up two subdomains: yunohost.mydomain.com and chat.mydomain.com.</p>

<p>I then configured two A Records at my domain registrar, which in this case was porkbun.com:</p>

<pre><code>Host: yunohost.&lt;my domain&gt;
Answer: &lt;my droplet ip&gt;
TTL: 600

Host: chat.&lt;my domain&gt;
Answer: &lt;my droplet ip&gt;
TTL: 600
</code></pre>

<p>Propagation took maybe a couple of minutes, after which I could access both my Yunohost and my Rocket.chat instances with the given addresses.</p>

<p>After you’ve set up Yunohost, you can SSH into your server if needed with:</p>

<pre><code>ssh &lt;your Yunohost admin user name&gt;@&lt;your droplet ip&gt;
password:&lt;your yunohost admin user password&gt;
</code></pre>

<h3 id="rocket-chat-setup" id="rocket-chat-setup">Rocket.chat setup</h3>

<p>When logging in to Rocket.chat for the first time, you’ll be run through a basic setup.</p>

<p>After the setup, the one thing I still needed to do was to set up an email account for outgoing emails. I used a Gmail account for this.</p>

<p>For this use case, I needed an “App Password” from my Google Account settings first. Make sure you’ve set up 2-Factor Authentication (2FA), then go to <a href="https://myaccount.google.com/apppasswords" rel="nofollow">https://myaccount.google.com/apppasswords</a> and generate and save a password for your Rocket.chat instance.</p>

<p>While logged in as your Rocket.chat admin user, first check that your admin email address is set correctly by navigating to Users in the menu.</p>

<p>When the admin user email is set, go to Settings &gt; Email &gt; SMTP and set the following:</p>

<pre><code>Protocol: smtp
Host: smtp.gmail.com
Port: 587
IgnoreTLS: false
Pool: true
Username: &lt;my gmail address&gt;
Password: &lt;your new app password&gt;
From Email: &lt;my gmail address&gt;
</code></pre>

<p>Save your changes and verify that it works by sending a test mail to your admin user.</p>

<h3 id="let-s-encrypt-certificates" id="let-s-encrypt-certificates">Let’s Encrypt Certificates</h3>

<p>In your Yunohost admin area you can set up Let’s Encrypt SSL Certificates for your subdomains. Go to Domains &gt; your domain &gt; Certificate &gt; Install Let’s Encrypt Certificate.</p>

<p>In my case, there were some warnings, so I opened up the diagnosis page. The diagnosis indicated a few warnings and one issue, but since everything seemed to be working in practice, I decided to switch on “Ignore diagnosis checks” and proceed with installing the certificates. There seemed to be no further problems, and my certificates appear to function as they should!</p>

<h3 id="wrap-up" id="wrap-up">Wrap-up</h3>

<p>I’m very impressed with Yunohost so far. I had tried a couple of alternatives already, including CapRover, but the experience with Yunohost was extremely smooth up to this point!</p>
]]></content:encoded>
      <guid>https://paper.wf/pessi/digital-ocean-yunohost-rocket-chat-simple-setup</guid>
      <pubDate>Mon, 30 Sep 2024 21:02:44 +0000</pubDate>
    </item>
    <item>
      <title>Combining readability.js and the ‘node-runner’ gem</title>
      <link>https://paper.wf/pessi/ive-been-working-on-the-second-iteration-of-my-rss-client-and-wanted-to-ltkb</link>
      <description>&lt;![CDATA[Published 1.7.2024&#xA;&#xA;I’ve been working on the second iteration of my RSS client and wanted to include the possibility to read articles within the reader app, in the style of Firefox’s reader view.&#xA;&#xA;I also intend to use full-text search on the feed entries in the future, so want to save the content data in my database.&#xA;&#xA;To these ends, I’ll use a Node.js package server-side to remove website clutter from the source page. To run node server-side, I’ll be using the ‘node-runner’ gem.&#xA;&#xA;In the next blog post we’ll take a look at another option for parsing content for a reader view – using the ‘ruby-readability’ gem.&#xA;&#xA;Readability.js and building a DOM document object&#xA;&#xA;Readability.js wants to consume a DOM document object. To create one, we’ll be using the jsdom node package.&#xA;&#xA;Required packages&#xA;&#xA;Make sure you’ve installed Node.js first. On Ubuntu you can install it by running:&#xA;&#xA;sudo apt update &amp;&amp; sudo apt install nodejs npm&#xA;&#xA;On the Rails side, we’re going to install the ‘node-runner’ gem, so, within your Rails project folder run:&#xA;&#xA;bundle add node-runner&#xA;&#xA;I’ll be using Faraday for my http requests, but you could use another gem. 
To install Faraday, run:&#xA;&#xA;bundle add faraday&#xA;&#xA;And we’ll also be installing the node packages we want to use:&#xA;&#xA;npm install jsdom&#xA;npm install @mozilla/readability&#xA;&#xA;Basic usage&#xA;&#xA;Get the web page with Faraday&#xA;&#xA;response = Faraday.get(‘example.com/article‘)&#xA;&#xA;Instantiate a NodeRunner object, require the node packages, add a JavaScript arrow function for parsing the html into a DOM document object and output the readability.js parsed content:&#xA;&#xA;runner = NodeRunner.new(&#xA;  &lt;&lt;~JAVASCRIPT &#xA;    const { Readability } =  require(&#39;@mozilla/readability&#39;);&#xA;    jsdom = require(&#34;jsdom&#34;); const { JSDOM } = jsdom;    &#xA;    const parse = (document) =  {    const dom = new JSDOM(document);&#xA;  return newReadability(dom.window.document).parse()&#xA;}&#xA;  JAVASCRIPT&#xA;)&#xA;&#xA;After which we can pass the GET response body to our NodeRunner instance and receive the parsed content as a string:&#xA;&#xA;readability_output = runner.parse response.body&#xA;&#xA;And that’s it!]]&gt;</description>
      <content:encoded><![CDATA[<p><em>Published 1.7.2024</em></p>

<p>I’ve been working on the second iteration of my RSS client and wanted to include the possibility to read articles within the reader app, in the style of Firefox’s reader view.</p>

<p>I also intend to use full-text search on the feed entries in the future, so I want to save the content data in my database.</p>

<p>To these ends, I’ll use a Node.js package server-side to remove website clutter from the source page. To run node server-side, I’ll be using the ‘node-runner’ gem.</p>

<p>In the next blog post we’ll take a look at another option for parsing content for a reader view – using the ‘ruby-readability’ gem.</p>

<h2 id="readability-js-and-building-a-dom-document-object" id="readability-js-and-building-a-dom-document-object">Readability.js and building a DOM document object</h2>

<p>Readability.js wants to consume a DOM document object. To create one, we’ll be using the jsdom node package.</p>

<h2 id="required-packages" id="required-packages">Required packages</h2>

<p>Make sure you’ve installed Node.js first. On Ubuntu you can install it by running:</p>

<pre><code class="language-sh">sudo apt update &amp;&amp; sudo apt install nodejs npm
</code></pre>

<p>On the Rails side, we’re going to install the ‘node-runner’ gem, so, within your Rails project folder run:</p>

<pre><code class="language-sh">bundle add node-runner
</code></pre>

<p>I’ll be using Faraday for my HTTP requests, but you could use another gem. To install Faraday, run:</p>

<pre><code class="language-sh">bundle add faraday
</code></pre>

<p>And we’ll also be installing the node packages we want to use:</p>

<pre><code class="language-sh">npm install jsdom
npm install @mozilla/readability
</code></pre>

<h2 id="basic-usage" id="basic-usage">Basic usage</h2>

<p>Get the web page with Faraday (note that the URL needs to include the scheme):</p>

<pre><code class="language-ruby">response = Faraday.get(&#39;https://example.com/article&#39;)
</code></pre>

<p>Instantiate a NodeRunner object, require the node packages, and add a JavaScript arrow function that parses the HTML into a DOM document object and returns the content parsed by readability.js:</p>

<pre><code class="language-ruby">runner = NodeRunner.new(
  &lt;&lt;~JAVASCRIPT
    // Load readability.js and jsdom inside the Node process
    const { Readability } = require(&#39;@mozilla/readability&#39;);
    const { JSDOM } = require(&#39;jsdom&#39;);

    // Build a DOM from the raw HTML, then let readability.js extract the article
    const parse = (html) =&gt; {
      const dom = new JSDOM(html);
      return new Readability(dom.window.document).parse();
    };
  JAVASCRIPT
)
</code></pre>

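<p>After that we can pass the GET response body to our NodeRunner instance and receive the parsed article back in Ruby. Note that readability.js’s parse() returns an object with fields such as title, content and textContent, rather than a bare string:</p>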

<pre><code class="language-ruby">readability_output = runner.parse response.body
</code></pre>
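
<p>To pick out the parts you want to store, something like the following could work. This is only a sketch: summarize_article is a hypothetical helper of mine, and it assumes the result arrives in Ruby as a Hash with string keys, or as nil when readability.js could not parse the page.</p>

<pre><code class="language-ruby"># Sketch only: summarize_article is a hypothetical helper. It assumes the
# result is a Hash with string keys, or nil when nothing could be parsed.
def summarize_article(result)
  return nil if result.nil?

  {
    title: result['title'],
    # Cleaned-up article HTML, suitable for a reader view
    html: result['content'],
    # Plain text without markup, handy for full-text search
    text: result['textContent'].to_s.strip
  }
end
</code></pre>

<p>Keeping the plain text separate from the HTML should make it easier to index the entries for full-text search later on.</p>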

<p>And that’s it!</p>
]]></content:encoded>
      <guid>https://paper.wf/pessi/ive-been-working-on-the-second-iteration-of-my-rss-client-and-wanted-to-ltkb</guid>
      <pubDate>Mon, 30 Sep 2024 20:55:07 +0000</pubDate>
    </item>
  </channel>
</rss>