
Algorithmic URL Shortening for Drupal 7, 8 and 9

I originally wrote this for Drupal 7. If you are using Drupal 8 or 9, see the updates below.

Recently I've been thinking about how to make URL shortening a little simpler for Drupal sites (well, my blog, but this could apply to any site). You see, in addition to seanreiser.com, I own sr7.us, which I bought to use as a URL shortener. I took a look at a few of the existing modules and they all seem to work in similar ways: either they take advantage of Drupal's Path module and auto-generate a slug that can be used in the short URL, so the forward happens automagically, or they maintain a separate table and do the forward in a hook_init call or via the Redirect module (which, again, works in hook_init). These methods are sound Drupal best practice, but when I look at them I see two potential problems:

  1. There is still the potential for URL collisions, since a path could be defined both in hook_menu and as a URL alias.
  2. You have to wait for Drupal to fully bootstrap in order to decode a short URL and do a redirect.

These aren't horrible constraints, but it got me thinking: "Is there a way to do the shortening algorithmically, so it wouldn't require Drupal to bootstrap and yet be reasonably assured that there would be no collisions?"

So I took a look at all the paths in the URL alias table and the menu router table for all the systems I am responsible for, and I noticed that every path either has more than one argument (a non-empty arg(1)) or has an arg(0) that contains a vowel. The only cases where arg(0) didn't contain a vowel were when it was numeric, and those paths generally had a format like 2013/01/16/my-article-title-here. When you consider the bias toward English in module development (as well as in the sites I work on), this makes some sense. Of course, this won't work with Welsh, which has words like crwth, cwtch and cwm, or when Chinese is converted to Punycode, but for the 70% of the internet that uses English and other European languages this is a useful hack.
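To make that rule concrete, here is a minimal sketch of the check I ran by eye. The function name and the sample paths are made up for illustration and are not part of any module; it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: returns true when a path falls into the "slug space",
// i.e. it has a single argument and that argument contains no vowels.
function shorturl_path_looks_like_slug(string $path): bool
{
    $segments = explode('/', trim($path, '/'));
    // More than one argument (a non-empty arg(1)) can never be a slug.
    if (count($segments) > 1) {
        return false;
    }
    // The empty path (front page) is not a slug either.
    if ($segments[0] === '') {
        return false;
    }
    // A single argument with no vowels (y included) could collide with a slug.
    return preg_match('/[aeiouy]/i', $segments[0]) !== 1;
}

// Sample paths, invented for this example.
$paths = ['node/42', 'admin', '2013/01/16/my-article-title-here', 'rss.xml', '2gM', 'crwth'];
foreach ($paths as $path) {
    printf("%-35s %s\n", $path, shorturl_path_looks_like_slug($path) ? 'collides' : 'safe');
}
```

Note that a path like rss.xml has no vowels either, so a check along these lines would flag it. That is worth keeping in mind when deciding which characters a slug may contain.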

The only pages I'm looking to create a short URL for are nodes (individual pieces of content). So, I can use the Node ID as a key to encode and decode the URL.

All that said, if we consider that URLs are case sensitive, we can use a character set of "0123456789bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ" (note the lack of a, e, i, o, u and y in both lower and upper case), which leaves us with a bastardized base 50. At the core of this are 2 generic functions, which convert a decimal to any base and any base to a decimal:


function shorturl_dec2any($num, $base = 62, $index = false)
{
    if (!$base)
    {
        $base = strlen($index);
    }
    else if (!$index)
    {
        $index = substr("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", 0, $base);
    }
    $out = "";
    // Start at the most significant digit and work down to the units place.
    for ($t = floor(log10($num) / log10($base)); $t >= 0; $t--)
    {
        $a = floor($num / pow($base, $t));
        $out = $out . substr($index, $a, 1);
        $num = $num - ($a * pow($base, $t));
    }
    return $out;
}
function shorturl_any2dec($num, $base = 62, $index = false)
{
    if (!$base)
    {
        $base = strlen($index);
    }
    else if (!$index)
    {
        $index = substr("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", 0, $base);
    }
    $out = 0;
    $len = strlen($num) - 1;
    for ($t = 0;$t <= $len;$t++)
    {
        $out = $out + strpos($index, substr($num, $t, 1)) * pow($base, $len - $t);
    }
    return $out;
}


    

 

I copypasta'ed these a while ago from someplace on http://php.net, but the link seems to be dead now. Either way, they were posted for educational purposes, and I'm passing that forward. I just wish I could credit the original author.
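The functions above lean on log10() and pow(), so they work in floating point, which can be sensitive to rounding at exact powers of the base and does not handle 0. As an alternative, here is a minimal integer-only sketch of the same conversion. The names shorturl_int_encode, shorturl_int_decode and SHORTURL_ALPHABET are hypothetical, and the code assumes PHP 7 or later (for intdiv() and type hints).

```php
<?php
// The vowel-free alphabet from this post; the rest of this block is a sketch.
const SHORTURL_ALPHABET = '0123456789bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ';

// Hypothetical encoder: repeatedly divide by the base and collect remainders.
function shorturl_int_encode(int $num, string $alphabet = SHORTURL_ALPHABET): string
{
    if ($num < 0) {
        throw new InvalidArgumentException('Only non-negative integers can be encoded.');
    }
    $base = strlen($alphabet);
    $out = '';
    do {
        // Each remainder is the next digit, least significant first.
        $out = $alphabet[$num % $base] . $out;
        $num = intdiv($num, $base);
    } while ($num > 0);
    return $out;
}

// Hypothetical decoder: returns false for an empty slug or an unknown character.
function shorturl_int_decode(string $slug, string $alphabet = SHORTURL_ALPHABET)
{
    if ($slug === '') {
        return false;
    }
    $base = strlen($alphabet);
    $out = 0;
    foreach (str_split($slug) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false) {
            return false;
        }
        $out = $out * $base + $digit;
    }
    return $out;
}

echo shorturl_int_encode(5739), "\n";   // 2gM
echo shorturl_int_decode('2gM'), "\n";  // 5739
```

Worked by hand, 2gM is 2 × 50² + 14 × 50 + 39 = 5739, since "g" sits at position 14 and "M" at position 39 in the alphabet.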

 

You'll notice that the default for these functions is Base 62, containing the entire alpha-numeric universe of characters. To limit this I have 2 wrapper functions:

 



// The constant name must be quoted; a bare BASE50CHARS is an error in PHP 8.
define('BASE50CHARS', "0123456789bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ");
 
function shorturl_encode($nid){
     return shorturl_dec2any($nid, 50, BASE50CHARS);
 
}
 
function shorturl_decode($slug){
 
     if (shorturl_validate_slug($slug)){
          return shorturl_any2dec($slug, 50, BASE50CHARS);
     }
     else{
          return false;
     }
}

 

I'm sure you've noticed that there's a validate slug function. Basically, it's there to ensure that the slug is valid (contains no vowels). Here's what I'm doing:

 


function shorturl_validate_slug($slug) {
 
  // check the length of the string
  if (strlen($slug) == 0) {
    return FALSE;
  }
 
  // check for vowels  
  return preg_match('/[aeiouyAEIOUY]/i', $slug) ? FALSE : TRUE;
}
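The check above rejects vowels but lets any other character through, including punctuation that is not in the base 50 character set. If you want something stricter, one option is to whitelist the alphabet instead. This is only a sketch under that assumption; shorturl_is_valid_slug is a hypothetical name, not a function from the module.

```php
<?php
// Hypothetical stricter check: accept only characters from the base-50 alphabet.
function shorturl_is_valid_slug(string $slug): bool
{
    // \A and \z anchor the match to the whole string; + rejects the empty string.
    return preg_match('/\A[0-9bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ]+\z/', $slug) === 1;
}

var_dump(shorturl_is_valid_slug('2gM'));      // bool(true)
var_dump(shorturl_is_valid_slug('rss.xml'));  // bool(false)
```

The trade-off is that the character set now lives in two places (the regex and the alphabet string), so the two have to be kept in sync.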

Next up is the function that actually does the forwarding, in a hook_boot call:



function shorturl_boot() {
  // shorturl_decode() is the wrapper defined above; it returns FALSE for an invalid slug.
  if (!empty($_GET['q']) && (shorturl_validate_slug($_GET['q'])) && ($nid = shorturl_decode($_GET['q']))) {
    drupal_goto('node/' . $nid, array() , 302);
  }
}

Now, it's not perfect (what in the world is?). You can see it in action across the site here. For example, this article can be found at http://sr7.us/2gM as well as at the canonical URL of https://seanreiser.com/blog/note/algorithmic-url-shortening-drupal. I need to clean a couple of things up, but it's my intention to post this as a sandbox project shortly.

Share and Enjoy!

7/21/2021 - Updates for Drupal 8 and 9

Since both hook_boot and drupal_goto were removed in Drupal 8, I needed to do some rejiggering to get this to work. I could’ve set up an event subscriber, but again, the goal is to have the code execute before Drupal fully bootstraps. It’s become more popular to insert code into settings.php, so I went that route.

The code is no longer stored in a module. The URL encoding / decoding is stored in /sites/default/shorturl.php (I refactored the code from above).



function shorturl_encode($num)
{
    $index = "0123456789bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ";
    $out = "";
    // Start at the most significant digit and work down to the units place.
    for ($t = floor(log10($num) / log10(50)); $t >= 0; $t--)
    {
        $a = floor($num / pow(50, $t));
        $out = $out . substr($index, $a, 1);
        $num = $num - ($a * pow(50, $t));
    }
    return $out;
}
function shorturl_decode($num)
{
    $index = "0123456789bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ";
    $out = 0;
    $len = strlen($num) - 1;
    for ($t = 0;$t <= $len;$t++)
    {
        $out = $out + strpos($index, substr($num, $t, 1)) * pow(50, $len - $t);
    }
    return $out;
}

I then added the following to the top of settings.php:


include_once 'shorturl.php';

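// REQUEST_URI may not be set when PHP runs from the command line.
$request_uri = explode('/', $_SERVER['REQUEST_URI'] ?? '');

// Redirect only when the first path segment has no vowels and no dot.
if (!empty($request_uri[1]) && !preg_match('/[aeiouy.]/i', $request_uri[1]))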
{
    $nid = shorturl_decode($request_uri[1]);
    header('HTTP/1.0 301 Moved Permanently');
    header('Location: https://example.com/node/' . $nid);
    exit();
}

All in all, it works well, is compatible with my D7 solution and works with D8 and D9.

7/23/2021 - Fixing the Facebook Query String Bug

When I published the D8 / D9 version of this post, at first things went swimmingly. I used the short URL when I linked to it on LinkedIn, Twitter and Facebook. Things were fine until I published it in a Facebook Group, where people said they were getting a 404. After a few minutes of debugging, I found that the links from the Facebook Group contained a query string, so I made a quick adjustment in settings.php.


 
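 // REQUEST_URI may not be set when PHP runs from the command line.
 $request_uri = explode('/', $_SERVER['REQUEST_URI'] ?? '');
 
 // Strip the query string, if any. strpos() can return 0, so compare against false.
 if (isset($request_uri[1]) && ($pos = strpos($request_uri[1], "?")) !== false)
 {
     $request_uri[1] = substr($request_uri[1], 0, $pos);
 }
 if (!empty($request_uri[1]) && !preg_match('/[aeiouy.]/i', $request_uri[1]))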
 {
     $nid = shorturl_decode($request_uri[1]);
     header('HTTP/1.0 301 Moved Permanently');
     header('Location: https://seanreiser.com/node/' . $nid);
     exit();
 }

Basically, I test for the existence of a "?" and lop off anything that follows it.
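Another way to get the same result is to let PHP's parse_url() separate the path from the query string before splitting on "/". Here is a minimal sketch of that idea; shorturl_first_segment is a hypothetical helper name, and I haven't run this variant in production.

```php
<?php
// Hypothetical helper: return the first path segment of a request URI,
// ignoring any query string or fragment.
function shorturl_first_segment(string $request_uri): string
{
    // parse_url() returns null or false when there is no usable path.
    $path = parse_url($request_uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    $segments = explode('/', ltrim($path, '/'));
    return $segments[0];
}

echo shorturl_first_segment('/2gM?fbclid=abc123'), "\n";  // 2gM
```

The returned segment can then go through the same vowel check and shorturl_decode() call as before.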
