Alien Browser

September 30, 2017

Overview

If you have ever wished you could just relax and listen to random stuff on Reddit, this is the skill for you. This Alexa skill lets the user visit different subreddits and even browse the comments.

Implementation

Implementing this skill was relatively simple. The skill’s backend is an AWS Lambda function that is invoked by the Alexa Skills Kit (ASK).

This Lambda then queries the Reddit public API for the requested content. This content is then sent back to ASK to be read out to the user.
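As a rough illustration of that flow, here is a minimal sketch of such a Lambda handler. It is not the skill's actual code: the `subreddit` slot name, the `USER_AGENT` string, the choice of the "hot" listing, and the "Next post." separator are all assumptions made for the example.

```python
import json
import urllib.request

# Hypothetical value; Reddit asks API clients to send a descriptive User-Agent.
USER_AGENT = "alien-browser-sketch/0.1"


def fetch_listing(subreddit, limit=5):
    """Fetch a subreddit's 'hot' listing from Reddit's public JSON endpoint."""
    url = f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}"
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def extract_titles(listing):
    """Pull the post titles out of a decoded Reddit listing."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


def build_response(titles):
    """Wrap the titles in the JSON envelope that ASK expects back."""
    if titles:
        text = " Next post. ".join(titles)
    else:
        text = "I could not find any posts."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }


def lambda_handler(event, context, fetch=fetch_listing):
    """Lambda entry point; `fetch` is injectable so the handler can be tested offline."""
    slots = event.get("request", {}).get("intent", {}).get("slots", {})
    # Assumed slot name; fall back to r/all when no subreddit was spoken.
    subreddit = slots.get("subreddit", {}).get("value", "all")
    return build_response(extract_titles(fetch(subreddit)))
```

Keeping the network call separate from the parsing and response-building steps makes it easy to exercise the handler with a canned listing instead of the live API.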

Challenges

Browsing Comments

Reddit comments are organized as trees. Navigating trees by voice can be cumbersome at best and downright frustrating at worst. To address this, the skill reads out only the top-level comments.
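Flattening the tree this way can be sketched in a few lines. The example below assumes the decoded JSON of a Reddit comments page, which is a two-element list (the post listing followed by the comment listing); the function name and the `limit` parameter are illustrative, not taken from the skill.

```python
def top_level_comments(thread, limit=5):
    """Return the bodies of the first `limit` top-level comments.

    `thread` is the decoded JSON of a comments page: a two-element list
    holding the post listing and then the comment listing.
    """
    comment_listing = thread[1]
    bodies = []
    for child in comment_listing["data"]["children"]:
        # Real comments have kind "t1"; "more" entries are collapsed placeholders.
        if child["kind"] != "t1":
            continue
        # Only the comment's own body is kept; its nested replies are ignored.
        bodies.append(child["data"]["body"])
        if len(bodies) == limit:
            break
    return bodies
```

Because the nested `replies` field is never visited, the listener hears a flat sequence of comments and never has to navigate up or down a thread.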

Family Friendliness

Alexa is very strict about the kind of content that can be served. Reddit can be a bit out of line at times. A multi-layer approach was used to address this. The first layer of defense against explicit content is simply block-listing certain subreddits.

Next, NLP-based services like Amazon Comprehend are used to analyze and censor explicit words.

Lastly, the Lambda itself maintains a list of explicit words and silences them before returning the response.
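The layering can be sketched roughly as follows. Everything named here is a placeholder: the blocked subreddit, the (deliberately mild) word list, and the "beep" replacement are invented for the example, and the NLP layer is represented only as an optional callable because the write-up does not say how that service was invoked.

```python
import re

# Placeholder data; the real lists are not given in the write-up.
BLOCKED_SUBREDDITS = {"examplensfw"}
EXPLICIT_WORDS = ["darn", "heck"]

# Match whole words only, ignoring case, so "heckle" is left alone.
_WORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in EXPLICIT_WORDS) + r")\b",
    re.IGNORECASE,
)


def is_blocked(subreddit):
    """Layer 1: refuse block-listed subreddits outright."""
    return subreddit.lower() in BLOCKED_SUBREDDITS


def silence(text, replacement="beep"):
    """Layer 3: replace any listed word before the text is spoken."""
    return _WORD_PATTERN.sub(replacement, text)


def filter_content(subreddit, text, nlp_filter=None):
    """Run the layers in order; return None when the subreddit is blocked."""
    if is_blocked(subreddit):
        return None
    if nlp_filter is not None:
        # Layer 2: hypothetical hook for an external NLP-based censoring step.
        text = nlp_filter(text)
    return silence(text)
```

Running the local word list last means it still catches anything that slips past the earlier layers.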

Images

Images form a big part of the Reddit experience. While many images are nearly impossible to translate into words, there’s a certain class of images (like memes, quotes, screenshots of social media posts, etc.) that can be vocalized very effectively.

An image classifier assigns each image to one of several classes, such as Tweet, Facebook Post, Meme, or Quote. Different heuristics (like the subreddit’s name) are used to boost the confidence of this classifier. After all, we are unlikely to come across a Quote while browsing r/DankMemes.
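One simple way to express such a heuristic is to treat the subreddit as a prior that reweights the classifier's scores. The sketch below is only an assumption about how the boost might work; the weights in `SUBREDDIT_PRIORS` and the label names are made up for illustration.

```python
# Hypothetical per-subreddit multipliers; labels not listed keep a weight of 1.0.
SUBREDDIT_PRIORS = {
    "dankmemes": {"meme": 4.0, "quote": 0.25},
    "quotes": {"quote": 4.0},
}


def boost(scores, subreddit):
    """Reweight classifier scores by the subreddit prior and renormalize to sum to 1."""
    weights = SUBREDDIT_PRIORS.get(subreddit.lower(), {})
    weighted = {label: p * weights.get(label, 1.0) for label, p in scores.items()}
    total = sum(weighted.values())
    if total == 0:
        # Nothing to normalize; fall back to the raw scores.
        return dict(scores)
    return {label: value / total for label, value in weighted.items()}


def best_label(scores, subreddit):
    """Pick the most likely class after applying the subreddit prior."""
    boosted = boost(scores, subreddit)
    return max(boosted, key=boosted.get)


raw = {"tweet": 0.10, "facebook_post": 0.10, "meme": 0.35, "quote": 0.45}
print(best_label(raw, "DankMemes"))      # the prior tips the decision toward "meme"
print(best_label(raw, "todayilearned"))  # no prior, so the raw winner "quote" stands
```

With these example numbers the classifier alone slightly prefers "quote", but knowing the image came from r/DankMemes is enough to flip the decision.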

Next, text extraction is performed using Amazon Rekognition. This text, coupled with the previously extracted metadata, can provide surprisingly effective results!
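A minimal sketch of this step, assuming boto3 and Rekognition's `detect_text` operation, might look like the following. The 80% confidence threshold and the sentence template in `describe` are assumptions for the example, not details from the skill.

```python
def detect_text(image_bytes):
    """Call Rekognition's DetectText on raw image bytes (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers below work without AWS set up

    client = boto3.client("rekognition")
    return client.detect_text(Image={"Bytes": image_bytes})


def readable_lines(response, min_confidence=80.0):
    """Keep LINE-level detections that meet an assumed confidence threshold."""
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]


def describe(label, subreddit, lines):
    """Combine the image class and subreddit with the extracted text."""
    if not lines:
        return f"An image from r/{subreddit} that I cannot read out."
    return f"A {label} from r/{subreddit} says: {' '.join(lines)}"
```

Rekognition reports both whole lines and individual words, so filtering on the `LINE` type avoids reading the same text twice.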

Personality

A big part of the Reddit experience is its diverse user base. These users come from different walks of life, offer drastically different perspectives, and have distinct personalities. Having them all voiced by the same voice simply wouldn’t have done them any justice.

Fortunately, Amazon Polly is really great at turning text into lifelike speech. Integrating with Polly added a lot of personality to the skill.
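The write-up does not describe how voices were assigned, so the following is only one possible sketch: hash each Reddit username to pick a voice deterministically, then wrap the comment in Alexa's SSML `voice` tag, which accepts Amazon Polly voice names. The particular `VOICES` list and the hashing scheme are assumptions.

```python
import hashlib
from xml.sax.saxutils import escape

# A few Amazon Polly English voices; the selection is arbitrary.
VOICES = ["Joanna", "Matthew", "Amy", "Brian", "Kendra", "Joey"]


def voice_for(author):
    """Map a username to a voice so the same user always sounds the same."""
    digest = hashlib.sha256(author.encode("utf-8")).digest()
    return VOICES[int.from_bytes(digest[:4], "big") % len(VOICES)]


def to_ssml(author, text):
    """Wrap a comment in SSML, escaping characters that would break the markup."""
    return f'<speak><voice name="{voice_for(author)}">{escape(text)}</voice></speak>'
```

Hashing rather than choosing at random keeps a given commenter's voice stable across sessions without storing any per-user state.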

Results and Learnings

It is pleasantly surprising to see how much we can accomplish with little effort. Up until a few years ago, it would have been almost impossible for one person to write, build, and operate such an app at this scale.

At the same time, it is also surprising how much nuance a seemingly simple task can have. There is a lot of learning involved in taking a PoC to a production-grade application. This project only highlighted that.

Just ask

Alexa