Build your API like a Product Manager

Something I have been thinking about for a while is the idea of APIs as your Product - the idea that you should design, build and test your API like it's a public facing product (as it might actually be if its a public API), and what lessons we should be learning from Product Managers about how we achieve this.

In the latest instalment of their "Technology Radar", ThoughtWorks recommended the concept "API as a Product" as ready to trial (although I think it's far beyond trial stage - I can see no real risk of adopting the lessons from Product Management for your API).

I have generally been interested in Product Management for several years - I think it's hard not to get interested in the science of Product Management when you are building side projects, as you naturally have to think about the product you are building - sure, with side projects they can easily turn into vanity projects, where you just build what you think is cool, but I think it's still fun to think about real users and good Product Management techniques that can be applied even to side-projects.

Having tried to gain some understanding, I have turned to some classic material on the subject of Product Management: The Lean StartupDon't Make Me ThinkHow to Build a Billion Dollar App. I have also been lucky enough to attend Mind The Product Conference twice (though volunteering - but managed to watch lots of the talks), and to have worked with several great Product Managers.

So what lessons can we start applying? I think, if one statement sums it up, its an idea from Kathy Sierra:

Does your product make your users awesome?
This is a great way to think about APIs (I'm mostly talking about REST/GraphQL type APIs here - but this thinking is a great way of approaching design for normal class/interface/library APIs too).

As engineers, I'm sure we have all used APIs that don't achieve this - APIs that make us have to think, that make us feel stupid, that seem overly complicated (for example, the old Java Date library - the number of times I had to google just to find out how to simply create a specific date for something simple like a unit test is ridiculous). And hopefully, we have all used APIs that make us feel awesome - APIs that make us productive, that let us do what we expect to be able to do, and can (with the help of the IDE's auto-complete) let us guess what the method calls are.

Another idea Kathy Sierra mentions in that talk, and is also made famous from the title of the Product Management classic, is:

Don't make me think

I have written before about the StackOverflow and GitHub APIs, but they deserve a mention again, if you have ever written an integration with them, you will know.  The APIs are designed in such a way that sensibly models the domain model.

For example, take StackOverflow, if we were going to describe their domain model, what first class objects would we have? I guess Questions, Answers, Comments etc - and sure enough, their API models these:

/questions - will get you all the questions on the site

/answers - will get you all the answers,

Now, how do you think you might get all answers for a given question? Well, a list of answers should probably be a child of a question, so something that represents that, something like:

/questions/{Question ID}/answers

Sure enough, thats the URL - and the behaviour is consistent through the design, I'm sure you can guess how you might retrieve all comments for a given answer.

I'll be honest, developing against an API where I can guess the endpoints makes me feel pretty awesome.

MVP and the Lean Product

The idea of a Minimum Viable Product is a core concept in Lean Startup - the idea that you build the minimum version of the product that can get you feedback as to whether you are heading in the right direction before investing more effort in building out a fuller fledged product.

Whilst I think an appreciation of an MVP is a good awareness for any engineer - I have generally found it is useful in facilitating conversation when building a new product - if as an engineer you recognise some new feature is going to be considerable effort, its helpful to think about what hypothesis the feature is trying to test, and is there a simpler engineering solution to test the hypothesis.

However, the principal of not over engineering the API is also a good one - once you have a well designed domain model, its quite tempting to get carried away with REST design best practices and start building endpoints for all different variations of the model - however, unless you need them, it's better to start with less (if you are building a public REST API from the start then it might be harder to know what isn't needed, but you can still certainly build an MVP version of the API to test the hypothesis - e.g. whether people actually want to use your API)

User Testing

Another important part of building a product is collecting user feedback - Whilst a REST API may not be well suited to A/B testing (your consumers won't be very happy if switch the API design on them after they have implemented against a specific variation!), listening to user feedback is still important - because the consumers will likely be engineers (writing the client code), then hopefully they will be expecting REST standards and good API design principles, but another way to think about this is the importance of not making breaking changes to your API.

A technical way of thinking about this is Consumer Driven Contracts Testing - As RESTful APIs are de-coupled from the client code, the API doesn't have any idea what clients are using or doing, so this approach allows consumers to define what they expect from the API, and then the API can always test against these contracts, so be sure that any change to the API is still in keeping with existing consumers, or users, of the API.


I think, in general, all engineers can benefit from thinking about how good products are built, what concepts are important, and be able to think about the products or services they are building in this way - both in terms of thinking about the actual implementation as a product that someone else will be the user of (either externally if a RESTful API, but also internally if its cross-team or even just someone coming to take over your code once you have moved on).

First of the year

Ignoring global events, 2016 was an interesting year for me. Personally, I manged to achieve, and more importantly learn, quite a bit: I switched jobs, started learning Scala (full time), started learning and working with machine learning, entered my first BBQ competition and my first Chilli cook-off competition. I also submitted my first proposal for a conference talk - it was not selected in the end, but I was pleased to have at least submitted it, and one of my goals for 2017 is start public speaking (conferences/meetups) so will hopefully continue to submit talk proposals in the hope that one gets selected by the end of the year.

There are probably lots of things that I had in mind to do at the start of the year that I didn't manage - such as finishing my education side project, but never mind.

The new tech stuff I have been able to learn has been a product of moving jobs, so that's cool, and I'm very happy with what we are doing there at the moment, but with the two cooking competitions I signed and paid up to enter both of them early in the year on a whim, and that provided the motivation and deadline to force me to practice, prepare and actually complete them.


So here are a few goals I am aiming for in this coming year:

Machine learning - let's start with a realistic one. I want to become a lot more competent and experienced in the ML field. It's an exciting area right now, and something that is only going to get more prevalent, so I am really happy to be somewhere I can be involved in it from the start of a project and get to know it better.

Writing - A little different this time. As a goal to aim for, and also just to see the wider process I want to write a book. About meat. I have lots of ideas already, it will be relatively short but at first at least it should be an e-book whistlestop tour of hows and whys of cooking meat.

Kickstarter - I am currently working on an android app for my boy (5 years old), and it made me think about how I could make it more accessible for other kids to make their own apps. So I might kickstart it once I have made some progress. Again, mostly for the experience of running a kickstart project rather than actually expecting to get the money to do it.

Public speaking - As mentioned, I would like to get into this. So if I manage to get a proposal accepted before the end of the year, that would be cool.

On to the next one!

Managing your personal finance (a Java webapp)

"Cash Rules Everything Around Me" - Method Man

A few years ago I was fed up of how useless all the banks were at helping me understand my transactions/spending patterns with their online banking. Basically, they all just seemed to let you log on, and then see a list of transactions (often just referencing codes) of payments in and out by date. Nothing else. It seemed like there were lots of simple things missing that could make it a lot more useful, to name a few:

  • Smarter tagging/categorisation - where some direct debit was just listed by the code of the recipient rather than meaningful info, or being able to group all the spending in coffee shops
  • Alerting to changes/unexpected behaviour - for example when a fixed term-price contract comes to an end and the price changes - detecting a change in a regular fixed aount/recipient should be easy

So this is what I started building - a lot of the functionality has since been built into banks such as Monzo, the aim of this webapp was simply to allow you to import transaction histories (most online banks I use provide the ability to export transactions in csv) so you could view/filter/tag any existing bank data. It only supports Santander at the moment, and the alerting stuff never actually got built in, but I thought I would chuck it on Github rather than have it just sit around gathering dust in a private repo.

I have written about the project a while ago, when it was first underway, but as it has been not worked on for a while, I thought I would move it to GitHub - hence the post here today. The app and all the code can be downloaded here:

Building & Running

Its a simple java, maven app (it was several years ago, so not migrated to Spring Boot, Gradle, Groovy etc) - and it builds a WAR file. It also uses LESS for style stuff, which is also built by maven (although watch out for silent failures if you make changes!). If you are using an eclipse based IDE you can follow the guide here: to get incremental build support for LESS building (e.g. just change the LESS source and go refresh in the browser and it will reload).

You can run the WAR file in any tomcat/IDE server, and the backend is currently expecting a MySql DB (but can easily be switched for a differed DB driver if required)


  • Migrate to Spring Boot, Groovy, Gradle
  • Add further bank transaction importers
  • Add alerting framework


(I loaded in a generated transaction history - which is why some of the charts look a little weird)

Using OpenNLP for Named-Entity-Recognition in Scala

A common challenge in Natural Language Processing (NLP) is Named Entity Recognition (NER) - this is the process of extracting specific pieces of data from a body of text, commonly people, places and organisations (for example trying to extract the name of all people mentioned in a wikipedia article). NER is a problem that has been tackled many times over the evolution of NLP, from dictionary based, to rule based, to statistical models and more recently using Neural Nets to solve the problem.

Whilst there have been recent attempts to crack the problem without it, the crux of the issue is really that for approach to learn it needs a large corpus of marked up training data (there are some marked up corpora available, but the problem is still quite domain specific, so training on the WSJ data might not perform particularly well against your domain specific data) and finding a set of 100,000 marked up sentences is no easy feat.  There are some approaches that can be used to tackle this by generating training data - but it can be hard to generate truly representative data and so this approach always risks over-fitting to the generated data.

Having previously looked at Stanford's NLP library for some sentiment analysis, this time I am looking at using the OpenNLP library. Stanford's library is often referred to as the benchmark for several NLP problems, however, these benchmarks are always against the data it is trained for - so out of the box, we likely won't get amazing results against a custom dataset. Further to this, the Stanford library is licensed under GPL which makes it harder to use in any kind of commercial/startup setting. The OpenNLP library has been around for several years, but one of its strengths is its API - its pretty well documented to get up and running, and is all very extendable.

Training a custom NER

Once again, for this exercise we are going back to the BBC recipe archive for the source data - we are going to try and train an OpenNLP model that can identify ingredients.

To train the model we need some example sentences - they recommend at least 15,000 marked up sentences to train a model - so for this I annotated a bunch of the recipe steps and ended up with somewhere in the region of about 45,000 sentences.

As you can see in the above example, the marked up sentences are quite straight forward. We just wrap the ingredient in the tags as above (although note that if the word itself isn't padded by a space either side inside the tags, it will fail!).

Once we have our training data, we can just easily setup some code to feed it in and train our model:

This is a very simple example of how you can do it, and not always paying attention to engineering best practices, but you get the idea for whats going on. We are getting an input stream of our training data set, then we instantiate the Maximum Entropy name finder class and ask it to train a model, which we can then write to disk for future use.

When we want to use the model, we can simply load it back into the OpenNLP Name Finder class and use that to parse the input text we want to check:

So, once I had created some training data in the required format, and trained a model I wanted to see how well it had actually worked - obviously, I don't want to run it against one of the original recipes as they were used to train the model, so I selected this recipe for rosemary-caramel millionaire shortbread, to see how it performed, here are the ingredients it found:

  • butter
  • sugar
  • rosemary
  • caramel
  • shortbread

All in all, pretty good - it missed some ingredients, but given the training data was created in about 20 minutes just manipulating the original recipe set with some groovy, that's to be expected really, but it did well in not returning false positives.

In conclusion, if you have a decent training set, or have the means to generate some data with a decent range, you can get some pretty good results using the library. As usual, the code for the project is on GitHub (although it is little more than the code shown in this post).

Turn your GitHub Page into a Personalised Resume

A little while ago, I decided I wanted to update my CV, and figured given I was in tech it made sense for my CV to be online.  I was aware of GitHub Pages - which give you a nice looking URL which seemed like a perfect location for my tech CV.

Once I had it looking pretty decent, and updated to modern Bootstrap styling so it was fully responsive, I thought I would stick it on GitHub, as a GitHub page.  GitHub provides support for everyone to have a free hosted page with normal HTML/JS resources etc (which is pretty nice of them!) and gives you a nice, share-able URL like http://{username}

Whilst I was reading about GitHub pages, I noticed that they have native support for Jekyll - which is a static HTML generator tool for building websites - which is when I had my second realisation of the day - I could make my CV open-source-able by making it a configurable Jekyll projects that lets users just update some config in their GitHub account and hey-presto, have a nicely styled, personalised tech CV!

So I started porting it over to Jekyll: which just involved moving the configurable, user specific items into a config file (_config.yml) and then breaking the HTML sections into fragments to make it more manageable to understand what is going on.  The idea of Jekyll is pretty straight forward - its just a simple tokenised/template approach to putting together static HTML output, but it does work well and I really didn't find myself wanting for anything in the process.  The GitHub native support was also really nice, all I needed to do was just upload the source of the project to my GitHub account and GitHub handled the build and serving of the site out of the box!

And that's all the configuration it takes! The YAML format is pretty readable - it largely just works with indenting, and hopefully taking a look over the nested sections of data, its fairly easy to understand how you can modify parts to make it customise-able.

You can see my GitHub CV page here - Out of the box you can configure lots of aspects: custom text blocks, key skills, blogs, apps, github projects, stackoverflow, etc.

How can you have a custom GitHub CV?

It really is super simple to get your own GitHub CV:
  1. Create a Github account (if you don't already have one)

  2. Go to the project repository and fork the repository

  3. Change the name of the repository (in the settings menu) to {{yourusername}}

  4. Edit the /_config.yml file in your repository - it should be pretty straight forward as to what the links/details are that you need to add.

  5. Visit your new profile page: {{yourusername}} and start sharing it!

Unsupervised Learning in Scala using Word2Vec

A pretty cool thing that has come out of recent Machine Learning advancements is the idea of "Word Embedding", specifically the advancements in the field made by Tomas Mikolov and his team at Google with the Word2Vec approach. Word Embedding is a language modelling approach that involves mapping words to vectors of numbers - If you imagine we are modelling every word in a given body of text to an N-dimension vector (it might be easier to visualise this as 2-dimensions - so each word is a pair of co-ordinates that can be plot on a graph), then that could be useful in plotting words and starting to understand relationships between words given their proximity. What's more, if we could map words to sets of numbers, then we could start thinking about interesting arithmetic that we could perform on the words.

Sounds cool, right? Now of course, the tricky bit is how can you convert a word to a vector of numbers in such a way that it encapsulates the details behind this relationship? And how can we do it without painstaking manual work and trying to somehow indicate semantic relationships and meaning in the words?

Unsupervised Learning

Word2Vec relies on neural networks and trains on a large, un-labelled piece of text in a technique known as "unsupervised" learning.

Contrary to the last neural network I discussed which was a "supervised" exercise (e.g. for every input record we had the expected output/answer), Word2Vec uses a completely "unsupervised" approach - in other words, the neural network simply takes a massive block of text with no markup or labels (broken into sentences or lines usually) and then uses that to train itself.

This kind of unsupervised learning can seem a little unbelievable at first, getting your head around the idea that a network could train itself without even knowing the "answers" seemed a little strange to me first time I heard the concept, especially as a fundamental requirement for a NN to converge on optimum solution requires a "cost-function" (e.g. some thing we can use after each feed-forward step to tell us how right we are, and if our NN is heading in the right direction).

But really, if we think back to the literal biological comparison with the brain, as people we learn through this unsupervised approach all the time - its basically trial-and-error.

It's child's play

Imagine a toddler attempting to learn to use a smart phone or tablet: they likely don't get shown explicitly to press an icon, or to swipe to unlock, but they might try combinations of power buttons, volume controls and swiping and seeing what happens (and if it does what they are ultimately trying to do), and they get feedback from the device - not direct feedback about what the correct gesture is, or how wrong they were, just the feedback that it doesn't do what they want - and if you have ever lived with a toddler who has got to grips with touchscreens, you may have noticed that when they then experience a TV or laptop, they instinctively attempt to touch or swipe the things on the screen that they want (in NN terms this would be known as "over fitting" - they have trained on too specific a set of data, so are poor at generalising - luckily, the introduction of a non-touch screen such as a TV expands their training set and they continue to improve their NN, getting better at generalising!)

So, this is basically how Word2Vec works. Which is pretty amazing if you think about it (well, I think its neat).

Word2Vec approaches

So how does this apply to Word2Vec? Well just like a smartphone gives implicit, in-direct feedback to a toddler, so the input data can provide feedback to itself. There are broadly two techniques when training the network:

Continuous Bag of Words (CBOW)

So, our NN has a large body of text broken up into sentences/lines - and just like in our last NN example, we take the first row from the training set, but we don't just take the whole sentence to push into the NN (after all, the sentence will be variable length, which would confuse our input neurons), instead we take a set number of words - referred to as the "window size", let's say 5, and feed those into the network. In this approach, the goal is for the NN to try and correctly guess the middle word in that window - that is, given a phrase of 5 words, the NN attempts to guess the word at position 3.

[It was ___ of those] days, not much to do

So its unsupervised learning, as we haven't had to go through any data and label things, or do any additional pre-processing - we can simply feed in any large body of text and it can just try to guess the words given their context.


The Skip-gram approach is similar, but the inverse - that is, given the word at position n, it attempts to guess the words at position n-2, n-1, n+1, n+2.

[__ ___ one __ _____] days, not much to do

The network is trying to work out which word(s) are missing, and just looks to the data itself to see if it can guess it correctly.

Word2Vec with DeepLearning4J

So one popular deep-learning & word2vec implementation on the JVM is DeepLearning4J. It is pretty simple to use to get used to what is going on, and is pretty well documented (along with some good high-level overviews of some core topics). You can get up and running playing with the library and some example datasets pretty quickly following their guide. Their NN setup is also equally simple and worth playing with, their MNIST hello-world tutorial lets you get up and running with that dataset pretty quickly.


A little while ago, I wrote a web crawler for the BBC food recipe archive, so I happened to have several thousand recipes sitting around and thought it might be fun to feed those recipes into Word2Vec to see if it could give any interesting results or if it was any good at recommending food pairings based on the semantic features the Word2Vec NN extracts from the data.

The first thing I tried was just using the ingredient list as a sentence - hoping that it would be better for extracting the relationship between ingredients, with each complete list of ingredients being input as a sentence.  My hope was that if I queried the trained model for X is to Beef, as Rosemary is to Lamb, I would start to get some interesting results - or at least be able to enter an ingredient and get similar ingredients to help identify possible substitutions.

As you can see, it has managed to extract some meaning from the data - for both pork and lamb, the nearest words do seem to be related to the target word, but not so much that could really be useful. Although this in itself is pretty exciting - it has taken an un-labelled body of text and has been able to learn some pretty accurate relationships between words.

Actually, on reflection, a list of ingredients isn't actually that great an input, as it isn't a natural structure and there is no natural ordering of the words - a lot of meaning is captured in the phrases rather than just lists of words.

So next up, I used the instructions for the recipes - each step in the recipe became a sentence for input, and minimal cleanup was needed, however, with some basic tweaking (it's fairly possible that if I played more with the Word2Vec configuration I could have got some improved results) the results weren't really that much better, and for the same lamb & pork search this was the output:

Again, its still impressive to see that some meaning has been found from these words, is it better than raw ingredient list? I think not - the pork one seems wrong, as it seems to have very much aligned pork as a poultry (although maybe that is some meaningful insight that conventional wisdom just hasn't taught us yet!?)


Whilst this is pretty cool, there is further fun that can be had - in the form of simple arithmetic. A simple, often quoted example, is the case of countries and their capital cities - well trained Word2Vec models have countries and their capital cities equal distances apart:

(graph taken from DeepLearning4J Word2Vec intro)

So could we extract similar relationships between food stuffs?  The short answer, with the models trained so far, was kind of..

Word2Vec supports the idea of positive and negative matches when looking for nearest words - that allows you to find these kind of relationships. So what we are looking for is something like "X is to Lamb, as thigh is to chicken" (e.g. hopefully this should find a part of the lamb), and hopefully use this to extract further information about ingredient relationships that could be useful in thinking about food.

So, I ran that arithmetic against my two models.
The instructions based model returned the following output:

Which is a pretty good effort - I think if I had to name a lamb equivalent of chicken thigh, a lamb shank is probably what I would have gone for (top of the leg, both pieces of slow twitch muscle and both the more game-y, flavourful pieces of the animal - I will stop as we are getting into food-nerd territory).

I also ran the same query on the ingredients based set (which remember, ran better on the basic nearest words test):

Which interestingly, doesn't seem as good. It has the shin, which isn't too bad in as far as its the leg of the animals, but not quite as good a match as the previous.

Let us play

Once you have the input data, Word2Vec is super easy to get up and running. As always, the code is on GitHub if you want to see the build stuff (I did have to fudge some dependencies and exclude some stuff to get it running on Ubuntu - you may get errors relating to javacpp or jnind4j not available - but the build file has the required work arounds in to get that running), but the interesting bit is as follows:
If we run through what we are setting up here:

  1. Stop words - these are words we know we want to ignore -  I originally ruled these out as I didn't want measurements of ingredients to take too much meaning. 
  2. Line iterator and tokenizer - these are just core DL4J classes that will take care of processing the text line by line, word by word. This makes things much easier for us, so we don't have to worry about that stuff
  3. Min word frequency - this is the threshold for words to be interesting to us - if a word appears less than this number of times in the text then we don't include the mapping (as we aren't confident we have a strong enough signal for it)
  4. Iterations - how many training cycles are we going to loop for
  5. Layer size - this is the size of the vector that we will produce for each word - in this case we are saying we want to map each word to a 300 dimension vector, you can consider each vector a "feature" of the word that is being learnt, this is a part of the network that will really need to be tuned to each specific problem
  6. Seed - this is just used to "seed" the random numbers used in the network setup, setting this helps us get more repeatable results
  7. Window size - this is the number of words to use as input to our NN each time - relates to the CBOW/Skip-gram approaches described above.

And that's all you need to really get your first Word2Vec model up and running! So find some interesting data, load it in and start seeing what interesting stuff you can find.

So go have fun - try and find some interesting data sets of text stuff you can feed in and what you can work out about the relationships - and feel free to comment here with anything interesting you find.