Monday, October 24, 2016

Benchmark npm vs yarn

A few weeks ago Yarn was released to the Node community. It is an alternative to npm and other package managers.

I heard a co-worker saying it was faster than npm. I found that interesting, so I decided to run a quick benchmark of npm and yarn by installing the dependencies of the project I work on.

I've done three tests:
  • Installation with yarn
  • Installation with npm
  • Installation with npm with cache

Yarn fetches packages from the npm registry. When you run yarn install, it uses its cache by default, unlike npm, where you have to add an option to the install command, for example: npm --cache-min 9999999 install

I used the rmdir library to clean up the node_modules directory before each run, and child_process.exec to run the install command.
Note that rmdir uses child_process.exec behind the scenes. Also, rmdir could return a promise instead of requiring a callback :P.

Here is the code I ran to do the benchmark:

var rmdir = require('rmdir'),
    exec = require('child_process').exec;

rmAndInstall('yarn');
rmAndInstall('npm');
rmAndInstall('npm', '--cache-min 9999999');

// Removes node_modules, then runs "<libName> [options] install" and logs the elapsed time in ms
function rmAndInstall(libName, options) {
    rmdir('node_modules', () => {
        const init = new Date();
        const command = options === undefined ? libName : libName + ' ' + options;
        console.log('installing stuff with ' + command);
        exec(command + ' install', () => {
            const result = (new Date() - init).toString();
            console.log(command + ' took: ' + result);
        });
    });
}

And here are the results, in milliseconds. Interesting, huh?!

mcure@cure:~/github/project$ node app/benchmark_npm_yarn.js 
installing stuff with yarn
installing stuff with npm
installing stuff with npm --cache-min 9999999
yarn took: 8930
npm took: 35383
npm --cache-min 9999999 took: 28226

And that's really impressive: yarn was almost 4x faster than npm without cache (8930 ms vs 35383 ms) and about 3x faster than npm with cache (28226 ms).
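One caveat about the benchmark script above: rmdir and exec are both asynchronous, so the three rmAndInstall calls start almost at the same time and the installs overlap on the same node_modules directory. That is why the output prints all three "installing stuff with ..." lines before any timing, and each number likely includes time spent competing with the other installs. Below is a minimal sketch that runs the installs strictly one after another. It assumes Node.js 14.14 or later (for fs.promises.rm) and uses only built-in modules; cleanInstall and benchmarkSequentially are hypothetical helper names, not part of the original benchmark.

```javascript
const { exec } = require('child_process');
const { promisify } = require('util');
const fs = require('fs');

const execAsync = promisify(exec);

// Hypothetical helper: removes node_modules, then runs "<command> install"
// and resolves only after the install process has exited.
async function cleanInstall(command) {
  await fs.promises.rm('node_modules', { recursive: true, force: true });
  await execAsync(command + ' install');
}

// Runs each command one after another (never in parallel)
// and resolves with [{ command, ms }] in the same order.
async function benchmarkSequentially(commands, run = cleanInstall) {
  const results = [];
  for (const command of commands) {
    const start = Date.now();
    await run(command);
    results.push({ command, ms: Date.now() - start });
  }
  return results;
}

// Usage, from the project root:
// benchmarkSequentially(['yarn', 'npm', 'npm --cache-min 9999999'])
//   .then(results => results.forEach(r => console.log(r.command + ' took: ' + r.ms)));
```

Passing the install step as the run parameter keeps the sequencing logic separate from the side effects, so a different command runner can be swapped in without touching the timing code.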

To be honest, I don't know exactly how mature yarn is, nor much about its bugs, but I'll definitely push my team to use yarn, at least to evaluate it. I also want to try all of its features and learn more about the tool.

This was a very simple post, but I hope it helped you learn a bit more about yarn, and that it encourages you to try it as well.

The benchmark code is also on my GitHub.

Tweet me if you wanna discuss more about it.

Monday, September 12, 2016

REST anti-patterns

In the past few years I've been working with and studying RESTful APIs, and I have seen some common mistakes in different projects and in online forums, so I decided to write this post based on my own experience and on things I've read on the internet.
Here are some anti-patterns, with explanations and examples.

URI not very RESTful


Your URI does not reflect the action that's happening on an existing resource.

RESTful APIs are about resources. When we build our URIs, we need to tell a story about the resource: just by looking at the URI, the consumer should understand everything about it, where it comes from, what its identifier is and which options it has.

Let's say we have a resource called account and we need to close this account, what is the best way to represent this action?

Below I've divided some examples into wrong and correct. Which ones make more sense to you?

Wrong
  • POST /accounts/close
  • POST /closeAccount
Correct
  • POST /accounts/4402278/close
The wrong options don't give this visibility when looking at the URI. We would probably need to send some query parameters or a body, but it's not clear what the query parameter is called, nor what format the body should have, if one is needed.

The correct option shows that account 4402278 can be closed; we can deduce that just by looking at the URI.
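Because the identifier is part of the path in the correct option, the server can read it straight from the URI. Below is a minimal sketch of that idea; matchRoute is a hypothetical helper written for illustration (frameworks such as Express use a similar ":param" template syntax, but this is not their implementation).

```javascript
// Hypothetical helper: matches a path against a template such as "/accounts/:id/close".
// Returns the captured parameters, or null when the path does not fit the template.
function matchRoute(template, path) {
  const templateParts = template.split('/').filter(Boolean);
  const pathParts = path.split('/').filter(Boolean);
  if (templateParts.length !== pathParts.length) return null;

  const params = {};
  for (let i = 0; i < templateParts.length; i++) {
    const part = templateParts[i];
    if (part.startsWith(':')) {
      params[part.slice(1)] = decodeURIComponent(pathParts[i]); // placeholder segment
    } else if (part !== pathParts[i]) {
      return null; // literal segments must match exactly
    }
  }
  return params;
}

console.log(matchRoute('/accounts/:id/close', '/accounts/4402278/close')); // { id: '4402278' }
console.log(matchRoute('/accounts/:id/close', '/accounts/close'));         // null
```

With /accounts/close or /closeAccount there is simply no identifier to extract, which is the visibility problem described above.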

Using wrong HTTP methods


HTTP methods should express the intent of the action that is happening. For example, if you are returning information, you should use GET.

Below are the most common HTTP methods and the actions they correspond to:

GET - Retrieve records
POST - Create records
PUT - Update whole records
PATCH - Update parts of records
DELETE - Delete records

Even so, mistakes like the one below are often seen:
Wrong
  • POST /accounts/4402278/delete
Correct
  • DELETE /accounts/4402278

The wrong example does a POST on the account with the given id and asks for the delete action explicitly in the URI.

It's reasonably clear, but it's still wrong, because HTTP has an explicit method for deleting resources: DELETE.
There are many other cases that could make this post even longer, but I'll leave finding them as a challenge.
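To make the difference concrete, here is a minimal sketch of a handler for /accounts/{id} that takes the action from the HTTP method rather than from a verb in the path. The in-memory Map and the handleAccount name are assumptions for illustration, not a real framework API; a POST to this URI is answered with 405 Method Not Allowed.

```javascript
// Hypothetical in-memory store of accounts, keyed by account number.
const accounts = new Map([
  ['4402278', { accountNumber: '4402278', balance: 100.0 }],
]);

// Handles requests to /accounts/{id}: the HTTP method says what to do.
function handleAccount(method, id) {
  switch (method) {
    case 'GET':
      return accounts.has(id)
        ? { status: 200, body: accounts.get(id) }
        : { status: 404 };
    case 'DELETE':
      // Map.prototype.delete returns true only if the key existed
      return accounts.delete(id) ? { status: 204 } : { status: 404 };
    default:
      return { status: 405 }; // Method Not Allowed
  }
}

console.log(handleAccount('POST', '4402278').status);   // 405
console.log(handleAccount('DELETE', '4402278').status); // 204
console.log(handleAccount('GET', '4402278').status);    // 404
```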

Hurting Idempotency


A method is idempotent when making the same request several times has the same effect on the server as making it once. For example, no matter how many times you call GET on the same resource, no change in application state should occur.
  • Idempotent methods: GET, PUT, OPTIONS
  • Non-idempotent methods: POST

What about the DELETE method? If you DELETE /accounts/4402278 twice:
  • The account will not be deleted many times... thinking this way, it is idempotent
  • The second time, the resource will not be found and the server should return 404 Not Found, so the response differs. Since idempotency is about the effect on the server's state rather than the response, the HTTP specification (RFC 7231) still classifies DELETE as idempotent
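A quick way to check the DELETE case is to compare the server's state after one call and after two. In this sketch the Map store and the deleteAccount helper are hypothetical; the second call answers 404, but the stored data ends up exactly the same, and per RFC 7231 idempotency refers to that effect on the server, not to the response.

```javascript
// Hypothetical store: DELETE removes the account if it exists.
function deleteAccount(store, id) {
  // Map.prototype.delete returns true only if the key was present
  return store.delete(id) ? 204 : 404; // 204 No Content / 404 Not Found
}

const calledOnce = new Map([['4402278', { balance: 100.0 }]]);
const calledTwice = new Map([['4402278', { balance: 100.0 }]]);

deleteAccount(calledOnce, '4402278');
const first = deleteAccount(calledTwice, '4402278');
const second = deleteAccount(calledTwice, '4402278');

// Different responses, same resulting state
console.log(first, second);                        // 204 404
console.log(calledOnce.size === calledTwice.size); // true
```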

https://www.youtube.com/watch?v=6dVNdFwqeKs

Ignoring status codes


If your API only returns 200 (OK) or 500 (Internal Server Error), you are misusing status codes.

The status codes were created to give the consumer an overall status of the final state of the request.

It means we need to be careful when choosing status codes to represent the state of each request: they need to reflect exactly what happened and the end result.
Wrong:
  • GET /accounts/123456 (and there is no matching record) response: HTTP status 200 (ok) with a body saying it's not found
Correct:
  • GET /accounts/123456 (and there is no matching record) response: HTTP status 404 (not found)
If you request a resource as in the example above and there is no matching record, the API should return 404, because it was not found. If you return 200 (OK) with a message in the body saying "Not found", you are doing it wrong: the status code says "OK" but the message says "not found", which is contradictory.

Status codes also give more clarity to your API's responses. In many cases, consumers only have to check the status code to know exactly what happened, which is much simpler than parsing responses with big strings.
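On the consumer side this means the code can branch on the status alone. A minimal sketch, assuming a response object with status and body properties (a hypothetical shape, similar to what HTTP clients usually expose):

```javascript
// Decides what to do with a response from GET /accounts/{id} by looking at the
// status code; the body is only read when the status says it contains an account.
function describeAccountResponse(response) {
  switch (response.status) {
    case 200:
      return 'account ' + response.body.accountNumber;
    case 404:
      return 'no such account';
    default:
      return 'unexpected status ' + response.status;
  }
}

console.log(describeAccountResponse({ status: 200, body: { accountNumber: '123456' } })); // account 123456
console.log(describeAccountResponse({ status: 404 }));                                  // no such account
```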

HTTP status codes


Here are some of the most used, in my humble opinion :)

2xx - Success: 200 OK, 201 Created, 203 Non-Authoritative Information, 204 No Content
3xx - Redirection: 301 Moved Permanently, 302 Found, 304 Not Modified
4xx - Client error: 400 Bad Request, 401 Unauthorized, 402 Payment Required, 403 Forbidden, 404 Not Found
5xx - Server error: 500 Internal Server Error, 503 Service Unavailable
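The first digit of a status code already tells the consumer which class it belongs to (RFC 7231 defines five classes, 1xx to 5xx), so a client can handle codes it doesn't know by their class. A minimal sketch; statusClass is a hypothetical helper:

```javascript
// Returns the class of an HTTP status code based on its first digit.
function statusClass(status) {
  if (!Number.isInteger(status) || status < 100 || status > 599) return 'invalid';
  const classes = ['informational', 'success', 'redirection', 'client error', 'server error'];
  return classes[Math.floor(status / 100) - 1];
}

console.log(statusClass(204)); // success
console.log(statusClass(304)); // redirection
console.log(statusClass(404)); // client error
console.log(statusClass(503)); // server error
```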

Ignoring caching


A common way to ignore caching is to include the header "Cache-Control: no-cache" in all your API responses.

HTTP defines a powerful caching mechanism that includes the ETag and If-Modified-Since headers and the 304 Not Modified status code.

They allow clients and servers to negotiate a fresh copy of the resource only when it has changed and, through caching or proxy servers, increase your application's scalability and performance.
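A rough sketch of how ETag validation works: the server sends an ETag with the representation, the client sends it back in the If-None-Match request header (If-Modified-Since plays the same role for Last-Modified dates), and the server answers 304 Not Modified with no body when nothing has changed. Hashing the serialized body with SHA-1 is just one possible way to build an ETag, the max-age value is an arbitrary example, and this simplified check ignores lists of ETags and weak validators.

```javascript
const crypto = require('crypto');

// One possible ETag strategy: a hash of the serialized representation.
function etagFor(body) {
  return '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';
}

// Conditional GET: 304 with no body if the client's copy is still current,
// otherwise 200 with the body and its ETag.
// Header names are expected in lower case, as Node.js exposes them.
function conditionalGet(representation, requestHeaders) {
  const body = JSON.stringify(representation);
  const etag = etagFor(body);
  if (requestHeaders['if-none-match'] === etag) {
    return { status: 304, headers: { ETag: etag } };
  }
  return {
    status: 200,
    headers: { ETag: etag, 'Cache-Control': 'max-age=60' },
    body,
  };
}

const first = conditionalGet({ accountNumber: '4402278', balance: 100.0 }, {});
const again = conditionalGet(
  { accountNumber: '4402278', balance: 100.0 },
  { 'if-none-match': first.headers.ETag }
);
console.log(first.status, again.status); // 200 304
```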

Ignoring hypermedia


If your API calls send representations that do not contain any links, you are most likely breaking the REST principle called HATEOAS.

Hypermedia is the concept of linking resources together allowing applications to move from one state to another by following links.

If you ignore hypermedia, clients will most likely have to build URIs themselves using hard-coded knowledge.

More on HATEOAS


The client interacts with the application through hypermedia provided dynamically by the API.

The current state of the application is defined by your data and the links in your payloads.

The client only needs a generic understanding of hypermedia.

It allows the server functionality to evolve independently.

Interaction is driven by hypermedia, rather than out-of-band information.

Example


In the example below we have a resource called "account" with a balance of 100.00.

{
    "accounts": [
        {
            "accountNumber": "4502278",
            "balance": 100.00,
            "links": [
              {"rel": "deposit", "href": "/account/4502278/deposit"},
              {"rel": "withdraw", "href": "/account/4502278/withdraw"},
              {"rel": "transfer", "href": "/account/4502278/transfer"},
              {"rel": "close", "href": "/account/4502278/close"}
            ]
        }
    ]
}


Now let's say the account is "in the red" (overdrawn). The API should block some actions and return a payload such as:

{
    "accounts": [
        {
            "accountNumber": "4502278",
            "balance": -60.55,
            "links": [
                {"rel": "deposit", "href": "/account/4502278/deposit"}
            ]
        }
    ]
}
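The links can be computed on the server from the account's current state. A minimal sketch that reproduces the two payloads above; the rule "only deposits while the balance is negative" is inferred from this example, and accountLinks and accountsPayload are hypothetical names:

```javascript
// Builds the hypermedia links for an account from its current state:
// an overdrawn account only offers "deposit" (rule assumed from the example).
function accountLinks(accountNumber, balance) {
  const rels = balance < 0
    ? ['deposit']
    : ['deposit', 'withdraw', 'transfer', 'close'];
  return rels.map(rel => ({ rel, href: '/account/' + accountNumber + '/' + rel }));
}

// Wraps one account in the same envelope used in the payloads above.
function accountsPayload(accountNumber, balance) {
  return {
    accounts: [{ accountNumber, balance, links: accountLinks(accountNumber, balance) }],
  };
}

console.log(JSON.stringify(accountsPayload('4502278', -60.55), null, 4));
```

The client never builds these URIs itself; it only follows the rel values it understands, so the server can change which actions are available without breaking it.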


Ignoring MIME types


If resources returned by API calls only have a single representation, you are probably only able to serve a limited number of clients that can understand the representation.

If you want to increase the number of clients that can potentially use your API, you should use HTTP content negotiation.

It allows clients to ask for, and the server to return, representations of your resources in standard media types such as XML, JSON or YAML.
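With content negotiation the client lists the media types it accepts in the Accept header, optionally weighted with q values, and the server picks the best one it can produce. A simplified sketch, assuming the API can produce JSON and XML; negotiate is a hypothetical helper, and it skips parts of the full Accept syntax (for example, it ignores media type parameters other than q):

```javascript
// Representations this hypothetical API can produce, in order of server preference.
const SUPPORTED_TYPES = ['application/json', 'application/xml'];

// Parses an Accept header into [{ range, q }], e.g. "application/xml;q=0.9, */*;q=0.1".
function parseAccept(header) {
  return header.split(',').map(part => {
    const [range, ...params] = part.split(';').map(s => s.trim());
    const q = params.find(p => p.startsWith('q='));
    return { range: range.toLowerCase(), q: q ? parseFloat(q.slice(2)) : 1 };
  });
}

// The most specific matching range decides the quality: exact type, then "type/*", then "*/*".
function qualityFor(type, ranges) {
  const candidates = [type, type.split('/')[0] + '/*', '*/*'];
  for (const candidate of candidates) {
    const match = ranges.find(r => r.range === candidate);
    if (match) return match.q;
  }
  return 0;
}

// Returns the best supported type, or null (the server would then answer 406 Not Acceptable).
function negotiate(acceptHeader, supported = SUPPORTED_TYPES) {
  if (!acceptHeader) return supported[0]; // no Accept header: any type is acceptable
  const ranges = parseAccept(acceptHeader);
  let best = null;
  let bestQ = 0;
  for (const type of supported) {
    const q = qualityFor(type, ranges);
    if (q > bestQ) {
      best = type;
      bestQ = q;
    }
  }
  return best;
}

console.log(negotiate('application/xml'));                         // application/xml
console.log(negotiate('application/json;q=0.5, application/xml')); // application/xml
console.log(negotiate('text/html'));                               // null
```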

Conclusion


When building your APIs ...
  • Be coherent
  • Require headers
  • Use Standards (JSON-API)
  • Build well designed URIs
  • Return coherent status codes
  • Care about idempotency
  • Use correct HTTP methods

I really hope this helped you identify some common mistakes, and that the tips given here will help you when designing your APIs.

Tweet me if you wanna discuss more about it.

Saturday, May 28, 2016

Versioning APIs

I'm currently working on a project where my team is developing an API, and one subject that came up is versioning. It is a confusing subject that generates a lot of discussion, which is why I decided to write a post about it.

There are a few strategies for versioning an API, but let's take a step back first: what happens in a project that requires a new version of the API?

Contract break


Let's say we have a resource called Person, whose contract includes id, name, birthDate, address, zipCode and city. At some point, a decision is made to change it and separate the address information from Person. This means the contract will change, because the address information will move to a separate resource.

All consumers will break, because they expect the address information inside Person. This is a contract break. In other cases, new information is only added to the contract, which does not break any consumers, so we don't consider that a contract break.
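A quick way to see the difference between a breaking and a non-breaking change is a consumer written against the first contract. In this sketch the consumer function and the sample data are hypothetical, and the version 2 shape (address moved behind a link) is just one way the change described above could look:

```javascript
// Hypothetical consumer written against version 1 of the Person contract.
function mailingLabel(person) {
  return `${person.name}, ${person.address}, ${person.zipCode} ${person.city}`;
}

const personV1 = {
  id: 1, name: 'Ana', birthDate: '1990-01-01',
  address: '10 Main St', zipCode: '12345', city: 'Springfield',
};

// Additive change: a new field; existing consumers keep working.
const personWithEmail = { ...personV1, email: 'ana@example.com' };

// Breaking change: the address data moved to a separate resource.
const personV2 = {
  id: 1, name: 'Ana', birthDate: '1990-01-01',
  links: { address: '/people/1/address' },
};

console.log(mailingLabel(personV1));        // Ana, 10 Main St, 12345 Springfield
console.log(mailingLabel(personWithEmail)); // Ana, 10 Main St, 12345 Springfield
console.log(mailingLabel(personV2));        // Ana, undefined, undefined undefined
```

Note that the broken consumer doesn't even throw an error here; it silently produces wrong output, which is one more reason to treat such changes as contract breaks.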

Version it!


How do we avoid breaking consumers? By versioning. At this point, the API team creates and publishes a version 2 of the contract, while consumers keep using the old one. Consumers and the API team then agree on when the API will stop supporting the old version, by which time consumers will have to use the new version.

Be cautious


Versioning contracts looks like a good solution, but it can get dangerous if the API starts supporting a lot of versions. It will make your code a mess: hard to understand, with too many branches. I won't even mention that it can cause bugs (just did :P). I'd say a good practice is to accumulate your contract breaks and release a new API version once you judge it is worth it. Also, I wouldn't keep more than two versions in parallel, to avoid the issues mentioned above.

Versioning strategies


I've been researching a few solutions for API versioning; below I present and comment on the two strategies that caught my attention the most.

Version as path/query parameter

This is the strategy I've seen most often, both in projects I've worked on and in my research. It consists of adding the version to the path or as a query parameter, as in the examples below:

https://host/api/v1/bands/1/albums
https://host/api/bands/1/albums?version=1

Who is using this approach?
  • Twitter
  • Atlassian
  • Google Search
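On the server, reading the version for this strategy is a matter of inspecting the URL. A minimal sketch using the WHATWG URL class built into Node.js; the /api/v{n}/ prefix, the "version" query parameter name and the default of 1 are assumptions based on the example URLs, and requestedVersion is a hypothetical helper:

```javascript
// Reads the requested API version from "/api/v{n}/..." in the path,
// or from a "version" query parameter, falling back to a default.
function requestedVersion(urlString, defaultVersion = 1) {
  const url = new URL(urlString);
  const fromPath = url.pathname.match(/^\/api\/v(\d+)\//);
  if (fromPath) return Number(fromPath[1]);
  const fromQuery = url.searchParams.get('version');
  return fromQuery !== null ? Number(fromQuery) : defaultVersion;
}

console.log(requestedVersion('https://host/api/v1/bands/1/albums'));        // 1
console.log(requestedVersion('https://host/api/bands/1/albums?version=2')); // 2
console.log(requestedVersion('https://host/api/bands/1/albums'));           // 1
```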

Version as a header

This is probably the least intrusive strategy: the version is sent in the Accept header, leaving the URL clean. See the example below:

Accept: application/json; version=1.0

There are other ways to send the version in the Accept header, but I find this one the clearest. I have also seen examples where people use a custom header such as X-Version: 1.0.

Who is using this approach?
  • Azure
  • GitHub API
  • Google Data API
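Reading the version from the headers is similar. A minimal sketch, assuming lower-cased header names (as Node.js provides them in request.headers) and a default of "1.0"; versionFromHeaders is a hypothetical helper that checks the Accept header first and then the custom X-Version header:

```javascript
// Reads a "version" parameter from the Accept header, e.g. "application/json; version=1.0",
// falling back to a custom X-Version header and then to a default.
function versionFromHeaders(headers, defaultVersion = '1.0') {
  const accept = headers['accept'] || '';
  for (const mediaRange of accept.split(',')) {
    // parameters come after the first ";" of each media range
    for (const param of mediaRange.split(';').slice(1)) {
      const [name, value] = param.split('=').map(s => s.trim());
      if (name === 'version' && value) return value;
    }
  }
  return headers['x-version'] || defaultVersion;
}

console.log(versionFromHeaders({ accept: 'application/json; version=1.0' })); // 1.0
console.log(versionFromHeaders({ 'x-version': '2.0' }));                      // 2.0
```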

Conclusion


A good contract design often avoids contract breaks, and therefore versioning. Always be careful with your contracts: they define how the outside world talks to your API. Contracts do break, and that's natural; however, always evaluate each change, think about your design, ask yourself whether each change is really the right thing to do, weigh the pros and cons and make smart decisions.

Thursday, May 26, 2016

Integrating node.js and Apache Kafka

In this post I will demonstrate how we can integrate node.js and Apache Kafka, producing and consuming messages in a very simple example.

First of all, let's get Apache Kafka up and running; you can see how in the tutorial on the official Kafka site.

Once it's up and running, we can set up the project and start playing with the no-kafka library:

  • npm init
  • npm install no-kafka --save

I used version 2.4.2 of no-kafka. If you want to pin that version when installing, run "npm install no-kafka@2.4.2 --save".

Here is a producer example, which will connect to kafka and produce messages in a topic.

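var Kafka = require('no-kafka');
var producer = new Kafka.Producer();

// init() connects to Kafka and returns a promise
producer.init()
.then(function () {
  // send() publishes the message to partition 0 of the topic and also returns a promise
  return producer.send({
      topic: 'kafka-test-topic',
      partition: 0,
      message: {
          value: 'Hello!'
      }
  });
})
.then(function (result) {
  console.log('message sent', result);
})
.catch(function (err) {
  console.error('failed to send message', err);
});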
If you are running a local instance, the producer connects to localhost automatically. To connect to an external instance, pass a connection string and replace "localhost" with the external host, as in the example below:

var Kafka = require('no-kafka');
var connString = 'kafka://localhost:9092'; // replace localhost with your broker's host; several brokers can be listed, separated by commas
var producer = new Kafka.Producer({ connectionString: connString });

Here is a consumer example, which will connect to kafka and subscribe to a topic, receiving messages and printing them to the console.

var Kafka = require('no-kafka');
var consumer = new Kafka.SimpleConsumer();

// data handler function can return a Promise
var dataHandler = function (messageSet, topic, partition) {
    messageSet.forEach(function (m) {
        console.log('message received: ');
        console.log({
            'topic': topic,
            'partition': partition,
            'offset': m.offset,
            'message': m.message.value.toString('utf8')
        });
    });
};

// init() connects to Kafka; subscribe() then delivers messages from partition 0 to dataHandler
consumer.init()
.then(function () {
    return consumer.subscribe('kafka-test-topic', 0, dataHandler);
})
.catch(function (err) {
    console.error('failed to subscribe', err);
});

As you can see in the code above, all calls return promises. This example covers only the very basic features of the library for interacting with Kafka.
I put this project on my GitHub, so you can play with the code and evolve it as needed.

Building APIs with HarvesterJS

HarvesterJS helps you create robust APIs on top of MongoDB and Node.js. It is a fork of fortuneJS, is JSON API compliant and runs on Express. It lets developers define contracts and validations with Joi.

In this post you'll see how to set up a very basic API with schema validations, along with some features of HarvesterJS.

Once the resources are properly set up, HarvesterJS provides the GET/POST/PUT/DELETE operations persisting the data on MongoDB.


Initial project setup

  • npm init
  • npm install harvesterjs --save
  • npm install joi --save


Setting up the API with configs


app.js

var harvester = require('harvesterjs'),
    options = {
        adapter: 'mongodb',
        connectionString: 'mongodb://127.0.0.1:27017/mydb',
        inflect: true
    };
var harvesterApp = harvester(options);

require('./models/customer')(harvesterApp);
require('./models/contact')(harvesterApp);

function onListen() {
    console.log('listening on port 4567');
}

harvesterApp.listen(4567, onListen);



Setting up a resource


You can set up the resource fields and use Joi to describe them and add validations. In the examples below, we have a resource called customer with two fields: status and name.

  • Status is a string which only accepts two values: Active or Inactive
  • Name is a string which is required.

Basic resource (models/customer.js)

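var Types = require('joi');

// Exported as a function because app.js loads it with require('./models/customer')(harvesterApp)
module.exports = function (harvesterApp) {
    harvesterApp.resource('customer', {
        status: Types.string().valid('Active', 'Inactive'),
        name: Types.string().required()
    });
};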


Linking resources



Regular link

In the example below, we have a resource called customer which has a link to a resource called contact. This link is an array of contacts, but you can also link to a single resource.

var Types = require('joi');

harvesterApp.resource('customer', {
    status: Types.string().valid('Active', 'Inactive'),
    name: Types.string().required(),
    links: {
       contacts: ['contact']
    }
});


External link

In the example below, we have a resource called customer which has an external link to a resource called contact.

var Types = require('joi'),
   contactURI = 'http://localhost:2426/contacts';

harvesterApp.resource('customer', {
   status: Types.string().valid('Active', 'Inactive'),
   name: Types.string().required(),
   links: {
      contact: { ref: 'contact', baseUri: contactURI }
   }
});


Manipulating resources manually


HarvesterJS also lets you manipulate documents manually. Once you have the harvesterApp object in place, you can use the harvesterApp.adapter methods to interact with MongoDB: find, findMany, create, update and delete.

These are the very basic features of HarvesterJS; for more information, check its GitHub repository.

Sunday, May 22, 2016

Functional Programming with Python

I've been playing with functional programming for some years. I've learned languages like Haskell and Scala, which let us do really nice functional things.

Notice that I mentioned two languages in which we can do functional programming, but only one of them is purely functional. Haskell is a pure functional language, whereas Scala is a hybrid language where you can do functional and/or object-oriented programming. Some people dislike hybrid languages precisely because you can mix both paradigms and end up doing the wrong thing.

In my humble opinion, this isn't a big problem, as long as you know what you are doing and your team has good practices such as code review and design review.

This is a big discussion; I'm just giving my opinion, and I will try to show that we can do really nice functional programming even with a hybrid language like Python.

A bit of functional programming


Below are some characteristics of functional programming:
  • Expressions over statements (instead of using a whole for statement, use map/filter/reduce)
  • No side effects
  • Immutability
  • Simpler code
  • Expressiveness
  • Composable code


Functional is Stateless


Stateful

a = 0
def plus_one():
    global a  # reads and mutates state that lives outside the function
    a += 1
Stateless

def plus_one(a):
    return a + 1


Don't iterate over lists


numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
odd = []
for n in numbers:
    if n % 2:
        odd.append(n)


Use map/filter/reduce


def is_odd(n):
    return n % 2 == 1

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
odd = list(filter(is_odd, numbers))  # list() because filter returns an iterator in Python 3


An even better way


numbers = [1,2,3,4,5,6,7,8,9]
odd = list(filter(lambda n: n % 2 == 1, numbers))


An example with map


numbers = [1,2,3,4,5,6,7,8,9]
squares = list(map(lambda x: x * x, numbers))  # map also returns an iterator in Python 3


An example with reduce


from functools import reduce  # reduce is not a builtin in Python 3

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
numbers_sum = reduce(lambda x, y: x + y, numbers)


Lambdas


greet = lambda name: 'hello {0}'.format(name)
print(greet('cure'))


Higher-Order Functions

  • Functions that take function(s) as argument
  • Functions that return a function

def compose_func(func1, func2):
    # returns a new function that applies func2 first, then func1
    return lambda x: func1(func2(x))

def build_engine(power):
    ...

def build_body(engine):
    ...

build_car = compose_func(build_body, build_engine)
build_car('500hp')


List Comprehensions

  • Come from mathematics (set-builder notation)
  • { x² | x ∈ ℕ }
  • x squared, for every x in the set of natural numbers

a = [x**2 for x in range(10)]
b = [2**i for i in range(13)]
c = [x for x in a if x%2 == 0]


Conclusion


You don't need a pure functional language to do functional programming; you can follow its principles by thinking functionally when coding. That's what I tried to show with the examples above. I hope it helps you.