Sep 18, 2017

Software Circus

Intro

We just came home from two days of talks, workshops and lots of fun during this year's Software Circus. "Cloudbusting" was the theme this year and it took place in the lovely city of Amsterdam. The organization team did a great job finding a unique venue: A festival location including a Big Top circus tent, a rusty hangar and a large outdoor area is not quite the setting you expect from a software conference!

Not your standard conference at all, the Circus provided a relaxed, fun and joyful atmosphere for learning new stuff, meeting new people and chatting with old friends from the cloud native community. The conference was embedded in a futuristic story arc that was moved forward by several great performances of actors, singers and dancers in between talks and sessions. Loud music, great food and Dutch beer rounded out the experience.

Maybe - just maybe, but don't tell our boss - the first day especially was a bit heavy on the show and a bit light on the content side. We did, however, get to talk tech, as there were several tracks throughout the day. If anything, we would wish for more talks and hands-on sessions at next year's event!

The second day was reserved for workshops and some deep-dive sessions. Heavy rain and wet feet couldn't stop us from being there, unlike many of the other attendees.

The following sections cover the most interesting topics and talks that we experienced this year.


Machine Learning/AI

One big topic at the Software Circus 2017 was Artificial Intelligence (AI), especially Machine Learning (ML).

In a practical session, Google gave an intro to TensorFlow. It's an open source library for AI and ML that is developed by Google's Brain Team. It performs operations on multidimensional arrays, so-called tensors. As Google uses it for its search ranking, TensorFlow is worth a closer look.

In a theoretical talk, Thiago de Faria spoke about ML, AI and DevOps. He introduced the history of AI and ML, which goes back to the late 1950s and 60s, when the first algorithms appeared. In the 90s, support vector machines marked another big step, and in 1997 IBM's Deep Blue beat the world champion at chess. Nowadays IBM's Watson is one of the most famous AI/ML projects.

As an ML practitioner, Thiago pointed out some very important questions concerning DevOps in AI and ML systems which still remain unanswered. A normal program is tested automatically in the context of Continuous Integration (CI). But how can you apply CI to AI and ML systems? Can you create automated tests for such a system? As even the smallest change to an AI has unpredictable effects and might break completely disjoint features, this is a very important point. Furthermore, a normal program is debuggable: you can set breakpoints and follow the program execution. But how can you debug AI and ML systems when there is no traditional program flow? He hopes that those questions might be answered as AI and ML become more and more explainable.

Finally, he expressed his concerns and fears regarding ML. On the one hand, existing biases might propagate into a system's learned behavior and influence its decisions; on the other hand, people might tend to delegate decisions to algorithms because they are too afraid to decide for themselves. Only time will reveal whether his concerns were unfounded.

Software Architecture 

With many buzzwords flying around, it is sometimes forgotten that certain topics never lose their relevance. Independent technology consultant Simon Brown delivered an inspiring (re-)imagination of the modern software architect. He debunked the notion that a capable architect only does up-front specification work (the seagull approach) and is an avid proponent of a hands-on approach to software architecture. An architect needs people skills as much as technological expertise and is an essential building block of well-performing dev teams.

Simon also advocates the use of modelling tools to support development. This does not necessarily mean UML; he introduced his own creation as an alternative: the C4 model for software architecture is a lightweight way to get started with better architectural documentation.

DevOps 

Improving the software delivery process is still a key concern today, and it was an important topic at the Software Circus as well. Often summarized under the 'DevOps' term, speakers and workshop organizers examined the matter from different perspectives, sharing success stories and cautionary tales. Maarten Dirkse of Dutch online bookstore bol.com gave a workshop on doing continuous delivery - including automated canary deployments - using GitLab CI and Spinnaker.

DevOps is not about a particular technology, it is about culture and collaboration. This is the primary takeaway from Kris Buytaert's emotional talk in which he explained how "Docker Is Killing DevOps Efforts". He used the anti-pattern of the "Enterprise Container", which contains an entire application stack from message queue to database. With it he showed that simply adopting a particular technology does not solve the delivery problems, and he reminded everyone of the core values of the DevOps movement: culture, automation, measurement and sharing (CAMS).

DevOps principles are not only important in application development, as "DataOps" practitioner Thiago de Faria explained. As their importance grows, machine learning and AI projects need to think about their own delivery pipelines to tackle lock-in, onboarding and delivery problems.


Dealing with Legacy Applications 

Even though this year's Software Circus used the theme "Cloudbusting", there was some good news for those of us dealing with monolithic legacy applications: David Pilato from Elastic demonstrated how legacy applications can adopt the Elastic Stack with incredibly small effort (see github.com/dadoonet/legacy-search). Twelve brave attendees watched this morning's first demo on adopting Elasticsearch, all of us defying rain, wind, cold and noise from trains and ice carving...

The new 6.0.0-beta version contains some really cool features: the now built-in client for Elastic's API can be integrated into your Java applications quite easily, comes with convenient query builders, and grants out-of-the-box access to the following features (see the sketch after the list below):

  • Bulk processing for high performance index operations
  • Custom analyzers for easy index token definitions
  • Easy data aggregations for your application specific needs
  • Fuzziness factor for typo tolerance
  • Easy Kibana integration
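
A quick, hedged sketch of how this can look (assuming Elasticsearch 6.0.0-beta running on localhost and an illustrative 'person' index; not code from the demo): a fuzzy search via the new high-level REST client boils down to a few lines.

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class FuzzySearchSketch {
    public static void main(String[] args) throws Exception {
        // Assumes a local Elasticsearch node on port 9200
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            SearchRequest request = new SearchRequest("person"); // illustrative index
            request.source(new SearchSourceBuilder().query(
                    QueryBuilders.matchQuery("name", "jhon")   // typo on purpose:
                            .fuzziness(Fuzziness.AUTO)));      // fuzziness tolerates it
            SearchResponse response = client.search(request);
            System.out.println(response.getHits().getTotalHits() + " hits");
        }
    }
}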

Good to know that Elasticsearch can be used for far more than just time series. Can't wait to try it out in our applications!

Conclusion

The Software Circus is worth a visit, especially if you tire a bit of the conventional conference setting. It is a community event in the truest sense, bringing people together and creating a comfortable backdrop for talks and sessions. If you come for the conversations and the spirit, you will be delighted. If you are only in it for the content, you may leave longing for more - even though the Circus had lots to offer in that regard as well!

Jun 7, 2017

"I know it when I see it" - Perceptions of Code Qualty


Everybody talks about code quality, so surely we have a good understanding of what good (or bad) code is, exactly. Right?
 
Fig. 1: Code quality according to XKCD (https://xkcd.com/1513/)

Well, no. There are certainly many books and scholarly articles on the topic, but they present a wide array of different, and often conflicting, views. It doesn't get better if you turn to industry: if you ask three professional programmers, you get four different opinions, and they're often fuzzy and apply only to the kind of software the programmer is experienced with.

All attempts to come up with a simple and crisp definition that everybody accepts have failed. At the end of the day, people resort to "I know it when I see it". Unfortunately, that doesn't quite cut it, neither from a scientific point of view nor from a practical one.

Why should I bother?

Some software engineers might be tempted at this point to simply say “Not my problem” and turn away. However, consider the following two scenarios where this lack of a good definition truly is your problem. First, imagine a teaching environment, be it a secondary school or a university, and keep in mind that today’s students are tomorrow’s engineers so they will be your colleagues in no time. In any such setting, students expect to be told what is good code, and what isn’t. After all, that definition will surely affect how their work is graded. Therefore, the definition should be simple, universal, and easy to apply. However, there is a tension between simplicity and universality: simple solutions often fail in difficult situations. That is why practitioners often reject textbook definitions of code quality as simplistic, or vague.

Fig. 2: WTF/h as a candidate code quality metric (http://techstroke.com/best-measure-of-code-quality/)

Now imagine a second scenario of a professional programmer acquainting herself with a piece of existing code. In order to understand the code, an IDE can provide valuable help by flagging suspect code to guide the programmer’s attention. Clearly, providing metrics (and threshold values) that the IDE should implement requires absolute precision in the definition of code quality. Without the necessary underpinnings, the tools will be of much less help, to fewer people.

However, the problem is not a shortage of definitions, concepts, and tools - quite the opposite, and naturally all of them claim to be just the right thing. What we need is guidance in selecting our approach, lest we waste our energy and enthusiasm on ineffective ways or outright hoaxes (and yes, that happens a lot). Unfortunately, there is precious little evidence to help along the way.

Now what?

In this situation, researchers and practitioners from Sweden, Germany, the Netherlands, the United States, and Finland teamed up to form a Working Group at ITiCSE (see WG 2), me among them representing QAware. The working group pursues three goals. First, it needs to validate the above observation and thus establish it as fact. Next, we want to clarify and systematize the existing aspects of code quality to inform the conversation about code quality. Finally, we want to elicit and contrast the views on code quality that teachers, students, and professionals hold, respectively, with a view to deriving recommendations for programming education with greater practical value.

Fig. 3: Aspects of code quality (http://blog.techcello.com/2013/06/how-can-techcello-help-in-increasing-the-overall-quality-of-your-application/)

Based on the literature (and common sense), we have some up-front idea of what we might find. For instance, we expect to find consistent opinions within groups of people in similar professional situations (i.e., teachers, students, and professional programmers), and different opinions across these groups, simply because they have very different levels of expertise and are likely concerned with different kinds of quality issues. We expect a progression of levels covering more and more global properties.
  • SYNTAX At the one end of the spectrum, there are syntax level issues, such as confusing the tokens “=” with “==”, and preference of language constructs (e.g., avoiding unsafe constructs, default-switch-cases and so on).
  • PRAGMATICS One step further up are small-scale pragmatic issues like identifier naming, indentation, and simple structural metrics like cyclomatic complexity.
  • UNITS The next level addresses complete units of code (often a class or module), and considers their overall structure, unit-level metrics (e.g., method/class length), correctness and completeness.
  • ARCHITECTURE Finally, there is a level of architecture that is concerned with the structure and interrelation of units, e.g., it considers depth of inheritance trees, design patterns, architectural compliance, and other system-level properties.
Clearly, one has to master the lower levels before one can work effectively on the higher levels. But to what degree are the various groups aware of the elements of this hierarchy? Which are the predominant concerns, and what tools and sources of information are used by the various populations? And which of the many issues at each level are really relevant, and how do they compare?

Starting Point

Two types of evidence exist that address such questions. On the one hand, there are quantitative studies (mostly controlled experiments and quasi-experiments) on very low-level aspects of code quality. Such studies are usually conducted on students and focus on simple metrics [1,2,4,7,8] or individual aspects such as readability [3,5]. They aspire to scientific reliability, though necessarily lose ecological validity in the process. On the other hand, there are surveys and experience reports based on practitioner experiences, such as [6,9], that generally lack this degree of focus (and, too often, also scientific rigor), but offer a higher degree of validity. Our study, in contrast, uses a qualitative study design and is the first to look at differences across groups.

Of course, many a practitioner might object that these questions in particular, or even scientific enquiry in general, while interesting, are of purely academic concern. People might often object that science is too slow, and lags behind coding practice and thus is unable to give good guidance for today’s developers. I beg to differ. While I am ready to accept criticisms of science being slow, sometimes wrong, and often not immediately applicable, it is still the only reliable (!) way forward. The IT industry is highly hype-driven, but lasting improvements are rare. 

Leaving aside this philosophical argument, I believe questions like the ones addressed in our study offer a set of very practical benefits.
  • Raising the awareness about code quality in academic (or school) teaching will trickle down into increased quality awareness and coding capabilities of graduates, and thus junior practitioners.
  • Reliable (i.e., scientific) insight into the relative contributions and effects of the various factors allows practitioners to focus their efforts on those properties that truly make a difference.
  • Finally, fostering understanding of the respective viewpoints should improve mutual understanding, and thus contribute to more collaboration, which I truly believe in—for the common good.
Stay tuned for the initial results of our study due in late June, and follow me on Twitter @stoerrle!

 

References

[1]    Breuker, Dennis M., Jan Derriks, Jacob Brunekreef. "Measuring static quality of student code." Proc. 16th Ann Joint Conf. Innovation and Technology in Computer Science Education. ACM, 2011.
[2]    Buse, Raymond PL, Westley R. Weimer. "Learning a metric for code readability." IEEE Transactions on Software Engineering 36.4 (2010): 546-558.
[3]    Börstler, Jürgen, Michael E. Caspersen, Marie Nordström. "Beauty and the Beast: on the readability of object-oriented example programs." Software Quality Journal 24.2 (2016): 231-246.
[4]    Börstler, Jürgen, et al. "An evaluation of object oriented example programs in introductory programming textbooks." ACM SIGCSE Bulletin 41.4 (2010): 126-143.
[5]    Börstler, Jürgen, Barbara Paech. "The Role of Method Chains and Comments in Software Readability and Comprehension—An Experiment." IEEE Transactions on Software Engineering 42.9 (2016): 886-898.
[6]    Christakis, Maria, Christian Bird. "What developers want and need from program analysis: An empirical study." Proc. 31st IEEE/ACM Intl. Conf. Automated Software Engineering. ACM, 2016.
[7]    Posnett, Daryl, Abram Hindle, Premkumar Devanbu. "A simpler model of software readability." Proc. 8th Working Conf. Mining Software Repositories. ACM, 2011.
[8]    Stegeman, Martijn, Erik Barendsen, Sjaak Smetsers. "Towards an empirically validated model for assessment of code quality." Proceedings of the 14th Koli Calling Intl. Conf. on Computing Education Research. ACM, 2014.
[9]    Stevenson, Jamie, Murray Wood. "How do practitioners recognise software design quality: a questionnaire survey." (2016).

May 18, 2017

ApacheCon / Apache BigData - Day 2

Here is my conference coverage for ApacheCon and Apache BigData NA 2017 day 2. See day 1 coverage here.

Apache Ignite
Like last year in Vancouver, Apache Ignite is again a big thing. It's really an amazing piece of technology. At the conference, the following Ignite topics were covered for the recently released version 2.0:

SQL Grid
Ignite supports ANSI SQL-99 compliant access to the data within a memory grid. It even supports tricky things like (distributed) joins, groupings, full-text search within the data model, and geo-spatial queries. The data is always consistent and transactions are ACID, even if Ignite acts as a read-through/write-through cache for a relational database. This is a very interesting use case, as it allows Ignite to act as a caching SQL proxy in front of a relational database. Ignite SQL can be accessed through its own JDBC and ODBC drivers as well as through the Ignite SQL API. The relational data model within Ignite can be described and modified with SQL DDL and DML as well as with code annotations and XML configuration. The relational data model can also be imported from relational databases. Indexes are stored in-memory (off-heap) as B+ trees.
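
As a hedged sketch of the JDBC access path (the config location, cache name and schema are illustrative, not taken from the talk):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IgniteSqlSketch {
    public static void main(String[] args) throws Exception {
        // Register Ignite's (client-node based) JDBC driver, as shipped with 2.0
        Class.forName("org.apache.ignite.IgniteJdbcDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:ignite:cfg://cache=Person@file:///etc/ignite-jdbc.xml");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT name, AVG(salary) FROM Person GROUP BY name")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getDouble(2));
            }
        }
    }
}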

Streaming
With data streamers you can import data into an Ignite cluster as a stream with automatic partitioning support. Prebuilt data streamers for Kafka, RocketMQ, sockets, JMS, MQTT and others are available. On the processing side, there are continuous SQL queries on sliding windows.
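
A minimal sketch of the streaming API (assuming a running cluster and a pre-configured cache named 'events'; all names are illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerSketch {
    public static void main(String[] args) {
        // Starts a node and streams entries into the "events" cache
        try (Ignite ignite = Ignition.start();
             IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer("events")) {
            streamer.allowOverwrite(true); // update existing keys instead of skipping them
            for (int i = 0; i < 100_000; i++) {
                streamer.addData(i, "event-" + i); // batched and routed to the owning node
            }
        } // close() flushes any remaining buffered entries
    }
}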

Web Console
There is a web console available for Apache Ignite for query execution, result visualization and monitoring. It also provides a wizard to import schemas from relational databases.

File System
Ignite provides an in-memory file system which implements the Hadoop FileSystem API, so it can be used as an HDFS or Alluxio replacement for {Hadoop, Spark, Flink}. In this scenario it can also act as a caching layer between {Hadoop, Spark, Flink} and a real (and persistent) HDFS.
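
A hedged configuration sketch (the IGFS name, host and port are illustrative defaults): plugging Ignite's file system into Hadoop is mostly a matter of core-site.xml.

<!-- core-site.xml: use IGFS as the default Hadoop file system -->
<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
    <name>fs.default.name</name>
    <value>igfs://igfs@localhost:10500/</value>
</property>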

Ignite 2.1
Ignite 2.1 will be released within the next few months. The big new thing will be its own high-performance persistent storage implementation, enabling durable scenarios without relying on external persistent storage solutions.

Btw.: Ignite claims to be way faster than Hazelcast, and an Ignite book has just been completed.

Presto
When it comes to interactive analysis of big data, Facebook's Presto seems to be the jack of all trades. It supports full ANSI SQL (including joins), has its own JDBC driver and Tableau web connector, and can connect to various data sources like files within HDFS in formats like Parquet and ORC as well as other persistent stores like Cassandra, Hive, PostgreSQL, and Redis. Presto can be enhanced with UDFs and provides enterprise-grade features like Kerberos and LDAP authentication and secured cluster-internal communication. Presto is maintained by a solid community and has a broad user base. There's also a nice web interface for Presto available from Airbnb. Besides Facebook, Teradata also contributes to Presto with about 20 developers and provides its own Presto distribution with enterprise support.
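
A hedged sketch of the JDBC route (coordinator address, catalog and table are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.facebook.presto.jdbc.PrestoDriver");
        // Assumes a coordinator at presto.example.com:8080 and a Hive catalog
        try (Connection conn = DriverManager.getConnection(
                "jdbc:presto://presto.example.com:8080/hive/default", "analyst", null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT returnflag, SUM(quantity) FROM lineitem GROUP BY returnflag")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getLong(2));
            }
        }
    }
}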

IoT
Apache is very busy providing an open source IoT stack on top of Mynewt, a real-time operating system (RTOS) for low-level devices (Cortex-M0 to M4, MIPS, RISC-V) with included device management features like build and package management, remote firmware upgrades, a secure bootloader and signed images.


The incubating Apache Edgent provides analytics capabilities at the edge, from the cloud to the IoT fog.

May 17, 2017

ApacheCon / Apache BigData - Day 1

The Apache Foundation event management team is really excellent at choosing venues for their conferences. After Vancouver, BC last year, this year's ApacheCon and Apache BigData take place in beautiful Miami, FL. Following is my conference coverage of day 1. See day 2 coverage here.

Notebooks
Notebooks for data analysis are very much en vogue. Apache Zeppelin and Jupyter are the superheroes in that area. PixieDust is a nice extension to Jupyter providing easy-to-use data visualization primitives. Helium is a new plugin system and package repository for Zeppelin providing various ready-to-use Zeppelin extensions (visualizations, interpreters, spells).

Cloud
Basically no surprise, but a bit surprisingly intensive, is the promotion of Apache CloudStack as an open source IaaS platform and competitor to OpenStack. I thought this war was over and OpenStack the clear winner - but Apache doesn't want to capitulate.

Flink and Spark ... and Beam
Flink seems to be at eye level with Spark: each time Spark is mentioned, Flink is mentioned as well. Apache Beam, which provides an abstraction layer atop both, is also covered very well at the conference. But concerning Apache Beam, I'm very suspicious of abstraction frameworks for abstraction frameworks. Beam is also an abstraction for Google Cloud Dataflow, so it may well exist to give Google a "no vendor lock-in" argument. Btw.: Google is one of the companies contributing most to Beam.

Messaging
There are two new players around in the field of messaging systems. In the range between Kafka and classical messaging systems like ActiveMQ and RabbitMQ, RocketMQ sits just in the middle. RocketMQ is an open source contribution of Alibaba - one of the largest web-scale companies on earth. You can find a nice comparison chart of RocketMQ with Kafka and ActiveMQ here. RocketMQ provides more guarantees than Kafka, like strict ordering, but at a price: it's based on a master/slave architecture, so it's not as scalable as Kafka. Compared with ActiveMQ and RabbitMQ, however, it has significantly higher throughput by leveraging Kafka's pull/distributed log principle. As RocketMQ also provides a JMS interface, it could be at a real sweet spot between Kafka and ActiveMQ/RabbitMQ. Apache DistributedLog is not a full-fledged messaging solution but a building block for one. It provides a distributed log implementation - Kafka, for example, is also based on a distributed log. Allegro open-sourced Hermes, a message broker on top of Kafka extending it with REST publisher/consumer interfaces, message tracing and monitoring, and guaranteed message delivery at a sub-millisecond cost.
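
As a small, hedged sketch of RocketMQ's Java producer API (name server address, producer group and topic are illustrative):

import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;

public class RocketMqProducerSketch {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("localhost:9876"); // assumes a local name server
        producer.start();
        Message msg = new Message("DemoTopic", "TagA", "Hello RocketMQ".getBytes("UTF-8"));
        SendResult result = producer.send(msg); // synchronous send
        System.out.println(result.getSendStatus());
        producer.shutdown();
    }
}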

Hardware Diversification
Spark and others are prepared to support diverse hardware like GPUs, TPUs and non-volatile/durable RAM ... also with a talk on the QAware research project "how to leverage the GPU on Spark". There is also a native lib from Intel (Math Kernel Library) which claims to speed up ML use cases on Spark by 9x at no additional cost.

Dataservices
Dataservices are a new way to process data and an alternative to Spark and Flink if you want to implement and run data processing applications atop a microservice platform. I did a talk on how to implement dataservices with Spring Cloud Data Flow.
Others proposed to use a serverless framework like OpenWhisk to implement dataservices.

Jan 23, 2017

Setting up a distributed Ehcache with Mule ESB Community Edition

Setting up a distributed Ehcache on Mule ESB Community Edition is in fact quite simple and can be achieved in a few steps. After creating an Ehcache configuration, we set up a cache manager managed by Spring. We then use the previously defined caches in our Mule configuration together with a cache key extractor in a custom caching interceptor.

Related

If you have Mule Enterprise Edition, check out the Caching Scope which allows caching of predefined blocks inside flows. Ehcache also provides distributed caching with Terracotta and BigMemory Max.

Prerequisites

In our example we are using Mule 3.8.0 together with Ehcache 2.6.3 and Spring 4.1.6 inside a GlassFish 4 server. Mule is configured using XML configuration files.

Setting up the distributed Ehcache

Ehcache is configured using ehcache.xml configuration files which consist of at least a list of cache configurations. For the distributed cache, we also need a peer provider, a peer listener and an event listener for each cache.

  • peer provider: locates other peers and manages a list of peers belonging to the distributed cache
  • peer listener: listens for incoming cache changes
  • cache event listener: listens for local cache changes and distributes changes to other peers

First, we set up the peer provider which locates other peers in the network and manages a list of peers which belong to the distributed cache:

<cacheManagerPeerProviderFactory class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory" properties="peerDiscovery=automatic, multicastGroupAddress=224.0.0.1, multicastGroupPort=22401"/>

Peer discovery can either be done in automatic mode using multicast (as listed above) or in manual mode by explicitly specifying the remote peer addresses. The latter approach is usually safer for company networks or data centers, but requires many lines of configuration when using more than just a few caches and server instances. In automatic mode, the peer provider sends multicast messages to all server instances in the multicast group and tells them about its caches and the port on which the peer listener (see below) listens for incoming cache changes.
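
In manual mode, the remote peers are listed explicitly instead. A hedged sketch with illustrative host names and one of the caches from the Mule configuration below (note that each node only lists the remote peers, not itself):

<cacheManagerPeerProviderFactory class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory" properties="peerDiscovery=manual, rmiUrls=//server1:40001/fooServiceCache|//server2:40001/fooServiceCache"/>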

Together with the peer provider, we need a peer listener which listens for incoming cache changes:

<cacheManagerPeerListenerFactory class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory" properties="socketTimeoutMillis=2000"/>

If we do not specify a port as in the example above, Ehcache automatically chooses a high numbered port which is still unused. For company networks or data centers, you might want to specify the port explicitly. When using automatic peer discovery, the information about which server instance uses which port is distributed by the peer provider over multicast messages. In case of a manual peer discovery, the addresses and ports have already been stated explicitly in the peer provider configuration.

Finally, each cache needs a cache event listener (defined inside its cache tag) which distributes cache changes, such as new cache entries, to the remote peers:

<cacheEventListenerFactory class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/>

Also see the Ehcache Replication Guide which is still available on the Ehcache.org website.

Configuring the cache manager

Next, we configure an Ehcache manager in our Spring configuration file. We will use the cache manager later to retrieve the caches and use them in our caching interceptor.

<bean id="ehcacheManager" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
    <property name="configLocation" value="classpath:/ehcache.xml"/>
</bean>
<!-- optional: -->
<bean id="ehcacheCacheManager" class="org.springframework.cache.ehcache.EhCacheCacheManager">
    <property name="cacheManager" ref="ehcacheManager"/>
</bean>
<cache:advice id="ehcacheAdvice" cache-manager="ehcacheCacheManager"/>

The cache manager uses the previously defined ehcache.xml files. This is usually a good place to define separate configuration files for your environments, e.g. using ${ } property expansion. For example, you might want to disable the distributed cache when testing locally, or define different addresses or ports for your production environment. Note that the configLocation is a Spring resource, so if you want to point it to a file not on the classpath, use the file: prefix, e.g. file:/path/to/ehcache.xml.
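
As a hedged sketch (the cache.env property is illustrative and requires a registered property placeholder configurer, e.g. <context:property-placeholder/>), environment-specific files can be selected like this:

<bean id="ehcacheManager" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
    <!-- resolves to e.g. ehcache-local.xml or ehcache-prod.xml -->
    <property name="configLocation" value="classpath:/ehcache-${cache.env}.xml"/>
</bean>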

If we also want to use our caches elsewhere with Spring, it is a good idea to define a Spring cache manager (in the example above called ehcacheCacheManager) and an appropriate cache advice.

An interface for cache key extractors

In order to provide a cache key to our caching interceptor for every service, we implement a cache key extractor defined by a simple extractor interface.

public interface CacheKeyExtractor {
    Object extractKeyFrom(MuleEvent event);
}

For each service, we implement a concrete cache key extractor. An extractor could e.g. analyse the payload of the request, parse it and extract the relevant information that makes a viable cache key. Since we pass the MuleEvent, we also have access to inbound, outbound and session properties set by Mule, and we could retrieve other information from our Spring context.

In order to be able to access our cache key extractor implementations in the Mule configuration files, we define them as Spring components (e.g. @Component("fooCacheKeyExtractor")) and give them a unique name for simple usage.
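
A minimal, purely illustrative extractor (the inbound property used as the key is hypothetical; a real implementation would typically parse the payload):

import org.mule.api.MuleEvent;
import org.springframework.stereotype.Component;

@Component("fooServiceCacheKeyExtractor")
public class FooServiceCacheKeyExtractor implements CacheKeyExtractor {

    @Override
    public Object extractKeyFrom(MuleEvent event) {
        // 'foo.request.id' is a hypothetical inbound property; returning null
        // makes the caching interceptor below bypass the cache entirely
        return event.getMessage().getInboundProperty("foo.request.id");
    }
}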

Implementing the caching interceptor

The last component needed for a working cache is the caching interceptor. It is implemented as a custom Mule interceptor. On a cache hit, further execution of the flow is stopped and the cached payload is returned. On a cache miss, the flow continues and the result of the execution is put into the cache. Logging messages and documentation are stripped from the following example code.

@Component
public class CachingInterceptor implements Interceptor {
    private static final String HTTP_STATUS_OK = "200";
    private Ehcache cache = null;
    private MessageProcessor next = null;
    private CacheKeyExtractor extractor = null;

    @Override
    public MuleEvent process(MuleEvent event) throws MuleException {
        Object cacheKey = extractor.extractKeyFrom(event);
        if (cacheKey == null) {
            return next.process(event);
        }
        Element cachedElement = cache.get(cacheKey);
        if (cachedElement == null) {
            // cache miss
            return updateCache(cacheKey, event);
        } else {
            // cache hit
            return lookupCache(cachedElement, event);
        }
    }

    private MuleEvent updateCache(Object cacheKey, MuleEvent event) throws MuleException {
        // invoke the intercepted processor
        MuleEvent result = next.process(event);
        String status = result.getMessage().getInboundProperty("http.status");
        if (!HTTP_STATUS_OK.equals(status)) {
            return result;
        }
        // cache the payload of the intercepted processor
        try {
            byte[] payload = result.getMessage().getPayloadAsBytes();
            if (payload != null) {
                cache.put(new Element(cacheKey, payload));
            }
        } catch (IOException e) {
            // stripped: log the payload extraction error; a failed cache update
            // must not break the flow
        } catch (Exception e) {
            // stripped: log unexpected errors and continue without caching
        }
        return result;
    }

    private MuleEvent lookupCache(Element cachedElement, MuleEvent event) throws MuleException {
        // extract the cached payload
        try {
            Object payload = cachedElement.getObjectValue();
            MuleMessage cachedMessage = new DefaultMuleMessage(payload, event.getMessage(), event.getMuleContext());
            return new DefaultMuleEvent(cachedMessage, event);
        } catch (Exception e) {
            // stripped: log the lookup error; expire the broken cache entry
            // and fall back to invoking the real service
            cachedElement.setTimeToLive(0);
            return next.process(event);
        }
        }
    }

    @Override
    public void setListener(MessageProcessor messageProcessor) {
        next = messageProcessor;
    }
    
    public void setCache(Ehcache cache) { this.cache = cache; }
    public void setCacheKeyExtractor(CacheKeyExtractor extractor) {
        this.extractor = extractor;
    }
}

The caching interceptor can now be used in our Mule flows.

Configuring Mule

The Mule configuration is now simple. We first need access to our caches so we can insert them into the caching interceptor. The Spring EhCacheFactoryBean already provides the extraction of caches from our previously defined cache manager.

<beans:beans>
    <beans:bean id="fooServiceCache" class="org.springframework.cache.ehcache.EhCacheFactoryBean">
        <beans:property name="cacheName" value="fooServiceCache"/>
        <beans:property name="cacheManager" ref="ehcacheManager"/>
    </beans:bean>

    <beans:bean id="barServiceCache" class="org.springframework.cache.ehcache.EhCacheFactoryBean">
        <beans:property name="cacheName" value="barServiceCache"/>
        <beans:property name="cacheManager" ref="ehcacheManager"/>
    </beans:bean>
</beans:beans>

In our flows, we can now insert the custom caching interceptor. The interceptor is configured with the cache to be used (the name must match the one in the ehcache.xml file) and a cache key extractor that knows how to extract a cache key for this specific service. Since we defined the extractor as a named Spring bean, we can now easily inject it here. On a side note, a more sophisticated implementation of the caching interceptor could also e.g. find the extractor by some name magic using Spring. The message processor listener, which is also needed by the caching interceptor, is automatically set by Mule.

<flow name="fooService">
    <inbound-endpoint ref="foo-service-inbound-endpoint"/>
    <!-- ... -->
    <custom-interceptor class="de.qaware.caching.CachingInterceptor">
        <beans:property name="cache" ref="fooServiceCache"/>
        <beans:property name="cacheKeyExtractor" ref="fooServiceCacheKeyExtractor"/>
    </custom-interceptor>
    <!-- ... -->
    <outbound-endpoint ref="foo-service-outbound-endpoint"/>
</flow>

And that's it. Calls to our foo service are now cached, and the cache entries are distributed to our other nodes. Subsequent calls to our foo service should now be answered faster.

Troubleshooting

If you have problems with the Ehcache configuration, first make sure that the correct ehcache.xml file is loaded. Spring and Ehcache will switch to a default failsafe configuration in case of errors, which can lead you down the wrong trail. Also have a look at the Ehcache log messages at debug level: Ehcache prints a lot of peer discovery messages in automatic mode and should give you a hint about problems with your configuration. In case of problems with Mule, also have a look at the log messages in debug mode; they are quite verbose.

Nov 30, 2016

Continuously delivering a Go microservice with Wercker on DC/OS

Currently, I am really into the field of building cloud native applications and the associated technology stacks. Normally I would use Java as the primary language to implement such an application. But since everyone seems to be using Go at the moment, I figured it's about time to learn a new language and see how it fits into the whole cloud native universe.

So let's implement a small microservice written in Go, build a Docker image and push it to Docker Hub. We will be using the Docker-based CI platform Wercker to continuously build and push the image whenever we change something in the code. The complete example source code for this article can be found on GitHub here.

Before you start

Make sure you have all the required SDKs and tools installed. Here is the list of things I used for the development of this showcase:
  • Visual Studio Code with Go language plugin installed
  • The Go SDK using Brew
  • The Docker Toolbox or native Docker, whatever you prefer
  • The Make tool (optional)
  • The Wercker CLI, for easy local development (optional)

Go microservice in 10 minutes

If you are new to the Go language, make sure you read the Go Bootcamp online book.

To build the microservice, we will only be using the 'net/http' and 'encoding/json' standard libraries that come with Go. We define the response structure of our endpoint using a plain Go struct. The main function registers the handler function for the '/api/hello' endpoint and then listens on port 8080 for incoming HTTP requests. The handler function takes two parameters: a response writer and a pointer to the original HTTP request. All we do here is create and initialize the response structure, marshal it to JSON and finally write the data to the response stream. By default, the Go runtime will use 'text/plain' as the content type, so we also set the 'Content-Type' HTTP header to the expected value for the JSON formatted response.

package main

import (
    "encoding/json"
    "net/http"
)

// Hello response structure
type Hello struct {
    Message string
}

func main() {
    http.HandleFunc("/api/hello", hello)
    http.ListenAndServe(":8080", nil)
}

func hello(w http.ResponseWriter, r *http.Request) {

    m := Hello{"Welcome to Cloud Native Go."}
    b, err := json.Marshal(m)

    if err != nil {
        panic(err)
    }

    w.Header().Add("Content-Type", "application/json;charset=utf-8")
    w.Write(b)
}

Now it is time to trigger the first Go build for our microservice. Open a terminal, change into your project directory and issue the following command:

go build -o cloud-native-go

You should now have an executable called 'cloud-native-go' in your project directory which you can use to run the microservice. You should also be able to call the '/api/hello' HTTP endpoint on localhost, e.g. curl http://localhost:8080/api/hello. Done.

Go CI/CD pipeline using Wercker

Wercker is a Docker native CI/CD automation platform for Kubernetes, Marathon and general microservice deployments. It is pretty easy to use, allows local development and is free for community use. For the next step, make sure you have the Wercker CLI tools installed. The instructions can be found here.

Create a file called 'wercker.yml' in the root directory of your project and add the following code snippet to it to define the local development build pipeline. We specify the Docker base box to use for the build as well as the commands to build and run the app.

dev:
  # The container definition we want to use for developing our app
  box: 
    id: golang:1.7.3-alpine
    cmd: /bin/sh
  steps:
    - internal/watch:
        code: |
          CGO_ENABLED=0 go build -o cloud-native-go
          ./cloud-native-go
        reload: true

In order to continuously build and run our Go microservice locally, and also watch for changes to the sources, you only have to issue the following Wercker CLI command:

wercker dev --publish 8080

This will download the base box, and then build and run the app inside the container. In case of changes, Wercker will rebuild and restart the application automatically. You should now be able to call the '/api/hello' endpoint via the IP address of your local Docker host and see the result message, e.g. curl http://192.168.99.100:8080/api/hello.

Once the application and the development build are working, it is time to define the pipelines to build the application and push the image to Docker Hub. The first pipeline has three basic steps: first call Go lint, then build the application and finally copy the build artifacts to the Wercker output folder for the next pipeline to use as inputs. The following code excerpt should be pretty self-explanatory.

build:
  # The container definition we want to use for building our app
  box: 
    id: golang:1.7.3-alpine
    cmd: /bin/sh
  steps:
    - wercker/golint
    - script:
        name: go build
        code: |
          CGO_ENABLED=0 go build -o cloud-native-go
    - script:
        name: copy binary
        code: cp cloud-native-go "$WERCKER_OUTPUT_DIR"

The final pipeline will use the outputs of the previous pipeline, build a new image using a different base box and then push the final image to Docker Hub. Again, there is not much YAML required to do this. But wait, where is the Dockerfile required to do this? If you pay close attention you will notice that some of the attributes of the 'internal/docker-push' step resemble the different Dockerfile keywords.

deploy:
  # The container definition we want to use to run our app
  box: 
    id: alpine:3.4
    cmd: /bin/sh
  steps:
    - internal/docker-push:
        author: "M.-L. Reimer <mario-leander.reimer@qaware.de>"
        username: $USERNAME
        password: $PASSWORD
        repository: lreimer/cloud-native-go
        tag: 1.0.0 $WERCKER_GIT_COMMIT latest
        registry: https://registry.hub.docker.com
        entrypoint: /pipeline/source/cloud-native-go
        ports: "8080"

Once you have saved and pushed the 'wercker.yml' file to GitHub, create a new Wercker application and point it to the GitHub repo. Next, define the build pipeline using the Wercker web UI. Also make sure that you define the $USERNAME and $PASSWORD variables as secure ENV variables for this application and set them to your Docker Hub account. After the next 'git push' you will see the pipeline running, and after a short while the final Docker images should be available on Docker Hub. Sweet!

Wercker is also capable of deploying the final Docker image to a cluster orchestrator such as Kubernetes, Marathon or Amazon ECS. So as a final step, we will enhance our pipeline with the automatic deployment to a DC/OS cluster running Marathon.

    - script:
        name: generate json
        code: chmod +x marathon.sh && ./marathon.sh
    - script:
        name: install curl
        code: apk upgrade && apk update && apk add curl
    - wercker/marathon-deploy:
        marathon-url: $MARATHON_URL
        app-name: $APP_NAME
        app-json-file: $APP_NAME.json
        instances: "3"
        auth-token: $MARATHON_AUTH_TOKEN

First, we execute a shell script that generates the Marathon service JSON definition from a template enhanced with some Wercker ENV variables. Then we install 'curl', as this tool is required by the next step and not included in the Alpine base image. Finally, we use the built-in Wercker step to deploy 3 instances of our microservice to a DC/OS cluster. We use several ENV variables here, which need to be set at the deployment pipeline level. Important here are $MARATHON_URL and $MARATHON_AUTH_TOKEN, which are required to connect and authenticate to the Marathon REST API.
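
For illustration, the JSON generated from such a template might look roughly like this (a hedged sketch; id, resources and health check are placeholders, not the actual marathon.sh output):

{
  "id": "/cloud-native-go",
  "instances": 3,
  "cpus": 0.25,
  "mem": 128,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "lreimer/cloud-native-go:1.0.0",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 8080, "hostPort": 0 }]
    }
  },
  "healthChecks": [{ "protocol": "HTTP", "path": "/api/hello" }]
}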

Summary and Outlook

Implementing simple microservices in Go is pretty straightforward. However, things like service discovery, configuration, circuit breakers or metrics aren't covered by the current showcase application yet. For real cloud native Go applications we will have a closer look at libraries such as Go-Kit or Go-Micro in the next instalment.

Stay tuned. To be continued ...


GOTO Berlin 2016 – Recap

I recently returned from Berlin where I attended the GOTO Berlin 2016 conference. Here are some of the insights I brought with me.

Diverse keynotes
There were some amazing keynotes on important topics like prejudices, (neuro)diversity and algorithms gone wrong (producing biased, unfortunate and hurtful results). I liked these talks a lot. Make sure you check out the talks by Linda Rising, Sallyann Freudenberg and Carina C. Zona.

The Cloud is everywhere
This is no surprise. There were many talks about cloud native applications and microservices. Mary Poppendieck gave a good keynote on why these applications are so important now and in the future. On the more technical side, IBM presented OpenWhisk as an alternative to Amazon's Lambda for building serverless architectures. It supports JavaScript, Swift, Python and Java right out of the box. Additionally, arbitrary executables can be added using Docker containers. What's especially notable about OpenWhisk is that it is completely open source (see https://github.com/openwhisk/openwhisk), so you could switch your provider or even host it yourself. Of course, IBM offers hosting on its very own cloud platform Bluemix.

UI in times of microservices
There were a lot of talks covering the idea of using microservices and splitting up your application into different parts with potentially different, independent development teams. Most of the time this is all about the backend. On the frontend side you still end up with a monolithic, maybe single-page, web application that uses these microservices.
Zalando introduced its open source framework 'Mosaic', a framework for frontend microservices, that should tackle these problems. They do this by replacing placeholders in a template with HTML fragments. This happens during the initial page request on the server side (asynchronous replacement via AJAX is supported). The HTML fragments can be provided by the same team that developed the backing microservice.
Mosaic currently offers two server-side components: one written in Go and one in Node.js.
Side note: to make the different application fragments look the same, the teams still have to share some library code (in their case React components).

New ways to visualize data with VR/AR/MR
There was a talk and some demos about the new Microsoft HoloLens. Philipp Bauknecht placed the HoloLens in the space of 'mixed reality' (as the only existing device in that space; Pokémon Go was the example for augmented reality). His talk covered some basics about the hardware, possible usage scenarios, existing apps and how to develop new applications.
The interesting part was the completely new possibilities for displaying data, which could result in amazing new kinds of applications. This is (together with VR) one of the first really new output devices in quite some time! Very exciting.

This and that

  • Ola Gasidlo mentioned PouchDB, an open-source JavaScript database inspired by Apache CouchDB. Interestingly, it enables applications to store data locally while offline, and then synchronize the data with CouchDB or compatible servers when the application is back online.
  • Ola introduced the phrase 'Lie-Fi' to me: having a data connection, but no packets are coming through ;-)
  • Martin Kleppmann gave an interesting talk about his algorithm for merging concurrent data changes, using the example of a text editor like Google Docs. The project he is currently working on is about using cloud technology with encrypted data (so you don't have to trust the cloud provider that much). The project is called Trve Data.