This week at Codemotion I gave a presentation on web scale patterns and how we apply them in the bol.com back office services. Codemotion is the biggest tech conference in Italy and one of the most important in Europe.
Next week I’ll be presenting in Amsterdam at Codemotion on web scale patterns in bol.com back office services, a conference with a network of more than 30k developers.
The title of my presentation is “Applying ‘web scale’ patterns in the bol.com back office”. In the session, we show how we use “web scale” patterns to achieve scalability and flexibility in our back office software, guiding you through how we apply patterns like CQRS, event sourcing, polyglot persistence and micro services to solve puzzles in our back office services. Interestingly, in our experience we don’t need these just to solve technical problems: they help us solve some of our business problems!
In the previous weeks, we started a series of blog posts that shows how we use “web scale” patterns to achieve scalability and flexibility in our back office software. The patterns discussed so far are Event Sourcing and CQRS. This week we dive into mixed SQL – NoSQL, showing how it doesn’t just solve a technical problem, but helps us solve business problems as well!
Mixed SQL – NoSQL
Where needed in our services we are moving away from pure SQL and create a mix with other types of storage, so we are using NoSQL (Not only SQL). One could also call this polyglot persistence: the notion that your application can write to or query multiple databases, or one database with multiple models. It borrows the idea behind polyglot programming, which expresses that applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems.
RTN – Billing platform for our retailers
RTN stores all kinds of transactions to charge and pay partners in our LvB (Fulfilment by bol.com) operations. In this part of our operation, we store and fulfil products in our warehouses for retailers that sell their goods on our platform.
The transactions to create invoices for our LvB partners stem from a number of services. We want to account for all kinds of attributes to know why decisions have been made and for auditing purposes. These attributes depend on the transaction type, so it was decided that they wouldn’t be part of the transactions table, since they are only filled for a subset of the records.
To accommodate the attributes that depend on the transaction type, we created an additional column in the table that stores key-value pairs in JSON format. A pure SQL solution would have resulted in a weak design, as would a pure NoSQL one. In cases like these, they work great together.
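A minimal sketch of this mix, using SQLite as a stand-in; the table layout, column names and transaction types are hypothetical, not the actual RTN schema. The relational columns stay queryable with plain SQL, while the type-specific attributes live in one JSON column that is parsed in application code:

```python
import json
import sqlite3

# Hypothetical, simplified transactions table: shared relational columns
# plus one JSON text column for type-specific attributes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        id         INTEGER PRIMARY KEY,
        partner_id TEXT NOT NULL,
        type       TEXT NOT NULL,
        amount     REAL NOT NULL,
        attributes TEXT  -- JSON key-value pairs, only filled when relevant
    )
""")

# A storage fee carries warehouse details; a sale correction does not.
conn.execute(
    "INSERT INTO transactions VALUES (1, 'P-001', 'STORAGE_FEE', 2.50, ?)",
    (json.dumps({"warehouse": "WH-3", "volume_m3": 0.4}),),
)
conn.execute(
    "INSERT INTO transactions VALUES (2, 'P-001', 'SALE_CORRECTION', -9.99, NULL)"
)

# The relational columns are queried with plain SQL...
total = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE partner_id = 'P-001'"
).fetchone()[0]

# ...while the JSON attributes are read back and parsed in the application.
row = conn.execute("SELECT attributes FROM transactions WHERE id = 1").fetchone()
attributes = json.loads(row[0])
```

Note that the JSON column is only read back, never filtered on in SQL, which matches how these mixed stores are used.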
FNK – Warehouse orders
FNK processes our customer orders to create warehouse orders. It determines the warehouse that will fulfil the customer demand and instructs the warehouse to fulfil the order. Besides regular warehouses, it also communicates with our warehouse for digital products (e-books and software downloads) and with retailers that sell their products on our platform and take care of the fulfilment themselves.
These retailers have requirements that differ from the other warehouses. To accommodate them without adding retailer-specific structures to this service, we introduced an additional column that stores XML. This mixture of SQL (one table for all warehouse orders) and NoSQL (stored XML) results in a simple model that can handle requirements that only apply to a part of the orders. Since the data in the XML is hardly needed in this service itself, but mostly in downstream services, there are no performance drawbacks.
What we learned
The NoSQL parts in these mixed data stores are mostly read from. If you need to filter specifically on these fields, or have a requirement to use them in joins, performance will degrade.
Next in web scale patterns in the bol.com back office
Next week’s episode will cover the following subject:
- Micro services
We need web scale in the back office because more and more back office functionality is needed on the web site to offer better service to our customers. For example, more parts of our web shop query our stock levels and warehouse configuration to determine how fast products can be delivered to our customers, and with what options. Consequently, the services that know our stock levels and warehouse configuration also have to be scaled to handle these volumes. To enable this we don’t just need more hardware; we also need to apply patterns to our services to create a proper structure.
CQRS is short for Command Query Responsibility Segregation. At the core of CQRS is the notion that a different model can be used to alter data than the model used to query data. Updating and reading information place different requirements on a model, and there are enough cases where it pays to split the two. The downside of this separation is that it introduces complexity, so the pattern should be applied with caution.
The most common approach for people to interact with data in a service or system is CRUD. Create, Read, Update and Delete are the four basic operations on persistent storage. The term was likely popularised by James Martin in his 1983 book Managing the Database Environment. Although other variations like BREAD and MADS exist, CRUD is widely used in systems development.
When a need arises for multiple representations of information, and users interact with those representations, we need something that extends CRUD, because the model to access the data tends to get split over several layers and becomes overly complicated.
What CQRS adds
CQRS introduces a split into separate models for update and display: Command and Query respectively. The rationale is that for many problems, particularly in more complex domains, having the same model for commands and queries leads to a more complex model that does neither well.
Where do we use it at bol.com?
One of the examples of where we use CQRS in the back office services at bol.com is our Inventory Management. Inventory Management handles all updates on stock levels and serves them to several services in our landscape, including our web shop.
The updates of stock levels come from our warehouse management and include reservations based on customer orders, shipments and received goods. The queries on stock levels originate in the web shop, checkout and fulfilment network. As you can imagine, these queries have quite a different profile compared to the updates. Besides that, the number of queries far exceeds the number of updates.
Given these different requirements, we decided to split command (updates) and query for inventory management. All updates are handled by a technically isolated part of the service, while stock levels are served to other services by another isolated part.
The part that handles the updates has several models. The incoming changes, like shipments and received goods, have to be processed into, for example, stock mutations, stock levels and stock valuation. These models receive updates and process them into a new stock level and stock valuation. Once a new stock level is calculated, it is published on a messaging queue to the query part. This message is also consumed by other services that need it.
The query part is a simple single table. The messages from the update part are stored in this table; there is no additional logic or processing. Queries from other services are handled via a REST interface. Thanks to this design, these calls have a very high cache hit ratio, which of course benefits performance.
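The split described above can be sketched in a few lines; class names, the message shape and the in-process “queue” are illustrative stand-ins, not the actual bol.com implementation:

```python
from collections import defaultdict

class InventoryCommandSide:
    """Handles updates: mutations are processed into new stock levels,
    and each derived level is published to the query side."""
    def __init__(self, publish):
        self._levels = defaultdict(int)   # internal write model
        self._publish = publish           # stand-in for a messaging queue

    def apply_mutation(self, sku, delta):
        self._levels[sku] += delta
        # Publish the derived stock level, not the raw mutation.
        self._publish({"sku": sku, "level": self._levels[sku]})

class InventoryQuerySide:
    """A simple single table of current levels; no additional logic."""
    def __init__(self):
        self._table = {}

    def consume(self, message):
        self._table[message["sku"]] = message["level"]

    def stock_level(self, sku):           # what a REST call would serve
        return self._table.get(sku, 0)

query_side = InventoryQuerySide()
command_side = InventoryCommandSide(publish=query_side.consume)

command_side.apply_mutation("sku-42", +100)   # goods received
command_side.apply_mutation("sku-42", -3)     # customer reservation
```

The query side only stores and serves what the command side publishes, which is exactly what makes it so cacheable.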
Next in web scale patterns in the bol.com back office
In the coming weeks, episodes on the following subjects will be published:
Everyone knows what debt is. If you are in or around the software development community, you probably also know the term technical debt. For others:
Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution.
Or as Ward Cunningham describes it:
“Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite … The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organisations can be brought to a stand-still under the debt load of an unconsolidated implementation.”
But there is a third kind of debt: Organizational Debt. Here we pay interest on bad decisions, or on decisions that we put off. This, of course, has a strong impact on organisations.
Some definitions of Organizational Debt
In Organizational debt is like technical debt but worse, Steve Blank gives this definition:
Organizational debt is all the people/culture compromises made to “just get it done” in the early stages of a startup.
However, I think that organisational debt isn’t just a startup thing. It is worse in other organisations, since there is a larger accumulation of bad decisions and decisions not taken. Besides that, larger and/or older organisations tend to have more rules to work around.
That is why I like the shorter description offered by Scott Belsky in Avoiding Organizational Debt:
Organisational debt is the accumulation of changes that leaders should have made but didn’t.
Another interesting description is given by Aaron Dignan in How to eliminate organizational debt:
The interest companies pay when their structure and policies stay fixed and/or accumulate as the world changes.
This one really takes the VUCA point of view: the ever-changing world in which organizations have to adapt or go extinct.
Over at ScaleScale, a blog about all the good stuff when it comes to scaling, an interesting post was published on the stack behind Netflix scaling. Since Netflix is quite public about how they operate, the post was put together with material from around the internet.
Like Spotify, Netflix is somewhat famous for creating and scaling their culture. This gives some important context for understanding how they scale their software stack and why it works. If you are interested in scalable platforms and full stack development, check it out.
The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win is written by Gene Kim in the tradition of The Goal (1984, by Dr. Eliyahu M. Goldratt). The Goal is a management novel explaining the Theory of Constraints. The Phoenix Project shows how the theory in The Goal works in an IT environment.
The Goal – Theory of Constraints
In simple terms the Theory of Constraints is about:
A chain is as strong as its weakest link.
In this theory the first step is to identify the constraint. Step 2 is to exploit the constraint: in other words, make sure that the constraint is not allowed to waste any time. Only by increasing flow through the constraint can overall throughput be increased, to the extent that improving anything anywhere other than at the constraint is an illusion.
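A toy calculation makes the point concrete; the station names and rates are made up for illustration. Throughput of a serial chain is capped by its slowest step, so improving any other step changes nothing:

```python
def throughput(rates_per_hour):
    """Overall flow through a serial chain is limited by its slowest step."""
    return min(rates_per_hour)

# Hypothetical production line; assembly is the constraint.
line = {"cutting": 40, "assembly": 25, "packing": 60}

before = throughput(line.values())

line["packing"] = 120                  # improve a non-constraint: an illusion
after_non_constraint = throughput(line.values())

line["assembly"] = 35                  # exploit the constraint itself
after_constraint = throughput(line.values())
```

Only the change at the constraint moves overall throughput; the investment in packing capacity is wasted.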
Because of the need for flow, work in process (WIP) is the silent killer. Therefore, one of the most critical mechanisms in the management of any plant is job and materials release. Without it, you can’t control WIP.
The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win
The Phoenix Project describes the problems that almost every IT organization faces, and then shows the practices (based on the Theory of Constraints, Lean and more) of how to solve them. The main character, Bill, is taught how to deal with these problems using the Socratic Method. In each dialogue a question is posed, which in turn causes Bill to think and to talk to his colleagues to come up with a solution to their problem.
Bill starts to see that IT work has more in common with manufacturing plant work than he ever imagined, leading to the application of the Theory of Constraints in terms like:
The First Way helps us understand how to create fast flow of work as it moves from Development into IT Operations, because that’s what’s between the business and the customer. The Second Way shows us how to shorten and amplify feedback loops, so we can fix quality at the source and avoid rework. And the Third Way shows us how to create a culture that simultaneously fosters experimentation, learning from failure, and understanding that repetition and practice are the prerequisites to mastery.
Work in process in IT perspective
Until code is in production, no value is actually being generated; it’s merely WIP stuck in the system. By reducing the batch size, you enable a faster feature flow. In part this is done by ensuring the proper environments are always available when they are needed; another part is automating the build and deployment process. Here we recognize that infrastructure can be treated as code, just like the application that Development ships. This enables the creation of a one-step deploy procedure.
Besides the parts mentioned before, this requires removing an unneeded (since no value is created) handoff between Development and Operations. For this to work, the two have to be integrated, not separated.
Like in a manufacturing plant, in IT, it is crucial to manage the release of work to the shop floor / development and to track the work in process. There are a lot of visual aids available to support this, like Kanban or scrum boards. All have their origin in lean or agile ways of working.
No need to say that in the novel this all works out pretty well 😉 In real life we see that these principles work; however, more iterations are needed to really improve things. These iterations at first look like failures because of the acceleration of entropy, but they are needed in the learning process of people and organizations. Reduce the feedback cycle and learn fast!
On the relation between business and IT
There are some interesting statements in the book, that are heard more often in the industry.
IT is not just a department. IT is a competency that we need to gain as an entire company.
We expect everyone we hire to have some mastery of IT. Understanding what technology can and can’t do has become a core competency that every part of this business must have. If any of my business managers are leading a team or a project without that skill, they will fail.
In ten years, I’m certain every COO worth their salt will have come from IT. Any COO who doesn’t intimately understand the IT systems that actually run the business is just an empty suit, relying on someone else to do their job.
Personally, I think they hold at least some value. Please share your ideas in the comments.
In this blog post I’ll share a list of books I read during the first six months of 2013.
Hadoop – The definitive guide
This book proved very useful as an introduction to and solid background in Hadoop. I read it shortly before starting an enhancement of MapReduce code, which made it possible to better understand the production code and how to make the changes.
I wanted to read Essential Scrum to renew and deepen my theoretical knowledge of Scrum, and it is a great read for that purpose!
I like the visuals that are used; they set it apart from other books on the subject. Besides that, I liked the MindMap-like figures that support the structure of the chapters.
The scope goes beyond the core of Scrum, and does that well. It also touches on subjects like multilevel and portfolio planning, the role of managers in a Scrum context, and product planning.
This is a great follow-up read for anyone with basic Scrum training or certification. It doesn’t just offer the big picture, but also details and examples of how to become more agile. It will help you deal with the complexities of implementing and refining Scrum.
Thinking Fast and Slow
The aim of Daniel Kahneman, the author of Thinking Fast and Slow, is to enrich the vocabulary of people talking at the watercooler, where opinions and gossip are exchanged. He wrote this book to influence the way they talk about the judgements and choices of others, and he has succeeded. As The Economist put it: Kahneman shows that we are not the paragons of reason we assume ourselves to be. When you realise this, it puts you and the world around you in a different perspective.
Mr. Kahneman is a person who understands like no other on the planet how and why we make the choices we make, and he knows how to share his insights! This is a great read for any curious mind, especially those with an interest in how and why we make choices.
This book will change the way you think.
There is an interesting talk on Thinking Fast and Slow by Mr Kahneman at The Long Now.
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
The book Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. This is typical software development reading.
It has a progressive style of offering insights into databases and their capabilities. The open source databases covered are PostgreSQL, Riak, Apache HBase, MongoDB, Apache CouchDB, Neo4J, and Redis. These were chosen to span five database styles or genres: relational, key-value, columnar, document and graph.
This book is recommended for anyone looking for a solid introduction to databases beyond the traditional RDBMS. It will provide the knowledge you need to choose a database that suits your needs.
What Oracle and some other BPM and ECM vendors call Adaptive Case Management – ACM – is called Dynamic case management by Forrester and others. The notion of a case and the need for these systems emerge from requirements elicited by existing Business Process Management (BPM) and Enterprise Content Management (ECM) implementations. Forrester states:
We found a clear recognition that older process automation approaches based on traditional mass production concepts no longer fit an era of people-driven processes.
Types of Dynamic Case management
Forrester divides Case Management into three categories:
- Investigative – Examples are audit requests, fraud detection and regulatory queries. All of these aim at risk mitigation and cost control.
- Service Request – Think claims, customer service, underwriting and customer onboarding. Processes like these are aimed at customer experience and risk mitigation.
- Incident management – Think managing complaints, order exceptions and acute health care. This category is aimed at customer experience and cost control.
Dynamic Case Management extends BPM
In contrast to traditional BPM products, DCM software supports:
- The ability to run multiple procedures against a given case of work – An individual case instance can be influenced by multiple processes.
- The ability to associate different types of objects with a case – A set of data (structured, unstructured, assets, customer calls, etc.) provides the context for an individual case.
- Mechanisms that allow end users to handle variation – Humans working on the case use their skills and expertise to interpret what is needed to handle the case, and see the results reflected in the supporting system.
- Mechanisms to selectively restrict change on a process – A certain lock-down of change on certain assets is required for compliance on the one hand, while facilitating goal-centric behavior on the other.
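As a rough sketch of these four capabilities, not tied to any DCM product, the types, field names and the claims example below are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    case_id: str
    processes: list = field(default_factory=list)   # multiple procedures per case
    objects: dict = field(default_factory=dict)     # documents, calls, data sets
    locked: set = field(default_factory=set)        # compliance-locked assets

    def attach(self, name, obj):
        # Selectively restrict change: locked assets cannot be replaced.
        if name in self.locked:
            raise PermissionError(f"{name} is locked for compliance")
        self.objects[name] = obj

claim = Case("CL-2024-001")
claim.processes += ["fraud-check", "payout"]        # two procedures, one case
claim.attach("customer_call", "recording-17.mp3")   # unstructured content
claim.locked.add("customer_call")                   # lock down for compliance
```

Once an asset is locked, a further `attach` on it raises, while case workers stay free to vary everything else.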
Beware of the untamed processes
In every organization there are several, or even loads of, untamed processes, with a growing demand to track them, meet compliance regulations and gain insight into their effectiveness (and efficiency). Dynamic Case Management aligns with these untamed processes since it supports:
- both structured and unstructured content
- both human and system controlled processes
- facilitating knowledge and expert guidance
DCM has a very strong point in bringing flexibility and manageability together. It provides visibility and control over the tasks that have to be performed. Key drivers for DCM initiatives are agility and traceability.
Oracle and ACM
As Forrester states: many ECM and BPM tools form the basis for Dynamic Case Management solutions. With PS6 and release 12c of the Oracle BPM Suite, Oracle takes a leap into what it calls the Adaptive Case Management segment. Check the other vendors in the Forrester Wave for Dynamic Case Management.
While preparing guidelines for the usage of the Oracle Service Bus (OSB) I was looking for a definition of a Service Bus. There wasn’t one on my blog yet (more posts on integration), so I decided to use the following definitions and share them with you.
Forrester Service Bus definition
Since 2009, Forrester has used this one:
An intermediary that provides core functions to make a set of reusable services widely available, plus extended functions that simplify the use of the ESB in a real-world IT environment.
Erl Service Bus definition
Thomas Erl offers the following description of a Service Bus:
An enterprise service bus represents an environment designed to foster sophisticated interconnectivity between services. It establishes an intermediate layer of processing that can help overcome common problems associated with reliability, scalability, and communications disparity.
An Enterprise Service Bus is seen by Erl et al. as a pattern. That is why it is even more important to share what that pattern is. Later on I’ll also briefly describe the VETRO pattern, which is also very useful when comparing integration tools or developing guidelines.
Erl Enterprise Service Bus pattern
On the SOA patterns site we learn that an enterprise service bus represents an environment designed to foster sophisticated interconnectivity between services. The Enterprise Service Bus pattern is a composite pattern based on:
- Asynchronous Queuing – basically an intermediary buffer, allowing services and consumers to process messages independently by remaining temporally decoupled.
- Service Broker – composed of the following patterns:
- – Data Model Transformation – to convert data between disparate schema structures.
- – Data Format Transformation – to dynamically translate one data format into another.
- – Protocol Bridging – to enable communication between different communication protocols by dynamically converting one protocol to another at runtime.
- Intermediate Routing – meaning message paths can be dynamically determined through the use of intermediary routing logic.
- Optionally, the following patterns: Reliable Messaging, Policy Centralization, Rules Centralization, and Event-Driven Messaging. Also have a look at slide 12 onwards of the SOA Symposium Service Bus presentation.
VETRO pattern for Service Bus
The VETRO pattern was introduced by David Chappell, writer of the 2004 book Enterprise Service Bus.
- V – Validate: Validation of messages, e.g. based on XSD or Schematron.
- E – Enrich: Adding data from applications other than the one the message originates from.
- T – Transform: Transform the data model, the data format or the protocol used to send the message.
- R – Route: Determine at runtime where to send the message.
- O – Operate: You can see this as invoking the target implementation.
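The five steps chain together naturally; a minimal sketch, where every step implementation is a made-up placeholder rather than anything from a specific service bus product (the last step is shown as operate, sometimes also listed as execute):

```python
def validate(msg):
    # V: reject structurally invalid messages up front.
    if "order_id" not in msg:
        raise ValueError("invalid message: missing order_id")
    return msg

def enrich(msg):
    # E: add data the message did not originate with (e.g. a customer lookup).
    return {**msg, "customer_segment": "retail"}

def transform(msg):
    # T: convert to the data model the target application expects.
    return {"id": msg["order_id"], "segment": msg["customer_segment"]}

def route(msg):
    # R: decide at runtime where the message should go.
    return "priority-queue" if msg["segment"] == "retail" else "default-queue"

def operate(msg, destination):
    # O: stand-in for invoking the target implementation.
    return f"delivered {msg['id']} to {destination}"

def vetro(msg):
    msg = transform(enrich(validate(msg)))
    return operate(msg, route(msg))

result = vetro({"order_id": "A-1"})
```

Comparing tools against such a pipeline makes it easy to ask, per step, which product covers it and how.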
We also used this pattern to compare Oracle integration tools and infrastructure. It can very well be used when choosing the appropriate tools for a job and deciding on guidelines for how to use those tools.