Tuesday 31 March 2009

Private and Public Cloud Outages and Performance

Some high-profile reported outages of public cloud services (sources: blog reports and service monitoring sites):

·         Nov 26, 2007, Yahoo! e-commerce services. Heavy online traffic affected half of the 40,000 sites that subscribe to Yahoo!’s e-commerce service, preventing sales from being completed on thousands of web sites that depend on it. Outage: approx. 6 hours

·         Feb 11, 2008, Salesforce.com, North American CRM servers (NA5). Service was up and down for most of the business day on Feb 11, with degradation caused by a software upgrade installed over the weekend. Outage: 24 hours

·         Feb 15, 2008, Amazon S3/EC2. Outage: from 4.30am to 7.00am (approx. 2.5 hours). Affected many startup sites, e.g. Twitter, SmugMug, 37Signals and AdaptiveBlue, that use S3 to store data for their websites.

·         Feb 19, 2008, Yahoo! Mail SMTP. Outage: delays in SMTP service, estimated 24 hours

·         April 28, 2008, Amazon S3. Service authentication system overloaded with user requests. Outage: 3 hours

·         Jul 20, 2008, Amazon S3. Internal system problems caused S3 to be inaccessible for up to 8 hours. Outage: 5 hours 45 minutes

·         Jul 22, 2008, Apple MobileMe launch. A mail server crash left some subscribers without email access for 5 days, and fewer than 1% of customers permanently lost some emails sent between 18 and 22 July.

·         Aug 6, 2008, Google Gmail. A small number of Apps Premier users affected. Outage: 24 hours

·         Aug 7, 2008, Citrix GoToMeeting and GoToWebinar. Outage due to a surge in demand: a few hours

·         Aug 8, 2008, Nirvanix and MediaMax/The Linkup (storage). The cloud service failed and closed, losing an unspecified amount of customer data, approx. 45% of all data stored. The Linkup had about 20,000 paying subscribers. The aim was to migrate to the Nirvanix storage delivery network, but only a partial migration was possible before closure.

·         Aug 12, 2008, Google Gmail. Users were unable to access their mailboxes as Gmail returned a “Temporary Error (502)”. About 20 million users visit Gmail daily, with more than 100 million accounts in total. The issue was caused by a temporary outage in the contacts system used by Gmail, which prevented Gmail from loading properly. Outage: officially 1 hour 45 min (unofficially 2 hours)

·         Aug 15, 2008, Google Gmail. A small number of Apps Premier users affected. Outage: 24 hours

·         Aug 26, 2008, XCalibre FlexiScale cloud. Affected many businesses using FlexiScale on-demand storage, processing and/or network bandwidth. Cited as partly human error; the data structure was not replicated across multiple data centers. Outage: 2-3 days

·         Jan 6, 2009, Salesforce.com. System-wide outage: all Salesforce.com services across all regions were largely unavailable between 12:39pm and 1:17pm. Outage: approx. 40 minutes

·         Feb 24, 2009, Google Gmail outage in America and Europe, the third outage in 6 months. One blog estimated 62 hours of downtime in the last 8 months, calculated as 99.2% availability and projected to 99.4% over 12 months. Outage: 2 hours 30 minutes

·         March 10, 2009, Google Gmail. A small number of users affected; Gmail has approx. 113 million users (comScore). Outage: partially fixed in a few hours, but 24 to 36 hours to restore all affected accounts.

The frequency and duration of outages have improved since two to three years ago, during the start-up phases of these services. Current performance should also be seen in light of the size of the user bases that the large public vendors manage. These far outnumber even large-scale outsourcing and public infrastructure user groups, which may range from 100,000+ unique desktop users to 5-10 million subscriber accounts.

  • Google Gmail has 113 million active accounts, March 2009

  • Facebook has 175 million active accounts, March 2009

  • Amazon S3 stores more than 29 billion objects, October 2008

  • Yahoo! Mail has 260 million users, with a 67-petabyte server in the California region, March 2009

  • MySpace had 106 million accounts in Sept 2006. MySpace was overtaken by its main competitor Facebook in April 2008

  • Twitter has 4-5 million users, November 2008

  • Apple iTunes has sold 9 billion songs, representing 70% of worldwide digital sales, Jan 2009

  • Yahoo! websites received 2.4 billion page hits per day, October 2007

These statistics support the “wikinomics paradigm”: a huge online resource and user capacity compared with the storage and product range of physical bricks-and-mortar businesses. The microeconomics and service design benefit from significant economies of scale.

With the cloud and on-demand services becoming more visible in mainstream discussion, these events will become more critical. A learning point from these public cloud failures is the need for transparent communication with user groups. As cloud services become more visible, providers need to increase communication of system status to users in parallel with any technology improvements.

Yet most proprietary system failures go unnoticed by all except those directly affected. In the cloud, however, failures and downtime events are more transparent and far more visible.

Google has stated that it guarantees corporate customers who pay for Google Apps Premier Edition that Gmail will be available 99.9% of the time. Taken literally, the remaining 0.1% allows 8.76 hours of downtime per year. Google also publishes a status dashboard.
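The 8.76 hours figure is simple arithmetic: a year has 365 × 24 = 8,760 hours, and 0.1% of that is 8.76. A minimal Python sketch of the conversion in both directions, assuming a 365-day year and using my own illustrative helper names (nothing here comes from Google's SLA terms), looks like this:

```python
# Convert between an availability percentage and the downtime it allows.
# Assumes a 365-day (non-leap) year; the function names are illustrative only.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def allowed_downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an availability target over a period."""
    return (1 - availability_pct / 100) * period_hours


def observed_availability_pct(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability actually achieved, given total downtime over a period."""
    return 100 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows {hours:.2f} hours of downtime per year")
```

Plugging in 99.9 reproduces the 8.76 hours quoted above; the same helper shows how quickly the downtime budget shrinks at 99.99%.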

Amazon has implemented availability zones, persistent storage and elastic IP addresses. Unlike static addresses, elastic IP addresses can be remapped dynamically, on the fly, to point to different compute instances by the user rather than by an Amazon data center technician. Amazon announced an S3 storage service with a 99.9% availability SLA back in October 2007. Many companies reportedly use AWS to handle spike overflow, a practice called “cloud bursting”.
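Cloud bursting boils down to a placement rule: use your own capacity first and overflow to rented public capacity only when it runs out. The following is a minimal Python sketch of that rule under simplifying assumptions: capacity is counted in concurrent requests, and the `Pool` and `BurstRouter` names are hypothetical, not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Capacity counted in concurrent requests (a deliberately simple, hypothetical model)."""
    name: str
    capacity: int
    in_use: int = 0

    def has_room(self) -> bool:
        return self.in_use < self.capacity


class BurstRouter:
    """Place work on the private pool first; burst to the public pool only when private is full."""

    def __init__(self, private: Pool, public: Pool) -> None:
        self.private = private
        self.public = public

    def admit(self) -> str:
        """Reserve one slot and return the name of the pool that took the request."""
        for pool in (self.private, self.public):
            if pool.has_room():
                pool.in_use += 1
                return pool.name
        raise RuntimeError("no capacity left in either pool")

    def release(self, pool_name: str) -> None:
        """Free a slot when a request finishes."""
        pool = self.private if pool_name == self.private.name else self.public
        if pool.in_use == 0:
            raise ValueError(f"nothing to release in {pool_name!r}")
        pool.in_use -= 1


if __name__ == "__main__":
    router = BurstRouter(Pool("private", capacity=2), Pool("public-cloud", capacity=10))
    print([router.admit() for _ in range(4)])
```

With a private pool of two slots, the first two requests stay in-house and the next two burst to the public pool (`['private', 'private', 'public-cloud', 'public-cloud']`); releasing a private slot brings new work back in-house.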

Salesforce.com publishes an operating status dashboard for all its server groups globally. This also includes a maintenance schedule for planned downtime, typically of 1 hour's duration.

In conclusion, you can draw at least three potential next steps if cloud computing is to become enterprise-grade for public, private and hybrid combinations of cloud services:

·         Use cloud-burst technology to hot-box failures and maintain continuity of cloud services

·         Accept existing public cloud service levels, as these may be higher than your current service levels for a number of non-core or even core services.

·         Build a private cloud that has the elastic compute benefits of the cloud but is preserved and managed to an internal data center standard

Monday 23 March 2009

SASS - Short Attention Span Summary - a brief history of the Cloud

I was recently looking at the reviews of “The Day the Earth Stood Still” on Amazon, as at the time of writing it is due for release on DVD, and I can’t decide if this movie is a complete lemon or a subtle recasting of a classic with new ideas. It certainly took a hammering from the box office critics, but some reviews, including my own amateur effort, saw some elements of merit. What I found more interesting, however, was a reviewer, pen name Amanda Richards, who used a concept called SASS: Short Attention Span Summary. Apart from the reviewer's prodigious output, I thought the concept of SASS, and the notable art of a short blog, was a really useful aid. So here goes with my SASS of “The Long Tail”, “Wikinomics”, “The Big Switch”, “The World is Flat”, “Does IT Matter?” and “IT Doesn’t Matter, Business Processes Do” in chronological order. I thought I’d include all six as they appear to be the mainstream background books on the cloud at the moment.

The SASS of the cloud is:

  1. "Does IT Matter?" - No
  2. "IT Doesn't Matter, Business Processes Do" - Yes, everything is about processes
  3. "Wikinomics" - But IT is everywhere and has the potential to change the way we do business
  4. "The Long Tail" - Yes, and it has changed the way products and services are created, provisioned and delivered
  5. "The World is Flat" - Yes, and it's now global; the West missed the significance of this shift
  6. "The Big Switch" - The significance is that we will have virtual businesses and centralized IT utility services; the cloud changes everything....

Read on

“Does IT Matter?”, Nicholas Carr (HBR article, published 2003)

  1. Postulate: IT has become a widespread commodity and no longer provides competitive business advantage
  2. Companies spent billions of dollars on IT but have not seen real improvements in competitive advantage.
  3. IT investments need to assess the role of IT on business and commerce to achieve the right focus on value and competitive advantage
  4. Strategic importance of IT is decreasing.

“IT Doesn’t Matter, Business Processes Do”, Howard Smith, Peter Fingar (Aug 2003), a response to Nicholas Carr

  1. Postulate: Business processes enable competitive business advantage. It is wrong and dangerous to ignore processes and the role of IT
  2. Michael Hammer’s 1990 article “Reengineering Work” is an example of this
  3. The strategic importance of IT is actually increasing, e.g. the emergence of Web services enabling businesses to redesign and innovate new services.
  4. A business process revolution will occur as businesses redefine the use of IT in the context of their operations and markets.

“Wikinomics”, Don Tapscott, Anthony Williams, 2006

  1. Postulate: Mass collaboration changes everything
  2. The perfect storm: internet, web 2.0 tools, collaborative platforms
  3. The emergence of peer production - prosumers
  4. The wisdom of crowds, acting globally, shared spaces – the world is your R&D department
  5. Open and free, e.g. the open source ecosystem: Linux spawned a multi-billion dollar ecosystem and changed the balance of power in the software industry
  6. Ideagoras – Marketplaces for ideas, innovations and skills. Engage and co-create- co-innovation – emergence and serendipity.
  7. Escalating scope and scale of resources applied to innovation means change can unfold more quickly. Getting the right ratio between internal and external innovation.
  8. The collaboration economy, the business web
  9. Lower barriers to monetizing co-creation and collaboration channels
  10. Managing complexity – a Darwinian approach.
  11. The rise of social computing, Enterprise 2.0, harnessing the power of wikinomics.
  12. Building critical mass, supply a platform for collaboration, people governance enablers, incentives, build trust, let the process evolve,  objectives, leadership, culture of collaborative mind.

“The Long Tail”, Chris Anderson, 2006

  1. Postulate:  Size of potential market is sum of all participants
  2. Temporal competition and the back catalogue including niche products can all be provisioned
  3. End of the hit parade
  4. The power of free
  5. The tyranny of locality versus logistics anywhere
  6.  The economics of abundance: Acquisition costs DOWN, Average Sale price DOWN, Gross Margin UP
  7. Why?
  8. Democratization of tools of production, distribution, joining of supply and demand (Industrialization)
  9. The power of peer production and collective intelligence
  10. One size does fit all - The emergence of statistical multiplexing
  11. The aggregators emerge, e.g. physical retailers → hybrid retailers → pure digital retailers
  12. The paradox (of the cloud) – Long tail drives a shift towards 101 tastes and choice; the ability to provision is king

“The World is Flat”, Thomas Friedman, 2006

  1. Postulate: The global balance of economics has changed with a triple convergence of complementary goods, the emergence of horizontal collaboration business models, and the opening up of eastern markets into a global market, together with 10 flatteners (changers) creating a new, open playing field.
  2. The perfect storm: Late 20th century investment in fiber optic cables between west and east; collaborative tools on the internet; economic reforms to enable eastern countries to enter and exploit technologies and services.
  3. The West was looking the wrong way while the East caught up, with the emergence of China, India, the former Soviet Union countries and the Asia Pacific markets
  4. Ten flatteners: Collapse of the Berlin Wall; Netscape, Workflow, Open Sourcing, Outsourcing, Offshoring, Supply chaining, Insourcing (BPO e.g. UPS repairs Toshiba PCs), In-forming, “The Steroids”( personal mobile communications devices)
  5. Connection and Collaboration
  6. The 21st Century is flat  

“The Big Switch”, Nicholas Carr, 2008

  1.  Postulate: Business and society will fundamentally change to a virtual society creating fundamental shifts in people’s lives, markets, jobs and skills
  2. IT will become like the utility industry – the “electrification”, “the Electric Grid” of the IT industry
  3. The PC age giving way to the Utility age
  4. The internet will become a “world wide computer”
  5. The rise of the Google model will supersede the old ownership models (the older Microsoft model – Bill Gates’ memo to employees, Fall 2005, creates a path to the utility age)
  6. The Google model will create super data center services outmatching any company’s own computer investments. All physical hardware will be “centralized” as commodity utility services.
  7. Virtualization will truly revolutionize all IT
  8. 3Tera’s software provides an example of what the future of the computer business might look like: no physical products at all. Companies create virtual versions of their equipment or software and sell them as icons that plug into programs like AppLogic.
  9. Move to service based economies
  10. From the many to the few – consolidation and flushing out versus the power of the many and 101 choice
  11. Living in the Cloud
  12. Collaborative content
  13. Accelerating concentration of wealth in large businesses
  14. Clustering of like minds - a threat or a benefit?
  15. The threats to the internet
  16. The spider’s web – zero privacy; profiling means you are what you do and say.
  17. iGod - all human knowledge, wisdom and interactions inscribed into the internet
  18. Impact of this on society and relationships. 

SOA God’s Kitchen


A colleague at work, Adam Philips, produced a great picture concept for describing SOA, based on earlier work entitled “SOA Beef Bourguignon”.

I love the slide; it's one of the best SOA slides I've seen in a long while, and I can relate to that cook with the rolling pin! The use of pictures to convey the concept of the tiers is very good. Strictly, the abstraction of logic and infrastructure has more layers: the ingredients would include the hardware (servers, networks, storage, security etc.), so you could add another cylinder at the bottom called "Infrastructure" with a picture of kitchen utensils. You could also show, at the top of the picture, the multiple form factors that can consume services, such as a mobile, laptop, netbook or mainframe terminal.

This picture can be further enhanced with SOA governance and organisation, and in particular the concept of a Service Inventory (Thomas Erl has this as a key feature of the SOA world; see www.soapatterns.org or www.soapatterns.com, the latter has podcasts you can listen to). The analogy here is that it is the "kitchen" in which the SOA environment/ecosystem would exist. The basic difference is that we need to build a logical service contract library at design time; the run-time expression of this in a traditional SOA is typically the registry and repository (there can be more than one). SOA does not strictly need these to work, as you can still do loose coupling with WOA and REST and still recognise this as an SOA concept, albeit a lightweight implementation of it.
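The split between a design-time service contract library and its run-time registry can be sketched in a few lines of Python. This is a toy, in-memory illustration under my own assumptions: the `ServiceContract` fields, the `ServiceRegistry` class and the example endpoint URL are hypothetical, not taken from any registry/repository product or from Thomas Erl's patterns.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceContract:
    """Design-time entry in the service inventory (hypothetical fields)."""
    name: str
    version: str
    operations: tuple


@dataclass
class ServiceRegistry:
    """Run-time expression of the inventory: resolve a contract to an endpoint."""
    contracts: dict = field(default_factory=dict)
    endpoints: dict = field(default_factory=dict)

    def publish(self, contract: ServiceContract, endpoint: str) -> None:
        key = (contract.name, contract.version)
        self.contracts[key] = contract
        self.endpoints[key] = endpoint

    def lookup(self, name: str, version: str) -> str:
        # Raises KeyError if no such contract has been published.
        return self.endpoints[(name, version)]

    def providers_of(self, operation: str) -> list:
        """All published (name, version) pairs whose contract offers an operation."""
        return sorted(key for key, c in self.contracts.items() if operation in c.operations)


if __name__ == "__main__":
    registry = ServiceRegistry()
    contract = ServiceContract("SalesOrder", "1.0", ("create", "cancel"))
    registry.publish(contract, "https://orders.example.com/v1")
    print(registry.lookup("SalesOrder", "1.0"))
```

With REST or WOA-style loose coupling the registry step can be skipped entirely and consumers simply call a known URL, which is the lightweight form of SOA mentioned above.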

The title God’s Kitchen refers to the trance music of the same name, and the myriad of music and blended mixes that constantly remind me of variation. This made me recall an earlier YouTube SOA “hit” which used the concept of music orchestration in SOA; now we see new music, new orchestration and new kitchens emerging in the clouds.


Wednesday 11 March 2009

Applying SOA to testing

Some thoughts on criteria for SOA enabled Testing.

Key principles:

  • Definition of the types of service tested, e.g. technical, IT service and business process service tests
  • Definition of the SOA Inventory concept and the Service Life cycle idea.
  • Integration of functional and non-functional testing, combined under the service contract test strategies
  • Definition of the granularity of testing in SOA as a holistic testing approach
  • Definition of testing assurance and continuous testing, and the management of the service life cycle of testing.
  • Separation and specific treatment of security testing, notably around security policy and approach, particularly with external testing services.
  • The business operating models for testing supported: there are at least 4 different testing business models to give the client options, all underpinned by a common process:
      • Testing onsite
      • Testing off-site
      • Testing 3rd parties via direct and proxy
      • Testing onsite via remote secure access

  • Testing for SOA governance.
  • Specific and quantified benefits of the SOA approach to testing, e.g. faster test service, 10 to 50% reduction in time to test etc., positioned in relation to pricing, billing methods and benchmarks against competitors.

 

SOA Testing functional capability list

  • Elastic provisioning of the Dev and Test environment on a Cloud Utility service (Infrastructure as a Service)
  • Ability to offer a collaborative test cycle with the users/customers themselves (a core element of SOA and service orientation is to design for SLA)
  • Security provision for testing to be either off-premise or on-premise inside the client firewall, where security permits
  • Test life cycle with a service contract as the unit of test specification, passing through an SOA-style service lifecycle from cradle to grave.
  • Definition of the service contract in the test strategy and test specification process, in particular a use-case-style test template script
  • Test assurance cycle: how service orientation governs the state and performance of the system through some kind of assurance service where requested (not a guaranteed service SLA, but a gold, silver or bronze service management rating with ITIL links). This may include continuous testing cycles for assurance.
  • Test estimating process based on function point analysis, with the service contract as the unit of estimate
  • Abstraction of service testing from business testing and IT testing. A founding principle of SOA is to "test the test" of the logical service as conformant to a SOA governance model ideally.
  • Configuration and version management testing of service versioning.  This may include generational testing and isomorphic architecture testing principles to test families of services or type of delivery.
  • Legacy testing (non SOA managed) versus SOA testing - the differences and how it is handled.
  • Storage of the nomenclature of test environments and results.  How is the Test spec/script and results stored for reuse?
  • Contractual penalties testing, and how this is differentiated from ordinary project delivery requirements testing, which may or may not be penalty-driven. This is not SOA-specific, but it shows how the SLA non-functional features are built into SOA testing: SOA is based on integrated functional services with non-functional metric characteristics, e.g. "this sales order process is a service that operates to these performance and volume characteristics." In traditional testing, volume and system testing is separate from functional testing, which is erroneous.
  • Billing mechanisms for testing in SOA are based on per use, per hour or per service function point tested.
  • 3rd party testing - how do we test multiple vendors and parties SOA solutions and conformance.
  • Simulation and other types of SOA accelerators. 
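The idea of a service contract as the unit of test, with functional and non-functional expectations checked together ("this sales order process is a service that operates to these performance and volume characteristics"), can be sketched in Python. This is an illustrative sketch only: the contract fields, the stand-in `create_sales_order` service and the thresholds are my assumptions, and timing uses wall-clock `time.perf_counter()`, so latency figures will vary by machine.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ContractSpec:
    """One service contract: functional and non-functional expectations together."""
    name: str
    volume: int            # number of calls the service must handle in the test
    max_latency_ms: float  # worst acceptable response time per call


def create_sales_order(order: dict) -> dict:
    """Hypothetical stand-in for the service under test."""
    return {"status": "accepted", "lines": len(order["items"])}


def run_contract_test(spec: ContractSpec, service: Callable[[dict], dict],
                      request: dict, expected: dict) -> dict:
    """Call the service spec.volume times; check every response and the worst latency."""
    functional_ok = True
    worst_ms = 0.0
    for _ in range(spec.volume):
        start = time.perf_counter()
        response = service(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        worst_ms = max(worst_ms, elapsed_ms)
        functional_ok = functional_ok and response == expected
    return {
        "contract": spec.name,
        "functional_pass": functional_ok,
        "worst_latency_ms": worst_ms,
        "non_functional_pass": worst_ms <= spec.max_latency_ms,
    }


if __name__ == "__main__":
    spec = ContractSpec("SalesOrder.create", volume=100, max_latency_ms=50.0)
    order = {"items": ["widget", "gadget"]}
    print(run_contract_test(spec, create_sales_order, order, {"status": "accepted", "lines": 2}))
```

Reporting both verdicts from one run reflects the point made in the list: volume and performance belong to the service's contract rather than to a separate test phase.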

Sunday 1 March 2009

Cloud Architecture Choices

Alternative views abound on the topology of cloud architecture solutions, ranging from the type of technology to the patterns of federation and distributed services. Six examples of these emerging approaches are:

 

·         Build-out cloud versus heterogeneous cloud ecosystem strategies. A build-out cloud such as Google seeks a homogeneous ecosystem approach that gives consistency to the service framework. RightScale and Elastra are examples of multi-cloud management systems that seek to manage provisioning across different physical cloud environments such as AWS, GoGrid and other utility resource cloud sources.

 

·         Actor model versus god model hypervisor resource management perspectives. One mandates a distributed set of hypervisors and a contiguous flow of services. The other takes a monitor-of-monitors, command-and-control perspective that seeks to connect to a resource-pool-oriented set of services federated between a number of parties.

 

·         Architectural semantic representation versus appliance container semantic representation standards development. Approaches based on RDF, OWL and other ontologies, both open and proprietary, seek to evolve a meta-standards definition of the physical architecture configuration. Examples of this are seen in the Elastra and Eucalyptus API and architecture nomenclature standards development. This can extend into the software layer, as exemplified by the Salesforce.com and NetApps meta-model ontologies for software customization for specific users of their SaaS. The core axiom in this approach is to define the architecture as a meta specification that can be provisioned as a configuration (this approach has parallels with the concept of disposable software and service provisioning of commodity solutions). The appliance approach is more of a middle-out approach that seeks to virtualize and containerize the workload itself; examples are the OVF standard, the vCloud initiative and the vApp standard, which seek to create a standard for a virtual appliance. A key challenge in this approach is how aware the software application is of the virtual container and environment, so as to best optimize the federated elastic qualities of the cloud. Ideally customers want software that is portable across devices and resources and makes best use of distributed, best-sourced storage, compute and network resources. Today's n-tier architectures in Java, .NET and LAMP are largely blind to the physical separation of the web, application and database servers. Their evolution, as seen in the Microsoft Azure Services+Resources cloud approach, will bring new architecture standards and approaches to federated services in the cloud.

 

·         Stateless service representation versus stateful service representation strategy developments. Examples of stateless management are the thin-client services seen in Facebook, MySpace and other on-demand services hosted at wide scale. The debate over stateful management is less clear in the cloud, where the state representation of in-flight transactions and long-lived multi-actor collaborative services will require forms of composite services that manage the various event states of service users. Coghead (now part of SAP), recently profiled, used BPM to manage some services in its framework. This has long been an area of debate in service-oriented architecture, centred on the level of service granularity and business logic encapsulation. Studying the boundary of federated security, the service API and how business process orchestration will be provisioned inside a cloud service provider will drive out questions on state management from a business perspective.

 

·         Federated Marketplace versus Integrated Application as a Platform ecosystem approaches. The core concept here is IT software applications that are built and pre-configured to run in an on-demand hardware environment. As a result, selection from a catalogue-style interface, with automated deployment and provisioning, takes minutes rather than weeks or months. The federated marketplace, as seen in examples such as eTask and, to some extent, Amazon Mechanical Turk (AWS), is similar to the concept of reverse auctioning: the platform is a digital market of suppliers and customers who exchange services. The solution is a hosting services platform that brings together commercial exchange but does not contain any resources itself, largely what can be described as “eBay meets the SI”. The Integrated AasaP is seen in examples such as Salesforce.com, where software is designed and built to run in the PaaS environment. Much like the famous Apple iTunes App Store, the application is a click-and-go interface that is available on demand. The power of this approach is to grow an ecology of applications that acts like a catalogue of functional solutions for a customer. (A variant of this is that the application does not necessarily have to be hosted in the PaaS cloud; just the API is “posted” in the catalogue, pointing to the external PaaS.)

 

·         Open source versus proprietary meta tag, browser and operating system architecture standards. This issue is inherited from the open source Linux standard, Microsoft Windows and tag standards: not all browsers are made equal in the current competing market of Mozilla Firefox, Google Chrome, Apple Safari and Microsoft Internet Explorer. Why this matters in cloud architecture can be seen in the subtle differences in cloud-hosted application services that support specific browser formats, affecting the interoperable reuse of services between open source and proprietary platforms and the choice of form-factor browser. This is expected to continue with the Google Android and Microsoft Azure platforms, following the early entry of the Apple apps development platform, as these play out in the mobile market.
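To make the stateless versus stateful distinction in the list above concrete, here is a small Python sketch. The quote function can be answered by any server instance because it needs nothing but the request; the order process has to remember its state, and which actor did what, between events, which is what makes long-lived collaborative services harder to spread across a cloud. The names and the state model are hypothetical, chosen only for illustration.

```python
def stateless_quote(request: dict) -> dict:
    """Stateless: every response is computed from the request alone."""
    return {"total": request["quantity"] * request["unit_price"]}


class OrderProcess:
    """Stateful: a long-lived, multi-actor process whose state must persist between events."""

    # Allowed (current state, event) -> next state transitions (illustrative model).
    TRANSITIONS = {
        ("created", "approve"): "approved",
        ("approved", "ship"): "shipped",
        ("shipped", "deliver"): "delivered",
    }

    def __init__(self) -> None:
        self.state = "created"
        self.history: list = []  # (actor, event) pairs in arrival order

    def handle(self, actor: str, event: str) -> str:
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self.TRANSITIONS[key]
        self.history.append((actor, event))
        return self.state


if __name__ == "__main__":
    print(stateless_quote({"quantity": 3, "unit_price": 2.5}))
    order = OrderProcess()
    order.handle("manager", "approve")
    order.handle("warehouse", "ship")
    print(order.state, order.history)
```

In a cloud deployment the `OrderProcess` state would have to live somewhere shared and durable rather than in one process's memory; that placement decision is the state management question raised above.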

 

What is clear is that the debate over homogeneous versus heterogeneous cloud frameworks, and what and where to invest in a cloud platform, is still emerging, and the development of interoperability and portability standards and capabilities is far from over in terms of wide-scale adoption.

 

What is at stake, and the prize, is the integration of hardware, software and business as a utility of use. The exploitation of such technology is abstracted from the point of use to enable freedom of choice and mobility. An aspiring goal of cloud computing is to increase the speed of selection, provisioning and deployment of IT services manifold through the ability to define and deploy solutions into the “cloud”.
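One route to that speed is the meta specification approach from the architectural semantic representation bullet: describe the architecture as data, then expand it mechanically into provisioning steps. A minimal Python sketch under stated assumptions: the `TierSpec` fields and image names are hypothetical, and the output is just a list of steps rather than calls to any real cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSpec:
    """One tier of a declarative architecture spec (hypothetical fields)."""
    name: str       # e.g. "web", "app", "db"
    image: str      # machine or virtual appliance image to launch
    instances: int  # how many copies of the tier to run


def provisioning_plan(spec: list) -> list:
    """Expand the spec into ordered launch steps, tier by tier, in the order given."""
    steps = []
    for tier in spec:
        if tier.instances < 1:
            raise ValueError(f"tier {tier.name!r} needs at least one instance")
        for i in range(1, tier.instances + 1):
            steps.append(f"launch {tier.image} as {tier.name}-{i}")
    return steps


if __name__ == "__main__":
    three_tier = [
        TierSpec("db", "db-image", 1),
        TierSpec("app", "app-image", 2),
        TierSpec("web", "web-image", 2),
    ]
    for step in provisioning_plan(three_tier):
        print(step)
```

Because the spec is just data, the same description could in principle be versioned, redeployed or handed to a different provider, which is where the interoperability and portability question comes back in.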