Service-interrupting events can and will happen in your environment. Your network could have an outage, your latest application push might introduce a critical bug, or you might someday have to deal with a natural disaster. When things go wrong, it’s important to have a robust, targeted, and well-tested DR plan for your resources in Google Cloud.

DR Planning Fundamentals

Disaster recovery (DR) is a subset of business continuity planning. A DR plan starts with analyzing the business impact of two key metrics:

  • Recovery Time Objective (RTO) is the maximum acceptable length of time that your application can be offline. Your RTO value is typically defined as part of your service level agreement (SLA).
  • Recovery Point Objective (RPO) is the maximum acceptable length of time during which data might be lost from your application because of an incident.

In most scenarios, the smaller the RTO and RPO values, the more your application will cost to run. Let's look at the ratio of cost to RTO/RPO.

[Chart: business impact analysis for business continuity – recovery time requirements versus cost]

Because smaller RTO and RPO values typically lead to greater complexity, administrative overhead follows a similar curve. A high-availability application might require you to manage distribution between two physically separated data centers, manage replication, and so on.

It’s likely that you are also considering and planning for high availability (HA). HA doesn’t entirely overlap with DR, but it’s important to take HA into account when you’re planning your RTO and RPO values. HA helps ensure an agreed level of operational performance, usually uptime, over a given period.

Google Cloud in Relation to RTO and RPO

Google Cloud can often reduce the costs associated with meeting your RTO and RPO values compared with meeting them on-premises.

On-premises DR planning forces you to account for the following requirements:

  • Capacity: securing enough resources to scale as needed.
  • Security: providing physical security to protect assets.
  • Network infrastructure: including software components such as firewalls and load balancers.
  • Support: making available skilled technicians to perform maintenance and to address issues.
  • Bandwidth: planning suitable bandwidth for peak load.
  • Facilities: ensuring physical infrastructure, including equipment and power.

Google Cloud, as a highly managed solution, can help you bypass many of these on-premises requirements, removing many of the costs from your cloud DR design.

Google Cloud offers several features that are relevant to DR planning, including:

  • Global network: Google's backbone network uses advanced software-defined networking and edge-caching services.
  • Redundancy: Multiple points of presence (PoPs) across the globe.
  • Scalability: App Engine, Compute Engine autoscalers, and Datastore give you automatic scaling.
  • Security: The site reliability engineering teams at Google help ensure high availability and prevent abuse of platform resources.
  • Compliance: Google undergoes regular independent third-party audits to verify that Google Cloud is in alignment with security, privacy, and compliance regulations and best practices.

The Three Types of Disaster Recovery Sites

A backup site is a location to which you can relocate operations following a disaster such as a fire, flood, terrorist threat, or another disruptive event. It is an integral part of your organization's DR plan and wider business continuity planning.

  • A cold site is an empty operational space with basic facilities like raised floors, air conditioning, power, and communication lines. Following an incident, equipment is brought in and set up to resume operations. It does not include backed-up copies of data and information from the organization's original location, nor does it include hardware that is already set up.
  • A warm site is a compromise between hot and cold. These sites have hardware and connectivity already established, though on a smaller scale. Warm sites might have backups on hand, but they may not be complete and may be between several days and a week old.
  • A hot site is a near duplicate of the original site of the organization, with full computer systems as well as complete backups of user data. Real time synchronization between the two sites may be used to completely mirror the data environment of the original site using wide area network links and specialized software.

The terms cold, warm, and hot can also be used in a DR context to describe patterns that indicate how readily a system can recover when something goes wrong.

Creating Your Disaster Recovery Plan

These are the basic components when creating your DR plan.

  • Design to your recovery goals: Look at your RTO and RPO values and decide which DR pattern you can adopt to meet those values. For example, for historical, non-critical compliance data with a large acceptable RTO, a cold DR pattern is likely fine.
  • Design for end-to-end recovery: It's important to make sure your DR plan covers the full recovery process, from backup to restore to cleanup.
  • Make Disaster Recovery (DR) Tasks Specific: If you need to execute your DR plan, each task should be concrete and unambiguous. For example, “Run the restore script” is too general. In contrast, “Open Bash and run ./restore.sh” is precise and concrete.

Applying Control Measures

Another important component of DR thinking is how you can potentially prevent a disaster before it occurs. For example, add a monitor that sends an alert when a data-destructive flow, such as a deletion pipeline, exhibits unexpected spikes or other unusual activity. This monitor could also terminate the pipeline processes if a certain deletion threshold is reached, preventing a catastrophic situation.

Making Sure Software is Configured for Disaster Recovery

Part of DR planning is making sure your software is configured correctly in the event a recovery is needed.

  • Verify software can be installed: Make sure that your applications can be installed from source or from a preconfigured image, that licensing is available for these apps, and that any required Compute Engine resources are available, such as pre-allocated VM instances.
  • Think of the CD in CI/CD: The Continuous Delivery (CD) component of your CI/CD pipeline is integral to how you deploy applications. As part of your DR plan, consider how this will work in your recovered environment.

Security and Compliance Controls

Often with recovery we think only about getting the site back online with the least disruption. But don't forget that security is important: the same controls that you have in your production environment must apply to your recovered environment. Compliance regulations also apply to your recovered environment.

  • Make sure network controls provide the same separation and blocking that your production environment offers. Think of Shared VPC and Google Cloud firewall rules.
  • Replicate IAM policies to the DR environment: IaC methods such as Cloud Deployment Manager can help with this.
  • After you've implemented these security controls in the DR environment, make sure to test everything.
  • Train your users on the DR environment and the steps in the plan.
  • Make sure DR meets compliance requirements: only those who need access have access, PII data is redacted and encrypted, etc.

Disaster recovery scenarios for Data

Disaster recovery plans should specify how to avoid losing data during a disaster. The term data here covers two scenarios; backing up and then recovering databases, log data, and other data types fits into one of the following:

  • Data backups: This involves copying data in discrete amounts from one place to another, such as from the production site to the DR site. Typically, data backups have a small to medium RTO and a small RPO.
  • Database backups: These are slightly more complex because they are centered around a time component: when you think of your database, you need to ask from what moment in time the data comes. Adopting a high-availability-first approach can help you achieve the smaller RTO and RPO values your DR plan will probably require.

Let’s look at some different scenarios and how we could achieve a DR plan for these types.

Production Environment is On-Premises

In this scenario, your production environment is on-premises, and your disaster recovery plan involves using Google Cloud as the recovery site.

Data backup and recovery

  • Solution 1: Back up to Cloud Storage using a scheduled task
    • Create a scheduled task that runs a script or application to transfer the data to Cloud Storage.
  • Solution 2: Back up to Cloud Storage using Transfer service for on-premises data
    • This service is a scalable, reliable, and managed service that enables you to transfer large amounts of data from your data center to a Cloud Storage bucket.
  • Solution 3: Back up to Cloud Storage using a partner gateway solution
    • Use a partner gateway between your on-premises storage and Google Cloud to facilitate this transfer of data to Cloud Storage.

Database backup and recovery

  • Solution 1: Backup and recovery using a recovery server on Google Cloud
    • Back up your database to a backup file and transfer it to a Cloud Storage bucket. When you need to recover, spin up an instance with database capabilities and restore the backup file to that instance.
  • Solution 2: Replication to a standby server on Google Cloud
    • Achieve very small RTO and RPO values by replicating (not just backing up) data, and in some cases database state, in real time to a hot standby of your database server.
    • Configure replication between your on-premises database server and the target database server in Google Cloud.

Production Environment is Google Cloud

In this scenario, both your production environment and your disaster recovery environment run on Google Cloud.

Data backup and recovery

A common pattern for data backups is to use a tiered storage approach. When your production workload is on Google Cloud, the tiered storage system looks like the following diagram. You migrate data to a tier that has lower storage costs because you are less likely to need to access the backed-up data.

[Diagram: tiered storage for backups – cost decreases as data is migrated from persistent disks to Nearline to Coldline]
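One way to implement this tiered pattern for backups kept in Cloud Storage is an object lifecycle policy that demotes objects to cheaper storage classes as they age. The sketch below is illustrative only; the age thresholds are assumptions, and the gsutil lifecycle set command expects this policy as the equivalent JSON document.

```yaml
# Sketch of a Cloud Storage lifecycle policy for aging backups.
# Thresholds are illustrative; tune them to your retention and RPO needs.
rule:
- action:
    type: SetStorageClass
    storageClass: NEARLINE     # cheaper class for backups older than 30 days
  condition:
    age: 30
- action:
    type: SetStorageClass
    storageClass: COLDLINE     # rarely accessed backups older than 90 days
  condition:
    age: 90
```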

Database backup and recovery

If you use a self-managed database on Google Cloud, such as MySQL, PostgreSQL, or SQL Server running as an instance on Compute Engine, you will have concerns similar to those for the same databases on-premises. The one bonus here is that you do not need to manage the underlying infrastructure.

A common pattern is to enable recovery of a database server that does not require system state to be synchronized with a hot standby.

If you are using a managed database service in Google Cloud, you can implement appropriate backup and recovery.

  • Bigtable provides Bigtable replication. A replicated Bigtable database can provide higher availability than a single cluster, additional read throughput, and higher durability and resilience in the face of zonal or regional failures.
  • BigQuery. If you want to archive data, you can take advantage of BigQuery's long-term storage. If a table is not edited for 90 consecutive days, the price of storage for that table automatically drops by 50 percent.
  • Firestore. The managed export and import service allows you to import and export Firestore entities using a Cloud Storage bucket.
  • Spanner. You can use Dataflow templates to make a full export of your database to a set of Avro files in a Cloud Storage bucket.
  • Cloud Composer. You can use Cloud Composer (a managed version of Apache Airflow) to schedule regular backups of multiple Google Cloud databases.

Disaster recovery scenarios for applications

Let’s frame DR scenarios for applications in terms of DR patterns that indicate how readily the application can recover from a disaster event.

  • Batch processing workloads: These tend not to be mission critical, so you typically don't need to incur the cost of designing a high availability (HA) architecture. Take advantage of cost-effective products such as preemptible VM instances, which you can create and run at a much lower price than normal instances. By implementing regular checkpoints as part of the processing task, the processing job can resume from the point of failure when new VMs are launched. This is a warm pattern.
  • Ecommerce sites: can have larger RTO values for some components. For example, the actual purchasing pipeline needs to have high availability, but the email process that sends order notifications to customers can tolerate a few hours’ delay. The transactional part of the application needs high uptime with a minimal RTO value. Therefore, you use HA, which maximizes the availability of this part of the application. This approach can be considered a hot pattern.
  • Video streaming: In this scenario, an HA architecture is a must-have, and small RTO values are needed. This scenario requires a hot pattern throughout the application architecture to guarantee minimal impact in case of a disaster.

 

Migrating a workload from your legacy on-premises environment to a cloud-native environment, such as a public cloud, can be challenging and risky. In a successful migration, you change the workload being migrated as little as possible during the migration operations themselves. Moving legacy on-premises apps to the cloud often requires multiple migration steps.

There are three major types of migrations that you can consider:

  • Lift and Shift
  • Improve and Move
  • Rip and Replace

Lift and Shift (Rehost)

Also known as "moving out of the data center," lift and shift is the easiest of all workload migrations: you move your workload from your on-prem environment to the cloud with little or no modification or refactoring. The only modifications necessary are those required to get your applications working in the cloud environment.

Lift and shift migrations are best when the workload can operate as-is in the cloud environment, when there is no business need for change, or when technical constraints won't allow any other approach, for example because of complicated source code that would be difficult to refactor.

On the downside, workloads migrated with lift and shift are non-cloud-native workloads that happen to be running in the cloud. These workloads don't take full advantage of cloud platform features such as horizontal scalability, more controlled pricing, and highly managed services.

Improve and move (Replatform)

Also known as "application modernization," an improve and move migration modernizes much of your workload while migrating it. The idea is to modify the workloads to take advantage of cloud-native capabilities, as opposed to simply making them work in the cloud environment as with lift and shift. You can improve each workload for performance, features, cost, or user experience.

Improve and move migrations let your applications use features of a cloud platform, such as scalability and high availability. You can also architect the improvement to increase the portability of the application.

The downside is that improve and move migrations take longer than lift and shift migrations, because the applications must be refactored before they can be migrated.

Rip and Replace (Refactor)

Also known as "building in and for the cloud," a rip and replace migration does not migrate your applications at all: you decommission them completely and rebuild and rewrite them as cloud-native apps.

If your current applications are not meeting your goals (for example, they have become too burdensome to maintain, would be too costly to migrate, or aren't even supported on Google Cloud), you can do a rip and replace.

This migration allows your application to take full advantage of Google Cloud features, such as horizontal scalability, highly managed services and high availability.

However, rip and replace migrations can take longer than lift and shift or improve and move migrations. Further, this type of migration isn't suitable for off-the-shelf applications because it requires rewriting the apps. You need to account for the extra time and effort to redesign and rewrite the apps as part of the application lifecycle.

Migration Path

The goal with a cloud migration is to get from point A (where you are now on-prem) to point B (in the cloud). To get from A to B you can use any of the methods we just discussed.

The journey from A to B can be summarized as:

Assess

Perform a thorough assessment and discovery of your existing environment in order to understand your app and environment inventory, identify app dependencies and requirements, perform total cost of ownership calculations, and establish app performance benchmarks.

  • Take Inventory: databases, message brokers, data warehouses, network appliances and dependencies. Machines + OS + specs
  • Catalog Apps: Mission critical, non-mission critical
  • Educate: Train and certify engineers on Google Cloud – frameworks, APIs, libraries
  • Experiment / POC: Run a bunch of POCs such as firewall rules, performance on Cloud SQL, play with Cloud Build, play with GKE Clusters
  • Calculate total cost of ownership: Google Cloud vs On-Prem. Which is cheaper? Use the Google Cloud price calculator
  • Choose which workloads to migrate first: non-business-critical, dependency-light workloads that require minimal refactoring

Plan

Create the basic cloud infrastructure for your workloads to live in and plan how you will move apps. This planning includes identity management, organization and project structure, networking, sorting your apps, and developing a prioritized migration strategy.

  • Establish Identities: Google Accounts, Service Accounts, Google Groups, Google Workspace Domains, Cloud Identity Domains
  • Design Resource Organization: Organizations, Folders and Projects
  • Define hierarchy: Environment-oriented, function oriented or granular access-oriented
  • Define groups and roles for resource access:
    • Org Admin: IAM policies
    • Network Admin: networks, subnetworks, Cloud Router, Cloud VPN, Cloud Load Balancing
    • Security Admin: IAM roles for projects, logs and resource visibility
    • Billing Admin: billing accounts, monitor resource usage
  • Design network topology / establish connectivity: Create VPC(s); connect via Cloud Interconnect, peering, Cloud VPN, or the public internet

Deploy

Design, implement and execute a deployment process to move workloads to Google Cloud. You might also have to refine your cloud infrastructure to deal with new needs.

  • Fully manual deployment: Do everything, from provisioning and configuration to deployments, manually
  • Configuration Management (CM) Tools: Deploy in an automated, repeatable way, at the cost of some added complexity
  • Container Orchestration: GKE to orchestrate workloads
  • Deployment Automation: CI/CD Pipeline to automate creation and deployment of artifacts
  • Infrastructure as Code (IaC): Terraform or Deployment Manager

Optimize

Begin to take full advantage of cloud-native technologies and capabilities to improve areas such as performance, scalability, disaster recovery, costs, and training, and to open the door to machine learning and artificial intelligence integrations for your app.

  • Build and train your team: Train deployment and operations teams on the new cloud environment.
  • Monitor everything: Cloud Logging and Cloud Functions, Prometheus, Cloud Monitoring alerting
  • Automate everything: Automate critical activities such as deployments, secrets exchanges, and configuration updates. Automating infrastructure with Cloud Composer and Automating Canary Analysis on Google Kubernetes Engine with Spinnaker are examples of automation on Google Cloud.
  • Codify everything: Infrastructure as Code and Policy as Code, you can make your environment fully auditable and repeatable
  • Use managed services instead of self-managed ones: Cloud SQL for MySQL instead of managing your own MySQL cluster, for example.
  • Optimize for performance and scalability: Compute Engine autoscaling groups, GKE cluster autoscaler, etc.
  • Reduce costs: analyze your billing reports to study your spending trends, etc.

More information can be found here: https://cloud.google.com/architecture/migration-to-gcp-getting-started

 

CI/CD is the combined practice of continuous integration (CI) and either continuous delivery or continuous deployment (CD). CI/CD is designed to bridge the gap between development and operations activities and teams by enforcing automation in the building, testing, and deployment of applications. Modern-day DevOps practices involve continuous development, continuous testing, continuous integration, continuous deployment, and continuous monitoring of software applications throughout their development life cycle. The CI/CD pipeline forms the backbone of modern-day DevOps operations.

The main goal of this automation pipeline is to be able to deploy your application into different environments, such as dev, QA, and production, without manual intervention. This automation reduces the risk of errors during deployment, reduces the hours needed to deploy code changes to multiple environments, and helps you deploy changes to development and QA environments more frequently, as soon as possible after changes are made. A CI/CD pipeline allows you to apply methods such as:

  • Version control of source code.
  • Automatic building, testing, and deployment of apps.
  • Environment isolation and separation from production.
  • Replicable procedures for environment setup.

Creating a CI/CD Pipeline on Google Cloud

Let's look at how you can set up a continuous integration/continuous deployment (CI/CD) pipeline for processing data by implementing CI/CD methods with managed products on Google Cloud. The main Google Cloud tool we will use to build the pipeline is Cloud Build. A typical CI/CD setup looks like this:

First, the developer checks the source code into GitHub (any version control system is fine). Next, GitHub triggers a post-commit hook to Cloud Build. Cloud Build then builds the container image and pushes it to Container Registry. Cloud Run is then notified to redeploy; it pulls the latest image from Container Registry and runs it.

To build a simple pipeline on Google Cloud using Cloud Run, we will go through the high-level steps:

  1. Create a Dockerfile with the necessary steps to build your container image, such as a line that installs Tomcat ("RUN wget https://apache.mirrors.nublue.co.uk/tomcat/tomcat-8/v8.5.54/bin/apache-tomcat-8.5.54.tar.gz") and steps that pull the application source code from your Git repo.
  2. Create a gcpbuild.yaml build config file that builds the Docker image, pushes the container image to Container Registry, and then deploys the image to Cloud Run (see the sketch after this list).
  3. Go to Cloud Build and connect your Git repository first.
  4. Now, create a trigger.
  5. Ensure Cloud Build has access to deploy to Cloud Run: go to the Cloud Build settings and enable the service account permission for Cloud Run.
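Here is a minimal sketch of what that build config might look like. The image name, service name, and region are placeholders, and the file assumes your repository contains the Dockerfile from step 1:

```yaml
# gcpbuild.yaml (illustrative): build, push, and deploy to Cloud Run.
steps:
# Build the container image from the Dockerfile in the repo root.
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA', '.']
# Push the image to Container Registry.
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA']
# Deploy the new image as a Cloud Run revision.
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: gcloud
  args: ['run', 'deploy', 'my-app',
         '--image', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA',
         '--region', 'us-central1', '--platform', 'managed']
images:
- 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
```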

Test Your CI / CD Pipeline

Now you are ready to test. Make a small change to your code and push it to your repository. This should trigger a build in Cloud Build.

After a little while, you'll see your new container image listed in Cloud Build, and finally you should be able to see your new service deployed in Cloud Run.

That's it! You have set up a simple CI/CD pipeline for automation on Google Cloud!

Let’s look at some of the high-level methods required to migrate your website from a monolithic platform to a container-based microservices platform on GCP. The goal is to migrate your application one feature at a time, avoiding a single large-scale migration methodology.

Our goal here is to make the website more agile and scalable for each of these individual features. Each of these features can now be independently managed and updated, leading to faster improvements for each migrated feature.

Why Microservices?

Let's look at some of the biggest advantages of microservice applications. Most of these advantages stem from the fact that each feature, or microservice, is loosely coupled to the others.

  • Microservices can be tested and deployed independently of one another. Typically, the smaller each deployment, the easier each deployment.
  • Each microservice can be written in its own language or framework. Because microservices communicate over a network via API calls, they do not all need to be written in the same language.
  • Microservices can also be tasked to different teams, making it easier to have a team dedicated to one or any related microservices.
  • Microservices teams have loosened dependencies on one another. Each team needs to focus on keeping the APIs it exposes to the other services consistent, but beyond that it does not need to worry about other teams' release cycles, how their services are implemented, and so on.
  • You can design more cleanly for failure. With clearer boundaries between your services, it can be easier to have a backup in place for that particular service.

Some of the disadvantages of microservices include:

  • The complexity of the design can increase, as your app is now an interconnection of microservices over a network.
  • Security concerns can arise because your services now talk over a network. Products like Istio were developed to address these issues.
  • Performance can take a hit, because data usually has to traverse a more complex route across the application's network of microservices.
  • System design can get more complicated, making understanding your application more difficult.

Migration Overview

Our goal is to get to a microservices environment from a single monolithic application. Let’s first take a look at the beginning and end of our journey.

First, the beginning of our journey is a monolithic website that runs on prem with dependencies on traditional database communications and the app servers running the application code. The important thing to note about the application is its design. Even if this were lift-and-shifted into the cloud on a GCP VM Instance, it would still be considered a monolithic application and the same migration principles would still apply.

After the application is fully migrated to a microservices framework, the end result would look something like this. Each service now runs as an independent microservice container on Google Kubernetes Engine (GKE); traditional databases have moved to a Cloud SQL model and content to a Cloud Storage model. Note that the application can still work with your data center via Cloud Interconnect or VPN for backend services like CRM. Cloud CDN is there to help you distribute cached content more efficiently to your customers, and Cloud Load Balancing is there to distribute the workload across resources in GCP. Apigee is a managed API gateway. Apigee is not required for this migration, but it's recommended that all of your site's content be served by public APIs, and an API gateway like Apigee provides many features for API management, such as quotas, versioning, and authentication.

Preparing your Google Cloud Environment

Before you begin your migration journey, it is important to have your GCP environment set up, with a strategy defined for how services will be accessed, deployed, and so on. Here are some of the major steps involved in getting your cloud environment set up:

  1. Set up your Google Cloud organization, the environment that will host your cloud resources. During this process you'll set up your Google Workspace and Cloud Identity.
  2. Design your Google Cloud policies for control of your cloud resources. This means setting things like network configuration, security controls, and organizational security controls to meet the requirements of your application.
  3. Design a method to deploy cloud resources, such as using Infrastructure as Code (IaC) to deploy the GKE clusters that will host your new microservices. Cloud Deployment Manager is a good tool for this and will give you standardized, reproducible, and auditable environments.
  4. Prepare your GKE environment for production and harden your cluster security. This includes deciding how your clusters will be load balanced across regions and disabling public endpoint access.
  5. Build your continuous integration/continuous delivery (CI/CD) tooling for Kubernetes. You can use Cloud Build to build your container images, and Container Registry to store them and to detect vulnerabilities.

Migration Step-by-step Approach

Each feature of the website should be migrated to the new environment one by one, creating microservices where it makes sense. These new microservices can call back to the legacy system when needed. The idea is to transform one major migration and refactoring project into several smaller projects. The advantages of migrating this way are:

  • The smaller projects have a more clearly defined scope and will be easier to get moving than one grand migration project. If we did it all at once, we'd need to get all the teams involved and have everyone understand all the interactions between systems, 3rd party dependencies, etc.
  • These smaller projects give us lots of flexibility. Smaller projects mean smaller teams, and they can be tackled one by one without anyone getting overwhelmed. You could also parallelize some of the work, leading to a faster migration.

Before you start migrating any particular feature to a microservice, it’s most important that you take into account dependencies between features and what relies on what. This will help you formulate a chronology of events as some features make more sense to migrate before others.

Let’s look at a shopping cart example and what the journey currently looks like within the monolithic application and the dependencies this makes us aware of:

  1. A user browses your site and clicks "Add to cart" on an item they like. This triggers an API call from their browser to the shopping-cart feature. This is the first dependency to pay attention to: the frontend acts on the shopping cart.
  2. When the shopping-cart feature receives the API call, it makes an API call to the system that handles stock. This is the second dependency: the shopping cart depends on the stock system.
  3. If the item is in stock, this fact is stored in a database, for example "user A has 1 instance of X in cart." This is the third dependency: the shopping cart needs a database to store this information.
  4. When the user finally checks out and pays, the shopping cart is queried by the payment subsystem to compute the total. This is the fourth dependency: the shopping cart is queried by the payment subsystem.

Taking these dependencies into consideration, our migration to a microservice would look like this:

  1. Create a new microservice that implements your shopping-cart API. Use Firestore to store the shopping cart data. Make sure this new microservice can call the stock subsystem (see dependencies 1 and 2 above).
  2. Create a script that can be rerun as needed that copies shopping carts from the legacy shopping-cart system and writes them to Firestore.
  3. Create a similar script that does the same thing in the other direction: it copies Firestore carts back to your legacy system, in case you need to roll back.
  4. Expose the shopping cart API with Apigee.
  5. Modify the frontend and payment subsystem so they call this new shopping cart microservice rather than the legacy one.
  6. Run the script from step 2.

You may also want to test this in a non-production website environment first and then replicate the results to the production system when you have it working correctly. Your shopping cart feature is now a microservice hosted on GCP.

For a deeper breakdown of this process, see https://cloud.google.com/architecture/migrating-a-monolithic-app-to-microservices-gke

 

 

Leveraging leadership and people management best practices like re:Work, Site Reliability Engineering, and BeyondCorp, Google developed a framework for adopting the cloud, known as the Google Cloud Adoption Framework. The framework addresses people, processes, and technology, allowing you to see where you are now in your cloud adoption journey and to get where you want to be. You can use the framework to assess your readiness for the cloud and identify what you will need to develop to get there.

For Google’s official look at the Cloud Adoption Framework, with a deeper dive on each of the components, see their whitepaper here: https://cloud.google.com/adoption-framework

Four Themes of Cloud Adoption

To bring your organization into the cloud, there are four themes you will need to excel in, regardless of your business objectives.

  1. Learn – The value and scale of the learning programs you have in place to enhance the skill set of your technical teams. It also refers to your ability to supplement your technical teams with the right partners.
  2. Lead – The degree to which your technical teams are supported by leadership in migrating to the cloud. Additionally, consider how cross-functional, collaborative, and self-motivated these teams are.
  3. Scale – The degree to which you use cloud-native services that reduce operational overhead and automate manual processes and policies.
  4. Secure – Your capacity to protect your cloud services from unauthorized access using a multilayered, identity-centric security model.

Three Phases of Each Theme

Each of the themes above will fall into one of these three phases.

  • Tactical: You have individual workloads in place but no solid plan bringing them all together with a strategy that builds out towards the future.
  • Strategic: You have a broader vision that brings together the individual workloads, which are designed and developed with a concern for future needs and scale.
  • Transformational: With your cloud operations now functioning smoothly, you are integrating the data and insights learned from working in the cloud.

The Cloud Maturity Scale

Once you have assessed what phase each theme falls into, you can begin to paint a picture of your cloud maturity. We can combine them as follows:

  • Tactical: Learn – self-taught, with reliance on 3rd parties; Lead – teams organized by function, with a central project manager; Scale – change is slow and risky, with operations still heavy; Secure – fear of the public internet, but trust in the private network.
  • Strategic: Learn – organized training, with assistance from 3rd parties; Lead – a new cross-functional cloud team; Scale – templates allow reliable governance without manual review; Secure – central identity with a hybrid network.
  • Transformational: Learn – peer learning and sharing; Lead – cross-functional feature teams with greater autonomy; Scale – all change is constant, low risk, and quickly fixed; Secure – trust only the right people, devices, and services.

 

Fine-tuning Your Direction with Epics

Once you have found your position on the maturity map above, you can start to plan a path forward. The scope and structure of the program you will use for cloud adoption can be broken into workstreams, which Google refers to as epics. Epics are designed not to overlap one another, are aligned to manageable groups of stakeholders, and can be further broken down into individual user stories.

In summary, these are the three components of the framework Google Cloud uses to help you get to the cloud:

  1. Three Maturity Phases applied to the
  2. Four Adoption Themes
  3. Epics

Anthos is a fully managed hybrid cloud platform that enables you to run Kubernetes clusters in the cloud and in on-premises environments. As an open cloud computing platform that works well with multi-cloud environments, it also works across public clouds such as Microsoft Azure and AWS. In other words, as you work to containerize and modernize your existing application environment, Anthos allows you to do this on-prem or across any cloud provider; it does not force you to use Google Cloud Platform to modernize.

History of Anthos

Because Anthos is a pretty broad product suite, it means different things to different people. Before we get any deeper into Anthos, let's take a journey through how Google got to Anthos. Over the last 10 years, there have been a number of fundamental technological shifts. These shifts have changed how we build our applications and the way they run in the cloud. Some of the fundamental technologies that have helped shape Anthos include:

  • Cgroups: 2006, developed by Google as an early containerization mechanism for Google's internal software.
  • Docker: 2013, built a lot of tooling around containers based on lessons learned from cgroups. It's used for deploying containerized software to one machine. As developers created more and more containers while monolithic software was being reorganized into microservices, it became hard to orchestrate all these containers.
  • Kubernetes: 2014, released by Google, leveraging what it had learned from running containers at scale. Kubernetes is used for deploying containerized software to multiple machines and is now the standard way to run containers at scale.
  • Istio: 2017, released to help manage services in production the same way Google does. It allows you to apply practices like Site Reliability Engineering.

Many of you say the cloud is here to stay, and an even larger portion believe multi-cloud (multiple cloud vendors being used) and hybrid (on-prem and in the cloud) are key and have multi-cloud plans, yet most companies' applications remain on-prem, with only a small fraction of workloads having been moved to the cloud. Many of you have also made big investments in your on-prem infrastructure and data centers and, if moving to the cloud, still want to leverage those on-prem investments as you move incrementally to the cloud. Additionally, for those of you who did move applications to the cloud, not all migrations were successful, and some of you had to roll back to your on-prem environments.

As you try to modernize your applications, pulling apart your monolithic services to make microservices out of them while moving to the cloud at the same time, things can get complicated. This is full of risk because you are reengineering your application, familiarizing yourself with new tools and processes, and trying to discover new cloud workflows all at once.

Introducing Anthos

Anthos was developed in 2019 to meet you where you currently stand, helping you modernize in place within your on-prem setup, in the data center, before you move to the cloud. That way, by the time you want to move to the cloud, things are already set; you've done the organizational work on-prem. Anthos was therefore built to work in the cloud as well as in the data center.


At a high level, Anthos is an application deployment and management tool for on-premises and multi-cloud setups. It accomplishes this while remaining entirely a software solution, with no hardware lock-in. From your standpoint, the infrastructure is abstracted away so you can focus on building your applications, not managing infrastructure. Because Anthos is built on open technologies, you can typically avoid vendor lock-in as well.

Anthos Tools

Anthos has different types of tools for different types of people within your organization. There are both open-source and hosted versions of many tool types as outlined in the table.

  • Developer: open-source version Knative; hosted version Cloud Run
  • Service Operator / SRE: open-source version Istio; hosted version Service Mesh (Istio on GKE)
  • Infrastructure Operator: open-source version Kubernetes; hosted version Kubernetes Engine (GKE)

When you work on-premises, the hosted versions of the software are brought to you via VMware vSphere.

Anthos Components

Let’s look at all the Anthos components:

When you first look at diagrams like this, you may be overwhelmed by the complexity of Anthos, but the more we delve into these components, the more familiar you will become. We will break this diagram down piece by piece to understand every component.

Kubernetes

Let's spend a moment looking back at Kubernetes. Applications built for Kubernetes are:

  • Container packaged (portable and predictable deployment with resource isolation)
  • Dynamically scheduled (Higher efficiency with lower operational costs)
  • Microservices oriented (Loosely coupled services that support independent upgrades)

Kubernetes uses the kubectl command to administer your cluster. kubectl communicates with the master via a set of APIs and tells it what to deploy: the required containers, the scaling configuration, and so on.
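To make this concrete, here is a minimal Deployment manifest of the kind you would hand to kubectl. The names and image are hypothetical; kubectl apply -f sends this desired state to the API server, and Kubernetes schedules containers to match it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web            # hypothetical workload name
spec:
  replicas: 3                # desired number of container replicas
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: hello-web
        image: gcr.io/my-project/hello-web:1.0   # hypothetical image
        ports:
        - containerPort: 8080
```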

Kubernetes Engine (GKE)

Kubernetes Engine was then developed as a managed Kubernetes platform that runs on GCP. It also includes:

  • Managed Kubernetes
  • Fast cluster creation
  • Automatic upgrade and repairing of nodes
  • Curated versions of Kubernetes and node OS

This made it much easier to run your Kubernetes clusters on GCP. With Kubernetes, it could take hours to spin up your infrastructure and get it to a Kubernetes-ready state. With GKE, there is an easy interface on the cloud console and after a few clicks, you will have a running Kubernetes cluster.

With GKE, you can still use kubectl to communicate with the master, but you are also given the cloud console to interface with the master from the cloud. This uses the same set of APIs to communicate with the Master as Kubernetes.

GKE On-Prem

Anthos introduces GKE On-Prem, which runs as an automated deployment on top of vSphere within your environment. This allows you to run not just Kubernetes on your infrastructure, but GKE in its entirety, giving you all the GKE benefits on-prem. It offers:

  • Benefits of GKE, plus
  • Automated deployment on vSphere
  • Easy upgrade to latest Kubernetes release
  • Integration with cloud hosted container ecosystem

When using GKE On-Prem, an admin workstation is introduced, and you use kubectl to talk to a new type of cluster called the admin cluster. The admin cluster is responsible for creating your user clusters for you, in your environment. You can also continue to use kubectl to interface directly with your clusters, as you did with Kubernetes and GKE.

GKE Hub (within the GKE Dashboard)

Even easier than continuing to use kubectl, you can manage your on-prem GKE and other Kubernetes environments from the Cloud Console via the GKE Hub. GKE On-Prem clusters are automatically registered upon creation, and the Hub gives you centralized management of hybrid and multi-cloud infrastructure and workloads. You get a single view of all clusters across your entire estate.

Now, if you refer back to the Anthos components diagram above, using what we know so far about Google Kubernetes Engine (GKE), GKE On-Prem, and the GKE Dashboard, we can start to build out the diagram from the ground up. Cloud Interconnect is added simply to allow the two environments to communicate.

Service Mesh

Let's continue to add to our diagram by discussing the service mesh. A service mesh provides a transparent and language-independent way to flexibly and easily automate application network functions. Another way to look at a service mesh is as a network designed for services rather than for bits of data. A traditional layer-3 network does not know which applications it carries and does not make decisions based on your application settings; a service mesh does.

The Istio service mesh is an open framework for connecting, securing, managing, and monitoring services, and it manages the interactions between services. The mesh deploys a proxy next to each service, which allows you to make smart decisions about routing traffic and enforcing security and encryption policies. This gives you:

  • Uniform observability
  • Operational agility
  • Policy-driven security

First, let's look at a typical service architecture before adding a service mesh:

The problem with this configuration is that there is no way to enforce security policies between services. For example, even though it was never designed to do so, the Pictures service is able to talk to the Auth service, which we do not want.

When deploying the Istio service mesh, a proxy is deployed alongside the original service image, turning it into an Istio-enabled service where communication to the image must go through the Istio proxy first:

With this deployment, no change in code is required: Kubernetes deploys the proxy next to the service, and the mesh makes smart decisions about how traffic is routed.

After the Istio service mesh is enabled for the service architecture, the end result looks like this. You can see that all services must now communicate through their proxies:

You can also use Istio for Traffic Control, Observability, Security, Fault-Injection and Hybrid Cloud.
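To illustrate the kind of policy the mesh makes possible, here is a sketch of an Istio AuthorizationPolicy that only allows the frontend to call the Auth service, so a call from the Pictures service would be denied. The namespace, labels, and service account names are assumptions about how the example workloads might be deployed:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: auth-allow-frontend-only
  namespace: default                  # assumed namespace for the demo services
spec:
  selector:
    matchLabels:
      app: auth                       # applies to the Auth service's pods
  action: ALLOW
  rules:
  - from:
    - source:
        # Only workloads running as the frontend service account may call Auth.
        principals: ["cluster.local/ns/default/sa/frontend"]
```

Because an ALLOW policy is now attached to the Auth workload, any request that does not match a rule, such as one from the Pictures service, is rejected by the sidecar proxy.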

Just as Kubernetes has been turned into a managed service in the cloud with GKE, Istio has also been released as Istio on GKE (Beta). Istio and Kubernetes are released independently, so they have independent versions. By using Istio on GKE, you are trusting Google to certify version compatibility between Istio and GKE: when there is a new version of Kubernetes, Google will release a matching version of Istio that has been tested and certified to work with it. Istio on GKE also includes additional adapters that plug in to other GCP products. Some of these adapters are:

  • Apigee adapter
  • Stackdriver (the destination to which Istio sends telemetry data from the Istio Mixer component) – this gives you observability into Istio.

To continue our diagram, we can now include the Service Mesh components:

Anthos Config Management

Anthos Config Management was designed to make it easy to apply your policies across this heterogeneous mix of multi-cloud and on-prem environments. Config Management lets you enforce guardrails for central IT governance. You can manage the configuration of the tools in all of your clusters in one place; you don't need a separate repository for on-prem and another for each of your cloud providers. It becomes a single, auditable source of truth. It allows you to:

  • Sync configs across clusters on-prem and in the cloud
  • Continuous enforcement of policies
  • Security and auditability through policy-as-code.

Because Config Management treats policy as code, you get:

  • A Git repository as the source of truth, into which your policies are checked
  • YAML applied to every cluster
  • Integration with your SCM
  • Pre-commit validation
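For example, a config checked into the repo might be nothing more than a namespace declaration that Config Management then creates on every enrolled cluster. This is a sketch; the repo path and namespace name are assumptions:

```yaml
# namespaces/payments/namespace.yaml (hypothetical path in the config repo)
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    env: prod        # illustrative label your policies could select on
```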

Updating our diagram again, we can now include Anthos Config Management:

Knative

Knative is Google's effort to bring serverless workloads to Kubernetes. Knative is an open-source serverless framework that builds, compiles, and packages code for serverless deployment, and then deploys the package. For some context, serverless computing is a cloud computing model in which the provider allocates machine resources on demand, taking care of the servers on your behalf. Serverless computing does not hold resources in volatile memory; computing is done in short bursts, with the results persisted to storage. When an app is not in use, no computing resources are allocated to the app.

Knative provides:

  • Building-blocks for serverless workloads on Kubernetes
  • Backed by Google, Pivotal, IBM, RedHat and SAP

Cloud Run

Remember that Knative is the open-source implementation of the serverless framework. Google has taken Knative and provided a managed version of it called Cloud Run. Cloud Run is a fully managed serverless product, compatible with Knative. It provides:

  • Stateless containers via HTTP(s) requests
  • Built-in domain handling
  • Scales to zero – or as high as you need

Cloud Run is fantastic for application developers: if you have a stateless component in your application, you can package it up as a container and give the container image to Cloud Run. Cloud Run will deploy it for you, give you a secure URL that you can map to your domain, and even scale it for you. You don't need to think about any of the rest; you just focus on your code. The contract with a Cloud Run container is simple:

  1. It is stateless.
  2. The port it needs to listen on is given to it as an environment variable.

Cloud Run can make the application modernization journey much easier, because at the end of the day your workload is just a container. That means you can write it in any language you want, use whatever tools and dependencies you desire, put them into a container, and Cloud Run handles the rest. So, in terms of bringing in legacy applications that are not running on the latest versions of Java, Python, COBOL, or whatever else, Cloud Run supports all of it; to Cloud Run, your app is just a binary in a container. It allows you to bring your applications along on the cloud journey while still working within your comfort zone, using the skill set you are most comfortable with.

Further, Cloud Run for Anthos integrates into the Anthos ecosystem. We mentioned that with Cloud Run, if you give it the container image, scaling is handled automatically. You may want more control than that, with scaling not fully automated. Cloud Run for Anthos lets you deploy your containers onto your own clusters, giving you the freedom to run Cloud Run anywhere, such as your on-prem environment. Let's review the iterations we've discussed:

  • Cloud Run – Fully managed and allows you to run the instance without concern for your cluster
  • Cloud Run for Anthos – Deploy into your GKE cluster, running serverless side-by-side with your existing workloads.
  • Knative – Using the same APIs and tooling, you can run this anywhere you run Kubernetes, allowing you to stay vendor neutral.
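Because all three share the Knative serving API, a single service manifest can describe the workload wherever it runs. A minimal sketch, with a hypothetical image name:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-service
spec:
  template:
    spec:
      containers:
      - image: gcr.io/my-project/hello:latest   # hypothetical container image
        # The platform injects the PORT environment variable at runtime;
        # the container must listen on it and keep no local state.
```

You could apply this with kubectl on a Knative-enabled cluster, or deploy the equivalent service with gcloud run on fully managed Cloud Run.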

Stackdriver

Discussed earlier when we reviewed the Istio adapters, Stackdriver is a tool for visibility. When you enable Istio on GKE, the logs and telemetry from your Kubernetes clusters are sent to Stackdriver. This one tool gives you logs across your entire estate. It provides:

  • Logging
  • Monitoring
  • Application Performance Management
  • Application Debugging (Link Stackdriver to source code)
  • Incident Management

Marketplace

There are times when you don't need to build everything yourself. You may have a specific job to do, and something like a standard Redis database would be perfect. The GCP Marketplace is a collaboration between Google and third-party vendors to certify software to run on GCP. One of the places you can run that software is your Kubernetes cluster, and with Anthos, those clusters can be on-prem or in the cloud.

https://console.cloud.google.com/marketplace

Now, finally, we can complete our diagram as we saw it from the beginning, with all of our components for Anthos:

We've added Marketplace and Stackdriver in the middle. Now we should be able to understand all of it. We have:

  • Google Kubernetes Engine and GKE On-Prem for container orchestration
  • Istio for security policies, traffic routing across the services in our estate
  • Anthos Config Management to make sure we can have a centralized place for governance and application policies and settings, keeping them consistent between on-prem and GCP.
  • Marketplace and Stackdriver to help us have a much better application experience

Migrate for Anthos

Up until now, everything we have discussed in Anthos has centered on applications running as containers. Many of you still deploy your applications as virtual machines, and if you have been looking to migrate to containers, the path is not necessarily straightforward. Migrate for Anthos literally takes your virtual machines and converts them into containers. This makes your container image much smaller than the virtual machine image, as you no longer carry the bulk of the operating system. Migrate also relieves you of the operating system security burden: once the virtual machine is containerized, OS security is handled on Google's hosted infrastructure. The automated approach extracts the critical application elements from the virtual machine so those elements can be inserted into containers in Google Kubernetes Engine or Anthos clusters, without the VM layers (like the guest OS) that become unnecessary with containers.

Use the discovery tool to determine which applications might be a good fit for migration, as not all applications can be migrated with this method.

Let's look at Cloud Deployment Manager, an infrastructure automation tool from Google Cloud. It is Google's Infrastructure as Code (IaC) offering and automates the creation and management of Google Cloud resources. Infrastructure as Code is the managing and provisioning of infrastructure through code instead of through manual processes, like provisioning virtual machine instances in the Cloud Console.

With IaC, configuration files are created that contain our infrastructure specifics, which makes it easier to edit and distribute configurations. It also ensures that we provision the same environment every time.

It is similar to the IaC offerings from Amazon (AWS CloudFormation) and Microsoft (Azure Resource Manager), and to the open-source, cloud-agnostic Terraform.

Cloud Deployment Manager only works within Google Cloud Platform and has many similarities to AWS CloudFormation templates, albeit as a less mature offering.

Deployment is a Unit

One important concept with Deployment Manager is that all of the resources are handled as a single unit, called a deployment. In the AWS world, with CloudFormation, this is known as a stack. For example, if your team's development environment needs two virtual machines (VMs) and a database, you would define these resources in a configuration file (the unit) and use Deployment Manager to create, change, or delete them together. You can make the configuration file part of your team's code repository, so that anyone can create the same environment with consistent results.

Configuration File

The configuration file we will use to describe our resources is a YAML file, such as vmcreator.yaml. This file has a resources section where we define all resources within the unit. Each resource must contain three components:

  1. name: A user-defined string to identify this resource, such as quickstart-deployment-vm.
  2. type: The type of resource being deployed, such as compute.v1.instance.
  3. properties: The parameters for this resource type. They must match the properties for the type, such as disks.

Here is an example YAML configuration file that creates a single VM instance:
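The sketch below is modeled on Google's Deployment Manager quickstart; the [MY_PROJECT] placeholder, zone, and boot image are illustrative.

```yaml
resources:
- name: quickstart-deployment-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-f
    machineType: https://www.googleapis.com/compute/v1/projects/[MY_PROJECT]/zones/us-central1-f/machineTypes/f1-micro
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        # Boot image family; any supported public image would work here.
        sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/family/debian-11
    networkInterfaces:
    - network: https://www.googleapis.com/compute/v1/projects/[MY_PROJECT]/global/networks/default
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
```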

Supported Resource Types and Properties

As noted, the type and properties within the YAML resource file need to use values supported by Google. To get a list of supported types, go to: https://cloud.google.com/deployment-manager/docs/configuration/supported-resource-types

This page shows a list of supported resource types. In the list, you will see that our example resource type, compute.v1.instance, is listed as a valid type.

If you click Documentation for the type, you will see the supported properties. Note our example property, disks, is listed as a supported property.

How to Deploy Resources

Let's look at deploying resources through the Google Cloud Shell. First, go to your Cloud Console and click Activate Cloud Shell.

Once the Cloud Shell has been activated, click Open Editor and, once it opens, click Open in a new window.

Create a new file and call it sample.yaml. Paste your YAML configuration into the file.

Click File -> Save

Now go back to the Cloud Shell, and run an ls to make sure you see your new file listed.

Next, find your project ID from the console dashboard as we will use it in our next command.

Now go back to the Cloud Shell and type gcloud config set project <your project ID> to set the active project.

To deploy our YAML file, type gcloud deployment-manager deployments create compute-gcs --config sample.yaml

The deployment should now be underway.

In the meantime, we can open Deployment Manager in the console and see that the deployment is in progress. Click the active deployment to view its status; this page will also list any errors, should they arise.

You will also notice the resource being created within the proper resource area, such as the virtual machine instance in Compute Engine.

 

Let’s look at the technology landscapes that are helping enterprises scale, adopt and modernize while they migrate to Google Cloud. We’ll review strategies and the tools to help you get there.

Cloud Migration Approach

Enterprises often consider migrations for an extended period of time as they weigh the implications of moving their workloads to a cloud environment. There is also a difference in approach between small companies and large companies, as they have different needs and migration paths. Let’s dissect this process.

Why Move to the Cloud?

The first question to ask during the cloud migration process is, why? Why are we going down this path? What is the business or technical goal you are looking to accomplish? Here are some common reasons why you might be moving to the cloud. You may want to:

  • Develop faster
  • Decommission workloads
  • Consolidate systems
  • Move VMs to cloud
  • Modernize by transforming into containers
  • Utilize the efficiencies and scaling the cloud offers.

These answers will vary greatly based on the size of your business and your business goals. Once the answer has been framed, you can begin to provide a clearer path on how to get from where your company is today to where you want to be.

What Are You Moving to The Cloud?

The next question to ask yourself is what? What do you have right now? For example, you will want to put together:

  • A catalog of all the applications you have
  • The workloads that you are thinking about moving
  • Network and Security Requirements

This will help further build your migration strategy moving forward. Many businesses think they will need to build a very complicated system diagram, showing all the connections taking place, or perhaps schematics showing how all of your applications will work in the cloud. However, for many large organizations that have grown organically, this will not be feasible. As the what phase starts, sometimes all that is needed is to sit down with a napkin or sheet of paper and write a general overview of the apps, workloads, etc. that you are looking to move. Later, when you need to expand this list, you will work with the various groups and lines of business you have to find other resources and applications that you are concerned about.

What Dependencies Do You Have?

As we expand this further, we need to get more specific about our dependencies and start making lists of such things as:

  • Dependencies on application stacks
  • Database and message brokers
  • Underlying infrastructure
  • Firewall and security rules
  • Source code repositories

Oftentimes, the “gotchas” happen because your business has grown organically, and business units pop out of the woodwork as the migration is moving along, saying things like, “I’m actually keeping my source code over here and not in the official repository.” Overall, the more comprehensive your evaluation is ahead of time, the fewer headaches you will have during the migration.

Does Moving Everything to the Cloud Always Make Sense?

Sometimes there are cases where moving something to the cloud is not practical or might not be technically feasible in the near term. For example, maybe you have licenses that you can’t move to the cloud, your technology stack may not be virtualizable, you have 3rd party frameworks in use, or you have mainframes that need to stay independent. In these cases, it is OK to say NO! Rather, you want to focus on what can be moved to the cloud. The last thing you want to do is force something into the cloud that doesn’t belong. Finally, you may also find that there is an interim path for one of your services that might not directly place it in the cloud, but aligns with your strategy. For example, if you are shutting down a datacenter whose design is too complicated to migrate to the cloud, you could move it to a co-location facility. This would allow you to gain some of the benefits, such as being closer to a cloud entry point, or getting the high throughput or low latency that you were looking for.

Choosing a Migration Path

There are a lot of ways to approach a cloud migration, such as an all-in lift-and-shift, a hybrid approach, or a mix of private and public cloud. The answer will depend on what you are looking to accomplish and where you are coming from. If you are coming from legacy applications and hardware, you will likely have a much different migration path than if you are already cloud-native and just looking to scale. There could also be a scenario where you have an aggressive deadline to shut down a datacenter and do not have time to modernize. In this case, you would likely want to lift-and-shift your datacenter to the cloud and worry about modernization or containerization strategies later, once the dust has settled.

Application Containerization Strategy

As a developer, you get a lot of freedom from containers: they let you package an app with all of its dependencies into a single, easy-to-move unit.

One of the major decisions during a cloud migration is whether to containerize your applications rather than bringing them back up in their own virtual machines. How do we know if an application is a good candidate for containerization? For example, you might have apps such as dev/test applications, multi-tier stacks, LAMP applications, or Java or web applications running on-premises. How do we know if these are good to containerize? There are a few questions we should ask.

  1. Is the app pre-packaged as a stand-alone binary or JAR file? Stand-alone binaries, such as EXE or JAR files, are easy to containerize, and Java JAR files are especially flexible because the JRE can stay within the container.
  2. Is the platform on which your app is built available in a containerized version or package yet?
  3. Are any of your 3rd party apps available in a container version yet?
  4. Is the app stateless?
  5. Is your application already part of continuous integration/continuous deployment pipeline?

This may still leave us wondering about monolithic applications (think of a monolithic SAP application), as many enterprises still use these. How might we convert these to a microservice environment that is more compatible with a containerization strategy? It may be possible to slowly break down the monolithic application into its constituent services for a microservices strategy. It is also possible to containerize the entire application into one application container. This would allow for some of the benefits, such as the fault tolerance and portability that containerization provides, without breaking down the monolithic application all at once.

Next, we want to look at what options are available to containerize your apps. In GCP, there are three main options: Google Kubernetes Engine (GKE), Cloud Run, and Compute Engine (GCE). Although the concepts underlying containers have been around for many years, Docker, Kubernetes, and a collection of products and best practices have emerged in the last few years. This has enabled many different types of applications to be containerized. The solutions for running containers in Google Cloud Platform vary in how much of the underlying infrastructure is exposed.

Google Kubernetes Engine (GKE)

As the inventor of Kubernetes, Google offers a fully managed Kubernetes service, taking care of scheduling and scaling your containers while monitoring their health and state. Getting your code to production on GKE can be as simple as creating a container deployment, with the cluster being provisioned on the fly. Once running, these GKE clusters are secure by default, highly available, and run on GCP’s high-speed network. They can also be targeted at zonal and regional locations, and use specific machine types with the option of adding GPUs or Tensor Processing Units (TPUs). GKE clusters can also provide auto-scaling, auto-repair of failing nodes, and automatic upgrades to the latest Kubernetes version. GKE is also a key player within Anthos, Google Cloud’s enterprise hybrid and multi-cloud platform. Using Anthos, you can even migrate existing VMs directly into containers and move workloads freely between on-prem and GCP.
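As a rough sketch of that workflow (the cluster name, zone, node count, and sample image below are illustrative), gcloud and kubectl do most of the heavy lifting:

    # create a small zonal cluster and fetch credentials for kubectl
    gcloud container clusters create my-cluster --zone us-central1-a --num-nodes 3
    gcloud container clusters get-credentials my-cluster --zone us-central1-a

    # deploy a container and expose it behind a network load balancer
    kubectl create deployment hello --image=gcr.io/google-samples/hello-app:1.0
    kubectl expose deployment hello --type=LoadBalancer --port 80 --target-port 8080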

Cloud Run

It is also possible to shift your focus to building stateless apps, rather than writing YAML files, and still deliver code packaged in a container. Cloud Run combines the benefits of containers and serverless. With Cloud Run, there is no cluster or infrastructure to provision or manage, and your stateless containers are automatically scaled. Creating a Cloud Run service with your container only requires a few simple fields, such as name and location, and choosing your authentication method. Cloud Run supports multiple requests per container and works with any language, library, binary, or base Docker image. The result is serverless with pay-for-usage, the ability to scale to zero (the service can be reduced to zero replicas when idle and brought back up when there is a request to serve), and out-of-the-box monitoring, logging, and error reporting. Because Cloud Run is built on Knative (offering a serverless abstraction on top of Kubernetes), you can have a dedicated private hosting environment and deploy the same container workload on Cloud Run for Anthos in GCP or on-prem.
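Deploying a container to Cloud Run comes down to a single command. In the sketch below, the service name and region are illustrative, and gcr.io/cloudrun/hello is Google's public sample image:

    # deploy a stateless container as a fully managed Cloud Run service
    gcloud run deploy hello \
      --image gcr.io/cloudrun/hello \
      --region us-central1 \
      --platform managed \
      --allow-unauthenticated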

Compute Engine (GCE)

It is also possible to use Google’s virtual machine environment to run your containers. This means using your existing workflow and tools without needing to master cloud-native technologies. When you create the GCE virtual machine, there is a container section that allows you to specify the image and other options the container will use. When you get to the boot disk section of setting up the VM, the suggested operating system is Container-Optimized OS, which is optimized for running Docker containers and comes with the Docker runtime preinstalled.
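The same flow is available from the command line. As a minimal sketch (the instance name, zone, and container image are illustrative), a container can be launched on a Container-Optimized OS VM like this:

    # create a VM running Container-Optimized OS with the given container
    gcloud compute instances create-with-container nginx-vm \
      --zone us-central1-a \
      --container-image docker.io/nginx:latest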

Google Container Registry (GCR)

Where do these container images come from, where do you store them, how are they versioned, and how is access to them restricted? The answer is Google Container Registry (GCR), a container registry running on GCP. It is possible to push, pull, and manage images in GCR from any system, VM instance, or hardware. You then use it to control who can access, view, and download those images. It is also possible to deploy to GKE, Cloud Run, or GCE right from the registry. GCR works with popular continuous delivery systems, such as Cloud Build, Spinnaker, or Jenkins, to automatically build containers on code or tag changes in your repository.
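A typical push to GCR looks like the following sketch, where PROJECT_ID and the image name are placeholders for your own values:

    # let Docker authenticate to gcr.io with your gcloud credentials
    gcloud auth configure-docker

    # build, tag, and push an image into your project's registry
    docker build -t gcr.io/PROJECT_ID/my-app:v1 .
    docker push gcr.io/PROJECT_ID/my-app:v1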

System Containers

There are situations where you may want to go all-in on application containers, but due to technical requirements, you may want to explore system containers. System containers are similar to virtual machines, as they share the kernel of the host operating system and provide user space isolation. However, system containers do not use hypervisors. (Any container that runs an OS is a system container.) They also allow you to install different libraries, languages, and databases. Services running in each container use resources that are assigned to just that container.

Migrate VMs to Compute Cloud

After we have an application migration strategy defined, it is time to start thinking about a machine migration path. Migrate for Compute Engine allows one or many workloads to be moved to GCE in a unified way. Migrate also provides cloud testing and validation, including a plug-in that can find workloads and move them over. There is also the possibility of a stateful rollback, so if at any point you feel you need to back out of the migration, you can roll back to the on-prem environment. This can give you time to pause and see what is going on with a migration. If you have a use case where you need to maintain a VMware-based control plane, there is also support for moving VMware and vSphere workloads to GCP.

Unify On-Prem Applications

Some applications will need to stay on-premises, yet you may still want to take advantage of cloud-native capabilities. Anthos can be used to manage your on-premises and your hybrid-cloud or multi-cloud environments in one place. Anthos includes GKE On-Prem, which allows you to implement a containerization strategy in your on-premises environments. For example, you might use Cisco HyperFlex and use Anthos to develop a strategy for on-premises and cloud together. This can be used to simplify monitoring, logging, and config management and still get access to all the benefits of being in the cloud. It’s as if Cisco HyperFlex were cloud-native, but the underlying infrastructure is still on-premises.

Where to Start with Google Cloud Migration?

  1. Start by looking at the migration guides found at https://cloud.google.com. These are great getting-started guides that can help you think through what is required to migrate your business.
  2. Google also offers professional migration services through its Professional Services Organization (PSO). For example, PSO can run workshops to help with your migration strategy.

 

Google Cloud Architecture

Role of the Google Cloud Architect

A Google Cloud architect is there to help customers leverage Google Cloud technologies. You need to have a deep understanding of cloud architecture and the Google Cloud Platform. You should be able to design, develop, and manage robust, secure, scalable, highly available, and dynamic cloud solutions to drive business objectives. You should be skilled in enterprise cloud strategy, solution strategy, and best architectural design practices. The architect is also experienced in software development methodologies and different software approaches that could span multi-cloud or hybrid environments.

Getting Started with Google Cloud Platform

To set up your own free account to follow along, go to https://cloud.google.com/ and click “Get Started for Free”. This will get you into the Cloud Console with some free credit to play around with different services.

Cloud Resources, Regions and Zones

Prior to getting too deep into Google Cloud Platform, it is important that we understand resources, regions, and zones. A resource is a physical asset, such as a physical server or hard drive, or a virtual asset, such as a virtual machine, sitting in one of Google’s datacenters. It is these resources that Google makes available to its customers, which you then use to create your own instances and so on.

As I mentioned, these resources are all somewhere in a datacenter, and where that somewhere is, we call a region. So think about it: if Google has a physical server in a datacenter over in New York, then regardless of where you might be accessing that resource from, it will be in an eastern US region. To further break up these regions (such as the eastern US region), we isolate by zones. For example, there are us-east1-a, us-east1-b, us-east1-c, etc. zones, all in the us-east1 region.

The point of making these regions and zones visible to the customer is so you can choose where to place things physically in the cloud, for redundancy in case of failure and reduced latency to your audience. Think about it: if Google never told you where the virtual machine you are creating was physically hosted, then maybe it would end up in China, near a flood plain. If you were a US customer, wouldn’t you want to be able to choose that your virtual machine is on a US server, and if it were also on a flood plain, keep a redundant copy of it in a separate, isolated zone as well?

The Scope of Resources

This distribution of resources also introduces some rules about how these resources can be used together. A resource’s scope determines what can access it. Some resources can be accessed by any other resource, across regions and zones. These global resources (the name makes sense, doesn’t it?) include addresses, disk images, disk snapshots, firewall rules, routes, and Cloud Interconnects.

Other resources can only be accessed by resources that are located in the same region. These are called regional resources and include static external IP addresses, subnets, and regional persistent disk.

Further, some resources can only be accessed by resources that are in the same zone, and these zonal resources include Virtual Machine Instances, machine types available in that zone, persistent disks and cloud TPUs.

So, if you have a zonal resource such as a persistent disk in us-east1-a, can you attach it to a VM instance over in us-east1-b? I’ll let you answer that. (Hint: no. The two VMs could still talk to each other over the network, but a zonal resource can only be used by other resources in the same zone.)

Finally, when an operation is performed, such as “hey, I’m going to go create a virtual machine instance”, that operation is scoped as global, regional, or zonal, depending on what it is creating. This is known as the scope of the operation. In our example, creating a VM instance is a zonal operation because VM instances are zonal resources. Make sense?

Google Cloud Platform Console

Projects

Look back at the https://console.cloud.google.com/ dashboard you opened earlier. Any GCP resource that you allocate and use must be placed into a project. Think of a project as the organizing entity for what you are building. To create a new project, just click the project drop-down and select ‘New Project’.

Each GCP project has a project name, project ID, and project number. As you work with GCP, you’ll use these identifiers in certain command lines and API calls.

Interacting with GCP via Command Line

Google Cloud SDK

To interact with GCP via the command line, install the Google Cloud SDK from https://cloud.google.com/sdk. The SDK includes gcloud (the primary CLI), gsutil (a Python tool for Cloud Storage), and bq (a Python tool for BigQuery).
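After installation, a quick way to confirm each tool works is to initialize the SDK and run a simple command with each one; the bucket name below is a hypothetical placeholder:

    # authenticate and pick a default project
    gcloud init

    # gcloud: list Compute Engine instances in the active project
    gcloud compute instances list

    # gsutil: list the contents of a Cloud Storage bucket
    gsutil ls gs://my-example-bucket

    # bq: run a trivial BigQuery query in standard SQL
    bq query --use_legacy_sql=false 'SELECT 1 AS ok'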

Google Cloud Shell

You can also access gcloud from Cloud Shell. Within the console, click Activate Cloud Shell in the upper right corner. This will provision an e2 virtual machine in the cloud that you can use from any browser to open a shell.

Google Cloud API

Google Cloud APIs are central to Google Cloud Platform, allowing you to access storage, instances, machine-learning-based tasks, and more from an API.

You can access cloud APIs in a few different ways:

  • From a server with Google client libraries installed.
  • Access for mobile applications via the Firebase SDKs
  • From the Google Cloud SDK or Google Cloud Console, using the gcloud CLI to manage authentication, local configuration, developer workflow, and interactions with Google Cloud APIs.

Google Cloud Platform Services

Let’s look at the most common Google Cloud services in use.

Computing & Hosting Services

Google Cloud’s computing and hosting services allow you to work in a serverless environment, use a managed application platform, leverage container technologies, and build your own cloud infrastructure to have the most control and flexibility. Common services include:

Virtual Machines

Google Cloud’s unmanaged compute service is called Compute Engine (virtual machines). It is GCP’s Infrastructure as a Service (IaaS) offering. The system provides you with robust compute infrastructure, but you must choose and configure the platform components that you want to use. It remains your responsibility to configure, administer, and monitor the machines. Google will ensure that resources are available, reliable, and ready for you to use, but it is up to you to provision and manage them. The advantage is that you have complete control of the systems with unlimited flexibility.


You use Compute Engine to:

  • Create Virtual Machines, called instances
  • Choose global regions and zones to deploy your resources
  • Choose what Operating System, deployment stack, framework, etc. you want
  • Create instances from public or private images
  • Deploy pre-configured software packages from Google Cloud Marketplace (for example, a LAMP stack with a few clicks)
  • Use GCP storage technologies, or use 3rd party services for storage
  • Use autoscaling to automatically scale capacity based on need
  • Attach/Detach disks as needed
  • Connect to instances via SSH
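For example, creating and connecting to an instance from the command line might look like the sketch below; the instance name, zone, machine type, and image family are illustrative:

    # create a small Debian VM in a single zone
    gcloud compute instances create my-instance \
      --zone us-central1-a \
      --machine-type e2-medium \
      --image-family debian-11 \
      --image-project debian-cloud

    # connect to it over SSH
    gcloud compute ssh my-instance --zone us-central1-a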

Application Integration

Google Cloud’s application integration service is called App Engine. This is GCP’s Platform as a Service (PaaS) offering. App Engine handles most of the management of the resources on your behalf. For example, if your application requires more computing resources because traffic to your site increases, GCP will automatically scale the system to provide those resources. Additionally, let’s say the software required a security update. This is also handled for you automatically by GCP.


When you build your app on App Engine:

  • You can build your app in several languages including Go, Java, Node.js, PHP, Python, or Ruby
  • Use pre-configured runtimes or use custom runtimes for any language
  • GCP will manage app hosting, scaling, monitoring and infrastructure for you
  • Connect with Google Cloud storage products
  • Use GCP storage technologies or any 3rd party storage
  • Connect to Redis databases or host 3rd party databases like MongoDB and Cassandra.
  • Use Web Security Scanner to identify security vulnerabilities.
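A minimal App Engine standard deployment needs little more than an app.yaml describing the runtime. The runtime below is illustrative; you would pick whichever supported language your app uses:

    # app.yaml - minimal App Engine standard configuration
    runtime: python311

    # then, from the directory containing app.yaml and your code:
    #   gcloud app create   (one-time, to create the App Engine app)
    #   gcloud app deploy   (builds and deploys the service)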

Containers

With container-based computing, you can focus on your application code rather than infrastructure deployments and integration. Google Kubernetes Engine (GKE) is GCP’s Containers as a Service (CaaS) offering. Kubernetes is open source and allows flexibility of on-premises, hybrid and public cloud infrastructure.

GKE allows you to:

  • Create and manage groups of Compute Engine instances running Kubernetes, called clusters.
  • GKE uses Compute Engine instances as nodes in the cluster
  • Each node runs the Docker runtime, a Kubernetes node agent that monitors the health of the node and a network proxy
  • Declare the requirements for your Docker containers by utilizing simple declarative config files (JSON or YAML), as in the manifest sketch after this list
  • Use Google’s Container Registry for management of Docker images
  • Create an external network load balancer
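Such a declaration is typically a small manifest like the sketch below (the names and the sample image are illustrative); applying it with kubectl apply -f creates the Deployment on the cluster:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: hello-app
      template:
        metadata:
          labels:
            app: hello-app
        spec:
          containers:
          - name: hello-app
            # Google's public sample container, which listens on port 8080
            image: gcr.io/google-samples/hello-app:1.0
            ports:
            - containerPort: 8080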

Serverless Computing

GCP’s serverless computing offering is known as Cloud Functions. It is Google Cloud’s Functions as a Service (FaaS) offering. It’s an environment for building and connecting cloud services. Cloud Functions allows you to write simple, single-purpose functions that are attached to events produced by your cloud infrastructure and services. A function is triggered when an event being watched fires. The code within the function then executes in a fully managed environment (no infrastructure to manage or configure). Cloud Functions are typically written using the JavaScript (Node.js), Python 3, Go, or Java runtimes in GCP, and the function runs in the corresponding Node.js, Python 3, Go, or Java environment.
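Deploying a function is a single command run from the directory containing your source. In this sketch, the function name, runtime, and entry point are illustrative (the source is assumed to contain a function named hello_http):

    # deploy an HTTP-triggered function from the current directory
    gcloud functions deploy hello-http \
      --runtime python311 \
      --entry-point hello_http \
      --trigger-http \
      --allow-unauthenticated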

You do not have to stick with just one type of computing service. Feel free to combine App Engine and Compute Engine, for example, to take advantage of the features and benefits of both.

Storage Services

Regardless of your application or requirement, you’ll likely need to store some media files, backups, or other file-like objects. Google Cloud provides a variety of storage services.

Cloud Storage

Cloud Storage gives you consistent, reliable, and large-capacity data storage. Cloud Storage lets you select Standard storage for frequently accessed data, Nearline for low-cost archival storage, or Coldline for even lower-cost archival storage. Finally, there is the Archive class for the absolute lowest-cost archival storage, which is ideal for backup and recovery or for data you intend to access less than once a year.
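Choosing a storage class can be as simple as specifying it when the bucket is created. The bucket name, class, and location below are illustrative:

    # create a Nearline bucket for low-cost archival storage
    gsutil mb -c nearline -l us-central1 gs://my-archive-bucket

    # copy a backup into it
    gsutil cp backup.tar.gz gs://my-archive-bucket/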

Persistent Disks on Compute Engine

Persistent Disk on Compute Engine is used as the primary store for your instances. It is offered both as hard-disk-based persistent disks (called “standard persistent disks”) and as solid-state (SSD) persistent disks.


Filestore

You can get fully managed Network Attached Storage (NAS) with Filestore. You can use Filestore instances to store data from applications running on Compute Engine VM instances or GKE clusters.

Database Services

GCP provides a variety of SQL and NoSQL database services. The first is Cloud SQL, which is a SQL database for MySQL or PostgreSQL databases. Next, Cloud Spanner is a fully managed, mission-critical relational database service. It offers transactional consistency at global scale, schemas, SQL, querying, and automatic synchronous replication for high availability. Finally, there are two main options for NoSQL Data storage: Firestore and Cloud Bigtable. Firestore is for document-like data and Cloud BigTable is for tabular data.
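For example, a small managed PostgreSQL instance can be created with a single command. The instance name, version, tier, and region below are illustrative:

    # create a shared-core Cloud SQL for PostgreSQL instance
    gcloud sql instances create my-postgres \
      --database-version POSTGRES_14 \
      --tier db-f1-micro \
      --region us-central1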

You could also set up your preferred database technology on Compute Engine running in a VM instance.

Networking Services

These are the common services under Google’s networking services.

Virtual Private Cloud

Google’s Virtual Private Cloud (VPC) provides networking functionality to Compute Engine virtual machine (VM) instances, GKE, and the App Engine environment. VPC provides networking for cloud-based services and is global, scalable, and flexible. One key feature of a VPC is that it is global: a single VPC can be deployed across multiple regions without communicating over the public internet. It is also shareable, in that a single VPC can be shared across an entire organization. Teams can be isolated within projects with separate billing and quotas, yet still be within the same shared private IP space.

With a VPC, you can:

  • Set firewall rules to govern traffic coming into instances on a network.
  • Implement routes to have more advanced networking functions such as VPNs
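For instance, a firewall rule allowing inbound HTTP traffic can be created as in this sketch; the rule name, network, and source range are illustrative:

    # allow inbound HTTP from anywhere to instances on the default network
    gcloud compute firewall-rules create allow-http \
      --network default \
      --allow tcp:80 \
      --source-ranges 0.0.0.0/0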

Load Balancing

If your website or application is running on compute engine, the time might come when you are ready to distribute the workload across multiple instances.

With Network Load Balancing, you can distribute traffic among server instances in the same region, based on incoming IP protocol data such as address, port, or protocol. Network Load Balancing can be a great solution if, for example, you want to meet the demands of increasing traffic to your website.

With HTTP(S) Load Balancing you can distribute traffic across regions, so you can ensure that requests are routed to the closest region or, if there were a failure (or capacity limitation), fail over to a healthy instance in the next closest region. You could also use HTTP(S) Load Balancing to distribute traffic based on content type. For example, you might set up your servers to deliver static content such as media and images from one server, and any dynamic content from a different server.

Cloud DNS

You can publish and maintain DNS records by using the same infrastructure Google uses. You can do this via the Cloud Console, command line, or REST APIs to work with managed zones and DNS records.

Advanced Connectivity

If you have an existing network that you want to connect to Google Cloud resources, Google Cloud offers the following advanced connectivity options:

  • Cloud Interconnect: This allows you to connect your existing network to your VPC network through a highly available, low-latency, enterprise-grade connection.
  • Cloud VPN: This enables you to connect your existing network to your VPC network via an IPsec connection. You could also use this to connect two VPN gateways to each other.
  • Direct Peering: Enables you to exchange traffic between your business network and Google at one of Google’s broad-reaching edge network locations.
  • Carrier Peering: Enables you to connect your infrastructure to Google’s network edge through highly available, lower-latency connections by using supported service providers.

Big Data and Machine Learning

Let’s look at some of Google’s commonly used Big Data and Machine learning technologies.

Data Analysis

First, there is data analysis. GCP has BigQuery for implementing data analysis tasks. BigQuery is an enterprise data warehouse, providing analysis services. With BigQuery you can organize data into datasets and tables, load data from a variety of sources, query massive datasets quickly, and manage and protect your data. BigQuery ML enables you to build and operationalize ML models on planet-scale data directly inside BigQuery using simple SQL. BigQuery BI Engine is a fast, in-memory analysis service for BigQuery that allows you to analyze large and complex datasets interactively with sub-second query response time.
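Queries can be run straight from the bq command-line tool. This sketch uses one of Google's public datasets; the query itself is illustrative:

    # top baby names by total count from a BigQuery public dataset
    bq query --use_legacy_sql=false \
      'SELECT name, SUM(number) AS total
       FROM `bigquery-public-data.usa_names.usa_1910_2013`
       GROUP BY name
       ORDER BY total DESC
       LIMIT 10'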

Batch and Streaming Data Processing

For batch and streaming data processing, Dataflow is a managed service for executing a wide variety of data processing patterns. It also provides a set of SDKs that you can use to perform batch and streaming processing tasks. It works well for high-volume computation, especially when tasks can be clearly divided into parallel workloads.

Machine Learning

The AI Platform offers a variety of machine learning (ML) services. You can use APIs to provide pre-trained models optimized for specific applications, or build and train your own large-scale, sophisticated models using a managed TensorFlow framework. There are a variety of Machine Learning APIs you can use:

  • Video Intelligence API: Video analysis technology
  • Speech-to-Text: Transcribe audio from recordings or a microphone in several languages
  • Cloud Vision: Integrate optical character recognition (OCR) and tagging of explicit content.
  • Cloud Natural Language API: This is for sentiment analysis, entity analysis, entity-sentiment analysis, content classification and syntax analysis.
  • Cloud Translation: Translate source text into any of over a hundred supported languages.
  • Dialogflow: Build conversational interfaces for websites, mobile apps, and messaging platforms.

 

 

Infrastructure Modernization with Google Cloud

New businesses that have developed entirely in the cloud are rightfully challenging older business models. Scale is no longer a competitive advantage, but rather has become commonplace. Many legacy organizations are aware of this threat from newer organizations, and it is leading to digital disruption. These legacy organizations want to know:

  • “How do we best respond to this threat?”
  • “How can we survive and thrive in this new cloud era?”

Central to the ability to thrive in this era is the way in which IT resources are structured and utilized. For example, this could mean moving from a model where we invest resources to run and maintain existing IT infrastructure to one focused on creating higher-value products and services. Within the cloud, we can develop and build new applications to create better engagement with customers and employees (while being faster, more secure, and more scalable). Leveraging cloud technologies to really transform our business requires new collaboration, changes to culture and processes, and enabling team productivity and innovation. When we move to the cloud, we also see significant financial benefit, including a shift from paying for fixed capacity to paying only for what we use.

For many of us, Infrastructure Modernization is the foundation for digital transformation.


Modernizing IT Infrastructure

For most organizations, owning and operating IT infrastructure on-premises does not differentiate the business; rather, it can be a burden that limits our staff in several different ways. For example, we have to take on arduous tasks related to infrastructure procurement, provisioning, and maintenance. Scale also cannot easily be added, as we are locked into what we have on-premises, and we are forced to over-pay for performance to plan for peak usage. One way we can solve this is to outsource our IT resources and migrate to the cloud.

Co-Location: how did we make our resources more efficient before the cloud?

Traditionally, we housed all our IT infrastructure within our own buildings, which meant we had to pay for real estate, security, and staff to operate and maintain the equipment. Prior to the cloud, we could make this model more efficient by sharing data centers with other companies in what is known as co-location. Here, one company sets up a large data center and other businesses, including ours, rent part of that data center. This meant we no longer had to pay the costs of hosting our own facility, but we still needed to pay to maintain our equipment.


With both the on-premises and co-location models, the value and benefits only begin well after substantial amounts of capital expenditure have been dedicated.

Virtual Machines: how did we better utilize our hardware?

Even in the co-location model, our hardware servers were often heavily underutilized, so we started to package operating systems and applications into virtual machines (VMs). VMs share the same pool of physical processors, storage, and network resources. This became a more efficient and manageable way to utilize our hardware. Most of us remember using VMs to maximize our hardware, regardless of whether we were on-premises or co-located. The problem, however, is that there is still a physical cap related to the capacity of the physical servers at the location, and there is still much upfront capital expenditure.

Infrastructure as a Service (IaaS)

Many of us are now outsourcing our infrastructure entirely. We are growing our services to deliver products to customers both regionally and sometimes globally, and therefore we need to scale securely and quickly. If we were to continue to use on-premises or co-location models, this would be very expensive. Why own our datacenters if we can outsource to a public cloud that offers Infrastructure as a Service (IaaS)? Now we no longer have the overhead of physical equipment and datacenters to maintain, and we can shift our costs from capital expenditure to operational expenditure. Public cloud providers such as Google Cloud provide several services to help us modernize our infrastructure. We can choose to move all or just some of our infrastructure away from physical data centers to virtualized datacenters in the cloud. Google Cloud provides compute, storage, and network resources in a way that is familiar to how we handled them in our physical datacenters. Cloud service providers such as Google now also handle maintenance work. With this shift to outsourcing the physical costs and maintenance of our datacenters, we now have money to focus on the processes and applications that move our business forward.

Platform as a Service (PaaS)

Outsourcing physical resources to IaaS gives us plenty of flexibility but requires our teams to continue managing operating systems and application security, such as web application security. If we want an even more managed service, cloud service providers offer Platform as a Service (PaaS). In this case, we no longer manage the infrastructure, and in some cases we only pay for what we use.