Automating Data Transfer Between Cloud Storage Buckets on Google Cloud Platform

Discover how to streamline your data management by automating the transfer of data between Cloud Storage buckets on the Google Cloud Platform (GCP) using Cloud Functions and Cloud Pub/Sub.

Introduction

In a world increasingly driven by data, efficient management of data storage and transfer is paramount, especially for organizations leveraging cloud solutions like Google Cloud Platform (GCP). This article provides a comprehensive guide on automating data transfer between Cloud Storage buckets in GCP, a common task that can be simplified using Cloud Functions and Cloud Pub/Sub for improved data handling and operational continuity.

Understanding the Scenario

Let’s consider a situation where an organization requires regular transfer of newly uploaded data from one Cloud Storage bucket to another for processing or backup purposes. Manual handling of this process can be time-consuming and prone to human error, necessitating an automated solution.

Setting up the Environment

Before we dive into the solution, ensure that you have a Google Cloud Platform account and the gcloud command-line tool installed and configured. Additionally, create two Cloud Storage buckets (source and destination).

  1. Log into your GCP console.
  2. Navigate to Cloud Storage and create two buckets: source-bucket and destination-bucket.

Automating Data Transfer with Cloud Functions

The automation process involves creating a Cloud Function triggered by Cloud Pub/Sub to detect when new files are uploaded to the source bucket and subsequently initiate a transfer to the destination bucket.

Step 1: Setting up Cloud Pub/Sub Notification for the Source Bucket

First, create a Cloud Pub/Sub topic that the Cloud Function will subscribe to:

gcloud pubsub topics create my-topic

Then, configure the source bucket to send notifications to this topic:

gsutil notification create -t my-topic -f json gs://source-bucket

Step 2: Creating the Cloud Function

Navigate to the Cloud Functions section in the GCP console and create a new function with the following settings:

  • Name: transfer-data-function
  • Trigger: Cloud Pub/Sub
  • Topic: my-topic
  • Runtime: Python 3.7

In the inline editor, paste the following Python code:


import base64
import json

from google.cloud import storage


def transfer_data(event, context):
    """Triggered by a Pub/Sub message sent by the Cloud Storage notification."""
    # Initialize the GCP Storage client
    storage_client = storage.Client()

    # The Pub/Sub message data is a base64-encoded JSON payload describing the object
    file_data = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    bucket_name = file_data['bucket']
    file_name = file_data['name']

    source_bucket = storage_client.bucket(bucket_name)
    destination_bucket = storage_client.bucket('destination-bucket')

    # Copy the file from the source bucket to the destination bucket
    source_blob = source_bucket.blob(file_name)
    source_bucket.copy_blob(source_blob, destination_bucket, file_name)

    print(f"Transferred {file_name} from {bucket_name} to destination-bucket.")

Deploy the function by clicking “Deploy”.
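
Before deploying, you can also sanity-check the function by invoking it locally with a simulated Pub/Sub event. The snippet below is a rough sketch: it assumes the code above is saved as main.py, that myfile.txt already exists in source-bucket, and that Application Default Credentials with access to both buckets are configured (remember that the deployed function also needs google-cloud-storage listed in its requirements.txt).

import base64
import json

from main import transfer_data  # assumes the function above lives in main.py

# Build a fake event that mimics the Pub/Sub notification payload from Cloud Storage
payload = {"bucket": "source-bucket", "name": "myfile.txt"}
event = {"data": base64.b64encode(json.dumps(payload).encode("utf-8"))}

# Invoke the function directly; the context argument is unused here
transfer_data(event, None)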

Testing the Solution

To test the automated data transfer, upload a file to the source bucket:

gsutil cp myfile.txt gs://source-bucket

Once uploaded, the Cloud Function will automatically be triggered, and the file should be copied to the destination bucket shortly. Verify the transfer by listing the contents of the destination bucket:

gsutil ls gs://destination-bucket

If the setup was successful, you will see myfile.txt listed in the destination bucket.

Conclusion

Automating data transfer between Cloud Storage buckets on the Google Cloud Platform simplifies data management, reduces the potential for human error, and enhances operational efficiency. This guide has demonstrated how to leverage Cloud Functions and Cloud Pub/Sub to achieve seamless data transfers. By customizing and expanding upon this solution, organizations can significantly improve their data handling processes.


Let’s cover some basics that will help clarify how to succeed in the RRK portion of the Google interview.

The Role-Related Knowledge (RRK) portion of the interview reflects Google’s interest in how your individual strengths combine with your experience to drive impact. Google is not just looking at how you can contribute today, but how you can grow into different roles, including ones that haven’t even been invented yet.

Make sure to review your job description and ensure you have studied up on the items in it. A good way to put this together is to:

  • Highlight Keywords / Phrases
  • Create a Streamlined Overview
  • Compile / Create Practice Interview Questions
    • How do you identify risk in a program?
    • Tell me about a time you identified risk in a program.

Here are some other key items to think about regarding how you will be evaluated during the RRK portion of the interview process.

  • Individual Strengths – Tie your strengths to the specific job you are interviewing for. The strengths component shows up at the open-ended, high-level, generic level: try to lift your strengths out of the specifics of your career and talk about them at a higher level.
  • Experience to Drive Impact – When you think about your examples, the more directly an example correlates with the position, the better. But make sure to focus on your greatest strengths.
  • Combine – How do you stay consistent? Pull the generic content out of your examples and tie it back to the position you are interviewing for to show role alignment. This can seem high-level and theoretical, but combining the behavioral with the high-level generic will give you better success.
  • Future Contribution – Google doesn’t just look for how you can contribute today. For all roles at Google, you must pass the hiring committee. To determine whether you get hired, the committee will look at your fit not only for the role but also as a contributor for the future; this is constantly discussed in hiring committees. You can help demonstrate this by how you approach vague, open-ended questions (questions that may seem to have nothing to do with anything).
  • Growing into your role – Do you have a growth mindset? You can show this by talking about:
    • How you develop your skills
    • How you work hard
    • How you learn from mistakes
    • How you embrace challenges
    • How you welcome feedback
    • How you celebrate other people’s success

 

 

The two main types of modern databases to choose from are relational and non-relational, also known as SQL or NoSQL (for their query languages). There are a few main differences to be familiar with when deciding which database works best for your needs.

Relational Databases (also known as RDBMSs or SQL databases)

Relational databases (RDBMSs) have been around for roughly 45 years. They have historically worked well, when data structures were simpler and more static. In a relational database, you are required to define your schema before adding data to the database. Relational databases are table-based and were built at a time when data was mostly structured and clearly defined by its relationships.

Examples include MySQL, Microsoft SQL Server, Oracle, and PostgreSQL.
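
To make the schema-first requirement concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for any relational engine: the table’s columns and types must be declared before any rows can be inserted.

import sqlite3

# In a relational database the schema is defined up front
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "  id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  email TEXT"
    ")"
)

# Rows must match the declared columns and types
conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)",
             ("Ada Lovelace", "ada@example.com"))
conn.commit()

for row in conn.execute("SELECT id, name, email FROM customers"):
    print(row)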

NoSQL

As informational and big data applications advanced, traditional relational or SQL-based databases couldn’t really handle rapidly expanding data volumes and the growing complexity of data structures. Over the past 15 years, non-relational NoSQL databases have become more popular, offering a more flexible, scalable, cost-efficient alternative to traditional SQL-based relational databases.

NoSQL databases feature dynamic schema, and allow you to use what’s known as “unstructured data.” This means you can build your application without having to first define the schema. Not needing a predefined schema makes NoSQL databases much easier to update as data and requirements change. Changing the schema structure in a relational database can be extremely expensive, time-consuming, and often involve downtime or service interruptions. NoSQL databases can be document based, graph databases, key-value pairs, or wide-column stores. NoSQL databases are designed to handle the more complex, unstructured data, (such as texts, social media posts, photos, videos, email) which increasingly make up much of the data that exists today.

Relational databases are vertically scalable but typically expensive. Since they require a single server to host the entire database, in order to scale, you need to buy a bigger, more expensive server. Scaling a NoSQL database is much cheaper, compared to a relational database, because you can add capacity by scaling horizontally over cheap, commodity servers.

Google Cloud Datastore is a highly scalable, low-latency NoSQL database. It is built on top of Bigtable and Google Megastore. It provides the scalability of a NoSQL database along with features of a relational database, offering both strong consistency guarantees and high availability.
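
As a brief illustration of that schema flexibility, here is a hedged sketch using the google-cloud-datastore client (it assumes a project with Datastore enabled and Application Default Credentials configured; the Task kind and its fields are made up for illustration):

from google.cloud import datastore

client = datastore.Client()

# Create an entity of kind "Task"; no schema has to be declared beforehand
task = datastore.Entity(key=client.key("Task"))
task.update({
    "description": "Review DR plan",
    "done": False,
    "priority": 4,
})
client.put(task)  # the key is assigned an ID on write

# Read the entity back by its key
fetched = client.get(task.key)
print(fetched["description"], fetched["priority"])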

SQL/NoSQL architectures

  • Document databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or objects, and their structures typically align with objects developers are working with in code. Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general-purpose database. They can horizontally scale-out to accommodate large data volumes. MongoDB is consistently ranked as the world’s most popular NoSQL database according to DB-engines and is an example of a document database.
  • Key-value databases are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don’t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Redis and DynamoDB are popular key-value databases.
  • Wide-column stores store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.
  • Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.
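
To make these four models concrete, here is a small, library-free Python sketch of how the same user record might be shaped under each architecture (purely illustrative; real systems such as MongoDB, Redis, Cassandra, and Neo4j each have their own APIs):

# Document model: one self-contained, JSON-like object per entity
document = {
    "_id": "user:42",
    "name": "Ada",
    "interests": ["climbing", "piano"],
    "address": {"city": "Cambridge", "country": "US"},
}

# Key-value model: an opaque value looked up by its key
key_value_store = {
    "user:42": '{"name": "Ada", "city": "Cambridge"}',
}

# Wide-column model: rows keyed by ID, and each row can have different columns
wide_column_table = {
    "user:42": {"name": "Ada", "city": "Cambridge"},
    "user:43": {"name": "Linus", "signup_source": "referral"},
}

# Graph model: nodes plus edges describing relationships between them
nodes = {"user:42": {"name": "Ada"}, "user:43": {"name": "Linus"}}
edges = [("user:42", "FOLLOWS", "user:43")]

print(document["address"]["city"], key_value_store["user:42"], edges[0][1])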

Big Data

Big data is a term used to describe collections of data that are huge in size and still growing exponentially over time. Examples of big data analytics sources include stock exchanges, social media sites, jet engines, etc. It is natural to host a big data infrastructure in the cloud, because the cloud provides effectively unlimited data storage and easy options for highly parallelized big data processing and analysis.

GCP Platform provides multiple services that support big data storage and analysis. Possibly the most important is BigQuery, a high-performance SQL-compatible engine that can perform analysis on very large data volumes in seconds. GCP provides several other services, including Dataflow, Dataproc and Data Fusion, to help you create a complete cloud-based big data infrastructure.
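
As a small sketch of what querying BigQuery looks like from Python (it assumes a GCP project with the BigQuery API enabled and Application Default Credentials configured; the query runs against one of Google’s public datasets):

from google.cloud import bigquery

client = bigquery.Client()

# Ask BigQuery for the most common first names in a public dataset
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

for row in client.query(query).result():
    print(row.name, row.total)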

 

GCP already has a robust and highly available architecture (24 regions and 73 zones). Compute Engine implements an abstraction layer between the zones and the physical clusters of machines within the data centers, where each cluster of physical machines has its own independent software, power, cooling, network, and security infrastructure, as well as its own computing and storage resources.

We will review the Compute Engine and Cloud Database (Cloud SQL) high availability features available.

Google Compute Engine High Availability

When designing your Compute Engine applications, make them tolerant of errors, network failures, and unexpected disasters. This will help minimize failures within your application. Designed properly, your applications should handle errors gracefully, for example by redirecting traffic from a failed instance to an active instance, and should be able to resume automatically after a restart.

To design for high availability, create virtual machine instances across at least two availability zones located in two regions. This ensures that even if a zone or an entire region goes down, your application can continue working. If all of your instances are hosted in the same zone or region, your application will not be resilient to failure of Google infrastructure.

Instance Groups

GCP allows you to put your instances in instance groups. An instance group is a set of instances that serve a common purpose and are designed to be used with load balancers to route traffic between the instances. Instance groups also provide:

  • Autoscaling – automatically scale the number of VM instances in the group as demand (load) increases
  • Autohealing – if a VM instance is unhealthy, automatically recreate it
  • Support for multiple zones – you can create instance groups across zones in the same region

Load Balancing

Managed load balancing helps handle high volumes of traffic and can keep a particular VM instance from getting overloaded. The load balancer provides:

  • Network Load Balancing: With forwarding rules, you can deploy applications across multiple regions (using regional instance groups) and have rules that distribute traffic to all VMs in that region. Each forwarding rule can use just one external IP address, so your users always appear to be accessing the same site.
  • Global Load Balancing: With HTTP(S) Load Balancing you can distribute traffic across regions, ensuring that requests are routed to the closest region or, if there is a failure (or capacity limitation), failed over to a healthy instance in the next closest region. You could also use HTTP(S) Load Balancing to distribute traffic based on content type. For example, you might set up your servers to deliver static content such as media and images from one server, and any dynamic content from a different server.

Startup and Shutdown Scripts

Virtual machine instances can have startup and shutdown scripts associated with them that run when the instance starts or stops. You could use such a script to back up data or install software. Shutdown scripts run on a best-effort basis whenever an instance is stopped or restarted, even if the shutdown was not planned by you.

The scripts can be used to bootstrap an instance and to shut it down cleanly. Instead of using a custom image to configure the instance, you can use a startup script. After each restart, the startup script runs and can be used to install or update software and to ensure the appropriate services are running.

The shutdown script can perform important actions like closing connections, saving state of transactions and backing up data.

Cloud SQL High Availability

Google Cloud SQL is a managed relational database service that supports database engines including SQL Server, MySQL and PostgreSQL, and can connect with most applications. It provides backup and replication capabilities for high availability.

You can create a highly available Cloud SQL instance in two types of locations:

  • Regional location—specific geographical locations, such as New York.
  • Multiregional location—an extended geographic area that includes two or more geographic locations, such as the United States.

The only difference between regional and multiregional locations is for backup purposes. A multiregional instance can save backups in multiple regions for higher resiliency.

 

To determine who gets hired, Google interviewers use a scientifically proven method called “structured interviewing,” where interviewers prepare a list of rigorous and relevant questions, and then come up with a scoring rubric to match those questions. This is a bit of a different method from typical job interviews in that instead of asking questions catered specifically to your resume, the same set of questions are used to assess every candidate interviewing for the same job. When drafting questions, interviewers must take into account Google’s four core attributes:

  • General cognitive ability: How a candidate thinks – smart people who can learn and adapt quickly
  • Leadership: Leadership skills – people who can step into leadership roles, but also know when to step back once the need for their skills has passed
  • Googleyness: Intellectual curiosity. People who show signs of comfort with ambiguity and have a collaborative nature
  • Role-related knowledge: People who have the experience and background for the specific job they’re applying for

General Cognitive Ability (0:45 Minutes Total)

Let’s discuss how a candidate thinks. After your resume screen, you’ll often be invited to a GCA, or General Cognitive Ability, interview at Google or over the phone. Although this may sound like convoluted terminology, it’s really just an assessment of how you break down complex problems and come up with thoughtful solutions. A GCA interview is:

  • An assessment of your problem-solving skills: how you use reason, rationale, and data to solve complex issues
  • Insight into your work style
  • An opportunity to talk through situations you may face at Google, or that other Googlers have faced in the past

In the past, Google used to ask questions like, “How many golf balls could fit in a 747?” Google no longer asks these types of questions and have instead moved towards hypothetical or behavioral questions.

Introductions (0:05 Minutes)

The typical question here is, “Tell me about yourself.” Break this into three sections: Present, Past and Future. Here is how I would answer that question.

Present: I’ve been working as a Sr. Customer Engineer at Microsoft. Recently, I’ve been working with customers to identify cloud adoption opportunities for their endpoint management strategies (such as SCCM to Intune) and focus on endpoint security and policy management. I strive to maintain a technical relationship where I help map and translate our product offerings to their current business objectives. As I work with a customer I’ll provide proofs of concept, technical deep dives, presentations, and workshops, and often help them implement solutions.

I’ve been designing deliveries as well, and have put together a number of projects around migrations to Intune. I also built an assessment tool, using different APIs, that assesses Azure tenants, backs them up, and provides insights on mergers. This tool and its associated deliveries generate around $20M in revenue annually.

I’m about to graduate with my master’s in Information Management Systems from Harvard’s Extension School. My current GPA is 3.7 and graduation is set for 2022.

Past: I started with a passion for computers around age 8, and recall some of my favorite movies as a child being Pirates of Silicon Valley and Sneakers. I’ve always enjoyed dynamic and challenging situations in life, and have been pushing myself since a young age. While staying primarily in IT, I have continued to pursue my other interests and have been a professional photographer (traveling to North Korea and dozens of other countries), an auto mechanic, a pianist, a stand-up comedian, and a rock climbing instructor in New Zealand. I have always had a genuine passion for solving problems and find my current position one of the most fulfilling things I have ever done. My clients often say that my interactions with them are some of the most beneficial they have ever had at Microsoft, and I’ve heard them attribute that to my passion, empathy, knowledge, and humbleness.

Future: Looking forward, I strive to be in dynamic and challenging environments where we can really make an impact on customers. I’ve had quite a few opportunities to discuss Google’s culture with Googlers over the past year (a good friend who is a TAM and a fellow Harvard student who is a CE), and this interview is no coincidence given what I learned. It sounds to me that these roles encourage plenty of dynamic thinking in challenging and changing environments, and that is something I have excelled at over the course of my career.

From the introduction, the GCA interview is broken up into two parts:

Part 1: Behavioral (0:10 minutes)

These are past behaviors and assessing those past behaviors. A sample behavioral interview question could be:

  • “Tell me about a time when you led a team.”
  • “Tell me about a time when you communicated effectively.”
  • “Tell me about a time when you failed.”
  • “Tell me about a time when you received negative feedback.”
  • “Can you share your experience of working in an uncertain situation?”

Here are some tips on behavioral questions that might be asked, what you might speak to, and some examples of how to answer them. Make sure to come prepared with at least one example of each, and use the S.T.A.R. method (which stands for situation, task, action, result) to describe your experience.

  • Q: “Describe a time you took on a project risk and failed?” Speak about experiences where you took risks, made mistakes, and failed. They want to know if you were humble enough to accept and learn from those mistakes. You certainly don’t have to be perfect. Your life doesn’t have to be defined by experiences of success alone. So don’t be hesitant to reveal past failures. What matters is whether you learned from your failure.
    A: My very first solo client at Microsoft. Eager to come into Microsoft and show my worth after training had finished, I took on a high-profile client in downtown Los Angeles. The client was looking for solutions to decrease downtime in their endpoint management application. As soon as I understood their problem, I sat with their engineering team and began to write automation scripts to run in their production environment that would resolve one of their largest bottlenecks. Everything seemed to go smoothly, and the client was impressed with how quickly I solved their issue. A week later I received a follow-up call saying the script had stopped an important production migration task and forced them into over half a day of downtime. I worked with the client to resolve the issue and set aside time with them to move the script to a test environment.
    Result: I was so convinced my script was solid that I neglected to even suggest testing it in a dev environment. This experience taught me humility with my very first client and reminded me of the importance of putting the customer’s needs before my ego. From then on, I have been much clearer with customers about understanding risks and planning for contingencies, regardless of how solid a plan may seem.
  • Q: “Tell us about a time you executed a new task or project?” Speak about a past project that involved significant learning. If you found yourself in a situation where you successfully completed a project despite being unaware of certain functionalities at the start, mention your approach, how you implemented your learnings, and how you went about successfully completing it.
    A: Moving my customer deliveries from on-prem technology solutions to cloud-native solutions. When I joined Microsoft, I had little to no cloud technology knowledge and was hired to develop and deliver content for our on-prem endpoint management technology. Although I was successful in this role, I knew Microsoft was eagerly adopting a cloud-first strategy, and I wanted to grow with Microsoft’s vision and reinvent myself for cloud-native solutions such as application modernization in Azure, infrastructure modernization in Azure, and cloud endpoint management technologies like Intune. This required me to essentially start from scratch; I worked toward cloud-native accreditations and had shifted to 100% cloud-native deliveries in under a year.

    A2: Working for a large movie studio in LA that was having great difficulty merging with another organization, I developed a tenant merger and assessment tool from scratch that helped them with a huge Azure merger. Faced with the difficulty my client was experiencing during a multi-billion-dollar acquisition, I took it upon myself to understand their issues with an Azure tenant merger and built an assessment and migration tool.
    Result: The subsequent engagement was not only successful for my client, but went on to generate ~$18M in revenue annually as a sold consulting engagement, using this tool for other companies facing similar issues.

  • Q: “What are three things you consider important to maintain work-life balance?” Speak about three productive activities that bring you joy every day, and why you’d like to do them despite having a busy work day. Talk about what you expect from your company to keep yourself energized and motivated when you come in the next day.
    A: Prioritizing my time, maintaining personal health, and remaining uniquely myself. It is important to know your limits and prioritize time for yourself and your family; I think of it as a budget of time. Personal health, physical and mental, is also important because it helps me maintain boundaries where I can focus on work and life. For example, if I am plagued with unresolved personal or work problems, work and personal life will blend as those issues permeate my consciousness. Finally, although it is important to adopt the company’s attitudes and mission, remaining who I am at my core every day gives me a sense of balance that makes work feel less at odds with my personal self, and therefore makes it easier to keep a healthy boundary.
    Result: Throughout my career, I have been able to maintain a healthy work-life balance. My trueness to character has kept me in good spirits and fun to work with, my physical and mental health has guided me through rough patches, and good time management assures I am still meeting my career and life goals.
  • Q: “Tell us about an accomplishment you’re particularly proud of and why?” Speak about a past project that involved dedicated efforts from other members of your team and how you achieved goals as a team.
    A: Winning back a large Canadian company for Microsoft. A Microsoft account manager reached out to tell me she was about to lose a major contract with one of our “Canadian Big Bets” due to a number of botched prior engagements and failed cloud migration initiatives. I took on the challenge and brought in previous delivery engineers and the account manager to understand the customer and their issues. We identified several problems, including our delivery style, their unique environment, a tarnished Microsoft image within the company, and failed engagements caused by personal health issues that had forced one of the previous engineers to cancel a delivery more than once. Using this information, I put together a strategy to win back the company that showcased our willingness at Microsoft to admit our faults, listen closely to their concerns once again, and get the appropriate resources assigned.
    Result: Not only did my initiative win back the company’s trust, they also agreed to renew their annual DSE contract and let the account manager know their renewed trust was due specifically to my handling of their account.

    A2: Winning over a large government account on the West Coast. After a successful Intune migration delivery with a large government account, the project’s lead encouraged Microsoft to have me work with them on a long-term contract for a large-scale (30K clients) Windows 10 migration to the cloud. This included identity, security, deployment, application modernization, monitoring/logging, and scaling.
    Result: Not only did I work with them to successfully migrate the 30K clients to Azure, I was also asked to return the subsequent year for a continued cloud migration project that involved refactoring their applications for Azure.

  • Q: “How are you going to impact the team after you join?” Speak about fun and interesting activities that you’d like to be part of or initiate.
    A: I will be curious to learn more about how the CE space functions and how I can contribute to our team’s success. I’d be willing to work hard to spot areas of potential improvement and do whatever it takes to make the customer successful. I would look for ways to contribute in a dynamic and fast-paced environment where change is a constant way to accelerate. I’d love to learn more about Google’s diverse culture and add diversity in thought and experience as a fundamental mover in my role.
  • Q: “Have you ever faced conflict while working as a team? If so, how did you deal with it?” Speak about how you were able to resolve a past conflict with a fellow colleague, and how you arrived at a consensus for the greater good of the project.
    A: I had a hunch an account manager at Microsoft did not like my delivery style with our customer, so I followed up to find a solution. I took my account manager out to lunch and told him it would be OK to tell me candidly how I was handling the account. He did open up and told me the customer, although they liked me, was looking for me to provide more technical feedback and guidance on the migration of an SCCM site into a new datacenter. I continued working with the account manager and eventually ended up seeking a new engineer to take over the account. I worked with the engineer, customer, and account manager to get everyone on the same page and to assure the customer had what they were looking for.
    Result: The transition to the new engineer went very smoothly, and the customer specifically reached out to thank me for how professional and courteous I was in helping them find a replacement for myself without anyone asking. They built an excellent relationship with the new engineer and continued to bring me on for several other projects where my skills were relevant.
  • Q: “Why do you want to work at Google?” Speak about your favorite Google products and what you like best about Google’s work culture.
    A: I want to join Google based on my own discovery while looking for the right role. I have spoken to many people at Google over the past couple of years, specifically narrowing in on a Customer or Sales Engineer role. One thing I especially like about what I have learned about Google is the need to solve customer problems in dynamic and challenging environments. I have worked especially well in these situations; however, some companies have a more mature cloud adoption strategy where the approach to delivery is more formalized and determinate. I am looking for a role where quick-witted, dynamic thinking in an exciting and challenging environment is rewarded, and I believe Google is the right path for me to foster this ability.

Part 2: Hypothetical / Situational (0:20 minutes)

These are questions that are assessing real-life situations you may actually face at Google. The goal of these types of questions is to see how you:

  • Understand the question – you are often given too much or too little information. Google wants to make sure you understand the core and central issue. This is your ability to get through the noise and get to the core issue at hand.
  • Prepare a strategy – with the information given, are you able to thoughtfully parse through that information and formulate a coherent, dynamic response.
  • Identify a solution(s) – these responses are often open ended, and there is no right or wrong way to answer GCA questions.
  • Determine justification for a solution – how can this solution be justified?
  • Communicate – how well are you able to communicate your solution to the interviewer?

Strong Response Framework

Let’s look at a framework for building a really strong response. This framework does not have to be prescriptive, but it should give you a general sense of how to tackle a GCA question. Some questions may require all elements of the framework, while others only require a couple.

How to build a strong response

  • Take a moment before responding – Write down the question as it’s being asked. It’s fine to ask, “Can you repeat the question?” or “Can I have a moment before responding?”
  • Ask clarifying questions – Typically, Google will give you too little or too much information. Ask the interviewer enough questions to effectively answer the question.
  • Share logical assumptions – Because you don’t have enough information, make logical leaps that work for your response.
  • Show your work – Communicate your thought process to the interviewer.
  • Consider pros and cons – or think about how you would measure success.
  • Tie it back to the role – These questions are often related to the role you are applying for, so tie your answer back to the role when you can.

Sample Questions

Let’s look at an actual question from a prior interview. We will use a combination of GCA goals and framework elements to answer the question.

“Imagine you are in charge of organizing the grand opening event of a new Google office. How would you plan this event?”

  • Ask clarifying questions:
    • “Where is the new office?” – Cambridge
    • “Is there a budget?” – You can decide. There was a similar event last year in NYC and their budget was $50k.
    • “How many people are attending?” – 100 people
  • Share logical assumptions:
    • “I will assume there is a facilities team onsite to help me organize this event.”
    • “I’ll also assume that the objective of this event is to welcome new Googlers to the Cambridge office.”
    • “Since NYC is larger than Cambridge, I’ll assume this grand opening will be smaller and we will only need a $10k budget.”
  • Show your work:
    • “I am assuming we have a budget of $100 per person.”
    • “I’m also assuming I will have the capacity to coordinate with someone in the Cambridge office.”
    • “I will also assume all logistical needs can be met by local vendors.”
  • Communicate your solution: “I would recommend the following steps to plan the grand opening of the Cambridge office…”
    1. First, I’m assuming there is a facilities team on-site to assist with this project and that all logistical efforts can be supported internally. I would reach out to that team to begin planning and to assess whether I will have to use outside vendors.
    2. Second, I’ll assume we have a budget of $100 per person; therefore, I know I have ample budget for food, drinks, and décor.
    3. Third, because the objective of the event is to welcome new Googlers to the office, I would ensure that I’m inclusive of all Googlers in my planning. I’d try to bring in local food vendors or find some fun activities for Nooglers and their families to do.
    4. Finally, I would collect experience feedback from the NYC event and incorporate it into my planning.
  • Consider pros and cons, or how you would measure success: “To measure success, I would…”
    1. Send out a post-event survey to the attendees to measure impact against the intended objective.
    2. Make sure the event stayed within budget.
    3. Record the number of attendees.
    4. Ask whether my project plan will be used to plan future grand openings.

Let’s look at other hypothetical questions that were used previously at Google:

  • “Imagine you’re working on an email product and a competitor starts charging a $5 monthly fee for their product. How would you assess the situation and what recommendation would you make to your team?”
  • “Imagine you’re working in a new company and you realize there is a dashboard showing the business metrics, but no one uses it. How will you convince your colleagues to use the dashboard and which communication tools will you use?”
  • “Imagine you’re working in a new company and you discover the employees are using online documentation, yet your department still receives 20% of calls. How can you decrease this by 10% and how will you measure the results?”
  • “Imagine you’re working in a new company, and you discover they have 80% satisfied customers. How can you increase this to 90%?”
  • “Imagine you are working with a new customer. How will you help your customer make a choice between IaaS and PaaS?”
  • “Imagine you are working with a sales rep who has a new customer. What steps would you take if the sales rep requests a deep dive on containers for that customer?”
  • “Imagine you are working with a new customer. What steps would you take to guide your customer if they want to develop an app and use GCP products?”

How would you answer these?

Candidate Evaluation – The Scoring Rubric

As you answer the questions, the interviewer will use the structured interviewing scoring rubric to assess you on the following items. As this is an internal process, I am making some assumptions about what the rubric may look like:

Each item is scored on a scale of 1–10 and summed into a total score:

  • How well did the candidate understand the question, including the basic problem?
  • How well did they ask clarifying questions?
  • What relevant information, stakeholders, and variables were considered?
  • Did the candidate identify multiple solutions?
  • Were they able to reasonably justify why their solution was the best option?
  • Did the candidate listen to and incorporate any feedback/hints from probing questions?

Wrap-Up (0:05 Minutes)

Here, you’ll just wrap up the interview and wish each other a nice weekend, etc.

We will look at the creation of a Migration Factory – a scaled team (often offshore or outsourced) that drives large-scale migration of enterprise applications to the cloud. Google Cloud has a four-stage approach to migration – Discover/Assess, Plan, Migrate, and Optimize – and the Migration Factory is designed to help execute the Migrate stage. We also discussed these four migration phases in Migrate Enterprise Workloads to Google Cloud.

You should have a high-level understanding of the concepts discussed in the Google Cloud Adoption Framework, and a desire to migrate a large number of workloads to Google Cloud (on the order of hundreds of applications or more, or thousands of servers or more).

Overview

Many of you are looking to GCP to solve your on-premises infrastructure challenges. These could be capacity constraints, aging hardware, or reliability issues; or alternatively, you may be looking to capitalize on the value that cloud infrastructure can bring – saving money through automatic scaling, or deriving business value from large scale, cloud-native approaches to data processing and analytics.

With that said, moving to the cloud can be a complex and time-consuming journey. An inefficient migration program can significantly reduce the benefits realized from the migration, and a pure lift-and-shift approach can leave you with similar challenges and costs in the cloud as you were trying to escape from on-premises.

If you have already started this journey, you might find it harder than expected – with more than half of migration projects being delayed or over budget. Some typical challenges are:

  • Unclear goals
  • Lack of sponsorship
  • Poor planning
  • Wrong technology choice
  • Delivery capability and operating model

Migration Approach

Google Cloud Adoption Framework

When migrating to Google Cloud, it is recommended to use the Google Cloud Adoption Framework when establishing the foundational aspects of a cloud migration program. Let’s review some of that again here.

There are three components of the framework Google Cloud uses to help you get to the cloud:

  1. Three Maturity Phases (applied to the Four Adoption Themes)
    1. Tactical – You have individual workloads in place but no solid plan bringing them all together with a strategy that builds toward the future.
    2. Strategic – You have a broader vision that brings together the individual workloads, which are designed and developed with a concern for future needs and scale.
    3. Transformational – With your cloud operations now functioning smoothly, you are integrating the data and insights gained from working in the cloud.
  2. Four Adoption Themes
    1. Learn – The value and scale of the learning programs you have in place to enhance the skill set of your technical teams. It also refers to your ability to supplement your technical teams with the right partners.
    2. Lead – The degree to which your technical teams are supported by leadership to migrate to the cloud. Additionally, consider how cross-functional, collaborative, and self-motivated these teams are.
    3. Scale – The degree to which you will use cloud-native services that reduce operational overhead and automate manual processes and policies.
    4. Secure – Your capacity to protect your cloud services from unauthorized access using a multilayered, identity-centric security model.
  3. Epics
    1. The scope and structure of the program you will use for cloud adoption can be broken into workstreams, which Google refers to as epics. Epics are designed not to overlap one another, are aligned to manageable groups of stakeholders, and can be further broken down into individual user stories.

Migration Journey

Once you have assessed your migration journey with the Cloud Adoption Framework, part of that framework is to assess your cloud maturity. This will help you build a migration path, such as the Migration Factory described below.

Let’s review again what some of the migration paths are, which we also outlined in Migrate Enterprise Workloads to Google Cloud.

  • Lift-and-shift (rehost): “Moving out of a data center” – In a lift-and-shift migration, you move workloads from a source environment to a target environment with minor or no modifications or refactoring.
  • Improve and move (Replatform): “Application Modernization” – In an improve and move migration, you modernize the workload while migrating it. In this type of migration, you modify the workloads to take advantage of cloud-native capabilities, not just to make them work in the new environment.
  • Rip and replace (Refactor): “Building in and for the Cloud” – In a rip and replace migration, you decommission an existing app and completely redesign and rewrite it as a cloud-native app.

Combining cloud migration types with the Cloud Adoption Strategy maturity phases, you could summarize an approach for migrating each of your workloads as follows:

  • Tactical – Approach: Lift and Shift (Rehost). Business objective: optimize costs, minimize IT disruption, and achieve a scalable, secure platform. Effort: low.
  • Strategic – Approach: Improve and Move (Replatform). Business objective: maximize business value and optimize IT operations. Effort: medium.
  • Transformational – Approach: Rip and Replace (Refactor). Business objective: IT as a center of business innovation. Effort: high.

The path you take for each of your applications will differ depending on your overall strategy. Generally, large organizations lift and shift 70-80% of their workloads initially, focusing their transformation efforts on the areas where they can maximize impact (e.g., moving a data warehouse to BigQuery, or refactoring an e-commerce platform for scale).

Migration Phases

Looking again at the four migration phases we discussed in Migrate Enterprise Workloads to Google Cloud, the goal with a cloud migration is to get from point A (where you are now, on-prem) to point B (in the cloud).

The journey from A to B moves through the four phases: Discover/Assess, Plan, Migrate, and Optimize.

These phases can be used to build a migration approach that is an agile, scalable pipeline of workload migration.

There is typically an initial sprint, or series of sprints (a sprint being a short, time-boxed period when a scrum team works to complete a set amount of work), iterating through the Discover & Assess and Plan phases in order to build a business case and a plan for the overall program. Subsequently, you can build waves of workload migrations, which progress through migration using a sprint-based approach.

In a large-scale migration program, it’s recommended the migration sprints are managed through a Migration Factory.

Migration Factory

The migration factory is conceptualized to address the challenge of executing a large migration program and deliver a scalable approach aligned to the Google Cloud Adoption Framework in order to:

  • Migrate and manage large volumes of systems and applications
  • Initiate and drive new, cloud-native ways of working
  • Establish a new collaborative, joint teamwork model within IT and the business

Very similar to the initial sprints, the factory is a combination of the Scrum sprint methodology and the Cloud Adoption Framework. It is especially well suited for large-scale migrations (500+ servers and 200+ applications) taking a Lift and Shift (Rehost) or Improve and Move (Replatform) approach.

The best way to think about the factory is as an iterative approach to the framework.

The migration factory is not a good fit when either the number of migrated workloads is too small to justify the effort building the factory or the migration approach is too individual by workload to establish an overarching holistic process.

Testing the factory

It’s important to schedule and execute some test-runs of the fully established factory including the team, the process, and all tools. Pick a couple of test cases/workloads and execute a test migration. It is recommended to repeat this exercise a couple of times until the end-to-end flow works as expected, with the predicted migration velocity and quality.

Establishing a Migration Factory

The migration factory can be divided into the three pillars of process, people, and technology; at Google, these are underpinned by the four themes of the Cloud Adoption Framework, as outlined earlier.

  • Processes are the activities that people carry out, applying their knowledge.
  • People are the foundation: they are the source of knowledge and the actors who carry out the processes.
  • Technology streamlines people and processes to develop and accomplish the desired output.

Process

Each migration factory should follow a well-defined end-to-end process. To establish this, it’s important to analyze all possible migration tasks for all workloads in scope.

  • Tasks and Sub-Processes: An end-to-end process can have more than 100 individual process tasks in total. Individual tasks might have additional sub-processes and activities which should be analyzed, defined, and documented.
  • Automation and economies of scale: The individual tasks are the right level of detail to start looking for automation opportunities

People

Based on an understanding of the end-to-end migration process and the total migration scope, there are two important considerations: What expertise/which teams are needed to run the process, and what is the target for migration velocity/overall scale of the program?

  • Dedicated migration teams: Each team/domain should provide the right amount of skilled people and dedicate them to the migration factory. 100% dedication and assignment are strongly recommended.
  • Team Capacity Planning: As individuals might get sick or be on vacation it’s essential to plan enough spare capacity
  • Team Orchestration: This individual or team will oversee the process per individual migration workload, coordinate and initiate the individual tasks, manage the timely feedback, and provide regular status updates back to the migration dashboard.

Technology

There are a large number of technical tools to help to migrate workloads.

  • Migration management and communication tools: A Project Management tool must be used as the single source of truth for the whole team to understand what process steps have already been completed, what’s in progress, and who needs to take the next action.
  • Migration Execution Tools: Examples include Cloud Foundation Toolkit, Migrate for Compute Engine, Google BigQuery data transfer service, and CFT Scorecard.

Service-interrupting events can and will happen in your environment. Your network could have an outage, your latest application push might introduce a critical bug, or you might someday have to deal with a natural disaster. When things go wrong, it’s important to have a robust, targeted, and well-tested DR plan for your resources in Google Cloud.

DR Planning Fundamentals

Disaster Recovery (DR) is a subset of business continuity planning. The starting point for a DR plan can be simplified by analyzing the business impact of two important metrics:

  • Recovery Time Objective (RTO) is the maximum length of time you find it acceptable for your application to be offline. Your RTO value is typically defined as part of your service level agreement (SLA).
  • Recovery Point Objective (RPO) is the maximum length of time during which you find it acceptable for your application to lose data due to an incident, i.e., the maximum acceptable age of your last backup or replica.

In most scenarios, the shorter the RTO and RPO values, the more your application will cost to run. Let’s look at the ratio of cost to RTO/RPO.

Business impact analysis for business continuity: Recovery time requirements

As these smaller RTO and RPO values typically lead to greater complexity, the correlated administrative overhead follows a similar curve. A high-availability application might require you to manage distribution between two physically separated data centers, manage replication, etc.

It’s likely that you are also considering and planning for high availability (HA). HA doesn’t entirely overlap with DR, but it’s important to take HA into account when you’re planning for RTO and RPO values. HA helps to ensure an agreed level of operational performance, usually uptime, for a higher-than-normal period.

Google Cloud in Relation to RTO and RPO

GCP can often reduce the costs associated with meeting your RTO and RPO compared to the costs of doing so on-premises.

On-premises DR planning forces you to account for the following requirements:

  • Capacity: securing enough resources to scale as needed.
  • Security: providing physical security to protect assets.
  • Network infrastructure: including software components such as firewalls and load balancers.
  • Support: making available skilled technicians to perform maintenance and to address issues.
  • Bandwidth: planning suitable bandwidth for peak load.
  • Facilities: ensuring physical infrastructure, including equipment and power.

Google Cloud, as a highly managed solution, can help you bypass many of these on-premises requirements, removing many of the costs from your cloud DR design.

GCP offers several features that are relevant to DR planning, including:

  • Global network: Google backbone network uses advanced software-defined networking and edge-caching services
  • Redundancy: Multiple points of presence (PoPs) across the globe.
  • Scalability: App Engine, Compute Engine autoscalers, and Datastore give you automatic scaling
  • Security: The site reliability engineering teams at Google help ensure high availability and prevent abuse of platform resources.
  • Compliance: Google undergoes regular independent third-party audits to verify that Google Cloud is in alignment with security, privacy, and compliance regulations and best practices.

The Three Types of Disaster Recovery Sites

A backup site is a location where you can relocate following a disaster, such as fire, flood, terrorist threat or another disruptive event. This is an integral part of the DR plan and wider business continuity planning of your organization.

  • A cold site is an empty operational space with basic facilities like raised floors, air conditioning, power, and communication lines. Following an incident, equipment is brought in and set up to resume operations. It does not include backed-up copies of data and information from the organization’s original location, nor does it include hardware already set up.
  • A warm site is a compromise between hot and cold. These sites will have hardware and connectivity already established, though on a smaller scale. Warm sites might have backups on hand, but they may not be complete and may be between several days and a week old.
  • A hot site is a near duplicate of the organization’s original site, with full computer systems as well as complete backups of user data. Real-time synchronization between the two sites may be used to completely mirror the data environment of the original site using wide area network links and specialized software.

The terms cold, warm and hot can also be used within DR context to describe patterns that indicate how readily a system can recover when something goes wrong.

Creating Your Disaster Recovery Plan

These are the basic components when creating your DR plan.

  • Design to your recovery goals: look at your RTO and RPO values and decide which DR pattern you can adopt to meet those values. For example, if you have historical, non-critical compliance data with a large RTO value, a cold DR pattern is likely fine.
  • Design for end-to-end recovery: It’s important to make sure your DR plan covers the full recovery process, from backup to restore to cleanup
  • Make Disaster Recovery (DR) Tasks Specific: If you need to execute your DR plan, each task should be concrete and unambiguous. For example, “Run the restore script” is too general. In contrast, “Open Bash and run ./restore.sh” is precise and concrete.

Applying Control Measures

Another important component when thinking of DR is how you can potentially prevent a disaster before it occurs. For example, add a monitor that sends an alert when a data-destructive flow, such as a deletion pipeline, exhibits unexpected spikes or other unusual activity. This monitor could also terminate the pipeline processes if a certain deletion threshold is reached, preventing a catastrophic situation.

Making Sure Software is Configured for Disaster Recovery

Part of the DR planning is to make sure your software is configured in the event a recovery is needed.

  • Verify software can be installed: Make sure that your applications can be installed from source or from a preconfigured image, that licensing is available for these apps, and that any Compute Engine resources you need are available, for example by pre-allocating VM instances.
  • Think of the CD in CI/CD: The Continuous Delivery (CD) component of your CI/CD pipeline is integral to how you deploy applications. As part of your DR plan, consider how this will work in your recovered environment.

Security and Compliance Controls

Often with recovery we are just thinking of how to get our site back online with the least disruption. But don’t forget, security is important. The same controls that you have in your production environment must apply to your recovered environment. Compliance regulations will also apply to your recovered environment.

  • Make sure network controls provide the same separation and blocking that your production environment provides. Think of Shared VPCs and Google Cloud firewalls.
  • Replicate IAM policies to the DR environment: IaC methods in Cloud Deployment Manager can help with this.
  • After you’ve implemented these security controls in the DR environment, make sure to test everything.
  • Train your users on the DR environment and the steps in the plan.
  • Make sure DR meets compliance requirements: only those who need access have access, PII data is redacted and encrypted, etc.

Disaster recovery scenarios for Data

Disaster recovery plans should specify how to avoid losing data during a disaster. The term data here covers two scenarios; backing up and then recovering databases, log data, and other data types fits into one of the following:

  • Data backups: This involves copying data in discrete amounts from one place to another, such as from the production site to the DR site. Typically, data backups have a small to medium RTO and a small RPO.
  • Database backups: These are slightly more complex because they are often centered around a time component. When you think of your database, you might immediately think, from what moment in time is that data? Adopting a high-availability-first approach can help you achieve the smaller RTO and RPO values your DR plan will probably desire.

Let’s look at some different scenarios and how we could achieve a DR plan for these types.

Production Environment is On-Premises

In this scenario, your production environment is on-premises, and your disaster recovery plan involves using Google Cloud as the recovery site.

Data backup and recovery

  • Solution 1: Back up to Cloud Storage using a scheduled task
    • Create a scheduled task that runs a script or application to transfer the data to Cloud Storage (see the sketch after this list).
  • Solution 2: Back up to Cloud Storage using Transfer service for on-premises data
    • This service is a scalable, reliable, and managed service that enables you to transfer large amounts of data from your data center to a Cloud Storage bucket.
  • Solution 3: Back up to Cloud Storage using a partner gateway solution
    • Use a partner gateway between your on-premises storage and Google Cloud to facilitate this transfer of data to Cloud Storage.
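For Solution 1, a minimal sketch might look like the following; the local path, bucket name, and schedule are assumptions for illustration.

#!/bin/bash
# backup.sh: sync the on-premises export directory to a Cloud Storage bucket
gsutil -m rsync -r /data/exports gs://dr-backup-bucket/exports

# Example crontab entry that runs the backup nightly at 02:00
# 0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1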

Database backup and recovery

  • Solution 1: Backup and recovery using a recovery server on Google Cloud
    • Back up your database to a file and transfer it to a Cloud Storage bucket. When you need to recover, spin up an instance with database capabilities and restore the backup file to it (a sketch follows this list).
  • Solution 2: Replication to a standby server on Google Cloud
    • Achieve very small RTO and RPO values by replicating (not just backing up) data, and in some cases database state, in real time to a hot standby of your database server.
    • Configure replication between your on-premises database server and the target database server in Google Cloud.
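A rough sketch of Solution 1 is shown below; the database name, bucket, image family, and instance name are placeholders, credentials are omitted, and the recovery instance is assumed to have the database software installed.

# On-premises: dump the database and copy the backup file to Cloud Storage
mysqldump --single-transaction mydb | gzip > mydb-backup.sql.gz
gsutil cp mydb-backup.sql.gz gs://dr-db-backups/

# During recovery: start an instance on Google Cloud and restore the latest dump
gcloud compute instances create dr-db-server --zone=us-central1-a \
    --image-family=debian-12 --image-project=debian-cloud
gsutil cp gs://dr-db-backups/mydb-backup.sql.gz .
gunzip -c mydb-backup.sql.gz | mysql mydb   # assumes MySQL is installed and mydb exists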

Production Environment is Google Cloud

In this scenario, both your production environment and your disaster recovery environment run on Google Cloud.

Data backup and recovery

A common pattern for data backups is to use tiered storage. When your production workload is on Google Cloud, the tiered storage system looks like the following diagram: you migrate data to a tier with lower storage costs as the backed-up data becomes less likely to be accessed.

Conceptual diagram: storage cost decreases as backup data is migrated from persistent disks to Nearline and then to Coldline storage.
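One way to implement this tiering is with an Object Lifecycle Management rule on the backup bucket. The sketch below assumes 30-day and 90-day thresholds and a placeholder bucket name; tune these to your own access patterns.

# lifecycle.json: move backups to cheaper storage classes as they age (assumed thresholds)
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 30}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 90}}
  ]
}
EOF

# Apply the lifecycle configuration to the backup bucket
gsutil lifecycle set lifecycle.json gs://dr-backup-bucket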

Database backup and recovery

If you use a self-managed database on Google Cloud, such as MySQL, PostgreSQL, or SQL Server running on a Compute Engine instance, you will have similar concerns as with those same databases on-premises. The one bonus here is that you do not need to manage the underlying infrastructure.

A common pattern is to enable recovery of a database server that does not require system state to be synchronized with a hot standby.

If you are using a managed database service in Google Cloud, you can implement appropriate backup and recovery.

  • Bigtable provides Bigtable replication. A replicated Bigtable database can provide higher availability than a single cluster, additional read throughput, and higher durability and resilience in the face of zonal or regional failures.
  • BigQuery. If you want to archive data, you can take advantage of BigQuery’s long term storage. If a table is not edited for 90 consecutive days, the price of storage for that table automatically drops by 50 percent.
  • Firestore. The managed export and import service allows you to import and export Firestore entities using a Cloud Storage bucket (an example follows this list).
  • Spanner. You can use Dataflow templates to make a full export of your database to a set of Avro files in a Cloud Storage bucket.
  • Cloud Composer. You can use Cloud Composer (a managed version of Apache Airflow) to schedule regular backups of multiple Google Cloud databases.
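For example, a Firestore backup and restore might look like the sketch below; the bucket name and the export prefix are placeholders (the actual prefix is generated by the export command).

# Export all Firestore entities to a Cloud Storage bucket
gcloud firestore export gs://my-firestore-backups

# Import them back during recovery, pointing at the folder the export created
gcloud firestore import gs://my-firestore-backups/2024-01-01T00:00:00_12345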

Disaster recovery scenarios for applications

Let’s frame DR scenarios for applications in terms of DR patterns that indicate how readily the application can recover from a disaster event.

  • Batch processing workloads: These tend not to be mission critical, so you typically don’t need to incur the cost of designing a high availability (HA) architecture. Take advantage of cost-effective products such as preemptible VM instances, which you can create and run at a much lower price than normal instances (see the example after this list). By implementing regular checkpoints as part of the processing task, the processing job can resume from the point of failure when new VMs are launched. This is a warm pattern.
  • Ecommerce sites: can have larger RTO values for some components. For example, the actual purchasing pipeline needs to have high availability, but the email process that sends order notifications to customers can tolerate a few hours’ delay. The transactional part of the application needs high uptime with a minimal RTO value. Therefore, you use HA, which maximizes the availability of this part of the application. This approach can be considered a hot pattern.
  • Video streaming: In this scenario, an HA architecture is a must-have, and small RTO values are needed. This scenario requires a hot pattern throughout the application architecture to guarantee minimal impact in case of a disaster.
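As a small illustration of the batch-processing pattern, the command below creates a preemptible worker VM; the instance name, zone, and machine type are placeholders.

# Create a preemptible VM for a checkpointed batch worker (placeholder names)
gcloud compute instances create batch-worker-1 \
    --zone=us-central1-a --machine-type=e2-standard-4 --preemptible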

 

Migrating a workload from your legacy on-premises environment to a cloud-native environment, such as a public cloud, can be challenging and risky. Successful migrations change the workload being migrated as little as possible during the migration operations. Moving legacy on-premises apps to the cloud often requires multiple migration steps.

There are three major types of migrations that you can consider:

  • Lift and Shift
  • Improve and Move
  • Rip and Replace

Lift and Shift (Rehost)

Known as “moving out of a data center” and the easiest of all workload migrations, lift and shift is the movement of your workload from your on-premises environment to the cloud with little to no modification or refactoring. The only modifications necessary are those required to get your applications working in the cloud environment.

Lift and shift migrations are best when the workload can operate as-is in the cloud environment, when you have no business need for change, or when technical constraints, such as complicated source code that would be difficult to refactor, won’t allow any other approach.

On the downside, lift-and-shifted workloads are essentially non-cloud-native workloads that happen to be running in the cloud. They don’t take full advantage of cloud platform features such as horizontal scalability, more granular pricing control, and highly managed services.

Improve and move (Replatform)

Known as “application modernization”, an improve and move migration modernizes much of your workload while migrating it. The idea is to modify the workloads to take advantage of cloud-native capabilities, rather than simply making them work in the cloud environment as with lift and shift. You can improve each workload for performance, features, cost, or user experience.

Improve and move migrations let your applications use features of a cloud platform, such as scalability and high availability. You can also architect the improvement to increase the portability of the application.

The downside is that improve and move migrations take longer than lift and shift migrations, because the applications must be refactored before they can be migrated.

Rip and Replace (Refactor)

Known as “building in and for the cloud”, a rip and replace migration does not migrate your applications at all: you completely decommission them and rebuild and rewrite them as cloud-native apps.

If your current applications are not meeting your goals, for example, you are tired of maintaining them, it would be too costly to migrate them, or perhaps they are not even supported on Google Cloud, you can do a rip and replace.

This migration allows your application to take full advantage of Google Cloud features, such as horizontal scalability, highly managed services and high availability.

However, rip and replace migrations can take longer than lift and shift or improve and move migrations. Further, this type of migration isn’t suitable for off-the-shelf applications because it requires rewriting the apps. You need to account for the extra time and effort to redesign and rewrite the apps as part of their lifecycle.

Migration Path

The goal with a cloud migration is to get from point A (where you are now on-prem) to point B (in the cloud). To get from A to B you can use any of the methods we just discussed.

The journey from A to B can be summarized as:

Assess

Perform a thorough assessment and discovery of your existing environment in order to understand your app and environment inventory, identify app dependencies and requirements, perform total cost of ownership calculations, and establish app performance benchmarks.

  • Take Inventory: databases, message brokers, data warehouses, network appliances, and dependencies. Machines + OS + specs (a quick example follows this list).
  • Catalog Apps: Mission critical, non-mission critical
  • Educate: Train and certify engineers on Google Cloud – frameworks, APIs, libraries
  • Experiment / POC: Run several proofs of concept, such as testing firewall rules, benchmarking performance on Cloud SQL, and experimenting with Cloud Build and GKE clusters.
  • Calculate total cost of ownership: Google Cloud vs. on-prem, which is cheaper? Use the Google Cloud pricing calculator.
  • Choose which workloads to migrate first: non-business-critical, dependency-light workloads that require minimal refactoring.
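One way to seed the inventory, assuming the Cloud Asset API is enabled and 123456789 is a placeholder organization ID, is to dump every resource visible to Cloud Asset Inventory:

# List every resource in the organization to start building the inventory
gcloud asset search-all-resources --scope=organizations/123456789 --format=json > inventory.json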

Plan

Create the basic cloud infrastructure for your workloads to live in and plan how you will move apps. This planning includes identity management, organization and project structure, networking, sorting your apps, and developing a prioritized migration strategy.

  • Establish Identities: Google Accounts, Service Accounts, Google Groups, Google Workspace Domains, Cloud Identity Domains
  • Design Resource Organization: Organizations, Folders, and Projects (see the sketch after this list)
  • Define hierarchy: Environment-oriented, function oriented or granular access-oriented
  • Define groups and roles for resource access:
    • Org Admin: IAM policies
    • Network Admin: networks, subnetworks, Cloud Router, Cloud VPN, Cloud Load Balancing
    • Security Admin: IAM roles for projects, logs and resource visibility
    • Billing Admin: billing accounts, monitor resource usage
  • Design Network Topology / Establish Connectivity: create VPC(s); connect via Cloud Interconnect, peering, Cloud VPN, or the public internet
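A minimal sketch of the hierarchy and networking steps might look like this; the organization ID, folder ID, project ID, and network name are all placeholders.

# Create a folder under the organization, then a project inside that folder
gcloud resource-manager folders create --display-name="production" --organization=123456789
gcloud projects create prod-web-app --folder=456789012

# Create a custom-mode VPC for the new project
gcloud compute networks create prod-vpc --project=prod-web-app --subnet-mode=custom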

Deploy

Design, implement and execute a deployment process to move workloads to Google Cloud. You might also have to refine your cloud infrastructure to deal with new needs.

  • Fully manual deployment: provision, configure, and deploy everything manually
  • Configuration Management (CM) Tools: deploy in an automated, repeatable way, though these tools add some complexity
  • Container Orchestration: GKE to orchestrate workloads
  • Deployment Automation: CI/CD Pipeline to automate creation and deployment of artifacts
  • Infrastructure as Code (IaC): Terraform or Deployment Manager (a minimal Deployment Manager sketch follows this list)
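As a minimal Deployment Manager sketch, the configuration below declares a single VM; the resource name, zone, machine type, and image family are assumptions for illustration.

# vm.yaml: a small Deployment Manager configuration for one Compute Engine instance
cat > vm.yaml <<'EOF'
resources:
- name: migrated-app-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: zones/us-central1-a/machineTypes/e2-medium
    disks:
    - deviceName: boot
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-12
    networkInterfaces:
    - network: global/networks/default
EOF

# Create the deployment (it can be deleted and re-created repeatably)
gcloud deployment-manager deployments create migrated-app --config vm.yaml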

Optimize

Begin to take full advantage of cloud-native technologies and capabilities to improve areas such as performance, scalability, disaster recovery, cost, and training, as well as to open the door to machine learning and artificial intelligence integrations for your app.

  • Build and train your team: train deployment and operations teams on the new cloud environment.
  • Monitor everything: Cloud Logging and Cloud Functions, Prometheus, Cloud Monitoring alerting
  • Automate everything: Automate critical activities such as deployments, secrets exchanges, and configuration updates. Automating infrastructure with Cloud Composer and Automating Canary Analysis on Google Kubernetes Engine with Spinnaker are examples of automation on Google Cloud.
  • Codify everything: with Infrastructure as Code and Policy as Code, you can make your environment fully auditable and repeatable.
  • Use managed services instead of self-managed ones: Cloud SQL for MySQL instead of managing your own MySQL cluster, for example (see the example after this list).
  • Optimize for performance and scalability: Compute Engine autoscaling groups, GKE cluster autoscaler, etc.
  • Reduce costs: analyze your billing reports to study your spending trends, etc.
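For example, replacing a self-managed MySQL server with Cloud SQL could start with a command like the one below; the instance name, tier, and region are placeholders.

# Create a managed Cloud SQL for MySQL instance instead of running your own cluster
gcloud sql instances create prod-mysql \
    --database-version=MYSQL_8_0 --tier=db-n1-standard-2 --region=us-central1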

More information can be found here: https://cloud.google.com/architecture/migration-to-gcp-getting-started

 

CI/CD is the combined practice of continuous integration (CI) and either continuous delivery or continuous deployment (CD). CI/CD is designed to bridge the gap between development and operations activities and teams by enforcing automation in the building, testing, and deployment of applications. Modern DevOps practices involve continuous development, continuous testing, continuous integration, continuous deployment, and continuous monitoring of software applications throughout their development life cycle. The CI/CD pipeline forms the backbone of modern DevOps operations.

The main goal of this automation pipeline within your business is to be able to deploy your application into different environments, such as Dev/QA/Production, without manual intervention. This automation reduces the risk of errors during deployment, reduces the hours spent deploying code changes to multiple environments, and helps deploy changes to development and QA environments more frequently and as soon as possible after they are made. The methods a CI/CD pipeline allows you to apply are:

  • Version control of source code.
  • Automatic building, testing, and deployment of apps.
  • Environment isolation and separation from production.
  • Replicable procedures for environment setup.

Creating a CI/CD Pipeline on Google Cloud

Let’s look at how you can set up a continuous integration/continuous deployment (CI/CD) pipeline for processing data by implementing CI/CD methods with managed products on Google Cloud. The Google Cloud tool we will use to build the pipeline is Cloud Build. A typical CI/CD setup looks like this:

First, the developer checks the source code into GitHub (any version control system works). Next, GitHub triggers a post-commit hook to Cloud Build. Cloud Build then builds the container image and pushes it to Container Registry. Cloud Run is then notified to redeploy, pulls the latest image from Container Registry, and runs it.

To build a simple pipeline in Google Cloud using Cloud Run, we will go through the high-level steps.

  1. Create a Dockerfile with the steps needed to build your container image, such as a line that installs Tomcat (“RUN wget https://apache.mirrors.nublue.co.uk/tomcat/tomcat-8/v8.5.54/bin/apache-tomcat-8.5.54.tar.gz”) and steps that pull the source code for the application from your Git repo.
  2. Create a cloudbuild.yaml file that will build the Docker image, push the container image to Container Registry, and then deploy the image to Cloud Run (a sample configuration follows this list).
  3. Go to Cloud Build and connect your Git Repository first.
  4. Now, create a trigger.
  5. Ensure Cloud Build has access to deploy to Cloud Run. To do that, go to Settings and enable the service account permission for Cloud Run.
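A sample build configuration, written here as a shell snippet that creates the file, could look like the following; the image name my-app, the service name, and the region are assumptions, and $PROJECT_ID is substituted by Cloud Build at build time.

# cloudbuild.yaml: build the image, push it to Container Registry, and deploy to Cloud Run
cat > cloudbuild.yaml <<'EOF'
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app', '.']
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/$PROJECT_ID/my-app']
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: gcloud
  args: ['run', 'deploy', 'my-app', '--image', 'gcr.io/$PROJECT_ID/my-app',
         '--region', 'us-central1', '--platform', 'managed']
images:
- 'gcr.io/$PROJECT_ID/my-app'
EOF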

Test Your CI/CD Pipeline

Now you are ready to test. Make a small change to your code and push it to your repository. This should trigger a build in Cloud Build.

After a little while, you’ll see the new build listed in Cloud Build and the resulting container image in Container Registry.

And you should finally be able to see your new service deployed in Cloud Run.

That’s it! You have set up a simple CI/CD pipeline for automation in Google Cloud!

Let’s look at some of the high-level methods required to migrate your website from a monolithic platform to a container-based microservices platform on GCP. The goal is to migrate your application one feature at a time, avoiding a single large-scale migration.

Our goal here is to make the website more agile and scalable for each of these individual features. Each of these features can now be independently managed and updated, leading to faster improvements for each migrated feature.

Why Microservices?

Let’s look at some of the biggest advantages of microservice-based applications. Most of these advantages stem from the fact that each feature or microservice is loosely coupled to the others.

  • Microservices can be tested and deployed independently of one another. Typically, the smaller each deployment, the easier each deployment.
  • Each microservice can be written in its own language or framework. Because microservices communicate over a network via API calls, they do not all need to be written in the same language.
  • Microservices can also be assigned to different teams, making it easier to have a team dedicated to one microservice or a group of related microservices.
  • Microservices teams have loosened dependencies on one another. Each team needs to focus on making sure the APIs it makes available to the other services stay consistent, but beyond that it does not need to worry about other teams’ release cycles, how other services are implemented, and so on.
  • You can design more cleanly for failure. With clearer boundaries between your services, it can be easier to have a backup in place for that particular service.

Some of the disadvantages of microservices include:

  • The complexity of the design can increase, as your app is now an interconnection of microservices over a network.
  • Security concerns can arise as your services now talk over a network. Products like Istio were developed to address these issues.
  • Performance can take a hit, as data usually has to traverse a more complex route across the application’s microservices network.
  • System design can get more complicated, making understanding your application more difficult.

Migration Overview

Our goal is to get to a microservices environment from a single monolithic application. Let’s first take a look at the beginning and end of our journey.

First, the beginning of our journey is a monolithic website that runs on-premises, with app servers running the application code and dependencies on traditional database communications. The important thing to note about the application is its design: even if it were lift-and-shifted into the cloud onto a GCP VM instance, it would still be considered a monolithic application, and the same migration principles would still apply.

The end result, after the site is fully migrated to a microservices framework, would look something like this: each service now runs as an independent microservice container image on Google Kubernetes Engine (GKE), traditional databases have moved to Cloud SQL, and content has moved to Cloud Storage. Note that the application can still work with your data center via Cloud Interconnect or VPN for backend services like CRM. Cloud CDN is there to help you distribute cached content more efficiently to your customers, and Cloud Load Balancing is there to distribute the workload across resources in GCP. Apigee is a managed API gateway. Apigee is not strictly necessary in this migration, but it’s recommended that all of your site’s content be served by public APIs, and an API gateway like Apigee provides many features for API management, such as quotas, versioning, and authentication.

Preparing your Google Cloud Environment

Before you begin your migration journey, it is important to have your GCP environment set up, with a strategy defined for how services will be accessed, deployed, and so on. Here are some of the major steps involved in getting your cloud environment set up:

  1. Set up your Google Cloud organization, the environment that will host your cloud resources. During this process you’ll set up Google Workspace and Cloud Identity.
  2. Design your Google Cloud policies for control of the cloud resources. This means setting things like network configuration and security controls, and organizational security controls, to meet the requirements of your application.
  3. Design a method to deploy cloud resources, such as using Infrastructure as Code (IaC) to deploy to your GKE Clusters which will host your new microservices. Cloud Deployment Manager is a perfect tool for this and will give you standardized, reproducible, and auditable environments.
  4. Prepare your GKE environment for production and harden your cluster security. This involves things like deciding how your clusters will be load balanced across regions and disabling public endpoint access (see the example after this list).
  5. Build your continuous integration/continuous delivery (CI/CD) tooling for Kubernetes. You can use Cloud Build to build your container images, and Container Registry to store them and to detect vulnerabilities.
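As an example of the hardening step, a private GKE cluster could be created along the lines below; the cluster name, region, and control-plane CIDR are placeholders, and the exact set of flags should be adapted to your security requirements.

# Create a regional GKE cluster with private nodes and a private control-plane endpoint
gcloud container clusters create prod-cluster \
    --region=us-central1 \
    --enable-ip-alias \
    --enable-private-nodes \
    --enable-private-endpoint \
    --enable-master-authorized-networks \
    --master-ipv4-cidr=172.16.0.0/28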

Migration Step-by-step Approach

Each feature of the website should be migrated one by one to the new environment, creating microservices where it makes sense. These new microservices can call back to the legacy system when needed. The idea is to transform one major migration and refactoring project into several smaller projects. The advantages of migrating this way are:

  • The smaller projects have a more clearly defined scope and will be easier to get moving than one grand migration project. If we did it all at once, we’d need to get all the teams involved and have everyone understand all the interactions between systems, third-party dependencies, and so on.
  • These smaller projects give us lots of flexibility. Smaller projects mean smaller teams, and they can be tackled one by one without anyone getting overwhelmed. You could also parallelize some of the work, leading to a faster migration.

Before you start migrating any particular feature to a microservice, it is important to take into account the dependencies between features and what relies on what. This will help you establish an order of migration, since some features make more sense to migrate before others.

Let’s look at a shopping cart example and what the journey currently looks like within the monolithic application and the dependencies this makes us aware of:

  1. A user browses your site and clicks “Add to cart” on an item they like. This triggers an API call from their browser to the shopping-cart feature. This is the first dependency you need to pay attention to: the front end acts on the shopping cart.
  2. When the shopping cart receives the API call, it makes an API call to the system that handles stock. This is the second dependency: the shopping cart depends on the stock system.
  3. If the item is in stock, this is stored in a database entry such as “user A has 1 instance of X in their cart.” This is the third dependency: the shopping cart needs the database to store this information.
  4. When the user finally checks out and pays, the shopping cart is queried by the payment subsystem to compute the total. This is the fourth dependency: the shopping cart is queried by the payment subsystem.

Taking these dependencies into consideration, our migration to a microservice would look like this:

  1. Create a new microservice that implements your shopping-cart API. Use Firestore to store the shopping cart data. Make sure this new microservice can call the stock subsystem (see dependencies 1 and 2 above). A build-and-deploy sketch follows this list.
  2. Create a script that can be rerun as needed that copies shopping carts from the legacy shopping-cart system and writes them to Firestore.
  3. Create a similar script that does the same thing in the other direction: it copies Firestore carts back to your legacy system. This is just in case you need to roll back.
  4. Expose the shopping cart API with Apigee.
  5. Modify the frontend and payment subsystem so they call this new shopping cart microservice rather than the legacy one.
  6. Run the script from step 2.
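To give a feel for step 1, a hedged build-and-deploy sketch is shown below; the project ID, image name, and ports are placeholders, and it assumes kubectl is already pointed at your GKE cluster.

# Build the new shopping-cart image with Cloud Build and push it to Container Registry
gcloud builds submit --tag gcr.io/my-project/shopping-cart

# Deploy it to the GKE cluster and expose it inside the cluster
kubectl create deployment shopping-cart --image=gcr.io/my-project/shopping-cart
kubectl expose deployment shopping-cart --port=80 --target-port=8080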

You may also want to test this in a non-production website environment first and then replicate the results to the production system when you have it working correctly. Your shopping cart feature is now a microservice hosted on GCP.

For a deeper breakdown of this process, see https://cloud.google.com/architecture/migrating-a-monolithic-app-to-microservices-gke