DevOps
Fast start with Google Cloud
This is our out-of-the-box infrastructure solution for businesses, built on Google Cloud Platform and its services. It’s cloud native, flexible, scalable, and ready to use. Below is a detailed description of how this infrastructure operates and which tools we use.
Architecture
For this solution, we use a hybrid architecture where different parts of the system are hosted on the platform that best suits their needs.
Building complex app components
We use Google Kubernetes Engine (GKE) to manage complex and multicomponent parts of the system.
Tech value:
1. GKE provides robust orchestration tools that help us manage different parts of the system and make them work as one.
2. The platform allows us to configure each part of the system individually.
3. GKE lets us control complex workloads, so the app can perform many different tasks simultaneously. Such workloads include stateful applications, which need to remember and use data from previous operations, and microservices with complex dependencies.
Building simple app components
For managing simple services and APIs, we use Cloud Run.
Tech value:
Cloud Run automates deployment and scaling, including scaling to zero. It automatically adjusts resources based on load, and shuts down the application when inactive to save resources and reduce costs.
It offers a managed, ready-made environment for APIs and apps that don’t retain data between operations (stateless apps), reducing operational costs.
Business value:
Cost optimization
GKE is efficient for complex apps running continuously, while Cloud Run is cost-effective for services with variable or low load as it works on a pay-as-you-go model.
Faster time-to-market
Simple services can be quickly deployed on Cloud Run, while key system components will reliably operate on the GKE platform.
Cutting operational costs
Managed services, especially Cloud Run, reduce expenses on infrastructure administration.
Scalability
The combination of these platforms allows us to effectively scale both complex and simple parts of the system as your business grows and load increases.
Approach
We manage the infrastructure using the Infrastructure as Code (IaC) approach with the help of Terraform and Terragrunt tools.
Infrastructure as Code
It’s a practice where infrastructure components are managed and provisioned using code, rather than through manual processes. Our infrastructure code is an asset that we hand over to the client, ensuring transparency. When you work with us, you own the full infrastructure and have complete control over it.
Tech value:
Automation
All infrastructure resources like networks, GKE clusters, Cloud SQL databases, GCS buckets, IAM policies, etc. are created and described in the form of declarative code. This approach focuses on describing the desired end state, allowing the system to decide how to achieve it.
Consistency
Infrastructure across development, staging, and production environments is created and updated consistently, eliminating 'configuration drift' that can happen when settings deviate from the original description.
Version control
Infrastructure code is stored in Git, a version control system, which allows us to track all changes, conduct code reviews, and easily roll back to previous states.
Transparency
The entire infrastructure configuration is described in Terraform and Terragrunt code. There are no hidden settings or manual changes.
Documentation as code
The code itself serves as up-to-date documentation of the infrastructure.
Reproducibility
Any team member, including your team, can understand how the infrastructure is created and reproduce it.
Simplified training
The code serves as a foundation for training and transferring expertise to your internal team.
Terraform + Terragrunt
Terraform is a tool we use for IaC, and Terragrunt enhances its features.
Tech value:
Reusing code modules
Terragrunt helps us structure Terraform code, manage dependencies between modules, and effectively manage multiple environments or projects. It also helps us keep the code DRY (Don't Repeat Yourself), which means that instead of repeating code, we reuse it.
Change planning
Terraform allows us to preview all changes before applying them to the actual infrastructure.
State management
Terraform tracks the current state of deployed infrastructure, allowing us to implement changes safely.
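The DRY idea can be illustrated with a minimal Python sketch: shared defaults are declared once, and each environment supplies only what differs. The names `COMMON`, `ENV_OVERRIDES`, and `render_env`, and all values, are hypothetical. The sketch models the concept only, not Terragrunt's actual configuration merging.

```python
from copy import deepcopy

# Hypothetical shared defaults, declared once for every environment.
COMMON = {
    "region": "europe-west1",
    "gke": {"node_count": 1, "machine_type": "e2-medium"},
}

# Hypothetical per-environment overrides: only what differs from COMMON.
ENV_OVERRIDES = {
    "dev": {},
    "prod": {"gke": {"node_count": 3}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render_env(name: str) -> dict:
    """Build the full configuration for one environment."""
    return deep_merge(COMMON, ENV_OVERRIDES[name])
```

Here `render_env("prod")` keeps the shared machine type and region and changes only the node count, so a new environment needs just a few lines of overrides.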
Business value:
Faster time-to-market
The approach significantly reduces the time required to create and update infrastructure compared to manual operations or scripts. We can rapidly deploy new environments for testing or for new projects.
Cutting costs
IaC reduces labor costs for infrastructure management and minimizes errors that lead to downtime or extra resource usage. A complex and growing infrastructure can be managed effectively without expanding the DevOps team.
Minimizing risks
IaC reduces the likelihood of human error during configuration. Since all changes are transparent and the code can be reviewed, the system becomes more reliable. It also simplifies disaster recovery procedures.
Compliance with regulations
Since all system configurations and changes are stored in code, it’s easy to verify their compliance with regulations. Instead of being applied manually, security policies can be described as code with tools like Open Policy Agent.
Long-term value
Infrastructure as code is a valuable asset that remains with the client and can be easily adapted to future needs.
Full ownership and control
As our client, you gain full control over your infrastructure through code, with no dependency on a 'black box' or specific engineers. You can develop and maintain the infrastructure with any team familiar with Terraform/GCP.
Transparency and clear outcome
The IaC code is a measurable and transferable result of the infrastructure development project. You know exactly what you’re paying for and how your infrastructure is configured.
Reduced risk of vendor lock-in
While the infrastructure is based on GCP services, owning the code provides greater flexibility and control over the configuration, simplifying potential migrations or changes in service providers.
Predictability
Infrastructure changes become more predictable and controlled.
Data management
We manage data using databases and data storage services. Databases are organized systems that store structured data, such as customer information, financial records, inventory details, and more. Data storage services are more general and hold large amounts of data, like files, media, or backups.
Databases
We use Google Cloud SQL for the production environment and databases such as PostgreSQL and MySQL deployed inside a Kubernetes cluster at the development stage.
Tech value:
1. Cloud SQL provides high data availability, automatic backups and system updates, and scalability for mission-critical production data, relieving the team from database administration tasks.
2. In-cluster databases are easily and quickly deployed for development and testing, providing an isolated environment for each feature or developer without the need to set up external services.
3. Environments are separated: a stable, reliable, and secure environment for production and a flexible, lightweight environment for development.
Business value:
Data reliability and security
Cloud SQL is designed to keep data highly available and secure, which is critical for businesses.
Faster development
Developers can quickly obtain their own database instances for work and testing, without waiting for resource allocation and without impacting each other.
Lower development costs
Using cluster resources for temporary databases is cheaper than maintaining separate Cloud SQL instances for each development task.
Lower operational expenses
Delegating database management tasks to Google Cloud allows the team to focus on developing the app rather than managing the database infrastructure.
Data storage
We store all user data in Google Cloud Storage (GCS).
Tech value:
We can store and access any volume of data with virtually unlimited scalability and high throughput.
Data is highly reliable and durable, since it’s stored in multiple copies on multiple devices. If one piece of equipment fails, the data will still be accessible from other devices or copies. GCS is designed for 99.999999999% (11 nines) annual durability.
GCS provides flexible storage classes (Standard, Nearline, Coldline, and Archive) to optimize costs based on how frequently the app needs to access data.
GCS can be integrated with other GCP services for data processing, analytics, CDN, etc., and exposes standard APIs that apps can easily access.
Since data storage is separated from compute resources of GKE and Cloud Run, it’s simpler to scale and update the app.
GCS has built-in security mechanisms: identity and access management (IAM), encryption of data at rest and in transit.
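To put the durability figure quoted above into perspective, here is a small worked calculation in Python. It assumes each object is lost independently with an annual probability of 1 minus the durability, which is a simplification. The function names are ours.

```python
# Annual loss probability per object implied by 99.999999999% durability.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_annual_losses(object_count: int) -> float:
    """Expected number of objects lost per year (independence assumed)."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count: int) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(object_count)
```

Under this assumption, a bucket holding 10 million objects would lose on average about one object every 10,000 years.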
Business value:
Economic efficiency
You pay only for the actual storage space and operations used. Storage classes and lifecycle policies can reduce costs further.
Data preservation guarantee
Business risks associated with the loss of user data are reduced, thanks to the high durability of GCS.
Unlimited scalability
Storage infrastructure automatically adapts to the growth of user data volumes without the need for planning and purchasing equipment.
Global availability
When using a CDN, we can place data closer to users to reduce latency.
Reduced operational costs
No need to manage physical storage infrastructure, backups, and maintenance.
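As an illustration of the lifecycle policies mentioned under economic efficiency, the Python sketch below builds a policy that moves objects to colder storage classes as they age. The age thresholds are hypothetical and should be tuned to real access patterns. The `rule` / `action` / `condition` layout follows the JSON lifecycle configuration format as we understand it, so check it against the current GCS documentation before use.

```python
import json


def build_lifecycle_policy(nearline_days: int, coldline_days: int,
                           archive_days: int) -> dict:
    """Build a lifecycle policy that steps objects down through storage classes."""
    if not 0 < nearline_days < coldline_days < archive_days:
        raise ValueError("thresholds must be positive and strictly increasing")
    steps = [
        ("NEARLINE", nearline_days),
        ("COLDLINE", coldline_days),
        ("ARCHIVE", archive_days),
    ]
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": storage_class},
                "condition": {"age": age_days},  # object age in days
            }
            for storage_class, age_days in steps
        ]
    }


if __name__ == "__main__":
    # Hypothetical thresholds: 30, 90, and 365 days.
    print(json.dumps(build_lifecycle_policy(30, 90, 365), indent=2))
```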
Security and compliance
We ensure our infrastructure protects sensitive data from unauthorized access and breaches. It also complies with major regulations which helps our clients avoid legal penalties. To make it possible, we use several tools.
Access control
We use Identity and Access Management (IAM) for granular access control — this means detailed settings that define which actions are allowed for each user or group of users. To isolate the network, we use Virtual Private Cloud (VPC) and Firewall rules.
Tech value:
1. IAM allows us to implement the principle of least privilege by granting users, groups, and service accounts only the necessary permissions to perform their tasks at the level of individual GCP resources.
2. VPC provides logical isolation of resources in the cloud, allowing us to create private networks with their own range of IP addresses.
3. Firewall rules in VPC control incoming and outgoing traffic at the level of virtual machines and other resources, ensuring network segmentation and protection against unauthorized access.
4. It’s possible to create multiple VPCs or subnets for separating environments or different parts of an app.
5. We can use Private Google Access to reach Google APIs from a private network without going through the public internet, and VPC Service Controls to define a service perimeter around those APIs that limits the risk of data exfiltration.
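A least-privilege review can be partly automated. The Python sketch below scans a policy in the `bindings` / `role` / `members` shape used by IAM policies and flags members that hold the broad basic roles `roles/owner` or `roles/editor`. The function name and the example members in the usage note are hypothetical, and a real review would cover far more than these two roles.

```python
# Basic roles that grant wide, project-level permissions.
BROAD_ROLES = {"roles/owner", "roles/editor"}


def find_broad_bindings(policy: dict) -> list[tuple[str, str]]:
    """Return (member, role) pairs that use a broad basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role")
        if role in BROAD_ROLES:
            for member in binding.get("members", []):
                findings.append((member, role))
    return findings
```

For example, a policy that grants `roles/editor` to a hypothetical `serviceAccount:ci@example-project.iam.gserviceaccount.com` would be reported, while a binding for `roles/storage.objectViewer` would pass.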
Business value:
Increased security
Strict access control and network isolation minimize the attack surface and potential damage from compromised accounts or vulnerabilities.
Compliance
These measures help meet regulatory and industry requirements for data security and network architecture — e.g., GDPR, HIPAA, PCI DSS.
Risk reduction
These measures decrease the likelihood of accidental or malicious actions that could lead to data breaches, service downtime, or financial losses.
Operational stability
The isolation of environments prevents issues in one environment from affecting another.
Audit and control
IAM and network logs help to track actions and investigate security incidents.
Secure authorization for services
We use workload identity mechanisms (Workload Identity in GKE and service accounts in Cloud Run) for secure service authorization without storing static keys. Static keys can be vulnerable to theft. These mechanisms instead use temporary tokens that are refreshed automatically.
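The idea of short-lived, automatically refreshed tokens can be sketched in a few lines of Python. The `Token` and `TokenCache` names are hypothetical, and `fetch` stands in for whatever call the platform provides to issue a token. In practice the Google client libraries take care of refreshing, so this is only an illustration of the principle.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp in seconds


class TokenCache:
    """Caches a short-lived token and refreshes it shortly before expiry."""

    def __init__(self, fetch: Callable[[], Token], refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch              # hypothetical token-issuing call
        self._margin = refresh_margin    # refresh this many seconds early
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        """Return a valid token, fetching a new one when needed."""
        now = self._clock()
        if self._token is None or now >= self._token.expires_at - self._margin:
            self._token = self._fetch()
        return self._token.value
```

Because no long-lived key exists anywhere in the application, a leaked token is only useful until it expires.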
Business value:
Significant increase in security levels
There’s no risk of leakage of long-lived keys, which are a common target for attacks. The attack surface is also reduced.
Simplified secret management
Since there’s no need to manage the lifecycle of static keys, it reduces operational burden and the likelihood of errors.
Simplified compliance
Auditing and access control are simplified as authorization is based on standard IAM mechanisms, not on key management.
Higher automation reliability
CI/CD pipelines obtain the necessary permissions automatically and securely, without the need to embed keys in task configurations.
Reduced operational risks
This approach decreases the likelihood of security incidents related to key compromise.
Compliance with regulations
Infrastructure solutions based on Google Cloud Platform are developed in compliance with the leading security and compliance standards, simplifying the certification process for our clients.
Tech value:
Certified platform
GCP complies with numerous standards such as SOC 2, HIPAA BAA, GDPR, providing clients with a foundational, standards-compliant model.
Built-in controls
The services we use provide the necessary technical mechanisms to meet standard requirements.
Automation and IaC
Using Infrastructure as Code, specifically Terraform, enables repeatable, documented, and auditable infrastructure configurations, which is essential for passing audits.
Business value:
Faster and simpler certification
Using ready-made, standard-compliant components and practices significantly reduces the time and resources needed to prepare for and pass SOC 2, HIPAA, or GDPR audits.
Access to regulated markets
Companies can operate in industries like healthcare and finance or regions like the EU where compliance with these standards is mandatory or a competitive advantage.
Customer and partner trust
Demonstrating compliance with recognized security and privacy standards strengthens your company's reputation.
Reduced risks
The likelihood of data breaches, fines for non-compliance, and associated reputational losses is much lower.
Competitive advantage
Being ready for certification can be a key factor for customers handling sensitive data.
Secret management
Secrets are sensitive data like API keys, passwords, certificates, etc. We use Google Secret Manager and the built-in Kubernetes Secrets mechanism to manage secrets securely.
Tech value:
Centralized and secure storage for all secrets with encryption and access control through IAM.
It’s possible to manage versions and rotate secrets, which means we can automatically update secrets.
Secret Manager integrates with Cloud Run directly and with GKE via a CSI driver, so secrets are mounted into applications automatically without being stored in code or configuration.
It’s possible to audit access to secrets.
We can use Kubernetes Secrets for cluster-specific secrets and integrate them with external stores such as Secret Manager.
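From the application's point of view, a mounted secret is just a file that appears at runtime. The Python sketch below reads one. The default mount directory `/var/secrets` and the function name are hypothetical, since the real path depends on how the volume is configured.

```python
from pathlib import Path


def read_mounted_secret(name: str, mount_dir: str = "/var/secrets") -> str:
    """Read one secret from a file mounted into the container."""
    # Reject empty names and anything that could escape the mount directory.
    if not name or Path(name).name != name:
        raise ValueError(f"invalid secret name: {name!r}")
    path = Path(mount_dir) / name
    # Strip a trailing newline that editors and tools often append.
    return path.read_text(encoding="utf-8").rstrip("\n")
```

The application code contains no secret values, only the name of the file to read, so rotating a secret does not require a code change.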
Business value:
Increased security
Using these tools reduces the risk of sensitive data leaks through centralized management, encryption, and access control.
Simplified compliance
It helps apps meet security standards requirements for secret management.
Reduced operational risks
It decreases the likelihood of incidents related to the mishandling of secrets.
Simplified management
Centralized management simplifies key rotation and access auditing.
DevOps tools
We use DevOps practices and tools to automate and accelerate development, testing, and deployment.
Automation
We launch CI/CD runners — executors for build, test, and deployment tasks — inside a Google Kubernetes Engine cluster.
Tech value:
Dynamic scaling
Kubernetes automatically creates and removes pods with runners as needed, ensuring resources are available for performing CI/CD tasks without downtime or excessive capacity.
Optimized resource usage
Runners consume cluster resources only during task execution, which is more efficient than constantly running dedicated virtual machines. Kubernetes effectively distributes the load across cluster nodes.
Environment consistency
Each CI/CD task runs in an isolated pod with a clearly defined environment specified by a Docker image. Because the environment is the same on every run, builds and tests are reproducible.
Configuration flexibility
It’s easy to set up specific environments for different types of tasks like backend build, frontend build, or test execution using different images or pod configurations.
Simplified management
The management, monitoring, and logging of runners are centralized using Kubernetes tools. It means that all information and management tools are located in one place, which simplifies the processes of configuration and control.
Business value:
Reduced CI/CD costs
You pay only for the actual GKE resources consumed during task execution, especially when using cluster autoscaling. We don’t maintain idle dedicated machines for runners, so you don’t pay for them.
Faster time-to-market
Quick setup of runners shortens waiting times in CI/CD queues, making the building, testing, and delivery of changes to users faster.
Efficient use of Kubernetes investments
Instead of maintaining separate resources specifically for CI/CD tasks, we use the already available resources within the GKE cluster. It cuts the costs and maximizes the return on infrastructure investment.
Improved security
Runners operate within a secure VPC cluster network, using Kubernetes service accounts and IAM for secure access to other GCP resources. Secrets are managed via Kubernetes Secrets or integrated solutions.
Increased developer productivity
Less waiting time = faster feedback on code and tests.
Isolated clusters for development and testing
We create dynamic, isolated environments on demand — for example, for branch testing, demos, or different CI/CD stages. To do this, we use Kubernetes namespaces within a single GKE cluster or, if necessary, separate GKE clusters. Namespaces are a way to divide cluster resources between multiple users or applications.
Tech value:
Resource isolation
Kubernetes namespaces provide logical isolation for applications, configurations, secrets, and network policies within a single cluster. This prevents overlap or interference from one namespace to another.
Resource control
We can set quotas with ResourceQuotas and limits with LimitRanges on CPU/memory usage for each namespace, preventing environments from affecting each other. Each namespace uses only the resources that are allocated to it.
Network segmentation
Network Policies allow strict control of traffic between namespaces and pods.
Flexibility
We can easily create and delete environments via automation, which is ideal for temporary environments like feature branches and QA.
Efficiency
Using a single GKE cluster for multiple isolated environments saves resources compared to creating separate clusters for each environment.
Automation
We fully automate the environment lifecycle through CI/CD pipelines and IaC using Terraform and Terragrunt.
Full isolation (separate clusters)
Environments can be deployed in separate, fully independent GKE clusters for maximum isolation when necessary — for example, for production vs. non-production, or due to security/compliance requirements. These clusters are managed by the same IaC approach.
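The ResourceQuota and LimitRange objects mentioned under resource control can be generated per environment. The Python sketch below builds both manifests as plain dictionaries that can be serialized to JSON. The function names, the `-quota` and `-limits` naming scheme, and all CPU and memory values are hypothetical.

```python
import json


def namespace_quota(namespace: str, cpu: str, memory: str) -> dict:
    """ResourceQuota capping the total requests and limits in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }


def namespace_limit_range(namespace: str) -> dict:
    """LimitRange giving containers default requests and limits."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": f"{namespace}-limits", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "defaultRequest": {"cpu": "250m", "memory": "256Mi"},
                    "default": {"cpu": "500m", "memory": "512Mi"},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical feature-branch environment.
    print(json.dumps(namespace_quota("feature-login", "4", "8Gi"), indent=2))
```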
Business value:
1. Faster development, QA, and launch
Developers and QA teams can quickly create complete, isolated copies of the application to test new features or fixes without interfering with each other or waiting for shared testing environments. It also speeds up the launch of new features to the market.
2. Improved software quality
Thorough testing in an isolated, production-like environment before release reduces the number of errors in production.
3. Cost optimization
Since we create and delete dynamic environments on demand, you pay only for used resources. Using namespaces within a single cluster is often more cost-effective than separate clusters.
4. Reduced risks
Isolation prevents errors or experiments in one environment from affecting others, especially the production environment.
5. Business flexibility
It’s easy to create temporary environments for demonstrations, conducting PoCs, or A/B testing.
Resource management
We deploy apps in GKE using the Helm package manager and clearly define resource requests, limits, and the number of replicas for all components of the application.
Tech value:
Helm simplifies managing the lifecycle of applications in Kubernetes: packaging in charts, installation, upgrade, rollback, and dependency management. Helm templating makes it easy to configure deployments for different environments from a single chart codebase.
Defining resource requests and limits for each container ensures predictable resource consumption and helps the Kubernetes scheduler place pods efficiently. This prevents the 'noisy neighbor' problem, where one app starts using too many resources and affects other apps.
Specifying the number of replicas provides a basic level of app resilience and availability. Combined with the Horizontal Pod Autoscaler (HPA), which can automatically adjust the number of replicas based on load, we achieve automatic scaling. This way, the app stays available and responsive regardless of load.
Helm charts provide a consistent, versioned way to describe and deploy applications.
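The scaling decision made by the HPA follows a simple proportional rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The Python sketch below reproduces that rule with a tolerance band and replica bounds. The default bounds are hypothetical, and the real controller applies further safeguards such as stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Proportional scaling rule: ceil(current * observed / target)."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 4 replicas running at 90% average CPU against a 60% target scale out to 6, while the same 4 replicas at 15% CPU scale in to 1.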
Business value:
Reliability and stability
Clear definition of resources and replicas minimizes failures caused by resource shortages. It ensures stable operation of business applications and improves user experience.
Cost optimization
Setting limits prevents uncontrolled growth of resource consumption. Autoscaling in HPA lets us use resources efficiently by scaling up under load and down in its absence, which optimizes infrastructure expenses.
Faster deployment and time-to-market
Helm and standardized charts speed up and simplify the rollout process of new versions and apps, reducing the time from development to production.
Predictability
Managing deployments through Helm makes the process more controlled, repeatable, and less prone to errors compared to manual deployments.
Business scalability
The infrastructure is ready for increased load thanks to automatic scaling. As the number of users or transactions grows, the service will operate uninterrupted.
Deployment
We implement GitOps practices using tools like Argo CD and Flux CD for declarative management of app deployments in Kubernetes. GitOps uses Git, a version control system, for managing infrastructure and deployments.
Tech value:
1. Single source of truth
The Git repository becomes the sole source of truth for describing the desired state of applications in the cluster. It enhances the reliability, traceability, and security of application deployment processes.
2. Automatic synchronization
The GitOps operator in the cluster automatically monitors changes in the Git repository and aligns the cluster's state with the description.
3. Audit and traceability
All changes to application state go through Git as commits and pull requests, ensuring a complete history and rollback options.
4. Enhanced security
Developers rarely need to access the cluster directly using commands like kubectl apply, since all changes are initiated through Git.
5. Consistency
The cluster's state always matches the description in Git. If something in the cluster changes, the system will automatically bring the cluster into alignment with what's described in Git. You can always be confident that your system operates as it was configured in the repository.
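The synchronization behavior described above boils down to comparing the desired state from Git with the live state in the cluster and applying the difference. The Python sketch below models both states as plain dictionaries. The function names and resource names are hypothetical, and real operators such as Argo CD and Flux CD work on Kubernetes objects. Whether resources missing from Git are deleted typically depends on operator settings.

```python
def plan_sync(desired: dict, live: dict) -> list[tuple[str, str]]:
    """Return the (action, name) steps needed to make live match desired."""
    steps = []
    for name, spec in desired.items():
        if name not in live:
            steps.append(("create", name))
        elif live[name] != spec:
            steps.append(("update", name))
    for name in live:
        if name not in desired:
            steps.append(("delete", name))
    return steps


def reconcile(desired: dict, live: dict) -> dict:
    """Apply the plan to a copy of live and return the converged state."""
    result = dict(live)
    for action, name in plan_sync(desired, live):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result
```

If someone manually changes a replica count in the cluster, the next pass reports an update for that resource and restores the value from Git. Once the states match, the plan is empty.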
Business value:
Faster and more reliable deployments
Automation and a declarative approach make the process of rolling out new versions faster, safer, and more predictable.
Improved stability
The ease of rolling back to a previous working version via Git reduces risks during updates.
Increased developer productivity
Developers focus on code and its description in Git, rather than manual deployment operations.
Improved compliance
Transparent change history in Git simplifies the audit of deployment processes.
Infrastructure management
Infrastructure should efficiently respond to changing demands and potential issues, so we need to constantly monitor its performance and optimize it.
Monitoring
We ensure comprehensive monitoring, logging, and alerting using Google Cloud Operations Suite, which includes tools like Cloud Monitoring, Cloud Logging, Error Reporting, and Cloud Trace. We can also use integrated open-source solutions like Prometheus, Grafana, and Alertmanager.
Tech value:
All metrics, logs, and traces from all infrastructure components and apps are collected and stored in one place.
We can set up dashboards for visualizing system status and key performance indicators.
Errors and performance bottlenecks are automatically detected and diagnosed using Error Reporting and Cloud Trace.
We can flexibly configure alerting rules for proactive problem notification via Email, Slack, PagerDuty, etc.
It’s possible to integrate the system with the open-source stack if the project requires specific metrics or the team prefers it.
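An alerting rule usually combines a threshold with a duration, so that a single spike does not page anyone. The Python sketch below shows that logic for a list of metric samples. The function name and parameters are hypothetical, and real rules are configured in the monitoring tool rather than in application code.

```python
def should_alert(samples: list[float], threshold: float,
                 min_consecutive: int) -> bool:
    """True if the latest min_consecutive samples all exceed the threshold."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        return False  # not enough data to decide yet
    return all(value > threshold for value in samples[-min_consecutive:])
```

With a threshold of 0.8 and three required samples, a run of 0.9, 0.95, 0.92 fires the alert, while a run that ends with a value back under the threshold does not.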
Business value:
Increased reliability and availability of services
The system quickly detects and responds to incidents, which minimizes downtime and impact on users.
Improved performance
The analysis of metrics and traces helps optimize the performance of apps and infrastructure, improving the user experience.
Proactive management
Alerts allow issues to be resolved before they affect customers.
Reduced operational costs
The time spent on diagnosing and resolving problems is decreased.
Informed decisions
Monitoring data helps make informed decisions about scaling, optimization, and system development.
Load balancing
We use Cloud CDN and Cloud Load Balancing to quickly and reliably deliver app content to users around the world.
Tech value:
Cloud Load Balancing distributes traffic among app instances to ensure high availability and scalability. It supports HTTP(S), TCP, and UDP traffic.
Cloud CDN caches static and dynamic content at the edge of Google's network, closer to users, reducing latency and load on backends.
The system is integrated with Google Cloud Armor, which offers Web Application Firewall (WAF) functions and protection against DDoS attacks and other web threats.
The system automatically acquires and renews SSL/TLS certificates. These are Google-managed certificates that provide authentication for web apps and enable an encrypted connection between the app and its users.
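The effect of caching on backend load is simple arithmetic: only cache misses reach the origin. The short Python sketch below makes this explicit. The function name is ours, and the traffic volume and hit ratio in the usage note are hypothetical inputs, not measurements.

```python
def origin_traffic_gb(total_gb: float, cache_hit_ratio: float) -> float:
    """Traffic still served by the origin after CDN caching."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_gb * (1.0 - cache_hit_ratio)
```

For example, with 1,000 GB of monthly traffic and a 75% cache hit ratio, the backends serve 250 GB and the CDN edge serves the rest.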
Business value:
Improved user experience
Fast page and content loading enhances user satisfaction and conversion.
Global reach
These tools ensure low latency for users in different geographical regions.
Increased availability and reliability
Load balancing and attack protection ensure that the service operates stably.
Reduced outgoing traffic costs
Caching content on CDN reduces the volume of data transferred from origin servers.
Improved SEO
App loading speed is an important ranking factor in search engines.
Costs
A very rough estimate of the minimum starting cost for such an infrastructure is in the range of $50 to $70 per month. This estimate is for very low usage, and actual costs will depend on resource consumption.
To find out the cost of infrastructure for your business, contact us and we will calculate the precise costs and provide you with an estimate.