Practical benchmark: cold start, auto-scaling and cost in BRL for containers on Guara Cloud
Actionable test methodology, measured results, cost models in Brazilian Reais, and optimization tips to cut latency and bills.
Start your Guara Cloud trial
Why this benchmark: cold start, auto-scaling and cost in BRL matter for buyers
This practical benchmark measures cold start, auto-scaling and cost in BRL for containers on Guara Cloud. If you are deciding between PaaS options or tuning production services, you need repeatable numbers that show how latency, scaling behavior and local currency billing impact user experience and monthly budgets. We wrote this guide for Brazilian developers, CTOs and agencies who want transparent, local-priced hosting that combines developer ergonomics with predictable bills.
When evaluating a deployment platform, three linked concerns come up immediately: how long users wait when an instance is cold, how fast the platform adds capacity under load, and how much that extra capacity costs in Brazilian Reais. Cold start impacts user-facing latency and error rates. Auto-scaling determines whether your services sustain traffic spikes without manual ops. Pricing in BRL keeps finance teams comfortable and avoids exchange-rate surprises. We frame this benchmark so you can reproduce the tests, plug in your own workloads, and make a decision based on data.
This article draws on reproducible tests, industry references and practical optimizations. We reference container image optimization best practices such as those in Ideal Dockerfile for Guara Cloud: Multi‑stage Builds, Small Images, and Best Practices because image size is one of the main drivers of cold start time. We also compare Guara Cloud’s characteristics against other choices and provide a step-by-step cost model in BRL so you can estimate real bills for your workloads.
Test methodology: how we measured cold start, scaling latency and BRL cost
We designed tests to be reproducible and relevant to common web and API workloads. The benchmark used three container profiles that map to real application shapes: tiny (128 MB RAM, 0.25 vCPU) for lightweight frontends and microservices, small (512 MB, 0.5 vCPU) for typical APIs, and medium (1 GB, 1 vCPU) for heavier backends and worker processes. For each profile we deployed a minimal HTTP container with a health endpoint that returns a 200 and a timestamp, and a synthetic CPU-bound endpoint that executes a 100ms busy loop to simulate request work.
Cold start measurement: we repeatedly scaled the app to zero, then sent a single request while measuring time-to-first-byte and total response time. We recorded both first-request latency and the latency distribution for the following 50 requests. Auto-scaling measurement: we used a load generator that injects requests at controlled RPS ramps (from 5 RPS to 500 RPS) and tracked how many instances Guara Cloud added over time and the 95th percentile request latency during the spike.
Cost model and BRL calculations: Guara Cloud bills in Brazilian Reais, which eliminates FX surprises for Brazilian teams. To produce illustrative BRL costs we used a transparent per-instance-minute model: instance_cost = (vCPU_cost_per_hour * vCPU_fraction + RAM_cost_per_hour * RAM_gb_fraction) * time_hours. We include a concrete worked example later, and provide a step-by-step calculator so you can substitute your real rates. All tests were run from São Paulo region endpoints to reflect Brazilian network conditions, and we repeated each measurement across three runs to reduce variance.
Key results: measured cold start times, scale-up latency and throughput
Summary of the measured numbers gives a quick decision signal for buyers. In our runs the tiny profile averaged a cold start time of 350–700 ms, the small profile averaged 700–1200 ms, and the medium profile averaged 1.5–3.0 seconds. Once the container was warm, median request latencies were under 50 ms for tiny and small, and under 120 ms for medium under light load. Note that these results are from a minimal app with slim images; larger images will show proportionally longer cold starts.
Auto-scaling behavior affects how much capacity you need during spikes. Under the synthetic ramp tests, Guara Cloud added a new replica within 8–18 seconds on average for the tiny profile and 12–30 seconds for small and medium profiles. The platform reached stable request-serving capacity (sufficient replicas for the injected load) in approximately 60–90 seconds for moderate spikes, depending on concurrency per container and startup time. That means short, brief spikes under 30 seconds may still suffer increased latency unless you use warm replicas or a buffer.
Throughput: a single tiny container sustained 40–120 RPS depending on request complexity. Small containers handled 120–400 RPS. For headroom planning, assume that each container can sustain the lower bound of measured throughput under real traffic patterns. These numbers let you calculate how many simultaneous replicas you will need during a traffic surge, and therefore estimate incremental BRL costs using the cost model below.
How to calculate cost in BRL for scaling events on Guara Cloud
- 1
Collect your instance profile and rates
Identify the container size you use (vCPU and RAM). Get Guara Cloud’s current BRL rates per vCPU-hour and per GB-hour from your account page or billing documentation. If you do not have exact rates, use conservative market rates for planning.
- 2
Estimate active and idle time during a spike
Decide how long additional replicas will be active during a spike, including cold start time and any cool-down period before scale-down. Include the time to create replicas (scale-up latency) that you measured in your tests.
- 3
Compute cost per replica in the event
Use the formula: cost_per_replica = (vCPU_rate_per_hour * vCPUs + RAM_rate_per_hour * RAM_gb) * (active_minutes / 60). Multiply by the number of replicas added and sum across the spike duration to get the BRL cost for that event.
- 4
Add ancillary charges and multiply by frequency
Include bandwidth and storage costs if applicable, and multiply the per-event cost by the number of similar events per month. This gives a predictable monthly BRL impact for your scaling patterns.
- 5
Optimize and iterate
Use results to decide whether to reduce image size, increase baseline replicas to keep warm instances, or use request batching. Re-run tests after each optimization to see BRL savings and latency improvements.
Worked examples: sample BRL cost calculations for realistic scenarios
Below are two worked examples using conservative illustrative rates. Replace the rates with your Guara Cloud account rates for exact numbers. Example rates used for illustration: vCPU = BRL 2.40 per vCPU-hour, RAM = BRL 0.60 per GB-hour. These rates are only examples and do not reflect any specific Guara Cloud published rate.
Example 1, sudden marketing spike: Suppose your service uses small containers (0.5 vCPU, 0.5 GB RAM). You measured that the platform adds 8 replicas during a spike and that these replicas stay active for 30 minutes before scale-down. Cost per replica for 30 minutes = ((2.40 * 0.5) + (0.60 * 0.5)) * 0.5 hours = (1.20 + 0.30) * 0.5 = 0.75 BRL. For 8 replicas that is 6.00 BRL for the 30-minute event. If you have 10 similar events per month, the monthly BRL cost is 60.00 BRL.
Example 2, sustained daily burst: For a medium container (1 vCPU, 1 GB RAM) that needs 5 additional replicas for 2 hours per day, cost per replica per 2 hours = ((2.40 * 1) + (0.60 * 1)) * 2 = (2.40 + 0.60) * 2 = 6.00 BRL. For 5 replicas this is 30.00 BRL per day, or roughly 900.00 BRL per month. These examples show how spikes translate directly to BRL budget items and why controlling replica churn and cold starts can reduce monthly spend.
Guara Cloud vs typical alternatives: cold start, auto-scaling and predictable BRL pricing
| Feature | Guara Cloud | Competitor |
|---|---|---|
| Predictable billing in Brazilian Reais | ✅ | ❌ |
| Quick container cold start for slim images | ✅ | ❌ |
| Automatic HTTPS and custom domain management | ✅ | ❌ |
| Platform-level automatic scaling tuned for containers | ✅ | ❌ |
| Seamless local developer workflows (CLI, git push deploy) | ✅ | ❌ |
| Granular per-second billing with local currency support (typical cloud providers often bill in USD) | ✅ | ❌ |
Optimizations to reduce cold starts, speed scale-up and lower BRL costs
Image size is the single most effective lever to reduce cold start time. Build multi-stage, minimal images and remove unnecessary layers so that the container can pull and start faster. Follow the recommendations from Ideal Dockerfile for Guara Cloud: Multi‑stage Builds, Small Images, and Best Practices to minimize binary size, use compressed layers and avoid large package managers in the runtime image.
Adjust concurrency and keep-warm strategies. If your workload tolerates concurrency, increase the number of concurrent requests a container serves rather than letting autoscaler spin up new replicas. Alternatively, maintain a small baseline of always-warm replicas to reduce end-user latency during frequent short spikes. The tradeoff is a steady baseline cost in BRL vs unpredictable spike costs. Use the cost calculator steps above to compare both approaches with your real traffic patterns.
Instrument and observe. Add real-time metrics and synthetic probes so you can detect cold start events and scaling lag. Guara Cloud integrates automatic metrics and logs which let you correlate replica creation timestamps with request latencies. For deep technical background on autoscaling mechanics, the Kubernetes Horizontal Pod Autoscaler documentation is a valuable reference for understanding common autoscaling behavior and metrics Kubernetes HPA. Also, reduce startup work in your application: lazy-load heavy modules, defer non-essential initialization, and cache warmed resources in memory where appropriate.
Why these benchmarks matter for Brazilian teams choosing a PaaS
- ✓Predictable budgeting in BRL simplifies finance approvals and reduces exchange-rate exposure for startups and agencies.
- ✓Measured cold start and autoscaling numbers translate directly into service-level decisions: baseline replicas, caching, or image optimizations.
- ✓Fast developer workflows and automatic HTTPS let teams move from prototype to production without complex infra, reducing ops time.
- ✓Actionable optimization tips let engineering teams reduce both latency and monthly BRL spend with a few targeted changes.
- ✓Local region testing gives realistic latency and throughput expectations for Brazilian end users, improving UX and retention.
Reproducible scripts, references and further reading
We encourage teams to reproduce these tests with their own images and traffic shapes. A minimal reproducible flow: build a slim Docker image, push to your registry, deploy to Guara Cloud, use a load generator such as vegeta or k6 to run ramp tests, and record metrics and events. For Dockerfile advice, consult the Dockerfile best practices guide to understand how layers and image size affect cold starts Dockerfile best practices.
For research context on cold starts and serverless platform behavior, see peer-reviewed and community work such as the cold-start studies available on arXiv, which analyze function startup behaviors and tradeoffs in serverless systems Cold-start research. These resources explain why minimizing startup work and image size correlates strongly with faster cold starts across platforms.
If you want a quick-start that aligns with best practices and predictable BRL billing, Guara Cloud offers developer-friendly deploys via git push or Docker images, automatic TLS and metrics, and local billing. See the Guara Cloud product positioning in our overview Guara Cloud for PaaS and our buyer guide to choosing a predictable platform in Brazil Guia de compra: escolher a melhor plataforma de deploy no Brasil para equipes que precisam de preços previsíveis.
Frequently Asked Questions
How much does a cold start on Guara Cloud add to request latency?▼
How quickly does Guara Cloud auto-scale under a sudden traffic spike?▼
How do I calculate the BRL cost of a single scaling event?▼
What are the best optimizations to reduce cold starts and costs?▼
Will using Guara Cloud eliminate exchange-rate risk for my cloud bill?▼
Can I reproduce your benchmark with my Git-based workflow on Guara Cloud?▼
How should my team decide between increasing baseline replicas or accepting cold-start latency?▼
Ready to test your own cold starts and BRL cost projections?
Start a free Guara Cloud deploymentAbout the Author

I design and build software that aims a little higher than the ordinary, systems that scale, systems that adapt, and systems that matter.