

Temporal's workflow orchestration capabilities enable developers to build resilient systems, but achieving optimal performance requires thoughtful worker configuration. Improvements to workflow logic, retry configuration, and failure handling can significantly improve workflow performance. This article moves beyond those topics to provide actionable strategies for tuning Temporal workers, with special attention to reducing schedule-to-start latency.
Schedule-to-Start Latency
This critical measurement, reported as workflow_task_schedule_to_start_latency and activity_schedule_to_start_latency, indicates how quickly tasks move from being scheduled on a task queue by the Temporal Server to actually being picked up and executed by a worker. These metrics are among the best ways to confirm your workers are configured properly and to alert on issues with your system.
If all workers listening on a Temporal task queue are out of available slots, either because of load or because of slow workflows and activities, they will be unable to begin new tasks after polling, which shows up as increased schedule-to-start latency. As a starting point, we recommend alerting on these four metric targets (a metrics-export sketch follows the table):
| Metric | Target |
| --- | --- |
| activity_schedule_to_start_latency | < 500ms p99 |
| workflow_task_schedule_to_start_latency | < 500ms p99 |
| workflow_task_execution_latency | < 1s p99 |
| worker_task_slots_used | 70-80% capacity |
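These metrics are emitted by the SDK itself, so the first step is attaching a metrics scope to your worker process. Below is a minimal sketch using the Java SDK with a Prometheus Micrometer registry, following the pattern used in the Java SDK samples; the ten-second reporting interval is an arbitrary choice, and class locations can shift between SDK versions, so verify the imports against the version you run.

```java
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class MetricsSetup {
  public static WorkflowServiceStubs serviceStubsWithMetrics() {
    // Micrometer registry that exposes SDK metrics in Prometheus format.
    PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

    // Bridge Temporal's tally-based metrics into Micrometer, reporting every 10 seconds.
    Scope scope =
        new RootScopeBuilder()
            .reporter(new MicrometerClientStatsReporter(registry))
            .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));

    // Attach the metrics scope to the service stubs used by your workers and clients.
    return WorkflowServiceStubs.newServiceStubs(
        WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build());
  }
}
```

However you expose the registry's scrape endpoint, the schedule-to-start and slot metrics above will appear there, typically prefixed with temporal_.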
Once you load test your service, bottlenecks will become clear. Before tuning, it is common to run out of task slots under high load.
Task Slot Management
Once you have identified bottlenecks from load testing and monitoring metrics, a likely next step is to optimize your workers' task slots. Selecting the right slot allocation strategy for your use case is foundational to how workers handle concurrency:
| Strategy | Best For | Latency Consideration |
| --- | --- | --- |
| Fixed Size | Predictable workloads | Requires accurate capacity planning to avoid queue buildup |
| Resource-Based | Dynamic environments | Automatically scales to maintain schedule-to-start targets |
| Custom | Specialized requirements | Enables micro-optimization for latency-sensitive tasks |
In the majority of cases, maxConcurrentWorkflowTaskExecutionSize is best set to a fixed size, whereas maxConcurrentActivityExecutionSize and maxConcurrentLocalActivityExecutionSize can benefit from the other options.
The resource-based slot supplier excels at managing fluctuating workloads with low per-task resource consumption, and it provides crucial protection against out-of-memory errors and over-subscription in environments where per-task resource consumption is unpredictable. Unless your workloads are extremely predictable or you want maximum control, we would encourage you to at least test out resource-based auto-tuning.
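To illustrate, here is a rough sketch of attaching a resource-based tuner to a worker with the Java SDK. The worker tuning API is still marked experimental, so the builder names may differ in your SDK version; the 0.8 memory and 0.9 CPU utilization targets and the "payments" task queue are placeholder values, not recommendations.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;
import io.temporal.worker.tuning.ResourceBasedControllerOptions;
import io.temporal.worker.tuning.ResourceBasedTuner;

public class ResourceTunedWorker {
  public static Worker create(WorkflowClient client) {
    WorkerOptions options =
        WorkerOptions.newBuilder()
            .setWorkerTuner(
                ResourceBasedTuner.newBuilder()
                    // Target ~80% memory and ~90% CPU utilization; slots are handed out
                    // only while measured usage stays under these thresholds.
                    .setControllerOptions(
                        ResourceBasedControllerOptions.newBuilder(0.8, 0.9).build())
                    .build())
            .build();

    WorkerFactory factory = WorkerFactory.newInstance(client);
    // "payments" is a placeholder task queue name.
    return factory.newWorker("payments", options);
  }
}
```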
In some cases the resource-based slot supplier may not be specific enough. For example, if your service requires intelligent scaling of database connections or connection pools, a custom slot supplier can layer database-connection awareness on top of the basic resource-based behavior.
If your application has activities that always consume roughly the same amount of time and resources, fixed-size slot suppliers make sense. Keep in mind that if you change your workflows or activities, it is wise to re-evaluate your slot sizing values.
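For that kind of predictable workload, fixed-size slots are simply the maxConcurrent* settings on WorkerOptions. The Java sketch below uses illustrative numbers only; the right values come from your own capacity planning and load tests.

```java
import io.temporal.worker.WorkerOptions;

public class FixedSlotOptions {
  public static WorkerOptions build() {
    return WorkerOptions.newBuilder()
        // Upper bound on workflow tasks executing concurrently on this worker.
        .setMaxConcurrentWorkflowTaskExecutionSize(100)
        // Upper bound on concurrent activity executions; size this against what
        // downstream dependencies (databases, external APIs) can absorb.
        .setMaxConcurrentActivityExecutionSize(200)
        // Local activities run inside the workflow task and get their own pool.
        .setMaxConcurrentLocalActivityExecutionSize(200)
        .build();
  }
}
```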
Task Poller Configuration
Another way to tune worker performance is to adjust poller configuration. Generally, the default task poller settings are good enough; however, in certain cases application latency responds well to poller tuning (a configuration sketch follows this list):
maxConcurrentWorkflowTaskPollers: Controls how many simultaneous long-poll requests for workflow tasks the worker keeps open (default 2)
maxConcurrentActivityTaskPollers: Controls how many simultaneous long-poll requests for activity tasks the worker keeps open (default 5)
Increasing pollers reduces schedule-to-start latency but raises network overhead
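As a rough Java SDK sketch, raising poller counts looks like the following; the values shown are arbitrary starting points, and extra pollers only help when schedule-to-start latency is limited by polling rather than by available task slots.

```java
import io.temporal.worker.WorkerOptions;

public class PollerTunedOptions {
  public static WorkerOptions build() {
    return WorkerOptions.newBuilder()
        // Number of simultaneous long-poll requests for workflow tasks on this task queue.
        .setMaxConcurrentWorkflowTaskPollers(4)
        // Number of simultaneous long-poll requests for activity tasks.
        .setMaxConcurrentActivityTaskPollers(10)
        .build();
  }
}
```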
Implementation Checklist
In most cases, tuning the above parameters once simply won’t get your workers performing as desired; an iterative approach is required. Keep in mind that the following steps assume you have already defined solid resource usage, limits, and thread settings for your service runtime and container.
Baseline Measurement
Record initial latency metrics during low/peak loads
Profile memory/CPU usage patterns
Incremental Tuning
Change one configuration at a time
Adjust task slot allocation strategy/settings
Adjust poller counts in 25% increments
Load Testing
Simulate 2x expected peak traffic (a minimal load-generation sketch follows this checklist)
Validate that key metrics stay within the target ranges above
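One simple way to drive that load is to start a large batch of workflows against the target task queue with the SDK client. The Java sketch below is illustrative only: MyWorkflow, the "payments" task queue, and the execution count are placeholders, and a realistic test should also ramp traffic and mirror production payload sizes.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

public class LoadGenerator {

  // Placeholder workflow interface; substitute the workflow you actually want to exercise.
  @WorkflowInterface
  public interface MyWorkflow {
    @WorkflowMethod
    void run();
  }

  public static void main(String[] args) {
    // Connects to a Temporal Server on localhost:7233; point this at your test cluster.
    WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
    WorkflowClient client = WorkflowClient.newInstance(service);

    // Fire off 1,000 executions without waiting for results; scale the count and pacing
    // to roughly 2x your expected peak traffic.
    for (int i = 0; i < 1000; i++) {
      MyWorkflow workflow =
          client.newWorkflowStub(
              MyWorkflow.class,
              WorkflowOptions.newBuilder()
                  .setTaskQueue("payments") // placeholder task queue name
                  .setWorkflowId("load-test-" + i)
                  .build());
      WorkflowClient.start(workflow::run); // async start; returns once the server accepts it
    }
  }
}
```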
Conclusion
Effective Temporal worker tuning requires balancing three competing priorities: resource utilization, throughput, and latency. By choosing the right slot allocation strategy and poller counts, you can maintain sub-second schedule-to-start latency even during high load. The key lies in continuous monitoring of execution patterns and adapting your slot allocation configuration to match your workload characteristics.
Establish regular performance review cycles and leverage Temporal's built-in metrics to guide ongoing optimization. When implemented correctly, these tuning techniques enable Temporal to handle everything from steady payment-processing workflows to volatile inventory management systems while meeting strict SLAs.