

Temporal's workflow orchestration capabilities enable developers to build resilient systems, but achieving optimal performance requires thoughtful worker configuration. Improvements to workflow logic, retry configuration, and failure handling can significantly improve workflow performance. This article moves beyond those topics to provide actionable strategies for tuning Temporal workers, with special attention to reducing schedule-to-start latency.
Schedule-to-Start Latency
This critical measurement, reported as workflow_task_schedule_to_start_latency and activity_schedule_to_start_latency, indicates how quickly tasks move from being scheduled on a task queue by the Temporal Server to actually being picked up and executed by a worker. These metrics are among the best ways to confirm your workers are configured properly and to alert on issues with your system.
If all workers listening on a Temporal task queue are out of available slots, either because of load or because of slow workflows and activities, they will be unable to begin new tasks after polling, which shows up as increased schedule-to-start latency. As a starting point, we recommend alerting on these four metric targets (a metrics-export sketch follows the table):
| Metric | Target |
| --- | --- |
| activity_schedule_to_start_latency | < 500ms p99 |
| workflow_task_schedule_to_start_latency | < 500ms p99 |
| workflow_task_execution_latency | < 1s p99 |
| worker_task_slots_used | 70-80% capacity |
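These metrics are emitted by the SDK itself, so the first step is attaching a metrics scope to your worker process. Below is a minimal sketch using the Java SDK with a Prometheus Micrometer registry, following the pattern used in the Java SDK samples; the ten-second reporting interval is an arbitrary choice, and class locations can shift between SDK versions, so verify the imports against the version you run.

```java
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class MetricsSetup {
  public static WorkflowServiceStubs serviceStubsWithMetrics() {
    // Micrometer registry that exposes SDK metrics in Prometheus format.
    PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

    // Bridge Temporal's tally-based metrics into Micrometer, reporting every 10 seconds.
    Scope scope =
        new RootScopeBuilder()
            .reporter(new MicrometerClientStatsReporter(registry))
            .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));

    // Attach the metrics scope to the service stubs used by your workers and clients.
    return WorkflowServiceStubs.newServiceStubs(
        WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build());
  }
}
```

However you expose the registry's scrape endpoint, the schedule-to-start and slot metrics above will appear there, typically prefixed with temporal_.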
Once you load test your service, bottlenecks will become clear. Before tuning, it is common to run out of task slots under high load.
Task Slot Management
Once you have identified bottlenecks from load testing and monitoring metrics, a likely next step is to optimize your workers' task slots. Selecting the right slot allocation strategy for your use case is foundational to how workers handle concurrency:
| Strategy | Best For | Latency Consideration |
| --- | --- | --- |
| Fixed Size | Predictable workloads | Requires accurate capacity planning to avoid queue buildup |
| Resource-Based | Dynamic environments | Automatically scales to maintain schedule-to-start targets |
| Custom | Specialized requirements | Enables micro-optimization for latency-sensitive tasks |
In the majority of cases, maxConcurrentWorkflowTaskExecutionSize is best set to a fixed size, whereas maxConcurrentActivityExecutionSize and maxConcurrentLocalActivityExecutionSize can benefit from the other options.
The resource-based slot supplier excels at managing fluctuating workloads with low per-task resource consumption, and it provides crucial protection against out-of-memory errors and over-subscription in environments where per-task resource consumption is unpredictable. Unless your workloads are extremely predictable or you want maximum control, we would encourage you to at least test out resource-based auto-tuning.
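To illustrate, here is a rough sketch of attaching a resource-based tuner to a worker with the Java SDK. The worker tuning API is still marked experimental, so the builder names may differ in your SDK version; the 0.8 memory and 0.9 CPU utilization targets and the "payments" task queue are placeholder values, not recommendations.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;
import io.temporal.worker.tuning.ResourceBasedControllerOptions;
import io.temporal.worker.tuning.ResourceBasedTuner;

public class ResourceTunedWorker {
  public static Worker create(WorkflowClient client) {
    WorkerOptions options =
        WorkerOptions.newBuilder()
            .setWorkerTuner(
                ResourceBasedTuner.newBuilder()
                    // Target ~80% memory and ~90% CPU utilization; slots are handed out
                    // only while measured usage stays under these thresholds.
                    .setControllerOptions(
                        ResourceBasedControllerOptions.newBuilder(0.8, 0.9).build())
                    .build())
            .build();

    WorkerFactory factory = WorkerFactory.newInstance(client);
    // "payments" is a placeholder task queue name.
    return factory.newWorker("payments", options);
  }
}
```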
In some cases the resource-based slot supplier may not be specific enough. For example, if your service requires intelligent scaling of database connections or connection pools, a custom slot supplier can layer database-connection awareness on top of the basic resource-based behavior.
If your application has activities that always consume roughly the same amount of time and resources, fixed-size slot suppliers make sense. Keep in mind that if you change your workflows or activities, it is wise to re-evaluate your slot sizing values.
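For that kind of predictable workload, fixed-size slots are simply the maxConcurrent* settings on WorkerOptions. The Java sketch below uses illustrative numbers only; the right values come from your own capacity planning and load tests.

```java
import io.temporal.worker.WorkerOptions;

public class FixedSlotOptions {
  public static WorkerOptions build() {
    return WorkerOptions.newBuilder()
        // Upper bound on workflow tasks executing concurrently on this worker.
        .setMaxConcurrentWorkflowTaskExecutionSize(100)
        // Upper bound on concurrent activity executions; size this against what
        // downstream dependencies (databases, external APIs) can absorb.
        .setMaxConcurrentActivityExecutionSize(200)
        // Local activities run inside the workflow task and get their own pool.
        .setMaxConcurrentLocalActivityExecutionSize(200)
        .build();
  }
}
```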
Task Poller Configuration
Another way to tune worker performance is to adjust poller configuration. Generally, the default task poller settings are good enough; however, in certain cases application latency responds well to poller tuning (a configuration sketch follows this list):
maxConcurrentWorkflowTaskPollers: Controls how many simultaneous long-poll requests for workflow tasks the worker keeps open (default 2)
maxConcurrentActivityTaskPollers: Controls how many simultaneous long-poll requests for activity tasks the worker keeps open (default 5)
Increasing pollers reduces schedule-to-start latency but raises network overhead
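As a rough Java SDK sketch, raising poller counts looks like the following; the values shown are arbitrary starting points, and extra pollers only help when schedule-to-start latency is limited by polling rather than by available task slots.

```java
import io.temporal.worker.WorkerOptions;

public class PollerTunedOptions {
  public static WorkerOptions build() {
    return WorkerOptions.newBuilder()
        // Number of simultaneous long-poll requests for workflow tasks on this task queue.
        .setMaxConcurrentWorkflowTaskPollers(4)
        // Number of simultaneous long-poll requests for activity tasks.
        .setMaxConcurrentActivityTaskPollers(10)
        .build();
  }
}
```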
Implementation Checklist
In most cases, tuning the above parameters once simply won’t get your workers performing as desired; an iterative approach is required. Keep in mind that the following steps assume you have already defined solid resource usage, limits, and thread settings for your service runtime and container.
Baseline Measurement
Record initial latency metrics during low/peak loads
Profile memory/CPU usage patterns
Incremental Tuning
Change one configuration at a time
Adjust task slot allocation strategy/settings
Adjust poller counts in 25% increments
Load Testing
Simulate 2x expected peak traffic (a minimal load-generation sketch follows this checklist)
Validate that key metrics stay within the target ranges above
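One simple way to drive that load is to start a large batch of workflows against the target task queue with the SDK client. The Java sketch below is illustrative only: MyWorkflow, the "payments" task queue, and the execution count are placeholders, and a realistic test should also ramp traffic and mirror production payload sizes.

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

public class LoadGenerator {

  // Placeholder workflow interface; substitute the workflow you actually want to exercise.
  @WorkflowInterface
  public interface MyWorkflow {
    @WorkflowMethod
    void run();
  }

  public static void main(String[] args) {
    // Connects to a Temporal Server on localhost:7233; point this at your test cluster.
    WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
    WorkflowClient client = WorkflowClient.newInstance(service);

    // Fire off 1,000 executions without waiting for results; scale the count and pacing
    // to roughly 2x your expected peak traffic.
    for (int i = 0; i < 1000; i++) {
      MyWorkflow workflow =
          client.newWorkflowStub(
              MyWorkflow.class,
              WorkflowOptions.newBuilder()
                  .setTaskQueue("payments") // placeholder task queue name
                  .setWorkflowId("load-test-" + i)
                  .build());
      WorkflowClient.start(workflow::run); // async start; returns once the server accepts it
    }
  }
}
```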
Conclusion
Effective Temporal worker tuning requires balancing three competing priorities: resource utilization, throughput, and latency. By choosing the right slot allocation strategy and poller counts, you can maintain sub-second schedule-to-start latency even during high load. The key lies in continuous monitoring of execution patterns and adapting your slot allocation configuration to match your workload characteristics.
Establish regular performance review cycles and leverage Temporal's built-in metrics to guide ongoing optimization. When implemented correctly, these tuning techniques enable Temporal to handle everything from steady payment-processing workflows to volatile inventory management systems while meeting strict SLAs.