Real-Time Inference
Serving model predictions on demand through an API, usually under tight latency requirements. Results are always fresh, but response time is bounded by model inference time, and keeping capacity online to absorb request spikes drives up infrastructure cost.
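A minimal sketch of what an on-demand prediction endpoint can look like, assuming FastAPI as the serving framework and a placeholder `predict_fn` standing in for a real model call (the route name, schemas, and function names are illustrative, not prescribed here):

```python
# Sketch of a real-time inference endpoint (FastAPI chosen for illustration).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# In a real service the model is loaded once at startup (e.g. from disk or a
# model registry) so each request only pays for inference, not deserialization.
def predict_fn(features: list[float]) -> float:
    # Placeholder for model.predict(...); returns a dummy score.
    return sum(features)

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # Per-request latency = network + (de)serialization + model inference time.
    return PredictResponse(prediction=predict_fn(req.features))

# Run with: uvicorn app:app --port 8000
# Example request:
#   curl -X POST localhost:8000/predict -H 'Content-Type: application/json' \
#        -d '{"features": [1.0, 2.0, 3.0]}'
```

Loading the model once per process rather than per request is what keeps the latency dominated by inference itself; see Model Optimization below for reducing that remaining term.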
Related
- Batch Inference (offline alternative)
- Serving Frameworks (tools for serving)
- Model Optimization (reduce latency)