AI Inference Optimization Engineering

Name: AI Inference Optimization Engineering
Brand: Independently published
SKU: 52770465
Price: 9.59 EUR
Availability: InStock
Author: ChatVariety Team
ISBN: 9798199720021

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

ChatVariety Team

Language: English

Paperback

Libristo code: 52770465

Publisher: Independently published, June 2026

Coming soon

New

9.59 €

Expected availability: release on 7 June 2026

30-day return policy

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to identify and reduce latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to compress large neural networks with minimal loss of model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (with separate draft models or draft heads such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructure using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

About the book

Full title: AI Inference Optimization Engineering
Author: ChatVariety Team
Language: English
Binding: Paperback
Publication date: 2026
Number of pages: 96
EAN: 9798199720021
Libristo code: 52770465
Publisher: Independently published
Weight: 142 g
Dimensions: 152 x 229 x 5 mm

Categories

Computing & information technology > Computer science > Artificial intelligence > Natural language and machine translation