Choosing the Right Generative AI Deployment: A Comprehensive Guide for CTOs and IT Leaders

This article helps CTOs and other IT leaders make informed decisions about which generative AI deployment approach to choose for their specific use cases.

Key Findings and Recommendations

  • IT leaders are overwhelmed by the rapid development of generative AI capabilities.
  • There is an exponential increase in pre-trained generative AI models and applications, but aligning them with enterprise use cases and AI governance remains a challenge.
  • Many leaders are not fully aware of the variety of deployment approaches and their respective pros and cons.
  • Popular approaches include consuming generative AI embedded in applications, embedding model APIs in custom applications, and steering models via prompt engineering. An emerging approach is extending models via a retrieval-augmented generation (RAG) architecture.
  • Gartner recommends understanding technical differences between approaches, analyzing pros and cons of each, accounting for critical decision factors, and monitoring emerging trends.

Strategic Planning Assumptions

  • By 2026, over 80% of enterprises will have used generative AI APIs or models, or deployed generative-AI-enabled applications in production, a significant increase from less than 5% today.
  • More than 70% of independent software vendors (ISVs) will embed generative AI in their enterprise applications by 2026.
  • By 2026, nearly 80% of prompting will be semi-automated.
  • By 2028, over 50% of enterprises building their own models from scratch will abandon their efforts due to costs, complexity, and technical debt.

Analysis and Deployment Approaches

  • Consume Generative AI Embedded in Applications
    • Example: Using design software with image generation capabilities.
    • Pros: Easier to deploy, no fixed costs, improvements translate to increased utility, easy integration.
    • Cons: Lack of flexibility, potential for inaccurate responses, less control over security and data privacy.
  • Embed Generative AI APIs in Custom Applications
    • Approach: Integrating generative AI via APIs in custom applications.
    • Pros: Easier implementation, lower fixed costs, effective few-shot learning.
    • Cons: Limitations in data transmission, nascent field of prompt engineering, backward compatibility issues.
  • Extend Generative AI Models via Data Retrieval
    • Approach: Using retrieval-augmented generation (RAG) to improve accuracy and response quality.
    • Pros: Incorporates up-to-date and domain-specific data, balanced approach, improved accuracy.
    • Cons: Limited by context window, increased latency, additional costs for new technology components.
  • Extend Generative AI Models via Fine-Tuning
    • Approach: Further training a large pre-trained model on a new, use-case-specific dataset.
    • Pros: Quick improvement for specific use cases, doesn’t require large amounts of data.
    • Cons: Significant inference costs, restricts future flexibility, potential overspecialization.
  • Build Custom Foundation Models from Scratch
    • Approach: Creating fully customized models for specific use cases or domains.
    • Pros: Highest potential accuracy, complete control over training and parameters.
    • Cons: High training and maintenance costs, need for ongoing access to AI researchers, risk of being outpaced by external innovation.
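
To make the retrieval-augmented generation approach above more concrete, the following is a minimal sketch of its retrieval and prompt-assembly steps. It is an illustration under stated assumptions, not a reference implementation: the sample documents, the function names, and the character-based context budget are all hypothetical, and simple word-overlap scoring stands in for the embedding search that a production system would typically use. No model is called; the sketch stops at the prompt that would be sent to a generative AI API.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical in-memory knowledge base. A real deployment would query a
# document store or vector database instead (an assumption, not from the source).
DOCUMENTS = [
    "Expense reports must be submitted within 30 days of purchase.",
    "The VPN client is required for all remote access to internal systems.",
    "Annual security training is mandatory for every employee.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents that share words with the question."""
    query = Counter(tokenize(question))
    scored = [(cosine_similarity(query, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question, documents, top_k=2, max_context_chars=500):
    """Assemble a prompt from retrieved text, respecting a context budget.

    max_context_chars is a crude stand-in for the model's context window,
    which limits how much retrieved material can be passed along.
    """
    context_lines = []
    used = 0
    for doc in retrieve(question, documents, top_k):
        if used + len(doc) > max_context_chars:
            break  # budget exhausted: remaining documents are dropped
        context_lines.append("- " + doc)
        used += len(doc)
    context = "\n".join(context_lines) or "(no relevant context found)"
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("When must expense reports be submitted?", DOCUMENTS))
```

The sketch also hints at two of the listed cons: the retrieval step runs before every model call, which adds latency, and the context budget caps how much domain data can reach the model.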

Decision Framework

  • Cost. Comparing training and usage costs across approaches, with custom models being the most expensive.
  • Organizational and Domain Knowledge. Importance of injecting specific knowledge into models.
  • Control of Security and Privacy. Assessing the need for ownership and control over data and models.
  • Model Output Control. Addressing model quality, hallucination risks, and need for human oversight.
  • Implementation Simplicity. Weighing the simplicity and time-to-market benefits of different approaches.
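
One way to apply the five factors above is a simple weighted scoring matrix, sketched below. Everything numeric here is a placeholder assumption for illustration: the ratings are not figures from the source, and the approach and factor names are hypothetical identifiers. An organization would substitute its own ratings and weights.

```python
# The five decision factors listed above.
FACTORS = ["cost", "domain_knowledge", "security_privacy", "output_control", "simplicity"]

# Placeholder ratings per approach (1 = weak fit, 5 = strong fit).
# These values are illustrative assumptions only, not data from the source.
RATINGS = {
    "embedded_application": {"cost": 5, "domain_knowledge": 1, "security_privacy": 2,
                             "output_control": 1, "simplicity": 5},
    "api_with_prompting": {"cost": 4, "domain_knowledge": 2, "security_privacy": 2,
                           "output_control": 2, "simplicity": 4},
    "retrieval_augmented": {"cost": 3, "domain_knowledge": 4, "security_privacy": 3,
                            "output_control": 3, "simplicity": 3},
    "fine_tuning": {"cost": 2, "domain_knowledge": 4, "security_privacy": 4,
                    "output_control": 4, "simplicity": 2},
    "custom_model": {"cost": 1, "domain_knowledge": 5, "security_privacy": 5,
                     "output_control": 5, "simplicity": 1},
}


def rank_approaches(weights, ratings=RATINGS):
    """Rank approaches by the weighted average of their factor ratings.

    weights maps factor names to non-negative importance values; factors
    that are missing from weights count as zero.
    """
    total_weight = sum(weights.get(factor, 0) for factor in FACTORS)
    if total_weight <= 0:
        raise ValueError("at least one factor needs a positive weight")
    scores = {
        name: sum(weights.get(f, 0) * rating[f] for f in FACTORS) / total_weight
        for name, rating in ratings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Example: an organization that cares mostly about time-to-market and cost.
    for name, score in rank_approaches({"simplicity": 3, "cost": 2, "security_privacy": 1}):
        print(f"{name}: {score:.2f}")
```

The ranking is only as good as the ratings behind it, so a matrix like this is better treated as a way to make trade-offs explicit than as a decision in itself.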

In conclusion, it is important to carefully evaluate each approach based on specific enterprise requirements and to stay updated with the rapidly evolving AI landscape.

Future Trends

  • The rise of open-source models.
  • Prompt tuning as a balance between prompt engineering and model fine-tuning.
  • Development of agents to maximize generative AI potential.