Multimodal Foundation Models
From Specialists to General-Purpose Assistants
Pages
2024
now publishers Inc (publisher)
978-1-63828-336-2 (ISBN)
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants.
The focus encompasses five core topics, categorized into two classes: (i) a survey of well-established research areas: multimodal foundation models pre-trained for specific purposes, including two topics – methods of learning vision backbones for visual understanding and text-to-image generation; (ii) recent advances in exploratory, open research areas: multimodal foundation models that aim to play the role of general-purpose assistants, including three topics – unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs.
The target audience of the monograph is researchers, graduate students, and professionals in computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.
1. Introduction
2. Visual Understanding
3. Visual Generation
4. Unified Vision Models
5. Large Multimodal Models: Training with LLMs
6. Multimodal Agents: Chaining Tools with LLM
7. Conclusion and Research Trends
Acknowledgments
References
Publication date | 14.05.2024 |
---|---|
Series | Foundations and Trends® in Computer Graphics and Vision |
Place of publication | Hanover |
Language | English |
Dimensions | 156 x 234 mm |
Weight | 330 g |
Subject area | Computer Science ► Theory / Study ► Artificial Intelligence / Robotics |
ISBN-10 | 1-63828-336-2 / 1638283362 |
ISBN-13 | 978-1-63828-336-2 / 9781638283362 |
Condition | New |