kaizuberbuehler's Collections: Foundation Models
OLMo: Accelerating the Science of Language Models
Paper
• 2402.00838
• Published • 85
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
• 2403.05530
• Published • 65
StarCoder: may the source be with you!
Paper
• 2305.06161
• Published • 33
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
• 2312.15166
• Published • 61
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper
• 2404.12387
• Published • 40
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper
• 2404.07839
• Published • 48
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper
• 2404.07413
• Published • 38
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper
• 2404.06512
• Published • 30
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper
• 2404.05892
• Published • 40
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
• 2404.06395
• Published • 24
YaART: Yet Another ART Rendering Technology
Paper
• 2404.05666
• Published • 18
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper
• 2404.03413
• Published • 27
Advancing LLM Reasoning Generalists with Preference Trees
Paper
• 2404.02078
• Published • 46
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
• 2404.14219
• Published • 259
CogVLM: Visual Expert for Pretrained Language Models
Paper
• 2311.03079
• Published • 27
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
• 2404.14619
• Published • 126
Pegasus-v1 Technical Report
Paper
• 2404.14687
• Published • 33
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
• 2403.19887
• Published • 112
Tele-FLM Technical Report
Paper
• 2404.16645
• Published • 18
What matters when building vision-language models?
Paper
• 2405.02246
• Published • 103
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper
• 2405.12107
• Published • 29
Paper
• 2406.09414
• Published • 103
OpenVLA: An Open-Source Vision-Language-Action Model
Paper
• 2406.09246
• Published • 44
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
Paper
• 2406.10208
• Published • 22
GEB-1.3B: Open Lightweight Large Language Model
Paper
• 2406.09900
• Published • 21
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper
• 2406.11931
• Published • 69
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published • 117
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
Paper
• 2408.00653
• Published • 31
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published • 78
Paper
• 2408.07009
• Published • 62
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper
• 2408.12570
• Published • 32
OLMoE: Open Mixture-of-Experts Language Models
Paper
• 2409.02060
• Published • 80
Paper
• 2409.00587
• Published • 33
Qwen2.5-Coder Technical Report
Paper
• 2409.12186
• Published • 153
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper
• 2409.12191
• Published • 79
NVLM: Open Frontier-Class Multimodal LLMs
Paper
• 2409.11402
• Published • 74
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
• 2409.17146
• Published • 121
Making Text Embedders Few-Shot Learners
Paper
• 2409.15700
• Published • 29
EuroLLM: Multilingual Language Models for Europe
Paper
• 2409.16235
• Published • 29
stabilityai/stable-diffusion-3.5-large
Text-to-Image
• Updated • 60.9k
• • 3.38k
Paper
• 2412.16720
• Published • 37
NVILA: Efficient Frontier Visual Language Models
Paper
• 2412.04468
• Published • 60
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper
• 2412.03555
• Published • 133
Open-Sora Plan: Open-Source Large Video Generation Model
Paper
• 2412.00131
• Published • 33
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
• 2411.15124
• Published • 67
Cosmos World Foundation Model Platform for Physical AI
Paper
• 2501.03575
• Published • 82
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published • 302
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper
• 2501.06282
• Published • 53
Text-to-Speech
• Updated • 8.75M
• • 5.82k
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Paper
• 2501.13106
• Published • 91
Qwen2.5-1M Technical Report
Paper
• 2501.15383
• Published • 72
Baichuan-Omni-1.5 Technical Report
Paper
• 2501.15368
• Published • 60
Atla Selene Mini: A General Purpose Evaluation Model
Paper
• 2501.17195
• Published • 35
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published • 257
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
Paper
• 2502.04328
• Published • 29
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper
• 2502.06781
• Published • 58
NatureLM: Deciphering the Language of Nature for Scientific Discovery
Paper
• 2502.07527
• Published • 20
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
Paper
• 2502.09082
• Published • 32
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Paper
• 2502.08468
• Published • 16
Qwen2.5-VL Technical Report
Paper
• 2502.13923
• Published • 217
Magma: A Foundation Model for Multimodal AI Agents
Paper
• 2502.13130
• Published • 58
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Paper
• 2503.08638
• Published • 72
Gemini Embedding: Generalizable Embeddings from Gemini
Paper
• 2503.07891
• Published • 46
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Paper
• 2503.10460
• Published • 30
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper
• 2503.14456
• Published • 154
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published • 172
Wan: Open and Advanced Large-Scale Video Generative Models
Paper
• 2503.20314
• Published • 60
Paper
• 2503.19786
• Published • 55
Command A: An Enterprise-Ready Large Language Model
Paper
• 2504.00698
• Published • 29
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published • 207
Paper
• 2504.07491
• Published • 138
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published • 85
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Paper
• 2504.08685
• Published • 130
BitNet b1.58 2B4T Technical Report
Paper
• 2504.12285
• Published • 83
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper
• 2504.13180
• Published • 20
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper
• 2503.14734
• Published • 6