EXEED AI

Chip Huyen's Recent LinkedIn Posts

Chip Huyen

@chiphuyen

AI x stuff

50 posts on LinkedIn

Posts

Chip Huyen

Tech & AI

7mo

Had a great time chatting with Lenny Rachitsky on building AI products! Some of the key points I tried to make:

1. Many AI product problems aren’t because of AI. It’s usually because of user experience, data quality, or organizational structure. A chatbot failed to get traction because its target users simply couldn’t type (their hands were usually busy taking care of kids or driving), so showing pre-populated questions and adding a voice option significantly improved traction. Another team told me their lead scoring model was broken. It turned out that the marketing team wasn’t asking the right questions to get data. The biggest product improvements still come from understanding your users, preparing your data, and investing in your team!

2. Senior engineers see the most productivity improvement with AI coding because they have more experience with writing design docs and API specs, which helps them write better instructions. However, they’re also more resistant to using AI for coding. Senior folks are often more opinionated and get frustrated easily when AI doesn’t do what they want.

3. Many teams spend a lot of time debating which tool to use, which can be counterproductive. When teams ask me which of two tools to use, I usually ask two questions:
“How much performance improvement will the optimal tool give over the less optimal one?” --> If the improvement is small, then spend less time debating.
“How hard is it to change from one tool to another once you’ve adopted it?” --> If the tool is new and not yet battle-tested, I’d think twice about adopting something that I can’t get out of later.

4. Many people know that the most effective way to learn AI is to build with AI. Yet, people keep asking me: “But what should I build?” We seem to be having an “idea crisis”. We have all these wonderful tools to help us build things, and no idea what to build. An exercise I often recommend is to spend a week noticing what frustrates you in your daily work, then build small tools to solve those specific pain points.
780

Chip Huyen

Tech & AI

27mo

To understand the open source AI landscape, I went through the most popular AI repositories on GitHub, categorized them, and studied their growth trajectories. Here are some of the learnings: https://lnkd.in/exR-PGZX 1. There are 845 generative AI repos with at least 500 stars on GitHub. They are built with contributions from over 20,000 developers, making almost a million commits. 2. I divided the AI stack into four layers: application, application development, model development, and infrastructure. The application and application development layers have seen the most growth in 2023. The infrastructure layer remains more or less the same. Some categories that have seen the most growth include AI interface, inference optimization, and prompt engineering. 3. The landscape exploded in late 2022 but seems to have calmed down since September 2023. 4. While big companies still dominate the landscape, there’s a rise in massively popular software hosted by individuals. Several have speculated that there will soon be billion-dollar one-person companies. 5. China's open source ecosystem is rapidly growing. 6 out of 20 GitHub accounts with the most popular AI repos originate in China, with two from Tsinghua University and two from Shanghai AI Lab. #generativeai #aiapplications #llmops
3.1K

Chip Huyen

Tech & AI

17mo

Really enjoyed the conversation with Matt Turck. We covered many topics:

1. Pre-training and post-training
Different models (e.g. Claude and ChatGPT) have fairly similar pre-training, since everyone uses the same Internet data and optimizes for the same metric (perplexity). Post-training is where these models differ.

2. Perplexity and the end of pre-training?
For a while now, we’ve observed that lowering a language model’s entropy/perplexity leads to better performance in downstream applications. How long can this trend continue? Is there a lower bound for entropy?

3. Sampling
The process by which a model picks one output out of many possible outputs, and how it's a quick and easy way to get a performance boost without prompting or finetuning.

4. Evaluating AI systems
While entropy is important for model training, most application developers only care about application performance. What metrics should we use to evaluate an application?

5. System prompt vs. user prompt
And why application developers need to write system prompts in a way that increases application safety.

6. AI agent and planning
One challenge in training a model for planning is obtaining labeled data. For a given task, we can get humans to annotate the best plan, but what’s efficient for humans isn’t necessarily efficient for AI, and vice versa. For example, summarizing 1000 webpages must be very tedious for humans, but straightforward for an agent that can browse webpages in parallel.

Matt also asked many fun questions about things that are often taken for granted, such as:
1. What is special about language models that allows them to scale?
2. Why do larger models need more data to train?

Random: I’m super impressed with Matt’s pronunciation of my last name.

https://lnkd.in/gY7tFqAN

I spoke a little too fast, and it was quite an exercise to explain these very technical concepts without a whiteboard. If you find any part of the podcast confusing, the book goes into all these topics in much more detail, with more examples and visualization!

#AIEngineering #AIApplications #MLEngineering
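The link between entropy and perplexity in topic 2 can be made concrete with a small sketch. It uses the standard definition (perplexity is the exponential of the average negative log-likelihood per token). The probability lists are made-up values for illustration, not measurements from any real model.

```python
import math

def cross_entropy(token_probs):
    """Average negative log-likelihood (in nats) of the observed tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is exp(cross-entropy): roughly the number of equally
    likely choices the model is hesitating between at each token."""
    return math.exp(cross_entropy(token_probs))

# Hypothetical probabilities a model assigned to each correct next token.
confident = [0.9, 0.8, 0.95, 0.85]
uncertain = [0.25, 0.25, 0.25, 0.25]  # uniform guess over 4 options

print(perplexity(confident))
print(perplexity(uncertain))  # approximately 4
```

A model that always assigned probability 1 to the correct token would reach perplexity 1. On real text the floor sits higher, because language itself is not fully predictable.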
741

Chip Huyen

Tech & AI

26mo

I’m often asked what problem I’d solve if I were to start another company. I probably won’t do a startup any time soon (because startups are hard), but here are some of the problems I find interesting. If you’re solving any of them, I’d love to chat.

1. Data synthesis: AI has become really good both at generating and annotating data. The challenge now is to make sure that the generated data is safe and legal, e.g. not violating any IP.

2. Evaluation: evaluation has gotten so much harder with LLMs, both because many people treat models as black boxes (we deploy models someone else developed for us) and because outputs can be open-ended. At the same time, investment in evaluation is nowhere close to investment in model or application development. I’d like to see more arena-style evaluation, embedding evaluation, human-in-the-loop evaluation, as well as small, specialized scorers (instead of using large models like GPT-4 as judges).

3. Energy: the bottleneck to scaling AI is no longer compute but electricity. I’m interested in all energy-related problems, including both new energy sources and energy trading.

4. Any application that allows you to collect unique data that nobody else has. I’ve heard concerns about building applications that seem to be “wrappers” around popular APIs. If you can get to the market early and gather sufficient data to continually improve your product, data is your moat.

5. GPU-native everything: many data science tools, including scikit-learn, pandas, and Spark, aren’t built to run natively on GPUs. There have been efforts to make these tools leverage GPUs more efficiently, but I think there’s still a lot of room for the software layer for GPUs (and not just NVIDIA GPUs).

6. Heat recovery and distribution for GPU data centers: GPUs produce a massive amount of heat. We need better technologies to harness and utilize this excess heat.

7. Curated Internet: bots are already ruining dating apps, search (bots are incredibly good at SEO), and social media. I’d like to be able to set a boundary for my Internet, e.g. to limit the search results to those written by people I trust, or sources verified to be human.

#aiengineering #startups #aiapplications
3.7K

Chip Huyen

Tech & AI

17mo

My 8000-word note on agents, covering: 1. An overview of agents 2. How the capability of an AI-powered agent is determined by the set of tools it has access to and its capability for planning 3. How to select the best set of tools for your agent 4. Whether LLMs can plan and how to augment a model’s capability for planning 5. Agent’s failure modes Link: https://lnkd.in/gkCgk38F AI-powered agents are an emerging field with no established theoretical frameworks for defining, developing, and evaluating them. This post is a best-effort attempt to build a framework from the existing literature, but it will evolve as the field does. As always, feedback is much appreciated! #AIengineering #AIApplications #agents
4K

Chip Huyen

Tech & AI

28mo

Very proud of our team for their work on ibis streaming, which was released this week! GitHub repo: https://lnkd.in/eMcBZQvu

ibis’ core idea is to decouple the interface from the execution engine. For many data engines, the interface is the execution engine. To run your code in Snowflake, you write Snowflake SQL. To use pandas, you write pandas syntax. ibis introduced an intermediate layer (compiler) that lets you write your code once, in Python, and run it in any execution engine.

Two example use cases of ibis:

1. Transition workloads from experimentation to production. During experimentation, run your code locally in pandas/polars. When ready for production, run your code in Snowflake/Spark/Flink. No code rewriting is needed.

2. Move your workloads between data platforms. If your team is on multiple data platforms, or if one day you decide to migrate from Databricks to Snowflake, you just need to swap out the underlying execution engine without having to rewrite code.

Today, ibis supports 20 backends, with 2 streaming engines added this week (Flink and RisingWave). AFAIK, this makes ibis the first dataframe API that supports both batch and streaming use cases, across workloads of different scales, from local experimentation to distributed workloads (and yes, we did the 1 billion row challenge!).

A bit of history: ibis was originally started by Wes McKinney (pandas creator) to create a better abstraction for dataframes. The project was later picked up by a group of experienced maintainers of Arrow, dask, and pandas. Our team found ibis when we were trying to unify batch and streaming, and realized that to do so, we needed an intermediate layer like ibis.

ibis is fully open-sourced! Would love to hear your feedback on the project!

#dataengineering #dataframe #pandas
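The idea of decoupling the interface from the execution engine can be illustrated with a toy sketch. This is not the ibis API: `Filter`, `compile_to_sql`, and `execute_locally` are hypothetical names invented here, and the example only shows one expression being handed to two different backends.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    """Engine-agnostic description of 'rows where column > threshold'."""
    table: str
    column: str
    threshold: float

def compile_to_sql(expr: Filter) -> str:
    """Backend 1 (hypothetical): compile the expression to a SQL string."""
    return f"SELECT * FROM {expr.table} WHERE {expr.column} > {expr.threshold}"

def execute_locally(expr: Filter, tables: dict) -> list:
    """Backend 2 (hypothetical): evaluate the same expression over in-memory rows."""
    return [row for row in tables[expr.table] if row[expr.column] > expr.threshold]

# The expression is written once...
expr = Filter(table="orders", column="amount", threshold=100)

# ...and handed to whichever engine fits the stage of the project.
local_tables = {"orders": [{"id": 1, "amount": 50}, {"id": 2, "amount": 250}]}
print(execute_locally(expr, local_tables))  # local experimentation
print(compile_to_sql(expr))                 # string to send to a warehouse
```

The expression object carries no engine-specific syntax, so adding a backend means adding a function and leaving user code untouched. A real intermediate layer also has to handle types, joins, and dialect differences, which this sketch ignores.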
1.6K

Chip Huyen

Tech & AI

36mo

Hi there, I’m incredibly excited to be speaking at DataConnect Conference in July, and honored to share the stage with so many leaders in AI and data science that I’ve long been a big fan of. I’ll be talking about the challenge of deploying traditional ML and LLM applications. The organizers will be giving away 100 copies of Designing Machine Learning Systems at the conference. This will be my first time in Ohio, and I'm hoping to meet a lot of people in person. Come join us! #mlops #datascience #conference2023 #midwest
547

Chip Huyen

Tech & AI

35mo

Micro-mentorship Many young people have asked me questions such as: “Should I do a master/PhD?” or “How do I prepare for this interview?” My first response is usually: “Do you have friends you can talk to about this?” What surprised me is how often the answer is no. Most people understand the value of mentorship: learning many things from one mentor. Yet, most people underestimate the value of micro-mentorship: learning something from everyone. It’s very hard to find mentors who are much ahead of us career-wise: they are at different phases in life and care about different things. Even if we find one, it might be hard for mentors to relate to us, or for us to be completely open with them. I find myself learning the most from peers who are better than me at something. A coworker who spoke at a conference I want to speak at. A college friend who got a job at my dream company. A friend who has a lot more experience hiring than me. What helped me the most in the first few years after college was these friends/micro-mentors. I had a pact with a few friends: we checked in regularly to share the challenges we are facing, what we want to achieve, and how we are moving toward those goals. These friends kept me accountable, helped me talk things through, and occasionally helped me regain my confidence and perspective after the poor decisions I made. So, what if you don’t have friends you can talk to about important career and life decisions? Find new friends. You don’t have to stop spending time with your existing friends. You can have both. Put yourself in an environment where you can meet people who care about the same things you do: take online courses, befriend the course assistants, attend guest lectures and ask questions, talk to organizers about how you can help, join communities, volunteer to run those communities. Reach out to acquaintances. All good things take time. 
It took me two years after graduation and many many weekends to finally feel like I’d found my people. But now, looking back, I believe the effort was more than worth it. #careergrowth #mentorship
1.8K

Chip Huyen

Tech & AI

24mo

As an engineer who's learned so much from writing, I love meeting other engineers who write. It's great seeing Joe Reis 🤓 (Fundamentals of Data Engineering), Matt Topol (In-memory Analytics with Apache Arrow), Alex Merced (Apache Iceberg: The Definitive Guide), and so many other great folks in town for #SnowflakeSummit! Matt and I will be at Snowflake Dev Day the whole day and we'll have a few copies of our books to give away. Come say hi!! #dataengineering
1.3K

Chip Huyen

Tech & AI

29mo

New post: Sampling for Text Generation Link: https://lnkd.in/ex5Nh4ye Many challenges (and opportunities) in working with AI today stem from the way models sample their outputs. The sampling process causes models to be probabilistic, which is a feature for creative tasks but a bug for many tasks that depend on factuality. This post covers: 1. Sampling strategies and variables including temperature, top-k, and top-p. 2. How to sample multiple outputs to improve a model’s performance. 3. How to get models to generate outputs in a certain format. As always, feedback is much appreciated! #aiengineer #llm #probability #sampling #foundationmodels
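The three sampling variables named in the post (temperature, top-k, top-p) can be sketched in plain Python. This is a minimal illustration under assumed toy logits. It is not taken from the linked post, and production inference libraries implement these steps on tensors and far more efficiently.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p (nucleus sampling), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def sample(probs, rng):
    """Draw one token index according to the (filtered) probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
rng = random.Random(0)          # seeded so repeated runs draw the same token
probs = top_p_filter(softmax(logits, temperature=0.7), p=0.9)
print(probs, sample(probs, rng))
```

Filtering runs before sampling, so a token removed by top-k or top-p can never be drawn, however many times the model is sampled.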
868

Chip Huyen

Tech & AI

25mo

I’m making a list of things to consider when using open source models and commercial models. Here’s what I have currently. What else should I add? Considerations for commercial models 1. Data privacy: employees might accidentally include company’s private data in the prompt, e.g. when Samsung employees accidentally leaked the company’s secrets using ChatGPT. 2. Functionality: proprietary models might have important features like function calling and JSON mode. However, most model providers have no or limited logprobs (log probabilities) API. Logprobs are very useful for classification tasks, evaluation (confidence scoring), and interpretability. 3. API cost: API calls can get expensive at scale. 4. Finetuning: model providers might not let you finetune their models. Off-the-shelf, commercial models might be better for your use case, but might not be as good as open source + finetuning. 5. Edge use cases: can’t work for use cases on devices that have no Internet connection. Considerations for open source models 1. Data lineage/copyright: people are less likely to sue open source model builders for training on copyrighted data. However, if you use these models to make money, you can get in trouble. 2. Functionality: hosting your models gives you access to logprobs and other intermediate outputs. There are external tools that provide function calling and constrained sampling for certain open source models, but these features might be limited. 3. Engineering cost: hosting and optimizing large models takes nontrivial time, talent, and engineering effort. APIs are expensive, but engineering can be even more so. This can be mitigated by using model hosting services if they support the models you want to use. 4. Finetuning: in theory, you can finetune open source models, but it might not be easy to do so. #aiengineering #aiapplications #llms
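To show why logprobs matter for classification and confidence scoring, here is a minimal sketch. It assumes you already have a log probability for each candidate label (for example, the logprob of the first token of each label). The dictionary values are invented, and no specific provider API is implied.

```python
import math

def class_probabilities(label_logprobs):
    """Turn per-label log probabilities into a distribution over just those labels."""
    m = max(label_logprobs.values())  # subtract the max for numerical stability
    weights = {label: math.exp(lp - m) for label, lp in label_logprobs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

def classify(label_logprobs, min_confidence=0.8):
    """Return (label, confidence, is_confident) for the most likely label."""
    probs = class_probabilities(label_logprobs)
    label = max(probs, key=probs.get)
    return label, probs[label], probs[label] >= min_confidence

# Hypothetical logprobs for each candidate label.
logprobs = {"positive": -0.2, "negative": -2.1, "neutral": -3.0}
print(classify(logprobs))
```

Without logprobs the application only sees the label text, so low-confidence predictions cannot be routed to a fallback or to human review.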
1.6K

Chip Huyen

Tech & AI

36mo

A little-known benefit of writing is that it introduces you to other writers! Machine learning starts with data. Time and time again, I’ve seen companies wanting to build a machine learning team to capture all the exciting new AI use cases, only to realize that their data is a mess, and they have to first start with a data team. Kudos to Joe Reis 🤓 and Matthew Housley for writing this comprehensive guide to modern data engineering! #dataengineer #machinelearning #writing
4.4K

Chip Huyen

Tech & AI

29mo

Claypot AI is joining Voltron Data! AI has to start from data. By joining forces, we can further help companies leverage both batch and real-time data for AI applications, on top of Voltron Data’s GPU-native distributed engine Theseus. https://lnkd.in/gzjFqmch For AI, GPUs are mostly being used for training and inference. We enable large-scale data processing on GPUs, which can significantly reduce costs, latency, and data I/O bottlenecks. We look forward to expanding the capabilities of Theseus and contributing to open-source products like Apache Arrow, Ibis, and Substrait. I met Joshua Patterson, CEO of Voltron Data, when he was leading the RAPIDS AI team at NVIDIA. He’s been my mentor. Over the last year, the Claypot team has worked closely with the Voltron Data team. We’re thrilled about officially joining the team. Thank you to everyone who has been with us through this journey, especially team members who have taken us where we are today, and investors who have supported us from day 1. #dataprocessing #aiengineering #distributedsystems #theseus #gpu
1.1K

Chip Huyen

Tech & AI

26mo

I have this hypothesis that the most popular enterprise AI applications today aren’t the ones that solve the most important problems or make the most money. The most popular applications are the ones that are easiest to evaluate.

Let’s look at some common enterprise AI use cases: recommender system, fraud detection, coding, and LLM-powered classification.

1. Recommender system: evaluated by increase in engagement or purchase-through rate.
2. Fraud detection: evaluated by how much money is saved from prevented fraud.
3. Coding is a common LLM use case. Unlike other text generation tasks, coding can be evaluated using functional correctness. Generated code is correct if it compiles and outputs the expected values.
4. Even though LLMs are open-ended, two friends estimated ~⅓ of the LLM applications they see are closed-ended (classification, e.g. intent classification). It’s much easier to evaluate classification tasks than open-ended tasks.

From a business perspective, this makes sense, as companies don’t want to invest in anything without a measurable return on investment. However, this hypothesis, if true, has two consequences.

1. Focusing only on applications whose outcomes can be measured is similar to looking for the lost key under the lamppost (at night). It’s easier to do, but it doesn’t mean we'll find the key. We might be missing out on many potentially game-changing applications because there is no easy way to evaluate them.
2. Open-ended evaluation is the biggest bottleneck to AI adoption.

#aiengineering #evaluation #aiapplications
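Functional correctness, as described in item 3, can be checked mechanically. The sketch below is illustrative only: `passes_tests` is a hypothetical helper, the "generated" snippets are hard-coded strings standing in for model output, and `exec` should only ever be run on untrusted code inside a sandbox.

```python
def passes_tests(source, function_name, test_cases):
    """Return True if the generated function loads and returns the
    expected value for every (args, expected) pair."""
    namespace = {}
    try:
        exec(source, namespace)  # WARNING: sandbox this for untrusted code
        func = namespace[function_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False

# Stand-ins for model-generated code.
generated_good = "def add(a, b):\n    return a + b\n"
generated_bad = "def add(a, b):\n    return a - b\n"

tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

print(passes_tests(generated_good, "add", tests))  # True
print(passes_tests(generated_bad, "add", tests))   # False
```

The result is a plain pass/fail that needs no human judgment, which is what makes coding comparatively easy to evaluate next to open-ended text generation.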
2.3K

Chip Huyen

Tech & AI

34mo

Never before had I seen so many smart people working on the same goal: making LLMs better. After talking to many people working in both industry and academia, I noticed the 10 major research directions that emerged. Link: https://lnkd.in/gxiimJdg The first two directions, hallucinations and context learning, are probably the most talked about today. I’m the most excited about numbers 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives). Some of the directions are harder than others. For example, I think that number 10, building LLMs for non-English languages, is more straightforward with enough time and resources.  Number 1, reducing hallucination, will be much harder, since hallucination is just LLMs doing their probabilistic thing. Number 4, making LLMs faster and cheaper, will never be completely solved. There is already so much progress in this area, and there will be more, but we will never run out of room for improvement. Number 5 and number 6, new architectures and new hardware, are very challenging, but are inevitable with time. Because of the symbiosis between architecture and hardware – new architecture will need to be optimized for common hardware, and hardware will need to support common architecture – they might be solved by the same company. Some of these problems won’t be solved using only technical knowledge. For example, number 8, improving learning from human preference, might be more of a policy problem than a technical problem. Number 9, improving the efficiency of the chat interface, is more of a UX problem. We need more people with non-technical backgrounds to work with us to solve these problems. I referenced a lot of papers here, but I have no doubt that I still missed a ton. If there’s something you think I missed, please let me know. What research direction are you most excited about? What do you see as promising solutions? I’d love to hear from you. #llm #airesearch #generativeai
3.7K

Chip Huyen

Tech & AI

25mo

The rapid adoption of GPUs has made GPU optimization one of the most sought-after engineering skills. I'm excited for the GPU optimization workshop our community is hosting this Thursday with stellar speakers from Meta, NVIDIA, OpenAI, and Voltron Data.

RSVP: https://lu.ma/1wu5ppl5

[12:00] Crash course on GPU optimization (Mark Saroufim, PyTorch core developer, Meta)
Mark will give an overview of why GPUs, the metrics that matter, and different GPU programming models (thread-based CUDA and block-based Triton). He promises this will be a painless guide to writing CUDA/Triton kernels!

[12:45] High-performance LLM serving on GPUs (Sharan Chetlur, TensorRT-LLM core developer, NVIDIA)
Sharan will discuss how to build performant, flexible solutions to optimize LLM serving given the rapid evolution of new models and techniques. The talk will cover optimization techniques such as token concatenation, different strategies for batching, and cache.

[13:20] Block-based GPU Programming with Triton (Philippe Tillet, Triton lead, OpenAI)
Philippe will explain how Triton works and how it differs from CUDA. Triton aims to be higher-level than CUDA while being more expressive (lower-level) than common graph compilers like XLA and TorchInductor.

[14:00] Intro to data processing on GPUs (William Malpica, Voltron Data co-founder)
Most people today use GPUs for training and inference. A category of workloads that GPUs excel at but are underutilized for is data processing. William will discuss why large-scale data processing should be done on GPUs instead of CPUs and how different tools like cuDF, CuPy, RAPIDS, and Theseus leverage GPUs for data processing.

#gpu #distributedsystems #mlengineering
1.5K

Chip Huyen

Tech & AI

26mo

Excited to show what our team at Voltron Data has been working on over the last 2.5 years: Theseus, our GPU-native query engine! This benchmark compares data queries of different scales (10TB, 30TB, and 100TB) on Spark (run on CPUs) and Theseus (run on GPUs). Moving the same queries from CPUs to GPUs can significantly reduce both runtime and costs.

GPUs are great for data processing for 2 reasons:
1. GPUs are optimized for parallel processing, and data processing workloads are highly parallelizable (think about processing billions of rows in parallel).
2. GPUs have higher memory bandwidth, which increases the speed of loading data and moving data between nodes in a distributed system.

And if your AI workloads are already running on GPUs, moving data processing to GPUs can reduce data I/O time (less moving data from CPUs to GPUs) and increase GPU utilization (fewer GPUs sitting around waiting for data processing jobs to finish). GPUs can especially help if you have large-scale queries (30TB+) and are doing a lot of large joins.

https://lnkd.in/gvMqkrMa

It takes years to build a distributed query engine, and it’s been really cool to see how the team was able to optimize Theseus for the 100TB scale in just a short amount of time. Would love to hear your feedback on our benchmark!

#gpus #dataengineering #distributedsystems
1.3K

Chip Huyen

Tech & AI

18mo

During the process of writing AI Engineering, I went through so many papers, case studies, blog posts, repos, tools, etc. This repo contains ~100 resources that really helped me understand various aspects of building with foundation models. https://lnkd.in/gwdK4tNu

Here are the highlights:

1. Anthropic’s Prompt Engineering Interactive Tutorial
The Google Sheets-based interactive exercises make it easy to experiment with different prompts and see immediately what works and what doesn’t. I’m surprised other model providers don’t have similar interactive guides: https://lnkd.in/gqr5uQqg

2. OpenAI’s best practices for finetuning
While this guide focuses on GPT-3, many techniques are applicable to full finetuning in general. It explains how finetuning works, how to prepare training data, how to pick training hyperparameters, and common finetuning mistakes: https://lnkd.in/g7_kspz4

3. Llama 3 paper
The section on post-training data is a gold mine as it details different techniques they used to generate 2.7 million examples for supervised finetuning. It also covers a crucial but less talked about topic: data verification, i.e. how to evaluate the quality of synthetic data: https://lnkd.in/g3ZaMAYZ

4. Efficiently Scaling Transformer Inference (Pope et al., 2022)
An amazing paper co-authored by Jeff Dean about inference optimization for transformer models. It not only covers different optimization techniques and their tradeoffs, but also provides a guideline for what to do if you want to optimize for different aspects, e.g. lowest possible latency, highest possible throughput, or longest context length: https://lnkd.in/gq2N7AUb

5. Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (Lu et al., 2023)
My favorite study on LLM planners, how they use tools, and their failure modes. An interesting finding is that different LLMs have different tool preferences: https://lnkd.in/g-F3Ayab

6. AI Incident Database
For those interested in seeing how AI can go wrong, this contains over 3000 reports of AI harms: https://lnkd.in/gvSWwTJ7

7. I find case studies from teams that have successfully deployed AI applications extremely educational. Here are some useful enterprise case studies. I'll add more case studies soon!
- LinkedIn: https://lnkd.in/gsRf2Sw6
- Pinterest's Text-to-SQL: https://lnkd.in/gF3zKHMf
- Gmail’s Smart Compose (2019): https://lnkd.in/gWS9gqcE
- Grab: https://lnkd.in/g7qRV4Fn

#aiengineering #aiapplications #llms
5.1K

Chip Huyen

Tech & AI

27mo

I'm happy to share that I've joined Voltron Data as their VP of AI & Open source software. I'm excited about Voltron's vision and their strong engineering expertise. Voltron Data's mission is to make data processing cheap, fast, and easy on GPUs. We believe in a future where any company should be able to leverage any hardware and any data platform for their needs. We contribute to and maintain open source projects around open data standards and formats, including Apache Arrow, Ibis, and Substrait. Our team will be at GTC next week. Hit us up if you want to talk about data processing on GPU, GPU optimization, or composable data systems! #gtc2024 #gpu #dataprocessing
3.3K

Chip Huyen

Tech & AI

17mo

Wrote a quick note (with examples) of common pitfalls that I’ve seen, both from public case studies and from my personal experience. Link: https://lnkd.in/g6KWz8ef

1. Use generative AI when you don’t need generative AI
Gen AI isn’t a one-size-fits-all solution to all problems. Many problems don’t even need AI.

2. Confuse ‘bad product’ with ‘bad AI’
For many AI products, AI is the easy part, product is the hard part.

3. Start too complex
While fancy new frameworks and finetuning can be useful for many projects, they shouldn’t be your first course of action.

4. Over-index on early success
Initial success can be misleading. Going from demo-ready to production-ready can take much longer than getting to the first demo.

5. Forgo human evaluation
AI judges should be validated and correlated with systematic human evaluation.

6. Crowdsource use cases
Have a big-picture strategy to maximize return on investment.

Would love to hear from your experience about the pitfalls you've seen!

#AIEngineering #AIApplications #LLMs
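Pitfall 5 says AI judges should be validated against human evaluation. One simple way to do that, sketched here with invented labels, is to have humans grade a sample of outputs and then measure how often the judge agrees. Cohen's kappa is used as the chance-corrected measure. That choice is an assumption made for this example and does not come from the linked note.

```python
def agreement_rate(judge_labels, human_labels):
    """Fraction of examples where the AI judge and the human reviewer agree."""
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)

def cohen_kappa(judge_labels, human_labels):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(human_labels)
    observed = agreement_rate(judge_labels, human_labels)
    labels = set(judge_labels) | set(human_labels)
    expected = sum(
        (judge_labels.count(label) / n) * (human_labels.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts on the same 8 model outputs.
judge = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
human = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

print(agreement_rate(judge, human))  # 0.75
print(cohen_kappa(judge, human))
```

Raw agreement can look high when one label dominates, which is why the chance-corrected number is reported alongside it. If both are low, the judge prompt or rubric needs work before its scores are trusted at scale.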
1.3K

Chip Huyen

Tech & AI

28mo

I never dreamed of this happening, but I walked into a public library in Tokyo and found a translated copy of my book!! Many people asked me why I chose to work with a publisher (O’Reilly) instead of self-publishing. Because of the opportunity to see my book in different countries, like this. Like most people, I don’t write books for financial reasons. I write because I love writing, and it makes me really happy to see my writing reaching people in a way I never thought possible. #mlengineer #mlops #technicalwriters
4.9K

Chip Huyen

Tech & AI

37mo

Bet on attitude vs. bet on experience

Last week, I was on a panel with other founders, and the moderator asked: “What are some of the biggest mistakes you made building a company?”

My answer was: “Underestimating experience. When we started, I told my co-founder that I don’t care about someone’s pedigree or experience. As long as that person is smart and has the right attitude, I’d want to bet on that person. However, as we built our product, I realized that there are many situations where we need experience.”

After the panel, one person asked me: “Are you saying that you wouldn’t hire yourself straight out of college?” Realizing that my answer could cause misunderstanding, I want to give more context.

I started working in high school. I started out translating articles from English to Vietnamese, then I was given the opportunity to write my own articles and was promoted to editor when I was still in high school. By 21, I was the Creative Director of a 400-person company. I’m grateful for the opportunities people gave my younger self, and I want to give other young, hungry people the same opportunities.

However, when building a team, we need to consider team composition. It’s not about hiring only for attitude or only for experience, but about combining both to build a strong team.

Would I hire my younger self? I’d hire my younger self for jobs that my younger self would be good at: building prototypes, running experiments, and writing. I wouldn’t hire myself to architect a distributed system or build a compiler.

Don’t laugh, but here is a picture of me at 16, sitting at my first press conference in Vietnam as a reporter. I couldn’t imagine that one day, I’d be in Silicon Valley building an AI startup. I wouldn’t be here today without so many people taking bets on me, someone who lacked experience and pedigree, but was hungry to learn and grow. I hope that as I gain more experience, I won’t lose that attitude.

#building #team #startup
2.4K

Chip Huyen

Tech & AI

10mo

What people think will improve AI applications vs. what actually does

Made this slide for a talk and folks seemed to like it so I shared it here! From talking to teams, the things I've found to consistently improve application performance include:

1. Talk to users and look at data
Understand how they interact with your application, what features they actually use, where they drop off, etc. User understanding >> news coverage.

2. Build a more reliable platform
... that allows your team to quickly iterate and improve the application. Iteration speed >> shiny new tool.
** A reliable platform should also make it easier for you to swap tools in and out as needed!

3. Prepare data better
For many retrieval systems, performance improvement can come from:
- better retrieval mechanism
- better data preparation (data is cleaned and processed in a way that makes it easier for the right data to be discovered).
In many use cases that I've seen, the biggest performance boost comes from data preparation, such as a better chunking algorithm, rewriting data into Q&A format, or adding the right context/summary/metadata to each chunk. Data preparation >> vector database debate
** I'm not saying vector databases aren't important. They are! It's just that we should focus on the most impactful axis first.

4. Optimize for the end-to-end workflow
AI is usually a very small part of the whole system. Constant evaluation of models can take up valuable time that could be spent on optimizing the rest of the system, such as end-to-end latency, systematic evaluation, and data flywheel. System >> models.

#aiapplications #aiengineering
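The data preparation ideas in item 3 (better chunking, adding context and metadata to each chunk) can be sketched as follows. This is one possible approach under simple assumptions: word-based windows, a fixed overlap, and the document title used as the added context. The function name and parameters are hypothetical.

```python
def chunk_document(text, title, chunk_size=40, overlap=10):
    """Split text into overlapping word windows and prepend the document
    title to each chunk so it stays meaningful when retrieved alone.
    Assumes overlap < chunk_size."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "text": f"[{title}] " + " ".join(window),
            "metadata": {"title": title, "start_word": start},
        })
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the document
    return chunks

# Toy document of 100 numbered words.
doc = " ".join(f"word{i}" for i in range(100))
chunks = chunk_document(doc, title="Refund policy")
print(len(chunks), chunks[1]["metadata"])
```

The overlap keeps sentences that straddle a boundary retrievable from either side, and the title prefix gives the retriever context a bare fragment would lack. Real pipelines usually split on sentence or section boundaries instead of raw word counts.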
2.2K

Chip Huyen

Tech & AI

4mo

I built a tool to help me stay up to date with new AI stuff. It's tracking 14K open source repos so far, with contributions from over 145K developers. Link: https://goodailist.com It searches for new AI repos every day (based on 123 keywords and topics), surfaces repos that are gaining traction, and automatically categorizes each repo. It also lets me see where the open source contributors are, so when I travel, I can find folks doing cool stuff in a new city or country. The annotations are done by AI so they are not super accurate, but they've helped me find some useful stuff for my work! #aiengineering #aiapplications
4.6K

Chip Huyen

Tech & AI

6mo

OMG there's a bookstore dedicated to technical books in Taipei! #aiengineering
2.3K

Chip Huyen

Tech & AI

36mo

“Leadership needs us to do generative AI. What do we do?” I had a lot of fun preparing the talk for Fully Connected yesterday. Thanks everyone for responding to my post with your suggestions! Slides: https://lnkd.in/gZfaJv7Z The idea for the talk came from many conversations I’ve had recently with friends who need to figure out their generative AI strategy, but aren’t sure what exactly to do. This talk is a simple framework to explore what to do with generative AI. Many ideas are still being fleshed out. I’d love to hear about your experience through this process. #genai #llms #mlops
2.1K

Chip Huyen

Tech & AI

10mo

Very useful tips on tool use and memory from Manus's context engineering blog post. Key takeaways: 1. Reversible compact summary Most models allow 128K context, which can easily fill up after a few turns when working with data like PDFs or web pages. When the context gets full, they have to compact it. It’s important to compact the context so that it’s reversible. E.g., removing the content of a file/web page if the path/URL is kept. 2. Tool use Given how easy it is to add new tools (e.g., with MCP servers), the number of tools a user adds to an agent can explode. Too many tools make it easier for the agent to choose the wrong action, making it dumber. They caution against removing tools mid-iteration. Instead, you can force an agent to choose certain tools with response prefilling. Ex: starting your response with `<|im_start|>assistant<tool_call>{"name": "browser_` forces the agent to choose a browser tool. Name your tools so that related tools have the same prefix. E.g., browser tools should start with `browser_`, and command line tools should start with `shell_`. 3. Dynamic few shot prompting They cautioned against using the traditional few shot prompting for agents. Seeing the same few examples again and again will cause the agent to overfit to these examples. Ex: if you ask the agent to process a batch of 20 resumes, and one example in the prompt visits the job description, the agent might visit the same job description 20 times for these 20 resumes. Their solution is to introduce small structured variations each time an example is used: different phrasing, minor noise in formatting, etc. Link: https://lnkd.in/gHnWvvcZ #AIAgents #AIEngineering #AIApplications
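A rough sketch of two of the ideas above, with no model involved. `tools_with_prefix` only shows which tool names remain possible once a response is prefilled with a shared prefix, and `render_examples` varies the order and formatting of few-shot examples on each call. The tool names, templates, and function names are all hypothetical and are not taken from the Manus post.

```python
import random

# Hypothetical tool registry; related tools share a prefix.
TOOLS = ["browser_open", "browser_click", "shell_run", "shell_kill", "file_read"]

def tools_with_prefix(tools, prefix):
    """Return the tools that a prefilled name prefix would still allow."""
    return [name for name in tools if name.startswith(prefix)]

# Hypothetical formatting variants for the same example content.
TEMPLATES = [
    "Input: {q}\nOutput: {a}",
    "Q: {q}\nA: {a}",
    "### Task\n{q}\n### Result\n{a}",
]

def render_examples(examples, rng):
    """Format few-shot examples with shuffled order and a random template each."""
    shuffled = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return "\n\n".join(
        rng.choice(TEMPLATES).format(q=q, a=a) for q, a in shuffled
    )
```

Passing a seeded `random.Random` instance keeps the variation reproducible in tests while still changing the prompt from one call to the next in production.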
903

Chip Huyen

Tech & AI

35mo

Everyone: we can't use memes in our deck because it's not professional Our designer: hold my 🍺 #dataengineering #dataanalysis
448

Chip Huyen

Tech & AI

24mo

In many conversations, I noticed several common misperceptions about generative AI. 1. Technologies behind generative AI are new While many applications made possible by GenAI are new, the technologies surrounding it are not. - Retrieval, the backbone of RAG, is also the backbone of search and recommender systems. The first information retrieval system was described in the 1920s. - Vector search has been around since early 2010s. - Language modeling was first introduced in 1951. - The attention mechanism was introduced in 2015. - Inference optimization techniques (quantization, low-rank factorization, distillation) have been around for a while. While many temporary fixes will become outdated, the fundamentals will remain important. The trick is to separate the temporary fixes from the fundamentals. 2. Foundation models will completely replace classical ML In my observation, most GenAI applications in production have classical ML components. Outside of leveraging information retrieval, 30 - 50% of applications have a classification component, such as: - Intent classification: predicting the intent of a query so that you can route it to the right model. - Scoring: evaluating each output by giving it a score, e.g. from 1 to 5. - Next action prediction: if a model has access to multiple tools, predict which tool to use next. Foundation models won’t replace classical ML. They should be used together with classical ML models. 3. Hallucinations make GenAI applications unusable Models hallucinate because they are probabilistic. However, a model is much more likely to hallucinate when it doesn’t have access to the right information. Multiple studies have shown that hallucinations can be significantly reduced by giving the model the right context via retrieval or tools that the model can use to gather context (e.g. web search). While hallucinations are dealbreakers for many applications, they can be sufficiently curtailed to make GenAI usable for many more. #llms #aiengineering #aiapplications
3.1K

Chip Huyen

Tech & AI

30mo

Gemini is finally out. Its technical report, while 60 pages long, is light on details. I did a quick read-through and here's the summary. 1. Gemini was written in Jax and trained using TPUs. The architecture, while not explained in detail, seems similar to that of DeepMind's Flamingo, with separate text encoder and vision encoder. 2. Gemini Pro's performance is similar to GPT-3.5 and Gemini Ultra is reported to be better than GPT-4. Nano-1 (1.8B params) and Nano-2 (3.25B params) are designed to run on-device. 3. 32K context length. 4. Very good at understanding vision and speech. 5. Coding ability: the big jump in HumanEval compared to GPT-4 (74.4% vs. 67%), if true, is awesome. However, the Natural2Code benchmark (no leakage on the Internet) shows a much smaller gap (74.9% vs. 73.9%). 6. On MMLU: using COT@32 (32 samples) to show that Gemini is better than GPT-4 seems forced. In the 5-shot setting, GPT-4 is better (86.4% vs. 83.7%). 7. No information at all on the training data, other than they ensured "all data enrichment workers are paid at least a local living wage." #gemini #llms #multimodal #aiengineering
4K

Chip Huyen

Tech & AI

25mo

LinkedIn has published one of the best reports I’ve read on deploying LLM applications: what worked and what didn’t. 1. Structured outputs They chose YAML over JSON as the output format because YAML uses fewer output tokens. Initially, only 90% of the outputs were correctly formatted YAML. They used re-prompting (asking the model to fix its YAML responses), which increased the number of API calls significantly. They then analyzed the common formatting errors, added those hints to the original prompt, and wrote an error fixing script. This reduced their errors to 0.01%. 2. Sacrificing throughput for latency Originally, they focused on TTFT (Time To First Token), but realized that TBT (Time Between Tokens) hurt them a lot more, especially with Chain-of-Thought queries where users don’t see the intermediate outputs. They found that TTFT and TBT inversely correlate with TPS (Tokens per Second). To achieve good TTFT and TBT, they had to sacrifice TPS. 3. Automatic evaluation is hard One core challenge of evaluation is coming up with a guideline on what a good response is. For example, for skill fit assessment, the response: “You’re not a good fit for this job” can be correct, but not helpful. Originally, evaluation was ad-hoc. Everyone could chime in. That didn’t work. They then had linguists build tooling and processes to standardize annotation, evaluating up to 500 daily conversations, and these manual annotations guide their iteration. Their next goal is to get automatic evaluation, but it’s not easy. 4. Initial success with LLMs can be misleading It took them 1 month to achieve 80% of the experience they wanted, and an additional 4 months to surpass 95%. The initial success made them underestimate how challenging it is to improve the product, especially dealing with hallucinations. They found it discouraging how slow it was to achieve each subsequent 1% gain. #aiengineering #llms #aiapplication
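To make the latency terms in point 2 concrete, here is a small sketch that computes TTFT, mean TBT, and a tokens-per-second figure from the arrival timestamps of one streamed response. The function name and the input shape are assumptions. The TPS value here is per request, which is only a proxy for the system-wide throughput the report discusses.

```python
def latency_metrics(request_time, token_times):
    """Compute TTFT, mean TBT, and per-request TPS from timestamps in seconds.

    request_time: when the request was sent.
    token_times: arrival time of each generated token, in order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # Time To First Token: delay before the user sees anything.
    ttft = token_times[0] - request_time
    # Time Between Tokens: average gap between consecutive tokens.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    tbt = sum(gaps) / len(gaps) if gaps else 0.0
    # Tokens per second over the whole request, including the initial wait.
    total = token_times[-1] - request_time
    tps = len(token_times) / total if total > 0 else float("inf")
    return {"ttft": ttft, "tbt": tbt, "tps": tps}
```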
5K

Chip Huyen

Tech & AI

26mo

Absolutely loved the discussions and the energy at MLOps Learners’ RAG workshop this week. We had over 200 comments/questions during the 90-minute workshop! Here are some of the takeaways: 1. Long context length and RAG have pros and cons, and neither will kill the other. Just because a model can take a long context doesn’t mean that it can efficiently leverage all the information. Reranking is needed. 2. Today, most RAG systems are still text-based, but we’re seeing exciting work on RAG for tabular data and multimodal data. There are also discussions on new techniques for RAG such as RAFT (RAFT: Adapting Language Model to Domain Specific RAG) which combines finetuning and RAG. 3. RAG evaluation is still challenging. For RAG, we need to evaluate not only the system end-to-end but also different components of the system, such as the embedding quality and retrieval quality. Many are using AI to both evaluate RAG quality and generate evaluation data. 4. RAG scalability: there were questions around how to make RAG work with a lot of data, such as millions of text chunks or 200K lines of code. For large data, you might need to filter out data by metadata before doing semantic search for retrieval. 5. There were also a lot of discussions on the optimal configuration for RAG such as the optimal chunk sizes or the optimal number of chunks to retrieve. Here is the recording with the slides and notebooks used for the workshop: https://lnkd.in/gca8MvUv Thanks Val Andrei Fajardo, Lance Martin, and Harpreet Sahota 🥑 for your great presentations and for the great discussions on the server. Thanks Shahul ES for spontaneously jumping in to answer questions about RAGAS. Thanks Samuel Reiswig for hosting the event! Some other resources from the workshop: 1. Andrei’s awesome RAG cheatsheet: https://lnkd.in/g3a6Wep9 2. Langchain’s multimodal RAG template: https://lnkd.in/g-MczRXJ 3. RAFT paper: https://lnkd.in/gzd7JmNP 4. Self-RAG: https://lnkd.in/gxxFbZBs 5. Corrective-RAG: https://lnkd.in/g3i2STbZ #RAG #AIengineering #LLMs
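The scalability point above (filter by metadata first, then run semantic search) can be sketched as follows. This is an in-memory toy under stated assumptions: the chunk layout (`id`, `meta`, `vec`), the exact-match filter, and the function names are hypothetical, and a production system would push the filter into its vector store instead of scanning a list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(chunks, query_vec, filters, top_k=2):
    """Keep chunks whose metadata matches every filter, then rank by similarity."""
    # Step 1: cheap metadata filter shrinks the candidate set.
    candidates = [
        chunk for chunk in chunks
        if all(chunk["meta"].get(key) == value for key, value in filters.items())
    ]
    # Step 2: semantic ranking only over the survivors.
    candidates.sort(key=lambda chunk: cosine(chunk["vec"], query_vec), reverse=True)
    return candidates[:top_k]
```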
1.4K

Chip Huyen

Tech & AI

21mo

Hello Australia 🦘 👋 I'll be in Sydney, Perth, Melbourne, and Auckland for the next two weeks speaking about data and AI. I'll be sharing the stage with Joe Reis 🤓 Eevamaija Virtanen Zach Wilson Adi Polak and many more wonderful data folks. If you're around, please say hi! I've heard wonderful things about the tech community here, and can't wait to get to know you. This is my first time in both countries. What must I eat/do/see? Thanks Peter Hanssens and DataEngBytes for inviting me.
1.1K

Chip Huyen

Tech & AI

27mo

In conversations with companies, I notice a few questions that keep coming up about model evaluation. 1. How much data do we need for training? I think of data in orders of magnitude. I’d train my model using different orders of magnitude of data, e.g. with 10K, 100K, and 1M examples, and plot the model performances. This gives me a rough understanding of the effect of data on my model performance, which helps me estimate how much more data I need to reach the desirable performance. 2. How to create an evaluation set? You should have multiple evaluation sets. One evaluation set should be representative of the actual production data. Other sets should focus on the areas you want to closely monitor. For example, one set can contain examples that the model is known to frequently fail on. Another set can contain high-value examples (e.g. from the highest-paying customers). 3. What benchmarks can I trust? None. A benchmark becomes questionable as soon as its data becomes public, as it can then be included in the training data of models. Benchmarks help you weed out generally bad models, but won’t be sufficient to help you find the best model for your specific use cases. I’d use benchmarks to select a handful of models I want to closely evaluate, then evaluate them on my own data using my own metrics. If you have other questions, feel free to post them here and I'll respond to the most common ones!
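One way to turn the "orders of magnitude" idea in point 1 into an estimate is to fit a line to score versus log10 of the dataset size and invert it for a target score. The log-linear shape is an assumption made here for illustration, not something the post claims; real learning curves usually flatten, so treat the result as a rough upper-level guess. Function names are hypothetical.

```python
import math

def fit_log_linear(sizes, scores):
    """Least-squares fit of score = a + b * log10(size). Returns (a, b)."""
    xs = [math.log10(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b

def examples_needed(a, b, target):
    """Invert the fit to estimate the dataset size that reaches a target score."""
    if b <= 0:
        raise ValueError("score does not improve with more data under this fit")
    return 10 ** ((target - a) / b)
```

In practice the inputs would be the measured scores from training runs at, say, 10K, 100K, and 1M examples, and the plot itself remains the more trustworthy guide than any extrapolated number.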
1.1K

Chip Huyen

Tech & AI

11mo

I open sourced Sniffly (https://lnkd.in/geEk3HgN), a tool that analyzes Claude Code logs to help me understand my usage patterns and errors. Key learnings from spending so much time looking at the logs: 1. The most common type of error Claude Code makes is Content Not Found (20 - 30%). It tries to find files or functions that don't exist. So I restructured my code base for discoverability, and the average number of steps Claude Code needs for each instruction went from 8 to 7. 2. Traditional metrics of engineering hours/days don’t work for AI. Two metrics I use to evaluate the complexity of a project: - how many instructions I need to give AI - how often I have to interrupt it because it goes in the wrong direction Across my projects, the interruption rate is about 1 in 4 instructions. This means I still need to actively monitor the agent. 3. While most of the time, Claude Code can only go up to 10 steps before I need to interrupt it, it can occasionally go close to 100 steps. Just a year ago, people told me it was hard to get an agent to go above 5 steps! Claude Code’s favorite tools are, unsurprisingly, search tools (grep, ls, glob), which make up ⅓ of tool calls.
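The metrics named above (interruption rate, average steps per instruction, share of each error type) are simple to compute once logs are reduced to one record per instruction. The record format below is purely hypothetical and is not Sniffly's or Claude Code's actual log schema; it only illustrates the arithmetic.

```python
from collections import Counter

def summarize(instructions):
    """Summarize hypothetical per-instruction records.

    Each record is a dict with 'steps' (int), 'interrupted' (bool),
    and 'errors' (list of error-type strings). These keys are assumptions.
    """
    n = len(instructions)
    if n == 0:
        raise ValueError("no instructions to summarize")
    errors = Counter(err for rec in instructions for err in rec["errors"])
    total_errors = sum(errors.values())
    return {
        # Fraction of instructions where the user had to step in.
        "interruption_rate": sum(rec["interrupted"] for rec in instructions) / n,
        # Mean number of agent steps per instruction.
        "avg_steps": sum(rec["steps"] for rec in instructions) / n,
        # Share of all errors by type; empty when there were no errors.
        "error_share": {kind: count / total_errors for kind, count in errors.items()},
    }
```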
1.9K

Chip Huyen

Tech & AI

27mo

Had a blast participating in this GTC panel with Chris Deotte (NVIDIA), Ruchi Bhatia (HP), and Ken Jee (HP). We talked about how generative AI is changing the machine learning engineering stack, and what skills will be needed for data science in the future. Here are some takeaways: 1. The fundamentals of how to bring ML models into production remain the same. First, AI solutions still have to solve business problems, and matching business metrics to ML metrics is still hard. Second, we still need to systematically experiment and evaluate ML systems. With classical ML, experiment tracking is about hyperparameter tuning. With generative AI, it’s about experimenting with prompts and sampling variables. 2. The data stack sees many new changes. - Data synthesis: AI is pretty good at generating data for a wide range of tasks. We’re already seeing many companies successfully leveraging AI to create cheap and fast training data. - Data management: many classical ML algorithms expect tabular data, and foundation models put the focus on unstructured data. Managing large-scale unstructured data throughout its lifecycle (storing raw text/image with corresponding embedding, vector search and retrieval, and loading large-scale datasets) is challenging. - Data processing: traditionally, data processing was done on CPUs. However, as many companies now use both CPUs and GPUs, there is a huge opportunity for companies to move their data processing to GPUs. Many ETL workloads are naturally parallelizable, making them ideal for GPUs. A pattern we see: the most powerful GPUs (H100s) are reserved for training, less powerful GPUs (A10s) are used for inference, and older GPUs can be used for data processing. Of course, this highly depends on the application and the scale. Thanks Jamil Semaan and NVIDIA for organizing this! #aiengineering #dataengineering #gpus #aiapplications
924

Chip Huyen

Tech & AI

17mo

Finally got my copy! “AI Engineering” is officially out 🙏 🎉 It’s heavier than I expected (500 pages) and I’m so glad O’Reilly decided to publish it in color. Thanks everyone for making this happen! Thank you for giving this book a chance! #aiengineering #aiapplications #book
9K

Chip Huyen

Tech & AI

18mo

It’s done! My editor just told me that the manuscript has been sent to the printers! 150,000 words, 200+ illustrations, 250 footnotes, and over 1200 reference links. This wouldn’t have been possible without the help of so many people who reviewed the early drafts, answered my thousands of questions, introduced me to fascinating use cases, or helped me see the beauty of overlooked techniques. * The Kindle is already out! https://amzn.to/49j1cGS * Paperback copies should be available in a few weeks, hopefully before the end of the year, but you can preorder on Amazon * The full manuscript is now on O’Reilly platform: https://lnkd.in/gYtF2taE #aiengineering #aiapplications #llms
9.5K

Chip Huyen

Tech & AI

27mo

It’s been so inspiring to hear about the amazing things people are working on in the AI & compute space at #GTC2024! - Christian Szegedy: who’s currently working on reasoning for xAI’s Grok. He believes that solving math is the best path towards intelligence. Math is more than just computing numbers. Math is essential for many tasks: logic, coding, scientific discovery… His team is working on making Grok the superhuman mathematician. Fun fact: at xAI, they don’t have data annotators but they have AI tutors. - Clément Farabet: VP of Research at DeepMind who’s leading the development of Gemma. He got me access to Gemini 1.5 Pro, a small mixture-of-experts model that can process 1 million token context length. Let me know what tasks you’d like me to experiment with! - Sameer Raheja: senior director of engineering at NVIDIA. His team, RAPIDS AI, works on this challenging problem of making data processing cheaper and faster on GPUs. Companies like PayPal already saw up to 70% cost savings by moving data processing from CPUs to GPUs. Check out Ilay Chen’s talk on it today! - Kyle Kranen: engineering manager at NVIDIA who works on optimization for NIM, NVIDIA’s Inference Microservices. NIM is a suite of pre-built, optimized containers for AI models that allow users to deploy AI models anywhere. - Adel El Hallak: senior director of product at NVIDIA who leads the development of NVIDIA’s API catalog that allows users to quickly try out open source and closed source models, before deploying them with NIM. If you want to learn more about data processing on GPUs, our team at Voltron Data will be presenting at Lambda Labs booth today (Wed) at 2pm and tomorrow (Thu) at 11am! #gpu #aiengineering #llmops
1.6K

Chip Huyen

Tech & AI

17mo

I'm using AI so much for work that I can tell how productive I am by how many conversations I've had with AI. Here's the script to generate this heatmap: https://lnkd.in/gY8ExNBg Should I write a post to share how I use AI in my work? #aiengineering #llms
2.9K

Chip Huyen

Tech & AI

28mo

A hard part of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query. https://lnkd.in/gxy3KYGG One use case is model routing. If we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency. One pattern is that for simple prompts, weak models can do (nearly) as well as strong models. For more challenging prompts, however, users are much more likely to prefer stronger models. Here’s a visualization of predicted human preference for an easy prompt (“hello, how are you?”) and a challenging prompt (“Explain why Planck length …”). Preference predictors make it possible to create leaderboards unique to any prompt and domain. As always, feedback is much appreciated! #aiengineering #llm #aiapplications
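The routing use case above can be sketched in a few lines: given a predictor that returns the probability users would prefer the stronger model for a prompt, send the prompt to the cheaper model unless that probability crosses a threshold. The model labels, the threshold, and `toy_predictor` are placeholders invented for this sketch; the toy predictor is a word-count heuristic standing in for a trained preference model and is not the approach from the linked post.

```python
def route(prompt, predict_preference, cheap="cheap-model", strong="strong-model",
          threshold=0.5):
    """Pick a model name based on predicted preference for the strong model.

    predict_preference(prompt) must return a probability in [0, 1].
    """
    p_strong = predict_preference(prompt)
    return strong if p_strong > threshold else cheap

def toy_predictor(prompt):
    """Stand-in predictor: treats longer prompts as harder. Not a real model."""
    return min(1.0, len(prompt.split()) / 20)
```

Lowering `threshold` trades cost for quality by sending more prompts to the stronger model; raising it does the opposite.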
1.6K

Chip Huyen

Tech & AI

11mo

I’m slowly beginning to accept that my productivity, when working with AI coding agents, is limited by my human brain. AI can do many tasks in parallel, but I can only track the context of a few, so I only run a few tasks at a time. I am the bottleneck. #AIagents #AIcoding
3.4K

Chip Huyen

Tech & AI

32mo

New blog post: Multimodality and Large Multimodal Models (LMMs) Link: https://lnkd.in/gJAsQjMc Being able to work with data of different modalities -- e.g. text, images, videos, audio, etc. --  is essential for AI to operate in the real world. Many use cases are impossible without multimodality, especially those in industries that deal with multimodal data such as healthcare, robotics, e-commerce, retail, gaming, etc. Not only that, data from new modalities can help boost model performance. Shouldn’t a model that can learn from both text and images perform better than a model that can learn from only text or only image? OpenAI noted in their GPT-4V system card that “incorporating additional modalities (such as image inputs) into LLMs is viewed by some as a key frontier in AI research and development.” This post covers multimodal systems, including LMMs (Large Multimodal Models). It consists of 3 parts. * Part 1 covers the context for multimodality, including use cases, different data modalities, and types of multimodal tasks. * Part 2 discusses how to train a multimodal system, using the examples of CLIP, which lays the foundation for many LMMs, and Flamingo, whose impressive performance gave rise to LMMs. * Part 3 discusses some active research areas for LMMs, including generating multimodal outputs and adapters for more efficient multimodal training. Even though we’re still in the early days of multimodal systems, there’s already so much work in the space. At the end of the post, I also compiled a list of models and resources for those who are interested in learning more about multimodal. As always, feedback is appreciated! #llm #lmm #multimodal #genai #largemultimodalmodel
2.7K

Chip Huyen

Tech & AI

25mo

A big issue I see with AI systems is that people aren't spending enough time evaluating their evaluation pipeline. 1. Most teams use more than one metric (3-7 metrics in general) to evaluate their applications, which is a good practice. However, very few are measuring the correlation between these metrics. If two metrics are perfectly correlated, you probably don't need both of them. If two metrics strongly disagree with each other, either this reveals something important about your system, or your metrics just aren't trustworthy. 2. Many (I estimate 60 - 70%?) use AI to evaluate AI responses, with common criteria being conciseness, relevance, coherence, faithfulness, etc. I find AI-as-a-judge very promising, and expect to see more of this approach in the future. However, AI-as-a-judge scores aren’t deterministic the way classification F1 scores or accuracy are. They depend on the model, the judge's prompt, and the use case. Many AI judges are good, but many are bad. Yet, very few are doing experiments to evaluate their AI judges. Are good responses given better scores? How reproducible are the scores -- if you ask the judge twice, do you get the same score? Is the judge's prompt optimal? Some aren’t even aware of the prompts their applications are using, because they use prompts created by eval tools or by other teams. Also fun fact I learned from a (small) poll yesterday: some teams are spending more money on evaluating models’ responses than on generating responses 🤯 #aiengineering #llms #aievaluation
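Both checks described above are cheap to run. The sketch below computes the Pearson correlation between two metrics scored on the same responses, and the fraction of responses that receive the same judge score across two runs. Function names are hypothetical, and exact-match agreement is only one simple notion of reproducibility.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of metric scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant metric")
    return cov / (spread_x * spread_y)

def agreement_rate(first_run, second_run):
    """Fraction of responses given the same judge score in two separate runs."""
    if len(first_run) != len(second_run) or not first_run:
        raise ValueError("runs must be non-empty and the same length")
    same = sum(a == b for a, b in zip(first_run, second_run))
    return same / len(first_run)
```

A correlation near 1 suggests one of the two metrics may be redundant, while a low agreement rate suggests the judge's scores are too noisy to compare small differences.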
2K

Chip Huyen

Tech & AI

35mo

“Data product” vs. “Data as a product” People keep telling me that data is valuable, and a question I’ve been trying to figure out is: “How valuable is data? Can we put a dollar sign on it?” If we can figure out a way to measure the value of a company’s brand because brand names are important, why can’t we put a value on data? Many companies today have “data products”: products built using data, and the value of data is approximated based on the value of these products. “Data as a product” attempts to calculate the intrinsic value of data itself. Putting the value of data on the balance sheet directly can: 1. Give people working on data more autonomy to improve data quality and data infrastructure. 2. Motivate leadership to take data issues more seriously. For this reason, I was excited about Morgan Templar's talk on Data ROI at DataConnect Conference, which attempts to answer the same question. Kantor's study of 52 US brands shows that, on average, data value accounts for 19.7% of intangible assets. For comparison, that number for branding is around 30%. According to the study, the industries where data has the highest value are insurance, finance, and fintech. The industries where data has the lowest value are hotels and telecom. The framework for calculating data value isn’t perfect. I suspect that, if another firm does the study again, they’ll likely come up with different numbers. However, I think it’s a great place to get the conversations started. Curious what you think: should we attempt to measure the intrinsic value of data, or should data value be evaluated only based on products? #data #dataanalytics #dataengineering
1.7K

Chip Huyen

Tech & AI

23mo

Building a platform for generative AI applications Link: https://lnkd.in/gDUcwQ-v After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines these common components, what they do, and implementation considerations. This post starts from the simplest architecture and progressively adds more components. 1. Enhance context input into a model by giving the model access to external data sources and tools for information gathering. 2. Put in guardrails to protect your system and your users. 3. Add model router and gateway to support complex pipelines and add more security. 4. Optimize for latency and costs with cache. 5. Add complex logic and write actions to maximize your system’s capabilities. I try my best to keep the architecture general, but certain applications might deviate. As always, feedback is appreciated! #mlengineering #genai #aiapplications
4K

Chip Huyen

Tech & AI

26mo

I’m excited to share that I’m working on a new book about building applications with foundation models. AI Engineering builds upon Machine Learning Systems Design, but with a focus on large-scale, ready-made models. The book covers: - The new AI stack (e.g. how it differs from traditional ML engineering) - Different approaches to evaluate open-ended systems - Dataset engineering - Prompt engineering, RAG, agents - Finetuning - Compute infrastructure, including how to mitigate latency and cost AI Engineering is scheduled for late 2024. An early draft of the first 3 chapters is available on the O'Reilly platform: https://lnkd.in/geh2x4Tw I’ve learned a lot during the research and writing process for this book. I hope you’ll find the learnings useful. Feedback is much appreciated! #aiengineering #aiapplications #mlengineering
12.6K

Chip Huyen

Tech & AI

5mo

Super impressed by the projects at the Agentic Hackathon last weekend! Many teams work on really hard/important problems: * Long running tasks: memory management, recovering from mid-task failures, and maintaining consistency across steps and sub-agents * Adaptive retrieval from multiple sources: databases, search indices, and websites * Agents that work with voice, video, and even 3D environments Will share a post with key takeaways soon, but if you are in SF, come check out the finalist demos tomorrow! https://luma.com/6bd4bt9j There will also be talks by Douglas Eck, who is doing amazing work with Veo and Imagen, as well as founders of Factory, Voyage AI, MongoDB, and many other awesome folks. Thanks MongoDB and Cerebral Valley for hosting and for letting me serve as a judge for these fantastic projects. #aiengineering #agent
358

Chip Huyen

Tech & AI

10mo

13 years. 6 books. All metrics are flawed, but it still makes me happy to see this. I love writing, and I hope to improve at it over time.
2.3K

Chip Huyen

Tech & AI

36mo

Everywhere I go, I see companies trying to figure out their generative AI strategy, and few seem to know exactly what to do. I'm giving a talk this afternoon on the key questions to ask when considering what to do with generative AI. What questions do you think I should cover? #genai #llms #mlops
1.2K