AI-Assisted Data Science (MEAP V05)
English | 2024 | ISBN: 9781633437647 | 174 pages | True PDF | 17.01 MB
Speed up common data science tasks with AI assistants like ChatGPT and Large Language Models (LLMs) from Anthropic, Cohere, AI21, Hugging Face, and more!
Using ChatGPT and other AI-powered tools, you can analyze almost any kind of data with just a few short lines of plain English. In AI-Assisted Data Science, you’ll learn important techniques for streamlining your data science practice, expanding your skillset and saving you hours—or even days—of time.
Inside, you’ll learn how to use AI assistants to
Analyze text, tables, images, and audio files
Extract information from multimodal data lakes
Classify, cluster, transform, and query multimodal data
Build natural language query interfaces over structured data sources
Use LangChain to build complex data analysis pipelines
Apply prompt engineering and model configuration
This practical book takes you from your first prompts through advanced techniques like building automated analysis pipelines and fine-tuning existing models. You’ll learn how to create meaningful reports, generate informative graphs, and much more.
about the book
AI-Assisted Data Science teaches you to use a new generation of AI assistants and Large Language Models (LLMs) to simplify and accelerate common data science tasks. Cornell professor and long-time LLM advocate Immanuel Trummer reveals techniques he’s pioneered for getting the most out of LLMs in data science, including model selection and specialization, techniques for tuning parameters, and reliable prompt templates.
You’ll start with an in-depth exploration of how LLMs work. Then, you’ll dive into no-code data analysis with LLMs, creating custom operators with the OpenAI Python API, and building complex data analysis pipelines with the cutting-edge LangChain framework.
about the reader
For data scientists, data analysts, and others who are interested in making their work easier through the use of artificial intelligence techniques. Readers should have a basic understanding of the Python programming language.
about the author
Immanuel Trummer is an assistant professor of computer science at Cornell University and leader of the Cornell Database Group. His papers have been selected for “Best of VLDB” and “Best of SIGMOD”, for the ACM SIGMOD Research Highlight Award, and for publication in CACM as a CACM Research Highlight. Immanuel’s online course on data management has reached over a million views on YouTube. Over the past few years, his group has published extensively on projects that apply large language models in the context of data science.