Inspecting and transforming data
Once data is loaded into a table, the real work begins. In practical programs and AI pipelines, we rarely use raw data as-is. We inspect it, narrow it down, and reshape it so it is suitable for analysis or downstream processing.
This lesson orients us to the everyday operations for inspecting tabular data and preparing it for use, without treating pandas as a subject to master for its own sake.
Examining data values and column types
Before changing anything, we usually want to understand what we have. With pandas, this means quickly inspecting values and seeing how each column is typed.
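import pandas as pd
# Load the CSV file into a DataFrame
df = pd.read_csv("planets.csv")
# Show the first five rows (print so the output also appears outside a notebook)
print(df.head())
# Show the type pandas inferred for each column
print(df.dtypes)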
These operations give us a snapshot of the data and how pandas is interpreting it, which informs every transformation that follows.
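A few other inspection calls are often useful alongside head() and dtypes. The sketch below is only illustrative: it builds a small in-memory table in place of planets.csv, and the column names and values are assumptions made for the example, not the contents of the real file.

```python
import pandas as pd

# Hypothetical stand-in for planets.csv; columns and values are illustrative.
df = pd.DataFrame(
    {
        "name": ["Earth", "Jupiter", "Saturn", "Moon", "Pluto"],
        "type": ["planet", "planet", "planet", "moon", "dwarf planet"],
        "radius_km": [6371, 69911, 58232, 1737, 1188],
    }
)

n_rows, n_cols = df.shape                     # (number of rows, number of columns)
column_types = df.dtypes                      # Series mapping column name -> dtype
radius_summary = df["radius_km"].describe()   # count, mean, std, min, quartiles, max

print(n_rows, n_cols)
print(column_types)
print(radius_summary)
```

Checking shape and describe() early tends to surface surprises, such as a numeric column that was read as text, before any transformation depends on it.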
Selecting subsets of rows and columns
Most programs work with only part of a dataset at any given time. Pandas lets us select specific columns or slices of rows directly from a DataFrame.
planet_names = df["name"]   # a single column, returned as a Series
first_five = df.iloc[:5]    # the first five rows, selected by position
This kind of selection is foundational for focusing on the data that actually matters to the current task.
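Selection can also combine rows and columns in one step. The following is a minimal sketch using a small made-up table (the data is an assumption for illustration); it contrasts position-based selection with iloc and label-based selection with loc.

```python
import pandas as pd

# Hypothetical stand-in for planets.csv; columns and values are illustrative.
df = pd.DataFrame(
    {
        "name": ["Earth", "Jupiter", "Saturn", "Moon", "Pluto"],
        "type": ["planet", "planet", "planet", "moon", "dwarf planet"],
        "radius_km": [6371, 69911, 58232, 1737, 1188],
    }
)

names = df["name"]                            # one column -> Series
name_and_radius = df[["name", "radius_km"]]   # list of columns -> DataFrame
first_two = df.iloc[:2]                       # rows at positions 0 and 1 (end excluded)
block = df.loc[0:1, ["name", "type"]]         # rows labeled 0 through 1 (end included)

print(name_and_radius)
print(block)
```

Note that iloc slices exclude the end position, as Python slices do, while loc slices include the end label. With the default integer index the two are easy to confuse.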
Filtering data based on conditions
Filtering allows us to keep only rows that meet a logical condition. This is how we narrow large datasets into meaningful subsets.
large_planets = df[df["radius_km"] > 30000]
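Here, the comparison is evaluated for every value in the column, producing a boolean Series with one True or False per row. Indexing the DataFrame with that Series keeps only the rows marked True, and the result is a new DataFrame that can be used immediately.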
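To make the mechanics of filtering visible, the sketch below separates the condition from the indexing step and then combines two conditions. The table is a made-up stand-in for the lesson's data, so its values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for planets.csv; columns and values are illustrative.
df = pd.DataFrame(
    {
        "name": ["Earth", "Jupiter", "Saturn", "Moon", "Pluto"],
        "type": ["planet", "planet", "planet", "moon", "dwarf planet"],
        "radius_km": [6371, 69911, 58232, 1737, 1188],
    }
)

# The comparison yields a boolean Series, one value per row.
mask = df["radius_km"] > 30000
large_bodies = df[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
small_planets = df[(df["type"] == "planet") & (df["radius_km"] < 30000)]

# isin() tests membership in a set of allowed values.
non_planets = df[df["type"].isin(["moon", "dwarf planet"])]

print(large_bodies)
print(small_planets)
print(non_planets)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators in Python.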
Applying simple transformations to columns
Data often needs light reshaping before it is useful. Pandas makes it straightforward to compute new values or adjust existing columns.
df["radius_m"] = df["radius_km"] * 1000
Transformations like this operate column-wide, keeping the code compact and expressive.
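Column-wide transformations are not limited to arithmetic. As a rough sketch on a made-up table (the data and the new column names radius_earths and name_upper are assumptions for illustration), the example below shows in-place assignment, a string operation, and assign(), which returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

# Hypothetical stand-in for planets.csv; columns and values are illustrative.
df = pd.DataFrame(
    {
        "name": ["Earth", "Jupiter", "Saturn", "Moon", "Pluto"],
        "type": ["planet", "planet", "planet", "moon", "dwarf planet"],
        "radius_km": [6371, 69911, 58232, 1737, 1188],
    }
)

# Arithmetic applies to every row at once and adds a column to df.
df["radius_m"] = df["radius_km"] * 1000

# The .str accessor applies string methods to every value in a text column.
df["name_upper"] = df["name"].str.upper()

# assign() leaves df untouched and returns a copy with the extra column.
with_ratio = df.assign(radius_earths=df["radius_km"] / 6371)

print(with_ratio[["name", "radius_m", "radius_earths"]])
```

Preferring assign() when the original table is still needed elsewhere can make it easier to reason about which version of the data a later step sees.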
Using transformations to prepare data for analysis
Inspection, selection, filtering, and transformation usually work together. The goal is not the transformation itself, but producing a clean, meaningful table that is ready for analysis or reuse.
# Filter rows and pick columns in a single .loc call instead of chained indexing
prepared = df.loc[df["type"] == "planet", ["name", "radius_m"]]
At this point, the data reflects the structure we want, not the structure we inherited.
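One possible way to express the whole preparation as a single readable chain is sketched below. It assumes a small made-up table in place of planets.csv, and the chaining style is one option among several, not the only way to structure this step.

```python
import pandas as pd

# Hypothetical stand-in for planets.csv; columns and values are illustrative.
df = pd.DataFrame(
    {
        "name": ["Earth", "Jupiter", "Saturn", "Moon", "Pluto"],
        "type": ["planet", "planet", "planet", "moon", "dwarf planet"],
        "radius_km": [6371, 69911, 58232, 1737, 1188],
    }
)

prepared = (
    df.loc[df["type"] == "planet"]                          # filter: keep planets only
      .assign(radius_m=lambda d: d["radius_km"] * 1000)     # transform: km -> m
      .loc[:, ["name", "radius_m"]]                         # select: final columns
      .reset_index(drop=True)                               # renumber rows from 0
)

print(prepared)
```

Because each step returns a new DataFrame, the original df is left unchanged and the chain can be rerun or adjusted without side effects.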
Conclusion
We now have a clear orientation to how tabular data is inspected and shaped in pandas. These operations form the backbone of most data preparation steps in analytical and AI-related Python programs.
With this mental model in place, we are ready to treat tabular data as something we actively shape, rather than something we simply load and observe.