Reading Data in D3
Reading CSV files into javascript objects
- These notes give an introduction to reading data into D3 scripts. The process
is a bit different than
read_csv
in R, but the pattern described here can be used quite generally. - There is a function called
d3.csv
that at first might seem like a simple solution to reading in files. It would be great if we could write something like,// this doesn't work! let data = d3.csv("gapminder.csv") console.log(data) visualize(data)
where the function
visualize
does all the DOM manipulation to make the visualization. - Unfortunately, this approach does not work. The reason is that javascript is
designed with reactive programming in mind, so code doesn’t necessarily run
sequentially. We need to explicitly tell javascript to only run
visualize
after all the data has been read in. This can be accomplished using the.then()
method,d3.csv("gapminder.csv") .then(data => console.log(data))
- On some browsers, you can now navigate to the HTML page that includes this
script, and the data will be loaded. But on many modern browsers, you will
see an error like this in the console,
Access to fetch at 'file:///Users/ksankaran/Desktop/teaching/stat679_code/examples/week5/week5-3/gapminder.csv' from origin 'null' has been blocked by CORS policy: Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, chrome-untrusted, https.
The browser is trying to protect us from downloading a file from a suspicious link. To resolve this, we need to run our own server. From your terminal, go to the directory with the HTML page you want to run. Then, execute,
python3 -m http.server
You should see a message like,
Serving HTTP on :: port 8000 (http://[::]:8000/) ...
If you go the page with the port number in the message (
http://localhost:8000
in the example here), you should see your HTML page. When you click on that page, you will no longer see the data loading error — the file is now trusted for download. -
You should now be able to see the data in the console,
We are almost ready for visualization! But there is still one detail we should account for. If we look closely, we can see that all the numbers are being read in as character strings. We need to parse these into numbers before they can be used.
d3.csv
can take as an argument a function to modify each row of thecsv
as it is being read in. This is handy in case we want to manually change the types of object items as they are read in. We will usually not have to do this work ourselves – we can rely on a built in function,d3.autoType
, to automatically recognize the number columns in the CSV. Using this function ind3.csv
, we now see a version of the data with numbers correctly parsed.d3.csv("gapminder.csv", d3.autoType) .then(data => console.log(data));
- We are now done with loading data in D3. The main ideas are,
- Data can’t be easily read into local variables, like in R. Instead, use
.then()
. - To ensure the data are not blocked by the browser, we need to start a backend server with
python3 -m http.server
. - Numeric variables need to be parsed using
d3.autoType
.
- Data can’t be easily read into local variables, like in R. Instead, use
- Since we’ve gone through all the work of reading in these data, let’s make a
quick visualization. Let’s visualize log population,
lpop
, against life expectancy,life_expectancy
, for the year 1965. We first filter the data to this year,data = data.filter(d => d.year == 1965)
- Next, we will bind a circle for each country in the filtered array. We create a selection of circles on the parent SVG and bind the data,
d3.select("svg") .selectAll("circle") .data(data)
Since there are not yet any HTML tags in this selection, we can refer to each array element using
.enter()
. The next.append()
line draws each of these entered elements.d3.select("svg") .selectAll("circle") .data(data).enter() .append("circle")
We then set the
cx
andcy
coordinates for all circles on the page. I’ve multiplied the log population by 10, sincecx
is specified in terms of pixel coordinates. Since log populations are all relatively small, the points would all overlap if I didn’t manually multiply them like this.d3.select("svg") .selectAll("circle") .data(data).enter() .append("circle") .attrs({ cx: d => 10 * d.lpop, cy: d => d.life_expectancy, r: 2 })
- Finally, we wrap this block into a single function which is called as soon as the data are loaded.
function visualize(data) { data = data.filter(d => d.year == 1965) d3.select("svg") .selectAll("circle") .data(data).enter() .append("circle") .attrs({ cx: d => 10 * d.lpop, cy: d => d.life_expectancy, r: 2 }) } d3.csv("gapminder.csv", parse_row) .then(visualize);
- The result is very primitive looking, but it does use a real dataset. We’ll revisit these data in the next lecture to see how we can support more realistic visual queries.