Show HN: I visualized the entire history of Citi Bike in the browser
109 points
1 day ago
| 15 comments
| bikemap.nyc
| HN
Each moving arrow represents one real bike ride out of 291 million, and if you've ever taken a Citi Bike before, you are included in this massive visualization!

You can search for your ride using Cmd + K and your Citi Bike receipt, which should give you the time of your ride and start/end station.

Everything is open source: https://github.com/freemanjiang/bikemap

Some technical details:

- No backend! Processed data is stored in parquet files on a Cloudflare CDN, and queried directly by DuckDB WASM

- deck.gl w/ Mapbox for GPU-accelerated rendering of thousands of concurrent animated bikes

- Web Workers decode polyline routes and do as much precomputation as possible off the main thread

- Since only (start, end) station pairs are provided, routes are generated by querying OSRM for the shortest path between all 2,400+ station pairs
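
The polyline decoding step mentioned above can be sketched roughly as follows. This is an illustration rather than the project's actual worker code: it assumes routes are stored in the standard encoded polyline format at precision 5 (OSRM's default `polyline` geometry), and `decodePolyline` is a hypothetical name.

```typescript
// Hypothetical helper: decodes an encoded polyline string into [lat, lng] pairs.
// Assumes precision 5 by default; pass 6 for OSRM's `polyline6` geometries.
function decodePolyline(encoded: string, precision: number = 5): Array<[number, number]> {
  const factor = Math.pow(10, precision);
  const coords: Array<[number, number]> = [];
  let index = 0;
  let lat = 0;
  let lng = 0;

  // Reads one variable-length, zigzag-encoded signed integer starting at `index`.
  const readDelta = (): number => {
    let result = 0;
    let shift = 0;
    let chunk = 0;
    do {
      chunk = encoded.charCodeAt(index++) - 63; // each character carries 5 payload bits
      result |= (chunk & 0x1f) << shift;
      shift += 5;
    } while (chunk >= 0x20); // bit 0x20 set means another chunk follows
    // The lowest bit is the sign flag; the remaining bits are the magnitude.
    return (result & 1) ? ~(result >> 1) : (result >> 1);
  };

  // Each point is stored as a (lat, lng) delta from the previous point.
  while (index < encoded.length) {
    lat += readDelta();
    lng += readDelta();
    coords.push([lat / factor, lng / factor]);
  }
  return coords;
}
```

In a browser, a function like this would presumably be called from a Web Worker's message handler so that decoding large batches of routes does not block rendering on the main thread.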

chem83
1 day ago
[-]
Relevant callout from https://bikemap.nyc/about:

* Limitations *

The data only contains the start and end station for each trip, but does not contain the full path. Route geometries are computed for each (start station, end station) pair using the shortest path from OSRM.

This means that the computed routes are directionally correct but inexact. Trips that start and end at the same station are filtered out since the route geometry is ambiguous.
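
A minimal sketch of what that limitation implies for preprocessing, assuming each trip record carries a start and end station ID (the `Trip` shape and `uniqueRoutePairs` name are hypothetical, not taken from the project): only distinct ordered pairs need a route, and same-station trips are dropped.

```typescript
// Hypothetical trip record; the field names are assumptions for illustration.
interface Trip {
  startStationId: string;
  endStationId: string;
}

// Collects the distinct ordered (start, end) pairs that would need a route,
// skipping round trips whose geometry is ambiguous.
function uniqueRoutePairs(trips: Trip[]): Array<[string, string]> {
  const seen: { [key: string]: boolean } = {};
  const pairs: Array<[string, string]> = [];
  for (const trip of trips) {
    if (trip.startStationId === trip.endStationId) continue; // same-station trip
    // JSON encoding keeps the key unambiguous even if IDs contain separators.
    const key = JSON.stringify([trip.startStationId, trip.endStationId]);
    if (seen[key]) continue;
    seen[key] = true;
    pairs.push([trip.startStationId, trip.endStationId]);
  }
  return pairs;
}
```

Note that A to B and B to A are treated as separate pairs here, since one-way streets can make the two shortest paths differ.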

reply
jotaen
1 day ago
[-]
This limitation comes with more interesting implications: e.g., I noticed that some bike trips are noticeably slower than average. For those I’d assume that the rider either took a detour or made a stop in between. The animation, however, makes it appear as if it was a very slow ride. It may be worth filtering out all rides that are at walking speed or slower.

It also would be interesting to learn how many rides had been excluded altogether, just to put things into perspective.

reply
freemanjiang
21 hours ago
[-]
Yeah there is a filter between 1.2 and 20 mph
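
A filter like that could look roughly like the sketch below. The 1.2 and 20 mph bounds come from the comment above; everything else is an assumption: that speed is computed from the routed distance and the trip duration, that the bounds are inclusive, and the function names themselves.

```typescript
const METERS_PER_MILE = 1609.344;
const MIN_MPH = 1.2; // lower bound mentioned in the thread
const MAX_MPH = 20; // upper bound mentioned in the thread

// Average speed over the whole trip, in miles per hour.
function averageSpeedMph(distanceMeters: number, durationSeconds: number): number {
  const miles = distanceMeters / METERS_PER_MILE;
  const hours = durationSeconds / 3600;
  return miles / hours;
}

// Returns true when the trip's average speed falls inside the assumed
// inclusive [MIN_MPH, MAX_MPH] range; non-positive inputs are rejected.
function isWithinSpeedFilter(distanceMeters: number, durationSeconds: number): boolean {
  if (durationSeconds <= 0 || distanceMeters <= 0) return false;
  const mph = averageSpeedMph(distanceMeters, durationSeconds);
  return mph >= MIN_MPH && mph <= MAX_MPH;
}
```

Because the distance is the shortest path rather than the real route, a rider who detoured or stopped would show up as slow here, which is the case the lower bound would catch.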
reply
kyleee
1 day ago
[-]
Hmm, definitely too bad. Essentially fictional
reply
mattmm11
19 hours ago
[-]
This is now at the top of my list of favorite data visualizations I've ever seen. I remember spending some time with Capital Bikeshare data in DC, which was also public at one point, though it looks like it only goes through 2016: https://capitalbikeshare.com/system-data. Would love to see the Lime/Bird version of this. Thanks for sharing.
reply
lazarus01
1 day ago
[-]
Cool project. Thanks for sharing!

The link above points to a 404 error page on GitHub. Looks like you forgot the hyphen in the name part of the url.

I’m working with subway data, particularly the A subway line, 32 mi long with about 2 million trips over 6 months across 66 stations. I’m trying to train a ConvLSTM to learn the spatiotemporal propagation of train headways.

reply
jdlyga
1 day ago
[-]
I really wish Lyft invested in maintenance. I used Citibike this week for the first time in about a year, and the Hudson River Greenway dock by NY Waterway had 1/3 of its empty docks broken with flashing red lights, then about 5 ebikes that needed service.
reply
crazygringo
1 day ago
[-]
Are you sure that wasn't the "staggered" bike dock? It forces you to dock in the rear row if the neighboring two front row spaces are free. This is to fit more bikes. The blinking red docks aren't broken. They're intentionally unavailable.

https://www.reddit.com/r/MicromobilityNYC/comments/v457x0/9_...

Also, the 5 e-bikes probably didn't need "service", they were just waiting for battery swaps. This is by design. The docks don't charge them.

CitiBike maintenance is generally fine. They're not leaving any significant number of broken bikes or docks. I think you may have just misunderstood how it works.

reply
wiredfool
1 day ago
[-]
Interesting that citibike publishes trip level data. The bike share schemes in Dublin only publish station counts or free bike locations. So you can see the overall pattern of bike motion, but there’s no way to see how many north side trips go to the docks vs Heuston station vs the city center.
reply
jeffbee
1 day ago
[-]
All of the Lyft-operated systems in America publish this kind of data at least monthly.
reply
big_toast
1 day ago
[-]
non corrupted github link: https://github.com/freeman-jiang/bikemap.nyc

Cool visualization.

Do you find the OSRM shortest path routes probable for bikes? Not living in NYC, I expected pretty different paths. Say the "Hudson River Greenway" or whatever that's called.

reply
rorylawless
1 day ago
[-]
This is awesome. I had no idea Lyft publishes ride data, time to explore the DC version!
reply
7777777phil
1 day ago
[-]
This is just so cool! Not much more to add. Thanks a lot for sharing!! Great work :)
reply
IvoCrnkovic
1 day ago
[-]
I've seen many visualizations of the citibike data over the years, this is one of the most charismatic for sure!
reply
nadis
1 day ago
[-]
+1 to this comment! I used to work in this space and have similarly seen many projects and professional attempts at visualizing this kind of trip data.

This is beautifully done!

reply
freemanjiang
1 day ago
[-]
Thank you so much! That means a lot.
reply
frakkingcylons
1 day ago
[-]
This is really nice. One request: when searching for a station name, let me type "and" instead of "&", e.g. typing "E 47th St and 2 Ave" would still return "E 47th & 2 Ave".
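
One way this could work, sketched under assumptions (the function names and the sample station names in the usage note are made up for illustration, and the site's real search may work differently): normalize both the query and the station names so that the word "and" and the symbol "&" compare equal.

```typescript
// Lowercases, maps the standalone word "and" to "&", and collapses whitespace.
function normalizeStationQuery(text: string): string {
  return text
    .toLowerCase()
    .replace(/\band\b/g, "&") // \b keeps words like "grand" untouched
    .replace(/\s+/g, " ")
    .trim();
}

// Simple substring match on the normalized forms.
function searchStations(query: string, stationNames: string[]): string[] {
  const needle = normalizeStationQuery(query);
  if (needle === "") return [];
  return stationNames.filter(
    (name) => normalizeStationQuery(name).indexOf(needle) !== -1
  );
}
```

With sample names `["Broadway & W 58 St", "Grand Army Plaza"]`, the query `"broadway and w 58"` matches only the first. Substring matching alone would not tolerate an extra "St" in the query, so a fuzzier matcher would be needed for that.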
reply
pimlottc
1 day ago
[-]
It says “entire history” but seems to start at Jan 1, 2025?
reply
ge96
1 day ago
[-]
How was the data gathered? Do they just publicly show the bikes' locations?
reply
RIMR
1 day ago
[-]
reply
ge96
1 day ago
[-]
That's cool it actually came from citibike
reply
netsharc
1 day ago
[-]
They show a bike at a location, if it's rented it will disappear off the map, if it's "returned" (available to hire again) it will show back up on the map, but at a different location.

So "represents one real bike ride" is... I guess a lawyer would say technically true.

I was recording similar location data from a Car2Go-like service for a year or two some years ago. I realized that, since they charge rentals by the minute, I could estimate how much they earn by analyzing how long the cars disappear for.
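
That estimate can be sketched as below. The `RentalGap` shape, the per-started-minute billing, and the rate are all assumptions for illustration, not details of any real service.

```typescript
// One interval during which a vehicle was missing from the public map.
interface RentalGap {
  disappearedAt: number; // epoch milliseconds
  reappearedAt: number; // epoch milliseconds
}

// Sums the billed minutes across all gaps and multiplies by a per-minute rate.
// Assumes every started minute is billed; malformed (non-positive) gaps are ignored.
function estimateRevenue(gaps: RentalGap[], ratePerMinute: number): number {
  let totalMinutes = 0;
  for (const gap of gaps) {
    const elapsedMs = gap.reappearedAt - gap.disappearedAt;
    if (elapsedMs > 0) totalMinutes += Math.ceil(elapsedMs / 60000);
  }
  return totalMinutes * ratePerMinute;
}
```

A vehicle may also disappear for reasons other than a rental, so a figure computed this way is probably best read as an upper bound.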

reply
leros
1 day ago
[-]
How is MapBox going for this free tool? Is it costing you money?
reply
freemanjiang
1 day ago
[-]
It definitely will if it blows up more. I'm willing to eat it for now because I think it's art that more people should see!
reply
timeisapear
1 day ago
[-]
Is MapLibre GL a cheaper (free?) open source alternative?

Cool stuff btw. I’m trying to visualize weather model data myself (millions of points) at https://futureradar.net and have been researching client-side techniques like yours.

reply
leros
1 day ago
[-]
It is very cool art!
reply
gnfargbl
1 day ago
[-]
It's often interesting to observe the different ways that privacy is approached in the US and Europe.

In Europe we often accept pretty grave restrictions of our liberty like the UK's Online Safety Act, which would never fly in the US, and we do so without much public comment.

On the other side of things, organisations in the US happily expose datasets like this one, which would give most EU Data Protection Officers a heart attack, and nobody bats an eyelid.

reply
tennysont
1 day ago
[-]
This data is mandated by NYC law: https://intro.nyc/local-laws/2015-99

I've heard that releasing these sorts of data sets helps competitors do market research, and thus mitigates "winner takes all" forces. NYC also tends to be fairly pro-public-datasets: https://data.cityofnewyork.us/browse?%3BsortBy=most_accessed...

reply
freemanjiang
1 day ago
[-]
In Lyft's defense, they are providing it anonymized under the NYCBS Data Use Policy. They also aren't providing the exact GPS routes, which is why OSRM is used to calculate the shortest path instead.
reply
jeffbee
1 day ago
[-]
I don't see anything problematic about start-end pairs from one public facility to another.
reply
wxw
1 day ago
[-]
Awesome work!
reply