AskGit recently added support for querying against data in the GitHub API. Repositories and pull requests can be listed by calling several table-valued functions in your SQL queries. More resource types from the GitHub API are planned and will be integrated soon!
github_pull_requests('REPO_OWNER', REPO_NAME') in the
FROM clause of a query will list all pull requests for the specified repository. There are optimizations in place to avoid a full table scan when possible (i.e. a paginated API walk for every pull request object, even if it won’t be used in the query). For instance:
Will know to only download the latest 100 pull requests in the repo (even though there are 10k+ in total, which wouldn’t show up in the result set).
One interesting use case of this new table is querying for the average time between PR creation and PR merge.
For instance, the following will tell us what the average “time to merge” (in hours) is of the last 500 pull requests to the
At the time of writing, this outputs the following:
| COUNT(*) | HOURS TO MERGE |
| 500 | 63.574957778044045 |
Roughly two and a half days from PR creation to PR merge for the react codebase.
Why is this useful?
That’s a good question! I’ll leave it to you to explore your own use cases, but here are some ideas that come to mind:
- You just opened a PR on an open-source repo and want to know how long you might have to wait before it merges (maybe an “average time to comment” is more relevant here).
- You want to use time-to-merge as a way of measuring an open-source project’s responsiveness, and compare it with other projects.
- You want some proxy measure for “cycle time” to help you understand the velocity of your team in a codebase.
- You want to look for indicators for what might make a PR merge time short or long (do big changes increase review time?).
- You want to break out time-to-merge by GitHub user to understand how quickly your own PRs merge relative to teammates.