I often see teams pull GitHub user data in the most fragile way possible: a quick curl call during a hack day that slowly turns into a production dependency. That path works until it doesn’t—rate limits kick in, user records change shape, or a single missing header turns into a cascade of failed jobs. If you’ve ever built a dashboard, onboarding flow, or internal tooling that depends on GitHub identities, you’ve felt this pain. I want to show you a safer, modern way to fetch GitHub users using the GitHub REST API (v3) while keeping your code readable and your requests respectful.
The plan is simple: I’ll start with the core endpoints, then walk through a concrete profile fetch, listing users with pagination, and searching users with qualifiers. From there I’ll layer on authentication, caching, and rate-limit handling so your code stays stable under load. I’ll also call out common mistakes I still see in 2026 and how I avoid them in my own systems. By the end, you should be able to fetch user data confidently, know when not to hit the API, and have drop-in examples for JavaScript and Python.
Getting the mental model right
GitHub’s REST API is an HTTPS JSON API rooted at https://api.github.com. Every response is JSON, and most of what you’ll do with users boils down to three workflows:
1) Fetch a specific user by login.
2) List users in a range, page by page.
3) Search users using qualifiers, then fetch details for each result you actually need.
The important mental shift is that GitHub treats the user list as a gigantic sequence rather than a strict “page 1, page 2” dataset. That’s why the list endpoint uses a since parameter: you ask for “users after ID X,” then keep going. It’s like walking through a long bookshelf where each book has a number on its spine. You don’t ask for “shelf 3,” you ask for “books after 14000000.” That model affects pagination, caching, and how you store checkpoints.
Also, there are two kinds of data you might care about:
- Lightweight user summaries from search or list endpoints.
- Full user profiles from the dedicated user endpoint.
The summary is enough for previews or quick UI lists. The full profile is what you need for detail pages or decisions based on public data such as company, location, or blog. I always design my pipeline so I only fetch full profiles when I truly need them.
Finally, remember that public data is still personal data. Treat it with care. If your app is storing user details, document your purpose and retention. I’ve seen teams pull everything because the API returns it, then later scramble to justify why it’s stored. You should be deliberate from day one.
Fast path: fetch a user profile in seconds
The single-user endpoint is the cleanest way to get a profile:
- GET /users/{username}
You can call it directly from the command line:
Shell (curl):
curl https://api.github.com/users/ayushsnha
That response includes fields like login, id, avatar_url, html_url, public_repos, followers, and timestamps. I recommend skimming the JSON once so you understand the shape, but in code you should only read what you actually use. It’s very tempting to map everything to a giant model in your app. I don’t do that anymore. Instead, I define a small interface that mirrors the fields I care about and ignore the rest. It keeps my clients stable when the API adds new fields.
Here is a compact example response shape you might extract:
Profile fields I usually keep:
login
id
avatar_url
html_url
name
company
blog
location
public_repos
followers
following
created_at
updated_at
If you want a quick sanity check, open the user’s profile in a browser and compare the pieces. I treat that step like a visual test: I confirm the API fields match what I see on the web profile, then move on.
Once you have the user profile, the next logical hop is to their repositories and commits. The repo list endpoint is:
- GET /users/{username}/repos
And the commit list endpoint is:
- GET /repos/{owner}/{repo}/commits
I’m including these because many “fetch a user” flows evolve into “fetch a user and then the recent work they’ve done.” If that’s you, make sure you fetch the user profile first and then decide which repos are relevant. It’s better than pulling all repos and all commits by default.
Listing users and pagination without surprises
If you need to walk across users—for example, to build a discovery dataset or to check for name collisions—you’ll use:
- GET /users?since={id}&per_page={n}
This is not a search. It’s an ordered list, and that order is by internal user ID, not by signup time. The since parameter means “start after this user ID.” A typical flow looks like this:
1) Start with since=0 (or your last checkpoint).
2) Read the array of users.
3) Store the id of the last user in the page.
4) Request the next page using since=last_id.
The result is a stable stream you can resume later. If you store your checkpoint, you can stop and continue without overlap. That’s a big difference from page-based pagination where page boundaries can shift.
I’ve seen teams build a batch process that requests page=1, page=2, page=3 and later wonder why they got duplicates or missed records. The since pattern avoids that. It’s like using a bookmark rather than counting pages in a book.
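That bookmark pattern can be sketched as a small generator. Here `fetch_page` is a hypothetical stand-in for the real HTTP call to /users?since={id}, injected so the pagination logic stays easy to test:

```python
from typing import Callable, Iterator

def stream_users(fetch_page: Callable[[int], list[dict]],
                 since: int = 0) -> Iterator[dict]:
    """Walk the /users?since={id} stream, using the id of the last
    user on each page as the bookmark for the next request.

    `fetch_page(since)` stands in for the HTTP call and returns one
    page of user summaries (an empty list means the stream is done).
    """
    while True:
        page = fetch_page(since)
        if not page:
            return  # stream exhausted
        yield from page
        since = page[-1]["id"]  # checkpoint: store this to resume later
```

If you persist `since` after each page, a crashed batch job can pick up exactly where it left off, with no duplicates and no gaps.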
Pagination also shows up in search results. For search, you do have page and per_page parameters. In 2026, I still keep per_page between 30 and 100 depending on response size, but I avoid 100 unless I really need it because larger payloads amplify slow networks and error retries.
A helpful tip: always parse the Link header when it’s present. GitHub uses it to tell you the next and last pages. In most HTTP client libraries, it’s a single header value you can parse into rel=next and rel=last. I keep a tiny helper for that rather than re-implementing ad hoc logic in every service.
Searching users like a pro (and when not to)
Search is powerful, but it’s also the most expensive way to get user data, so I use it sparingly. The endpoint is:
- GET /search/users?q={query}
A good query uses qualifiers. For example:
- language:javascript location:"New York" followers:>100
- type:user in:login john
- repos:>10 followers:>50
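If you build the query in code, percent-encode it instead of concatenating strings, because qualifiers contain spaces, quotes, and > characters. A small sketch using the standard library:

```python
from urllib.parse import urlencode

def search_users_url(query: str, page: int = 1, per_page: int = 30) -> str:
    """Build the /search/users URL with the query and its qualifiers
    safely percent-encoded."""
    params = urlencode({"q": query, "page": page, "per_page": per_page})
    return f"https://api.github.com/search/users?{params}"
```

For example, `search_users_url('language:javascript location:"New York" followers:>100')` produces a URL where the quotes and the > are encoded, which is exactly what hand-built strings tend to get wrong.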
Search results are summaries, not full profiles. That’s where many teams overfetch: they run a search, then immediately fetch every user profile in the list. That can be heavy, and it usually breaks under rate limits. My approach is to filter the search results first, then fetch full profiles only for the final candidates.
There are a few cases where I avoid search entirely:
- If I already have a login. Use /users/{login}.
- If I need a deterministic list. Use the /users?since stream or your own stored list.
- If I’m building an internal tool. I often store the data I need and update it incrementally instead of re-searching every time.
I also keep in mind that search results can be fuzzy. For example, searching by name is often less reliable than searching by login or email. GitHub intentionally limits some search capabilities for privacy reasons, and you should respect that. If your product requires high accuracy, confirm with the user or use OAuth to let them connect their own account.
Auth, rate limits, and responsible access
Unauthenticated requests work, but the rate limit is tight. I rarely ship production code without authentication. There are two main options:
- Personal access tokens (fine for server-to-server or internal tools).
- OAuth apps or GitHub Apps (better for user-facing features).
I usually start with a GitHub App for anything public, because it scales better and keeps permissions scoped. For internal automation, a fine-grained personal access token is okay as long as you store it securely.
When you authenticate, include the token as a Bearer header:
HTTP header:
Authorization: Bearer YOUR_TOKEN
GitHub exposes rate limit info in headers such as X-RateLimit-Remaining and X-RateLimit-Reset. I treat those headers as first-class signals. If remaining is low, I back off and either delay or skip requests. If I’m running a batch job, I stop early, store my checkpoint, and resume later. You should do the same rather than pushing until you get blocked.
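The back-off decision can be sketched as a small function reading those two headers. The `floor` threshold here is an arbitrary choice of mine, not anything GitHub prescribes:

```python
import time

def should_pause(headers: dict, floor: int = 50) -> float:
    """Seconds to wait before the next request, based on rate-limit headers.
    Returns 0.0 while there is still comfortable budget left."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > floor:
        return 0.0
    # X-RateLimit-Reset is a Unix timestamp; wait until it passes.
    reset = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset - time.time())
```

In a batch job I call this after every response; a non-zero result means sleep, store the checkpoint, or stop early.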
I also recommend conditional requests with ETag. GitHub supports If-None-Match, which lets you re-check a user profile without downloading it if it hasn’t changed. In practice, this cuts bandwidth and keeps you under rate limits. It’s like asking, “Has this file changed since I last looked?” instead of re-downloading it each time.
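The flow looks like this as a sketch. Here `cache` is a plain dict and `get` stands in for the HTTP call (in real code it would be `requests.get` or similar), which keeps the 304 logic easy to test:

```python
from typing import Callable

def fetch_with_etag(url: str, cache: dict, get: Callable) -> dict:
    """Conditional GET: send the stored ETag via If-None-Match and
    reuse the cached body on 304.

    `get(url, headers=...)` must return an object with .status_code,
    .headers, and .json() (the shape of a requests.Response).
    """
    headers = {"Accept": "application/vnd.github+json"}
    if url in cache:
        # Ask "has this changed since I last looked?" instead of re-downloading.
        headers["If-None-Match"] = cache[url]["etag"]
    res = get(url, headers=headers)
    if res.status_code == 304:
        return cache[url]["data"]  # not modified: a 304 has no body to parse
    cache[url] = {"etag": res.headers.get("ETag"), "data": res.json()}
    return cache[url]["data"]
```

A 304 does not count the same way against your budget as a full fetch, and it is dramatically cheaper on bandwidth for profiles that rarely change.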
Another modern pattern I use in 2026 is to front the API calls with a small cache service. For example, a Redis cache keyed by user login with a TTL of a few hours. That way, your UI or back office doesn’t hammer the GitHub API repeatedly for the same user profile. It also gives you a reliable fallback when the API is slow or temporarily unavailable.
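As a stand-in for the Redis piece, here is the same pattern with an in-memory dict. In production you would swap this for Redis keys with a TTL (for example via SETEX), but the read/expire logic is identical:

```python
import time

class TTLCache:
    """In-memory stand-in for the Redis pattern: key by login,
    expire entries after ttl_seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get(self, login: str):
        entry = self._store.get(login)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[login]  # expired; force a fresh fetch
            return None
        return value

    def set(self, login: str, value) -> None:
        self._store[login] = (value, time.monotonic())
```

The call pattern is check the cache first, hit the API only on a miss, then store the result, so repeated views of the same profile cost one API call per TTL window.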
Production-ready client patterns (JavaScript and Python)
I prefer to show two examples: JavaScript for most web stacks, and Python for backend automation. Both examples assume you have a token in an environment variable. I keep the models minimal, parse the data I need, and add basic error handling with a readable control flow.
JavaScript (Node.js, fetch API):
import process from "node:process";
const BASE_URL = "https://api.github.com";
const TOKEN = process.env.GITHUB_TOKEN;
async function fetchUser(login) {
const res = await fetch(`${BASE_URL}/users/${login}`, {
headers: {
"Accept": "application/vnd.github+json",
"Authorization": `Bearer ${TOKEN}`,
"User-Agent": "user-fetch-demo"
}
});
if (res.status === 404) {
return null; // user does not exist
}
if (!res.ok) {
const body = await res.text();
throw new Error(`GitHub API error ${res.status}: ${body}`);
}
const data = await res.json();
return {
login: data.login,
id: data.id,
avatarUrl: data.avatar_url,
profileUrl: data.html_url,
name: data.name,
company: data.company,
blog: data.blog,
location: data.location,
publicRepos: data.public_repos,
followers: data.followers,
following: data.following,
createdAt: data.created_at,
updatedAt: data.updated_at
};
}
const user = await fetchUser("ayushsnha");
console.log(user);
A few things to note in this example:
- I include Accept and User-Agent headers to be explicit.
- I treat 404 as a real case, not an error.
- I parse only the fields I use in my app.
Python (requests):
import os
import requests
BASE_URL = "https://api.github.com"
TOKEN = os.environ.get("GITHUB_TOKEN")
def fetch_user(login: str) -> dict | None:
headers = {
"Accept": "application/vnd.github+json",
"Authorization": f"Bearer {TOKEN}",
"User-Agent": "user-fetch-demo",
}
url = f"{BASE_URL}/users/{login}"
res = requests.get(url, headers=headers, timeout=10)
if res.status_code == 404:
return None
if not res.ok:
raise RuntimeError(f"GitHub API error {res.status_code}: {res.text}")
data = res.json()
return {
"login": data["login"],
"id": data["id"],
"avatar_url": data["avatar_url"],
"html_url": data["html_url"],
"name": data.get("name"),
"company": data.get("company"),
"blog": data.get("blog"),
"location": data.get("location"),
"public_repos": data.get("public_repos"),
"followers": data.get("followers"),
"following": data.get("following"),
"created_at": data.get("created_at"),
"updated_at": data.get("updated_at"),
}
if __name__ == "__main__":
user = fetch_user("ayushsnha")
print(user)
If you’re using TypeScript or a typed Python stack, define a narrow interface instead of dumping the full response into a giant type. I also recommend adding a tiny retry wrapper with exponential backoff for transient 502 or 503 responses. The API is stable, but every external service has hiccups. A simple backoff can save you a lot of support tickets.
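A sketch of such a wrapper: `call` is any function returning an object with a `status_code` attribute, and `sleep` is injectable so the backoff can be tested without actually waiting:

```python
import time

def with_backoff(call, retries: int = 3, base_delay: float = 0.5,
                 retryable=(502, 503), sleep=time.sleep):
    """Run `call()`; on a retryable status, wait base_delay * 2**attempt
    seconds and try again, up to `retries` extra attempts."""
    for attempt in range(retries + 1):
        res = call()
        if res.status_code not in retryable:
            return res
        if attempt < retries:
            sleep(base_delay * (2 ** attempt))
    return res  # out of retries; let the caller decide what to do
```

Usage is `with_backoff(lambda: requests.get(url, headers=headers, timeout=10))`; you can extend `retryable` to cover other transient statuses if your traffic warrants it.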
Traditional vs modern approaches
I’ve used both styles over the years, and I now default to the modern column unless I have a strong reason not to.
Traditional approach → Modern approach
Call /users/{login} every time → Cache by login with a TTL, plus ETag revalidation
Page numbers and offsets → The since stream with stored checkpoints
Search and fetch every profile → Filter search results first, fetch only final candidates
One retry or none → Exponential backoff on transient 502/503 responses
Logs only → Rate-limit headers treated as first-class signals
Edge cases, mistakes, and safety checks
Here are the pitfalls I see most often, along with how I avoid them:
1) Treating login as case-sensitive. GitHub logins are case-insensitive, but the API returns a canonical case. I normalize logins to lowercase for storage and display the canonical login for UI.
2) Confusing user and organization endpoints. Some logins are organizations. The /users/{login} endpoint returns type: "User" or "Organization". Check that field before you assume you’re dealing with a person.
3) Ignoring 304 Not Modified. If you use ETag but still parse the body when you get 304, you’ll break. A 304 has no body. I store the cached data and return it on 304.
4) Assuming created_at is “signup time.” It usually is, but there are rare cases where migrations or imported accounts can make it tricky. If the exact date is critical, treat it as approximate.
5) Over-fetching followers. The followers endpoint is paginated and can be massive. If you just need a count, use the followers field on the profile instead of walking the followers list.
6) Forgetting that deleted users disappear. If a user is deleted, the login might return 404. You should handle that gracefully and remove or archive the record in your system.
7) Not validating user-provided input. If you accept a username, validate it before calling the API. I use a simple regex that matches allowed characters and length, then reject anything else. This prevents weird API errors and reduces support issues.
8) Treating the API as a database. It’s an API, not your primary data store. Pull what you need, store what you must, and keep a refresh strategy.
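For the input validation in point 7, a pattern reflecting GitHub's documented username rules (alphanumeric characters and hyphens, no leading, trailing, or consecutive hyphens, at most 39 characters) might look like this:

```python
import re

# Alphanumeric start, then alphanumerics or single hyphens that must be
# followed by an alphanumeric; 39 characters maximum in total.
USERNAME_RE = re.compile(r"^[a-zA-Z0-9](?:[a-zA-Z0-9]|-(?=[a-zA-Z0-9])){0,38}$")

def is_valid_login(login: str) -> bool:
    """Cheap pre-flight check before spending an API call on a lookup."""
    return bool(USERNAME_RE.match(login))
```

Rejecting malformed input up front turns a confusing API error into an immediate, explainable validation message.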
If you’ve ever built a cron job that hammers the API once per minute, you know the pain. A better approach is to schedule updates based on user activity or on a weekly cadence. In my experience, most profiles don’t change daily, so a daily refresh is often overkill.
Practical next steps
If you’re starting from scratch, I recommend you build a small “user service” module that hides the HTTP details. Keep the raw API calls in one place, expose a small function like fetchUser(login), and add caching from day one. That keeps your app logic clean and gives you a single spot to upgrade when GitHub changes headers or behavior. Next, decide how fresh your data needs to be. For a public profile badge, a 6–12 hour cache is usually fine. For a hiring workflow, you might want a shorter TTL or a manual refresh button.
I also suggest you document your rate-limit strategy in the repo. It doesn’t need to be fancy—just a short note describing how you handle 403 responses, what headers you read, and when you back off. This makes on-call debugging faster and helps new engineers avoid accidental traffic spikes. If you’re pulling users for analytics, add a checkpoint table with last_id and a timestamp so you can resume after failures.
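A checkpoint table like that can be as small as this SQLite sketch (table and column names are my own choice, not a prescribed schema):

```python
import sqlite3
import time

def init_checkpoints(conn: sqlite3.Connection) -> None:
    """Create the checkpoint table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "job TEXT PRIMARY KEY, last_id INTEGER NOT NULL, updated_at REAL NOT NULL)"
    )

def save_checkpoint(conn: sqlite3.Connection, job: str, last_id: int) -> None:
    """Upsert the latest `since` value for a named job."""
    conn.execute(
        "INSERT INTO checkpoints (job, last_id, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(job) DO UPDATE SET last_id = excluded.last_id, "
        "updated_at = excluded.updated_at",
        (job, last_id, time.time()),
    )

def load_checkpoint(conn: sqlite3.Connection, job: str) -> int:
    """Where to resume from; 0 means start from the beginning of the stream."""
    row = conn.execute(
        "SELECT last_id FROM checkpoints WHERE job = ?", (job,)
    ).fetchone()
    return row[0] if row else 0
```

Call `save_checkpoint` after each successfully processed page; after a crash, `load_checkpoint` gives you the exact `since` value to resume with.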
Finally, be deliberate about data use. Only store the fields you need, and be ready to delete them if a user asks. If you’re building a public feature, consider letting users connect their account with OAuth so you can access richer data without scraping. That approach also builds trust with your users and avoids surprises.
When you’re ready, take the code examples above, swap in a real token, and test with a few known logins. Then add one more layer: a small retry wrapper and a cache. That combination is what turns a “works on my machine” script into a stable user-data pipeline. If you do that, you’ll fetch GitHub users reliably in 2026 and beyond without waking up to broken jobs.