How to Extract Profile Data Correctly from LinkedIn
meta ai
Many companies rely on LinkedIn to import candidate profiles during hiring or onboarding. Yet even large enterprises frequently fail to extract complete and accurate profile data. The result is broken or partial imports, mismatched fields, formatting errors, and missing sections such as certifications, experience, or education. Candidates then have to re-enter or correct the information by hand, which costs them time and sours their experience.
To read LinkedIn profile details (including licenses and certifications) after authorization, follow this short and structured approach:
✅ Prerequisites
- LinkedIn Developer Account
- A registered LinkedIn app
- OAuth 2.0 access token with r_liteprofile, r_emailaddress, and r_fullprofile (requires special permission)
🔐 OAuth Authorization (Basic Steps)
- Redirect user to LinkedIn Auth URL (shown on several lines for readability; send it as one line):
  https://www.linkedin.com/oauth/v2/authorization?response_type=code
  &client_id=YOUR_CLIENT_ID
  &redirect_uri=YOUR_REDIRECT_URI
  &scope=r_liteprofile%20r_emailaddress%20r_fullprofile
- Exchange code for access token:
  POST https://www.linkedin.com/oauth/v2/accessToken
  Content-Type: application/x-www-form-urlencoded

  grant_type=authorization_code&
  code=AUTHORIZATION_CODE&
  redirect_uri=YOUR_REDIRECT_URI&
  client_id=YOUR_CLIENT_ID&
  client_secret=YOUR_CLIENT_SECRET
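The redirect step above can be sketched in Python. This version also adds a `state` value, which is not part of the URL shown above: OAuth 2.0 recommends it as CSRF protection, and the authorization server echoes it back on the redirect. The helper names `build_auth_url` and `state_matches` are hypothetical, and where the state is stored between the two requests is left to the application.

```python
import hmac
import secrets
from urllib.parse import quote, urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"


def build_auth_url(client_id, redirect_uri, scopes):
    """Return (url, state). Keep `state` in the user's session before redirecting."""
    state = secrets.token_urlsafe(16)  # unguessable per-request value
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    }
    # quote_via=quote encodes spaces as %20, matching the URL shown above
    return f"{AUTH_URL}?{urlencode(params, quote_via=quote)}", state


def state_matches(expected, received):
    """Constant-time comparison of the stored state with the one echoed back."""
    if received is None:
        return False
    return hmac.compare_digest(expected.encode(), received.encode())
```

On the callback, the app would compare the `state` query parameter with the stored value and only exchange the code for a token when `state_matches` returns True.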
📥 API Call to Fetch Profile Data
⚠️ The Licenses & Certifications section is part of the Member Profile API (v2), which requires LinkedIn Partner Program access.
Endpoint to fetch certifications (partner-only):
GET https://api.linkedin.com/v2/licenses
Authorization: Bearer ACCESS_TOKEN
Or using the profile projections endpoint (partner access):
GET https://api.linkedin.com/v2/me?projection=(id,firstName,lastName,licensesAndCertifications)
Authorization: Bearer ACCESS_TOKEN
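As a rough illustration, the projection request above could be assembled with the standard library as follows. The helpers `build_profile_request` and `explain_status` are hypothetical names, and the status-code readings follow general HTTP semantics rather than anything LinkedIn-specific, so treat them as assumptions.

```python
from urllib.request import Request

PROFILE_URL = "https://api.linkedin.com/v2/me"


def build_profile_request(access_token, fields):
    """Build (but do not send) a GET request for the given projection fields."""
    projection = "(" + ",".join(fields) + ")"
    # Parentheses and commas are left unencoded, as in the URL shown above.
    url = f"{PROFILE_URL}?projection={projection}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def explain_status(status):
    """Translate a response status into a hint for the developer (assumed meanings)."""
    if status == 200:
        return "OK"
    if status == 401:
        return "Token missing, invalid, or expired; re-run the OAuth flow."
    if status == 403:
        return "Token lacks permission for a requested field; Partner Program access may be required."
    return f"Unexpected status {status}"
```

The request can then be sent with `urllib.request.urlopen(req)`; a 403 on the certifications field is the likely outcome for an app without partner access.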
📌 Note
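- Regular apps do not have access to r_fullprofile or licensesAndCertifications.
- To access them, apply to the LinkedIn Partner Program.
- r_liteprofile and r_emailaddress belong to the older Sign In with LinkedIn product. Apps that use Sign In with LinkedIn using OpenID Connect request the openid, profile, and email scopes instead, so check which products your app has been granted.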
Here’s a complete Streamlit-based LinkedIn OAuth and profile fetch demo, including guidance on Partner access and alternatives.
📁 Folder Structure
linkedin_profile_app/
├── app.py
├── .env
└── requirements.txt
📄 .env
CLIENT_ID=your_linkedin_client_id
CLIENT_SECRET=your_linkedin_client_secret
REDIRECT_URI=http://localhost:8501
📄 requirements.txt
streamlit
requests
python-dotenv
📄 app.py
import os
from urllib.parse import urlencode

import requests
import streamlit as st
from dotenv import load_dotenv

# Load CLIENT_ID, CLIENT_SECRET, and REDIRECT_URI from .env
load_dotenv()
CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
REDIRECT_URI = os.getenv("REDIRECT_URI")

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"
PROFILE_URL = "https://api.linkedin.com/v2/me"
SCOPES = "r_liteprofile r_emailaddress"


def get_auth_url():
    """Build the LinkedIn authorization URL the user is sent to."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": SCOPES
    }
    return f"{AUTH_URL}?{urlencode(params)}"


def get_token(auth_code):
    """Exchange the authorization code for an access token (None on failure)."""
    data = {
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET
    }
    # requests form-encodes `data` and sets the Content-Type header itself
    response = requests.post(TOKEN_URL, data=data, timeout=10)
    if not response.ok:
        return None
    return response.json().get("access_token")


def fetch_profile(access_token):
    """Fetch the basic profile of the authorized member."""
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(PROFILE_URL, headers=headers, timeout=10)
    return response.json()


def main():
    st.title("🔗 LinkedIn Profile Fetch")
    # st.query_params (Streamlit 1.30+) replaces st.experimental_get_query_params
    auth_code = st.query_params.get("code")
    if auth_code:
        access_token = get_token(auth_code)
        if access_token:
            profile = fetch_profile(access_token)
            st.success("Profile fetched successfully!")
            st.json(profile)
        else:
            st.error("Failed to get access token.")
    else:
        auth_url = get_auth_url()
        st.markdown(f"[🔐 Authorize with LinkedIn]({auth_url})")


if __name__ == "__main__":
    main()
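The app above reports only "Failed to get access token." when the exchange fails. A minimal sketch of how the reason could be surfaced is shown below. It assumes the token endpoint follows the standard OAuth 2.0 error format (`error`, `error_description`); `parse_token_response` is a hypothetical helper, not part of the original app.

```python
def parse_token_response(status_code, payload):
    """Return (access_token, error_message); exactly one of the two is None."""
    if status_code == 200 and isinstance(payload, dict) and payload.get("access_token"):
        return payload["access_token"], None
    if isinstance(payload, dict):
        # OAuth 2.0 error bodies carry `error` and, optionally, `error_description`
        reason = payload.get("error_description") or payload.get("error") or "unknown error"
    else:
        reason = "response body was not a JSON object"
    return None, f"HTTP {status_code}: {reason}"
```

Inside `get_token`, this could be called as `parse_token_response(response.status_code, response.json())`, with the error message passed to `st.error` so that a mismatched redirect URI or an expired code is visible during testing.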
🚫 Certifications & Licenses Access (Important Note)
LinkedIn does not allow access to licensesAndCertifications through the public API. You must:
- Apply to the LinkedIn Marketing or Learning Partner Program
- Get access to r_fullprofile and restricted endpoints like licenses, certifications, etc.
✅ Workaround Options
- LinkedIn Data Export (Manual User Upload)
  Ask the user to export their LinkedIn data: https://www.linkedin.com/psettings/member-data → Select JSON → Upload and parse the Licenses & certifications.json. File names and formats inside the archive can vary, so check the contents of a real export before relying on this name.
- Unofficial Puppeteer/Selenium-based scraper
  Not recommended: it violates the TOS and risks a ban.
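Before writing a parser for the export, it can help to look at what the archive actually contains. The sketch below lists archive members whose file name mentions certifications; `find_candidate_files` and the `"certif"` keyword are illustrative choices, not anything LinkedIn defines.

```python
import io
import zipfile


def find_candidate_files(zip_bytes, keyword="certif"):
    """Return archive member names whose file name contains `keyword` (case-insensitive)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            name for name in archive.namelist()
            # Compare against the base name only, so folder names do not match
            if keyword.lower() in name.rsplit("/", 1)[-1].lower()
        ]
```

Running this once against a real export shows which file, and which format, the parser needs to handle.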
Here's a LinkedIn Data Export JSON parser built with Flask that reads the exported ZIP, extracts the Licenses & Certifications, and displays them:
✅ Folder Structure
linkedin_parser_app/
├── app.py
├── templates/
│ └── index.html
├── uploads/
└── requirements.txt
📄 requirements.txt
Flask
python-dotenv
📄 templates/index.html
<!DOCTYPE html>
<html>
<head>
  <title>LinkedIn Data Parser</title>
</head>
<body>
  <h2>Upload LinkedIn Export ZIP</h2>
  <form action="/" method="post" enctype="multipart/form-data">
    <input type="file" name="zipfile" required>
    <button type="submit">Upload &amp; Parse</button>
  </form>
  {% if certifications %}
  <h3>Licenses &amp; Certifications</h3>
  <ul>
    {% for cert in certifications %}
    <li><strong>{{ cert['name'] }}</strong> - {{ cert['authority'] }} ({{ cert['start_date'] }}{% if cert['end_date'] %} to {{ cert['end_date'] }}{% endif %})</li>
    {% endfor %}
  </ul>
  {% endif %}
</body>
</html>
📄 app.py
import json
import os
import tempfile
import zipfile

from flask import Flask, request, render_template

app = Flask(__name__)
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)


def extract_certifications_from_json(json_data):
    """Map each exported entry to the fields the template displays."""
    certs = []
    for item in json_data:
        certs.append({
            "name": item.get("name"),
            # `or {}` guards against keys that are present but null
            "authority": (item.get("authority") or {}).get("name", ""),
            "start_date": (item.get("starts_on") or {}).get("year", ""),
            "end_date": (item.get("ends_on") or {}).get("year", "")
        })
    return certs


@app.route("/", methods=["GET", "POST"])
def index():
    certifications = []
    if request.method == "POST":
        zip_file = request.files["zipfile"]
        if zip_file and zip_file.filename.lower().endswith(".zip"):
            with tempfile.TemporaryDirectory() as tmpdirname:
                # Save under a fixed name; the client-supplied filename is untrusted
                zip_path = os.path.join(tmpdirname, "upload.zip")
                zip_file.save(zip_path)
                with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                    zip_ref.extractall(tmpdirname)
                cert_path = os.path.join(tmpdirname, 'Licenses & certifications.json')
                if os.path.exists(cert_path):
                    with open(cert_path, 'r', encoding='utf-8') as f:
                        data = json.load(f)
                    certifications = extract_certifications_from_json(data)
    return render_template("index.html", certifications=certifications)


if __name__ == "__main__":
    # debug=True is for local development only
    app.run(debug=True)
✅ Usage
- Ask users to download their LinkedIn data export (JSON).
- Upload the ZIP via this app.
- It parses and displays Licenses & Certifications directly.
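The field mapping in the Flask app assumes that `authority`, `starts_on`, and `ends_on` are always nested objects. If an export stores any of them as a plain string, the chained `.get` calls would raise `AttributeError`. A more tolerant mapping might look like the sketch below; `normalize_certification` is a hypothetical helper, and both input shapes are assumptions about the export rather than a documented schema.

```python
def normalize_certification(item):
    """Return the four display fields, tolerating missing, null, or flat values."""

    def pick(value, key):
        # Nested form: {"name": "..."} or {"year": 2021}; flat form: a plain value.
        if isinstance(value, dict):
            value = value.get(key)
        return "" if value is None else value

    return {
        "name": item.get("name") or "",
        "authority": pick(item.get("authority"), "name"),
        "start_date": pick(item.get("starts_on"), "year"),
        "end_date": pick(item.get("ends_on"), "year"),
    }
```

With this helper, `extract_certifications_from_json` reduces to `[normalize_certification(item) for item in json_data]`.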
Here is the REST API version with FastAPI to parse LinkedIn Data Export ZIP and return Licenses & Certifications in JSON format:
✅ Folder Structure
linkedin_fastapi_parser/
├── main.py
├── utils.py
├── requirements.txt
└── uploads/
📄 requirements.txt
fastapi
uvicorn
python-multipart
📄 utils.py
import json
import os
import tempfile
import zipfile


def extract_certifications_from_zip(zip_bytes: bytes) -> list:
    """Return the certifications found in a LinkedIn export ZIP given as raw bytes."""
    with tempfile.TemporaryDirectory() as tmpdir:
        zip_path = os.path.join(tmpdir, "upload.zip")
        with open(zip_path, "wb") as f:
            # main.py passes the bytes returned by `await file.read()`,
            # so write them directly instead of calling .read() again
            f.write(zip_bytes)
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(tmpdir)
        cert_path = os.path.join(tmpdir, "Licenses & certifications.json")
        if not os.path.exists(cert_path):
            return []
        with open(cert_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return [
            {
                "name": c.get("name"),
                # `or {}` guards against keys that are present but null
                "authority": (c.get("authority") or {}).get("name", ""),
                "start_date": (c.get("starts_on") or {}).get("year", ""),
                "end_date": (c.get("ends_on") or {}).get("year", "")
            }
            for c in data
        ]
📄 main.py
from fastapi import FastAPI, UploadFile, File, HTTPException

from utils import extract_certifications_from_zip

app = FastAPI()


@app.post("/upload")
async def upload_linkedin_zip(file: UploadFile = File(...)):
    # filename can be None, so fall back to an empty string before checking the extension
    if not (file.filename or "").lower().endswith(".zip"):
        raise HTTPException(status_code=400, detail="Only ZIP files are allowed.")
    certifications = extract_certifications_from_zip(await file.read())
    if not certifications:
        raise HTTPException(status_code=404, detail="No certifications found in the ZIP.")
    return {"certifications": certifications}
✅ Run the Server
uvicorn main:app --reload
Test at:
http://localhost:8000/docs → Use /upload with a LinkedIn ZIP file.
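As an alternative to extracting the whole archive into a temporary directory, the certifications file could be read straight from the uploaded bytes. The sketch below is one way to do that with the standard library. `read_certifications_in_memory` is a hypothetical name, the file name is the one used in utils.py, and it returns the raw parsed JSON rather than the mapped fields.

```python
import io
import json
import zipfile

CERT_FILE = "Licenses & certifications.json"  # file name taken from utils.py


def read_certifications_in_memory(zip_bytes, filename=CERT_FILE):
    """Parse `filename` straight from the uploaded bytes; return [] if it is absent."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    except zipfile.BadZipFile as exc:
        raise ValueError("upload is not a valid ZIP archive") from exc
    with archive:
        for name in archive.namelist():
            # Match on the base name so a file inside a subfolder is still found.
            if name.rsplit("/", 1)[-1] == filename:
                with archive.open(name) as member:
                    return json.load(member)
    return []
```

This avoids writing unrelated export files to disk, and the `ValueError` gives the endpoint something to turn into a 400 response when the upload has a .zip extension but is not a real archive.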
