How to Extract Profile Data Correctly from LinkedIn


Many companies rely on LinkedIn to import candidate profiles during hiring or onboarding. Despite this widespread use, even large enterprises frequently fail to extract complete and accurate profile data. The result is broken or partial imports, mismatched fields, formatting errors, and missing sections such as certifications, experience, or education. Candidates then have to re-enter or correct the information by hand, which costs them time and creates frustration.

To read LinkedIn profile details (including licenses and certifications) after authorization, follow this short and structured approach:

✅ Prerequisites

LinkedIn Developer Account
A registered LinkedIn app
OAuth 2.0 access token with r_liteprofile, r_emailaddress, and r_fullprofile (requires special permission)

🔐 OAuth Authorization (Basic Steps)

Redirect user to LinkedIn Auth URL:

https://www.linkedin.com/oauth/v2/authorization?response_type=code
&client_id=YOUR_CLIENT_ID
&redirect_uri=YOUR_REDIRECT_URI
&scope=r_liteprofile%20r_emailaddress%20r_fullprofile
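The same URL can be assembled in code so that every value is percent-encoded consistently. The sketch below is only an illustration: `build_authorization_url` is a hypothetical helper, and the `state` parameter is an addition that does not appear in the URL above. It is the standard OAuth 2.0 anti-CSRF value, which the app should compare against the `state` it receives on the redirect.

```python
import secrets
from urllib.parse import quote, urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"

def build_authorization_url(client_id, redirect_uri, scopes, state=None):
    """Return (url, state) for the authorization redirect (hypothetical helper)."""
    # Generate a random state value unless the caller supplies one
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    }
    # quote_via=quote encodes spaces as %20 instead of "+"
    return f"{AUTH_URL}?{urlencode(params, quote_via=quote)}", state

url, state = build_authorization_url(
    "YOUR_CLIENT_ID",
    "http://localhost:8501",
    ["r_liteprofile", "r_emailaddress"],
    state="demo-state",  # fixed only to keep the example reproducible
)
print(url)
```

Keep the returned `state` in the user's session and reject the callback if the value that comes back differs.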

Exchange code for access token:

POST https://www.linkedin.com/oauth/v2/accessToken
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code&
code=AUTHORIZATION_CODE&
redirect_uri=YOUR_REDIRECT_URI&
client_id=YOUR_CLIENT_ID&
client_secret=YOUR_CLIENT_SECRET
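A minimal sketch of this exchange using only the Python standard library is shown here. The helper names and the `TokenError` class are hypothetical, and the sketch assumes that a failed exchange returns a JSON body with the standard OAuth 2.0 `error` and `error_description` fields.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"

class TokenError(Exception):
    """Raised when the token endpoint does not return an access token."""

def build_token_request(code, client_id, client_secret, redirect_uri):
    """Build the form-encoded POST request for the code exchange."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")

def parse_token_response(payload):
    """Return (access_token, expires_in) or raise TokenError."""
    if "access_token" not in payload:
        reason = payload.get("error_description") or payload.get("error") or "unknown error"
        raise TokenError(reason)
    return payload["access_token"], payload.get("expires_in")

def exchange_code(code, client_id, client_secret, redirect_uri):
    """Perform the exchange over the network (not called in this sketch)."""
    request = build_token_request(code, client_id, client_secret, redirect_uri)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        # Assumption: the error body is JSON, as in standard OAuth 2.0 responses
        payload = json.load(err)
    return parse_token_response(payload)
```

Separating the parsing from the network call makes the failure path easy to check without real credentials.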

📥 API Call to Fetch Profile Data

⚠️ The Licenses & Certifications section is part of Member Profile API (v2), which requires LinkedIn Partner Program access.

Endpoint to fetch certifications (partner-only):

GET https://api.linkedin.com/v2/licenses
Authorization: Bearer ACCESS_TOKEN

Or using the profile projections endpoint (partner access):

GET https://api.linkedin.com/v2/me?projection=(id,firstName,lastName,licensesAndCertifications)
Authorization: Bearer ACCESS_TOKEN
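Name fields returned by the profile endpoint are typically localized objects rather than plain strings. The sketch below assumes each field has the shape `{"localized": {...}, "preferredLocale": {...}}`. The sample payload is invented for illustration and `localized_text` is a hypothetical helper, so check the shape against a real response.

```python
def localized_text(field):
    """Return the display string from a localized profile field."""
    if not field:
        return ""
    localized = field.get("localized") or {}
    preferred = field.get("preferredLocale") or {}
    # Locale keys are assumed to look like "en_US" (language_COUNTRY)
    key = f"{preferred.get('language', '')}_{preferred.get('country', '')}"
    if key in localized:
        return localized[key]
    # Fall back to the first available translation, if any
    return next(iter(localized.values()), "")

# Hypothetical response body, not captured from the real API
sample_profile = {
    "id": "abc123",
    "firstName": {
        "localized": {"en_US": "Ada"},
        "preferredLocale": {"country": "US", "language": "en"},
    },
    "lastName": {
        "localized": {"en_US": "Lovelace"},
        "preferredLocale": {"country": "US", "language": "en"},
    },
}

first = localized_text(sample_profile.get("firstName"))
last = localized_text(sample_profile.get("lastName"))
full_name = f"{first} {last}"
print(full_name)
```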

📌 Note

Regular apps do not have access to r_fullprofile or licensesAndCertifications.
To access them, apply to the LinkedIn Partner Program.

Here’s a complete Streamlit-based LinkedIn OAuth and profile fetch demo, including guidance on Partner access and alternatives.

📁 Folder Structure

linkedin_profile_app/
├── app.py
├── .env
└── requirements.txt

📄 .env

CLIENT_ID=your_linkedin_client_id
CLIENT_SECRET=your_linkedin_client_secret
REDIRECT_URI=http://localhost:8501

📄 requirements.txt

streamlit
requests
python-dotenv

📄 app.py

import streamlit as st
import requests
import os
from urllib.parse import urlencode
from dotenv import load_dotenv

load_dotenv()

CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
REDIRECT_URI = os.getenv("REDIRECT_URI")

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"
PROFILE_URL = "https://api.linkedin.com/v2/me"

SCOPES = "r_liteprofile r_emailaddress"

def get_auth_url():
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": SCOPES
    }
    return f"{AUTH_URL}?{urlencode(params)}"

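def get_token(auth_code):
    """Exchange the authorization code for an access token (None on failure)."""
    data = {
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET
    }
    # requests form-encodes `data` and sets the Content-Type header itself
    response = requests.post(TOKEN_URL, data=data, timeout=10)
    # An error payload has no "access_token" key, so this returns None
    return response.json().get("access_token")

def fetch_profile(access_token):
    """Fetch the basic profile of the authorized member."""
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(PROFILE_URL, headers=headers, timeout=10)
    # Raise on HTTP errors (for example an expired token) instead of returning them as a profile
    response.raise_for_status()
    return response.json()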

def main():
    st.title("🔗 LinkedIn Profile Fetch")

    # st.query_params (Streamlit 1.30+) replaces the deprecated st.experimental_get_query_params
    auth_code = st.query_params.get("code")

    if auth_code:
        access_token = get_token(auth_code)
        if access_token:
            profile = fetch_profile(access_token)
            st.success("Profile fetched successfully!")
            st.json(profile)
        else:
            st.error("Failed to get access token.")
    else:
        auth_url = get_auth_url()
        st.markdown(f"[🔐 Authorize with LinkedIn]({auth_url})")

if __name__ == "__main__":
    main()

🚫 Certifications & Licenses Access (Important Note)

LinkedIn does not allow access to licensesAndCertifications through the public API. You must:

Apply to the LinkedIn Marketing or Learning Partner Program
Get access to r_fullprofile and restricted endpoints like licenses, certifications, etc.

✅ Workaround Options

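LinkedIn Data Export (manual user upload)
Ask the user to export their LinkedIn data at https://www.linkedin.com/psettings/member-data, select the JSON option, then upload the ZIP so the app can parse Licenses & certifications.json. Confirm the file name and format against a real export before relying on them.
Unofficial Puppeteer/Selenium-based scraper
Not recommended: it violates LinkedIn's terms of service and risks an account ban.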
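For the data export route, the certifications file can be read straight from the uploaded bytes without writing anything to disk. This is a minimal sketch: `read_export_member` is a hypothetical helper, and the file name is the one used in this article, which may differ in a real export. The lookup ignores folders and letter case so that a nested or differently capitalized file is still found.

```python
import io
import json
import zipfile

# File name as used in this article; verify it against a real export
CERT_FILENAME = "Licenses & certifications.json"

def read_export_member(zip_bytes, filename=CERT_FILENAME):
    """Return the parsed JSON of `filename` from a ZIP given as bytes, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Compare only the base name so nested folders do not matter
            base_name = info.filename.rsplit("/", 1)[-1]
            if base_name.lower() == filename.lower():
                # utf-8-sig tolerates a leading byte-order mark
                return json.loads(archive.read(info).decode("utf-8-sig"))
    return None

# Build a tiny in-memory archive to demonstrate the lookup (invented data)
demo_buffer = io.BytesIO()
with zipfile.ZipFile(demo_buffer, "w") as demo_zip:
    demo_zip.writestr("export/Licenses & certifications.json", json.dumps([{"name": "Demo"}]))

demo_records = read_export_member(demo_buffer.getvalue())
print(demo_records)
```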

Here's a LinkedIn Data Export JSON parser built with Flask that reads the exported ZIP, extracts the Licenses & Certifications, and displays them:

✅ Folder Structure

linkedin_parser_app/
├── app.py
├── templates/
│   └── index.html
├── uploads/
└── requirements.txt

📄 `requirements.txt`

Flask

📄 `templates/index.html`

<!DOCTYPE html>
<html>
<head>
    <title>LinkedIn Data Parser</title>
</head>
<body>
    <h2>Upload LinkedIn Export ZIP</h2>
    <form action="/" method="post" enctype="multipart/form-data">
        <input type="file" name="zipfile" required>
        <button type="submit">Upload & Parse</button>
    </form>

    {% if certifications %}
        <h3>Licenses & Certifications</h3>
        <ul>
        {% for cert in certifications %}
            <li><strong>{{ cert['name'] }}</strong> - {{ cert['authority'] }} ({{ cert['start_date'] }}{% if cert['end_date'] %} to {{ cert['end_date'] }}{% endif %})</li>
        {% endfor %}
        </ul>
    {% endif %}
</body>
</html>

📄 `app.py`

from flask import Flask, request, render_template
import os, zipfile, json, tempfile

app = Flask(__name__)
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

def extract_certifications_from_json(json_data):
    """Flatten each exported record into the fields the template displays."""
    certs = []
    for item in json_data:
        certs.append({
            "name": item.get("name"),
            # `or {}` also covers keys that are present but null
            "authority": (item.get("authority") or {}).get("name", ""),
            "start_date": (item.get("starts_on") or {}).get("year", ""),
            "end_date": (item.get("ends_on") or {}).get("year", "")
        })
    return certs

@app.route("/", methods=["GET", "POST"])
def index():
    certifications = []
    if request.method == "POST":
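        zip_file = request.files.get("zipfile")
        # Accept only ZIP uploads; the extension check ignores letter case
        if zip_file and zip_file.filename.lower().endswith(".zip"):
            with tempfile.TemporaryDirectory() as tmpdirname:
                # Save under a fixed name so the client-supplied filename never becomes part of a path
                zip_path = os.path.join(tmpdirname, "upload.zip")
                zip_file.save(zip_path)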
                with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                    zip_ref.extractall(tmpdirname)
                
                cert_path = os.path.join(tmpdirname, 'Licenses & certifications.json')
                if os.path.exists(cert_path):
                    with open(cert_path, 'r', encoding='utf-8') as f:
                        data = json.load(f)
                        certifications = extract_certifications_from_json(data)
    return render_template("index.html", certifications=certifications)

if __name__ == "__main__":
    app.run(debug=True)

✅ Usage

Ask users to download their LinkedIn data export (JSON).
Upload the ZIP via this app.
It parses and displays Licenses & Certifications directly.
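The parser above assumes that `authority`, `starts_on`, and `ends_on` are always nested objects. A more tolerant normalizer, sketched below, also accepts plain strings and null values. The field names come from the parser in this article, `normalize_certification` is a hypothetical helper, and the sample records are invented.

```python
def _text(value, key):
    """Return value[key] for dicts, the value itself for scalars, '' for None."""
    if isinstance(value, dict):
        inner = value.get(key)
        return "" if inner is None else str(inner)
    return "" if value is None else str(value)

def normalize_certification(item):
    """Map one exported record to the four fields used by the template."""
    return {
        "name": _text(item.get("name"), "name"),
        "authority": _text(item.get("authority"), "name"),
        "start_date": _text(item.get("starts_on"), "year"),
        "end_date": _text(item.get("ends_on"), "year"),
    }

# Invented records covering the nested, flat, and null cases
samples = [
    {"name": "Cloud Basics", "authority": {"name": "Example Org"}, "starts_on": {"year": 2021}},
    {"name": "Data Course", "authority": "Example Academy", "starts_on": "2020", "ends_on": None},
]
normalized = [normalize_certification(record) for record in samples]
for row in normalized:
    print(row)
```

All values come back as strings, so the template can render them without further checks.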

Here is the REST API version with FastAPI to parse LinkedIn Data Export ZIP and return Licenses & Certifications in JSON format:

✅ Folder Structure

linkedin_fastapi_parser/
├── main.py
├── utils.py
├── requirements.txt
└── uploads/

📄 `requirements.txt`

fastapi
uvicorn
python-multipart

📄 `utils.py`

import io, json, zipfile

CERT_FILENAME = "Licenses & certifications.json"

def extract_certifications_from_zip(zip_bytes: bytes) -> list:
    """Parse the certifications file from the raw bytes of an uploaded export ZIP."""
    # main.py passes the result of `await file.read()`, which is bytes,
    # so the archive is opened in memory instead of calling .read() again
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zip_ref:
        if CERT_FILENAME not in zip_ref.namelist():
            return []
        data = json.loads(zip_ref.read(CERT_FILENAME).decode("utf-8"))

    return [
        {
            "name": c.get("name"),
            # `or {}` also covers keys that are present but null
            "authority": (c.get("authority") or {}).get("name", ""),
            "start_date": (c.get("starts_on") or {}).get("year", ""),
            "end_date": (c.get("ends_on") or {}).get("year", "")
        }
        for c in data
    ]

📄 `main.py`

from fastapi import FastAPI, UploadFile, File, HTTPException
from utils import extract_certifications_from_zip

app = FastAPI()

@app.post("/upload")
async def upload_linkedin_zip(file: UploadFile = File(...)):
    # filename can be missing, and the extension check should ignore letter case
    if not (file.filename or "").lower().endswith(".zip"):
        raise HTTPException(status_code=400, detail="Only ZIP files are allowed.")
    
    certifications = extract_certifications_from_zip(await file.read())
    if not certifications:
        raise HTTPException(status_code=404, detail="No certifications found in the ZIP.")
    
    return {"certifications": certifications}

✅ Run the Server

uvicorn main:app --reload

Test at:
http://localhost:8000/docs → Use /upload with a LinkedIn ZIP file.
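To try the endpoint without a real export, a small sample archive can be generated. This is a sketch under the same assumptions as the parser: the file name and record fields are the ones used in this article, `build_sample_export` is a hypothetical helper, and the records are invented.

```python
import io
import json
import zipfile

def build_sample_export(certifications, filename="Licenses & certifications.json"):
    """Return the bytes of a ZIP that contains one JSON file with the given records."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, json.dumps(certifications))
    return buffer.getvalue()

# Invented records in the shape the parser expects
sample_certifications = [
    {
        "name": "Cloud Basics",
        "authority": {"name": "Example Org"},
        "starts_on": {"year": 2021},
        "ends_on": {"year": 2024},
    }
]

zip_bytes = build_sample_export(sample_certifications)
print(f"sample archive size: {len(zip_bytes)} bytes")
# To upload it through /docs, save it first, for example:
# pathlib.Path("sample_export.zip").write_bytes(zip_bytes)
```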
