How to Extract Profile Data Correctly from LinkedIn

 


Many companies rely on LinkedIn to pull candidate profiles during hiring or onboarding. Yet even large enterprises frequently fail to extract complete, accurate profile data. The result is broken or partial imports, mismatched fields, formatting errors, and missing sections such as certifications, experience, or education. Candidates then have to re-enter or correct the information by hand, which costs them time and hurts their experience.

To read LinkedIn profile details (including licenses and certifications) after authorization, follow this short and structured approach:


✅ Prerequisites

  • LinkedIn Developer Account

  • A registered LinkedIn app

  • OAuth 2.0 access token with r_liteprofile, r_emailaddress, and r_fullprofile (requires special permission)


🔐 OAuth Authorization (Basic Steps)

  1. Redirect user to LinkedIn Auth URL:

https://www.linkedin.com/oauth/v2/authorization?response_type=code
&client_id=YOUR_CLIENT_ID
&redirect_uri=YOUR_REDIRECT_URI
&scope=r_liteprofile%20r_emailaddress%20r_fullprofile
  2. Exchange the code for an access token:

POST https://www.linkedin.com/oauth/v2/accessToken
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code&
code=AUTHORIZATION_CODE&
redirect_uri=YOUR_REDIRECT_URI&
client_id=YOUR_CLIENT_ID&
client_secret=YOUR_CLIENT_SECRET
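
The authorization URL above can also carry a `state` value, which the OAuth 2.0 authorization code flow uses to tie the callback to the request that started it. The following is a minimal sketch, not the article's own implementation: `build_auth_url` and `state_matches` are hypothetical helper names, and how the state is stored between requests (session, cookie) is left to the application.

```python
import hmac
import secrets
from urllib.parse import parse_qs, quote, urlencode, urlparse

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"


def build_auth_url(client_id, redirect_uri, scopes):
    """Return (url, state); the caller keeps `state` to check the callback later."""
    state = secrets.token_urlsafe(16)  # unpredictable value, safe inside a URL
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    }
    # quote_via=quote encodes spaces as %20, matching the URL shown above
    return f"{AUTH_URL}?{urlencode(params, quote_via=quote)}", state


def state_matches(expected_state, callback_url):
    """Check that the state returned on the redirect equals the one we issued."""
    returned = parse_qs(urlparse(callback_url).query).get("state", [""])[0]
    return hmac.compare_digest(expected_state.encode(), returned.encode())
```

If `state_matches` returns False, the safest reaction is to discard the authorization code and restart the flow.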

📥 API Call to Fetch Profile Data

⚠️ The Licenses & Certifications section is part of the Member Profile API (v2), which requires LinkedIn Partner Program access.

Endpoint to fetch certifications (partner-only):

GET https://api.linkedin.com/v2/licenses
Authorization: Bearer ACCESS_TOKEN

Or using the profile projections endpoint (partner access):

GET https://api.linkedin.com/v2/me?projection=(id,firstName,lastName,licensesAndCertifications)
Authorization: Bearer ACCESS_TOKEN
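
As a rough illustration of working with the projection endpoint, the sketch below builds the projection query string and flattens the name fields of a response. It assumes that `firstName` and `lastName` come back as objects holding a `localized` map and a `preferredLocale`, which is how the v2 profile response is commonly structured; treat that shape, and the helper names `projection_url`, `localized_text`, and `flatten_profile`, as assumptions to verify against a real response.

```python
PROFILE_URL = "https://api.linkedin.com/v2/me"


def projection_url(fields):
    """Build the /v2/me URL with a projection such as (id,firstName,lastName)."""
    return f"{PROFILE_URL}?projection=({','.join(fields)})"


def localized_text(field):
    """Pick the preferred-locale string from a localized field, else any value."""
    if not isinstance(field, dict):
        return ""
    localized = field.get("localized") or {}
    locale = field.get("preferredLocale") or {}
    key = f"{locale.get('language', '')}_{locale.get('country', '')}"
    if key in localized:
        return localized[key]
    # Fall back to the first available translation, or an empty string
    return next(iter(localized.values()), "")


def flatten_profile(profile):
    """Reduce the nested profile response to plain strings."""
    return {
        "id": profile.get("id", ""),
        "first_name": localized_text(profile.get("firstName")),
        "last_name": localized_text(profile.get("lastName")),
    }
```

Flattening early keeps locale handling in one place, so downstream import code only deals with plain strings.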

📌 Note

  • Regular apps do not have access to r_fullprofile or licensesAndCertifications.

  • To access them, apply to the LinkedIn Partner Program.


Here’s a complete Streamlit-based LinkedIn OAuth and profile fetch demo, including guidance on Partner access and alternatives.


📁 Folder Structure

linkedin_profile_app/
├── app.py
├── .env
└── requirements.txt

📄 .env

CLIENT_ID=your_linkedin_client_id
CLIENT_SECRET=your_linkedin_client_secret
REDIRECT_URI=http://localhost:8501

📄 requirements.txt

streamlit
requests
python-dotenv

📄 app.py

import streamlit as st
import requests
import os
from urllib.parse import urlencode
from dotenv import load_dotenv

load_dotenv()

CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
REDIRECT_URI = os.getenv("REDIRECT_URI")

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"
PROFILE_URL = "https://api.linkedin.com/v2/me"

SCOPES = "r_liteprofile r_emailaddress"

def get_auth_url():
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": SCOPES
    }
    return f"{AUTH_URL}?{urlencode(params)}"

def get_token(auth_code):
    data = {
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET
    }
    # A timeout keeps the app from hanging if LinkedIn does not respond.
    response = requests.post(TOKEN_URL, data=data, headers={"Content-Type": "application/x-www-form-urlencoded"}, timeout=10)
    return response.json().get("access_token")

def fetch_profile(access_token):
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(PROFILE_URL, headers=headers)
    return response.json()

def main():
    st.title("🔗 LinkedIn Profile Fetch")

    # st.query_params (Streamlit 1.30 and later) replaces the deprecated
    # st.experimental_get_query_params and returns plain strings.
    auth_code = st.query_params.get("code")

    if auth_code:
        access_token = get_token(auth_code)
        if access_token:
            profile = fetch_profile(access_token)
            st.success("Profile fetched successfully!")
            st.json(profile)
        else:
            st.error("Failed to get access token.")
    else:
        auth_url = get_auth_url()
        st.markdown(f"[🔐 Authorize with LinkedIn]({auth_url})")

if __name__ == "__main__":
    main()
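
In the demo above, `get_token` returns None on any failure, so the user only sees "Failed to get access token." A small sketch of a more informative variant follows. It relies on the standard OAuth 2.0 error fields `error` and `error_description`; whether LinkedIn fills both in every case is an assumption, and `parse_token_response` is a hypothetical helper name.

```python
def parse_token_response(status_code, payload):
    """Return the access token, or raise ValueError explaining the failure."""
    if status_code == 200 and payload.get("access_token"):
        return payload["access_token"]
    # Standard OAuth 2.0 error fields; fall back to placeholders if absent
    error = payload.get("error", "unknown_error")
    detail = payload.get("error_description", "no description provided")
    raise ValueError(f"Token exchange failed ({status_code}): {error}: {detail}")
```

Inside `get_token` this could be called as `parse_token_response(response.status_code, response.json())`, with the ValueError message shown through `st.error`.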

🚫 Certifications & Licenses Access (Important Note)

LinkedIn does not expose licensesAndCertifications through the public API. Without Partner Program access, you must rely on one of the workarounds below.


✅ Workaround Options

  1. LinkedIn Data Export (Manual User Upload)
    Ask the user to export their LinkedIn data:
    https://www.linkedin.com/psettings/member-data → Select JSON → Upload and parse the Licenses & certifications.json.

  2. Unofficial Puppeteer/Selenium-based scraper
    Not recommended: it violates LinkedIn's Terms of Service and risks an account ban.
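
For the data export route, the uploaded archive can be read in memory without extracting it to disk. The sketch below assumes the file name and field layout used in this article (`Licenses & certifications.json`, with `authority.name`, `starts_on.year`, and `ends_on.year`); a real export may name or structure things differently, so check one before relying on it. `read_certifications` is a hypothetical helper name.

```python
import io
import json
import zipfile

CERT_FILE = "Licenses & certifications.json"  # file name as given in this article


def _nested(value, key):
    """Return value[key] when value is a dict, else an empty string."""
    return value.get(key, "") if isinstance(value, dict) else ""


def read_certifications(zip_bytes):
    """Parse certifications straight from ZIP bytes; raises BadZipFile on bad input."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Match on the base name so a file nested in a folder is still found
        member = next(
            (n for n in archive.namelist() if n.rsplit("/", 1)[-1] == CERT_FILE),
            None,
        )
        if member is None:
            return []
        data = json.loads(archive.read(member).decode("utf-8"))

    certs = []
    for item in data if isinstance(data, list) else []:
        if not isinstance(item, dict):
            continue  # skip entries that are not objects
        certs.append({
            "name": item.get("name"),
            "authority": _nested(item.get("authority"), "name"),
            "start_date": _nested(item.get("starts_on"), "year"),
            "end_date": _nested(item.get("ends_on"), "year"),
        })
    return certs
```

Reading from memory avoids writing user-supplied archive contents to the server's file system, and the `_nested` helper tolerates fields that are missing or null.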


Here's a LinkedIn Data Export JSON parser built with Flask that reads the exported ZIP, extracts the Licenses & Certifications, and displays them:


✅ Folder Structure

linkedin_parser_app/
├── app.py
├── templates/
│   └── index.html
├── uploads/
└── requirements.txt

📄 requirements.txt

Flask
python-dotenv

📄 templates/index.html

<!DOCTYPE html>
<html>
<head>
    <title>LinkedIn Data Parser</title>
</head>
<body>
    <h2>Upload LinkedIn Export ZIP</h2>
    <form action="/" method="post" enctype="multipart/form-data">
        <input type="file" name="zipfile" required>
        <button type="submit">Upload & Parse</button>
    </form>

    {% if certifications %}
        <h3>Licenses & Certifications</h3>
        <ul>
        {% for cert in certifications %}
            <li><strong>{{ cert['name'] }}</strong> - {{ cert['authority'] }} ({{ cert['start_date'] }}{% if cert['end_date'] %} to {{ cert['end_date'] }}{% endif %})</li>
        {% endfor %}
        </ul>
    {% endif %}
</body>
</html>

📄 app.py

from flask import Flask, request, render_template
import os, zipfile, json, tempfile

app = Flask(__name__)
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

def extract_certifications_from_json(json_data):
    certs = []
    for item in json_data:
        certs.append({
            "name": item.get("name"),
            # "or {}" also covers keys that are present but null in the export
            "authority": (item.get("authority") or {}).get("name", ""),
            "start_date": (item.get("starts_on") or {}).get("year", ""),
            "end_date": (item.get("ends_on") or {}).get("year", "")
        })
    return certs

@app.route("/", methods=["GET", "POST"])
def index():
    certifications = []
    if request.method == "POST":
        zip_file = request.files["zipfile"]
        if zip_file and zip_file.filename.endswith(".zip"):
            with tempfile.TemporaryDirectory() as tmpdirname:
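                # Save under a fixed name: the client-supplied filename is untrusted
                # and could contain path separators.
                zip_path = os.path.join(tmpdirname, "upload.zip")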
                zip_file.save(zip_path)
                with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                    zip_ref.extractall(tmpdirname)
                
                cert_path = os.path.join(tmpdirname, 'Licenses & certifications.json')
                if os.path.exists(cert_path):
                    with open(cert_path, 'r', encoding='utf-8') as f:
                        data = json.load(f)
                        certifications = extract_certifications_from_json(data)
    return render_template("index.html", certifications=certifications)

if __name__ == "__main__":
    app.run(debug=True)  # debug mode is for local development only

✅ Usage

  1. Ask users to download their LinkedIn data export (JSON).

  2. Upload the ZIP via this app.

  3. It parses and displays Licenses & Certifications directly.


Here is a REST API version built with FastAPI that parses the LinkedIn Data Export ZIP and returns Licenses & Certifications as JSON:


✅ Folder Structure

linkedin_fastapi_parser/
├── main.py
├── utils.py
├── requirements.txt
└── uploads/

📄 requirements.txt

fastapi
uvicorn
python-multipart

📄 utils.py

import zipfile, tempfile, json, os

def extract_certifications_from_zip(zip_bytes: bytes) -> list:
    # main.py passes the raw bytes from `await file.read()`, so write them directly.
    with tempfile.TemporaryDirectory() as tmpdir:
        zip_path = os.path.join(tmpdir, "upload.zip")
        with open(zip_path, "wb") as f:
            f.write(zip_bytes)

        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extractall(tmpdir)

        cert_path = os.path.join(tmpdir, "Licenses & certifications.json")
        if not os.path.exists(cert_path):
            return []

        with open(cert_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        return [
            {
                "name": c.get("name"),
                # "or {}" also covers keys that are present but null in the export
                "authority": (c.get("authority") or {}).get("name", ""),
                "start_date": (c.get("starts_on") or {}).get("year", ""),
                "end_date": (c.get("ends_on") or {}).get("year", "")
            }
            for c in data
        ]

📄 main.py

from fastapi import FastAPI, UploadFile, File, HTTPException
from utils import extract_certifications_from_zip

app = FastAPI()

@app.post("/upload")
async def upload_linkedin_zip(file: UploadFile = File(...)):
    # filename can be missing on some uploads, so guard before calling endswith
    if not (file.filename or "").endswith(".zip"):
        raise HTTPException(status_code=400, detail="Only ZIP files are allowed.")
    
    certifications = extract_certifications_from_zip(await file.read())
    if not certifications:
        raise HTTPException(status_code=404, detail="No certifications found in the ZIP.")
    
    return {"certifications": certifications}

✅ Run the Server

uvicorn main:app --reload

Test at:
http://localhost:8000/docs → Use /upload with a LinkedIn ZIP file.
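
To try the /upload endpoint without a real export, a small sample ZIP can be generated locally. This is only a test fixture sketch: the certification entry is invented sample data, the JSON layout follows the fields this article's parser reads, and `make_sample_export` is a hypothetical helper name.

```python
import json
import zipfile

SAMPLE_CERTS = [
    {
        # Invented sample entry using the field layout assumed in this article
        "name": "Sample Certification",
        "authority": {"name": "Sample Institute"},
        "starts_on": {"year": 2022},
        "ends_on": {"year": 2025},
    }
]


def make_sample_export(path):
    """Write a ZIP containing a single 'Licenses & certifications.json' file."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("Licenses & certifications.json", json.dumps(SAMPLE_CERTS))
```

Calling `make_sample_export("sample_export.zip")` produces a file that can be uploaded through the /docs page to confirm the endpoint returns the sample entry.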



