Think Different

Posts

Preparing a Dataset for Fine-Tuning a Foundation Model

June 01, 2024

I am preparing a dataset for fine-tuning on pathology lab data.

1. Dataset Collection
   - Sources: Gather data from pathology lab reports, medical journals, and other relevant medical documents.
   - Format: Make sure the data is in a machine-readable format such as CSV, JSON, or plain text.
2. Data Preprocessing
   - Cleaning: Remove irrelevant data, correct typos, and handle missing values.
   - Formatting: Convert the data into a format suitable for fine-tuning, usually pairs of input and output texts.
   - Example Format:
     - Input: "Patient exhibits symptoms of hyperglycemia."
     - Output: "Hyperglycemia"
3. Tokenization
   - Tokenize the text with the tokenizer that corresponds to the model you intend to fine-tune.

Example Code for Dataset Preparation (Using Pandas and Transformers for Preprocessing)

1. Install Required Libraries: ...
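
The cleaning and formatting steps described in this post can be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library rather than Pandas, and it stops before tokenization. The field names `report` and `diagnosis`, the helper names, and the third sample record are assumptions made up for the example; only the hyperglycemia pair comes from the post.

```python
import json


def clean_text(text):
    """Collapse runs of whitespace and strip leading/trailing spaces."""
    return " ".join(text.split())


def build_pairs(records):
    """Turn raw records into cleaned, de-duplicated input/output pairs.

    Records with a missing or empty field are dropped (one simple way to
    handle missing values; imputation is out of scope for this sketch).
    """
    pairs = []
    seen = set()
    for rec in records:
        # "report" and "diagnosis" are hypothetical field names.
        source = clean_text(rec.get("report") or "")
        target = clean_text(rec.get("diagnosis") or "")
        if not source or not target:
            continue
        key = (source.lower(), target.lower())  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        pairs.append({"input": source, "output": target})
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one example per line."""
    return "\n".join(json.dumps(p) for p in pairs)


# Illustrative records only; real data would come from lab reports.
raw_records = [
    {"report": "Patient exhibits  symptoms of hyperglycemia. ", "diagnosis": "Hyperglycemia"},
    {"report": "patient exhibits symptoms of hyperglycemia.", "diagnosis": "hyperglycemia"},
    {"report": "Low hemoglobin on complete blood count.", "diagnosis": None},
]

pairs = build_pairs(raw_records)
print(to_jsonl(pairs))
```

Only the first record survives here: the second is a case-insensitive duplicate and the third has no output label. The resulting pairs would then be passed to the tokenizer of whichever model is being fine-tuned.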

Develop a Customized LLM Agent

May 31, 2024

Photo by MART PRODUCTION at Pexels

If you’re interested in customizing an agent for a specific task, one way to do this is to fine-tune a model on your own dataset. To prepare the dataset, see this article.

1. Curate the Dataset
   - Using NeMo Curator:
     - Install NVIDIA NeMo: `pip install nemo_toolkit`
     - Use NeMo Curator to prepare your dataset according to your specific requirements.
2. Fine-Tune the Model
   - Using NeMo Framework:
     1. Set up NeMo:
        ```python
        import nemo
        import nemo.collections.nlp as nemo_nlp
        ```
     2. Prepare the Data:
        ```python
        # Example to prepare dataset
        # Check this import path against your installed NeMo version; it may differ.
        from nemo.collections.nlp.data.text_to_text import TextToTextDataset

        dataset = TextToTextDataset(file_path="path_to_your_dataset")
        ```
     3. Fine-Tune the Model:
        ```python
        ...
        ```
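
Before fine-tuning, the curated examples usually need to be split into training and validation files. The sketch below shows one way to do that with the Python standard library alone; it does not use NeMo or NeMo Curator. The `input`/`output` keys, the file names, the helper names, and the placeholder examples are all assumptions for illustration, so check the data format your fine-tuning script actually expects.

```python
import json
import os
import random
import tempfile


def split_pairs(pairs, val_fraction=0.1, seed=0):
    """Shuffle a copy of the pairs reproducibly and split off a validation set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, pairs):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as handle:
        for pair in pairs:
            handle.write(json.dumps(pair, ensure_ascii=False) + "\n")


# Placeholder examples; real pairs would come from the curated dataset.
pairs = [{"input": f"example input {i}", "output": f"example output {i}"} for i in range(10)]
train, val = split_pairs(pairs, val_fraction=0.2)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "train.jsonl")  # hypothetical file name
    val_path = os.path.join(tmp_dir, "val.jsonl")      # hypothetical file name
    write_jsonl(train_path, train)
    write_jsonl(val_path, val)
    with open(train_path, encoding="utf-8") as handle:
        n_train_lines = sum(1 for _ in handle)

print(len(train), len(val), n_train_lines)
```

Fixing the seed keeps the split stable between runs, which makes validation numbers comparable across fine-tuning experiments.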

Code Auto-Completion with Hugging Face, LangChain, and Phi-3 SLM

May 30, 2024

Photo by energepic.com at Pexels

You can create your own code auto-completion co-pilot using Hugging Face, LangChain, and the Phi-3 SLM! Here's a breakdown of the steps involved:

1. Setting Up the Environment:

   Install the required libraries (Bash):

       pip install langchain transformers datasets

   Download the Phi-3 SLM model (Python). Phi-3 is a decoder-only model published by Microsoft, so it loads with AutoModelForCausalLM rather than AutoModelForSeq2SeqLM:

       from transformers import AutoModelForCausalLM

       model_name = "microsoft/Phi-3-mini-4k-instruct"
       model = AutoModelForCausalLM.from_pretrained(model_name)

2. Preprocessing Code for LangChain:

   The AutoTokenizer class comes from Hugging Face Transformers, not from LangChain. A tokenizer belongs to a model rather than to a programming language, so load the one that matches the model you downloaded (Python):

       from transformers import AutoTokenizer

       tokenizer = AutoTokenizer.from_pretrained(model_name)

   Define a function to preprocess code into LangChain format. This might involve splitting the code into tokens, adding special tokens (e.g., start/e...
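
A preprocessing function of the kind this step describes might look like the sketch below. It splits Python source into lexical tokens with the standard library `tokenize` module and wraps them in start and end markers. It is not the model's own tokenizer and not a LangChain API: the `<s>` and `</s>` markers, the function name, and the `max_tokens` parameter are hypothetical choices for illustration.

```python
import io
import tokenize

START_TOKEN = "<s>"   # hypothetical start marker
END_TOKEN = "</s>"    # hypothetical end marker

# Token types that carry layout or comments rather than code content.
_SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.COMMENT,
    tokenize.ENDMARKER,
}


def preprocess_code(source, max_tokens=256):
    """Split Python source into lexical tokens and wrap them in markers.

    Only the last `max_tokens` tokens are kept (must be positive), so the
    code nearest the cursor survives truncation.
    """
    tokens = []
    reader = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(reader):
            if tok.type in _SKIPPED:
                continue
            tokens.append(tok.string)
    except (tokenize.TokenError, SyntaxError):
        # Half-written code is normal for auto-completion; keep what was read.
        pass
    return [START_TOKEN] + tokens[-max_tokens:] + [END_TOKEN]


snippet = "def add(a, b):\n    return a + b\n"
print(preprocess_code(snippet))
```

Dropping indentation tokens keeps the sketch short but discards block structure, so a real completion pipeline would more likely pass the raw text to the model's tokenizer and use a function like this only for inspection or filtering.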