About Me

I am Shitou Zhang (Patrick, 张石头).

I am a research intern at DeepLang, where I focus on LLM pretraining. I also work on domain-specific LLM development as a member of the Key Laboratory of Archival Intelligent Development and Service, NAAC. I am privileged to work under the guidance of Prof. John K. Zao, Dr. Zuchao Li, and Prof. Ping Wang. My recent projects involve investigating the potential of MoE models and exploring efficient multimodal pretraining on low-resource modalities.



Education Background

  • Sep. 2022 - Jun. 2023: Department of Information Engineering, the Chinese University of Hong Kong (MSc)
  • Sep. 2018 - Jun. 2022: School of Information Management, Wuhan University (BSc)



Research Interests

  • Mixture of Experts (MoE)
  • Multimodal Pretraining

“Training language models is to achieve scaling and avoid bottlenecks.” – Jared Kaplan

My research interests center on the scaling of foundation models. The fundamental intelligence of LLMs is acquired during the pretraining stage, where, as scaling laws reveal, an exponential increase in model size, data, and compute yields a roughly linear reduction in test loss. This has spurred my interest in exploring efficient scaling methods, spanning model architecture (MoE) and training strategy (Cross Modal Generalization).
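For concreteness, the relation I have in mind is the power-law form reported by Kaplan et al. (2020) for test loss as a function of non-embedding parameter count N, which appears as a straight line on a log-log plot (the constants below are the paper's reported fits, quoted here only as an illustrative example):

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}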



News and Updates

  • Dec 2023: One paper is accepted by IPM.
  • Nov 2023: The ArcMMLU paper is available on arXiv; check out the introduction video made by PaperWeekly.
  • Nov 2023: Our ArcMMLU dataset is open-sourced, available on GitHub and HuggingFace.
  • Nov 2023: Our Archives Meet GPT paper is accepted by iConference 2024.
  • Oct 2023: Our LingoWhale-8B is open-sourced, available on GitHub and HuggingFace.
  • Aug 2023: An updated version of the BATGPT paper is available on arXiv.
  • Jul 2023: Our BATGPT-15B-sirius is open-sourced, available on GitHub and HuggingFace.
  • Jul 2023: ArcGPT paper is available on arXiv.
  • Jul 2023: BATGPT paper is available on arXiv.
  • May 2023: One paper is accepted by ACL SRW 2023.