[1] OPENAI, ACHIAM J, ADLER S, et al. GPT-4 Technical Report[EB/OL]. (2023-03-08)[2023-07-09]. https://arxiv.org/abs/2303.08774.
[2] CAI R, GE J, SUN Z, et al. A survey on the development of AI pre-trained large models[J/OL]. Journal of Chinese Computer Systems, 2024, 5(15): 1-12. http://kns.cnki.net/kcms/detail/21.1106.tp.20230510.1900.010.html.
[3] WU S. AI large models: meeting diversified needs with the "large-scale pre-training + fine-tuning" paradigm[N]. People's Posts and Telecommunications News, 2022-06-16(005).
[4] GUPTA T, ZAKI M, KRISHNAN N A, et al. MatSciBERT: A materials domain language model for text mining and information extraction[J]. npj Computational Materials, 2022, 8(1): 102-112.
[5] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.
[6] TOUVRON H, MARTIN L, STONE K, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models[EB/OL]. (2023-07-09)[2023-07-09]. https://arxiv.org/abs/2307.09288.
[7] TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: Open and Efficient Foundation Language Models[EB/OL]. (2023-02-27)[2023-07-09]. https://arxiv.org/abs/2302.13971.
[8] YANG P, WANG J J, GAN R Y, et al. Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective[EB/OL]. (2022-10-08)[2023-07-09]. https://arxiv.org/abs/2210.08590.
[9] CHENG L C. Applications and prospects of parameter-efficient fine-tuning for large vision models[J]. Artificial Intelligence, 2024(1): 54-65.
[10] HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]// Proceedings of the 36th International Conference on Machine Learning: Proceedings of Machine Learning Research, Vol. 97. Long Beach: PMLR, 2019: 2790-2799.
[11] HE J, ZHOU C, MA X, et al. Towards a unified view of parameter-efficient transfer learning[C]// International Conference on Learning Representations. 2022. https://openreview.net/forum?id=0RDcd5Axok.
[12] DING N, QIN Y J, YANG G, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models[J]. Nature Machine Intelligence, 2023, 5(3): 220-235.
[13] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana: Association for Computational Linguistics, 2021: 3045-3059.
[14] SU Y, WANG X, QIN Y, et al. On transferability of prompt tuning for natural language processing[C]// Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle: Association for Computational Linguistics, 2022: 3949-3969.
[15] LI X L, LIANG P. Prefix-Tuning: optimizing continuous prompts for generation[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, 2021: 4582-4597.
[16] KARIMI M R, HENDERSON J, RUDER S. Compacter: efficient low-rank hypercomplex adapter layers[J]. Advances in Neural Information Processing Systems, 2021, 34: 1022-1035.
[17] HU E J, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models[C]// International Conference on Learning Representations (ICLR). 2022. https://openreview.net/forum?id=nZeVKeeFYf9.
[18] CUENCA P, PAUL S. Using LoRA for efficient stable diffusion fine-tuning[EB/OL]. (2023-01-26)[2023-07-09]. https://huggingface.co/blog/LoRA.
[19] DING X, ZOU R J, PAN Z G. Domain adaptation of generative large models based on parameter-efficient fine-tuning[J]. Artificial Intelligence, 2023(4): 1-9.
[20] RAFI M N, MUAAZ M. Performance Evaluation of the LoRa Protocol in the Context of Smart Meter[EB/OL]. (2019-07-04)[2019-07-04]. https://arxiv.org/abs/1907.12355.
[21] SUN Y, WANG S, LI Y, et al. ERNIE: Enhanced Representation through Knowledge Integration[EB/OL]. (2019-04-09)[2023-07-09]. https://arxiv.org/abs/1904.09223.
[22] SUN Y, WANG S, LI Y, et al. ERNIE 2.0: A continual pre-training framework for language understanding[C]// Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 05. Palo Alto: AAAI Press, 2020: 8968-8975.
[23] SUN Y, WANG S, LI Y, et al. ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation[EB/OL]. (2021-07-02)[2023-07-09]. https://arxiv.org/abs/2107.02137.
[24] CHEN Z Y, XIE F K, WAN M, et al. MatChat: A large language model and application service platform for materials science[J]. Chinese Physics B, 2023, 32(11): 208-213.
[25] AMINABADI R Y, RAJBHANDARI S, ZHANG M, et al. DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale[EB/OL]. (2022-07)[2023-07-09]. https://arxiv.org/abs/2207.00032.
[26] RAJBHANDARI S, RASLEY J, RUWASE O, et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models[EB/OL]. (2019-10-02)[2023-07-09]. https://arxiv.org/abs/1910.02054.
[27] SMITH S, PATWARY M, NORICK B, et al. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model[EB/OL]. (2022-01-19)[2023-07-09]. https://arxiv.org/abs/2201.11990.
[28] RAJBHANDARI S, LI C, YAO Z, et al. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale[EB/OL]. (2022-01-05)[2023-07-09]. https://arxiv.org/abs/2201.05596.
[29] PAPINENI K, ROUKOS S, WARD T, et al. Bleu: a Method for Automatic Evaluation of Machine Translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia: Association for Computational Linguistics, 2002: 311-318.