Summary:
Contents: Overview; Adam-mini; Code. Zhang Y., Chen C., Li Z., Ding T., Wu C., Ye Y., Luo Z. and Sun R. Adam-mini: Use fewer learning rates to gain more. arXiv preprint, 20…
Summary:
Contents: Overview; Notation; GaLore. Zhao J., Zhang Z., Chen B., Wang Z., Anandkumar A. and Tian Y. GaLore: Memory-efficient llm training by gradient low-rank projection. IC…
Summary:
Contents: Overview; BAdam; Code. Luo Q., Yu H. and Li X. BAdam: A memory efficient full parameter optimization method for large language models. arXiv preprint, 2024. Overview: This paper introduces…
Summary:
Contents: Overview; Notation; Hessian matrix of all parameters; Block-wise Hessian; Code. Zhang Y., Chen C., Ding T., Li Z., Sun R. and Luo Z. Why transformers need adam: a hessian perspective. ar…
Summary:
Contents: Overview; AdaBelief; Code. Zhuang J., Tang T., Ding Y., Tatikonda S., Dvornek N., Papademetris X. and Duncan J. S. AdaBelief Optimizer: Adapting stepsizes by the…
Summary:
Contents: Overview; Notation; BSARec (Beyond Self-Attention for Sequential Recommendation); Code. Shin Y., Choi J., Wi H. and Park N. An attentive inductive bias for sequential r…
Summary:
Contents: Overview; BACON; Code. Yang Z., Feng R., et al. BACON: Supercharge your vlm with bag-of-concept graph to mitigate hallucinations. 2024. Overview: This paper proposes a new data format, BACON (…
Summary:
Contents: Overview; RecAgent; Profile module; Memory module; Action module. Wang L., Zhang J., Yang H., Chen Z., Tang J., Zhang Z., Chen X., Lin Y., Sun H., Song R., Zhao W. X…
Summary:
Contents: Overview; CSHI (Controllable, Scalable, and Human-Involved); Code. Zhu L., Huang X. and Sang J. A llm-based controllable, scalable, human-involved user simulator…
Summary:
Contents: Overview; Notation; Homophily on Feature Aspect. [1] Chen Y., Luo Y., Tang J., Yang L., Qiu S., Wang C. and Cao X. LSGNN: Towards general graph neural network in no…
Summary:
Contents: Overview; Notation; Dirichlet energy and gradient flow; Heat equation; Gradient flows on graphs: the learnable case; Attraction and repulsion; Low vs high frequency dominan…
Summary:
Contents: Overview; SAT; Code. Chen D., O'Bray L. and Borgwardt K. Structure-aware transformer for graph representation learning. ICML, 2022. Overview: Graph + Transformer + a modified att…
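Summary:
Contents: Overview; LLaVA; Code. Liu H., Li C., Wu Q. and Lee Y. J. Visual Instruction Tuning. NeurIPS, 2023. Overview: LLaVA. LLaVA section: LLaVA aims to use an LLM to reason over multimodal features. The idea is simple: use a Vision Encoder to obtain…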
Summary:
Contents: Overview; Mamba; Code. Gu A. and Dao T. Mamba: Linear-time sequence modeling with selective state spaces. 2023. Overview: Mamba. Mamba section: Although S4 and S4D solve the computational-speed problem of SSMs, they rely on a precondition…
Summary:
Contents: Overview; H3; Code. Fu D. Y., Dao T., Saab K. K., Thomas A. W., Rudra A. and Re C. Hungry hungry hippos: towards language modeling with state space models. 2022.
Summary:
Contents: Overview; Notation; S4D; Code. Gu A., Gupta A., Goel K. and Re C. On the parameterization and initialization of diagonal state space models. NeurIPS, 2022. Overview: Part four of the Mamba series…