Summary: Contents: Overview; Adam-mini; Code. Zhang Y., Chen C., Li Z., Ding T., Wu C., Ye Y., Luo Z. and Sun R. Adam-mini: Use fewer learning rates to gain more. arXiv preprint, 20... Read more
posted @ 2024-08-28 15:58 馒头and花卷 Views (62) Comments (0) Recommendations (0)
Summary: Contents: Overview; Notation; GaLore. Zhao J., Zhang Z., Chen B., Wang Z., Anandkumar A. and Tian Y. GaLore: Memory-efficient LLM training by gradient low-rank projection. IC... Read more
posted @ 2024-08-27 16:05 馒头and花卷 Views (133) Comments (0) Recommendations (0)
Summary: Contents: Overview; BAdam; Code. Luo Q., Yu H. and Li X. BAdam: A memory efficient full parameter optimization method for large language models. arXiv preprint, 2024. Overview: This paper introduces ... Read more
posted @ 2024-08-27 10:12 馒头and花卷 Views (223) Comments (0) Recommendations (0)
Summary: Contents: Overview; Notation; Hessian matrix of all parameters; Block-wise Hessian; Code. Zhang Y., Chen C., Ding T., Li Z., Sun R. and Luo Z. Why transformers need Adam: a Hessian perspective. ar... Read more
posted @ 2024-08-26 17:13 馒头and花卷 Views (118) Comments (0) Recommendations (0)
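Summary: Contents: Overview; Notation; Motivation; Neo-GNN; Code. Neo-GNNs: Neighborhood overlap-aware graph neural networks for link prediction. NeurIPS, 2021. Overview: Computationally fairly efficient, and exploits both structural information and feature information ... Read more
posted @ 2024-08-25 15:08 馒头and花卷 Views (139) Comments (0) Recommendations (0)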
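Summary: Contents: Overview; Basic setup; Non-convex optimization; Convex optimization; Strongly convex optimization. Overview: I have recently gained some new insights into optimization and convergence rates, so I am noting them down here. Some of these insights come from blogs (e.g. here), others from books. In the past I simply applied standard convergence-proof templates; here I discuss how to understand these convergence results from a geometric perspective. Basic setup: Suppose we want to optimize: \[\tag{1} \min_{x ... Read more
posted @ 2024-07-18 20:19 馒头and花卷 Views (179) Comments (0) Recommendations (1)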
Summary: Contents: Overview; Notation; Motivation; FOBOS (Forward-Backward Splitting); RDA (Regularized Dual Averaging); FTRL-Proximal (Follow The Regularized Leader); FOBOS, RDA, FTRL-Proxi... Read more
posted @ 2024-07-16 09:27 馒头and花卷 Views (80) Comments (0) Recommendations (0)
Summary: Contents: Overview; AdaBelief; Code. Zhuang J., Tang T., Ding Y., Tatikonda S., Dvornek N., Papademetris X. and Duncan J. S. AdaBelief Optimizer: Adapting stepsizes by the ... Read more
posted @ 2024-07-10 17:05 馒头and花卷 Views (77) Comments (0) Recommendations (0)
Summary: Contents: Overview; Notation; BSARec (Beyond Self-Attention for Sequential Recommendation); Code. Shin Y., Choi J., Wi H. and Park N. An attentive inductive bias for sequential r... Read more
posted @ 2024-07-07 15:21 馒头and花卷 Views (281) Comments (0) Recommendations (0)
Summary: Contents: Overview; BACON; Code. [Yang Z., Feng R., et al. BACON: Supercharge your VLM with bag-of-concept graph to mitigate hallucinations. 2024.] Overview: This paper proposes a new data format: BACON (... Read more
posted @ 2024-07-05 10:34 馒头and花卷 Views (42) Comments (0) Recommendations (0)
Summary: Contents: Overview; RecAgent; Profile module; Memory module; Action module. Wang L., Zhang J., Yang H., Chen Z., Tang J., Zhang Z., Chen X., Lin Y., Sun H., Song R., Zhao W. X... Read more
posted @ 2024-07-03 16:19 馒头and花卷 Views (210) Comments (0) Recommendations (0)
Summary: Contents: Overview; CSHI (Controllable, Scalable, and Human-Involved); Code. Zhu L., Huang X. and Sang J. A LLM-based controllable, scalable, human-involved user simulator ... Read more
posted @ 2024-07-02 16:14 馒头and花卷 Views (120) Comments (0) Recommendations (0)
Summary: Contents: Overview; ID-GNN. You J., Gomes-Selman J., Ying R. and Leskovec J. Identity-aware graph neural networks. AAAI, 2021. Overview: Proposes a GNN that can go beyond the 1-WL test. ID-GNN: ID-G... Read more
posted @ 2024-07-01 15:02 馒头and花卷 Views (50) Comments (0) Recommendations (0)
Summary: Contents: Overview; Notation; Homophily on Feature Aspect. [1] Chen Y., Luo Y., Tang J., Yang L., Qiu S., Wang C. and Cao X. LSGNN: Towards general graph neural network in no... Read more
posted @ 2024-06-30 20:59 馒头and花卷 Views (26) Comments (0) Recommendations (0)
Summary: Contents: Overview; Notation; Dirichlet energy and Gradient-flow; Heat equation; Gradient flows on graphs: the learnable case; Attraction and repulsion; Low vs high frequency dominan... Read more
posted @ 2024-06-19 17:11 馒头and花卷 Views (133) Comments (0) Recommendations (0)
Summary: Contents: Overview; SAT; Code. Chen D., O'Bray L. and Borgwardt K. Structure-aware transformer for graph representation learning. ICML, 2022. Overview: Graph + Transformer + modified att... Read more
posted @ 2024-06-17 11:14 馒头and花卷 Views (106) Comments (0) Recommendations (0)
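Summary: Contents: Overview; LLaVA; Code. Liu H., Li C., Wu Q. and Lee Y. J. Visual Instruction Tuning. NeurIPS, 2023. Overview: LLaVA. LLaVA: LLaVA aims to have an LLM reason over modality features. The idea is simple: use a Vision Encoder to ... Read more
posted @ 2024-06-14 11:34 馒头and花卷 Views (56) Comments (0) Recommendations (0)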
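Summary: Contents: Overview; Mamba; Code. Gu A. and Dao T. Mamba: Linear-time sequence modeling with selective state spaces. 2023. Overview: Mamba. Mamba: Although S4 and S4D solve the computational speed problem of SSMs, they rest on a premise ... Read more
posted @ 2024-06-12 20:31 馒头and花卷 Views (157) Comments (0) Recommendations (0)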
Summary: Contents: Overview; H3; Code. Fu D. Y., Dao T., Saab K. K., Thomas A. W., Rudra A. and Re C. Hungry hungry hippos: towards language modeling with state space models. 2022. Read more
posted @ 2024-06-12 17:23 馒头and花卷 Views (126) Comments (0) Recommendations (0)
Summary: Contents: Overview; Notation; S4D; Code. Gu A., Gupta A., Goel K. and Re C. On the parameterization and initialization of diagonal state space models. NeurIPS, 2022. Overview: The fourth in the Mamba series ... Read more
posted @ 2024-06-12 14:40 馒头and花卷 Views (169) Comments (0) Recommendations (1)