馒头and花卷

August 28, 2024

Adam-mini: Use Fewer Learning Rates To Gain More

Summary: Contents: Overview; Adam-mini; Code. Zhang Y., Chen C., Li Z., Ding T., Wu C., Ye Y., Luo Z. and Sun R. Adam-mini: Use fewer learning rates to gain more. arXiv preprint, 20 ... Read more

posted @ 2024-08-28 15:58 馒头and花卷 Views(74) Comments(0) Recommendations(0)

August 27, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Summary: Contents: Overview; Notation; GaLore. Zhao J., Zhang Z., Chen B., Wang Z., Anandkumar A. and Tian Y. GaLore: Memory-efficient llm training by gradient low-rank projection. IC ... Read more

posted @ 2024-08-27 16:05 馒头and花卷 Views(146) Comments(0) Recommendations(0)

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Summary: Contents: Overview; BAdam; Code. Luo Q., Yu H. and Li X. BAdam: A memory efficient full parameter optimization method for large language models. arXiv preprint, 2024. Overview: This paper ... Read more

posted @ 2024-08-27 10:12 馒头and花卷 Views(234) Comments(0) Recommendations(0)

August 26, 2024

Why Transformers Need Adam: A Hessian Perspective

Summary: Contents: Overview; Notation; Hessian Matrix of All Parameters; Block-wise Hessian; Code. Zhang Y., Chen C., Ding T., Li Z., Sun R. and Luo Z. Why transformers need adam: a hessian perspective. ar ... Read more

posted @ 2024-08-26 17:13 馒头and花卷 Views(136) Comments(0) Recommendations(0)

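August 25, 2024

Neo-GNNs: Neighborhood Overlap-aware Graph Neural Networks for Link Prediction

Summary: Contents: Overview; Notation; Motivation; Neo-GNN; Code. Neo-GNNs: Neighborhood overlap-aware graph neural networks for link prediction. NeurIPS, 2021. Overview: Relatively efficient computationally, and making use of both structural information and feature information, ... Read more

posted @ 2024-08-25 15:08 馒头and花卷 Views(147) Comments(0) Recommendations(0)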

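July 18, 2024

Notes on Optimization and Convergence Rates

Summary: Contents: Overview; Basic Setup; Nonconvex Optimization; Convex Optimization; Strongly Convex Optimization. Overview: I have recently gained some new insights into optimization and convergence rates, so I am writing them down here. Some of these insights come from blogs (e.g., here), others from books. Previously I only applied convergence templates mechanically; here I explain how to understand these convergence results from a geometric perspective. Basic Setup: Suppose we wish to optimize: \[\tag{1} \min_{x ... Read more

posted @ 2024-07-18 20:19 馒头and花卷 Views(195) Comments(0) Recommendations(1)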

July 16, 2024

Regularized Stochastic Learning and Online Optimization

Summary: Contents: Overview; Notation; Motivation; FOBOS (Forward-Backward Splitting); RDA (Regularized Dual Averaging); FTRL-Proximal (Follow The Regularized Leader); FOBOS, RDA, FTRL-Proxi ... Read more

posted @ 2024-07-16 09:27 馒头and花卷 Views(89) Comments(0) Recommendations(0)

July 10, 2024

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Summary: Contents: Overview; AdaBelief; Code. Zhuang J., Tang T., Ding Y., Tatikonda S., Dvornek N., Papademetris X. and Duncan J. S. AdaBelief Optimizer: Adapting stepsizes by the ... Read more

posted @ 2024-07-10 17:05 馒头and花卷 Views(85) Comments(0) Recommendations(0)

July 7, 2024

An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention

Summary: Contents: Overview; Notation; BSARec (Beyond Self-Attention for Sequential Recommendation); Code. Shin Y., Choi J., Wi H. and Park N. An attentive inductive bias for sequential r ... Read more

posted @ 2024-07-07 15:21 馒头and花卷 Views(295) Comments(0) Recommendations(0)

July 5, 2024

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Summary: Contents: Overview; BACON; Code. [Yang Z., Feng R., et al. BACON: Supercharge your vlm with bag-of-concept graph to mitigate hallucinations. 2024.] Overview: This paper proposes a new data format: BACON ( ... Read more

posted @ 2024-07-05 10:34 馒头and花卷 Views(47) Comments(0) Recommendations(0)

July 3, 2024

User Behavior Simulation with Large Language Model based Agents

Summary: Contents: Overview; RecAgent; Profile module; Memory module; Action module. Wang L., Zhang J., Yang H., Chen Z., Tang J., Zhang Z., Chen X., Lin Y., Sun H., Song R., Zhao W. X ... Read more

posted @ 2024-07-03 16:19 馒头and花卷 Views(222) Comments(0) Recommendations(0)

July 2, 2024

A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems

Summary: Contents: Overview; CSHI (Controllable, Scalable, and Human-Involved); Code. Zhu L., Huang X. and Sang J. A llm-based controllable, scalable, human-involved user simulator ... Read more

posted @ 2024-07-02 16:14 馒头and花卷 Views(132) Comments(0) Recommendations(0)

July 1, 2024

Identity-aware Graph Neural Networks

Summary: Contents: Overview; ID-GNN. You J., Gomes-Selman J., Ying R. and Leskovec J. Identity-aware graph neural networks. AAAI, 2021. Overview: Proposes a GNN that can go beyond the 1-WL test. ID-GNN: ID-G ... Read more

posted @ 2024-07-01 15:02 馒头and花卷 Views(56) Comments(0) Recommendations(0)

June 30, 2024

Feature homophily metric

Summary: Contents: Overview; Notation; Homophily on Feature Aspect. [1] Chen Y., Luo Y., Tang J., Yang L., Qiu S., Wang C. and Cao X. LSGNN: Towards general graph neural network in no ... Read more

posted @ 2024-06-30 20:59 馒头and花卷 Views(28) Comments(0) Recommendations(0)

June 19, 2024

Understanding Convolution on Graphs via Energies

Summary: Contents: Overview; Notation; Dirichlet energy and Gradient-flow; Heat equation; Gradient flows on graphs: the learnable case; Attraction and repulsion; Low vs high frequency dominan ... Read more

posted @ 2024-06-19 17:11 馒头and花卷 Views(137) Comments(0) Recommendations(0)

June 17, 2024

Structure-Aware Transformer for Graph Representation Learning

Summary: Contents: Overview; SAT; Code. Chen D., O'Bray L. and Borgwardt K. Structure-aware transformer for graph representation learning. ICML, 2022. Overview: Graph + Transformer + modified att ... Read more

posted @ 2024-06-17 11:14 馒头and花卷 Views(113) Comments(0) Recommendations(0)

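June 14, 2024

Visual Instruction Tuning

Summary: Contents: Overview; LLaVA; Code. Liu H., Li C., Wu Q. and Lee Y. J. Visual Instruction Tuning. NeurIPS, 2023. Overview: LLaVA. LLaVA: LLaVA aims to use an LLM to reason over modality features. The idea is simple: use a Vision Encoder to ... Read more

posted @ 2024-06-14 11:34 馒头and花卷 Views(63) Comments(0) Recommendations(0)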

June 12, 2024

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Summary: Contents: Overview; Mamba; Code. Gu A. and Dao T. Mamba: Linear-time sequence modeling with selective state spaces. 2023. Overview: Mamba. Mamba: Although S4 and S4D solve the computational speed problem of SSMs, there is a prerequisite ... Read more

posted @ 2024-06-12 20:31 馒头and花卷 Views(166) Comments(0) Recommendations(0)

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Summary: Contents: Overview; H3; Code. Fu D. Y., Dao T., Saab K. K., Thomas A. W., Rudra A. and Re C. Hungry hungry hippos: towards language modeling with state space models. 2022. Read more

posted @ 2024-06-12 17:23 馒头and花卷 Views(131) Comments(0) Recommendations(0)

On the Parameterization and Initialization of Diagonal State Space Models

Summary: Contents: Overview; Notation; S4D; Code. Gu A., Gupta A., Goel K. and Re C. On the parameterization and initialization of diagonal state space models. NeurIPS, 2022. Overview: Fourth in the Mamba series ... Read more

posted @ 2024-06-12 14:40 馒头and花卷 Views(178) Comments(0) Recommendations(1)
