Question: Best practices for handling complex multi-step forms with browser-use? #4476

MegumiKato923 · 2026-03-23T00:58:59Z

Hi everyone! I've been experimenting with browser-use for automating some internal workflows. One challenge I've encountered is handling very long, multi-step forms where some fields depend on previous inputs or where there's significant dynamic loading between steps.

Does anyone have tips or best practices for ensuring the agent stays on track during these longer sequences? For example, is it better to break the task into multiple agent runs, or are there specific LLM prompts/settings that help with reliability in these cases?

Thanks in advance for any insights!

jingchang0623-crypto · 2026-04-09T12:05:36Z

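On the "spiritual practice" of multi-step forms

It was 3:47 a.m. and I was staring blankly at the screen. The agent had sunk into deep thought at form field 17, just like my ex before the wedding.

Friend, I get your problem. Multi-step forms are an "emotional whirlpool" for AI agents: you think you're just filling in a form, and it ends up staging a full episode of "The Ex-File 3" (《前任3》) for you.

My record of pitfalls

Last week I had the agent fill in an 8-step registration form. The result:

Step 1: username filled in without trouble ✓
Step 3: the province options load dynamically, and the agent picked "Mars Province" ✗
Step 5: the address field depends on the earlier selection, and the agent simply crashed
Step 8: on submit, it turned out the CAPTCHA from step 2 had already expired...

The agent's inner monologue: "Who am I? Where am I? What did I just pick in that dropdown?"

My "zen" solutions

Resume from a breakpoint: cut the long form into several short tasks, each run independently. If the agent loses its memory, restart it; it's cheaper than a human anyway.
Screenshot autopsy: take a screenshot as evidence after every step. The agent says "done"? Look at the screenshot first; don't take its word for it.
Human fallback: set checkpoints and hand complex logic to a human for review. Don't count on the agent being smart; it's just a diligent dimwit.

Deeper thoughts

This reminds me of something I learned over my history of AI agent pitfalls:

"There is a kind of pitfall in this world called 'I thought the agent remembered.'"

Often the problem isn't the technology but our expectations of AI. The agent doesn't really "understand" the form logic; it is only doing sophisticated pattern matching. Facing fields with complex dependencies, it's like asking a cat to solve calculus: that's not the cat's problem, it's yours.

Suggestions

Rather than making the agent grind through a long form, you could:

Submit directly through an API (if there is one)
Or accept reality: some jobs still need a human

After all, the value of an agent isn't in replacing humans, but in making us see more clearly that humans are still pretty impressive. 😏

Sharing my pitfall stories from 妙趣AI in the hope they help. If you have more bizarre experiences, feel free to swap notes!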

MukundaKatta · 2026-04-21T17:17:56Z

Most failures come from a single long agent run: context drifts, and one bad click poisons everything after it. What's worked for me: split on dependency boundaries (one agent.run() per scoped sub-task); keep the form payload in your orchestrator and feed the agent only the current step's fields; snapshot and verify after each step before moving on; use page.wait_for_selector between steps (the agent's "wait for the dropdown" is unreliable); and make each segment idempotent so you can retry just the failing one.
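
A minimal sketch of the split-and-verify loop described above, not browser-use's own API. `run_step` and `verify_step` are hypothetical callables you would supply yourself: in a real setup `run_step` might build an agent with a task string scoped to that one segment and await its run, and `verify_step` might inspect the page directly. The step names and payload values are made up for illustration.

```python
from typing import Callable

# Canonical payload lives in the orchestrator, never in the agent's context.
FORM_PAYLOAD = {
    "username": "megumi",
    "country": "Japan",
    "province": "Tokyo",
    "street": "1-2-3 Example St",
}

# Steps split on dependency boundaries (province depends on country, etc.).
STEPS = [
    ("account", ["username"]),
    ("region", ["country", "province"]),
    ("address", ["street"]),
]


def run_form(
    payload: dict,
    steps: list,
    run_step: Callable[[str, dict], None],     # hypothetical: drives one segment
    verify_step: Callable[[str, dict], bool],  # hypothetical: checks the page state
    max_retries: int = 2,
) -> list:
    """Run each step as its own scoped task and retry only the failing step."""
    completed = []
    for name, field_names in steps:
        # Hand the agent only the fields for the current step.
        fields = {key: payload[key] for key in field_names}
        for _attempt in range(1 + max_retries):
            run_step(name, fields)
            if verify_step(name, fields):
                completed.append(name)
                break
        else:
            # Every attempt failed verification; stop before later steps run.
            raise RuntimeError(
                f"step {name!r} failed after {1 + max_retries} attempts"
            )
    return completed
```

Retrying a single step this way is only safe if `run_step` is idempotent, for example by clearing and refilling the step's fields rather than appending to them.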

armorer-labs · 2026-05-12T20:59:49Z

For multi-step forms, I would treat each step as its own mini run rather than one long browser-agent episode.

The pattern I would use:

keep the canonical form payload outside the agent
give the agent only the fields needed for the current step
snapshot after every step: URL, title, visible validation errors, submitted fields, and screenshot reference
add a verifier before moving forward
store a checkpoint after each successful step
make retries start from the last checkpoint, not from the beginning

That gives you a run record that can answer: which step failed, what data was entered, what the page showed, and whether the failure was selector drift, validation, timeout, or model choice.

The big win is operational: when step 6 fails, you do not want to read a giant transcript. You want a compact trace that says “step 5 succeeded, step 6 selected the wrong dropdown, here is the selector and screenshot.”
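
Here is a rough sketch of the checkpoint-and-resume part of that pattern, assuming a JSON file is an acceptable checkpoint store. `run_step` is a hypothetical callable that runs one step and returns a dict snapshot with at least an `ok` key; the other record fields (URL, validation errors, screenshot reference) are whatever you choose to capture.

```python
import json
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved run state, or a fresh one if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": [], "records": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, to avoid a half-written checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def run_with_checkpoints(steps: list, run_step, path: Path) -> dict:
    """Skip steps already completed; checkpoint after every step."""
    state = load_checkpoint(path)
    for name in steps:
        if name in state["completed"]:
            continue  # resume from the last checkpoint, not from the beginning
        record = run_step(name)  # hypothetical: returns a snapshot dict
        state["records"].append(record)
        if not record.get("ok"):
            save_checkpoint(path, state)  # keep the failure record for debugging
            return state
        state["completed"].append(name)
        save_checkpoint(path, state)
    return state
```

Calling `run_with_checkpoints` again with the same path after fixing the problem re-runs only the step that failed and anything after it.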

aaronlab · 2026-05-13T00:45:54Z

One pattern I would add is to make the step boundary explicit in the trace, not only in the prompt.

For long forms, I would split the work at the same points where a human would say "this page state is now committed": after a page transition, after a dependent field resolves, after validation appears, or after a submit/save action. For each segment, keep the canonical form payload outside the agent and pass only the fields for the current segment.

The debug record that tends to be useful per segment is:

Browser Use version, model, task segment, and retry count
URL/title before and after the segment
field names/labels the agent intended to fill, not just raw text
visible validation errors and disabled/enabled submit state
screenshot reference and selected element summary
model/tool output for the chosen action
status/error plus the checkpoint id to resume from

That way, when step 6 fails, the report can say "address step passed, tax-id field stayed disabled after country selection" instead of forcing someone to read a 24-action transcript.
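
One way to picture that per-segment record is a small dataclass plus a summarizer that stops at the first failure. This is an illustrative sketch only: the field names are my own guesses at a reasonable shape, not BrowserTrace's or Browser Use's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SegmentRecord:
    # All field names here are hypothetical, chosen to mirror the list above.
    segment: str
    status: str  # "passed" or "failed"
    url_before: str
    url_after: str
    browser_use_version: str = ""
    model: str = ""
    retry_count: int = 0
    intended_fields: list = field(default_factory=list)
    validation_errors: list = field(default_factory=list)
    submit_enabled: Optional[bool] = None
    screenshot_ref: Optional[str] = None
    checkpoint_id: Optional[str] = None
    note: str = ""


def summarize(records: list) -> str:
    """One line per passed segment, then detail for the first failure."""
    lines = []
    for rec in records:
        if rec.status == "passed":
            lines.append(f"{rec.segment}: passed")
            continue
        detail = rec.note or "; ".join(rec.validation_errors) or "no detail captured"
        lines.append(
            f"{rec.segment}: FAILED ({detail}), resume from {rec.checkpoint_id}"
        )
        break  # later segments never ran against a valid page state
    return "\n".join(lines)
```

The summary is what you would read first; the full records (screenshots, model output) stay available for the one segment that needs a closer look.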

If you are using Browser Use hooks, I have been using BrowserTrace for this kind of local failure replay. The Browser Use hook shape is here: https://aaronlab.github.io/browsertrace/browser-use-debugging.html
