Skip to content

fix: replace mutable default input_kwargs={} with None in HuggingfaceEngine#10477

Open
kuishou68 wants to merge 1 commit into
hiyouga:mainfrom
kuishou68:fix/issue-10476-mutable-default-input-kwargs
Open

fix: replace mutable default input_kwargs={} with None in HuggingfaceEngine#10477
kuishou68 wants to merge 1 commit into
hiyouga:mainfrom
kuishou68:fix/issue-10476-mutable-default-input-kwargs

Conversation

@kuishou68
Copy link
Copy Markdown
Contributor

Closes #10476

Problem

Four static methods in HuggingfaceEngine use input_kwargs: Optional[dict[str, Any]] = {} as a default parameter. This is a Python anti-pattern: the same dict object is shared across all invocations that omit the argument. Since each method calls .pop() on it, the shared dict is drained after the first call — subsequent callers relying on the default receive an already-mutated (empty) dict.

Affected methods:

  • _process_args
  • _chat
  • _stream_chat
  • _get_scores

Fix

Change the default to None and initialise input_kwargs = {} at the top of each function body, following the standard Python idiom.

…ceEngine (Closes hiyouga#10476)

Co-authored-by: lingxiu58 <86288566+lingxiu58@users.noreply.github.com>
Signed-off-by: Cocoon-Break <54054995+kuishou68@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces mutable default arguments for input_kwargs with None and adds support for the minicpmv4_6 model type. The review feedback highlights that input_kwargs should be copied to prevent side effects from .pop() calls mutating the caller's dictionary. Additionally, the reviewer suggests making downsample_mode configurable per request via input_kwargs rather than relying on a global environment variable.

Comment on lines +210 to +211
if getattr(model.config, "model_type", None) == "minicpmv4_6":
gen_kwargs["downsample_mode"] = os.getenv("DOWNSAMPLE_MODE", "16x")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues with this new block:

  1. Using os.getenv for a generation parameter like downsample_mode makes it globally fixed for the process, preventing per-request configuration. It would be better to allow this to be passed via input_kwargs with the environment variable as a fallback.
  2. If minicpmv4_6 is a multimodal model architecture similar to minicpmv, it likely requires the same input_ids and tokenizer keys in gen_kwargs that are set in the preceding block (lines 202-208). By keeping this check separate, those necessary configurations are skipped, which may lead to runtime errors or incorrect model behavior.
Suggested change
if getattr(model.config, "model_type", None) == "minicpmv4_6":
gen_kwargs["downsample_mode"] = os.getenv("DOWNSAMPLE_MODE", "16x")
if getattr(model.config, "model_type", None) == "minicpmv4_6":
gen_kwargs["downsample_mode"] = input_kwargs.pop("downsample_mode", os.getenv("DOWNSAMPLE_MODE", "16x"))

Comment on lines +87 to +88
if input_kwargs is None:
input_kwargs = {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this change correctly addresses the shared mutable default argument issue, it still allows the input_kwargs dictionary to be mutated if one is provided by the caller. Since this method calls .pop() on the dictionary later (e.g., lines 123-133), it will have the side effect of draining the caller's dictionary. It is safer to create a local copy to avoid these side effects.

        input_kwargs = input_kwargs.copy() if input_kwargs is not None else {}

Comment on lines +231 to +232
if input_kwargs is None:
input_kwargs = {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the issue in _process_args, this method should copy input_kwargs to prevent the .pop() calls in the subsequent _process_args call from mutating the dictionary passed by the caller.

        input_kwargs = input_kwargs.copy() if input_kwargs is not None else {}

Comment on lines +288 to +289
if input_kwargs is None:
input_kwargs = {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the issue in _process_args, this method should copy input_kwargs to prevent the .pop() calls in the subsequent _process_args call from mutating the dictionary passed by the caller.

        input_kwargs = input_kwargs.copy() if input_kwargs is not None else {}

Comment on lines +329 to +330
if input_kwargs is None:
input_kwargs = {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method calls input_kwargs.pop("max_length", None) on line 331. To avoid mutating the dictionary provided by the caller, it is recommended to work on a copy.

        input_kwargs = input_kwargs.copy() if input_kwargs is not None else {}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: mutable default argument input_kwargs={} in HuggingfaceEngine static methods

1 participant