Rewrite, new file-entities.txt to debug entity collisions #313

Draft
alfsb wants to merge 2 commits into php:master from alfsb:simple-file-entities

@alfsb alfsb commented Jun 29, 2026

Member

This PR makes two changes:

  1. Another rewrite of file-entities.php
  2. Creates a new file-entities.txt file on manual build.

The rewrite is incidental: it came out of simplifying and heavily documenting the file. It is a rewrite all the same, so inspecting the final file may be easier than reading the diff.

The second point is the important one. The script will now generate a file-entities.txt inside doc-base/temp, to make analysing DTD entity collisions easier in the future.

The PR is opened as a draft for now, as testing is not complete. I expect no functional changes, but it will take time until all testing is done. Reviews and comments are welcome anyway. In fact, there are two points on which I would like additional comments, even after this PR is merged.

RFC 1

The script generates an anomalous entity, global.function-index, that maps to funcindex.xml inside doc-base. I would like to hear comments about moving this file to doc-en and marking it as <?do-not-translate?>, so that this hardcoded path could be removed from doc-base.

RFC 2

AFAIK, the file entities infrastructure has always replaced underscores with hyphens. This always surprises me, as it makes some file entity names mismatch the real file paths on disk. I do not know why this replacement takes place.

I would like to hear comments about removing this replacement. It would be necessary to add the disappearing entities to doc-en/entities/entitites-remove.ent, and also to change the file entities in doc-en to match folder and file names, from hyphens to underscores.

The reason to make this change now is that file-entities.php is now idempotent and has a very clean implementation, but this replacement may cause subtle miscomputations and hard-to-debug failures, because any replacement may cause silent collisions, for example:

func-name.xml       // collides with
func_name.xml

dir-name/file.xml   // collides with
dir_name/file.xml

These are perfectly possible and distinct files and directories, yet in both cases each pair maps to the same file entity in the current implementation.
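
To make the collision concrete, here is a minimal sketch in Python. It is not the code in file-entities.php. The helper names `entity_name` and `find_collisions` are hypothetical, and the mapping rule (strip `.xml`, turn `/` into `.`, optionally turn `_` into `-`) is an assumption made only for illustration.

```python
from collections import defaultdict


def entity_name(path, replace_underscores=True):
    """Hypothetical mapping from a relative .xml path to an entity name.

    Assumption: the real script may build names differently; only the
    underscore-to-hyphen step is taken from the discussion above.
    """
    if path.endswith(".xml"):
        path = path[: -len(".xml")]
    name = path.replace("/", ".")
    if replace_underscores:
        name = name.replace("_", "-")
    return name


def find_collisions(paths, replace_underscores=True):
    """Group paths by entity name and keep names claimed by 2+ paths."""
    by_entity = defaultdict(list)
    for path in paths:
        by_entity[entity_name(path, replace_underscores)].append(path)
    return {name: found for name, found in by_entity.items() if len(found) > 1}


paths = [
    "func-name.xml",
    "func_name.xml",
    "dir-name/file.xml",
    "dir_name/file.xml",
]

# With the replacement, each pair shares one entity name.
for name, found in find_collisions(paths).items():
    print(name, "<-", ", ".join(found))

# Without the replacement, every path keeps a distinct entity name.
print(find_collisions(paths, replace_underscores=False))
```

Under these assumptions, the four paths collapse to two entity names when the replacement is on, and stay as four distinct names when it is off.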
