Re: [Wp-mirror-list] [Xmldatadumps-l] Template expansion inconsistency


From: wp mirror
Subject: Re: [Wp-mirror-list] [Xmldatadumps-l] Template expansion inconsistency
Date: Sat, 22 Feb 2014 17:40:58 -0500

Dear Ariel,

I added a function to WP-MIRROR 0.7 that cleans up <title>.  It
removes the following namespace words from page titles:

Category: 24575
Template: 15082
Wikipedia: 4072
MediaWiki: 520
Help: 108
Module: 27

The number beside each namespace word indicates the number of <title>s
found in the dump file `simplewiki-20140220-pages-articles.xml.bz2'
that were cleaned up. MediaWiki now does a much better job of
rendering the mirror.
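
To make the cleanup concrete, here is a minimal sketch of the idea in
Python. It is not WP-MIRROR's actual code; the names `NAMESPACE_WORDS`,
`TITLE_RE` and `clean_titles` are hypothetical, and it assumes each <title>
element sits on a single line, as in the grep output quoted below.

```python
import re
from collections import Counter

# Namespace words listed in this message; a real wiki defines more.
NAMESPACE_WORDS = ("Category", "Template", "Wikipedia",
                   "MediaWiki", "Help", "Module")

# Matches <title>Word:rest</title> where Word is one of the words above.
TITLE_RE = re.compile(
    r"(<title>)(" + "|".join(NAMESPACE_WORDS) + r"):(.*?</title>)"
)

def clean_titles(lines):
    """Strip one leading namespace word from each <title> line.

    Returns the cleaned lines and a Counter of how many titles
    were cleaned per namespace word.
    """
    counts = Counter()

    def strip_prefix(match):
        counts[match.group(2)] += 1
        # Keep "<title>" and everything after the colon.
        return match.group(1) + match.group(3)

    cleaned = [TITLE_RE.sub(strip_prefix, line, count=1) for line in lines]
    return cleaned, counts
```

Only the first namespace word is removed, so a title such as
"Template:Template:X" would become "Template:X". A more careful version
would probably consult the page's <ns> element in the dump rather than rely
on the title text alone.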

Still, it would be nice if the dump files could be fixed.
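
For a database that has already been loaded, option (d) from the quoted
message below (detect and correct `page_title' in place) might look roughly
like the following sketch. It uses Python's sqlite3 module as a stand-in for
MySQL so that it runs on its own; the function name `fix_page_titles` is
hypothetical, and the table is assumed to have only the `page_namespace' and
`page_title' columns that matter here. The namespace numbers are the
standard MediaWiki ones (Template is 10, as noted in the thread).

```python
import sqlite3

# Standard MediaWiki namespace numbers for the words listed above.
NAMESPACES = {
    4: "Wikipedia",
    8: "MediaWiki",
    10: "Template",
    12: "Help",
    14: "Category",
    828: "Module",
}

def fix_page_titles(conn):
    """Remove a redundant namespace prefix from page_title.

    Only rows whose page_namespace matches the prefix are touched.
    Returns the number of rows updated.
    """
    fixed = 0
    for ns, word in NAMESPACES.items():
        prefix = word + ":"
        cur = conn.execute(
            "UPDATE page SET page_title = substr(page_title, ?) "
            "WHERE page_namespace = ? AND substr(page_title, 1, ?) = ?",
            (len(prefix) + 1, ns, len(prefix), prefix),
        )
        fixed += cur.rowcount
    conn.commit()
    return fixed
```

On a real MediaWiki schema the update could collide with the unique index
on (page_namespace, page_title) if a correctly named row already exists, so
this should be read as an illustration rather than a ready-made repair.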

Sincerely Yours,
Kent

On 2/21/14, wp mirror <address@hidden> wrote:
> Dear Ariel,
>
> 0) Problem
>
> The dump files contain a great number of pages where the page_title
> contains the namespace. These page_titles are imported (via mwxml2sql
> and wp-mirror) into my database.  One consequence: most templates are
> not expanded by MediaWiki; rather, they are rendered as red-links.
>
> 1) Example
>
> (shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20140220/simplewiki-20140220-pages-articles.xml.bz2 .
> (shell)$ bunzip2 simplewiki-20140220-pages-articles.xml.bz2
> (shell)$ cat simplewiki-20140220-pages-articles.xml | grep "Template:" | head
>     <title>Template:Stub</title>
>     <title>Template:NPOV</title>
>     <title>Template:Disputed</title>
>     <title>Template:Disambiguation</title>
>     <title>Template:TOC</title>
>     <title>Template:Uw-test1</title>
>     <title>Template:1911</title>
>     <title>Template:Please do not change this line</title>
>     <title>Template:Solar System</title>
>     <title>Template:Months</title>
> (shell)$ cat simplewiki-20140220-pages-articles.xml | grep "<title>Category:" | head
>     <title>Category:Computer science</title>
>     <title>Category:Sports</title>
>     <title>Category:Athletics</title>
>     <title>Category:Body parts</title>
>     <title>Category:Tools</title>
>     <title>Category:Movies</title>
>     <title>Category:Grammar</title>
>     <title>Category:Mathematics</title>
>     <title>Category:Alphabet</title>
>     <title>Category:Countries</title>
> (shell)$ cat simplewiki-20140220-pages-articles.xml | grep "<title>Help:" | head
>     <title>Help:How to change pages</title>
>     <title>Help:Minor change</title>
>     <title>Help:User settings</title>
>     <title>Help:Writing articles for Wikipedia</title>
>     <title>Help:Contents</title>
>     <title>Help:Revert a page</title>
>     <title>Help:Editing</title>
>     <title>Help:How to use images</title>
>     <title>Help:How to write simple English articles</title>
>     <title>Help:User preferences help</title>
>
> 2) Solution
>
> I would like your advice as to where the solution should be attempted:
>
> a) Should the dump file generating process be fixed?
> b) Should `mwxml2sql' be altered to edit the <title> content?
> c) Should `wp-mirror' be altered to edit the <title> content?
> d) Should `wp-mirror' be able to detect and correct such `page_title'
> content in the underlying database?
>
> Sincerely Yours,
> Kent
>
> On 2/21/14, gnosygnu <address@hidden> wrote:
>> Hi. I believe the problem is with the import of the [[Template]] pages
>> into the page table.
>>
>> Your SQL output shows the following:
>>
>> page_title: Template:Ndash
>>
>> Instead, the page_title should just be "Ndash", not "Template:Ndash".
>> Note that the page is already marked as page_namespace = 10. Also,
>> note that no other namespace (Category, Help, Project, etc) will have
>> a "page_title" with the namespace name in front of it. For example,
>> Category "Earth" will be in the page table with a page_title of "Earth",
>> not "Category:Earth".
>>
>> MediaWiki has code that takes {{Template:A}} and makes it effectively
>> the same as {{A}}. Note that this is just regular page transclusion
>> via namespace. You can do "{{Category:Earth}}" and it will transclude
>> the contents of the page "Category:Earth"
>>
>> Hope this helps.
>>
>>
>> On Fri, Feb 21, 2014 at 5:21 PM, wp mirror <address@hidden> wrote:
>>> Dear Sir or Madam,
>>>
>>> I am not sure to which person or list I should address this question.
>>>
>>> 0) Objective
>>>
>>> I am in the process of building DEB packages for: WP-MIRROR 0.7, the
>>> latest development version of MediaWiki 1.23, and a set of MediaWiki
>>> extensions.
>>>
>>> The objective is this: a page rendered by a mirror should
>>> look the same as that page rendered by the WMF site.
>>>
>>> 1) Problem
>>>
>>> In the process of testing mirrors, I noticed that many templates were
>>> not expanding, and were instead being rendered as red-links.
>>>
>>> 2) Example
>>>
>>> To illustrate, consider the Ndash template, which appears on many
>>> pages such as <http://simple.wikipedia.org/wiki/August>.  It appears
>>> in the underlying database:
>>>
>>> mysql> select page_id,page_title,rev_len,old_text from
>>> simplewiki.page,simplewiki.revision,simplewiki.text where
>>> page_id=rev_page and rev_text_id=old_id and page_title like
>>> 'Template:Ndash' limit 10\G
>>> *************************** 1. row ***************************
>>>    page_id: 132985
>>> page_title: Template:Ndash
>>>    rev_len: 65
>>>   old_text: &ndash;<noinclude>
>>> [[Category:Formatting templates]]
>>> </noinclude>
>>> 1 row in set (0.25 sec)
>>>
>>> 3) Special:ExpandTemplates
>>>
>>> To test the above example ``Template:Ndash'', I use
>>> Special:ExpandTemplates.
>>>
>>> 3.1) Input text
>>>
>>> Today is the {{CURRENTDAY}} day.</br>
>>> This server is {{SERVER}}, script path {{SCRIPTPATH}}, current MW
>>> version {{CURRENTVERSION}}.</br>
>>> This site is {{SITENAME}}. Full page name is {{FULLPAGENAME}}.</br>
>>> <table>
>>> <tr><th>Template</th><th>Expanded</th><th>page_id</th><th>rev_len</th></tr>
>>> <tr><td>Ndash</td><td>{{Ndash}}</td><td>{{PAGEID:
>>> Ndash}}</td><td>{{PAGESIZE: Ndash}}</td></tr>
>>> <tr><td>Template:Ndash</td><td>{{Template:Ndash}}</td>
>>>     <td>{{PAGEID: Template:Ndash}}</td><td>{{PAGESIZE:
>>> Template:Ndash}}</td></tr>
>>> <tr><td>Template:Template:Ndash</td><td>{{Template:Template:Ndash}}</td>
>>>      <td>{{PAGEID: Template:Template:Ndash}}</td><td>{{PAGESIZE:
>>> Template:Template:Ndash}}</td></tr>
>>> </table>
>>>
>>> 3.2) <http://simple.wikipedia.org/wiki/Special:ExpandTemplates> Preview
>>>
>>> Here is the result from the WMF site:
>>>
>>> Today is the 21 day.
>>> This server is //simple.wikipedia.org, script path /w, current MW
>>> version 1.23wmf14 (f8b9201).
>>> This site is Wikipedia. Full page name is My template.
>>> Template                 Expanded                 page_id  rev_len
>>> Ndash                    -                        0        0
>>> Template:Ndash           -                        132985   65
>>> Template:Template:Ndash  Template:Template:Ndash  0        0
>>>
>>> Both {{Ndash}} and {{Template:Ndash}} expand as expected.
>>>
>>> 3.3) <http://simple.wikipedia.site/wiki/Special:ExpandTemplates> Preview
>>>
>>> Here is the result from the mirrored site:
>>>
>>> Today is the 21 day.
>>> This server is http://simple.wikipedia.site, script path /w, current
>>> MW version 1.23alpha.
>>> This site is simplewiki. Full page name is My template.
>>> Template                 Expanded        page_id  rev_len
>>> Ndash                    Template:Ndash  0        0
>>> Template:Ndash           Template:Ndash  0        0
>>> Template:Template:Ndash  -               132985   65
>>>
>>> Only {{Template:Template:Ndash}} expands!
>>>
>>> 4) Question
>>>
>>> Why do I need to prepend an extra ``Template:'' to make the templates
>>> work for the mirror?
>>>
>>> Better yet: Could someone tell me where in the MediaWiki core I can
>>> find the code that takes the template (e.g. {{Ndash}} or
>>> {{Template:Ndash}}) and converts it into an SQL query that SELECTs the
>>> template expansion from the underlying database?
>>>
>>> Sincerely Yours,
>>> Kent
>>>
>>> _______________________________________________
>>> Xmldatadumps-l mailing list
>>> address@hidden
>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>
>


