DatabaseManagerCli is a command line application you can use to import data from the XML dumps into MySQL.

You can download a binary from the downloads section or build it yourself.

Configuration

The configuration for DatabaseManagerCli is stored in a simple text file named Config.txt. A sample configuration is available in Config.Example.txt. You can copy that file to Config.txt and edit the relevant portions. You should have a working connection to MySQL. If you only have some of the XML files and not others, fill in only the corresponding fields, but do not remove any lines from the config file: all lines must be present.

Importing data

Once you've set up your Config.txt, you can run DatabaseManagerCli without arguments to get a list of possible commands. Commands can be executed in bulk by passing all the actions you want to perform, for example:
DatabaseManagerCli import-labels import-artists import-masters import-releases

Errors during import

By far the most frequent cause of imports failing in the middle of the operation is corrupt data. If you encounter an error during import (and not at the beginning or the end), please re-download the archive, extract it (preferably on a different storage device) and try again. If the issue persists, file an issue describing your exact configuration.

Last edited Aug 22, 2012 at 6:40 PM by karamanolev, version 11

Comments

jbeary Oct 8, 2014 at 4:43 PM 
@Jeffro2

Albeit a little late, thanks for the legwork. I've encountered the same issues as you had and made changes to the DB schema file as you did. I also noted that the 9/2014 'artists' data file still has a problem with the artist associated with ID 2254915. The same issue now also appears on records 2256630, 2311306, 2638101, 3520148, 732572, 1980438, 1980444, 1987104, 2024252, 2039092, 2045339, 2046110, 2048686, 2063666, 2064094, 2065501, 2065536, 2069484, 2069486, 2079045, 2087753, 2116766, 2133674, 2138940, 2203096, 2205367, 2254877. All say 'needs vote'.

I think this is due to the insertion method that may hide the actual record from being displayed. If you search for the tags <name>‹br› you will come across all of this bad data. I'm going to try reporting this with the hopes that someone at discogs would trim these records out.

Also, I'm only up to using the binary and have loaded up the entire project but have yet to look it over. I think there's a lack of try catch statements because if there's a failure on anything in the console window, probably some instance is still holding a lock on the file. I had a file lock issue even when trying to save to another file. Maybe this hasn't happened to you but I suggest rebooting if there's any failure.

@san3, if you do encounter an unhandled exception during import, do one file at a time and let us know which file is giving you trouble. I had no issues with the labels file at least, so start there and work through the records in the artists file, removing any where there's a <name>‹br› string. These will blow up your import every time, as Jeffro2 pointed out.

Had some Windows permission errors even when running the console window as administrator when trying to import from a different drive.

Hope this helps you all.

TO the authors: THANK YOU for doing the hard work! Been wanting to get this data into a DB I can work with for years!!!

Jeffro2 Jun 2, 2013 at 5:14 AM 
Found another problem. It looks like the Discogs schema changed a little, and that broke the import for the releases XML file. Discogs has added some new fields for release identifiers. Content contained in these tags seems to typically include information on how to properly identify a specific release: etched text in the outgroove of a vinyl record, for example. I don't know how to code in C# (is that what this is written in?), so I can't fix it, but if you write some code to remove the "identifiers" tags in the XML release dump file, the data should import with no problem (without that information, of course).

Sample of one of the identifiers from the release XML dump file:
</track>
</tracklist>
<identifiers>
<identifier type="Matrix / Runout" value="SONOPRESS R-6448/ WISE001 A" />
<identifier type="Mastering SID Code" value="IFPI L024" />
<identifier type="Mould SID Code" value="IFPI 0779" />
</identifiers>
</release>
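
The workaround described above (removing the "identifiers" tags from the release dump before importing) could be sketched roughly as follows. This is a minimal Python sketch, not part of DatabaseManagerCli; the function names strip_identifiers and strip_file, the chunk size, and any file paths you pass in are assumptions made for illustration. It reads the dump in chunks so the whole file never has to fit in memory, and it assumes the blocks appear exactly as `<identifiers>` ... `</identifiers>` as in the sample.

```python
import re

# Matches one complete <identifiers>...</identifiers> block, even across line breaks.
IDENTIFIERS_RE = re.compile(r"<identifiers>.*?</identifiers>", re.DOTALL)
OPEN_TAG = "<identifiers>"


def strip_identifiers(chunks):
    """Yield the text from `chunks` with every <identifiers> block removed."""
    buf = ""
    for chunk in chunks:
        buf = IDENTIFIERS_RE.sub("", buf + chunk)
        cut = buf.find(OPEN_TAG)
        if cut == -1:
            # No unfinished block: hold back only a tail short enough to be
            # the start of an opening tag split across two chunks.
            cut = max(len(buf) - (len(OPEN_TAG) - 1), 0)
        yield buf[:cut]
        buf = buf[cut:]
    yield buf  # whatever is left (an unclosed block would pass through as-is)


def strip_file(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path without <identifiers> blocks (hypothetical helper)."""
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        chunks = iter(lambda: src.read(chunk_size), "")
        for piece in strip_identifiers(chunks):
            dst.write(piece)
```

You would call strip_file with the path of the extracted releases XML and the path of a new output file, then point Config.txt at the output file. The identifier data is lost, as noted above.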

Jeffro2 May 31, 2013 at 6:40 PM 
Also had to change the column definitions in the master and release tables so that null values for the 'country' columns were permissible. It appears that the data for country isn't always in the XML file for releases and master files.

So, the Schema now looks like:

CREATE TABLE `masters` (
`id` int(11) NOT NULL,
`main_release` int(11) NOT NULL,
`title` mediumtext COLLATE utf8_unicode_ci NOT NULL,
`joined_artists` mediumtext COLLATE utf8_unicode_ci NOT NULL,
`country` mediumtext COLLATE utf8_unicode_ci,
`year` int(11) NOT NULL,
`notes` mediumtext COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `releases` (
`id` int(11) NOT NULL,
`master_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`title` mediumtext COLLATE utf8_unicode_ci NOT NULL,
`joined_artists` mediumtext COLLATE utf8_unicode_ci NOT NULL,
`country` mediumtext COLLATE utf8_unicode_ci,
`releasedate` mediumtext COLLATE utf8_unicode_ci NOT NULL,
`notes` mediumtext COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;



Now, that got me to a successful load of the latest data, all the way up to the last file for releases. I'm getting the following error message when importing the releases XML file and I don't know how to fix this:

C:\DatabaseManagerCLI>DatabaseManagerCLI import-releases
Creating index on artists...
Importing data...
Releases: 0.00%

Unhandled Exception: System.FormatException: Unknown Identifier type.
at DiscogsNet.Model.DataReader.ParseReleaseIdentifierType(String identifierType)
at DiscogsNet.Model.DataReader2.ReadReleaseIdentifier()
at DiscogsNet.Model.DataReader2.ReadReleaseIdentifiers()
at DiscogsNet.Model.DataReader2.ReadRelease()
at DiscogsNet.FileReading.ReleaseReader2.Read()
at DiscogsNet.FileReading.ReleaseReader2.<Enumerated>d__1.MoveNext()
at DatabaseManagerCLi.Program.ImportReleases()
at DatabaseManagerCLi.Program.ExecuteCommand(String command)
at DatabaseManagerCLi.Program.Run(String[] args)
at DatabaseManagerCLi.Program.Main(String[] args)



Any idea how to fix this?

Jeffro2 May 31, 2013 at 3:52 PM 
There does seem to be some garbage data in the latest artist dump file discogs_20130523_artists.xml. I removed an entry that, based on some content I won't paste here, looks rather dubious in terms of both content and structure. The entry following artist 'Larry Moon' contains the following (with 'snip' representing extraneous data):

"<artist><id>2254915</id><name>‹br›GoQQAFFggfgо̶nа̶т̶g___|&lt;&lt; [...]</name><profile>Full title is: &lt;b&gt;‹br›GoQQAFFggfgо̶nа̶т̶g___|&lt;&lt;⌡⎛⎜⎧⎦⎳⎲𝔢𝔞𝔤𝔳𝔬⁡⃟⃝ℑѴℜ℘ℭ℮ℹ⅊↭⇮∏∝∰∳⊍⊍⊗‹/br›&lt;/b&gt;.</profile><data_quality>Correct</data_quality><aliases><name>3.141592653589793238462643</name><name>Alex Ischenko</name>

---snip-----


</artist>"


After I removed that from the file, the entire Artist XML file loaded without error.
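
Removing such an entry by hand can also be scripted. The following is a minimal Python sketch under stated assumptions: the function name drop_artists is made up for illustration, each record is assumed to be a non-nested `<artist>` ... `</artist>` element whose first `<id>` is the artist ID, and the text is processed in memory, so a full dump would need to be handled in pieces.

```python
import re

# One artist record; assumes <artist> elements are not nested inside each other.
ARTIST_RE = re.compile(r"<artist>.*?</artist>", re.DOTALL)
ID_RE = re.compile(r"<id>(\d+)</id>")


def drop_artists(xml_text, bad_ids):
    """Return xml_text without the <artist> records whose first <id> is in bad_ids."""
    def keep_or_drop(match):
        record = match.group(0)
        found = ID_RE.search(record)
        if found and int(found.group(1)) in bad_ids:
            return ""  # drop the whole record
        return record  # keep the record unchanged
    return ARTIST_RE.sub(keep_or_drop, xml_text)
```

For the entry discussed here, bad_ids would be the set {2254915}; every other record is passed through untouched.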

The next problem I encountered (and am still working on) seems to involve some missing 'country' values in the releases and/or master releases XML files. Missing data in those fields is a problem because the tables for that data are set up to not allow NULLs. I've set up both tables that contain a 'country' column to allow a null in that column. Will try to report back here if anyone else is interested.

san23 Jan 3, 2013 at 8:12 PM 
Error during import. Database November 2012. MySQL version 5.5.
I am getting the following error. Please help me. I have already tried re-downloading the archive three times and it did not help.

Unhandled Exception: MySql.Data.MySqlClient.MySqlException: Incorrect string value: '\xF0\x9D\x94\xA2\xF0\x9D...' for column 'profile' at row 1
at MySql.Data.MySqlClient.MySqlStream.ReadPacket()
at MySql.Data.MySqlClient.NativeDriver.GetResult(Int32& affectedRow, Int32& insertedId)
at MySql.Data.MySqlClient.Driver.GetResult(Int32 statementId, Int32& affectedRows, Int32& insertedId)
at MySql.Data.MySqlClient.Driver.NextResult(Int32 statementId)
at MySql.Data.MySqlClient.MySqlDataReader.NextResult()
at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior)
at MySql.Data.MySqlClient.MySqlCommand.ExecuteNonQuery()
at DiscogsNet.Database.ArtistInserter.Insert(Artist artist)
at DatabaseManagerCli.Program.ImportArtists()
at DatabaseManagerCli.Program.ExecuteCommand(String command)
at DatabaseManagerCli.Program.Run(String[] args)
at DatabaseManagerCli.Program.Main(String[] args)
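
A possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the quoted bytes are the UTF-8 encoding of a character above U+FFFF, and MySQL's utf8 character set stores at most three bytes per character, so such characters are rejected. MySQL 5.5.3 and later provide utf8mb4 for four-byte characters, though whether the tool works with it is untested here. The short Python sketch below decodes the bytes from the message; the helper name non_bmp_chars is made up for illustration.

```python
def non_bmp_chars(text):
    """Return the characters in text above U+FFFF (four bytes each in UTF-8)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# The first four bytes quoted in the error message.
first_char = b"\xF0\x9D\x94\xA2".decode("utf-8")
code_point = "U+%X" % ord(first_char)
```

Running non_bmp_chars over a field such as an artist profile would list the characters that a three-byte utf8 column cannot hold.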